Preface | xiii

I Narratives from Film and Literature, from Social Media and Contemporary Life | 1 | (42)

1 The Correspondence Analysis Platform for Mapping Semantics | 3 | (22)
1.1 The Visualization and Verbalization of Data | 3 | (1)
1.2 Analysis of Narrative from Film and Drama | 4 | (7)
| 4 | (1)
1.2.2 The Changing Nature of Movie and Drama | 4 | (1)
1.2.3 Correspondence Analysis as a Semantic Analysis Platform | 5 | (1)
1.2.4 Casablanca Narrative: Illustrative Analysis | 5 | (1)
1.2.5 Modelling Semantics via the Geometry and Topology of Information | 6 | (2)
1.2.6 Casablanca Narrative: Illustrative Analysis Continued | 8 | (1)
1.2.7 Platform for Analysis of Semantics | 8 | (2)
1.2.8 Deeper Look at Semantics of Casablanca: Text Mining | 10 | (1)
1.2.9 Analysis of a Pivotal Scene | 10 | (1)
1.3 Application of Narrative Analysis to Science and Engineering Research | 11 | (8)
1.3.1 Assessing Coverage and Completeness | 12 | (2)
| 14 | (1)
1.3.3 Conclusion on the Policy Case Studies | 15 | (4)
1.4 Human Resources Multivariate Performance Grading | 19 | (2)
1.5 Data Analytics as the Narrative of the Analysis Processing | 21 | (1)
1.6 Annex: The Correspondence Analysis and Hierarchical Clustering Platform | 21 | (4)
| 21 | (1)
1.6.2 Correspondence Analysis: Mapping χ² Distances into Euclidean Distances | 22 | (1)
1.6.3 Input: Cloud of Points Endowed with the Chi-Squared Metric | 22 | (1)
1.6.4 Output: Cloud of Points Endowed with the Euclidean Metric in Factor Space | 23 | (1)
1.6.5 Supplementary Elements: Information Space Fusion | 23 | (1)
1.6.6 Hierarchical Clustering: Sequence-Constrained | 24 | (1)

2 Analysis and Synthesis of Narrative: Semantics of Interactivity | 25 | (18)
2.1 Impact and Effect in Narrative: A Shock Occurrence in Social Media | 25 | (7)
| 25 | (1)
2.1.2 Two Critical Tweets in Terms of Their Words | 26 | (1)
2.1.3 Two Critical Tweets in Terms of Twitter Sub-narratives | 26 | (6)
2.2 Analysis and Synthesis, Episodization and Narrativization | 32 | (1)
2.3 Storytelling as Narrative Synthesis and Generation | 33 | (2)
2.4 Machine Learning and Data Mining in Film Script Analysis | 35 | (1)
2.5 Style Analytics: Statistical Significance of Style Features | 36 | (1)
2.6 Typicality and Atypicality for Narrative Summarization and Transcoding | 37 | (3)
2.7 Integration and Assembling of Narrative | 40 | (3)

II Foundations of Analytics through the Geometry and Topology of Complex Systems | 43 | (42)

3 Symmetry in Data Mining and Analysis through Hierarchy | 45 | (24)
3.1 Analytics as the Discovery of Hierarchical Symmetries in Data | 45 | (1)
3.2 Introduction to Hierarchical Clustering, p-Adic and m-Adic Numbers | 45 | (3)
3.2.1 Structure in Observed or Measured Data | 46 | (1)
3.2.2 Brief Look Again at Hierarchical Clustering | 46 | (1)
3.2.3 Brief Introduction to p-Adic Numbers | 47 | (1)
3.2.4 Brief Discussion of p-Adic and m-Adic Numbers | 47 | (1)
| 48 | (4)
3.3.1 Ultrametric Space for Representing Hierarchy | 48 | (1)
3.3.2 Geometrical Properties of Ultrametric Spaces | 48 | (1)
3.3.3 Ultrametric Matrices and Their Properties | 48 | (2)
3.3.4 Clustering through Matrix Row and Column Permutation | 50 | (1)
3.3.5 Other Data Symmetries | 51 | (1)
3.4 Generalized Ultrametric and Formal Concept Analysis | 52 | (2)
3.4.1 Link with Formal Concept Analysis | 52 | (2)
3.4.2 Applications of Generalized Ultrametrics | 54 | (1)
3.5 Hierarchy in a p-Adic Number System | 54 | (4)
3.5.1 p-Adic Encoding of a Dendrogram | 54 | (3)
3.5.2 p-Adic Distance on a Dendrogram | 57 | (1)
3.5.3 Scale-Related Symmetry | 58 | (1)
3.6 Tree Symmetries through the Wreath Product Group | 58 | (4)
3.6.1 Wreath Product Group for Hierarchical Clustering | 58 | (1)
3.6.2 Wreath Product Invariance | 59 | (1)
3.6.3 Wreath Product Invariance: Haar Wavelet Transform of Dendrogram | 60 | (2)
3.7 Tree and Data Stream Symmetries from Permutation Groups | 62 | (2)
3.7.1 Permutation Representation of a Data Stream | 62 | (1)
3.7.2 Permutation Representation of a Hierarchy | 63 | (1)
3.8 Remarkable Symmetries in Very High-Dimensional Spaces | 64 | (1)
3.9 Short Commentary on This Chapter | 65 | (4)

4 Geometry and Topology of Data Analysis: in p-Adic Terms | 69 | (16)
4.1 Numbers and Their Representations | 69 | (2)
4.1.1 Series Representations of Numbers | 69 | (1)
| 70 | (1)
4.2 p-Adic Valuation, p-Adic Absolute Value, p-Adic Norm | 71 | (1)
4.3 p-Adic Numbers as Series Expansions | 72 | (1)
4.4 Canonical p-Adic Expansion; p-Adic Integer or Unit Ball | 73 | (1)
4.5 Non-Archimedean Norms as p-Adic Integer Norms in the Unit Ball | 74 | (1)
4.5.1 Archimedean and Non-Archimedean Absolute Value Properties | 74 | (1)
4.5.2 A Non-Archimedean Absolute Value, or Norm, Is Less Than or Equal to One, and an Archimedean Absolute Value, or Norm, Is Unbounded | 74 | (1)
4.6 Going Further: Negative p-Adic Numbers, and p-Adic Fractions | 75 | (1)
4.7 Number Systems in the Physical and Natural Sciences | 76 | (1)
4.8 p-Adic Numbers in Computational Biology and Computer Hardware | 77 | (1)
4.9 Measurement Requires a Norm, Implying Distance and Topology | 78 | (1)
4.10 Ultrametric Topology | 79 | (1)
4.11 Short Review of p-Adic Cosmology | 80 | (1)
4.12 Unbounded Increase in Mass or Other Measured Quantity | 81 | (1)
4.13 Scale-Free Partial Order or Hierarchical Systems | 81 | (2)
4.14 p-Adic Indexing of the Sphere | 83 | (1)
4.15 Diffusion and Other Dynamic Processes in Ultrametric Spaces | 83 | (2)

III New Challenges and New Solutions for Information Search and Discovery | 85 | (46)

5 Fast, Linear Time, m-Adic Hierarchical Clustering | 87 | (16)
5.1 Pervasive Ultrametricity: Computational Consequences | 87 | (2)
5.1.1 Ultrametrics in Data Analytics | 87 | (1)
5.1.2 Quantifying Ultrametricity | 88 | (1)
5.1.3 Pervasive Ultrametricity | 88 | (1)
5.1.4 Computational Implications | 89 | (1)
5.2 Applications in Search and Discovery using the Baire Metric | 89 | (2)
| 89 | (1)
5.2.2 Large Numbers of Observables | 89 | (1)
5.2.3 High-Dimensional Data | 90 | (1)
5.2.4 First Approach Based on Reduced Precision of Measurement | 90 | (1)
5.2.5 Random Projections in High-Dimensional Spaces, Followed by the Baire Distance | 91 | (1)
5.2.6 Summary Comments on Search and Discovery | 91 | (1)
5.3 m-Adic Hierarchy and Construction | 91 | (1)
5.4 The Baire Metric, the Baire Ultrametric | 92 | (2)
5.4.1 Metric and Ultrametric Spaces | 92 | (1)
5.4.2 Ultrametric Baire Space and Distance | 93 | (1)
5.5 Multidimensional Use of the Baire Metric through Random Projections | 94 | (1)
5.6 Hierarchical Tree Defined from m-Adic Encoding | 95 | (1)
5.7 Longest Common Prefix and Hashing | 96 | (1)
5.7.1 From Random Projection to Hashing | 96 | (1)
5.8 Enhancing Ultrametricity through Precision of Measurement | 97 | (2)
5.8.1 Quantifying Ultrametricity | 97 | (1)
5.8.2 Pervasiveness of Ultrametricity | 98 | (1)
5.9 Generalized Ultrametric and Formal Concept Analysis | 99 | (1)
5.9.1 Generalized Ultrametric | 99 | (1)
5.9.2 Formal Concept Analysis | 99 | (1)
5.10 Linear Time and Direct Reading Hierarchical Clustering | 100 | (1)
5.10.1 Linear Time, or O(N) Computational Complexity, Hierarchical Clustering | 100 | (1)
5.10.2 Grid-Based Clustering Algorithms | 100 | (1)
5.11 Summary: Many Viewpoints, Various Implementations | 101 | (2)

6 Big Data Scaling through Metric Mapping | 103 | (28)
6.1 Mean Random Projection, Marginal Sum, Seriation | 104 | (4)
6.1.1 Mean of Random Projections as a Seriation | 105 | (2)
6.1.2 Normalization of the Random Projections | 107 | (1)
6.2 Ultrametric and Ordering of Rows, Columns | 108 | (1)
6.3 Power Iteration Clustering | 108 | (2)
6.4 Input Data for Eigenreduction | 110 | (1)
6.4.1 Implementation: Equivalence of Iterative Approximation and Batch Calculation | 110 | (1)
6.5 Inducing a Hierarchical Clustering from Seriation | 111 | (1)
6.6 Short Summary of All These Methodological Underpinnings | 112 | (1)
6.6.1 Trivial First Eigenvalue, Eigenvector in Correspondence Analysis | 112 | (1)
6.7 Very High-Dimensional Data Spaces: Data Piling | 113 | (1)
6.8 Recap on Correspondence Analysis for Following Applications | 114 | (3)
6.8.1 Clouds of Points, Masses and Inertia | 115 | (1)
6.8.2 Relative and Absolute Contributions | 116 | (1)
6.9 Evaluation 1: Uniformly Distributed Data Cloud Points | 117 | (1)
6.9.1 Computation Time Requirements | 118 | (1)
6.10 Evaluation 2: Time Series of Financial Futures | 118 | (2)
6.11 Evaluation 3: Chemistry Data, Power Law Distributed | 120 | (4)
6.11.1 Data and Determining Power Law Properties | 120 | (1)
6.11.2 Randomly Generating Power Law Distributed Data in Varying Embedding Dimensions | 120 | (4)
6.12 Application 1: Quantifying Effectiveness through Aggregate Outcome | 124 | (1)
6.12.1 Computational Requirements, from Original Space and Factor Space Identities | 124 | (1)
6.13 Application 2: Data Piling as Seriation of Dual Space | 125 | (1)
6.14 Brief Concluding Summary | 126 | (1)
6.15 Annex: R Software Used in Simulations and Evaluations | 126 | (5)
6.15.1 Evaluation 1: Dense, Uniformly Distributed Data | 127 | (1)
6.15.2 Evaluation 2: Financial Futures | 128 | (1)
6.15.3 Evaluation 3: Chemicals of Specified Marginal Distribution | 129 | (2)

IV New Frontiers: New Vistas on Information, Cognition and the Human Mind | 131 | (56)

7 On Ultrametric Algorithmic Information | 133 | (14)
7.1 Introduction to Information Measures | 133 | (1)
7.2 Wavelet Transform of a Set of Points Endowed with an Ultrametric | 134 | (3)
7.3 An Object as a Chain of Successively Finer Approximations | 137 | (2)
7.3.1 Approximation Chain using a Hierarchy | 138 | (1)
7.3.2 Dendrogram Wavelet Transform of Spherically Complete Space | 138 | (1)
7.4 Generating Faces: Case Study Using a Simplified Model | 139 | (4)
7.4.1 A Simplified Model of Face Generation | 139 | (4)
7.4.2 Discussion of Psychological and Other Consequences | 143 | (1)
7.5 Complexity of an Object: Hierarchical Information | 143 | (1)
7.6 Consequences Arising from This Chapter | 144 | (3)

8 Geometry and Topology of Matte Blanco's Bi-Logic in Psychoanalytics | 147 | (16)
8.1 Approaching Data and the Object of Study, Mental Processes | 147 | (5)
8.1.1 Historical Role of Psychometrics and Mathematical Psychology | 148 | (1)
8.1.2 Summary of Chapter Content | 148 | (1)
8.1.3 Determining Depth of Emotion, and Tracking Emotion | 148 | (4)
8.2 Matte Blanco's Psychoanalysis: A Selective Review | 152 | (3)
8.3 Real World, Metric Space: Context for Asymmetric Mental Processes | 155 | (1)
8.4 Ultrametric Topology, Background and Relevance in Psychoanalysis | 156 | (3)
| 156 | (1)
8.4.2 Inducing an Ultrametric through Agglomerative Hierarchical Clustering | 157 | (1)
8.4.3 Transitions from Metric to Ultrametric Representation, and Vice Versa, through Data Transformation | 157 | (1)
8.4.4 Practical Applications | 158 | (1)
8.5 Conclusion: Analytics of Human Mental Processes | 159 | (1)
8.6 Annex 1: Far Greater Computational Power of Unconscious Mental Processes | 160 | (1)
8.7 Annex 2: Text Analysis as a Proxy for Both Facets of Bi-Logic | 161 | (2)

9 Ultrametric Model of Mind: Application to Text Content Analysis | 163 | (18)
| 163 | (1)
9.2 Quantifying Ultrametricity | 164 | (3)
9.2.1 Ultrametricity Coefficient of Lerman | 164 | (1)
9.2.2 Ultrametricity Coefficient of Rammal, Toulouse and Virasoro | 164 | (1)
9.2.3 Ultrametricity Coefficients of Treves and of Hartman | 165 | (1)
9.2.4 Bayesian Network Modelling | 165 | (1)
9.2.5 Our Ultrametricity Coefficient | 165 | (1)
9.2.6 What the Ultrametricity Coefficient Reveals | 166 | (1)
9.3 Semantic Mapping: Interrelationships to Euclidean, Factor Space | 167 | (3)
9.3.1 Correspondence Analysis: Mapping χ² into Euclidean Distances | 167 | (1)
9.3.2 Input: Cloud of Points Endowed with the Chi-Squared Metric | 167 | (1)
9.3.3 Output: Cloud of Points Endowed with the Euclidean Metric in Factor Space | 168 | (1)
9.3.4 Conclusions on Correspondence Analysis and Introduction to the Numerical Experiments to Follow | 169 | (1)
9.4 Determining Ultrametricity through Text Unit Interrelationships | 170 | (4)
| 170 | (1)
| 171 | (1)
9.4.3 Air Accident Reports | 172 | (1)
| 172 | (2)
9.5 Ultrametric Properties of Words | 174 | (3)
9.5.1 Objectives and Choice of Data | 174 | (1)
9.5.2 General Discussion of Ultrametricity of Words | 175 | (1)
9.5.3 Conclusions on the Word Analysis | 175 | (2)
9.6 Concluding Comments on This Chapter | 177 | (1)
9.7 Annex 1: Pseudo-Code for Assessing Ultrametric-Respecting Triplet | 177 | (1)
9.8 Annex 2: Bradley Ultrametricity Coefficient | 178 | (3)

10 Concluding Discussion on Software Environments | 181 | (6)
| 181 | (1)
10.2 Complementary Use with Apache Solr (and Lucene) | 182 | (1)
10.3 In Summary: Treating Massive Data Sets with Correspondence Analysis | 182 | (3)
10.3.1 Aggregating Similar or Identical Profiles Is Welcome | 182 | (1)
10.3.2 Resolution Level of the Analysis Carried Out | 183 | (1)
10.3.3 Random Projections in Order to Benefit from Data Piling in High Dimensions | 183 | (1)
10.3.4 Massive Observation Cardinality, Moderate Sized Dimensionality | 184 | (1)
| 185 | (2)

Bibliography | 187 | (16)
Index | 203