Preface |
|
xv | |
1 Introduction |
|
|
|
1 | |
|
1.2 What Is Information Retrieval? |
|
|
2 | |
|
1.3 How Does Information Retrieval Work? |
|
|
5 | |
|
|
6 | |
|
1.3.2 The Database Producer Sequence |
|
|
10 | |
|
1.3.3 System Design and Functioning |
|
|
13 | |
|
1.3.4 Why the Process Is Not Perfect |
|
|
15 | |
|
1.4 Who Uses Information Retrieval? |
|
|
17 | |
|
1.4.1 Information Specialists |
|
|
17 | |
|
1.4.2 Subject Specialist End Users |
|
|
18 | |
|
1.4.3 Non-Subject Specialist End Users |
|
|
18 | |
|
1.5 What Are the Problems in IRS Design and Use? |
|
|
19 | |
|
|
19 | |
|
1.5.2 Understanding User Behavior |
|
|
20 | |
|
1.6 A Brief History of Information Retrieval |
|
|
21 | |
|
1.6.1 Traditional Information Retrieval Methods |
|
|
21 | |
|
1.6.2 Pre-Computer IR Systems |
|
|
23 | |
|
1.6.3 Special Purpose Computer Systems |
|
|
26 | |
|
1.6.4 General Purpose Computer Systems |
|
|
27 | |
|
1.6.5 Online Database Services |
|
|
29 | |
|
|
31 | |
|
|
34 | |
2 Data, Information, and Knowledge |
|
|
|
37 | |
|
|
37 | |
|
|
38 | |
|
|
38 | |
|
|
40 | |
|
|
40 | |
|
|
41 | |
|
|
42 | |
|
|
42 | |
|
2.2.8 Relevance and Value |
|
|
43 | |
|
|
43 | |
|
|
46 | |
|
2.5 Credence, Justified Belief, and Point of View |
|
|
48 | |
|
|
50 | |
3 Representation of Information |
|
|
3.1 Information to Be Represented |
|
|
53 | |
|
3.2 Types of Representation |
|
|
58 | |
|
|
59 | |
|
3.2.2 Restricted Natural Language |
|
|
60 | |
|
3.2.3 Artificial Language |
|
|
61 | |
|
3.2.4 Codes, Measures, and Descriptors |
|
|
62 | |
|
3.2.5 Mathematical Models of Text |
|
|
63 | |
|
3.3 Characteristics of Information Representations |
|
|
64 | |
|
3.3.1 Discriminating Power |
|
|
65 | |
|
3.3.2 Identification of Similarity |
|
|
66 | |
|
|
66 | |
|
|
66 | |
|
|
67 | |
|
3.4 Relationships Among Entities and Attribute Values |
|
|
67 | |
|
|
67 | |
|
|
67 | |
|
3.4.3 Nominal Descriptors |
|
|
69 | |
|
|
70 | |
|
|
70 | |
|
3.4.6 Explicit Pointers and Links |
|
|
70 | |
|
|
71 | |
4 Attribute Content and Values |
|
|
4.1 Types of Attribute Symbols |
|
|
73 | |
|
|
74 | |
|
4.1.2 Character Strings: Names |
|
|
74 | |
|
4.1.3 Other Character Strings |
|
|
75 | |
|
|
75 | |
|
4.2.1 Hierarchical Classification |
|
|
76 | |
|
4.2.2 Network Relationships |
|
|
77 | |
|
4.2.3 Class Membership: Binary, Probabilistic, or Fuzzy |
|
|
78 | |
|
4.3 Transformations of Values |
|
|
81 | |
|
4.3.1 Transformation of Words by Stemming |
|
|
82 | |
|
4.3.2 Sound-Based Transformation of Words |
|
|
85 | |
|
4.3.3 Transformation of Words by Meaning |
|
|
86 | |
|
4.3.4 Transformation of Graphics |
|
|
88 | |
|
4.3.5 Transformation of Sound |
|
|
91 | |
|
|
93 | |
|
4.5 Ambiguity of Attribute Values |
|
|
94 | |
|
|
96 | |
|
4.7 Control of Vocabulary |
|
|
98 | |
|
4.7.1 Elements of Control |
|
|
98 | |
|
4.7.2 Dissemination of Controlled Vocabularies |
|
|
100 | |
|
4.8 Importance of Point of View |
|
|
100 | |
|
|
102 | |
5 Models of Virtual Data Structure |
|
|
5.1 Concept of Models of Data |
|
|
103 | |
|
5.2 Basic Data Elements and Structures |
|
|
106 | |
|
5.2.1 Scalar Variables and Constants |
|
|
106 | |
|
|
107 | |
|
|
107 | |
|
|
107 | |
|
|
107 | |
|
|
109 | |
|
|
109 | |
|
5.3 Common Structural Models |
|
|
111 | |
|
5.3.1 Linear Sequential Model |
|
|
112 | |
|
|
112 | |
|
5.3.3 Hierarchical and Network Models |
|
|
114 | |
|
5.4 Applications of the Basic Models |
|
|
116 | |
|
|
116 | |
|
|
118 | |
|
5.5 Entity-Relationship Model |
|
|
120 | |
|
|
121 | |
6 The Physical Structure of Data |
|
|
6.1 Introduction to Physical Structures |
|
|
123 | |
|
6.2 Record Structures and Their Effects |
|
|
124 | |
|
|
124 | |
|
6.2.2 Space-Time and Transaction Rate |
|
|
127 | |
|
6.3 Basic Concepts of File Structure |
|
|
127 | |
|
6.3.1 The Order of Records |
|
|
128 | |
|
|
128 | |
|
6.4 Organizational Methods |
|
|
129 | |
|
|
129 | |
|
6.4.2 Index-File Structures |
|
|
131 | |
|
|
133 | |
|
|
136 | |
|
6.4.5 Direct-Access Structures |
|
|
138 | |
|
6.5 Parsing of Data Elements |
|
|
141 | |
|
|
142 | |
|
|
143 | |
|
6.5.3 Word and Phrase Parsing |
|
|
143 | |
|
6.6 Combination Structures |
|
|
144 | |
|
|
144 | |
|
6.6.2 Direct Structure with Chains |
|
|
145 | |
|
6.6.3 Indexed Sequential Access Method |
|
|
147 | |
|
|
148 | |
7 Querying the Information Retrieval System |
|
|
|
151 | |
|
|
152 | |
|
|
154 | |
|
|
155 | |
|
7.3.2 Relational Statements |
|
|
155 | |
|
7.3.3 Boolean Query Logic |
|
|
156 | |
|
7.3.4 Ranked and Fuzzy Sets |
|
|
159 | |
|
7.3.5 Similarity Measures |
|
|
162 | |
|
|
162 | |
|
|
162 | |
|
|
164 | |
|
7.4.3 Search the Inverted File or Thesaurus |
|
|
164 | |
|
7.4.4 Create a Subset of the Database |
|
|
167 | |
|
|
168 | |
|
|
170 | |
|
7.4.7 Sort, Display, and Format Records |
|
|
171 | |
|
7.4.8 Handle the Unstructured Record |
|
|
172 | |
|
|
172 | |
|
|
173 | |
|
7.4.11 Save, Recall, and Edit Searches |
|
|
173 | |
|
7.4.12 Current Awareness Search |
|
|
174 | |
|
|
175 | |
|
7.4.14 Terminate a Session |
|
|
175 | |
|
7.5 The Basis for Charging for Searches |
|
|
176 | |
8 Interpretation and Execution of Query Statements |
|
|
8.1 Problems of Query Language Interpretation |
|
|
177 | |
|
8.1.1 Parsing Command Language |
|
|
178 | |
|
8.1.2 Parsing Natural Language |
|
|
181 | |
|
8.1.3 Processing Menu Choices |
|
|
183 | |
|
8.2 Executing Retrieval Commands |
|
|
184 | |
|
|
184 | |
|
8.2.2 Inverted File Search |
|
|
184 | |
|
8.2.3 Set or Subset Creation |
|
|
185 | |
|
8.2.4 Truncation and Universal Characters |
|
|
187 | |
|
8.2.5 Left-Hand Truncation |
|
|
188 | |
|
8.3 Executing Record Analysis and Presentation Commands |
|
|
191 | |
|
8.3.1 Set Analysis Functions |
|
|
191 | |
|
8.3.2 Display, Format, and Sort |
|
|
193 | |
|
|
195 | |
|
8.4 Executing Other Commands |
|
|
196 | |
|
|
196 | |
|
8.4.2 Save, Recall, and Edit Searches |
|
|
196 | |
|
|
197 | |
|
8.4.4 Cost Summation and Billing |
|
|
198 | |
|
8.4.5 Terminate a Session |
|
|
199 | |
|
8.5 Feedback to Users and Error Messages |
|
|
199 | |
|
8.5.1 Response to Command Errors |
|
|
199 | |
|
8.5.2 Set-Size Indication |
|
|
200 | |
|
|
200 | |
|
|
201 | |
|
|
201 | |
|
|
201 | |
9 Text Searching |
|
|
9.1 The Special Problems of Text Searching |
|
|
203 | |
|
9.1.1 A Note on Terminology and Symbols |
|
|
204 | |
|
|
205 | |
|
9.2 Some Characteristics of Text and Their Applications |
|
|
207 | |
|
|
207 | |
|
9.2.2 Significant Words: Indexing |
|
|
208 | |
|
9.2.3 Significant Sentences: Abstracting |
|
|
209 | |
|
9.2.4 Measures of Complete Texts |
|
|
213 | |
|
9.3 Command Language for Text Searching |
|
|
214 | |
|
9.3.1 Set Membership Statements |
|
|
215 | |
|
9.3.2 Word or String Occurrence Statements |
|
|
215 | |
|
9.3.3 Proximity Statements |
|
|
215 | |
|
9.3.4 Web-Based Text Search |
|
|
217 | |
|
|
218 | |
|
9.4.1 Indexing with Weights |
|
|
220 | |
|
9.4.2 Automated Assignment of Weights |
|
|
220 | |
|
|
221 | |
|
9.5 Word Association Techniques |
|
|
221 | |
|
9.5.1 Dictionaries and Thesauri |
|
|
221 | |
|
|
222 | |
|
9.5.3 Word Co-occurrence Statistics |
|
|
223 | |
|
9.5.4 Stemming and Conflation |
|
|
224 | |
|
9.6 Text or Record Association Techniques |
|
|
224 | |
|
9.6.1 Similarity Measures |
|
|
225 | |
|
|
228 | |
|
|
230 | |
|
9.6.4 Discriminant Methods |
|
|
233 | |
|
9.7 Other Processes with Words of a Text |
|
|
234 | |
|
|
234 | |
|
9.7.2 Replacement of Words with Roots or Associated Words |
|
|
235 | |
|
9.7.3 Varying Significance as a Function of Frequency |
|
|
236 | |
|
9.7.4 Comments on the Computation of the Strength of Document Association |
|
|
236 | |
10 System-Computed Relevance and Ranking |
|
|
10.1 The Retrieval Status Value (rsv) |
|
|
241 | |
|
|
241 | |
|
10.3 Methods of Evaluating the rsv |
|
|
242 | |
|
10.3.1 The Vector Space Model |
|
|
242 | |
|
10.3.2 The Probabilistic Model |
|
|
244 | |
|
10.3.3 The Extended Boolean Model |
|
|
245 | |
|
10.4 The rsv in Operational Retrieval |
|
|
247 | |
11 Search Feedback and Iteration |
|
|
11.1 Basic Concepts of Feedback and Iteration |
|
|
249 | |
|
|
251 | |
|
11.3 Information Available as Feedback |
|
|
252 | |
|
11.3.1 File or Database Selection |
|
|
252 | |
|
11.3.2 Term Search or Browsing |
|
|
253 | |
|
11.3.3 Record Search and Set Formation |
|
|
254 | |
|
11.3.4 Record Display and Browsing |
|
|
256 | |
|
11.3.5 Record Acquisition |
|
|
257 | |
|
11.3.6 Requests for Information About the Retrieval System |
|
|
257 | |
|
11.3.7 Establishing Communications Parameters |
|
|
258 | |
|
11.3.8 Trends Over Sequences and Cycles |
|
|
258 | |
|
11.4 Adjustments in the Search |
|
|
259 | |
|
11.4.1 Improve Term Selection |
|
|
260 | |
|
11.4.2 Improve Set Formation Logic |
|
|
260 | |
|
11.4.3 Improve Final Set Size |
|
|
260 | |
|
11.4.4 Improve Precision, Recall, or Total Utility |
|
|
260 | |
|
11.5 Feedback from User to System |
|
|
261 | |
12 Multi-Database Searching and Mapping |
|
|
|
265 | |
|
12.2 Multi-Database Search |
|
|
266 | |
|
12.2.1 Nature of Duplicate Records |
|
|
266 | |
|
12.2.2 Detection of Duplicates |
|
|
269 | |
|
12.2.3 Scanning Multiple Databases |
|
|
271 | |
|
|
273 | |
|
|
275 | |
13 Search Strategy |
|
|
13.1 The Nature of Searching Reconsidered |
|
|
277 | |
|
|
278 | |
|
13.1.2 Specific Information Search |
|
|
278 | |
|
13.1.3 General Information Search |
|
|
278 | |
|
13.1.4 Exploration of the Database |
|
|
279 | |
|
13.2 The Nature of Search Strategy |
|
|
279 | |
|
|
280 | |
|
13.2.2 General Plan of Operation |
|
|
280 | |
|
13.2.3 The Essential Information Elements of a Search |
|
|
281 | |
|
13.2.4 Specific Plan of Operation |
|
|
282 | |
|
|
282 | |
|
13.3.1 Categorizing by Objective |
|
|
283 | |
|
13.3.2 Categorizing by Plan of Operation |
|
|
283 | |
|
|
285 | |
|
13.4.1 Monitoring Tactics |
|
|
286 | |
|
13.4.2 File Structure Tactics |
|
|
286 | |
|
13.4.3 Search Formulation Tactics |
|
|
286 | |
|
|
286 | |
|
|
286 | |
14 The Information Retrieval System Interface |
|
|
14.1 General Model of Message Flow |
|
|
287 | |
|
14.2 Sources of Ambiguity |
|
|
290 | |
|
14.3 The Role of a Search Intermediary |
|
|
291 | |
|
14.3.1 Establishing the Information Need |
|
|
292 | |
|
14.3.2 Development of a Search Strategy |
|
|
292 | |
|
14.3.3 Translation of the Need Statement into a Query |
|
|
292 | |
|
14.3.4 Interpretation and Evaluation of Output |
|
|
293 | |
|
14.3.5 Search Iteration within the Strategic Plan |
|
|
293 | |
|
14.3.6 Change of Strategy When Necessary |
|
|
293 | |
|
14.3.7 Help in Using an IRS |
|
|
294 | |
|
14.4 Automated Search Mediation |
|
|
294 | |
|
|
294 | |
|
14.4.2 Fully Automatic Intermediary Functions |
|
|
295 | |
|
14.4.3 Interactive Intermediary Functions |
|
|
296 | |
|
14.5 The User Interface as a Component of All Systems |
|
|
298 | |
|
14.6 The User Interface in Web Search Engines |
|
|
299 | |
15 A Sampling of Information Retrieval Systems |
|
|
|
301 | |
|
|
302 | |
|
15.2.1 Command Language Using Boolean Logic |
|
|
303 | |
|
|
304 | |
|
15.2.3 DIALOGWeb: A Web Adaptation |
|
|
305 | |
|
|
308 | |
|
15.3.1 Default Query Entry Form |
|
|
309 | |
|
15.3.2 Advanced Search Form |
|
|
310 | |
|
|
311 | |
|
|
311 | |
|
|
312 | |
|
15.4.3 Google Advanced Search |
|
|
312 | |
|
|
313 | |
|
|
314 | |
|
|
315 | |
16 Measurement and Evaluation |
|
|
16.1 Basics of Measurement |
|
|
317 | |
|
|
318 | |
|
|
319 | |
|
16.1.3 The Query Composition Process |
|
|
319 | |
|
16.1.4 Deriving the Information Need |
|
|
320 | |
|
|
320 | |
|
|
321 | |
|
16.2 Relevance, Value, and Utility |
|
|
321 | |
|
16.2.1 Relevance as Relatedness |
|
|
322 | |
|
|
322 | |
|
16.2.3 Relevance as Utility |
|
|
323 | |
|
16.2.4 Retaining Two Separate Relevance Measures |
|
|
323 | |
|
16.2.5 The Relevance Measurement Scale |
|
|
325 | |
|
16.2.6 Taking the Measurements |
|
|
326 | |
|
16.2.7 Questions about Relevance as a Measure |
|
|
327 | |
|
16.3 Measures Based on Relevance |
|
|
328 | |
|
|
328 | |
|
|
329 | |
|
16.3.3 Relationship of Recall and Precision |
|
|
330 | |
|
16.3.4 Overall Effectiveness Measures Based on Re and Pr |
|
|
331 | |
|
|
334 | |
|
|
334 | |
|
16.4.2 Errors in a Query Statement |
|
|
334 | |
|
16.4.3 Average Time per Command or per User Decision |
|
|
335 | |
|
16.4.4 Elapsed Time of a Search |
|
|
335 | |
|
16.4.5 Number of Commands or Steps in a Search |
|
|
335 | |
|
|
335 | |
|
16.4.7 Size of Final Set Formed |
|
|
336 | |
|
16.4.8 Number of Records Reviewed by the User |
|
|
336 | |
|
16.4.9 Patterns of Language Use |
|
|
336 | |
|
16.4.10 Measures of Rank Order |
|
|
339 | |
|
|
340 | |
|
|
341 | |
|
|
341 | |
|
|
341 | |
|
16.5.4 Overall User Evaluation |
|
|
341 | |
|
16.6 Measures of Environment |
|
|
342 | |
|
16.6.1 Database Record Selection |
|
|
342 | |
|
|
342 | |
|
|
342 | |
|
|
343 | |
Bibliography |
|
345 | |
Index |
|
357 | |