
E-book: Text Information Retrieval Systems

2.79/5 (55 ratings by Goodreads)

DRM restrictions

  • Copying:

    not allowed

  • Printing:

    not allowed

  • E-book usage:

    Digital Rights Management (DRM)
    The publisher has supplied this book in encrypted form, which means that you must install free software to unlock and read it. To read this e-book, you must create an Adobe ID. The e-book can be downloaded to 6 devices (one user with the same Adobe ID).

    Required software
    To read this e-book on a mobile device (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

    To read this e-book on a PC or Mac, you need Adobe Digital Editions (a free application designed specifically for e-books; it is not the same as Adobe Reader, which you probably already have on your computer).

    You cannot read this e-book using Amazon Kindle.

This will be the third edition of the highly successful "Text Information Retrieval Systems". The book's purpose is to teach people who will be searching or designing text retrieval systems how those systems work. For designers, it covers the problems they will face and reviews currently available solutions, providing a basis for more advanced study. For searchers, its purpose is to explain why such systems work as they do. The book is primarily about computer-based retrieval systems, but the principles apply to nonmechanized ones as well. It covers the nature of information, how information is organized for use by a computer, how search functions are carried out, and some of the theory underlying these functions. It also discusses the interaction between user and system and how retrieved items, users, and complete systems are evaluated. Only a limited knowledge of mathematics and computing is assumed. This third edition will be updated to include coverage of the WWW and current search engines. In many cases, examples of non-web searching will be replaced with web-based illustrations. Interfaces, the various features available to assist searchers, and areas in which search assistance is not available will also be covered. In addition, the book will have a web dimension, with relevant material available online to be used in conjunction with the text. It is a follow-up to the award-winning second edition.

Dealing with computer-based retrieval systems, this book covers the nature of information, how it is organized for use by a computer, how search functions are carried out, and the theory underlying these functions. It also discusses the interaction between user and system and how retrieved items, users, and complete systems are evaluated.
Preface xv
1 Introduction
1.1 What Is Information? 1
1.2 What Is Information Retrieval? 2
1.3 How Does Information Retrieval Work? 5
1.3.1 The User Sequence 6
1.3.2 The Database Producer Sequence 10
1.3.3 System Design and Functioning 13
1.3.4 Why the Process Is Not Perfect 15
1.4 Who Uses Information Retrieval? 17
1.4.1 Information Specialists 17
1.4.2 Subject Specialist End Users 18
1.4.3 Non-Subject Specialist End Users 18
1.5 What Are the Problems in IRS Design and Use? 19
1.5.1 Design 19
1.5.2 Understanding User Behavior 20
1.6 A Brief History of Information Retrieval 21
1.6.1 Traditional Information Retrieval Methods 21
1.6.2 Pre-Computer IR Systems 23
1.6.3 Special Purpose Computer Systems 26
1.6.4 General Purpose Computer Systems 27
1.6.5 Online Database Services 29
1.6.6 The World Wide Web 31
Recommended Reading 34
2 Data, Information, and Knowledge
2.1 Introduction 37
2.2 Definitions 37
2.2.1 Data 38
2.2.2 Information 38
2.2.3 News 40
2.2.4 Knowledge 40
2.2.5 Intelligence 41
2.2.6 Meaning 42
2.2.7 Wisdom 42
2.2.8 Relevance and Value 43
2.3 Metadata 43
2.4 Knowledge Base 46
2.5 Credence, Justified Belief, and Point of View 48
2.6 Summary 50
3 Representation of Information
3.1 Information to Be Represented 53
3.2 Types of Representation 58
3.2.1 Natural Language 59
3.2.2 Restricted Natural Language 60
3.2.3 Artificial Language 61
3.2.4 Codes, Measures, and Descriptors 62
3.2.5 Mathematical Models of Text 63
3.3 Characteristics of Information Representations 64
3.3.1 Discriminating Power 65
3.3.2 Identification of Similarity 66
3.3.3 Descriptiveness 66
3.3.4 Ambiguity 66
3.3.5 Conciseness 67
3.4 Relationships Among Entities and Attribute Values 67
3.4.1 Hierarchical Codes 67
3.4.2 Measurements 67
3.4.3 Nominal Descriptors 69
3.4.4 Inflected Language 70
3.4.5 Full Text 70
3.4.6 Explicit Pointers and Links 70
3.5 Summary 71
4 Attribute Content and Values
4.1 Types of Attribute Symbols 73
4.1.1 Numbers 74
4.1.2 Character Strings: Names 74
4.1.3 Other Character Strings 75
4.2 Class Relationships 75
4.2.1 Hierarchical Classification 76
4.2.2 Network Relationships 77
4.2.3 Class Membership: Binary, Probabilistic, or Fuzzy 78
4.3 Transformations of Values 81
4.3.1 Transformation of Words by Stemming 82
4.3.2 Sound-Based Transformation of Words 85
4.3.3 Transformation of Words by Meaning 86
4.3.4 Transformation of Graphics 88
4.3.5 Transformation of Sound 91
4.4 Uniqueness of Values 93
4.5 Ambiguity of Attribute Values 94
4.6 Indexing of Text 96
4.7 Control of Vocabulary 98
4.7.1 Elements of Control 98
4.7.2 Dissemination of Controlled Vocabularies 100
4.8 Importance of Point of View 100
4.9 Summary 102
5 Models of Virtual Data Structure
5.1 Concept of Models of Data 103
5.2 Basic Data Elements and Structures 106
5.2.1 Scalar Variables and Constants 106
5.2.2 Vector Variables 107
5.2.3 Structures 107
5.2.4 Arrays 107
5.2.5 Tuples 107
5.2.6 Relations 109
5.2.7 Text 109
5.3 Common Structural Models 111
5.3.1 Linear Sequential Model 112
5.3.2 Relational Model 112
5.3.3 Hierarchical and Network Models 114
5.4 Applications of the Basic Models 116
5.4.1 Hypertext 116
5.4.2 Spreadsheet Files 118
5.5 Entity-Relationship Model 120
5.6 Summary 121
6 The Physical Structure of Data
6.1 Introduction to Physical Structures 123
6.2 Record Structures and Their Effects 124
6.2.1 Basic Structures 124
6.2.2 Space-Time and Transaction Rate 127
6.3 Basic Concepts of File Structure 127
6.3.1 The Order of Records 128
6.3.2 Finding Records 128
6.4 Organizational Methods 129
6.4.1 Sequential Files 129
6.4.2 Index-File Structures 131
6.4.3 Lists 133
6.4.4 Trees 136
6.4.5 Direct-Access Structures 138
6.5 Parsing of Data Elements 141
6.5.1 Phrase Parsing 142
6.5.2 Word Parsing 143
6.5.3 Word and Phrase Parsing 143
6.6 Combination Structures 144
6.6.1 Nested Indexes 144
6.6.2 Direct Structure with Chains 145
6.6.3 Indexed Sequential Access Method 147
6.7 Summary 148
7 Querying the Information Retrieval System
7.1 Introduction 151
7.2 Language Types 152
7.3 Query Logic 154
7.3.1 Sets and Subsets 155
7.3.2 Relational Statements 155
7.3.3 Boolean Query Logic 156
7.3.4 Ranked and Fuzzy Sets 159
7.3.5 Similarity Measures 162
7.4 Functions Performed 162
7.4.1 Connect to an IRS 162
7.4.2 Select a Database 164
7.4.3 Search the Inverted File or Thesaurus 164
7.4.4 Create a Subset of the Database 167
7.4.5 Search for Strings 168
7.4.6 Analyze a Set 170
7.4.7 Sort, Display, and Format Records 171
7.4.8 Handle the Unstructured Record 172
7.4.9 Download 172
7.4.10 Order Documents 173
7.4.11 Save, Recall, and Edit Searches 173
7.4.12 Current Awareness Search 174
7.4.13 Cost Summary 175
7.4.14 Terminate a Session 175
7.5 The Basis for Charging for Searches 176
8 Interpretation and Execution of Query Statements
8.1 Problems of Query Language Interpretation 177
8.1.1 Parsing Command Language 178
8.1.2 Parsing Natural Language 181
8.1.3 Processing Menu Choices 183
8.2 Executing Retrieval Commands 184
8.2.1 Database Selection 184
8.2.2 Inverted File Search 184
8.2.3 Set or Subset Creation 185
8.2.4 Truncation and Universal Characters 187
8.2.5 Left-Hand Truncation 188
8.3 Executing Record Analysis and Presentation Commands 191
8.3.1 Set Analysis Functions 191
8.3.2 Display, Format, and Sort 193
8.3.3 Offline Printing 195
8.4 Executing Other Commands 196
8.4.1 Ordering 196
8.4.2 Save, Recall, and Edit Searches 196
8.4.3 Current Awareness 197
8.4.4 Cost Summation and Billing 198
8.4.5 Terminate a Session 199
8.5 Feedback to Users and Error Messages 199
8.5.1 Response to Command Errors 199
8.5.2 Set-Size Indication 200
8.5.3 Record Display 200
8.5.4 Set Analysis 201
8.5.5 Cost 201
8.5.6 Help 201
9 Text Searching
9.1 The Special Problems of Text Searching 203
9.1.1 A Note on Terminology and Symbols 204
9.1.2 The Semantic Web 205
9.2 Some Characteristics of Text and Their Applications 207
9.2.1 Components of Text 207
9.2.2 Significant Words Indexing 208
9.2.3 Significant Sentences—Abstracting 209
9.2.4 Measures of Complete Texts 213
9.3 Command Language for Text Searching 214
9.3.1 Set Membership Statements 215
9.3.2 Word or String Occurrence Statements 215
9.3.3 Proximity Statements 215
9.3.4 Web Based Text Search 217
9.4 Term Weighting 218
9.4.1 Indexing with Weights 220
9.4.2 Automated Assignment of Weights 220
9.4.3 Improving Weights 221
9.5 Word Association Techniques 221
9.5.1 Dictionaries and Thesauri 221
9.5.2 Mini-Thesauri 222
9.5.3 Word Co-occurrence Statistics 223
9.5.4 Stemming and Conflation 224
9.6 Text or Record Association Techniques 224
9.6.1 Similarity Measures 225
9.6.2 Clustering 228
9.6.3 Signature Matching 230
9.6.4 Discriminant Methods 233
9.7 Other Processes with Words of a Text 234
9.7.1 Stop Words 234
9.7.2 Replacement of Words with Roots or Associated Words 235
9.7.3 Varying Significance as a Function of Frequency 236
9.7.4 Comments on the Computation of the Strength of Document Association 236
10 System-Computed Relevance and Ranking
10.1 The Retrieval Status Value (rsv) 241
10.2 Ranking 241
10.3 Methods of Evaluating the rsv 242
10.3.1 The Vector Space Model 242
10.3.2 The Probabilistic Model 244
10.3.3 The Extended Boolean Model 245
10.4 The rsv in Operational Retrieval 247
11 Search Feedback and Iteration
11.1 Basic Concepts of Feedback and Iteration 249
11.2 Command Sequences 251
11.3 Information Available as Feedback 252
11.3.1 File or Database Selection 252
11.3.2 Term Search or Browsing 253
11.3.3 Record Search and Set Formation 254
11.3.4 Record Display and Browsing 256
11.3.5 Record Acquisition 257
11.3.6 Requests for Information About the Retrieval System 257
11.3.7 Establishing Communications Parameters 258
11.3.8 Trends Over Sequences and Cycles 258
11.4 Adjustments in the Search 259
11.4.1 Improve Term Selection 260
11.4.2 Improve Set Formation Logic 260
11.4.3 Improve Final Set Size 260
11.4.4 Improve Precision, Recall, or Total Utility 260
11.5 Feedback from User to System 261
12 Multi-Database Searching and Mapping
12.1 Basic Concepts 265
12.2 Multi-Database Search 266
12.2.1 Nature of Duplicate Records 266
12.2.2 Detection of Duplicates 269
12.2.3 Scanning Multiple Databases 271
12.3 Mapping 273
12.4 Value of Mapping 275
13 Search Strategy
13.1 The Nature of Searching Reconsidered 277
13.1.1 Known Item Search 278
13.1.2 Specific Information Search 278
13.1.3 General Information Search 278
13.1.4 Exploration of the Database 279
13.2 The Nature of Search Strategy 279
13.2.1 Search Objective 280
13.2.2 General Plan of Operation 280
13.2.3 The Essential Information Elements of a Search 281
13.2.4 Specific Plan of Operation 282
13.3 Types of Strategies 282
13.3.1 Categorizing by Objective 283
13.3.2 Categorizing by Plan of Operation 283
13.4 Tactics 285
13.4.1 Monitoring Tactics 286
13.4.2 File Structure Tactics 286
13.4.3 Search Formulation Tactics 286
13.4.4 Term Tactics 286
13.5 Summary 286
14 The Information Retrieval System Interface
14.1 General Model of Message Flow 287
14.2 Sources of Ambiguity 290
14.3 The Role of a Search Intermediary 291
14.3.1 Establishing the Information Need 292
14.3.2 Development of a Search Strategy 292
14.3.3 Translation of the Need Statement into a Query 292
14.3.4 Interpretation and Evaluation of Output 293
14.3.5 Search Iteration within the Strategic Plan 293
14.3.6 Change of Strategy When Necessary 293
14.3.7 Help in Using an IRS 294
14.4 Automated Search Mediation 294
14.4.1 Early Development 294
14.4.2 Fully Automatic Intermediary Functions 295
14.4.3 Interactive Intermediary Functions 296
14.5 The User Interface as a Component of All Systems 298
14.6 The User Interface in Web Search Engines 299
15 A Sampling of Information Retrieval Systems
15.1 Introduction 301
15.2 Dialog 302
15.2.1 Command Language Using Boolean Logic 303
15.2.2 Target 304
15.2.3 DIALOGWeb: A Web Adaptation 305
15.3 Alta Vista 308
15.3.1 Default Query Entry Form 309
15.3.2 Advanced Search Form 310
15.4 Google 311
15.4.1 Web Crawler 311
15.4.2 Searching 312
15.4.3 Google Advanced Search 312
15.5 PubMed 313
15.6 EBSCO Host 314
15.7 Summary 315
16 Measurement and Evaluation
16.1 Basics of Measurement 317
16.1.1 The Data Manager 318
16.1.2 The Query Manager 319
16.1.3 The Query Composition Process 319
16.1.4 Deriving the Information Need 320
16.1.5 The Database 320
16.1.6 Users 321
16.2 Relevance, Value, and Utility 321
16.2.1 Relevance as Relatedness 322
16.2.2 Aspects of Value 322
16.2.3 Relevance as Utility 323
16.2.4 Retaining Two Separate Relevance Measures 323
16.2.5 The Relevance Measurement Scale 325
16.2.6 Taking the Measurements 326
16.2.7 Questions about Relevance as a Measure 327
16.3 Measures Based on Relevance 328
16.3.1 Precision (Pr) 328
16.3.2 Recall (Re) 329
16.3.3 Relationship of Recall and Precision 330
16.3.4 Overall Effectiveness Measures Based on Re and Pr 331
16.4 Measures of Process 334
16.4.1 Query Translation 334
16.4.2 Errors in a Query Statement 334
16.4.3 Average Time per Command or per User Decision 335
16.4.4 Elapsed Time of a Search 335
16.4.5 Number of Commands or Steps in a Search 335
16.4.6 Cost of a Search 335
16.4.7 Size of Final Set Formed 336
16.4.8 Number of Records Reviewed by the User 336
16.4.9 Patterns of Language Use 336
16.4.10 Measures of Rank Order 339
16.5 Measures of Outcome 340
16.5.1 Precision 341
16.5.2 Recall 341
16.5.3 Efficiency 341
16.5.4 Overall User Evaluation 341
16.6 Measures of Environment 342
16.6.1 Database Record Selection 342
16.6.2 Record Content 342
16.6.3 Measures of Users 342
16.7 Conclusion 343
Bibliography 345
Index 357
Charles T. Meadow is professor emeritus at the University of Toronto and has been a visiting professor at the University of North Carolina and the University of the West Indies. He edited the Journal of the American Society for Information Science and the Canadian Journal of Information Science and was president of the Canadian Association for Information Science. He received the ASIS&T Research Award and shared the ASIS&T Annual Information Science Book Award.

Bert Boyce has been an Information System Research Analyst for the Information Systems Office at the Library of Congress; a faculty member and acting Dean of the School of Library and Information Science, University of Missouri, Columbia, Missouri; and Dean of the School of Library and Information Science, Louisiana State University, where he is now Professor and Dean Emeritus. He is currently Editor of the Academic Press Library and Information Science Series. He received the ASIS&T Outstanding Information Science Teacher Award in 1989 and has shared the ASIS&T Annual Information Science Book Award.

Donald Kraft is professor at LSU and Distinguished Visiting Professor at the U.S. Air Force Academy. He is a fellow of IEEE and AAAS and editor of the Journal of the American Society for Information Science and Technology. From ASIS&T he received the Research Award and the Watson Davis Award and shared the Annual Information Science Book Award; he also received the LSU Distinguished Faculty Award.

Carol Barry is associate professor in the School of Library and Information Science, Louisiana State University. She has received the Best JASIS Paper Award (1995), the LSU Alumni Association Teaching Award (1995), and the American Society for Information Science Doctoral Forum Award (1993). She is associate editor of JASIS&T, a member of the Board of ASIS&T, and a member of the LSU Faculty Senate, serving as its vice president in 2000-2001. She has authored or co-authored over 30 research papers.