Corpus Annotation gives an up-to-date picture of this fascinating new area of research, and will provide essential reading for newcomers to the field as well as those already involved in corpus annotation. Early chapters introduce the different levels and techniques of corpus annotation. Later chapters deal with software developments, applications, and the development of standards for the evaluation of corpus annotation. While the book takes detailed account of research world-wide, its focus is particularly on the work of the UCREL (University Centre for Computer Corpus Research on Language) team at Lancaster University, which has been at the forefront of developments in the field of corpus annotation since its beginnings in the 1970s.
This is the first book to survey the growing field of research known as corpus annotation. The computer corpus has become a central resource in the study of language, and linguistic annotation of a corpus is increasingly seen as essential for successfully extracting information from it. Annotation is not only a practical task; it also provides new insight into the nature of language and the most effective means of analyzing it. The text offers an introduction to the field and presents important research carried out by the University Centre for Computer Corpus Research on Language team at Lancaster University, which will be of interest to those already involved in corpus annotation.
Contributors
Preface
1 Introducing corpus annotation (Geoffrey Leech)
2 Grammatical tagging (Geoffrey Leech)
3 Syntactic annotation: treebanks (Geoffrey Leech, Elizabeth Eyes)
4 Semantic annotation (Andrew Wilson, Jenny Thomas)
5 Discourse annotation: anaphoric relations in corpora (Roger Garside, Steve Fligelstone, Simon Botley)
6 Further levels of annotation (Geoffrey Leech, Tony McEnery, Martin Wynne)
7 A hybrid grammatical tagger: CLAWS4 (Roger Garside, Nicholas Smith)
8 How to generalize the task of annotation (Steve Fligelstone, Mike Pacey, Paul Rayson)
9 Improving a tagger (Nicholas Smith)
10 Retargeting a tagger (Fernando Sanchez Leon, Amalio F. Nieto Serrano)
11 The use of syntactic annotation tools: partial and full parsing (Jeremy Bateman, Jean Forrest, Tim Willis)
12 Higher-level annotation tools (Roger Garside, Paul Rayson)
13 A corpus annotation toolbox (Tony McEnery, Paul Rayson)
14 A corpus-based grammar tutor (Tony McEnery, John Paul Baker, John Hutchinson)
15 The exploitation of multilingual annotated corpora for term extraction (Tony McEnery, Jean-Marc Lange, Michael Oakes, Jean Veronis)
16 Towards cross-linguistic standards or guidelines for the annotation of corpora (Peter Kahrel, Ruthanna Barnett, Geoffrey Leech)
17 Consistency and accuracy in correcting automatically tagged data (John Paul Baker)
Appendix I: Sources for further information
Appendix II: Glossary of abbreviations and acronyms
Appendix III: Specimen annotation practices: the C7 and C5 tagsets
Bibliography
Index
R.G. Garside, Geoffrey Leech, Anthony Mark McEnery