Crowdsourcing for Speech Processing: Applications to Data Collection, Transcription and Assessment [Hardcover]

3.80/5 (10 ratings by Goodreads)
Edited by Maxine Eskenazi (Carnegie Mellon University, USA), Gabriel Parent (Amazon.com, USA), Helen Meng (The Chinese University of Hong Kong), Gina-Anne Levow (University of Washington, USA), David Suendermann (Baden-Wuerttemberg Cooperative State University, Germany)
  • Format: Hardback, 356 pages, height x width x thickness: 252x175x20 mm, weight: 680 g
  • Publication date: 05-Apr-2013
  • Publisher: John Wiley & Sons Inc
  • ISBN-10: 1118358694
  • ISBN-13: 9781118358696
Provides an insightful and practical introduction to crowdsourcing as a means of rapidly processing speech data

Intended both for those who want to get started in the domain and learn how to set up a task, what interfaces are available, and how to assess the work, and for those who have already used crowdsourcing and want to create better tasks and obtain better assessments of the crowd's work. It includes screenshots illustrating good and poor interfaces, as well as case studies of speech processing tasks that walk through the task creation process, review options in the interface and in the choice of medium (Mechanical Turk or another platform), and explain the choices made.

Provides an insightful and practical introduction to crowdsourcing as a means of rapidly processing speech data. Addresses important aspects of this new technique that should be mastered before attempting a crowdsourcing application. Offers speech researchers the hope that they can spend much less time dealing with the data gathering and annotation bottleneck, leaving them free to focus on the scientific issues. Readers will directly benefit from the book's successful examples of how crowdsourcing was implemented for speech processing, discussions of interface and processing choices that worked and choices that didn't, and guidelines on how to play and record speech over the internet, how to design tasks, and how to assess workers.

Essential reading for researchers and practitioners in speech research groups involved in speech processing
List of Contributors xiii
Preface xv
1 An Overview (Maxine Eskenazi) 1(7)
1.1 Origins of Crowdsourcing 2(1)
1.2 Operational Definition of Crowdsourcing 3(1)
1.3 Functional Definition of Crowdsourcing 3(1)
1.4 Some Issues 4(2)
1.5 Some Terminology 6(1)
1.6 Acknowledgments 6(2)
References 6(2)
2 The Basics (Maxine Eskenazi) 8(29)
2.1 An Overview of the Literature on Crowdsourcing for Speech Processing 8(6)
2.1.1 Evolution of the Use of Crowdsourcing for Speech 9(1)
2.1.2 Geographic Locations of Crowdsourcing for Speech 10(2)
2.1.3 Specific Areas of Research 12(2)
2.2 Alternative Solutions 14(1)
2.3 Some Ready-Made Platforms for Crowdsourcing 15(2)
2.4 Making Task Creation Easier 17(1)
2.5 Getting Down to Brass Tacks 17(12)
2.5.1 Hearing and Being Heard over the Web 18(2)
2.5.2 Prequalification 20(1)
2.5.3 Native Language of the Workers 21(1)
2.5.4 Payment 22(3)
2.5.5 Choice of Platform in the Literature 25(2)
2.5.6 The Complexity of the Task 27(2)
2.6 Quality Control 29(3)
2.6.1 Was That Worker a Bot? 29(1)
2.6.2 Quality Control in the Literature 29(3)
2.7 Judging the Quality of the Literature 32(1)
2.8 Some Quick Tips 33(1)
2.9 Acknowledgments 33(4)
References 33(2)
Further reading 35(2)
3 Collecting Speech from Crowds (Ian McGraw) 37(35)
3.1 A Short History of Speech Collection 38(5)
3.1.1 Speech Corpora 38(2)
3.1.2 Spoken Language Systems 40(1)
3.1.3 User-Configured Recording Environments 41(2)
3.2 Technology for Web-Based Audio Collection 43(6)
3.2.1 Silverlight 44(1)
3.2.2 Java 45(1)
3.2.3 Flash 46(2)
3.2.4 HTML and JavaScript 48(1)
3.3 Example: WAMI Recorder 49(3)
3.3.1 The JavaScript API 49(2)
3.3.2 Audio Formats 51(1)
3.4 Example: The WAMI Server 52(7)
3.4.1 PHP Script 52(2)
3.4.2 Google App Engine 54(3)
3.4.3 Server Configuration Details 57(2)
3.5 Example: Speech Collection on Amazon Mechanical Turk 59(6)
3.5.1 Server Setup 60(1)
3.5.2 Deploying to Amazon Mechanical Turk 61(3)
3.5.3 The Command-Line Interface 64(1)
3.6 Using the Platform Purely for Payment 65(2)
3.7 Advanced Methods of Crowdsourced Audio Collection 67(2)
3.7.1 Collecting Dialog Interactions 67(1)
3.7.2 Human Computation 68(1)
3.8 Summary 69(1)
3.9 Acknowledgments 69(3)
References 70(2)
4 Crowdsourcing for Speech Transcription (Gabriel Parent) 72(34)
4.1 Introduction 72(1)
4.1.1 Terminology 72(1)
4.2 Transcribing Speech 73(7)
4.2.1 The Need for Speech Transcription 74(1)
4.2.2 Quantifying Speech Transcription 75(3)
4.2.3 Brief History 78(1)
4.2.4 Is Crowdsourcing Well Suited to My Needs? 79(1)
4.3 Preparing the Data 80(3)
4.3.1 Preparing the Audio Clips 80(1)
4.3.2 Preprocessing the Data with a Speech Recognizer 81(1)
4.3.3 Creating a Gold-Standard Dataset 82(1)
4.4 Setting Up the Task 83(8)
4.4.1 Creating Your Task with the Platform Template Editor 83(2)
4.4.2 Creating Your Task on Your Own Server 85(2)
4.4.3 Instruction Design 87(2)
4.4.4 Know the Workers 89(2)
4.4.5 Game Interface 91(1)
4.5 Submitting the Open Call 91(4)
4.5.1 Payment 92(1)
4.5.2 Number of Distinct Judgments 93(2)
4.6 Quality Control 95(7)
4.6.1 Normalization 95(1)
4.6.2 Unsupervised Filters 96(3)
4.6.3 Supervised Filters 99(1)
4.6.4 Aggregation Techniques 100(1)
4.6.5 Quality Control Using Multiple Passes 101(1)
4.7 Conclusion 102(1)
4.8 Acknowledgments 103(3)
References 103(3)
5 How to Control and Utilize Crowd-Collected Speech (Ian McGraw, Joseph Polifroni) 106(31)
5.1 Read Speech 107(4)
5.1.1 Collection Procedure 107(1)
5.1.2 Corpus Overview 108(3)
5.2 Multimodal Dialog Interactions 111(9)
5.2.1 System Design 111(1)
5.2.2 Scenario Creation 111(1)
5.2.3 Data Collection 112(3)
5.2.4 Data Transcription 115(3)
5.2.5 Data Analysis 118(2)
5.3 Games for Speech Collection 120(1)
5.4 Quizlet 121(2)
5.5 Voice Race 123(6)
5.5.1 Self-Transcribed Data 124(1)
5.5.2 Simplified Crowdsourced Transcription 124(1)
5.5.3 Data Analysis 125(1)
5.5.4 Human Transcription 126(1)
5.5.5 Automatic Transcription 127(1)
5.5.6 Self-Supervised Acoustic Model Adaptation 127(2)
5.6 Voice Scatter 129(6)
5.6.1 Corpus Overview 130(1)
5.6.2 Crowdsourced Transcription 131(1)
5.6.3 Filtering for Accurate Hypotheses 132(1)
5.6.4 Self-Supervised Acoustic Model Adaptation 133(2)
5.7 Summary 135(1)
5.8 Acknowledgments 135(2)
References 136(1)
6 Crowdsourcing in Speech Perception (Martin Cooke, Jon Barker, Maria Luisa Garcia Lecumberri) 137(36)
6.1 Introduction 137(1)
6.2 Previous Use of Crowdsourcing in Speech and Hearing 138(2)
6.3 Challenges 140(5)
6.3.1 Control of the Environment 140(1)
6.3.2 Participants 141(3)
6.3.3 Stimuli 144(1)
6.4 Tasks 145(4)
6.4.1 Speech Intelligibility, Quality and Naturalness 145(1)
6.4.2 Accent Evaluation 146(1)
6.4.3 Perceptual Salience and Listener Acuity 147(1)
6.4.4 Phonological Systems 147(2)
6.5 BigListen: A Case Study in the Use of Crowdsourcing to Identify Words in Noise 149(18)
6.5.1 The Problem 149(1)
6.5.2 Speech and Noise Tokens 150(1)
6.5.3 The Client-Side Experience 150(1)
6.5.4 Technical Architecture 151(2)
6.5.5 Respondents 153(5)
6.5.6 Analysis of Responses 158(8)
6.5.7 Lessons from the BigListen Crowdsourcing Test 166(1)
6.6 Issues for Further Exploration 167(2)
6.7 Conclusions 169(4)
References 169(4)
7 Crowdsourced Assessment of Speech Synthesis (Sabine Buchholz, Javier Latorre, Kayoko Yanagisawa) 173(44)
7.1 Introduction 173(1)
7.2 Human Assessment of TTS 174(3)
7.3 Crowdsourcing for TTS: What Worked and What Did Not 177(16)
7.3.1 Related Work: Crowdsourced Listening Tests 177(1)
7.3.2 Problem and Solutions: Audio on the Web 178(2)
7.3.3 Problem and Solution: Test of Significance 180(3)
7.3.4 What Assessment Types Worked 183(3)
7.3.5 What Did Not Work 186(4)
7.3.6 Problem and Solutions: Recruiting Native Speakers of Various Languages 190(3)
7.3.7 Conclusion 193(1)
7.4 Related Work: Detecting and Preventing Spamming 193(2)
7.5 Our Experiences: Detecting and Preventing Spamming 195(17)
7.5.1 Optional Playback Interface 196(5)
7.5.2 Investigating the Metrics Further: Mandatory Playback Interface 201(9)
7.5.3 The Prosecutor's Fallacy 210(2)
7.6 Conclusions and Discussion 212(5)
References 214(3)
8 Crowdsourcing for Spoken Dialog System Evaluation (Zhaojun Yang, Gina-Anne Levow, Helen Meng) 217(24)
8.1 Introduction 217(3)
8.2 Prior Work on Crowdsourcing: Dialog and Speech Assessment 220(1)
8.2.1 Prior Work on Crowdsourcing for Dialog Systems 220(1)
8.2.2 Prior Work on Crowdsourcing for Speech Assessment 220(1)
8.3 Prior Work in SDS Evaluation 221(4)
8.3.1 Subjective User Judgments 221(1)
8.3.2 Interaction Metrics 222(1)
8.3.3 PARADISE Framework 223(1)
8.3.4 Alternative Approach to Crowdsourcing for SDS Evaluation 224(1)
8.4 Experimental Corpus and Automatic Dialog Classification 225(1)
8.5 Collecting User Judgments on Spoken Dialogs with Crowdsourcing 226(4)
8.5.1 Tasks for Dialog Evaluation 227(2)
8.5.2 Tasks for Interannotator Agreement 229(1)
8.5.3 Approval of Ratings 229(1)
8.6 Collected Data and Analysis 230(8)
8.6.1 Approval Rates and Comments from Workers 230(1)
8.6.2 Consistency between Automatic Dialog Classification and Manual Ratings 231(2)
8.6.3 Interannotator Agreement among Workers 233(2)
8.6.4 Interannotator Agreement on the Let's Go! System 235(1)
8.6.5 Consistency between Expert and Nonexpert Annotations 236(2)
8.7 Conclusions and Future Work 238(1)
8.8 Acknowledgments 238(3)
References 239(2)
9 Interfaces for Crowdsourcing Platforms (Christoph Draxler) 241(39)
9.1 Introduction 241(1)
9.2 Technology 242(11)
9.2.1 TinyTask Web Page 242(1)
9.2.2 World Wide Web 242(1)
9.2.3 Hypertext Transfer Protocol 243(1)
9.2.4 Hypertext Markup Language 244(2)
9.2.5 Cascading Style Sheets 246(1)
9.2.6 JavaScript 246(2)
9.2.7 JavaScript Object Notation 248(1)
9.2.8 Extensible Markup Language 248(1)
9.2.9 Asynchronous JavaScript and XML 249(1)
9.2.10 Flash 250(1)
9.2.11 SOAP and REST 251(1)
9.2.12 Section Summary 252(1)
9.3 Crowdsourcing Platforms 253(8)
9.3.1 Crowdsourcing Platform Workflow 253(3)
9.3.2 Amazon Mechanical Turk 256(3)
9.3.3 CrowdFlower 259(1)
9.3.4 Clickworker 259(1)
9.3.5 WikiSpeech 260(1)
9.4 Interfaces to Crowdsourcing Platforms 261(17)
9.4.1 Implementing Tasks Using a GUI on CrowdFlower Platform 262(2)
9.4.2 Implementing Tasks Using the Command-Line Interface in MTurk 264(6)
9.4.3 Implementing a Task Using a RESTful Web Service in Clickworker 270(1)
9.4.4 Defining Tasks via Configuration Files in WikiSpeech 270(8)
9.5 Summary 278(2)
References 278(2)
10 Crowdsourcing for Industrial Spoken Dialog Systems (David Suendermann, Roberto Pieraccini) 280(23)
10.1 Introduction 280(3)
10.1.1 Industry's Willful Ignorance 280(1)
10.1.2 Crowdsourcing in Industrial Speech Applications 281(1)
10.1.3 Public versus Private Crowd 282(1)
10.2 Architecture 283(4)
10.3 Transcription 287(3)
10.4 Semantic Annotation 290(6)
10.5 Subjective Evaluation of Spoken Dialog Systems 296(4)
10.6 Conclusion 300(3)
References 300(3)
11 Economic and Ethical Background of Crowdsourcing for Speech (Gilles Adda, Joseph J. Mariani, Laurent Besacier, Hadrien Gelas) 303(32)
11.1 Introduction 303(1)
11.2 The Crowdsourcing Fauna 304(3)
11.2.1 The Crowdsourcing Services Landscape 304(2)
11.2.2 Who Are the Workers? 306(1)
11.2.3 Ethics and Economics in Crowdsourcing: How to Proceed? 307(1)
11.3 Economic and Ethical Issues 307(9)
11.3.1 What Are the Problems for the Workers? 309(1)
11.3.2 Crowdsourcing and Labor Laws 310(4)
11.3.3 Which Economic Model Is Sustainable for Crowdsourcing? 314(2)
11.4 Under-Resourced Languages: A Case Study 316(6)
11.4.1 Under-Resourced Languages Definition and Issues 317(1)
11.4.2 Collecting Annotated Speech for African Languages Using Crowdsourcing 317(1)
11.4.3 Experiment Description 317(1)
11.4.4 Results 318(3)
11.4.5 Discussion and Lessons Learned 321(1)
11.5 Toward Ethically Produced Language Resources 322(8)
11.5.1 Defining a Fair Compensation for Work Done 323(3)
11.5.2 Impact of Crowdsourcing on the Ecology of Linguistic Resources 326(1)
11.5.3 Defining an Ethical Framework: Some Solutions 326(4)
11.6 Conclusion 330(5)
Disclaimer 331(1)
References 331(4)
Index 335
Maxine Eskenazi, Carnegie Mellon University, USA. Dr. Eskenazi is Principal Systems Scientist at the Language Technologies Institute, Carnegie Mellon University, USA. She has authored over 100 scientific papers in the areas of computer-assisted language learning, speech, and spoken dialog systems. Her work has produced such systems as the Let's Go spoken dialog system and the REAP vocabulary tutor. She is also the founder and CTO of the Carnegie Speech Company.

Gina-Anne Levow, University of Washington, USA. Dr. Levow is currently an Assistant Professor in the Department of Linguistics, University of Washington, USA. Prior to joining the faculty at the University of Washington, she served on the faculty of the Department of Computer Science at the University of Chicago and as a Research Fellow at the University of Manchester, UK. She has served on the Editorial Board of Computational Linguistics and as Associate Editor of ACM Transactions on Asian Language Processing.

Helen Meng, The Chinese University of Hong Kong, Hong Kong. Dr. Meng is Founder and Director of the Human-Computer Communications Laboratory at The Chinese University of Hong Kong, and is also the Founder and Co-Director of the Microsoft-CUHK Joint Laboratory for Human-Centric Computing and Interface Technologies, which was conferred the national status of Key Laboratory by the Ministry of Education of China (MoE) in 2008. Prof. Meng also served as Associate Dean (Research) of the Faculty of Engineering from 2006 to 2010. She serves as Editor-in-Chief of the IEEE Transactions on Audio, Speech and Language Processing.

Gabriel Parent, Amazon.com, USA. Gabriel Parent is a Software Development Engineer at Amazon.com, working on natural language problems. His main research focuses have been human-computer interaction through spoken dialog systems and crowdsourcing.

David Suendermann, Baden-Wuerttemberg Cooperative State University, Germany. Dr. Suendermann is currently a full Professor of Computer Science at the Baden-Wuerttemberg Cooperative State University, Stuttgart, Germany. He is also the Principal Speech Scientist of SpeechCycle, New York, USA, which has been recognized by Deloitte as a "Technology Fast 500" company based on revenue growth. He has authored more than 70 publications and patents, including a book and six book chapters.