Preface  xv

1.1 Origins of Crowdsourcing  2
1.2 Operational Definition of Crowdsourcing  3
1.3 Functional Definition of Crowdsourcing  3

2.1 An Overview of the Literature on Crowdsourcing for Speech Processing  8
2.1.1 Evolution of the Use of Crowdsourcing for Speech  9
2.1.2 Geographic Locations of Crowdsourcing for Speech  10
2.1.3 Specific Areas of Research  12
2.2 Alternative Solutions  14
2.3 Some Ready-Made Platforms for Crowdsourcing  15
2.4 Making Task Creation Easier  17
2.5 Getting Down to Brass Tacks  17
2.5.1 Hearing and Being Heard over the Web  18
2.5.3 Native Language of the Workers  21
2.5.5 Choice of Platform in the Literature  25
2.5.6 The Complexity of the Task  27
2.6.1 Was That Worker a Bot?  29
2.6.2 Quality Control in the Literature  29
2.7 Judging the Quality of the Literature  32

3 Collecting Speech from Crowds  37
3.1 A Short History of Speech Collection  38
3.1.2 Spoken Language Systems  40
3.1.3 User-Configured Recording Environments  41
3.2 Technology for Web-Based Audio Collection  43
3.2.4 HTML and JavaScript  48
3.3 Example: WAMI Recorder  49
3.4 Example: The WAMI Server  52
3.4.3 Server Configuration Details  57
3.5 Example: Speech Collection on Amazon Mechanical Turk  59
3.5.2 Deploying to Amazon Mechanical Turk  61
3.5.3 The Command-Line Interface  64
3.6 Using the Platform Purely for Payment  65
3.7 Advanced Methods of Crowdsourced Audio Collection  67
3.7.1 Collecting Dialog Interactions  67

4 Crowdsourcing for Speech Transcription  72
4.2.1 The Need for Speech Transcription  74
4.2.2 Quantifying Speech Transcription  75
4.2.4 Is Crowdsourcing Well Suited to My Needs?  79
4.3.1 Preparing the Audio Clips  80
4.3.2 Preprocessing the Data with a Speech Recognizer  81
4.3.3 Creating a Gold-Standard Dataset  82
4.4.1 Creating Your Task with the Platform Template Editor  83
4.4.2 Creating Your Task on Your Own Server  85
4.5 Submitting the Open Call  91
4.5.2 Number of Distinct Judgments  93
4.6.2 Unsupervised Filters  96
4.6.4 Aggregation Techniques  100
4.6.5 Quality Control Using Multiple Passes  101

5 How to Control and Utilize Crowd-Collected Speech  106
5.1.1 Collection Procedure  107
5.2 Multimodal Dialog Interactions  111
5.3 Games for Speech Collection  120
5.5.1 Self-Transcribed Data  124
5.5.2 Simplified Crowdsourced Transcription  124
5.5.4 Human Transcription  126
5.5.5 Automatic Transcription  127
5.5.6 Self-Supervised Acoustic Model Adaptation  127
5.6.2 Crowdsourced Transcription  131
5.6.3 Filtering for Accurate Hypotheses  132
5.6.4 Self-Supervised Acoustic Model Adaptation  133

6 Crowdsourcing in Speech Perception  137
6.2 Previous Use of Crowdsourcing in Speech and Hearing  138
6.3.1 Control of the Environment  140
6.4.1 Speech Intelligibility, Quality and Naturalness  145
6.4.3 Perceptual Salience and Listener Acuity  147
6.4.4 Phonological Systems  147
6.5 BigListen: A Case Study in the Use of Crowdsourcing to Identify Words in Noise  149
6.5.2 Speech and Noise Tokens  150
6.5.3 The Client-Side Experience  150
6.5.4 Technical Architecture  151
6.5.6 Analysis of Responses  158
6.5.7 Lessons from the BigListen Crowdsourcing Test  166
6.6 Issues for Further Exploration  167

7 Crowdsourced Assessment of Speech Synthesis  173
7.2 Human Assessment of TTS  174
7.3 Crowdsourcing for TTS: What Worked and What Did Not  177
7.3.1 Related Work: Crowdsourced Listening Tests  177
7.3.2 Problem and Solutions: Audio on the Web  178
7.3.3 Problem and Solution: Test of Significance  180
7.3.4 What Assessment Types Worked  183
7.3.6 Problem and Solutions: Recruiting Native Speakers of Various Languages  190
7.4 Related Work: Detecting and Preventing Spamming  193
7.5 Our Experiences: Detecting and Preventing Spamming  195
7.5.1 Optional Playback Interface  196
7.5.2 Investigating the Metrics Further: Mandatory Playback Interface  201
7.5.3 The Prosecutor's Fallacy  210
7.6 Conclusions and Discussion  212

8 Crowdsourcing for Spoken Dialog System Evaluation  217
8.2 Prior Work on Crowdsourcing: Dialog and Speech Assessment  220
8.2.1 Prior Work on Crowdsourcing for Dialog Systems  220
8.2.2 Prior Work on Crowdsourcing for Speech Assessment  220
8.3 Prior Work in SDS Evaluation  221
8.3.1 Subjective User Judgments  221
8.3.2 Interaction Metrics  222
8.3.4 Alternative Approach to Crowdsourcing for SDS Evaluation  224
8.4 Experimental Corpus and Automatic Dialog Classification  225
8.5 Collecting User Judgments on Spoken Dialogs with Crowdsourcing  226
8.5.1 Tasks for Dialog Evaluation  227
8.5.2 Tasks for Interannotator Agreement  229
8.5.3 Approval of Ratings  229
8.6 Collected Data and Analysis  230
8.6.1 Approval Rates and Comments from Workers  230
8.6.2 Consistency between Automatic Dialog Classification and Manual Ratings  231
8.6.3 Interannotator Agreement among Workers  233
8.6.4 Interannotator Agreement on the Let's Go! System  235
8.6.5 Consistency between Expert and Nonexpert Annotations  236
8.7 Conclusions and Future Work  238

9 Interfaces for Crowdsourcing Platforms  241
9.2.3 Hypertext Transfer Protocol  243
9.2.4 Hypertext Markup Language  244
9.2.5 Cascading Style Sheets  246
9.2.7 JavaScript Object Notation  248
9.2.8 Extensible Markup Language  248
9.2.9 Asynchronous JavaScript and XML  249
9.3 Crowdsourcing Platforms  253
9.3.1 Crowdsourcing Platform Workflow  253
9.3.2 Amazon Mechanical Turk  256
9.4 Interfaces to Crowdsourcing Platforms  261
9.4.1 Implementing Tasks Using a GUI on CrowdFlower Platform  262
9.4.2 Implementing Tasks Using the Command-Line Interface in MTurk  264
9.4.3 Implementing a Task Using a RESTful Web Service in Clickworker  270
9.4.4 Defining Tasks via Configuration Files in WikiSpeech  270

10 Crowdsourcing for Industrial Spoken Dialog Systems  280
10.1.1 Industry's Willful Ignorance  280
10.1.2 Crowdsourcing in Industrial Speech Applications  281
10.1.3 Public versus Private Crowd  282
10.5 Subjective Evaluation of Spoken Dialog Systems  296

11 Economic and Ethical Background of Crowdsourcing for Speech  303
11.2 The Crowdsourcing Fauna  304
11.2.1 The Crowdsourcing Services Landscape  304
11.2.2 Who Are the Workers?  306
11.2.3 Ethics and Economics in Crowdsourcing: How to Proceed?  307
11.3 Economic and Ethical Issues  307
11.3.1 What Are the Problems for the Workers?  309
11.3.2 Crowdsourcing and Labor Laws  310
11.3.3 Which Economic Model Is Sustainable for Crowdsourcing?  314
11.4 Under-Resourced Languages: A Case Study  316
11.4.1 Under-Resourced Languages Definition and Issues  317
11.4.2 Collecting Annotated Speech for African Languages Using Crowdsourcing  317
11.4.3 Experiment Description  317
11.4.5 Discussion and Lessons Learned  321
11.5 Toward Ethically Produced Language Resources  322
11.5.1 Defining a Fair Compensation for Work Done  323
11.5.2 Impact of Crowdsourcing on the Ecology of Linguistic Resources  326
11.5.3 Defining an Ethical Framework: Some Solutions  326

Index  335