Crowdsourcing for Speech Processing: Applications to Data Collection, Transcription and Assessment [Hardcover]

3.80/5 (10 ratings by Goodreads)
Edited by Maxine Eskenazi (Carnegie Mellon University, USA), Gabriel Parent (Amazon.com, USA), Helen Meng (The Chinese University of Hong Kong), Gina-Anne Levow (University of Washington, USA), David Suendermann (Baden-Wuerttemberg Cooperative State University, Germany)
  • Format: Hardback, 356 pages, height x width x thickness: 252x175x20 mm, weight: 680 g
  • Publication date: 05-Apr-2013
  • Publisher: John Wiley & Sons Inc
  • ISBN-10: 1118358694
  • ISBN-13: 9781118358696
Provides an insightful and practical introduction to crowdsourcing as a means of rapidly processing speech data

Intended both for those who want to get started in the domain and learn how to set up a task, what interfaces are available, and how to assess the work, and for those who have already used crowdsourcing and want to create better tasks and obtain better assessments of the crowd's work. It includes screenshots illustrating good and poor interfaces, as well as case studies of speech processing tasks that walk through the task creation process, review options in the interface and in the choice of medium (Mechanical Turk or another platform), and explain the choices made.

Provides an insightful and practical introduction to crowdsourcing as a means of rapidly processing speech data. Addresses important aspects of this new technique that should be mastered before attempting a crowdsourcing application. Offers speech researchers the hope that they can spend much less time dealing with the data gathering and annotation bottleneck, leaving them free to focus on the scientific issues. Readers will directly benefit from the book's successful examples of how crowdsourcing was implemented for speech processing, discussions of interface and processing choices that worked and choices that didn't, and guidelines on how to play and record speech over the internet, how to design tasks, and how to assess workers.

Essential reading for researchers and practitioners in speech research groups involved in speech processing
List of Contributors xiii
Preface xv
1 An Overview (Maxine Eskenazi) 1(7)
1.1 Origins of Crowdsourcing 2(1)
1.2 Operational Definition of Crowdsourcing 3(1)
1.3 Functional Definition of Crowdsourcing 3(1)
1.4 Some Issues 4(2)
1.5 Some Terminology 6(1)
1.6 Acknowledgments 6(2)
References 6(2)
2 The Basics (Maxine Eskenazi) 8(29)
2.1 An Overview of the Literature on Crowdsourcing for Speech Processing 8(6)
2.1.1 Evolution of the Use of Crowdsourcing for Speech 9(1)
2.1.2 Geographic Locations of Crowdsourcing for Speech 10(2)
2.1.3 Specific Areas of Research 12(2)
2.2 Alternative Solutions 14(1)
2.3 Some Ready-Made Platforms for Crowdsourcing 15(2)
2.4 Making Task Creation Easier 17(1)
2.5 Getting Down to Brass Tacks 17(12)
2.5.1 Hearing and Being Heard over the Web 18(2)
2.5.2 Prequalification 20(1)
2.5.3 Native Language of the Workers 21(1)
2.5.4 Payment 22(3)
2.5.5 Choice of Platform in the Literature 25(2)
2.5.6 The Complexity of the Task 27(2)
2.6 Quality Control 29(3)
2.6.1 Was That Worker a Bot? 29(1)
2.6.2 Quality Control in the Literature 29(3)
2.7 Judging the Quality of the Literature 32(1)
2.8 Some Quick Tips 33(1)
2.9 Acknowledgments 33(4)
References 33(2)
Further reading 35(2)
3 Collecting Speech from Crowds (Ian McGraw) 37(35)
3.1 A Short History of Speech Collection 38(5)
3.1.1 Speech Corpora 38(2)
3.1.2 Spoken Language Systems 40(1)
3.1.3 User-Configured Recording Environments 41(2)
3.2 Technology for Web-Based Audio Collection 43(6)
3.2.1 Silverlight 44(1)
3.2.2 Java 45(1)
3.2.3 Flash 46(2)
3.2.4 HTML and JavaScript 48(1)
3.3 Example: WAMI Recorder 49(3)
3.3.1 The JavaScript API 49(2)
3.3.2 Audio Formats 51(1)
3.4 Example: The WAMI Server 52(7)
3.4.1 PHP Script 52(2)
3.4.2 Google App Engine 54(3)
3.4.3 Server Configuration Details 57(2)
3.5 Example: Speech Collection on Amazon Mechanical Turk 59(6)
3.5.1 Server Setup 60(1)
3.5.2 Deploying to Amazon Mechanical Turk 61(3)
3.5.3 The Command-Line Interface 64(1)
3.6 Using the Platform Purely for Payment 65(2)
3.7 Advanced Methods of Crowdsourced Audio Collection 67(2)
3.7.1 Collecting Dialog Interactions 67(1)
3.7.2 Human Computation 68(1)
3.8 Summary 69(1)
3.9 Acknowledgments 69(3)
References 70(2)
4 Crowdsourcing for Speech Transcription (Gabriel Parent) 72(34)
4.1 Introduction 72(1)
4.1.1 Terminology 72(1)
4.2 Transcribing Speech 73(7)
4.2.1 The Need for Speech Transcription 74(1)
4.2.2 Quantifying Speech Transcription 75(3)
4.2.3 Brief History 78(1)
4.2.4 Is Crowdsourcing Well Suited to My Needs? 79(1)
4.3 Preparing the Data 80(3)
4.3.1 Preparing the Audio Clips 80(1)
4.3.2 Preprocessing the Data with a Speech Recognizer 81(1)
4.3.3 Creating a Gold-Standard Dataset 82(1)
4.4 Setting Up the Task 83(8)
4.4.1 Creating Your Task with the Platform Template Editor 83(2)
4.4.2 Creating Your Task on Your Own Server 85(2)
4.4.3 Instruction Design 87(2)
4.4.4 Know the Workers 89(2)
4.4.5 Game Interface 91(1)
4.5 Submitting the Open Call 91(4)
4.5.1 Payment 92(1)
4.5.2 Number of Distinct Judgments 93(2)
4.6 Quality Control 95(7)
4.6.1 Normalization 95(1)
4.6.2 Unsupervised Filters 96(3)
4.6.3 Supervised Filters 99(1)
4.6.4 Aggregation Techniques 100(1)
4.6.5 Quality Control Using Multiple Passes 101(1)
4.7 Conclusion 102(1)
4.8 Acknowledgments 103(3)
References 103(3)
5 How to Control and Utilize Crowd-Collected Speech (Ian McGraw, Joseph Polifroni) 106(31)
5.1 Read Speech 107(4)
5.1.1 Collection Procedure 107(1)
5.1.2 Corpus Overview 108(3)
5.2 Multimodal Dialog Interactions 111(9)
5.2.1 System Design 111(1)
5.2.2 Scenario Creation 111(1)
5.2.3 Data Collection 112(3)
5.2.4 Data Transcription 115(3)
5.2.5 Data Analysis 118(2)
5.3 Games for Speech Collection 120(1)
5.4 Quizlet 121(2)
5.5 Voice Race 123(6)
5.5.1 Self-Transcribed Data 124(1)
5.5.2 Simplified Crowdsourced Transcription 124(1)
5.5.3 Data Analysis 125(1)
5.5.4 Human Transcription 126(1)
5.5.5 Automatic Transcription 127(1)
5.5.6 Self-Supervised Acoustic Model Adaptation 127(2)
5.6 Voice Scatter 129(6)
5.6.1 Corpus Overview 130(1)
5.6.2 Crowdsourced Transcription 131(1)
5.6.3 Filtering for Accurate Hypotheses 132(1)
5.6.4 Self-Supervised Acoustic Model Adaptation 133(2)
5.7 Summary 135(1)
5.8 Acknowledgments 135(2)
References 136(1)
6 Crowdsourcing in Speech Perception (Martin Cooke, Jon Barker, Maria Luisa Garcia Lecumberri) 137(36)
6.1 Introduction 137(1)
6.2 Previous Use of Crowdsourcing in Speech and Hearing 138(2)
6.3 Challenges 140(5)
6.3.1 Control of the Environment 140(1)
6.3.2 Participants 141(3)
6.3.3 Stimuli 144(1)
6.4 Tasks 145(4)
6.4.1 Speech Intelligibility, Quality and Naturalness 145(1)
6.4.2 Accent Evaluation 146(1)
6.4.3 Perceptual Salience and Listener Acuity 147(1)
6.4.4 Phonological Systems 147(2)
6.5 BigListen: A Case Study in the Use of Crowdsourcing to Identify Words in Noise 149(18)
6.5.1 The Problem 149(1)
6.5.2 Speech and Noise Tokens 150(1)
6.5.3 The Client-Side Experience 150(1)
6.5.4 Technical Architecture 151(2)
6.5.5 Respondents 153(5)
6.5.6 Analysis of Responses 158(8)
6.5.7 Lessons from the BigListen Crowdsourcing Test 166(1)
6.6 Issues for Further Exploration 167(2)
6.7 Conclusions 169(4)
References 169(4)
7 Crowdsourced Assessment of Speech Synthesis (Sabine Buchholz, Javier Latorre, Kayoko Yanagisawa) 173(44)
7.1 Introduction 173(1)
7.2 Human Assessment of TTS 174(3)
7.3 Crowdsourcing for TTS: What Worked and What Did Not 177(16)
7.3.1 Related Work: Crowdsourced Listening Tests 177(1)
7.3.2 Problem and Solutions: Audio on the Web 178(2)
7.3.3 Problem and Solution: Test of Significance 180(3)
7.3.4 What Assessment Types Worked 183(3)
7.3.5 What Did Not Work 186(4)
7.3.6 Problem and Solutions: Recruiting Native Speakers of Various Languages 190(3)
7.3.7 Conclusion 193(1)
7.4 Related Work: Detecting and Preventing Spamming 193(2)
7.5 Our Experiences: Detecting and Preventing Spamming 195(17)
7.5.1 Optional Playback Interface 196(5)
7.5.2 Investigating the Metrics Further: Mandatory Playback Interface 201(9)
7.5.3 The Prosecutor's Fallacy 210(2)
7.6 Conclusions and Discussion 212(5)
References 214(3)
8 Crowdsourcing for Spoken Dialog System Evaluation (Zhaojun Yang, Gina-Anne Levow, Helen Meng) 217(24)
8.1 Introduction 217(3)
8.2 Prior Work on Crowdsourcing: Dialog and Speech Assessment 220(1)
8.2.1 Prior Work on Crowdsourcing for Dialog Systems 220(1)
8.2.2 Prior Work on Crowdsourcing for Speech Assessment 220(1)
8.3 Prior Work in SDS Evaluation 221(4)
8.3.1 Subjective User Judgments 221(1)
8.3.2 Interaction Metrics 222(1)
8.3.3 PARADISE Framework 223(1)
8.3.4 Alternative Approach to Crowdsourcing for SDS Evaluation 224(1)
8.4 Experimental Corpus and Automatic Dialog Classification 225(1)
8.5 Collecting User Judgments on Spoken Dialogs with Crowdsourcing 226(4)
8.5.1 Tasks for Dialog Evaluation 227(2)
8.5.2 Tasks for Interannotator Agreement 229(1)
8.5.3 Approval of Ratings 229(1)
8.6 Collected Data and Analysis 230(8)
8.6.1 Approval Rates and Comments from Workers 230(1)
8.6.2 Consistency between Automatic Dialog Classification and Manual Ratings 231(2)
8.6.3 Interannotator Agreement among Workers 233(2)
8.6.4 Interannotator Agreement on the Let's Go! System 235(1)
8.6.5 Consistency between Expert and Nonexpert Annotations 236(2)
8.7 Conclusions and Future Work 238(1)
8.8 Acknowledgments 238(3)
References 239(2)
9 Interfaces for Crowdsourcing Platforms (Christoph Draxler) 241(39)
9.1 Introduction 241(1)
9.2 Technology 242(11)
9.2.1 TinyTask Web Page 242(1)
9.2.2 World Wide Web 242(1)
9.2.3 Hypertext Transfer Protocol 243(1)
9.2.4 Hypertext Markup Language 244(2)
9.2.5 Cascading Style Sheets 246(1)
9.2.6 JavaScript 246(2)
9.2.7 JavaScript Object Notation 248(1)
9.2.8 Extensible Markup Language 248(1)
9.2.9 Asynchronous JavaScript and XML 249(1)
9.2.10 Flash 250(1)
9.2.11 SOAP and REST 251(1)
9.2.12 Section Summary 252(1)
9.3 Crowdsourcing Platforms 253(8)
9.3.1 Crowdsourcing Platform Workflow 253(3)
9.3.2 Amazon Mechanical Turk 256(3)
9.3.3 CrowdFlower 259(1)
9.3.4 Clickworker 259(1)
9.3.5 WikiSpeech 260(1)
9.4 Interfaces to Crowdsourcing Platforms 261(17)
9.4.1 Implementing Tasks Using a GUI on CrowdFlower Platform 262(2)
9.4.2 Implementing Tasks Using the Command-Line Interface in MTurk 264(6)
9.4.3 Implementing a Task Using a RESTful Web Service in Clickworker 270(1)
9.4.4 Defining Tasks via Configuration Files in WikiSpeech 270(8)
9.5 Summary 278(2)
References 278(2)
10 Crowdsourcing for Industrial Spoken Dialog Systems (David Suendermann, Roberto Pieraccini) 280(23)
10.1 Introduction 280(3)
10.1.1 Industry's Willful Ignorance 280(1)
10.1.2 Crowdsourcing in Industrial Speech Applications 281(1)
10.1.3 Public versus Private Crowd 282(1)
10.2 Architecture 283(4)
10.3 Transcription 287(3)
10.4 Semantic Annotation 290(6)
10.5 Subjective Evaluation of Spoken Dialog Systems 296(4)
10.6 Conclusion 300(3)
References 300(3)
11 Economic and Ethical Background of Crowdsourcing for Speech (Gilles Adda, Joseph J. Mariani, Laurent Besacier, Hadrien Gelas) 303(32)
11.1 Introduction 303(1)
11.2 The Crowdsourcing Fauna 304(3)
11.2.1 The Crowdsourcing Services Landscape 304(2)
11.2.2 Who Are the Workers? 306(1)
11.2.3 Ethics and Economics in Crowdsourcing: How to Proceed? 307(1)
11.3 Economic and Ethical Issues 307(9)
11.3.1 What Are the Problems for the Workers? 309(1)
11.3.2 Crowdsourcing and Labor Laws 310(4)
11.3.3 Which Economic Model Is Sustainable for Crowdsourcing? 314(2)
11.4 Under-Resourced Languages: A Case Study 316(6)
11.4.1 Under-Resourced Languages Definition and Issues 317(1)
11.4.2 Collecting Annotated Speech for African Languages Using Crowdsourcing 317(1)
11.4.3 Experiment Description 317(1)
11.4.4 Results 318(3)
11.4.5 Discussion and Lessons Learned 321(1)
11.5 Toward Ethically Produced Language Resources 322(8)
11.5.1 Defining a Fair Compensation for Work Done 323(3)
11.5.2 Impact of Crowdsourcing on the Ecology of Linguistic Resources 326(1)
11.5.3 Defining an Ethical Framework: Some Solutions 326(4)
11.6 Conclusion 330(5)
Disclaimer 331(1)
References 331(4)
Index 335
Maxine Eskenazi, Carnegie Mellon University, USA. Dr. Eskenazi is Principal Systems Scientist at the Language Technologies Institute, Carnegie Mellon University, USA. She has authored over 100 scientific papers in the areas of computer-assisted language learning, speech, and spoken dialog systems. Her work has produced such systems as the Let's Go spoken dialog system and the REAP vocabulary tutor. She is also the founder and CTO of the Carnegie Speech Company.

Gina-Anne Levow, University of Washington, USA. Dr. Levow is currently an Assistant Professor in the Department of Linguistics, University of Washington, USA. Prior to joining the faculty at the University of Washington, she served on the faculty of the Department of Computer Science at the University of Chicago and as a Research Fellow at the University of Manchester, UK. She has served on the Editorial Board of Computational Linguistics and as Associate Editor of ACM Transactions on Asian Language Processing.

Helen Meng, The Chinese University of Hong Kong, Hong Kong. Dr. Meng is Founder and Director of the Human-Computer Communications Laboratory at The Chinese University of Hong Kong, and is also the Founder and Co-Director of the Microsoft-CUHK Joint Laboratory for Human-Centric Computing and Interface Technologies, which was conferred the national status of Key Laboratory by the Ministry of Education of China (MoE) in 2008. Prof. Meng also served as Associate Dean (Research) of the Faculty of Engineering from 2006 to 2010. She serves as Editor-in-Chief of the IEEE Transactions on Audio, Speech and Language Processing.

Gabriel Parent, Amazon.com, USA. Gabriel Parent is a Software Development Engineer at Amazon.com, working on natural language problems. His main research focuses have been human-computer interaction through spoken dialog systems and crowdsourcing.

David Suendermann, Baden-Wuerttemberg Cooperative State University, Germany. Dr. Suendermann is currently a full Professor of Computer Science at the Baden-Wuerttemberg Cooperative State University, Stuttgart, Germany. He is also the Principal Speech Scientist of SpeechCycle, New York, USA, which has been recognized by Deloitte as a "Technology Fast 500" company based on revenue growth. He has authored more than 70 publications and patents, including a book and six book chapters.