Acknowledgements | 9 | (2) |
| 11 | (2) |
| 13 | (2) |
| 15 | (2) |
| 17 | (8) |
1.1 Background to the study | 17 | (3) |
1.2 Statement of the problem | 20 | (1) |
| 21 | (1) |
| 22 | (1) |
1.5 Structure of the book | 23 | (2) |
2 Performance assessment of second language speaking | 25 | (36) |
2.1 Introduction to performance assessment | 26 | (3) |
2.2 The speaking construct in performance assessment | 29 | (20) |
2.2.1 Pre-communicative approaches | 29 | (2) |
2.2.2 Models of communicative competence | 31 | (7) |
2.2.3 Approaches to speaking | 38 | (11) |
2.3 Models of performance assessment | 49 | (8) |
| 51 | (1) |
2.3.2 Skehan (1998, 2001) | 52 | (2) |
| 54 | (1) |
| 55 | (2) |
2.4 Rating scales in performance assessment | 57 | (4) |
| 61 | (18) |
3.1 General characteristics | 61 | (2) |
3.2 Types of rating scales | 63 | (2) |
3.3 Theoretical and methodological concepts in rating scale development | 65 | (9) |
3.3.1 Intuitive approaches | 66 | (1) |
3.3.2 Theory-based approaches | 67 | (1) |
3.3.3 Empirical approaches | 68 | (2) |
3.3.4 Triangulation of approaches | 70 | (4) |
3.4 Controversy over rating scales | 74 | (5) |
4 Rating scale validation | 79 | (12) |
4.1 Validity and validity evidence | 79 | (4) |
4.2 Rasch-based rating scale validation | 83 | (3) |
| 86 | (2) |
| 88 | (3) |
| 91 | (38) |
5.1 The development process | 91 | (9) |
| 91 | (4) |
| 95 | (5) |
| 100 | (22) |
5.2.1 Lexico-grammatical resources and fluency | 100 | (8) |
5.2.2 Pronunciation and vocal impact | 108 | (2) |
5.2.3 Structure and content | 110 | (5) |
5.2.4 Genre-specific presentation skills: formal presentations | 115 | (1) |
5.2.5 Content and relevance (interaction) | 116 | (3) |
| 119 | (3) |
5.3 Descriptor formulation | 122 | (2) |
5.4 ELTT speaking ability | 124 | (3) |
| 127 | (2) |
| 129 | (24) |
6.1 Validating the ELTT scales | 129 | (4) |
| 133 | (1) |
| 134 | (2) |
| 134 | (1) |
6.3.2 Instruments and procedures | 134 | (2) |
| 136 | (2) |
6.5 Results and discussion | 138 | (8) |
6.5.1 Inter-rater reliability | 138 | (2) |
6.5.2 Match between intended and empirical scale | 140 | (2) |
6.5.3 Descriptor analysis | 142 | (4) |
6.6 Preliminary conclusions | 146 | (5) |
| 146 | (1) |
6.6.2 Specificity of proficiency levels | 147 | (1) |
| 148 | (3) |
6.6.4 Recommendations for scale revision | 151 | (1) |
| 151 | (2) |
| 153 | (52) |
| 153 | (1) |
| 154 | (11) |
| 154 | (3) |
7.2.2 Specification of a measurement model and FACETS output | 157 | (1) |
7.2.3 Measurement quality control | 158 | (3) |
7.2.4 Descriptor analysis | 161 | (4) |
7.3 Results and discussion | 165 | (38) |
7.3.1 Measurement quality control | 165 | (5) |
7.3.2 Dimensionality of descriptors | 170 | (6) |
7.3.3 The proficiency continuum | 176 | (8) |
7.3.4 Cut-off points and content integrity | 184 | (19) |
| 203 | (2) |
8 Descriptor-performance matching | 205 | (58) |
| 205 | (1) |
| 206 | (13) |
| 206 | (1) |
8.2.2 Instruments and procedures | 207 | (11) |
| 218 | (1) |
| 219 | (1) |
8.3.1 Specification of a measurement model | 219 | (1) |
8.3.2 Measurement quality control | 219 | (1) |
8.4 Results and discussion | 220 | (34) |
8.4.1 Measurement quality control | 220 | (5) |
8.4.2 Dimensionality of descriptors | 225 | (8) |
8.4.3 The proficiency continuum | 233 | (3) |
8.4.4 Cut-off points and content integrity | 236 | (18) |
| 254 | (3) |
8.6 Comparison of methods | 257 | (6) |
9 Revision of the ELTT scales | 263 | (36) |
9.1 Establishing a quality hierarchy of descriptor units | 264 | (7) |
9.2 The quality of descriptor units | 271 | (7) |
9.3 Constructing the revised scales | 278 | (7) |
9.4 Common points of reference | 285 | (5) |
9.5 The modified versions of the ELTT scales | 290 | (9) |
| 299 | (30) |
| 300 | (4) |
10.2 Theoretical implications | 304 | (10) |
10.3 Practical recommendations | 314 | (7) |
10.4 Limitations of the study | 321 | (4) |
10.5 Suggestions for further research | 325 | (2) |
10.6 Concluding statement | 327 | (2) |
| 329 | (20) |
| 349 |
12.1 Appendix 1: Original ELTT rating scales | 349 | (4) |
12.2 Appendix 2: Sorting task questionnaire | 353 | (6) |
12.3 Appendix 3: Consensual scales based on descriptor sorting | 359 | (3) |
12.4 Appendix 4: Descriptor unit measurement report (descriptor calibration) | 362 | (7) |
12.5 Appendix 5: All facet vertical ruler (sorting task) | 369 | (1) |
12.6 Appendix 6: Speaking tasks | 370 | (2) |
12.7 Appendix 7: Rating sheets | 372 | (11) |
12.8 Appendix 8: Rater guidelines | 383 | (3) |
12.9 Appendix 9: Student measurement report (descriptor-performance matching) | 386 | (2) |
12.10 Appendix 10: All facets vertical ruler (descriptor-performance matching) | 388 | (1) |
12.11 Appendix 11: Descriptor unit measurement report (descriptor-performance matching) | 389 |