Foreword    xi
Preface    xv
1 Hello Transformers    1
    The Encoder-Decoder Framework    2
    Attention Mechanisms    4
    Transfer Learning in NLP    6
    Hugging Face Transformers: Bridging the Gap    9
    A Tour of Transformer Applications    10
        Text Classification    10
        Named Entity Recognition    11
        Question Answering    12
        Summarization    13
        Translation    13
        Text Generation    14
    The Hugging Face Ecosystem    15
        The Hugging Face Hub    16
        Hugging Face Tokenizers    17
        Hugging Face Datasets    18
        Hugging Face Accelerate    18
    Main Challenges with Transformers    19
    Conclusion    20
2 Text Classification    21
    The Dataset    22
        A First Look at Hugging Face Datasets    23
        From Datasets to DataFrames    26
        Looking at the Class Distribution    27
        How Long Are Our Tweets?    28
    From Text to Tokens    29
        Character Tokenization    29
        Word Tokenization    31
        Subword Tokenization    33
        Tokenizing the Whole Dataset    35
    Training a Text Classifier    36
        Transformers as Feature Extractors    38
        Fine-Tuning Transformers    45
    Conclusion    54
3 Transformer Anatomy    57
    The Transformer Architecture    57
    The Encoder    60
        Self-Attention    61
        The Feed-Forward Layer    70
        Adding Layer Normalization    71
        Positional Embeddings    73
        Adding a Classification Head    75
    The Decoder    76
    Meet the Transformers    78
        The Transformer Tree of Life    78
        The Encoder Branch    79
        The Decoder Branch    82
        The Encoder-Decoder Branch    83
    Conclusion    85
4 Multilingual Named Entity Recognition    87
    The Dataset    88
    Multilingual Transformers    92
    A Closer Look at Tokenization    93
        The Tokenizer Pipeline    94
        The SentencePiece Tokenizer    95
    Transformers for Named Entity Recognition    96
    The Anatomy of the Transformers Model Class    98
        Bodies and Heads    98
        Creating a Custom Model for Token Classification    99
        Loading a Custom Model    101
    Tokenizing Texts for NER    103
    Performance Measures    105
    Fine-Tuning XLM-RoBERTa    106
    Error Analysis    108
    Cross-Lingual Transfer    115
        When Does Zero-Shot Transfer Make Sense?    116
        Fine-Tuning on Multiple Languages at Once    118
    Interacting with Model Widgets    121
    Conclusion    122
5 Text Generation    123
    The Challenge with Generating Coherent Text    125
    Greedy Search Decoding    127
    Beam Search Decoding    130
    Sampling Methods    134
    Top-k and Nucleus Sampling    136
    Which Decoding Method Is Best?    140
    Conclusion    140
6 Summarization    141
    The CNN/DailyMail Dataset    141
    Text Summarization Pipelines    143
        Summarization Baseline    143
        GPT-2    144
        T5    144
        BART    145
        PEGASUS    145
    Comparing Different Summaries    146
    Measuring the Quality of Generated Text    148
        BLEU    148
        ROUGE    152
    Evaluating PEGASUS on the CNN/DailyMail Dataset    154
    Training a Summarization Model    157
        Evaluating PEGASUS on SAMSum    158
        Fine-Tuning PEGASUS    158
        Generating Dialogue Summaries    162
    Conclusion    163
7 Question Answering    165
    Building a Review-Based QA System    166
        The Dataset    167
        Extracting Answers from Text    173
        Using Haystack to Build a QA Pipeline    181
    Improving Our QA Pipeline    189
        Evaluating the Retriever    189
        Evaluating the Reader    196
        Domain Adaptation    199
        Evaluating the Whole QA Pipeline    203
    Going Beyond Extractive QA    205
    Conclusion    207
8 Making Transformers Efficient in Production    209
    Intent Detection as a Case Study    210
    Creating a Performance Benchmark    212
    Making Models Smaller via Knowledge Distillation    217
        Knowledge Distillation for Fine-Tuning    217
        Knowledge Distillation for Pretraining    220
        Creating a Knowledge Distillation Trainer    220
        Choosing a Good Student Initialization    222
        Finding Good Hyperparameters with Optuna    226
        Benchmarking Our Distilled Model    229
    Making Models Faster with Quantization    230
    Benchmarking Our Quantized Model    236
    Optimizing Inference with ONNX and the ONNX Runtime    237
    Making Models Sparser with Weight Pruning    243
        Sparsity in Deep Neural Networks    244
        Weight Pruning Methods    244
    Conclusion    248
9 Dealing with Few to No Labels    249
    Building a GitHub Issues Tagger    251
        Getting the Data    252
        Preparing the Data    253
        Creating Training Sets    257
        Creating Training Slices    259
    Implementing a Naive Bayesline    260
    Working with No Labeled Data    263
    Working with a Few Labels    271
        Data Augmentation    271
        Using Embeddings as a Lookup Table    275
        Fine-Tuning a Vanilla Transformer    284
        In-Context and Few-Shot Learning with Prompts    288
    Leveraging Unlabeled Data    289
        Fine-Tuning a Language Model    289
        Fine-Tuning a Classifier    293
        Advanced Methods    295
    Conclusion    297
10 Training Transformers from Scratch    299
    Large Datasets and Where to Find Them    300
        Challenges of Building a Large-Scale Corpus    300
        Building a Custom Code Dataset    303
        Working with Large Datasets    306
        Adding Datasets to the Hugging Face Hub    309
    Building a Tokenizer    310
        The Tokenizer Model    312
        Measuring Tokenizer Performance    312
        A Tokenizer for Python    313
        Training a Tokenizer    318
        Saving a Custom Tokenizer on the Hub    322
    Training a Model from Scratch    323
        A Tale of Pretraining Objectives    323
        Initializing the Model    325
        Implementing the Dataloader    326
        Defining the Training Loop    330
        The Training Run    337
    Results and Analysis    338
    Conclusion    343
11 Future Directions    345
    Scaling Transformers    345
        Scaling Laws    347
        Challenges with Scaling    349
        Attention Please!    351
        Sparse Attention    352
        Linearized Attention    353
    Going Beyond Text    354
        Vision    355
        Tables    359
    Multimodal Transformers    361
        Speech-to-Text    361
        Vision and Text    364
    Where to from Here?    370
Index    371