Chapter 1: "How do Transformers work?"
January 2022: [InstructGPT](https://huggingface.co/papers/2203.02155), a version of GPT-3 that was trained to follow instructions better

The sentence "This list is far from comprehensive, and is just meant to highlight a few of the different kinds of Transformer models. Broadly, they can be grouped into three categories:" in huggingface/course/blob/main/chapters/en/chapter1/4.mdx appears to be in the wrong place (in the middle of the list of influential models, where it should presumably come after the list of models).
...
November 2024: [SmolLM2](https://huggingface.co/papers/2502.02737), a state-of-the-art small language model (135 million to 1.7 billion parameters) that achieves impressive performance despite its compact size, unlocking new possibilities for mobile and edge devices.
GPT-like (also called auto-regressive Transformer models)
BERT-like (also called auto-encoding Transformer models)
T5-like (also called sequence-to-sequence Transformer models)
January 2022: InstructGPT, a version of GPT-3 trained to follow instructions better. This list is far from comprehensive, and is just meant to highlight a few of the different kinds of Transformer models. Broadly, they can be grouped into three categories.