๐ŸŽ DistilGPT-2 model checkpoint

The student of the now-ubiquitous GPT-2 does not fall short of its teacher’s expectations. Obtained by distillation, DistilGPT-2 weighs 37% less and is twice as fast as its OpenAI counterpart, while keeping the same generative power. It runs smoothly on an iPhone 7. The dawn of lightweight generative transformers? 🤯

From the paper "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter" by Victor Sanh, Lysandre Debut, Julien Chaumond and Thomas Wolf. The same method was applied to distill GPT-2, and a Medium blog post describes the process in detail.
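
For readers new to distillation, here is a minimal sketch of the soft-target loss that this family of methods relies on: the student is trained to match the teacher's output distribution, softened by a temperature. This is a generic illustration in plain Python, not the authors' training code. The function names, the example logits and the temperature of 2.0 are all assumptions chosen for the example, and the full training objective described in the paper combines this term with other losses.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student's softened distribution
    measured against the teacher's softened distribution.

    Hypothetical helper: the name and the default temperature
    are illustrative, not taken from the paper's code.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


# Made-up next-token logits over a four-word vocabulary.
teacher = [4.0, 1.0, 0.5, -2.0]
close_student = [3.8, 1.1, 0.4, -1.9]  # nearly agrees with the teacher
far_student = [-2.0, 0.5, 1.0, 4.0]    # ranks the tokens in reverse

print("close student loss:", distillation_loss(close_student, teacher))
print("far student loss:  ", distillation_loss(far_student, teacher))
```

A student whose predictions track the teacher's gets a lower loss than one that disagrees, which is the signal that pushes the smaller model toward the larger model's behaviour during training.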
