⏳ tiktoken
tiktoken is a fast BPE (byte pair encoding) tokeniser.
import tiktoken
# Look up the GPT-2 encoding by name
enc = tiktoken.get_encoding("gpt2")
# encode() returns the text as a list of integer token ids
print(enc.encode("hello world"))
The open source version of tiktoken can be installed from PyPI:
pip install tiktoken
The tokeniser API is documented in tiktoken/core.py.
Performance
tiktoken is 3-6x faster than a comparable open source tokeniser.
Performance was measured on 1GB of text with the GPT-2 tokeniser, comparing against
GPT2TokenizerFast from tokenizers==0.13.2 and transformers==4.24.0.
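As a rough illustration of how a throughput figure like this can be obtained, here is a minimal timing sketch. It is not the benchmark used for the numbers above. The names `measure_throughput` and `whitespace_encode` are hypothetical, and the whitespace splitter is only a stand-in so the sketch runs without extra dependencies.

```python
import time
from typing import Callable, Sequence


def measure_throughput(encode: Callable[[str], Sequence], text: str, repeats: int = 3) -> float:
    """Return the best observed throughput of `encode` on `text`, in bytes per second."""
    n_bytes = len(text.encode("utf-8"))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        encode(text)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Guard against a zero reading from a coarse clock
    return n_bytes / max(best, 1e-9)


def whitespace_encode(text: str) -> Sequence:
    # Hypothetical stand-in tokeniser (an assumption, not part of tiktoken)
    return text.split()


sample = "hello world " * 10_000
rate = measure_throughput(whitespace_encode, sample)
print(f"{rate / 1e6:.1f} MB/s")
```

To time tiktoken itself, pass `tiktoken.get_encoding("gpt2").encode` in place of `whitespace_encode`. For a meaningful comparison, run both tokenisers on the same input and machine, and use a far larger sample than the one shown here.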