Tiktokenizer

cl100k_base tokenization visualization tool

About cl100k_base Tokenization

cl100k_base is the byte pair encoding (BPE) tokenizer used by GPT-4 and GPT-3.5 Turbo. Its vocabulary holds roughly 100,000 tokens, and it generally encodes multilingual text, special characters, and whitespace in fewer tokens than earlier tokenizers.
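To see what the tool displays in code form, here is a minimal sketch that lists the tokens for a piece of text. It assumes the third-party `tiktoken` package is installed; `load_cl100k` and `describe_tokens` are hypothetical helper names chosen for this example, not part of the tool or of any library.

```python
def load_cl100k():
    """Return the cl100k_base encoding (assumes `pip install tiktoken`)."""
    import tiktoken  # third-party; imported lazily so describe_tokens works without it
    return tiktoken.get_encoding("cl100k_base")


def describe_tokens(text, enc):
    """Return (token_id, token_text) pairs for `text`.

    `enc` is any object exposing `encode(str) -> list[int]` and
    `decode_single_token_bytes(int) -> bytes`, as tiktoken encodings do.
    """
    pairs = []
    for token_id in enc.encode(text):
        raw = enc.decode_single_token_bytes(token_id)
        # A token can hold only part of a multi-byte character,
        # so decode leniently instead of raising on incomplete bytes.
        pairs.append((token_id, raw.decode("utf-8", errors="replace")))
    return pairs
```

With `tiktoken` available, `describe_tokens("Hello, world!", load_cl100k())` returns one pair per token, and the length of that list is the token count for the text.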

Token Usage Tips

Shorter prompts use fewer tokens and can reduce API costs
Different languages tokenize differently: some need more tokens per word than others
Special characters and whitespace count toward the token total, although a leading space is usually merged into the token of the word that follows
Understanding tokenization can help you optimize your prompts for better results
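The tips above can be made concrete with a small sketch. cl100k_base works on UTF-8 bytes, and every ordinary token covers at least one byte, so the byte length of a text is an upper bound on its token count. Scripts that need more bytes per character therefore tend to need more tokens. The function names and the price argument below are hypothetical and only for illustration; actual API prices vary by model.

```python
def max_token_count(text):
    """Upper bound on tokens: each byte-level BPE token covers at least one UTF-8 byte."""
    return len(text.encode("utf-8"))


def bytes_per_char(text):
    """Average UTF-8 bytes per character, a rough hint at how costly a script is."""
    if not text:
        return 0.0
    return len(text.encode("utf-8")) / len(text)


def estimate_cost(token_count, usd_per_1k_tokens):
    """Cost estimate for a token count at a caller-supplied (assumed) price per 1,000 tokens."""
    return token_count / 1000 * usd_per_1k_tokens
```

For example, `bytes_per_char("hello")` is 1.0 while `bytes_per_char("こんにちは")` is 3.0, which is one reason the same greeting can use more tokens in Japanese than in English. The real token count is usually well below the byte bound, so measure with the tokenizer itself before budgeting.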

Built by 1000ai
