Tiktokenizer

A visualization tool for cl100k_base tokenization

About cl100k_base Tokenization

cl100k_base is the tokenizer used by GPT-4 and GPT-3.5 Turbo. Its vocabulary contains roughly 100,000 tokens, and it handles multiple languages, special characters, and runs of whitespace more efficiently than OpenAI's earlier encodings (such as r50k_base and p50k_base, used by the GPT-3 family).
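To reproduce what this tool shows outside the browser, here is a minimal sketch using OpenAI's open-source tiktoken library (assumed installed via `pip install tiktoken`; the printed token IDs in the comment are illustrative):

```python
import tiktoken

# Load the cl100k_base encoding (used by GPT-4 and GPT-3.5 Turbo).
enc = tiktoken.get_encoding("cl100k_base")

text = "Hello, world!"
tokens = enc.encode(text)   # e.g. [9906, 11, 1917, 0]
print(tokens)
print(len(tokens), "tokens")

# Decoding the token IDs recovers the original text exactly.
assert enc.decode(tokens) == text
```

Note that tokens do not map one-to-one onto words: in the example above, ", world!" splits into three tokens (",", " world", "!"), with the leading space folded into the word token.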

Token Usage Tips

  • Shorter prompts use fewer tokens and can reduce API costs
  • Different languages tokenize differently: some require several tokens per word, while others need far fewer (see the sketch after this list)
  • Special characters, punctuation, and whitespace all consume tokens
  • Understanding tokenization helps you estimate costs and make better use of the model's context window
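As a rough illustration of the language and whitespace tips above, the following sketch (again assuming tiktoken is installed; the sample strings are arbitrary) compares token counts for similar text in different scripts and for whitespace-heavy input:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# The same greeting in different languages, plus a whitespace-heavy string.
samples = {
    "English": "Hello, how are you?",
    "Spanish": "Hola, ¿cómo estás?",
    "Japanese": "こんにちは、お元気ですか？",
    "Whitespace": "    lots    of    spaces    ",
}

for label, text in samples.items():
    n = len(enc.encode(text))
    print(f"{label:10s} {n:3d} tokens  ({len(text)} characters)")
```

Running a comparison like this makes the cost differences concrete: non-Latin scripts typically need more tokens per character than English does under cl100k_base.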

Built by 1000ai