GPT tokenizer

Tokens are the basic units that generative AI models use to process text and to measure its length. A token is a group of characters that sometimes lines up with a whole word, but not always: a long or uncommon word may be split into several tokens, and punctuation marks and emojis count as tokens too (an emoji can even take more than one). This is why the token count of a text usually differs from its word count.
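To make the difference between words, characters, and tokens concrete, here is a minimal Python sketch of byte-pair encoding (BPE), the kind of algorithm GPT tokenizers are built on. It is a toy under stated assumptions: the merge table MERGES, the helper names bpe_encode and tokenize, and the simple word/punctuation split are all hypothetical, and it merges characters rather than the UTF-8 bytes and very large learned vocabulary that a real encoding such as o200k_base uses.

```python
import re

# HYPOTHETICAL merge table, in priority order (earlier = applied first).
# Real GPT tokenizers learn far larger tables over UTF-8 bytes.
MERGES = [
    ("t", "o"), ("e", "n"), ("to", "k"), ("tok", "en"),
    ("a", "r"), ("ar", "e"), ("i", "z"), ("e", "d"),
]


def bpe_encode(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Start from single characters and repeatedly merge the adjacent
    pair with the best (lowest) rank until no listed merge applies."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    parts = list(word)
    while len(parts) > 1:
        # (rank, position) for every adjacent pair; unknown pairs rank as infinity.
        candidates = [
            (ranks.get((a, b), float("inf")), i)
            for i, (a, b) in enumerate(zip(parts, parts[1:]))
        ]
        best_rank, i = min(candidates)  # ties go to the leftmost pair
        if best_rank == float("inf"):
            break  # no merge from the table applies any more
        parts[i:i + 2] = [parts[i] + parts[i + 1]]
    return parts


def tokenize(text: str) -> list[str]:
    # Simplified pre-tokenization (an assumption): words and single punctuation
    # marks. Real GPT tokenizers use a more elaborate regex and keep leading spaces.
    pieces = re.findall(r"\w+|[^\w\s]", text)
    return [tok for piece in pieces for tok in bpe_encode(piece, MERGES)]


text = "tokens are tokenized!"
tokens = tokenize(text)
print(tokens)             # ['token', 's', 'are', 'token', 'iz', 'ed', '!']
print(len(text.split()))  # 3 words
print(len(text))          # 21 characters
print(len(tokens))        # 7 tokens
```

The familiar word "tokens" becomes two tokens and "tokenized" becomes three, while "are" stays whole and "!" is a token of its own. A real tokenizer splits the same text differently, but the same mechanism is what makes token counts diverge from word counts.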
Use the tool below to explore how a given piece of text is tokenized and to see its overall word, character, and token counts.
Model: gpt-4o (o200k_base encoding)