GPT tokenizer playground

Tokens are the basic units generative AI models use to measure the length of a text. They are groups of characters that sometimes align with words, but not always: a single token may cover part of a word, a whole word, a punctuation mark, or an emoji, and longer words are often split across several tokens. This is why the token count usually differs from the word count.
Use the tool below to explore how a specific piece of text is tokenized and to see the overall counts of words, characters, and tokens.
Each row in the table below shows a token ID and its corresponding token.