Skip to main content

Set max response tokens

Set a cut-off limit for your responses measured in tokens. If the reponse is larger than this limit, it will be truncated. Helps control cost and speed.


Tokens can be thought of as pieces of words. During processing, the language model breaks down both the input (prompt) and the output (completion) texts into smaller units called tokens. Tokens generally correspond to ~4 characters of common English text. So 100 tokens are approximately worth 75 words. Learn more with our token guide.

Token limitToken limit is the maximum total number of tokens that can be used in both the input (prompt) and the response (completion) when interacting with a language model.
You have opened a Google document and selected Extensions > GPT for Sheets and Docs > Launch.
  1. In the GPT for Docs sidebar, click Model settings.

    The field indicates the current max response size of your model.
  2. Type a value for Max response tokens.
    Rule: max response tokens + input tokens ≤ token limit.
    This means that when you set Max response tokens, you must make sure there is enough space for your input. Your input includes your prompt, custom instructions, context, and elements sent by GPT for Docs with your input (about 100 extra tokens). You can use OpenAI’s official tokenizer to estimate the number of tokens you need in your response.

You can now submit a prompt in the current document with the new maximum response size. Once a prompt is submitted, the maximum response size is saved along with all Model settings values, and is used for all prompts executed from Google documents.

What's next

Select other settings to customize how the language model operates.