How to use OpenAI model temperature?
What is model temperature?
Temperature is a key parameter of OpenAI models such as ChatGPT, GPT-3.5, and GPT-4 that governs the randomness, and thus the creativity, of the responses. Other AI providers such as Anthropic, Google, and Mistral expose a similar parameter.
It is usually a number between 0 and 1. A temperature of 0 means the responses will be very straightforward and predictable, almost deterministic (meaning you almost always get the same response to a given prompt). A temperature closer to 1 means the responses can vary wildly.
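As a concrete illustration, here is a minimal sketch of where the temperature parameter goes when calling OpenAI's Chat Completions endpoint with the official `openai` Python package (v1.x). The helper function, its name, and the validation are illustrative additions, not part of the library (note that the API itself accepts values from 0 to 2, though 0 to 1 is the usual range).

```python
def chat_request_kwargs(prompt: str, temperature: float = 0.0) -> dict:
    """Build keyword arguments for a chat completion call.
    (Illustrative helper, not part of the openai library.)"""
    if not 0.0 <= temperature <= 2.0:  # the OpenAI API accepts 0-2
        raise ValueError("temperature must be between 0 and 2")
    return {
        "model": "gpt-4o",
        "temperature": temperature,  # 0 = near-deterministic, higher = more random
        "messages": [{"role": "user", "content": prompt}],
    }

# With the official client (requires an OPENAI_API_KEY environment variable):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(
#     **chat_request_kwargs("Say hello", temperature=0)
# )
# print(response.choices[0].message.content)
```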
How exactly does the temperature work?
Under the hood, large language models try to predict the best next word given a prompt, one word at a time. They assign a probability to every word in their vocabulary, then pick one of those words.
A temperature of 0 roughly means that the model will always select the highest-probability word. A higher temperature means the model might select a word with a slightly lower probability, leading to more variation, randomness, and creativity. A very high temperature therefore increases the risk of “hallucination”: the AI can start selecting words that make no sense or are off-topic.
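The mechanism above can be sketched with the standard softmax-with-temperature formula: logits are divided by the temperature before being converted to probabilities, so a low temperature sharpens the distribution toward the top word and a high temperature flattens it. The toy logits and word list below are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply softmax.
    Low temperature -> probability mass concentrates on the top word;
    high temperature -> the distribution flattens out."""
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for three candidate next words: ["cat", "dog", "pizza"]
logits = [2.0, 1.5, 0.1]
print(softmax_with_temperature(logits, 0.2))  # heavily favors "cat"
print(softmax_with_temperature(logits, 1.0))  # spreads mass across all three
```

At temperature 0 the division is undefined, which is why implementations treat it as a special case: simply take the argmax instead of sampling.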
Rules of thumb for temperature choice
Your choice of temperature should depend on your task.
For non-creative tasks (translation, categorization, extraction, standardization, format conversion, grammar fixes) and strict adherence to instructions, prefer a temperature of 0, or up to 0.3. For more creative tasks, push the temperature higher, closer to 0.5. If you want GPT to be highly creative (for marketing or advertising copy, for instance), consider values between 0.7 and 1, but be careful and check the results for hallucinations.
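The rules of thumb above can be sketched as a small lookup table. The task categories and exact values below are illustrative starting points chosen to match the ranges in this section, not an official recommendation or API:

```python
def suggested_temperature(task: str) -> float:
    """Map a task category to a starting temperature, following the
    rules of thumb above. (Names and values are illustrative.)"""
    presets = {
        "translation": 0.0,
        "extraction": 0.0,
        "format_conversion": 0.0,
        "grammar_fix": 0.2,
        "summarization": 0.3,
        "creative_writing": 0.5,
        "marketing_copy": 0.8,
    }
    try:
        return presets[task]
    except KeyError:
        raise ValueError(f"unknown task category: {task}")

print(suggested_temperature("translation"))     # 0.0
print(suggested_temperature("marketing_copy"))  # 0.8
```

Whatever values you pick, treat them as a starting point and tune against your own outputs.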
Update since this article was written in early 2023
In early 2023, the only available model was OpenAI’s text-davinci-003, a “base” completion model: it was trained mostly to continue a sequence of words. The subsequent models (gpt-3.5-turbo, gpt-4, gpt-4-turbo, gpt-4o, Claude, Gemini, Mistral…) were trained mostly to answer questions and are dubbed “chat” models.
These chat models are much less creative and random than base models, so temperature plays a smaller role in them.