Compare OpenAI, Anthropic, Gemini model APIs

Easily compare the most popular LLM APIs across token limits, price, rate limits, latency, languages, and training data cutoff.

| Provider | Model | Max tokens | Input price per 1M tokens | Output price per 1M tokens | Default requests per minute | Default tokens per minute | Avg latency in the last 48h* | Training data cutoff | Languages available |
|---|---|---|---|---|---|---|---|---|---|
| OpenAI | gpt-4o | 128,000 | $5.00 | $15.00 | 500 | 30,000 | N/A | Oct 2023 | All |
| OpenAI | gpt-4o-mini | 128,000 | $0.15 | $0.60 | 500 | 200,000 | N/A | Oct 2023 | All |
| OpenAI | gpt-4-turbo | 128,000 | $10.00 | $30.00 | 500 | 30,000 | N/A | Dec 2023 | All |
| OpenAI | gpt-4 | 8,192 | $30.00 | $60.00 | 500 | 10,000 | N/A | Sep 2021 | All |
| OpenAI | gpt-3.5-turbo | 16,384 | $0.50 | $1.50 | 3,500 | 200,000 | N/A | Sep 2021 | All |
| Anthropic | claude-3.5-sonnet | 200,000 | $3.00 | $15.00 | 50 | 40,000 | N/A | Apr 2024 | All |
| Anthropic | claude-3-haiku | 200,000 | $0.25 | $1.25 | 50 | 50,000 | N/A | Aug 2023 | All |
| Google | gemini-1.5-flash | 1,048,576† | $0.07 | $0.30 | 1,000 | 4,000,000 | N/A | Nov 2023 | All |
| Google | gemini-1.5-pro | 1,048,576† | $3.50 | $10.50 | 360 | 4,000,000 | N/A | Nov 2023 | All |
| Mistral | mistral-large | 128,000 | $3.00 | $9.00 | 1,800 | 50,000,000 | N/A | Jan 2024 | All |

†Gemini models set input and output limits separately: max input tokens 1,048,576 and max output tokens 8,192. For prompts over 128K tokens, gemini-1.5-flash is priced at $0.15 / 1M input tokens and $0.60 / 1M output tokens, and gemini-1.5-pro at $7 / 1M input tokens and $21 / 1M output tokens.
*The average latency is calculated by generating a random 512-token completion every 10 minutes across 3 different locations. Check out our latency tracker.

FAQ

What is "Max tokens"?

The maximum number of tokens that the model can process in a single request. This limit includes both the input (prompt) and the output (completion) tokens. Some providers, such as Google with its Gemini models, set separate limits for input and output tokens.
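A quick way to use this column in practice is to check that a prompt plus its requested completion fits within a model's limit before sending the request. This is an illustrative sketch using values from the table; in real code you would count prompt tokens with the provider's tokenizer rather than guessing.

```python
# Context-window check: input + output must stay within "Max tokens".
# The limits below are taken from the comparison table above.
MAX_TOKENS = {
    "gpt-4o": 128_000,
    "gpt-4": 8_192,
    "claude-3.5-sonnet": 200_000,
}

def fits_context(model: str, prompt_tokens: int, max_completion_tokens: int) -> bool:
    """Return True if prompt + requested completion fit in the model's window."""
    return prompt_tokens + max_completion_tokens <= MAX_TOKENS[model]

print(fits_context("gpt-4", 7_000, 1_000))  # 8,000 <= 8,192 -> True
print(fits_context("gpt-4", 8_000, 1_000))  # 9,000 > 8,192 -> False
```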

What is "Input price per 1M tokens"?

The cost of processing 1 million input (prompt) tokens using this model.

What is "Output price per 1M tokens"?

The cost of generating 1 million output (completion) tokens using this model.
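Since input and output tokens are billed at different rates, the cost of a single request combines both. The following sketch computes a per-request cost from the table's per-1M-token prices; the token counts passed in are illustrative.

```python
# Per-request cost from the table's (input, output) prices in USD per 1M tokens.
PRICES = {
    "gpt-4o":            (5.00, 15.00),
    "gpt-4o-mini":       (0.15, 0.60),
    "claude-3.5-sonnet": (3.00, 15.00),
    "gemini-1.5-flash":  (0.07, 0.30),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the listed per-1M-token rates."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A 10K-token prompt with a 1K-token completion on gpt-4o:
print(round(request_cost("gpt-4o", 10_000, 1_000), 4))  # -> 0.065
```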

What is "Default requests per minute"?

The maximum number of requests that can be made to this API within one minute under the default rate limit. Some providers offer the option to increase this limit upon request.

What is "Default tokens per minute"?

The maximum number of tokens (input+output) that can be processed within one minute by this model API. Some providers offer the option to increase this limit upon request.
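Clients that exceed the RPM or TPM limits get rejected requests, so batch jobs often pace themselves. This is a minimal, illustrative sketch of client-side pacing against the table's default limits; it is not a feature of any provider SDK, and the fixed one-minute window is a simplification of how providers actually meter usage.

```python
import time

class RateLimiter:
    """Naive per-minute budget for requests (RPM) and tokens (TPM)."""

    def __init__(self, rpm: int, tpm: int):
        self.rpm, self.tpm = rpm, tpm
        self.window_start = time.monotonic()
        self.requests = 0
        self.tokens = 0

    def acquire(self, tokens: int) -> None:
        """Block until one more request of `tokens` fits in this minute's budget."""
        now = time.monotonic()
        if now - self.window_start >= 60:  # a new one-minute window has begun
            self.window_start, self.requests, self.tokens = now, 0, 0
        if self.requests + 1 > self.rpm or self.tokens + tokens > self.tpm:
            time.sleep(60 - (now - self.window_start))  # wait out the window
            self.window_start = time.monotonic()
            self.requests, self.tokens = 0, 0
        self.requests += 1
        self.tokens += tokens

# claude-3.5-sonnet defaults from the table: 50 RPM, 40,000 TPM
limiter = RateLimiter(rpm=50, tpm=40_000)
limiter.acquire(1_200)  # call before each API request
```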

What is "Avg latency in the last 48h"?

The average response time of the model API, measured by generating a maximum of 512 tokens at a temperature of 0.7 every 10 minutes in 3 locations, over the last 48h. The measured response time is capped at 60 seconds, though actual response times can exceed this cap.

What is "Training data cutoff"?

The date of the most recent data the model was trained on. The model has no knowledge of anything that happened or was published after this date.

What is "Languages available"?

The languages that the model can be prompted in and generate text in.