Compare OpenAI, Azure, Anthropic, Cohere, and Google PaLM model APIs

Easily compare the most popular generative AI LLM APIs across token limits, pricing, rate limits, latency, languages, and training data cutoff.

| Provider | Model | Max tokens | Input price per 1M tokens | Output price per 1M tokens | Default requests per minute | Default tokens per minute | Avg latency in the last 48h* | Training data cutoff | Languages available |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OpenAI | gpt-4-turbo (beta) | 128,000 | $10.00 | $30.00 | 20 | 10,000 | N/A | Apr 2023 | All |
| OpenAI | gpt-4 | 8,192 | $30.00 | $60.00 | 500 | 20,000 | N/A | Sep 2021 | All |
| OpenAI | gpt-4-32k | 32,768 | $60.00 | $120.00 | 500 | 20,000 | N/A | Sep 2021 | All |
| OpenAI | gpt-3.5-turbo | 16,384 | $1.00 | $2.00 | 3,500 | 40,000 | N/A | Sep 2021 | All |
| OpenAI | gpt-3.5-turbo-instruct | 4,096 | $1.50 | $2.00 | 3,500 | 40,000 | N/A | Sep 2021 | All |
| OpenAI | text-embedding-ada-002 | 8,191 | $0.10 | N/A | 500 | 1,000,000 | N/A | Sep 2021 | All |
| Azure | gpt-4 | 8,192 | $30.00 | $60.00 | 120 | 20,000 | N/A | Sep 2021 | All |
| Azure | gpt-4-32k | 32,768 | $60.00 | $120.00 | 360 | 60,000 | N/A | Sep 2021 | All |
| Azure | gpt-3.5-turbo | 4,096 | $1.50 | $2.00 | 1,440 | 240,000 | N/A | Sep 2021 | All |
| Azure | gpt-3.5-turbo-instruct | 4,096 | $1.50 | $2.00 | 1,440 | 240,000 | N/A | Sep 2021 | All |
| Azure | gpt-3.5-turbo-16k | 16,384 | $3.00 | $4.00 | 1,440 | 240,000 | N/A | Sep 2021 | All |
| Azure | text-embedding-ada-002 | 8,191 | $0.10 | N/A | 1,440 | 240,000 | N/A | Sep 2021 | All |
| Anthropic | claude-instant-1 | 100,000 | $1.63 | $5.51 | N/A | N/A | N/A | Late 2021 | All |
| Anthropic | claude-2 | 100,000 | $11.02 | $32.68 | N/A | N/A | N/A | Early 2023 | All |
| Google PaLM | text-bison-001 | 9,216 (8,192 input + 1,024 output) | ~$2.00 | ~$2.00 | 90 | N/A | N/A | Feb 2023 | All |
| Google PaLM | chat-bison-001 | 5,120 (4,096 input + 1,024 output) | ~$2.00 | ~$2.00 | 90 | N/A | N/A | Feb 2023 | All |
| Google PaLM | textembedding-gecko-001 | 3,072 | $0.10 | N/A | 1,500 | N/A | N/A | N/A | All |
| Cohere | command-light | 4,096 | $1.50 | $2.00 | 10,000 | N/A | N/A | Feb 2023 | English |
| Cohere | command | 4,096 | $1.50 | $2.00 | 10,000 | N/A | N/A | Feb 2023 | English |
| Cohere | base-light | 2,048 | $1.50 | $2.00 | 10,000 | N/A | N/A | N/A | English |
| Cohere | base | 2,048 | $1.50 | $2.00 | 10,000 | N/A | N/A | N/A | English |

For OpenAI, these defaults are not fixed: OpenAI automatically raises your requests-per-minute (RPM) and tokens-per-minute (TPM) limits as your account moves through its consumption tiers:

| Tier | Requirement | GPT-4 models (RPM / TPM) | GPT-3.5 models (RPM / TPM) | text-embedding-ada-002 (RPM / TPM) |
| --- | --- | --- | --- | --- |
| Free | None | 3 / 10,000 | 3 / 20,000 | 3 / 150,000 |
| Tier 1 | $5 paid | 500 / 20,000 | 3,500 / 40,000 | 500 / 1,000,000 |
| Tier 2 | $50 paid and 7+ days since first payment | 5,000 / 40,000 | 3,500 / 80,000 | 500 / 1,000,000 |
| Tier 3 | $100 paid and 7+ days since first payment | 5,000 / 80,000 | 3,500 / 160,000 | 5,000 / 5,000,000 |
| Tier 4 | $250 paid and 14+ days since first payment | 10,000 / 300,000 | 10,000 / 1,000,000 | 10,000 / 5,000,000 |
| Tier 5 | $1,000 paid and 30+ days since first payment | 10,000 / 300,000 | 10,000 / 1,000,000 | 10,000 / 10,000,000 |
*The average latency is calculated by generating up to 512 tokens at a temperature of 0.7 every 10 minutes across 3 different locations. Check out our latency tracker.

FAQ

What is “Max tokens”?

The maximum number of tokens that the model can process in a single request. This limit includes both the input (prompt) and the output (completion) tokens.
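For OpenAI and Azure models you can check this budget locally with the tiktoken tokenizer before sending a request. The sketch below is illustrative only and assumes the gpt-4 limit from the table; other providers use their own tokenizers, so these counts do not transfer to Anthropic, Cohere, or Google models.

```python
# Sketch: check whether a prompt plus the desired completion length fits a
# model's max-token window, using OpenAI's tiktoken tokenizer.
import tiktoken

MAX_TOKENS = 8192          # e.g. gpt-4, from the comparison table above
DESIRED_COMPLETION = 1024  # tokens you want to reserve for the answer

def fits_in_context(prompt: str, model: str = "gpt-4") -> bool:
    encoding = tiktoken.encoding_for_model(model)
    prompt_tokens = len(encoding.encode(prompt))
    return prompt_tokens + DESIRED_COMPLETION <= MAX_TOKENS

print(fits_in_context("Summarize the following contract: ..."))
```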

What is “Input price per 1M tokens”?

The cost of processing 1 million input (prompt) tokens using this model.

What is “Output price per 1M tokens”?

The cost of generating 1 million output (completion) tokens using this model.
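As a rough illustration of how the input and output prices combine, here is a minimal cost estimate for a single request using the OpenAI gpt-4 figures from the table; substitute the row for whichever model you actually use.

```python
# Sketch: estimate the cost of one request from the per-1M-token prices above.
INPUT_PRICE_PER_1M = 30.00    # $ per 1M input (prompt) tokens, gpt-4
OUTPUT_PRICE_PER_1M = 60.00   # $ per 1M output (completion) tokens, gpt-4

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    return (prompt_tokens * INPUT_PRICE_PER_1M
            + completion_tokens * OUTPUT_PRICE_PER_1M) / 1_000_000

# 1,500 prompt tokens + 500 completion tokens with gpt-4:
print(f"${request_cost(1500, 500):.4f}")  # $0.0750
```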

What is “Default requests per minute”?

The maximum number of requests that can be made to this API within one minute under the default rate limit. Some providers offer the option to increase this limit upon request.

What is “Default tokens per minute”?

The maximum number of tokens (input+output) that can be processed within one minute by this model API. Some providers offer the option to increase this limit upon request.
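When a request exceeds the RPM or TPM limit, providers typically reject it with an HTTP 429 error, so clients usually retry with exponential backoff. A minimal, provider-agnostic sketch; call_api is a stand-in for whichever SDK call you make:

```python
# Sketch: retry a rate-limited API call with exponential backoff and jitter.
import random
import time

def with_backoff(call_api, max_retries: int = 5):
    for attempt in range(max_retries):
        try:
            return call_api()
        except Exception as err:  # in practice, catch the SDK's RateLimitError
            if attempt == max_retries - 1:
                raise
            # wait roughly 1s, 2s, 4s, 8s, ... plus jitter before retrying
            time.sleep(2 ** attempt + random.random())
```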

What is “Avg latency in the last 48h”?

The average response time of the model API over the last 48 hours, measured by generating a maximum of 512 tokens at a temperature of 0.7 every 10 minutes in 3 locations. Each measurement is capped at 60 seconds, so real-world response times can be higher.
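A simplified version of that measurement, assuming the OpenAI Python SDK and gpt-3.5-turbo; the tracker itself runs this per model, every 10 minutes, from 3 locations:

```python
# Sketch: time a single request that may generate up to 512 tokens at
# temperature 0.7, capping the wait at 60 seconds as in the methodology above.
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

start = time.monotonic()
client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write a short story about an otter."}],
    max_tokens=512,
    temperature=0.7,
    timeout=60,
)
print(f"latency: {time.monotonic() - start:.2f}s")
```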

What is “Training data cutoff”?

The training data cutoff date is the date of the latest knowledge the model has. The model cannot know anything that happened or was published after this date.

What is “Languages available”?

The languages that the model can be prompted in and generate text in.

OpenAI

Established in 2015, OpenAI is an American research laboratory dedicated to the advancement of artificial intelligence (AI). With a strong emphasis on the development of safe and beneficial artificial general intelligence (AGI), OpenAI aims to create highly autonomous systems capable of surpassing human performance in economically valuable tasks. The organization was co-founded by Ilya Sutskever, Greg Brockman, Trevor Blackwell, Vicki Cheung, Andrej Karpathy, Durk Kingma, Jessica Livingston, John Schulman, Pamela Vagata, and Wojciech Zaremba, with Sam Altman and Elon Musk serving as the initial board members. In 2019, OpenAI received a $1 billion investment from Microsoft, followed by a $10 billion investment in 2023.

gpt-4-turbo

With 128k context, fresher knowledge and the broadest set of capabilities, GPT-4 Turbo is more powerful than GPT-4 and offered at a lower price.
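A minimal chat completion against this model with the OpenAI Python SDK (v1.x) might look like the sketch below; during the preview the model is exposed under a dated name such as gpt-4-1106-preview, so check OpenAI's model list for the current alias.

```python
# Sketch: calling the 128k-context GPT-4 Turbo preview via the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-1106-preview",  # preview alias for gpt-4-turbo; may change
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize this 200-page report: ..."},
    ],
    max_tokens=1024,
)
print(response.choices[0].message.content)
```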

gpt-4

More capable than any GPT-3.5 model, able to do more complex tasks, and optimized for chat.

gpt-4-32k

Same capabilities as the base gpt-4 model but with 4x the context length.

gpt-3.5-turbo

Most capable GPT-3.5 model and optimized for chat at 1/10th the cost of text-davinci-003.

gpt-3.5-turbo-instruct

gpt-3.5-turbo-instruct is an Instruct model and only supports a 4K context window.

text-embedding-ada-002

OpenAI’s second generation embedding model, text-embedding-ada-002 is designed to replace the previous 16 first-generation embedding models at a fraction of the cost.
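A minimal embedding request with the OpenAI Python SDK, assuming the v1.x client; the same call works against Azure with the AzureOpenAI client shown in the next section.

```python
# Sketch: requesting embeddings from text-embedding-ada-002.
from openai import OpenAI

client = OpenAI()

result = client.embeddings.create(
    model="text-embedding-ada-002",
    input=["How do the token limits compare?", "What does gpt-4 cost?"],
)
print(len(result.data[0].embedding))  # 1536-dimensional vectors
```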

Azure AI

Azure OpenAI Service gives customers advanced language AI with OpenAI GPT-4, GPT-3, Codex, and DALL-E models with the security and enterprise promise of Azure. Azure OpenAI co-develops the APIs with OpenAI, ensuring compatibility and a smooth transition from one to the other.
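Because the APIs are co-developed, the same Python SDK can target Azure; the main differences are that you authenticate against your own Azure resource and address a deployment name rather than the raw model name. A sketch with placeholder endpoint, key, API version, and deployment name:

```python
# Sketch: the same chat completion against Azure OpenAI. All identifiers below
# are placeholders for your own Azure resource and deployment.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key="YOUR-AZURE-OPENAI-KEY",
    api_version="2023-12-01-preview",
)

response = client.chat.completions.create(
    model="my-gpt-4-deployment",  # your deployment name, not "gpt-4"
    messages=[{"role": "user", "content": "Hello from Azure"}],
)
print(response.choices[0].message.content)
```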

gpt-3.5-turbo

Most capable GPT-3.5 model and optimized for chat at 1/10th the cost of text-davinci-003.

gpt-3.5-turbo-16k

Same capabilities as the standard gpt-3.5-turbo model but with 4 times the context.

gpt-4

More capable than any GPT-3.5 model, able to do more complex tasks, and optimized for chat.

gpt-4-32k

Same capabilities as the base gpt-4 model but with 4x the context length.

text-embedding-ada-002

OpenAI’s second generation embedding model, text-embedding-ada-002 is designed to replace the previous 16 first-generation embedding models at a fraction of the cost.

Anthropic

Anthropic PBC, a US-based startup and public-benefit corporation, was founded by former members of OpenAI. Specializing in the development of general AI systems and language models, Anthropic operates with a strong commitment to responsible AI usage. As of July 2023, Anthropic has successfully raised $1.5 billion in funding.

claude-2

Anthropic's most powerful model, which excels at a wide range of tasks, from sophisticated dialogue and creative content generation to detailed instruction following.

claude-instant-1

A faster, cheaper yet still very capable model, which can handle a range of tasks including casual dialogue, text analysis, summarization, and document comprehension.
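A minimal call to claude-2 with Anthropic's Python SDK as it existed for these models (a text-completion interface with HUMAN_PROMPT/AI_PROMPT turn markers); swap in claude-instant-1 for the faster, cheaper model.

```python
# Sketch: a text completion against claude-2 via the Anthropic Python SDK.
from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

completion = client.completions.create(
    model="claude-2",  # or "claude-instant-1"
    max_tokens_to_sample=512,
    prompt=f"{HUMAN_PROMPT} Summarize the differences between these two contracts: ...{AI_PROMPT}",
)
print(completion.completion)
```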

Google Vertex AI

Developed by Google AI, PaLM (Pathways Language Model) is an advanced large-scale transformer-based language model with an impressive parameter count of 540 billion. Researchers have also trained smaller versions of PaLM, including 8 and 62 billion parameter models, to investigate the impact of model scale. PaLM exhibits exceptional performance across various tasks, such as commonsense reasoning, arithmetic reasoning, joke explanation, code generation, and translation. By incorporating chain-of-thought prompting, PaLM notably outperforms other models in multi-step reasoning challenges, such as word problems and logic-based questions. Initially unveiled in April 2022, PaLM remained private until March 2023, when Google introduced an API for PaLM and other associated technologies. While initially limited to select developers through a waitlist, the API will eventually be made available to the public.

text-bison-001

Fine-tuned to follow natural language instructions; suitable for a variety of language tasks.

chat-bison-001

Fine-tuned for multi-turn conversation use cases.

textembedding-gecko-001

Returns model embeddings for text inputs.
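A sketch of calling the text and embedding models through the Vertex AI Python SDK (google-cloud-aiplatform); Vertex addresses them as text-bison@001 and textembedding-gecko@001, and the project and location values below are placeholders for your own GCP settings.

```python
# Sketch: PaLM text generation and embeddings via the Vertex AI Python SDK.
import vertexai
from vertexai.language_models import TextEmbeddingModel, TextGenerationModel

vertexai.init(project="your-gcp-project", location="us-central1")

text_model = TextGenerationModel.from_pretrained("text-bison@001")
print(text_model.predict("Explain context windows in one sentence.",
                         max_output_tokens=256, temperature=0.7).text)

embedding_model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")
vectors = embedding_model.get_embeddings(["token limits", "pricing"])
print(len(vectors[0].values))  # 768-dimensional vectors
```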

Cohere

Headquartered in Toronto, Canada, Cohere is a dynamic startup focused on providing natural language processing models that enhance human-machine interactions. Founded in 2019 by Aidan Gomez, Ivan Zhang, and Nick Frosst, Cohere offers cutting-edge solutions to empower companies in improving their communication with AI systems. In addition to their Toronto base, Cohere also maintains offices in Palo Alto and London.

command

An instruction-following conversational model that performs language tasks with high quality, more reliably, and with a longer context than Cohere's base generative models.

command-light

A smaller, faster version of command. Almost as capable, but a lot faster.
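A minimal generation request with Cohere's Python SDK (v4-style client); the API key is a placeholder.

```python
# Sketch: generating text with Cohere's command model.
import cohere

co = cohere.Client("YOUR-COHERE-API-KEY")

response = co.generate(
    model="command",          # or "command-light" for the faster variant
    prompt="Write a one-paragraph product description for a smart kettle.",
    max_tokens=300,
    temperature=0.7,
)
print(response.generations[0].text)
```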

base

A model that performs generative language tasks.

base-light

A smaller, faster version of base. Almost as capable, but a lot faster.