OpenAI ChatGPT, Anthropic Claude, Google Gemini models guide

What is a model?

A model is a prediction engine, usually specific to a certain kind of problem. You will find models for guessing the weather, the stock market, championships, what is contained in a picture etc. What they have in common is that you provide them with an input such as today’s weather, and you get a prediction as an output, such as tomorrow’s weather, usually with some kind of confidence score.

Predictions made by models can be more or less accurate and reliable. Until recently, there was no good model to predict “what comes next” after a “text input”.

OpenAI invented a new kind of models, called generative pre-trained transformers (GPT) or large language models (LLMs) that have changed the game: they are capable to “continue” input text in most situations, on par or even better, and certainly faster than the average human could do.

Since “text input” is such a broad scope, when trained correctly, these GPT models cover a wide array of tasks such as Q&A, following instructions, or even writing code.

So a GPT model is a prediction engine for text.

Update on Sep 1st, 2024: GPT models are now multi-modal which means they can also take images, and sometimes audio or video as input, and output images.

What is the difference between a model and an AI?

There is no difference. A model is a technical word to say an “AI”. So basically, choosing a model is equivalent to choosing an AI. Unlike the human brain, models or AIs tend to be highly specialized for a specific set of tasks or inputs. Depending on your task at hand (whether you’re working with images, audio, video or text), you will want to choose a different AI or model.

We’ve had AI for decades now, what’s the big deal with generative AI?

Generative AI models (LLMs) such as OpenAI GPT, Anthropic Claude, Google Gemini, Meta Llama bring game-changing features that previous generations of models didn’t have:

they can follow instructions. Try specifying tone or style in Google Translate or DeepL. You can’t.
One model can do everything (translation, summarization, rephrasing and editing, content generation, categorization, extraction, reformatting). You don’t need one model per type of task.
They are highly multi-lingual (Try translating to Finnish in Google Translate and compare with ChatGPT translation)
They can process markup language (html, markdown, xml)

What parameters should be considered when you choose a model?

You will generally want to consider the following parameters:

Quality: how good the model is at the task you want it to complete, which can vary greatly depending on your specific context. See livebench.ai for a good up-to-date benchmark
Speed: how fast you get an output from the model
Cost: how much each task you give it costs
Reliability: how often the model refuses to answer because it thinks the task is harmful even though it isn’t.
Multi-linguality: some models are much better than others at reading and writing languages that are not English
Reading capacity or context length: a model has a limited “reading” capacity. GPT-4o can read up to 128K tokens which roughly equals 100 pages of English text. Claude 3.5 Sonnet can read up to 200K tokens which is around 200 pages of English text. And Gemini 1.5 Flash can read up to 1M tokens which is around 1000 pages of English text! All of that in under a minute!
Writing capacity: the writing capacity is usually much lower. The most recent models generally have between 4K and 16K tokens of writing capacity.
Training data cutoff: GPT models (or more broadly transformer models) are pre-trained, as the acronym PT indicates. This means that it will learn a certain amount of world knowledge that will stop being updated once training is finished.

💡

Please see our models comparator for more detail or Artificial Analysis for an extensive, in-depth comparison tool

What’s the difference between GPT-3.5 and GPT-4 models?

GPT-3 models are “instruct” models that are meant to generate text with a clear instruction. They are not optimized for conversational chat. The best original GPT-3 model was text-davinci-003 but it has been deprecated in January 2024.

GPT-3.5 models (ChatGPT) were first released on March 1st, 2023. They are built on top of GPT-3 models and optimized for conversational chat. GPT-3.5 results can be too “chatty” or “creative” in some cases and will require a bit more prompt engineering to get crisp results.

GPT-4 models are the latest breed of OpenAI models, released on March 14th, 2023, with the latest one GPT-4-Turbo, released on November 11th, 2023.

GPT-4 models are multimodal: they can take both text and image inputs.
GPT-4 models can solve much more complex problems thanks to advance reasoning capabilities, and are typically much better at maths than previous models.
GPT-4 models can use twice to 32 times more tokens in their context than GPT-3.5 models.
GPT-4 models are however significantly more expensive than GPT-3.5.

Compare exact model specifications on our model comparator.

Which model to choose in GPT for Work?

Note: this video is outdated. A new video will be made soon. We are however not removing it because it can serve as a template for reproducing this kind of experiment.

The answer is gpt-4o in the vast majority of cases. It is currently the best compromise between speed, cost, quality, reliability, multi-linguality.

For that reason it is the default model in GPT for Sheets. You should always start experimenting with this one at first.

What is a fine-tuned model?

A fine-tuned model is a base model that was trained (fine-tuned) for a specific task by providing it some examples of inputs and expected outputs. You usually need between a few hundreds and a few thousands of examples to fine-tune a model.

You can learn how to fine-tune a model here.

When should I use a fine-tuned model?

Fine-tuned model can typically do only one thing, so you should use a fine-tuned model only and only if you need a specific task to be performed in very high volumes.

If you are in such as situation, then using a fine-tuned model will reduce costs, increase speed and rate limits.

A typical use-case is if you want the format of the output to follow very strict guidelines that are best explained by examples.