OpenAI has launched a preview of OpenAI o1, a model formerly known as Project Strawberry that can think through problems and break them down step by step.
According to the company:
“OpenAI o1 series models are new large language models trained with reinforcement learning to perform complex reasoning. o1 models think before they answer, and can produce a long internal chain of thought before responding to the user. o1 models excel in scientific reasoning, ranking in the 89th percentile on competitive programming questions (Codeforces), placing among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeding human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA).”
OpenAI has two reasoning models available:
- o1-preview: An early preview of their o1 model which is designed to reason about hard problems using broad general knowledge about the world.
- o1-mini: A faster and cheaper version of o1 which is especially adept at coding, math, and science tasks where extensive general knowledge isn’t required.
Sam Altman has called them OpenAI’s “most capable and aligned models yet”, but OpenAI has also clarified that although o1 models offer significant advancements in reasoning, they are not intended to replace GPT-4o in all use cases. They are specifically meant for developing applications that demand deep reasoning and can accommodate longer response times.
OpenAI has given readers a brief rundown of its o1 models’ accomplishments:
“In our tests, the next model update performs similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology. We also found that it excels in math and coding. In a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o correctly solved only 13% of problems, while the reasoning model scored 83%. Their coding abilities were evaluated in contests and reached the 89th percentile in Codeforces competitions.”
The o1 models are currently in beta, with limited features and their own fair share of constraints. They are more expensive than ChatGPT Plus, and the new programme, capped at 30 prompts a week, can only be accessed by paying users of ChatGPT Plus and Team. Moreover, they take slightly longer to answer prompts because, “like a human”, they are analyzing the problem and working out an answer. They cannot browse the web for information, nor can documents and images be uploaded to them.
Given these limitations, OpenAI will likely need more time to refine the models fully.