OpenAI has introduced a new model, GPT-4o, which can reason across audio, vision, and text in real time.
Takeaway points
- OpenAI has introduced a new model, GPT-4o, which can reason across audio, vision, and text in real time.
- The announcement came a few days after the company said in a post on X that it would go live on Monday to announce ChatGPT and GPT-4 updates.
- OpenAI demonstrated GPT-4o’s language tokenization across 20 languages, chosen as representative of the new tokenizer’s compression across different language families.
- GPT-4o’s text and image capabilities are starting to roll out today in ChatGPT; the model is available in the free tier and to Plus users with up to 5x higher message limits.
Why did OpenAI introduce the new model GPT-4o?
The ChatGPT maker, OpenAI, announced on Monday the introduction of a new model, GPT-4o, which can reason across audio, vision, and text in real time.
The announcement came a few days after the company said in a post on X that it would go live on Monday to show ChatGPT and GPT-4 updates.
OpenAI said that “GPT-4o (‘o’ for ‘omni’) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models.”
What is the language tokenization and availability of GPT-4o?
According to the announcement, OpenAI demonstrated the language tokenization improvements in 20 languages, chosen as representative of the new tokenizer’s compression across different language families.
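As a rough illustration of what the tokenizer change means in practice, the sketch below compares token counts between the GPT-4 Turbo encoding (`cl100k_base`) and the new GPT-4o encoding (`o200k_base`) using OpenAI’s `tiktoken` library; the sample sentences are illustrative and not taken from the announcement.

```python
# Sketch: compare the GPT-4 Turbo tokenizer with the new GPT-4o tokenizer.
# Assumes the `tiktoken` package (0.7.0 or later) is installed.
import tiktoken

old_enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4 / GPT-4 Turbo
new_enc = tiktoken.get_encoding("o200k_base")   # encoding used by GPT-4o

# Illustrative sentences from different language families.
samples = {
    "English": "Hello, how are you today?",
    "Hindi": "नमस्ते, आज आप कैसे हैं?",
    "Japanese": "こんにちは、今日はお元気ですか？",
}

for language, text in samples.items():
    print(f"{language}: cl100k_base={len(old_enc.encode(text))} tokens, "
          f"o200k_base={len(new_enc.encode(text))} tokens")
```

Running this prints how many tokens each encoding needs for the same sentence, which is where the compression gains for non-English text show up most clearly.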
OpenAI said that starting on Monday, the day of the announcement, it is publicly releasing text and image inputs and text outputs. In the upcoming weeks and months, it will work on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities.
GPT-4o’s text and image capabilities are starting to roll out today in ChatGPT; the model is available in the free tier and to Plus users with up to 5x higher message limits. A new version of Voice Mode with GPT-4o will roll out in alpha within ChatGPT Plus in the coming weeks, according to the report.
Other features
The ChatGPT maker said that GPT-4o has gone through extensive external red teaming with more than 70 external experts in domains such as social psychology, bias and fairness, and misinformation to identify risks that are introduced or amplified by the newly added modalities. These learnings were used to build out the company’s safety interventions and improve the safety of interacting with GPT-4o, and OpenAI said it will continue to mitigate new risks as they are discovered.
OpenAI said that GPT-4o is 2x faster, half the price, and has 5x higher rate limits compared to GPT-4 Turbo, and that developers can now access GPT-4o in the API as a text and vision model. The company also plans to launch support for GPT-4o’s new audio and video capabilities to a small group of trusted partners in the API in the coming weeks.
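For developers, a minimal sketch of what a GPT-4o text-and-vision request might look like through the Chat Completions API is shown below; it assumes the official `openai` Python package is installed, an `OPENAI_API_KEY` environment variable is set, and uses a placeholder image URL.

```python
# Sketch: call GPT-4o as a text and vision model via the Chat Completions API.
# Assumes the official `openai` Python package and an OPENAI_API_KEY env var.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                # Placeholder image URL for illustration only.
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)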