GPT-4o: OpenAI Launches Its Latest AI Model Powering ChatGPT
Keneci Network @kenecifeed
OpenAI on Monday introduced GPT-4o (the “o” stands for “omni”), a new generative AI model that can handle text, speech, and video. It accepts any combination of text, audio, and image as input and generates any combination of text, audio, and image as output, the company said in a blog post announcing the new model.
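For developers, the model is exposed through OpenAI’s API under the name gpt-4o. As a minimal sketch of a plain text call (assuming the official openai Python SDK is installed and an OPENAI_API_KEY is set in the environment; the prompt is purely illustrative):

```python
# Minimal sketch: a text-only request to GPT-4o via OpenAI's Python SDK.
# Assumes `pip install openai` and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Explain what makes GPT-4o different from GPT-4 Turbo."}
    ],
)
print(response.choices[0].message.content)
```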
According to the company, GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. The new model is also more multilingual, the company claims, with enhanced performance in around 50 languages.
GPT-4 Turbo, the company's previous “most advanced” model, was trained on a combination of images and text and could analyze both to accomplish tasks like extracting text from images or describing their content. GPT-4o adds speech to those capabilities.
OpenAI CTO Mira Murati said that GPT-4o provides “GPT-4-level” intelligence but improves on GPT-4’s capabilities across multiple modalities and media. “GPT-4o reasons across voice, text and vision,” she said during a streamed presentation at the company's offices in San Francisco on Monday. “And this is incredibly important, because we’re looking at the future of interaction between ourselves and machines.”
GPT-4o supposedly improves the experience in OpenAI’s AI-powered chatbot, ChatGPT. The platform has long offered a voice mode that reads the chatbot’s responses aloud using a text-to-speech model, but GPT-4o supercharges this, allowing users to interact with ChatGPT more like an assistant.
The model delivers “real-time” responsiveness, OpenAI says, and can even pick up on nuances in a user’s voice, responding in “a range of different emotive styles” (including singing).
GPT-4o also improves ChatGPT’s vision capabilities. Given a photo -- or a desktop screenshot -- ChatGPT can now quickly answer related questions, on topics ranging from “What’s going on in this software code?” to “What brand of shirt is this person wearing?”
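The same vision flow is reachable through the API’s image inputs. A hedged sketch with the same openai SDK (the image URL below is a hypothetical placeholder, not a real asset):

```python
# Sketch: asking GPT-4o a question about an image via the Chat Completions API.
# The image URL is a hypothetical placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What brand of shirt is this person wearing?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```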
“We know that these models are getting more and more complex, but we want the experience of interaction to actually become more natural, easy, and for you not to focus on the UI at all, but just focus on the collaboration with ChatGPT,” Murati said. “For the past couple of years, we’ve been very focused on improving the intelligence of these models … But this is the first time that we are really making a huge step forward when it comes to the ease of use.”
OpenAI says GPT-4o is available in the free tier of ChatGPT starting today, and to subscribers of its premium ChatGPT Plus and Team plans with “5x higher” message limits. (OpenAI notes that ChatGPT will automatically switch to GPT-3.5, an older and less capable model, when users hit the rate limit.) The improved ChatGPT voice experience underpinned by GPT-4o will arrive in alpha for Plus users in the next month or so, alongside enterprise-focused options.
The GPT Store, OpenAI’s library of third-party chatbots built on its AI models, along with the tools for creating them, is now available to users of ChatGPT’s free tier. Free users can also take advantage of ChatGPT features that were formerly paywalled.
Watch: OpenAI's presentation of GPT-4o