OpenAI’s latest upgrade essentially lets users livestream with ChatGPT

A major ChatGPT upgrade, dubbed GPT-4o (the “o” stands for “omni”), lets the chatbot interpret video and audio in real time and speak more like a human.

ChatGPT creator OpenAI has announced its latest AI model, GPT-4o, a chattier, more humanlike AI chatbot, which can interpret a user’s audio and video and respond in real time.

A series of demos released by the firm shows GPT-4o helping users with tasks such as interview preparation, by checking that they look presentable, and calling a customer service agent to get a replacement iPhone.

Other demos show it can share dad jokes, translate a bilingual conversation in real time, be the judge of a rock-paper-scissors match between two users, and respond with sarcasm when asked. One demo even shows how ChatGPT reacts to being introduced to the user’s puppy for the first time.

“Well hello, Bowser! Aren’t you just the most adorable little thing?” the chatbot exclaimed.

Say hello to GPT-4o, our new flagship model which can reason across audio, vision, and text in real time: https://t.co/MYHZB79UqN

Text and image input rolling out today in API and ChatGPT with voice and video in the coming weeks. pic.twitter.com/uuthKZyzYx

— OpenAI (@OpenAI) May 13, 2024

“It feels like AI from the movies; and it’s still a bit surprising to me that it’s real,” said the firm’s CEO, Sam Altman, in a May 13 blog post.

“Getting to human-level response times and expressiveness turns out to be a big change.”

A version limited to text and image input launched on May 13, with voice and video set to roll out in the coming weeks, OpenAI said in an X post.

GPT-4o will be available to both paid and free ChatGPT users and will be accessible through OpenAI’s API.

OpenAI said the “o” in GPT-4o stands for “omni,” and that the model marks a step toward more natural human-computer interaction.

Introducing GPT-4o, our new model which can reason across text, audio, and video in real time.

It’s extremely versatile, fun to play with, and is a step towards a much more natural form of human-computer interaction (and even human-computer-computer interaction): pic.twitter.com/VLG7TJ1JQx

— Greg Brockman (@gdb) May 13, 2024

GPT-4o’s ability to accept any combination of text, audio and image input is a considerable advance over OpenAI’s earlier Voice Mode, which chained separate transcription, text and speech models together so that GPT-4 “loses a lot of information,” according to OpenAI.


OpenAI said “GPT-4o is especially better at vision and audio understanding compared to existing models,” an improvement that extends to picking up on a user’s emotions and breathing patterns.

It is also “much faster” and “50% cheaper” than GPT-4 Turbo in OpenAI’s API.

The new model can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which OpenAI says is similar to human response times in an ordinary conversation.
