OpenAI has released a hyperrealistic voice feature for ChatGPT called Advanced Voice Mode. Powered by the GPT-4o model, it delivers hyperrealistic audio to premium users, meaning ChatGPT can now talk and listen in a far more lifelike way, making the app more engaging. Officially released on July 30, 2024, it enables more natural, real-time conversations with the AI: it recognizes and responds to emotional cues in your voice, and you can even interrupt it mid-sentence.
However, not every user has access to the feature yet. A select group of ChatGPT Plus subscribers has already received early access to the voice mode, the alpha will reach all Plus subscribers by fall 2024, and a wider release will follow.
When OpenAI first showed off GPT-4o’s voice in May, it impressed audiences with its speed and how human-like it sounded. In fact, it bore a striking resemblance to the voice of Scarlett Johansson’s character in the movie Her. Amid the conversation about that likeness, Johansson took legal action to protect her image. OpenAI denied using her voice, but it still removed the voice and postponed the release to add more safety measures.
Features and Differences: Newest ChatGPT Improvements
Advanced Voice Mode differs from ChatGPT’s old voice feature and incorporates the newest model improvements. The old system chained three separate models: one for voice-to-text, one to process the prompt, and one for text-to-voice. GPT-4o combines all of these into a single multimodal model, which reduces latency and gives you smoother, more responsive interactions. Reinforcement learning techniques have been used to refine the model and improve the quality and safety of its outputs. GPT-4o can also detect emotional tones in your voice, like sadness, excitement, or even singing.
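For readers who want a concrete picture of the difference, here is a minimal sketch that approximates the old three-step pipeline using OpenAI’s public API. The model names used here (whisper-1, gpt-4o, tts-1) are API-side stand-ins chosen for illustration; Advanced Voice Mode itself lives inside the ChatGPT app and is not exposed this way. The point is simply that each hop adds its own round trip and discards vocal nuance, which is what folding everything into one multimodal model is meant to fix.

```python
# Illustrative sketch of the older three-model voice pipeline, approximated
# with OpenAI's public API. This is NOT how Advanced Voice Mode works
# internally; it only shows why a chained pipeline adds latency.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def old_style_voice_turn(audio_path: str) -> str:
    # Step 1: voice-to-text (speech recognition model)
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

    # Step 2: process the prompt as plain text (the language model never
    # hears tone, emotion, or interruptions -- only the transcript)
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": transcript.text}],
    )

    # Step 3: text-to-voice (separate speech synthesis model)
    speech = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=reply.choices[0].message.content,
    )
    speech.stream_to_file("reply.mp3")
    return "reply.mp3"  # path to the spoken reply
```

Each of the three calls is a separate network round trip, and the middle step works only on text, so paralinguistic cues are lost along the way. A single multimodal model that takes audio in and produces audio out avoids both problems.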
OpenAI will monitor the rollout to gather feedback and ensure the feature is used responsibly. Alpha group users will receive a notification in the ChatGPT app and an email with instructions on how to use the new voice mode.
Since the demo, OpenAI has tested GPT-4o’s voice with more than 100 external red teamers speaking 45 different languages, and it says it will publish a report on these safety efforts in early August.
For now, Advanced Voice Mode will have four preset voices—Juniper, Breeze, Cove, and Ember—created with professional voice actors. The voice from the May demo, Sky, will not be available as OpenAI has added safeguards to prevent the impersonation of real or public figures. Lindsay McCallum, an OpenAI spokesperson, said, “ChatGPT cannot mimic other people’s voices and will block outputs that deviate from the preset voices.”
To prevent abuse of the voice technology, OpenAI is blocking certain requests, such as generating music or other copyrighted audio. This is a proactive measure to mitigate legal risk, since hyperrealistic audio models like GPT-4o are likely to draw complaints, especially from litigious parties such as record labels.
OpenAI is rolling out ChatGPT’s Advanced Voice Mode gradually, prioritizing user safety and legal compliance. As it reaches more users, it promises a more human-like way to interact with AI, whether you are brainstorming marketing copy or talking through a business plan.