As Amazon invests up to $4 billion in OpenAI rival Anthropic, taking a minority stake in the AI company, OpenAI has announced some big updates to ChatGPT, its flagship service. Arguably the hottest text-based AI company at the moment, OpenAI is now letting users interact with its AI-powered chatbot through voice commands and image prompts. The update is set to roll out to ChatGPT subscribers over the next two weeks, with access expanding to more users soon after.
As voice interaction with AI-powered chatbots becomes more common, OpenAI is hopping on the bandwagon rather than being left out, even though it remains unclear how many users actually prefer speaking a prompt over typing one. Users simply tap a button and speak their query; ChatGPT processes the voice input and returns a spoken response. According to OpenAI, the feature, much like its counterpart in Google-owned Bard, promises more accurate responses thanks to improved technology.
OpenAI uses its Whisper model for speech-to-text conversion. The company is also introducing a new text-to-speech model that can generate human-like audio from just text and a few seconds of sample speech. Users can choose from five voice options. Beyond ChatGPT, OpenAI's collaboration with Spotify on podcast translation shows the broader potential of synthetic voices.
ChatGPT can now see, hear, and speak. Rolling out over next two weeks, Plus users will be able to have voice conversations with ChatGPT (iOS & Android) and to include images in conversations (all platforms). https://t.co/uNZjgbR5Bm pic.twitter.com/paG0hMshXb
— OpenAI (@OpenAI) September 25, 2023
However, OpenAI acknowledges the new risks that come with these capabilities, such as the impersonation of public figures and fraud. As a countermeasure, use of the voice model will be tightly controlled and limited to specific use cases and partnerships.
The image feature works much like Google Lens. Users snap a picture of their subject, and ChatGPT analyzes the image to provide relevant responses. You can narrow down your query with the app's drawing tool or by speaking or typing a follow-up question. This supports an iterative, back-and-forth interaction that reduces the need for repeated searches, in line with Google's multimodal search concept.
To ensure responsible use, OpenAI has deliberately constrained ChatGPT’s ability to analyze and provide direct statements about individuals, both to safeguard privacy and maintain accuracy.
With this update, OpenAI wants to maintain its dominance of the text-based AI field without being left out of the voice prompt race. Image and voice interactions can quickly become complex to manage, even for a platform as big as ChatGPT, and that complexity will only grow along with the user base. Good thing, then, that this feature is limited to Plus and Enterprise users for now. I don't see OpenAI expanding it to the free tier anytime soon, but then again, let's see what the competition says a few months from now.