ChatGPT Evolves: Now with Voice and Vision Capabilities

OpenAI has announced a major update to ChatGPT, its artificial intelligence chatbot: users will now be able to hold spoken conversations with it, a significant shift from the text-only interactions it has offered until now.

Voice Support Arrives for ChatGPT

Voice support will be available through ChatGPT’s iOS and Android apps. Users of the new version can speak directly with the chatbot and choose from five different voices.

OpenAI explained that the feature is built on a new text-to-speech model that can generate human-like audio from text and just a few seconds of sample speech. The company worked with professional voice actors to create each of the voices, which makes conversations with the chatbot feel more natural and engaging.

OpenAI also said that the new version uses its Whisper speech recognition system to convert users’ spoken questions into text. A demonstration of a spoken conversation with the chatbot can be viewed in a recent Twitter post.
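
The flow described above (speech recognized into text, a text reply generated, the reply spoken in a chosen voice) can be pictured as a three-stage pipeline. The Python sketch below is only an illustration of that flow, not OpenAI’s implementation: every name in it (`run_voice_turn`, `VoiceTurn`, the `fake_*` functions and the voice label `voice-1`) is hypothetical, and the stand-in functions do no real speech processing.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    transcript: str     # text recognized from the user's speech
    reply_text: str     # text answer produced by the chat model
    reply_audio: bytes  # synthesized speech for the answer


def run_voice_turn(
    audio: bytes,
    voice: str,
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    synthesize: Callable[[str, str], bytes],
) -> VoiceTurn:
    """Run one spoken exchange: speech -> text -> reply -> speech."""
    transcript = transcribe(audio)               # speech recognition stage
    reply_text = generate_reply(transcript)      # chat model stage
    reply_audio = synthesize(reply_text, voice)  # text-to-speech stage
    return VoiceTurn(transcript, reply_text, reply_audio)


# Hypothetical stand-ins so the sketch runs without any real models.
def fake_transcribe(audio: bytes) -> str:
    # Pretends the audio bytes are already the spoken words.
    return audio.decode("utf-8")


def fake_generate_reply(prompt: str) -> str:
    # Echoes the question instead of calling a chat model.
    return f"You asked: {prompt}"


def fake_synthesize(text: str, voice: str) -> bytes:
    # Tags the text with the voice name instead of producing audio.
    return f"[{voice}] {text}".encode("utf-8")


turn = run_voice_turn(
    b"What is the weather?",
    "voice-1",
    fake_transcribe,
    fake_generate_reply,
    fake_synthesize,
)
print(turn.reply_text)
```

Passing the three stages in as functions keeps the sketch independent of any particular service; real speech recognition and text-to-speech components would replace the stand-ins.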

Voice Options Customization

Voice is not enabled by default. Users who want to talk to the chatbot must turn the feature on manually under Settings > New Features; once it is enabled, they can select their preferred voice.

Gradual Rollout and Target Audience

Voice features will roll out gradually, so not all ChatGPT mobile app users will have immediate access. OpenAI says the rollout will be phased, starting with ChatGPT Plus and Enterprise subscribers over the next two weeks.

OpenAI is limiting the technology to in-app voice conversations to reduce the risk of misuse. While the company highlights the creative and accessibility benefits of the new voice technology, it also acknowledges the risks, including the potential for malicious actors to impersonate public figures or commit fraud.

ChatGPT: Seeing, Listening, and Speaking

In addition to voice capabilities, ChatGPT is gaining the ability to “see.” Users of the iOS and Android apps will now be able to interact with the chatbot using images. This means users can take a picture and request AI assistance with specific tasks.

Users can also “draw” on part of an image to direct the AI’s focus. OpenAI explains that the feature is powered by multimodal versions of GPT-3.5 and GPT-4.

OpenAI says the vision capability is meant to help users in their daily lives, since the AI is most useful when it can see what the user sees. To protect privacy, however, OpenAI has put limits in place: for instance, the feature will not work with photos of people.

With voice and vision capabilities, ChatGPT is set to offer more natural and versatile interactions, while OpenAI’s limits on both features are aimed at protecting user safety and privacy.
