December 23, 2024
Digital Media News

ChatGPT can now ‘see, hear and speak’, OpenAI announces

You can speak to ChatGPT and hear it talk back

OpenAI, the startup behind the wildly successful chatbot ChatGPT, has announced a significant update to its AI models that changes how users can interact with the service.

Users will now be able to speak to the AI chatbot and share images with it, and it will be able to respond in kind thanks to its new voice and image capabilities.

In a blog post, OpenAI said: “Voice and image give you more ways to use ChatGPT in your life. Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it. When you’re home, snap pictures of your fridge and pantry to figure out what’s for dinner (and ask follow up questions for a step by step recipe). After dinner, help your child with a math problem by taking a photo, circling the problem set, and having it share hints with both of you.”

What are the new capabilities?

ChatGPT’s new ‘Voice’ capability can be used to “engage in a back-and-forth conversation with your assistant. Speak with it on the go, request a bedtime story for your family, or settle a dinner table debate.” It is powered by a new text-to-speech model capable of generating human-like audio from text and just a few seconds of sample speech. OpenAI has also made use of Whisper, its open-source speech recognition system, to transcribe spoken words into text.
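For readers curious about the transcription side, here is a minimal sketch of speech-to-text using OpenAI’s open-source Whisper package; the model size (“base”) and the audio file name are placeholders, and this is illustrative rather than how the ChatGPT app itself runs Whisper.

```python
# Minimal sketch: transcribing speech to text with OpenAI's open-source Whisper.
# Assumes `pip install openai-whisper` and a local audio file named voice_note.mp3.
import whisper

model = whisper.load_model("base")           # small, general-purpose model
result = model.transcribe("voice_note.mp3")  # returns a dict with the full text and timed segments
print(result["text"])
```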

In its blog post, OpenAI includes a promotional video showing a hypothetical exchange with ChatGPT in which a user asks how to raise a bicycle seat, providing the chatbot with the instruction manual and an image of the user’s toolbox. ChatGPT responds and advises the user on how to complete the process.

OpenAI has also announced a new Image capability that lets ChatGPT respond to prompts that include an image. For instance, you can take a picture of the contents of your fridge and ask ChatGPT to help you come up with a meal plan using those specific ingredients.

The image capability is powered by multimodal versions of GPT-3.5 and GPT-4. These models can apply their language reasoning skills to a wide range of images, such as screenshots, photographs, and documents containing both text and images.
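As a rough illustration of what an image prompt looks like programmatically, here is a sketch using the OpenAI Python SDK. The model name (“gpt-4-vision-preview”) and the image URL are assumptions for the example; available model names and access depend on your account.

```python
# Sketch: asking a vision-capable GPT-4 model about a photo via the OpenAI API.
# Assumes the `openai` Python SDK (v1+), an API key in OPENAI_API_KEY, and that a
# vision-capable model such as "gpt-4-vision-preview" is available to your account.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What could I cook with the ingredients in this photo?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/fridge.jpg"}},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)
```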

When are they rolling out?

OpenAI will be rolling out the new voice and image features in ChatGPT to Plus and Enterprise users over the next two weeks. Voice is coming to iOS and Android, and images will be available on all platforms.

How do you use them?

To get started with voice, users must head to Settings → New Features on the mobile app and opt into voice conversations. After that, they can tap the headphone button in the top-right corner of the home screen and choose their preferred voice from five options.

To get started with images, users must tap the photo button to choose or capture an image; on iOS or Android, they should tap the plus button first. They can also discuss multiple images. To focus on a specific part of an image, they can use the drawing tool in the mobile app.

OpenAI has additionally reaffirmed its commitment to providing users with “safe and beneficial” upgrades, stating: “We believe in making our tools available gradually, which allows us to make improvements and refine risk mitigations over time while also preparing everyone for more powerful systems in the future. This strategy becomes even more important with advanced models involving voice and vision.”
