• OpenAI Dev Day Podcast

  • Oct 3 2024
  • Length: 14 mins
  • Podcast

  • Summary

  • OpenAI has recently launched a number of new features for its API. The Realtime API enables developers to build speech-to-speech experiences within their applications. Vision fine-tuning lets developers fine-tune GPT-4o with images as well as text to improve its visual understanding. Model Distillation lets developers create cost-effective models by using the outputs of more powerful models like GPT-4o to train smaller ones. Prompt Caching reduces costs and latency by automatically caching input tokens, cutting the computation needed for frequently repeated inputs.
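
    Model distillation builds on stored completions: you capture outputs from a larger model and later use them as training examples for a smaller one. Below is a minimal Python sketch, assuming the OpenAI Python SDK with the store and metadata parameters announced alongside distillation; the prompt and metadata tag are illustrative only.

    ```python
    # Sketch: capture GPT-4o outputs so they can seed a distillation dataset.
    # Assumes the OpenAI Python SDK supports the store/metadata parameters;
    # the prompt and metadata tag are illustrative only.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4o",
        store=True,                          # keep this completion for later reuse
        metadata={"task": "support-faq"},    # tag it so it is easy to filter later
        messages=[
            {"role": "system", "content": "Answer customer questions concisely."},
            {"role": "user", "content": "How do I reset my password?"},
        ],
    )
    print(response.choices[0].message.content)
    ```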

    OpenAI's new Realtime API:

    • Low-latency, multimodal experiences: The Realtime API enables developers to build applications with fast speech-to-speech conversations, similar to ChatGPT’s Advanced Voice Mode.
    • Natural conversational experiences with a single API call: Developers no longer need to use multiple models for speech recognition, text processing, and text-to-speech. The Realtime API handles the entire process with one call.
    • Streaming audio inputs and outputs: This allows for more natural conversations than earlier approaches, which introduced noticeable latency and lost emotion and emphasis.
    • Automatic interruption handling: The Realtime API, much like Advanced Voice Mode in ChatGPT, can manage interruptions smoothly.
    • Persistent WebSocket connection to exchange messages with GPT-4o: This underlies the Realtime API's functionality (see the sketch after this list).
    • Function calling: Voice assistants built with the Realtime API can respond to user requests by triggering actions or accessing new information.
    • Six preset voices: The Realtime API uses the same six preset voices already available in the API.
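
    To make the WebSocket point concrete, here is a minimal Python sketch of opening a Realtime API session and requesting a response. The endpoint, headers, and event names follow OpenAI's published Realtime API conventions, but the exact model string and payloads should be treated as assumptions.

    ```python
    # Sketch: open a Realtime API session over a persistent WebSocket.
    # Requires the third-party "websockets" package (pip install websockets).
    import asyncio
    import json
    import os

    import websockets


    async def main():
        url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
        headers = {
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "OpenAI-Beta": "realtime=v1",
        }
        # websockets >= 14 uses additional_headers; older releases call it extra_headers.
        async with websockets.connect(url, additional_headers=headers) as ws:
            # Ask the model for a response; audio output would arrive as streamed
            # "response.audio.delta" events on this same socket.
            await ws.send(json.dumps({
                "type": "response.create",
                "response": {
                    "modalities": ["text"],
                    "instructions": "Say hello to the listener.",
                },
            }))
            async for message in ws:
                event = json.loads(message)
                print(event["type"])
                if event["type"] == "response.done":
                    break


    asyncio.run(main())
    ```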

    The episode also covers new features and capabilities in the Chat Completions API:

    • Audio input and output in the Chat Completions API: This will allow developers to build applications that use audio without needing the low latency of the Realtime API.
    • Text or audio in, text or audio out: Developers can send text or audio and choose whether GPT-4o responds with text, audio, or both (see the sketch below).
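
    A short sketch of the audio option in the Chat Completions API, assuming the OpenAI Python SDK and the GPT-4o audio preview model described in the episode; the model string, voice, and format here are assumptions for illustration.

    ```python
    # Sketch: request both text and audio from the Chat Completions API.
    # The model string, voice, and output format are illustrative assumptions.
    import base64

    from openai import OpenAI

    client = OpenAI()

    completion = client.chat.completions.create(
        model="gpt-4o-audio-preview",
        modalities=["text", "audio"],               # ask for a transcript and audio
        audio={"voice": "alloy", "format": "wav"},  # choose one of the preset voices
        messages=[
            {"role": "user", "content": "Give me a one-sentence weather report."},
        ],
    )

    # The audio arrives base64-encoded on the message; decode and save it.
    wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
    with open("reply.wav", "wb") as f:
        f.write(wav_bytes)
    ```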

    Join our community: getcoai.com
    Follow us on Twitter or watch us on YouTube
    Get our newsletter!
