Oct. 2, 2024
Yesterday, OpenAI held its annual DevDay. There were four major announcements, all to do with developers and the OpenAI API:
- Realtime API – Build fast speech-to-speech experiences
- Vision – Fine-tune GPT-4o with images
- Prompt Caching – Easier to re-use prompts
- Distillation – Use the output of a large model to train a smaller, cheaper, more efficient model
The Realtime API is the most exciting. It allows for true real-time voice interaction with the LLM. You establish a WebSocket connection with the LLM, over which you can send both text (so that you can give it a starting prompt) and audio. According to the documentation, “The models have a natural inflection and can laugh, whisper, and adhere to tone direction.” Currently the documentation does not specify details of the voice itself, including gender or regional accent.
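To give a flavor of it, here is a minimal sketch of a session in Python using the third-party `websockets` package. The endpoint, headers, and event names follow the launch docs as I understand them, so treat them as assumptions and check the current documentation before relying on them:

```python
import asyncio
import json
import os

import websockets

# Model name and beta header as given at launch; both are assumptions to verify.
URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main():
    # On newer `websockets` releases this kwarg is `additional_headers`.
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Ask for a spoken (and transcribed) response to a text instruction.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["audio", "text"],
                "instructions": "Greet the user warmly, in a whisper.",
            },
        }))
        # The server streams events back; audio arrives as base64 chunks
        # in `response.audio.delta` events.
        async for message in ws:
            event = json.loads(message)
            print(event["type"])
            if event["type"] == "response.done":
                break

asyncio.run(main())
```

A real voice app would also stream microphone audio up and play the audio deltas back as they arrive; this just shows the connection and event handshake.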
The vision fine-tuning is interesting. You can send the model up to 50,000 images to train on, and it will create a model specific to that training data. You can then compare the untrained model against your newly trained model to see which supplies better answers. You can even iteratively train your trained model further.
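Here is roughly what kicking off a vision fine-tune looks like with the official `openai` Python package. The JSONL layout mirrors the chat format with `image_url` content parts; the file names and example content below are made up for illustration:

```python
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Each training example is one JSON line: a normal chat exchange whose
# user message mixes text and an image URL. (Hypothetical example data.)
example = {
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "What road sign is this?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/sign-042.jpg"}},
        ]},
        {"role": "assistant", "content": "A yield sign."},
    ]
}
with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")  # ...plus the rest of your examples

training_file = client.files.create(file=open("train.jsonl", "rb"),
                                    purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=training_file.id,
                                     model="gpt-4o-2024-08-06")  # base model at launch
print(job.id)
```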
The prompt caching and distillation are solid improvements to the service and should provide faster responses and lower costs.
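Prompt caching kicks in automatically for long prompts, so the main thing you control is ordering: keep the static part of the prompt (system instructions, examples) first and the varying part last, so the prefix can be reused across calls. A sketch, with a stand-in system prompt:

```python
from openai import OpenAI

client = OpenAI()
# Long, unchanging prefix placed first so it is eligible for caching.
STATIC_SYSTEM_PROMPT = "You are a support agent for ... " * 200  # stand-in text

for question in ["How do I reset my password?", "Where is my invoice?"]:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": STATIC_SYSTEM_PROMPT},  # cacheable prefix
            {"role": "user", "content": question},                # varying suffix
        ],
    )
    # After the first call, later calls should report reused prefix tokens here.
    print(resp.usage.prompt_tokens_details.cached_tokens)
```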
#AI #chatgpt #LLM #openai