How Google Gemini 1.5 Pro learned to hear and Vertex AI created a bot designer
Google announced significant updates to its language and generative models: Gemini 1.5 Pro can now process audio and video without a transcript, and Imagen 2 gains image editing tools and invisible watermarking.
Google AI
Gemini 1.5 Pro
Gemini 1.5 Pro, Google's largest language model, has been updated and can now recognize speech in audio and video without a text transcript having to be uploaded first. Users can therefore give the model audio directly and receive responses based on it. Gemini 1.5 Pro was presented in February and outperforms the company's other models. Its main advantage is a very large context window, from 128,000 up to one million tokens, which is many times more than competitors such as OpenAI's GPT-4 offer.
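To give a feel for what a 128,000 to one million token window means for audio input, here is a minimal back-of-the-envelope sketch. The tokens-per-second rate is an assumption made for illustration, not a figure from the announcement, and the function name and parameters are hypothetical; swap in whatever rate applies to the model you actually use.

```python
# Rough estimate of how much audio fits into a model's context window.
# ASSUMPTION: audio is billed at a fixed number of tokens per second.
# The default below is an illustrative placeholder, not a documented value.
ASSUMED_AUDIO_TOKENS_PER_SECOND = 32


def max_audio_minutes(
    context_tokens: int,
    tokens_per_second: int = ASSUMED_AUDIO_TOKENS_PER_SECOND,
    reserved_tokens: int = 0,
) -> float:
    """Return the minutes of audio that fit into `context_tokens`.

    `reserved_tokens` is the part of the window kept free for the text
    prompt and the model's answer (hypothetical parameter).
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    usable_tokens = max(context_tokens - reserved_tokens, 0)
    return usable_tokens / tokens_per_second / 60


if __name__ == "__main__":
    # The two window sizes mentioned in the article.
    for window in (128_000, 1_000_000):
        minutes = max_audio_minutes(window)
        print(f"{window:>9,} tokens -> about {minutes:.0f} minutes of audio")
```

Under this assumed rate, the difference between the two window sizes is the difference between roughly an hour of audio and several hours, which is why the context size matters for transcript-free audio and video input.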
Imagen 2
Google has also improved Imagen, its generative model for creating images from text prompts. The new version, Imagen 2, has received "inpainting" and "outpainting" functions, which let you add or remove elements in generated images and extend an image beyond its original borders. In addition, all images generated by the model can now be marked with an invisible SynthID watermark indicating their artificial origin.
Vertex AI
The updated models will be available on Vertex AI, Google's cloud platform for business clients. With it, companies will be able to build their own chatbots and integrate them into their products and services.
Glossary
- Google - a major technology company, developer of a search engine and various services
- Gemini - Google's line of language models for natural language processing
- Imagen - Google's generative model for creating images from text descriptions
- Vertex AI - cloud platform for creating and deploying AI models
Discussion of the topic – How Google Gemini 1.5 Pro learned to hear and Vertex AI created a bot designer
At the Google Next conference, the company announced that Gemini 1.5 Pro now supports speech recognition from audio, video and phone calls without a transcript. The Vertex AI platform for creating bots was also presented.
Latest comments
14 comments
Михаил
Gemini 1.5 Pro is another breakthrough in the field of natural language processing. The ability to understand audio without the need for transcription opens new horizons for voice assistants and chatbots. 🎉
Катя
Yes, that's impressive! But I'm more interested in the new inpainting feature in Imagen 2. Imagine, now you can easily remove or add elements to images. This will be useful for creative projects and photo editing. 🖼️
Ян
Great news for developers! With the amount of context that Gemini 1.5 Pro can handle, creating more complex and advanced applications will become much easier. I can't wait to try it out in practice. 💻
Анна
SynthID watermark is a good idea for generative AI images. This will help distinguish them from real photographs and protect copyright. But I hope that it will not be too noticeable and will not spoil the overall impression of the picture. 🖌️
Виктор
These updates are another step towards greater integration of artificial intelligence into our daily lives. I'm looking forward to using Gemini 1.5 Pro and Imagen 2 to automate routine tasks and create unique content. 🚀
Ганс
As an old curmudgeon, I'm skeptical of all these newfangled trends. Why do we need artificial intelligence if we have people who can perform the same tasks? It's just another useless toy for developers. 🙄
София
Hans, I understand your concern, but progress cannot be stopped. With tools like Gemini 1.5 Pro and Imagen 2, we can automate routine tasks and focus on more creative and intelligent work. This is an opportunity for humanity, not a threat. 🌟
Лукаш
I can already imagine how Gemini 1.5 Pro will be used in customer service. The ability to understand voice queries and provide relevant information in real time is a real breakthrough. Customers will be pleased with the fast and efficient service. 🤖
Мария
I'm looking forward to using Imagen 2 to create unique illustrations for my projects. The inpainting and outpainting functions open up so many creative possibilities! 🎨
Давид
I can't help but agree with Mikhail. Processing audio without transcription is a huge step forward. Imagine how this will make it easier to interact with voice assistants in cars or smartwatches. Technology is truly changing our lives! ⌚
Елена
I like the SynthID watermark idea. This will help distinguish generative images from real ones and avoid confusion. Of course, it would be great if it were as invisible to the eye as possible. 🔍
Якуб
I can already see how Gemini 1.5 Pro and Imagen 2 will be used in education. Imagine how much more interesting your lessons will be with these tools! Students will be able to better understand the material thanks to clarity and interactivity. 👩‍🏫
Франческа
As a developer, I'm looking forward to working with Gemini 1.5 Pro and Imagen 2. Their powerful capabilities will enable the creation of truly innovative applications and services. This is a real breakthrough in technology! 💻🚀
Андрей
I can't help but notice that all these updates are just the tip of the iceberg. Google continues to actively develop its AI technologies, and I am confident that even more exciting announcements await us in the near future. The era of artificial intelligence is just beginning! ⚡