Meta introduces Voicebox, a versatile tool that marks a breakthrough in generative AI for speech generation. The new AI model can perform tasks such as editing, sampling, and stylizing audio.
Voicebox can generate high-quality audio clips and modify pre-recorded audio. It can remove disruptive sounds such as car horns or barking dogs while preserving the content and tone of the original recording. The model is also multilingual and can produce speech in six languages.
Meta outlined its vision for the technology in the blog post announcing the new tool: “In the future, multipurpose generative AI models like Voicebox could give natural-sounding voices to virtual assistants and non-player-characters in the metaverse. They could allow visually impaired people to hear written messages from friends read by AI in their voices, give creators new tools to easily create and edit audio tracks for videos, and much more.”
The new tool by Meta can perform the following tasks for you:
- In-context text-to-speech synthesis: Given an audio sample as short as two seconds, Voicebox can match the sample's audio style and use it to generate speech from text with the same sound and tone.
- Speech editing and noise reduction: Voicebox can reconstruct sections of speech that have been disrupted by noise, or replace misspoken words, without requiring the entire speech to be re-recorded. This makes precise corrections faster and less laborious.
- Cross-lingual style transfer: Given a speech sample and a text passage in English, French, German, Spanish, Polish, or Portuguese, Voicebox can generate an audio reading of the text in any of these languages.
- Diverse speech sampling: Thanks to extensive exposure to diverse datasets, Voicebox has acquired the ability to generate speech that closely emulates natural conversational patterns found in real-world interactions.
As generative AI technology continues to progress, many companies are releasing new and improved tools to give users better experiences. With the introduction of Voicebox, Meta hopes the tool's versatile features will make it a stepping stone in the development of speech generation technology.
Source: Meta Newsroom