AI Creates Sounds You’ve Never Heard Before

If you’ve been keeping up with AI research, you’ve probably heard of models that can generate things like speech or music from just a text prompt. But Nvidia’s latest AI model, called “Fugatto,” is pushing the boundaries even further. It doesn’t just create music or voices – it can mix different sounds in ways that have never been done before, including creating entirely new sounds that don’t exist in real life.

What Makes Fugatto So Special?

Fugatto combines new training methods with advanced techniques that let it generate sounds and music from text descriptions. Imagine asking it to create something like a saxophone barking, people speaking underwater, or even a choir of ambulance sirens! While the results might not always be perfect, the variety of sounds it can create is truly impressive. Nvidia calls Fugatto a “Swiss Army knife for sound” because it can do so much with audio.

OK, Fugatto, can we get a little more barking and a little less saxophone in the monitors? Credit: Getty Images

The Hard Part: Teaching the AI to Understand Sound

One of the biggest challenges in making Fugatto work is getting the AI to understand the relationship between words and sounds. Standard AI models can usually figure out what a word or sentence means just from the text, but sounds are more complex. The researchers had to teach Fugatto how to associate words like “happy voice” or “sad music” with actual audio traits like pitch, tone, or emotion.

To do this, the team used a huge amount of data. They assembled over 20 million audio samples (about 50,000 hours of sound) from different open-source audio collections, labeling the sounds with descriptions like “happy voice” or “jazz saxophone.” Trained on this labeled data, Fugatto could learn how to combine these sounds in new ways.
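
To get a feel for those numbers, here is a quick back-of-the-envelope calculation. It uses only the two rounded figures quoted above, so the result is a rough average and not an official statistic; the function name is made up for this illustration.

```python
def average_clip_seconds(total_hours: float, num_samples: int) -> float:
    """Rough average clip length in seconds, from rounded totals."""
    total_seconds = total_hours * 3600  # 3600 seconds per hour
    return total_seconds / num_samples


# Rounded figures from the article: about 50,000 hours, over 20 million samples.
print(average_clip_seconds(50_000, 20_000_000))  # 9.0
```

In other words, the typical training clip would be on the order of nine seconds long, assuming the rounded totals are close to the real ones.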

Creating New Sounds with ComposableART

What’s even more exciting about Fugatto is its ability to create entirely new sounds using a system called ComposableART (Audio Representation Transformation). This system takes text or audio prompts and mixes different sound traits in ways that have never been heard before.

Fugatto’s generated audio (magenta) matches the melody of an input MIDI file (cyan) very closely. Credit: Nvidia Research

For example, Fugatto can combine a violin sound with the sound of a laughing baby, or turn a banjo into something that sounds like it’s playing in a rainstorm. It can even make factory machinery sound like it’s screaming in metallic agony. Some of these combinations are more convincing than others, but the fact that Fugatto can even attempt them is groundbreaking.

Tuning Sounds to Be Exactly What You Want

Fugatto doesn’t just create sounds randomly – it gives you control over how the sound is mixed. For example, you can decide how much of one trait, like a guitar, should be present in a mix with water sounds. It can also adjust things like the degree of sorrow in a voice, or make a French accent sound stronger or lighter. This flexibility is a huge step forward in how AI can create and manipulate audio.
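
To picture what "how much of one trait" could mean, here is a toy sketch of weighted mixing. This is not Fugatto's actual method or API: the `blend_traits` function, the trait names, and the two-number "trait vectors" are all hypothetical stand-ins, chosen only to show how a dial between two traits might work in principle.

```python
def blend_traits(traits: dict[str, list[float]],
                 weights: dict[str, float]) -> list[float]:
    """Weighted average of trait vectors (hypothetical illustration).

    Weights are normalized to sum to 1, so only their ratio matters.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    length = len(next(iter(traits.values())))
    mixed = [0.0] * length
    for name, vector in traits.items():
        share = weights.get(name, 0.0) / total  # this trait's fraction of the mix
        for i, value in enumerate(vector):
            mixed[i] += share * value
    return mixed


# Made-up two-number "fingerprints" for each trait.
traits = {"guitar": [1.0, 0.0], "water": [0.0, 1.0]}

# Three parts guitar to one part water.
print(blend_traits(traits, {"guitar": 3, "water": 1}))  # [0.75, 0.25]
```

Turning the guitar weight up or down slides the result toward one trait or the other, which is the kind of control the paragraph above describes, even though the real system works on far richer representations than two numbers.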

More Than Just Fun Sounds: Practical Uses for Fugatto

Fugatto’s abilities go beyond creating wacky or surreal sounds. It can also do more traditional audio tasks, like isolating vocal tracks from music or changing the emotion in a piece of spoken text. It can even detect individual notes in a music file and replace them with a variety of vocal performances or sound effects that match the rhythm.

Nvidia has big plans for Fugatto’s future. They see it being used in everything from prototyping new songs to dynamically changing music in video games or even helping create more targeted ads across different countries. But Nvidia is quick to point out that Fugatto is a tool, not a replacement for creative artists. Just like electric guitars or samplers revolutionized music in the past, AI like Fugatto is opening up new possibilities for artists to experiment with sound in ways they never could before.

Why This Matters

In short, Nvidia’s Fugatto is an incredible step forward in the world of AI and sound. It can generate, mix, and transform audio in ways that were once only imaginable in science fiction. While we’re still a long way from fully understanding all the complex math behind it, Fugatto is already showing us just how far AI can go in reshaping the future of music, sound design, and beyond.

So, whether you’re a musician, sound engineer, or just someone who’s fascinated by the possibilities of AI, Fugatto is definitely something to keep an eye on. The future of sound is here, and it’s full of surprises.
