Foley-VAE

Foley-VAE: generating film sound effects with AI

Mateo Cámara, José Luis Blanco

Universidad Politécnica de Madrid

Abstract

We present an interface based on Variational Autoencoders (VAEs), trained on a wide range of natural sounds, for the creative generation of Foley effects. The model runs in real time, transferring new sonic characteristics onto pre-recorded audio or live microphone input, and exposes its latent variables for precise, personalized artistic control. Building on our earlier VAE study and on the RAVE architecture, we trained a model specifically for sound-effect production. It generates effects ranging from electromagnetic and science-fiction textures to water sounds, among others.

A first in Spanish cinema. Foley-VAE was used to create the sound effects of the first Spanish short film with AI-assisted Foley — a concrete demonstration of how generative audio can open new creative possibilities for film sound.

Watch

How it works

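The system extends RAVE, a real-time variational autoencoder, and is trained here on a large library of natural sounds. Because the model is generative and operates on a latent space, it can transfer new sonic characteristics onto pre-recorded audio or live microphone input, and its latent variables can be adjusted directly or blended between two sounds.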
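As a rough illustration of the encode, edit, decode loop that a latent-space model implies, the sketch below processes audio block by block and applies a per-dimension scale and offset to each latent vector before decoding. Everything here is an assumption made for illustration: `toy_encode` and `toy_decode` are a trivial invertible stand-in, not a VAE and not the Foley-VAE or RAVE code, and the names `apply_controls` and `process_stream` are hypothetical.

```python
from typing import Callable, List, Sequence

Block = List[float]   # a short chunk of audio samples
Latent = List[float]  # the latent vector for one block


def toy_encode(block: Sequence[float]) -> Latent:
    """Stand-in encoder (NOT a VAE): map a 2-sample block to [mean, half-difference]."""
    a, b = block
    return [(a + b) / 2.0, (a - b) / 2.0]


def toy_decode(z: Sequence[float]) -> Block:
    """Exact inverse of toy_encode."""
    mean, half_diff = z
    return [mean + half_diff, mean - half_diff]


def apply_controls(z: Sequence[float],
                   scale: Sequence[float],
                   bias: Sequence[float]) -> Latent:
    """Apply a per-dimension gain and offset, as a user-facing latent control might."""
    return [s * v + b for v, s, b in zip(z, scale, bias)]


def process_stream(blocks: Sequence[Sequence[float]],
                   encode: Callable[[Sequence[float]], Latent],
                   decode: Callable[[Sequence[float]], Block],
                   scale: Sequence[float],
                   bias: Sequence[float]) -> List[Block]:
    """Encode each incoming block, edit its latent vector, and decode the result."""
    out: List[Block] = []
    for block in blocks:
        z = encode(block)
        z = apply_controls(z, scale, bias)
        out.append(decode(z))
    return out
```

With a scale of 1 and a bias of 0 on every dimension the stream passes through unchanged. Setting the second scale to 0 removes the within-block difference in this toy codec, which hints at how a single latent control can reshape a sound's character.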

Listen

The first grid pairs original footstep recordings with their reconstructions. The second presents new effects generated by blending the latent characteristics of two materials.

Reconstructions

Footstep Foley recordings on different surfaces, each passed through the VAE and reconstructed.

Example | Original | Reconstructed
Wood 1
Wood 2
Metal 1
Metal 2
Stone 1
Stone 2
Fabric 1
Fabric 2
Earth 1
Earth 2
Other 1
Other 2

Generated material mixes

New Foley textures created by blending the latent characteristics of two materials.

Example | Generated mix
Asphalt + wood
Asphalt + mud
Asphalt + wood
Carpet + grass
Carpet + wood
Carpet + water
Mud + rocks
Mud + wood
Grass + gravel
Grass + wood
Wood + snow
Wood + puddle
Wood + linoleum
Marble + wood
Metal + concrete
Metal + wood
Metal + puddle
Wood + rocks
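
One common way to blend the latent characteristics of two sounds is linear interpolation between their latent vectors, frame by frame. The sketch below shows that idea under stated assumptions: the page does not say which blending rule Foley-VAE uses, the latent values would in practice come from the model's encoder, and the function names `blend_latents` and `blend_trajectories` are hypothetical.

```python
from typing import List, Sequence


def blend_latents(z_a: Sequence[float],
                  z_b: Sequence[float],
                  alpha: float) -> List[float]:
    """Linearly interpolate two latent vectors.

    alpha = 0.0 returns material A, alpha = 1.0 returns material B.
    """
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]


def blend_trajectories(traj_a: Sequence[Sequence[float]],
                       traj_b: Sequence[Sequence[float]],
                       alpha: float) -> List[List[float]]:
    """Blend two sequences of latent frames, stopping at the shorter one."""
    return [blend_latents(fa, fb, alpha) for fa, fb in zip(traj_a, traj_b)]
```

For example, `blend_latents(z_wood, z_metal, 0.5)` would give an equal mix of two hypothetical material latents, which a decoder would then turn back into audio.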