Mateo Cámara

Parameter Optimization for a Physical Model of the Vocal System

Authors: Mateo Cámara, José Luis Blanco and Joshua D. Reiss

Abstract

We evaluated black-box and grey-box optimization techniques on the parameters of a simplified physical synthesizer, the Pink Trombone, to emulate both male and female vocal tract characteristics for vocalic and non-speech sounds. Building on prior research, we used a genetic optimizer and Mel-spectrogram representations to infer the articulatory parameters from human recordings through a direct spectral comparison with the synthesizer’s output. Optimization was carried out over temporal windows, with variations on the state-of-the-art objective metric to ensure temporal coherence across the synthesized signal. We also investigated grey-box approaches, using algorithms such as pYIN for fundamental frequency prediction and developing a ResNet-based neural network as a foundation for the optimization process. Subjective tests validated our methodology and confirmed that the synthesizer effectively mimics human vocal sounds. Results show a marked improvement over previous state-of-the-art techniques and demonstrate the method’s practical utility and accuracy in real-world conditions. Moreover, these subjective evaluations allowed us to fine-tune the established perceptual metric ViSQOL, providing a calibrated tool for future research to assess auditory metrics in the domain of physical synthesizer modeling.
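The abstract mentions optimizing over temporal windows with a modified objective that encourages temporal coherence. This page does not give the exact formulation, so the following is only a minimal sketch of one plausible form: a per-window spectral error plus a weighted penalty on how far the articulatory parameters jump from the previous window. The function names, the mean-absolute-difference distance, and the `penalty_weight` parameter are all illustrative assumptions, not the authors' implementation.

```python
from typing import Optional, Sequence


def spectral_distance(target: Sequence[float], candidate: Sequence[float]) -> float:
    """Mean absolute difference between two spectral frames.

    Hypothetical stand-in for a Mel-spectrogram comparison.
    """
    if len(target) != len(candidate):
        raise ValueError("frames must have the same number of bins")
    return sum(abs(t - c) for t, c in zip(target, candidate)) / len(target)


def parameter_jump(current: Sequence[float], previous: Sequence[float]) -> float:
    """Mean absolute change in synthesizer parameters between two windows."""
    if len(current) != len(previous):
        raise ValueError("parameter vectors must have the same length")
    return sum(abs(c - p) for c, p in zip(current, previous)) / len(current)


def windowed_cost(
    target_frame: Sequence[float],
    synth_frame: Sequence[float],
    params: Sequence[float],
    prev_params: Optional[Sequence[float]],
    penalty_weight: float,
) -> float:
    """Spectral error for one window plus an assumed coherence penalty.

    The first window has no predecessor, so only the spectral term applies.
    """
    cost = spectral_distance(target_frame, synth_frame)
    if prev_params is not None:
        cost += penalty_weight * parameter_jump(params, prev_params)
    return cost
```

Under this assumed form, a weight of 0 reduces the cost to the plain spectral comparison, and larger weights favor parameter trajectories that change smoothly from one window to the next.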

Video

Demo Video

ViSQOL Pink Trombone SVR Model

To use the Pink Trombone ViSQOL model, first download the model:

Download SVR model

Next, follow the ViSQOL user guide to install ViSQOL. Once it is installed, run the following command to use the model:

./bazel-bin/visqol --reference_file ref1.wav --degraded_file deg1.wav --similarity_to_quality_model libsvm_svr_pinktrombone_model.txt
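When scoring many file pairs, it can be convenient to call the same command from a script. The sketch below wraps the invocation shown above using only the flags that appear in it. The helper names `build_visqol_command` and `run_visqol` are hypothetical, and the default binary and model paths assume you run the script from the ViSQOL build directory with the downloaded model file alongside.

```python
import subprocess
from typing import List


def build_visqol_command(
    reference: str,
    degraded: str,
    model: str = "libsvm_svr_pinktrombone_model.txt",
    binary: str = "./bazel-bin/visqol",
) -> List[str]:
    """Build the argument list for the ViSQOL command shown above."""
    return [
        binary,
        "--reference_file", reference,
        "--degraded_file", degraded,
        "--similarity_to_quality_model", model,
    ]


def run_visqol(reference: str, degraded: str, **kwargs: str) -> str:
    """Run ViSQOL and return its raw standard output as text.

    Raises subprocess.CalledProcessError if the tool exits with an error.
    The output is returned unparsed; its format is not assumed here.
    """
    command = build_visqol_command(reference, degraded, **kwargs)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Only prints the command, so this runs without ViSQOL installed.
    print(" ".join(build_visqol_command("ref1.wav", "deg1.wav")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when file names contain spaces.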

Link to Go Listen app

The questionnaire is available here

Pink Trombone online demo

A demo with all the yawns and some of the vowels is available online. This is just a tiny demonstration; all the audio samples used in the study are presented below.

Link to the interactive non-exhaustive demo

Audio Samples

/A/ sound (original files not presented in test)

Masculine original Feminine original Old method feminine New method feminine Old method masculine New method masculine

/E/ sound (original files not presented in test)

Masculine original Feminine original Old method feminine New method feminine Old method masculine New method masculine

/I/ sound (original files not presented in test)

Masculine original Feminine original Old method feminine New method feminine Old method masculine New method masculine

/O/ sound (original files not presented in test)

Masculine original Feminine original Old method feminine New method feminine Old method masculine New method masculine

/U/ sound (original files not presented in test)

Masculine original Feminine original Old method feminine New method feminine Old method masculine New method masculine

/A-E-I-O-U/ sound (original files not presented in test)

Original 0 penalization 1 penalization 2 penalization 4 penalization

/E-I-U/ sound (original files not presented in test)

Original 0 penalization 1 penalization 2 penalization 4 penalization

/ROY/ sound (original files not presented in test)

Original 0 penalization 1 penalization 2 penalization 4 penalization

Feminine /ROY/ sound (original files not presented in test)

Original 0 penalization 1 penalization 2 penalization 4 penalization

p10

Original No initialization Using multispectral error Using Neural Network Using pYIN

p11

Original No initialization Using multispectral error Using Neural Network Using pYIN

p12

Original No initialization Using multispectral error Using Neural Network Using pYIN

p13

Original No initialization Using multispectral error Using Neural Network Using pYIN

p14

Original No initialization Using multispectral error Using Neural Network Using pYIN

p15

Original No initialization Using multispectral error Using Neural Network Using pYIN

p16

Original No initialization Using multispectral error Using Neural Network Using pYIN

p17

Original No initialization Using multispectral error Using Neural Network Using pYIN

p18

Original No initialization Using multispectral error Using Neural Network Using pYIN

p19

Original No initialization Using multispectral error Using Neural Network Using pYIN

p20

Original No initialization Using multispectral error Using Neural Network Using pYIN

p21

Original No initialization Using multispectral error Using Neural Network Using pYIN

References

If you use this project in your research, please include the following citation:

[Once published! :)]