This appendix documents the questionnaire given to participants in the subjective evaluation. It ran online as a MUSHRA test; the listening questions themselves are available here. The test was originally written in Spanish — this is a faithful English translation of each section.
Introduction & screening questions
Introduction
Welcome to our subjective test on intelligibility and the synthesizer’s ability to imitate the human voice. Thank you for taking part.
The aim is to assess the automatic control of an articulatory synthesizer, the Pink Trombone — a tool that mimics human speech by controlling the tongue, lips, vocal cords, and other articulators. Before you begin, get a feel for it at dood.al/pinktrombone.
We want to verify how well the Pink Trombone produces human vowel sounds that listeners can interpret correctly, and how well it imitates a given recording. Please note:
- Some questions ask you to evaluate the interpretability of certain vowels or vowel sequences. Judge the vowels by Spanish phonology (five vowels). Do not judge the synthesizer’s quality — sounding robotic is normal and acceptable.
- Other questions present a human recording as reference and ask you to evaluate Pink-Trombone syntheses. Rate the ability to imitate (as an actor would), not the literal resemblance — as if a person were imitating another without being able to change their own vocal cords, throat, or tongue.
- Rate each attempt from 0 (not interpretable / very poor imitation) to 100 (completely interpretable / very good imitation).
- There are 21 tests in total, taking about 10–15 minutes. Use headphones in a quiet environment if you can.
Questions? Contact mateo.camara@upm.es.
Screening questions
- Age group: 18–24 · 25–34 · 35–44 · 45–54 · 55–64.
- Acoustic environment right now: quiet room · noisy room · public space.
- Hearing: normal · suspected impairment in one or both ears · diagnosed impairment.
- Audio-engineering background: professional in the field · enthusiast outside the field · no special attention to it.
- Training example. A sample of what you will encounter, to check that you can hear the sounds. Take as long as you need and read each prompt carefully. The slider records your rating, from 0 (worst) to 100 (best).
Set 1 — interpret a static vowel
Rate how strongly each sound resembles the indicated vowel, from 0 (not that vowel at all) to 100 (perfectly interpreted). (Questions available online.)
Set 2 — interpret a sequence of vowels
Rate how well you perceive the indicated sequence of vowels, from 0 (not at all) to 100 (perfectly interpreted). (Questions available online.)
Set 3 — interpret imitations
Evaluate how well each synthetic sound imitates a human reference. The sounds cannot be identical; judge coherence, as if a person were imitating another without being able to change their own vocal tract. Listen to the reference first, then rate from 0 (very poor imitation) to 100 (very credible). (Questions available online.)