Reality (Sound)bites: Sound Tricks from the Film and TV Studio

Jay Rose

On October 7th, the Boston AES section was visited by Jay Rose of the Digital Playroom, who discussed the tricks used in television and film audio production and how those techniques can be applied across the rest of our industry. This is one of his favorite topics, due in no small part to his more than thirty years of experience in audio post-production. He is a member of the Cinema Audio Society and a past AES section officer, has written two books on audio production, and writes a monthly column in DV Magazine. In essence, Jay's message was that what you hear in the soundtrack of a television show or movie is very rarely what actually happened. With dialogue replacement, Foley sound effects, and synthesized elements, most of the audio is not produced at the same time as the picture. By thinking of audio as replaceable, editable events, we can use many of the same production techniques in our own studios, whether we're working on music or voice.

Phonetics, the branch of linguistics that deals with the sounds of speech and how they are produced, lets us think of speech as a combination of discrete events, almost sub-syllables, known as phonemes. By editing at the phoneme level rather than the word level, we can alter a recording in much finer detail, creating words that didn't exist in the original. It's comparable to editing music at the beat or sub-beat rather than at the end of a 4-bar phrase, and thinking of music at the level of those sub-beats allows us much finer control over the production.
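
Jay didn't show any code, but the idea of treating speech (or music) as splice-able events is easy to sketch. The example below, a minimal sketch and not Jay's method, joins two clips (assumed here to be plain NumPy sample arrays) at any sample boundary with a short crossfade, so an edit can land mid-word or mid-beat without a click; the function name and fade length are assumptions for illustration.

    import numpy as np

    def splice(a, b, fade=64):
        # Join clip `a` to clip `b` with a short linear crossfade so the
        # edit point can fall mid-word (or mid-beat) without an audible click.
        fade_out = np.linspace(1.0, 0.0, fade)
        overlap = a[-fade:] * fade_out + b[:fade] * (1.0 - fade_out)
        return np.concatenate([a[:-fade], overlap, b[fade:]])

    # e.g. build a word that was never actually spoken from two takes:
    # new_word = splice(take1[s_start:s_end], take2[at_start:at_end])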

Other principles of phonetics apply to more than just voice, too. Jay had us listen to a real-world example of editing a female voice using the unvoiced consonants from a male voice. Because the unvoiced consonants are essentially bursts of noise with no inherent tone, the vocal range of the speaker doesn't matter, and we can ignore what the actual source of the sound was. By thinking in terms of noise generators and filters - the glottis and the mouth and sinus cavities, for instance - we can focus on the tones that make up the sound rather than the mechanism that created them. This frees us from getting stuck in conventional ways of working - say, just using EQ and compression on an instrument - so that we can think about the tones themselves and come up with new ways to modify them.
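
As a rough illustration (again, not how Jay performed the edit), one common heuristic for spotting those noise-like unvoiced consonants is the zero-crossing rate: bursts of noise cross zero far more often than pitched, voiced sound. The threshold and frame handling below are assumptions chosen just for the sketch.

    import numpy as np

    def is_unvoiced(frame, zcr_threshold=0.3):
        # Unvoiced consonants (s, t, f, sh...) are essentially filtered noise,
        # so their zero-crossing rate is much higher than that of voiced sound.
        signs = np.sign(frame)
        zero_crossing_rate = np.mean(signs[1:] != signs[:-1])
        return zero_crossing_rate > zcr_threshold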

One example Jay showed us was the technique of formant shifting. In pitch shifting, frequencies are multiplied: moving up an octave means doubling all of the frequencies of the sine waves that make up the sound (tones at 100 Hz and 500 Hz move to 200 Hz and 1 kHz, respectively). The result is the same sound an octave higher, but it keeps the character of the original because the ratio between the tones stays the same (1:5). In formant shifting, frequencies are modified by addition. This changes the ratios and drastically changes the character of the sound (adding 100 Hz to the tones above moves them to 200 Hz and 600 Hz, changing the ratio to 1:3). With voice, this completely removes the vowel sounds (which are really just chords made up of different sine waves), and the result sounds like a whisper. By drawing on our knowledge of mathematics, physics, and the related fields of acoustics and psychoacoustics, we can modify sounds by thinking about how sound works rather than about how the processing works.
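
Jay didn't describe the algorithm inside the box, but one standard way to add a fixed number of Hz to every component (rather than multiplying them all by a ratio) is single-sideband modulation of the analytic signal. The sketch below uses SciPy's Hilbert transform; the function name and parameters are assumptions, not anything shown at the meeting.

    import numpy as np
    from scipy.signal import hilbert

    def frequency_shift(x, shift_hz, sample_rate):
        # Shift every component up by a fixed number of Hz (not a ratio) by
        # modulating the analytic signal with a complex exponential. The
        # harmonic ratios change, so vowels stop sounding like vowels.
        t = np.arange(len(x)) / sample_rate
        return np.real(hilbert(x) * np.exp(2j * np.pi * shift_hz * t))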

Jay showed us several other examples. He turned equalizers into oscillators by increasing the Q until the input excites the filter enough to ring; his example turned a voice into a talking harp, like a vocoder with no pitch input. He modified the envelope of a gunshot, shortening it to a small-caliber shot or stretching it into an explosion, just by changing the attack and decay times of a compressor or expander. And he built a processor (both in a hardware unit, the Eventide Orville, and in a software processor on his laptop) to make an audio clip sound like it came from a classroom film projector. This last he accomplished with the same approach: by thinking about how the audio is generated in a film projector, he created processing that does the same thing. The audio is printed on the film optically and is read by shining a light through the film onto a photovoltaic sensor, so it's low-bandwidth, compressed, and noisy. A noise generator, a few filters, and a compressor, and it was sounding almost right. Then add the wow of the uneven reels, the flutter of the film being stopped by the gate every 24th of a second, the gate noise itself, and the 60 Hz hum of the lamp, and you've got a processor, all from thinking about the sound from the physics that created it.
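
None of Jay's actual patches were shown in code form, but both the ringing equalizer and the projector chain translate into a few lines of DSP. The sketch below assumes plain NumPy sample arrays; every corner frequency, Q, modulation depth, and noise level is a guess chosen only to illustrate the thinking, not to match what we heard in the room.

    import numpy as np
    from scipy.signal import butter, iirpeak, lfilter

    def ringing_eq(x, freq_hz, sr, q=200.0):
        # A peaking EQ with an absurdly high Q stops acting like a tone
        # control: transients in the input make the narrow band ring,
        # "plucking" it like a string (the talking-harp effect).
        b, a = iirpeak(freq_hz / (sr / 2), q)
        return lfilter(b, a, x)

    def projector(x, sr):
        # Optical tracks are low-bandwidth: keep roughly 100 Hz to 5 kHz.
        b, a = butter(4, [100 / (sr / 2), 5000 / (sr / 2)], btype="band")
        y = lfilter(b, a, x)

        # Crude "compression": soft-clip to flatten the dynamics.
        y = np.tanh(3.0 * y)

        # Wow (slow, from uneven reels) and flutter (fast, from the gate):
        # wobble the read position slightly and resample.
        t = np.arange(len(y)) / sr
        pos = (t
               + 0.002 * np.sin(2 * np.pi * 0.5 * t)
               + 0.0001 * np.sin(2 * np.pi * 24 * t))
        y = np.interp(pos, t, y)

        # Optical-track hiss plus 60 Hz lamp hum.
        return y + 0.01 * np.random.randn(len(y)) + 0.02 * np.sin(2 * np.pi * 60 * t)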

Jay Rose reminds us that we're audio engineers for a reason: knowing the engineering and physics of how sound works makes us better operators than those who just twist knobs until it 'sounds right'. While the audio varies from voice to music to sound effects, the physics behind the sound waves and the way we hear them doesn't change, and the techniques can be applied anywhere.

--Dan Rose