Novel Sound Synthesis Software Development for Researchers (and Musicians)

Our laboratory’s interest in general principles of perception requires sound synthesis tools that can extend the same precise control to both speech and nonspeech stimuli. Yet existing speech synthesizers have generally been quite complex (with respect to the sheer number of controllable parameters) and closely tied to articulatory constraints, which makes them less than optimal for nonspeech/musical synthesis. Likewise, commercial music synthesizers have often included only a limited set of filters that cannot appropriately model many natural sound sources, and they have not permitted the kind of exacting control one would want for scientific use.

Formant filter concepts (which produce distinct, intense peaks in the auditory spectrum), long associated with speech synthesis and perception, have also been shown to apply to the synthesis and perception of musical tones. The laboratory has therefore gradually developed software tools that psychoacoustic researchers can use to generate simplified speech and musical stimuli sharing an identical synthesis architecture while retaining tight experimental control. These software tools have also been realized as virtual musical instruments that can be played expressively, so they may be of interest to electronic musicians as well.
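
As a rough illustration of the formant concept, the sketch below imposes a single resonant peak on a flat-spectrum source using a Klatt-style two-pole resonator. It is a minimal sketch in Python; the function name and parameter values are our own illustrative choices, not the laboratory's actual software.

```python
import numpy as np
from scipy.signal import lfilter

def formant_resonator(x, fc, bw, fs):
    """Two-pole resonator: a spectral peak at fc Hz with an
    approximate bandwidth of bw Hz (Klatt-style coefficients)."""
    t = 1.0 / fs
    c = -np.exp(-2 * np.pi * bw * t)
    b = 2 * np.exp(-np.pi * bw * t) * np.cos(2 * np.pi * fc * t)
    a = 1.0 - b - c                        # normalizes gain at DC
    return lfilter([a], [1.0, -b, -c], x)

fs = 44100
noise = np.random.randn(fs)                # 1 s of flat-spectrum noise
peaked = formant_resonator(noise, fc=1000.0, bw=80.0, fs=fs)
```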

More information about specific software devices (and related projects), including downloads, is available.

Spectral Contributions to Timbre and Categorization

A traditional theory simplifies speech production to two processes: 1) vibration from a sound source that is passed through 2) a filter, specifically one or more cavities whose resonant frequencies are determined by their size and shape. A primary thread of research within the laboratory over the past several years has been motivated by the goal of fully understanding the effects of source and filter interactions in the production of both speech phonemes and musical tones. Much of this work has focused either on determining how little spectral (i.e., frequency-based) information reflecting the filter resonances (e.g., a few harmonics at critical frequencies) is sufficient to permit sound identification, or, alternatively, on revealing how perception can suffer when energy from the source does not coincide with the filter resonances. A related area of interest is showing how the perception of other auditory attributes often depends upon spectral aspects of timbre. A few examples follow.
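
A minimal sketch of this source-filter idea follows, assuming Python with NumPy/SciPy; the formant values are illustrative (loosely based on published averages for a male /a/), and the function names are our own rather than the laboratory's tools.

```python
import numpy as np
from scipy.signal import lfilter

fs = 44100
f0, dur = 120.0, 0.5
n = int(fs * dur)
source = np.zeros(n)
source[::int(fs / f0)] = 1.0               # glottal-like pulse train, ~120 Hz

def resonator(x, fc, bw):
    """Second-order resonance at fc Hz with bandwidth bw Hz."""
    t = 1.0 / fs
    c = -np.exp(-2 * np.pi * bw * t)
    b = 2 * np.exp(-np.pi * bw * t) * np.cos(2 * np.pi * fc * t)
    return lfilter([1.0 - b - c], [1.0, -b, -c], x)

# Cascade three formant filters (values near a male /a/).
vowel = source
for fc, bw in [(700, 110), (1220, 110), (2600, 150)]:
    vowel = resonator(vowel, fc, bw)
vowel /= np.abs(vowel).max()               # normalize for playback
```

Cascading the resonators means each formant shapes the output of the previous one, mirroring how a single vocal-tract filter imposes several resonances on the same source at once.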

Perceptual Interaction of Timbre and Pitch

Musical instrument timbre has long been known to interact with pitch (e.g., poorer pitch identification or discrimination accuracy when timbre varies) across a variety of phenomena and paradigms (e.g., speeded classification with variation along both pitch and timbre dimensions, the pitch of the missing fundamental, and the tritone paradox), particularly for listeners without much training in musical performance. Yet little work has been done to isolate the aspects of timbre responsible for such interactions.

The lab has sought to address this problem by directly manipulating specific aspects of timbre to see whether they affect pitch judgments. Work thus far has revealed that changes in the spectral centroid (i.e., the amplitude-weighted average frequency of the spectrum), which is often implicated in the perceived brightness of tones and, like pitch, varies from low to high, are sufficient to alter reported pitch in predictable ways.
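
For concreteness, a spectral centroid can be computed in a few lines. The sketch below (Python, with hypothetical tone parameters of our own choosing) builds two tones with the same fundamental but different harmonic weightings and shows that their centroids differ:

```python
import numpy as np

def spectral_centroid(signal, fs):
    """Amplitude-weighted average frequency of the magnitude spectrum."""
    mags = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return np.sum(freqs * mags) / np.sum(mags)

fs = 44100
t = np.arange(fs) / fs
# Two tones with the same 220 Hz fundamental but opposite harmonic weighting:
dull   = sum((1 / k) * np.sin(2 * np.pi * 220 * k * t) for k in range(1, 9))
bright = sum(k * np.sin(2 * np.pi * 220 * k * t) for k in range(1, 9))
print(spectral_centroid(dull, fs), spectral_centroid(bright, fs))
```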

Ongoing efforts are extending this approach to other common demonstrations of timbre-pitch interaction in order to determine whether a general perceptual explanation can account for their occurrence. We are also closely examining which aspects of musical training are most relevant to minimizing the effect of timbre on pitch, an effort that includes improving the rapid assessment of a listener’s musical experience/training.

Factors Influencing Phoneme Categorization

Research on phoneme perception in speech has traditionally focused on perceptual effects of the vocal tract/filter, whereas relatively little work has examined how the pattern of energy from the vocal folds/source interacts with the filter. Our laboratory has been specifically interested in the perceptual consequences of this interaction. Software tools developed in-house have improved synthesis control over source parameters in the generation of vowel stimuli. Data from experiments involving vowel identification and goodness ratings suggest that listeners may listen specifically for the resonant frequencies of the vocal tract (indicated by formant center frequencies, narrow bands of intense energy) rather than for entire spectral shapes, and that these regions are conveyed only when harmonic energy from the source happens to occur at or close to them.
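
The dependence on harmonic alignment is easy to see numerically. The following sketch uses hypothetical values (a 50 Hz tolerance stands in for a formant bandwidth) to list which harmonics of a given fundamental fall near a set of /a/-like formant frequencies:

```python
import numpy as np

def harmonics_near_formants(f0, formants, tol=50.0):
    """Return the harmonic frequencies of f0 that fall within
    tol Hz of any formant center frequency (below 5 kHz)."""
    harmonics = f0 * np.arange(1, int(5000 / f0) + 1)
    return [h for h in harmonics
            if any(abs(h - f) <= tol for f in formants)]

# With f0 = 100 Hz, harmonics land near all three formants...
print(harmonics_near_formants(100.0, [700, 1220, 2600]))
# ...but with f0 = 450 Hz, every harmonic misses them entirely.
print(harmonics_near_formants(450.0, [700, 1220, 2600]))
```

With a low fundamental the harmonics sample all three formant regions, but with a high fundamental they can miss them entirely, which is one way identification could suffer.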

Categorical Perception

The laboratory’s experiments on categorical perception, one of the oldest and most studied effects in speech, have indicated that consonant perception often depends critically on the relation between formant information from the consonant and the subsequent vowel. Specifically, the direction of rapid frequency changes can be sufficient to determine the consonant, representing a potential psychoacoustic explanation of some common demonstrations of categorical perception.
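
One simple way to realize such stimuli outside a full formant synthesizer is with sine-wave analogues of formant transitions. The sketch below (Python; the particular frequencies and durations are illustrative, not taken from the laboratory's experiments) generates rising versus falling glides into the same steady-state frequency:

```python
import numpy as np

fs = 44100

def formant_glide(f_start, f_end, dur):
    """Sine-wave analogue of a formant transition: a tone whose
    frequency moves linearly from f_start to f_end Hz."""
    t = np.arange(int(fs * dur)) / fs
    freq = np.linspace(f_start, f_end, t.size)
    phase = 2 * np.pi * np.cumsum(freq) / fs
    return np.sin(phase)

# Rising vs. falling second-formant transitions into the same
# 1200 Hz steady state, analogous to differing consonant cues.
rising  = formant_glide(900, 1200, 0.05)
falling = formant_glide(1500, 1200, 0.05)
```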

Collaborative Efforts

Distinguishing Change Deafness from Other Phenomena
This collaboration with Jeremy Gaston of the Army Research Laboratory and Kelly Dickerson explores whether the processing mechanisms involved in change deafness (in which above-threshold stimuli remain undetected, at least for an unexpectedly long time) are distinct from corresponding demonstrations of change blindness in vision. We are developing paradigms that can distinguish change deafness from other perceptual issues, such as difficulty encoding events in complex arrays of sounds. Initial results indicate that stimulus-encoding difficulty is a major contributor to what has previously been labeled change deafness, and that true instances of change deafness do not appear to follow the constraints regularly observed in vision, at least not for changes in perceived location. Subsequent work has examined how change deafness can be effectively distinguished from the related phenomenon of inattentional deafness.

Acoustic Attributes Involved in the Perception of Foreign Accent
Factors argued to predict perceived foreign accent in speech have typically been assessed through perceptual judgments of naturally produced utterances, in the absence of true experimental control. This cross-laboratory project with Kit-Ying Chan (City University of Hong Kong) and Lily Assgari (University of Louisville) has instead re-synthesized vowel, consonant-vowel, consonant-vowel-consonant, and word stimuli to isolate and manipulate particular characteristics (e.g., formant center frequencies in vowels, as well as a variety of temporal parameters), and then presented the resulting tokens to listeners for a variety of perceptual judgments to confirm or disconfirm which parameters contribute to the perception of foreign accent. Findings have confirmed the role of only a subset of the acoustic characteristics identified as important in earlier studies of speech production, and have also shown that the perception of foreign-accented vowels reflects not only unusual spectral characteristics but also, often, productions that fall in overlapping acoustic regions between adjacent vowel categories.

Influence of Skull Resonances on Listening Preferences
Michael Gordon (William Paterson University, New Jersey) initially found some support for the notion that listeners prefer music that matches the resonant frequencies of their own skull (by manipulating musical key). Our laboratory, in conjunction with Michael Gordon and the Army Research Laboratory, has helped test this idea further through direct manipulation of timbre, increasing or decreasing intensity at frequencies that either matched or did not match the listener’s skull resonances. So far we have established that there are quite large individual differences in how the skull responds across frequencies. The perceptual data obtained have confirmed that these individual patterns correlate with bone-conduction thresholds, and have suggested that there may be a bias against listening to music samples that reinforce the resonant frequencies of the listener’s own skull.
