The laryngeal contrast of the stop obstruents /p t k/ vs. /b d ɡ/ in English is usually expressed with the help of two phonetic features: aspiration and voicing. These features are not equally active in cueing the contrast in every position. Phonetic voicing (phonation), i.e., the vibration of the vocal folds, plays a more limited role: it is usually present only when the stops occur between vowels or sonorant consonants (e.g., rebel, elbow, rider, bandit, cargo, ugly). In all other environments, stops are typically unphonated. In these positions, stops with the same place of articulation contrast mainly in the presence or absence of aspiration (e.g., pike–bike, try–dry, cool–ghoul), hence the classification of English as an “aspirating” language (Jansen 2004). Following the traditional English phonetic and phonological literature, we refer to the two contrastive obstruent classes as fortis (‘voiceless aspirated’) and lenis (‘voiceless unaspirated’).
Since the classic work by Lisker and Abramson (1964) on the typology of laryngeal contrast in word-initial stops, aspiration has frequently been defined in terms of the timing relation between the release of the stop and the onset of phonation of a following voiced sound (a vowel or one of the approximants /l r j w/). This interval is referred to as Voice Onset Time (VOT). English stops can be grouped into two VOT categories: (i) long-lag VOT (aspirated-voiceless, or fortis) vs. (ii) short-lag or zero VOT (unaspirated-voiceless, or lenis). The cutoff point between long-lag and short-lag/zero VOT is conventionally placed at 35 ms (Keating 1984). Thus, a stop is said to be aspirated when the time between its release and the onset of phonation of the following sound exceeds 35 ms.
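As a minimal illustration of this definition, the Python sketch below computes VOT from two hypothetical boundary times and assigns the result to a category using the 35-ms cutoff. The function names and example times are our own assumptions. Values of exactly 35 ms are treated as short-lag, following the “more than 35 ms” wording above, and negative values are labelled with Lisker and Abramson’s “lead” (prevoiced) category, which the two-way English grouping does not need.

```python
def vot_ms(release_s: float, voicing_onset_s: float) -> float:
    """Return Voice Onset Time in milliseconds.

    Both arguments are times in seconds: the stop release and the onset of
    phonation of the following voiced sound.
    """
    return (voicing_onset_s - release_s) * 1000.0


def vot_category(vot: float, cutoff_ms: float = 35.0) -> str:
    """Assign a VOT value (ms) to a category using a fixed cutoff (default 35 ms)."""
    if vot < 0:
        # Phonation starts before the release: a prevoiced ("lead") stop.
        return "lead"
    if vot > cutoff_ms:
        return "long-lag (aspirated)"
    return "short-lag (unaspirated)"


# Hypothetical boundary times (in seconds) for a single token.
token_vot = vot_ms(release_s=0.512, voicing_onset_s=0.566)
print(f"{token_vot:.1f} ms -> {vot_category(token_vot)}")
```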
The VOT measurement usually includes the release noise. The release noise of a stop is, however, acoustically different from the noise of aspiration. In its spectrum, a wide range of frequencies has high intensity,
The well-known distribution of aspiration of fortis stops in standard British English can be summed up as (1) below (see, among many others, Nádasdy 2003; 2006):
Thus, the underlined stops in (2) are aspirated as they are word-initial (these and the examples below are from Nádasdy 2003, 20):
Note that here we treat word-initial stops followed by an unstressed vowel (políte, togéther, collápse, etc.) as being just as aspirated as those followed by a stressed vowel.
The underlined stops in (3) are also all aspirated as they are in the initial position of a stressed syllable:
However, the underlined stops in (4) are not aspirated because they are neither at the beginning of the word nor at the beginning of a stressed syllable:
Every textbook on English phonetics and phonology mentions one regular exception to the above:
Therefore, the following underlined stops are not aspirated even though they are word-initial or in a medial stressed syllable:
The stops after /s/ in (6) can only be voiceless and unaspirated (i.e., lenis); therefore, the contrast between fortis and lenis stops is neutralized here (see, for instance, Cruttenden 2014, 47). This means that the stop after /s/ in speak, for example, is supposed to be phonetically the same as the b in beak.
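The distribution described so far can be restated as a small decision procedure. The sketch below is a simplified reading of the prose in this section, not a transcription of (1): the `StopContext` fields are hypothetical, syllabification and stress are supplied by hand, and gradient or morphological effects are ignored. Following the examples above, a stop in a word-initial /s/ + stop cluster counts as being in the word’s first onset.

```python
from dataclasses import dataclass

FORTIS = {"p", "t", "k"}


@dataclass
class StopContext:
    """Hypothetical description of one stop in its word (supplied by hand)."""
    stop: str             # the stop itself, e.g. "p"
    in_first_onset: bool  # the stop is in the word's first onset (also after initial /s/)
    stressed_onset: bool  # the stop is in the onset of a stressed syllable
    after_s: bool         # the stop directly follows /s/ within the same word


def predicts_aspiration(ctx: StopContext) -> bool:
    """Return True if the simplified distribution predicts an aspirated stop."""
    if ctx.stop not in FORTIS:
        return False  # lenis stops are never aspirated
    if ctx.after_s:
        return False  # exception: the fortis/lenis contrast is neutralized after /s/
    return ctx.in_first_onset or ctx.stressed_onset


examples = {
    "pike": StopContext("p", True, True, False),
    "políte": StopContext("p", True, False, False),        # word-initial, unstressed
    "speak": StopContext("p", True, True, True),           # after /s/
    "píper (2nd p)": StopContext("p", False, False, False),  # medial, unstressed syllable
    "bike": StopContext("b", True, True, False),           # lenis
}
for word, ctx in examples.items():
    print(word, predicts_aspiration(ctx))
```

The /s/ check is placed before the positional checks so that the exception overrides the general rule, mirroring how the textbooks present it.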
Note that stops can be aspirated after /s/, but when this is the case, the aspiration usually signals a word boundary (aspiration can thus be thought of as one way in which speakers cue word boundaries). All the stops following /s/ in these words are usually aspirated:
Well-known textbooks and dictionaries mention only the post-/s/ position as neutralizing, but the question arises whether this is a “specialty” of /s/, or whether there is a general incompatibility between aspiration and all fricatives, not just the alveolar sibilant. Answering this question is not easy, however, as the only other morpheme-internal fricative + voiceless stop cluster in English is /ft/, and even this cluster occurs in just a few words:
There are no words beginning with /ft/. Most of the words have /ft/ in word-final position or followed by an unstressed syllable; in these we do not expect aspiration. The only words (marked in bold in (8)) that we can really use to test the presence vs. absence of post-/f/ aspiration are fifteen and the British English pronunciation of lieutenant (Gimson-IPA: /lefˈtenənt/), as they are the only words that have /ft/ followed by a stressed syllable, where the /t/ may therefore potentially be aspirated. According to Wells (2000), the word caftan (alternative spelling kaftan) has primary stress on the first syllable in standard British English (Gimson-IPA /ˈkæftæn/ or /ˈkæftɑːn/), but he gives the alternative transcription /kæfˈtæn/ for General American English (the other, supposedly mainstream, GA pronunciation is /ˈkæftən/). Consequently, any investigation into aspiration after fricatives other than /s/ can only really use two test words: fifteen and lieutenant.
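The selection criterion used here (/ft/ with the /t/ at the start of a stressed syllable) can be checked mechanically on stress-marked transcriptions. The Python sketch below assumes that the primary stress mark ˈ is written immediately before the stressed syllable, as in the transcriptions cited above. The small lexicon is illustrative only; the entries not given in this section (fifteen, fifty, after, soft) are standard British dictionary forms added for the example.

```python
# Illustrative transcriptions; the primary stress mark "ˈ" precedes the stressed syllable.
LEXICON = {
    "fifteen": "ˌfɪfˈtiːn",
    "fifty": "ˈfɪfti",
    "lieutenant (BrE)": "lefˈtenənt",
    "caftan (BrE)": "ˈkæftæn",
    "caftan (GA variant)": "kæfˈtæn",
    "after": "ˈɑːftə",
    "soft": "sɒft",
}


def ft_before_stress(transcription: str) -> bool:
    """True if /f/ is directly followed by a stressed syllable beginning with /t/."""
    return "fˈt" in transcription


def aspiration_test_words(lexicon: dict[str, str]) -> list[str]:
    """Words whose /t/ after /f/ could, in principle, be aspirated."""
    return sorted(word for word, ipa in lexicon.items() if ft_before_stress(ipa))


print(aspiration_test_words(LEXICON))
```

The GA variant of caftan is picked up as well, which matches the observation above that caftan would qualify only in its General American stress pattern.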
The experiment presented in this paper was a pilot experiment; as such, its main purpose was to test the feasibility of future, larger-scale research on the aspiration of English stops after fricatives other than /s/. The experiment included only one subject, so the results must be treated with caution: no inferential statistics can be calculated from a single speaker, and the results therefore cannot be used to estimate the true value of aspiration in the population of English speakers. Nonetheless, the results from this speaker can be used to describe tendencies and to indicate directions for future research in this area.
The subject of the experiment was a young male (in his early 20s), a university student, and a native speaker of what we may call contemporary standard Southern British English. He was not aware of the purpose of the study, was not paid, and consented to the use of the recordings for phonetic research.
The pilot experiment discussed in this paper investigated the VOTs of the following sounds in the following positions (uppercase “V” stands for a stressed vowel, while lowercase “v” stands for an unstressed vowel):
The following test words were used:
The test words were placed in carrier sentences whose lengths were kept relatively stable for the three main positions listed above. The vowels adjacent to the target stops were also kept the same within each position: /eɪ/, /aɪ/, and /iː/. The words with word-initial target stops were all utterance-initial (9); all the other test words were in absolute-final position (10)–(11). The sentences were all declarative, with a similar intonation structure. These measures aimed to ensure that sentence length, vowel quality, and sentence type did not influence the target variable of the experiment, VOT.
The subject read the test sentences and additional filler sentences from a monitor screen in a randomized order, which was generated by SpeechRecorder.
Each of the 20 test words was repeated four times, yielding altogether 80 observations (tokens) for analysis.
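The design arithmetic (20 words × 4 repetitions = 80 tokens) and the randomized presentation can be sketched as follows. This is not SpeechRecorder’s own procedure, which is not described here; the filler labels and the seed are hypothetical, and each test word stands in for its carrier sentence.

```python
import random

TEST_WORDS = [
    "paces", "basic", "table", "data", "cable", "gable",
    "piper", "briber", "writer", "rider", "hiker", "tiger",
    "fourteen", "forty", "fifteen", "fifty",
    "sixteen", "sixty", "seventeen", "seventy",
]
REPETITIONS = 4


def build_session(fillers: list[str], seed: int = 1) -> list[str]:
    """Return every test item REPETITIONS times plus the fillers, in shuffled order."""
    trials = [word for word in TEST_WORDS for _ in range(REPETITIONS)] + list(fillers)
    random.Random(seed).shuffle(trials)  # seeded, so the order is reproducible
    return trials


session = build_session(fillers=["filler_01", "filler_02", "filler_03"])
test_tokens = [item for item in session if item in TEST_WORDS]
print(len(TEST_WORDS), len(test_tokens))
```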
The segmentation of the test words and the VOT measurements were carried out in Praat, version 5.4.18 (Boersma and Weenink 2015). The VOT intervals were created manually with the help of the waveforms and spectrograms. The left boundary of each VOT interval was placed at the transition between the silent closure phase and the start of the release noise. The right boundary was placed just before the first period of the following vowel’s periodic waveform, where its formants became visible (Figure 1). A Praat script was used to write the VOT values (in milliseconds) automatically into a data table. The descriptive statistical analyses of the VOT scores, including graphs (boxplots and bar charts), were carried out in R, version 3.3.1 (R Development Core Team 2008).
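The measurement and summary steps can be mirrored in a short Python sketch: each VOT is the duration of a manually placed interval, and the per-word mean, standard deviation, and median correspond to the columns of Table 1. This does not reproduce the Praat script or the R analysis; the word label and interval times are hypothetical, and `statistics.stdev` uses the n − 1 denominator, as R’s `sd()` does.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def summarize(intervals):
    """Summarize VOT per test word.

    intervals: iterable of (word, left_boundary_s, right_boundary_s), where the
    left boundary is the start of the release noise and the right boundary is
    the onset of the following vowel's periodic waveform (times in seconds).
    """
    vots = defaultdict(list)
    for word, left, right in intervals:
        vots[word].append((right - left) * 1000.0)  # interval duration in ms
    return {
        word: {"n": len(v), "mean": mean(v), "sd": stdev(v), "median": median(v)}
        for word, v in vots.items()
    }


# Hypothetical boundaries for four repetitions of one (unnamed) test word.
demo = [("word_A", 1.200, 1.255), ("word_A", 4.100, 4.150),
        ("word_A", 7.300, 7.362), ("word_A", 9.050, 9.103)]
for word, stats in summarize(demo).items():
    print(word, {k: round(x, 2) for k, x in stats.items()})
```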
Table 1 summarizes the mean VOT, the standard deviation, and the median VOT for each environment (four tokens per test word).
Environment | Test word | Mean VOT (ms) | SD (ms) | Median VOT (ms) |
---|---|---|---|---|
#pV | páces | 54.25 | 4.03 | 54.0 |
#bV | básic | 11.25 | 2.22 | 11.0 |
#tV | táble | 75.00 | 14.88 | 72.0 |
#dV | dáta | 11.25 | 2.75 | 11.5 |
#kV | cáble | 55.75 | 6.18 | 53.0 |
#gV | gáble | 16.00 | 2.16 | 16.5 |
Vpv | píper | 35.75 | 12.55 | 33.5 |
Vbv | bríber | 14.75 | 8.18 | 14.5 |
Vtv | wríter | 79.75 | 3.20 | 81.0 |
Vdv | ríder | 11.25 | 5.56 | 11.0 |
Vkv | híker | 55.25 | 6.45 | 54.5 |
Vgv | tíger | 19.25 | 4.86 | 20.0 |
vtV | fourtéen | 87.75 | 5.62 | 87.0 |
Vtv2 | fórty | 90.75 | 6.13 | 90.5 |
ftV | fiftéen | 38.50 | 24.53 | 28.5 |
Vftv | fífty | 30.75 | 7.63 | 29.5 |
stV | sixtéen | 17.75 | 6.65 | 17.5 |
stv | síxty | 30.75 | 2.50 | 32.0 |
ntV | seventéen | 85.50 | 14.01 | 84.0 |
vntv | séventy | 85.50 | 14.80 | 85.5 |
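Applying the 35-ms cutoff separately to the means and the medians of Table 1 shows where the two summaries disagree. The Python sketch below simply copies the table’s mean and median columns (test words without stress marks); the helper name is our own.

```python
CUTOFF_MS = 35.0

# (test word, mean VOT in ms, median VOT in ms), copied from Table 1
TABLE_1 = [
    ("paces", 54.25, 54.0), ("basic", 11.25, 11.0), ("table", 75.00, 72.0),
    ("data", 11.25, 11.5), ("cable", 55.75, 53.0), ("gable", 16.00, 16.5),
    ("piper", 35.75, 33.5), ("briber", 14.75, 14.5), ("writer", 79.75, 81.0),
    ("rider", 11.25, 11.0), ("hiker", 55.25, 54.5), ("tiger", 19.25, 20.0),
    ("fourteen", 87.75, 87.0), ("forty", 90.75, 90.5), ("fifteen", 38.50, 28.5),
    ("fifty", 30.75, 29.5), ("sixteen", 17.75, 17.5), ("sixty", 30.75, 32.0),
    ("seventeen", 85.50, 84.0), ("seventy", 85.50, 85.5),
]


def mean_median_disagreements(rows, cutoff=CUTOFF_MS):
    """Words whose mean and median fall on opposite sides of the cutoff."""
    return [word for word, m, md in rows if (m > cutoff) != (md > cutoff)]


print(mean_median_disagreements(TABLE_1))
```

The two words flagged are píper and fifteen, the same two whose tokens the discussion of the results identifies as varying between unaspirated and aspirated. With only four tokens per word and occasional extreme values, the median is arguably the safer summary for such cases.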
Figure 2 shows the range of VOT values for each environment as boxplots (dots represent extreme values). The broken lines separate the data into three groups of environments: (i) word-initial before a stressed vowel (/p b t d k ɡ/), (ii) medial after a stressed vowel (/p b t d k ɡ/), and (iii) medial /t/ before or after a stressed vowel in four environments: intervocalic, after the voiceless fricatives /f/ and /s/, and after /n/. As discussed above, the conventional cutoff point between zero or short-lag VOT (no aspiration) and long-lag VOT (aspiration) is 35 ms (Keating 1984; Jansen 2004); this cutoff is shown by the horizontal blue line.
The bar charts in Figure 3 show the mean VOT for each environment (error bars represent the standard deviations).
Based on the two figures, the word-initial fortis stops /p t k/ can be considered aspirated: all three have relatively long VOTs (well above the 35-ms threshold). The lenis stops /b d ɡ/ are not aspirated, all having short VOTs.
In medial, posttonic position, the subject produced the fortis stops /t/ and /k/ with VOTs similar to those in word-initial position. The VOT of the labial fortis stop /p/ was shorter: values ranged between 24 ms and 52 ms (mean: 35.75 ms, standard deviation: 12.55 ms), so his realizations of the second /p/ in píper varied between unaspirated and aspirated tokens. Note that most descriptions of standard British English describe fortis stops in an unstressed syllable as unaspirated. The subject in this experiment, however, clearly aspirated the non-labial fortis stops in unstressed syllables, too. Figure 4 shows the spectrogram of one token of the test word híker, in which the medial /k/ has a long-lag VOT.
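The figures given for píper also constrain its two middle tokens. Assuming n = 4 tokens (as in the design) and that 24 ms and 52 ms are the minimum and maximum, with x_(i) denoting the i-th smallest VOT value:

```latex
\begin{aligned}
\sum_{i=1}^{4} x_{(i)} &= 4 \times 35.75 = 143~\text{ms}\\
x_{(2)} + x_{(3)} &= 143 - 24 - 52 = 67~\text{ms}\\
\tfrac{1}{2}\left(x_{(2)} + x_{(3)}\right) &= 33.5~\text{ms}
\end{aligned}
```

The last value matches the reported median. Because the two middle tokens average less than 35 ms, at least one of them lies below the cutoff, so at least two of the four tokens (including the 24-ms one) are short-lag, while at least one (the 52-ms token) is long-lag.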
Let us turn now to medial /t/. In intervocalic position, both before and after a stressed vowel, /t/ had a long VOT. As expected, /t/ exhibited a short-lag VOT after /s/ (both before and after a stressed vowel); thus, we can characterize the stop here as unaspirated.
The other post-fricatival position showed more variation, especially the ftV environment (fifteen). Most tokens had a relatively short-lag VOT (around or below 35 ms), which suggests that /t/ after /f/ is realized much like /t/ after /s/: it is unaspirated. However, in one token of fifteen, the VOT of /t/ was rather long, 75 ms (the extreme value in the boxplot in Figure 2), so this /t/ was clearly aspirated. Figure 5 shows two realizations of fifteen, one with a short-lag VOT (unaspirated) and one with a long-lag VOT (aspirated).
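The effect of this single token on the mean can be worked out from Table 1. Assuming, as in the design, n = 4 tokens of fifteen, one of which is the 75-ms token:

```latex
\begin{aligned}
\sum_{i=1}^{4} x_i &= 4 \times 38.50 = 154~\text{ms}\\
\bar{x}_{\text{other three}} &= \frac{154 - 75}{3} = \frac{79}{3} \approx 26.3~\text{ms}
\end{aligned}
```

Without the aspirated token, the mean falls well below the 35-ms cutoff and close to the reported median of 28.5 ms. The mean of 38.50 ms in Table 1 therefore lies above the cutoff only because of this one token.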
The post-nasal position (seventeen–seventy) behaved just like the intervocalic position: /t/ had a long-lag VOT and was aspirated, both before the stressed and before the unstressed vowel.
One important note is in order here. The voiceless portion between the release and the onset of voicing of the following vowel showed acoustic differences among the fortis stops: in the case of /t/, the noise often indicated the presence of affrication, with an [s]-like rather than an [h]-like spectrum. This [s]-like frication noise was often relatively long, which is probably why the VOT values of /t/ were relatively long as well. As Figures 2 and 3 show, in all the environments where it did not follow a fricative, /t/ always had the longest VOT; its VOT values were clearly skewed toward longer durations. It is debatable whether this noise portion should be considered aspiration, or part of the VOT at all, rather than an [s]-like frication release following the stop closure, i.e., an affricate [ts]. In some cases even the stop closure phase seemed to be absent, and only a long, [s]-like fricative was observable (Figure 6). More thorough acoustic research is needed to uncover the spectral properties of the post-release/pre-voicing noise: if it turns out to be affrication, it should be treated differently from aspiration. The fact that the subject always articulated /t/ with a relatively long noisy portion, regardless of whether the following vowel was stressed, suggests that we are perhaps dealing with a general affrication of /t/ here. It is noteworthy, however, that just like aspiration, this supposed general affrication is very short after /s/ (and, to a certain extent, after /f/), which indicates that aspiration and affrication behave similarly after fortis fricatives. Again, further acoustic experiments are needed to determine whether the release noise of /t/ differs from that of /p/ and /k/ in spectral properties and length after /s/.
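One way such a spectral comparison could be quantified, offered here only as an assumption about future work and not as a method used in this study, is the spectral centre of gravity (the power-weighted mean frequency) of the noise between release and voicing onset: [s]-like noise concentrated in the higher frequencies would be expected to yield higher values than [h]-like aspiration noise. Praat offers a comparable centre-of-gravity query for Spectrum objects; the NumPy sketch below shows the computation itself. The test signals are synthetic, not speech: white noise, and its first difference as a crude stand-in for noise weighted toward high frequencies.

```python
import numpy as np


def centre_of_gravity(noise: np.ndarray, sample_rate: float) -> float:
    """Power-weighted mean frequency (Hz) of a noise excerpt."""
    windowed = noise * np.hanning(len(noise))      # taper the edges to limit spectral leakage
    power = np.abs(np.fft.rfft(windowed)) ** 2     # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(noise), d=1.0 / sample_rate)
    return float(np.sum(freqs * power) / np.sum(power))


sr = 22050
rng = np.random.default_rng(0)
white = rng.standard_normal(sr // 10)   # 100 ms of flat-spectrum noise
tilted = np.diff(white)                 # first difference boosts high frequencies
print(round(centre_of_gravity(white, sr)), round(centre_of_gravity(tilted, sr)))
```

For real tokens, the excerpt would be the interval between the VOT boundaries described in the method section; comparing /t/ with /p/ and /k/ in this way would address the question raised above.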
The results of the pilot experiment presented in this paper have shown that a larger-scale experiment on the post-fricatival aspiration of English stops is a worthwhile endeavour. First of all, the results indicate that stops tend to be unaspirated after /f/, too, just as they are after /s/. The variation may be greater after /f/ than after /s/, however. A possible reason for this variation is the very low number of words that contain non-final /ft/: unlike with /s/ + stop clusters, speakers have no established pattern to follow in words with /ft/, so there is no clear categorization (aspirated vs. unaspirated) of the stop here, although the results of this pilot experiment indicate that deaspiration is perhaps the preferred choice after /f/, too.
Secondly, the question remains why aspiration is generally not allowed after /s/, where stops must be lenis, with a short-lag VOT. If a larger-scale experiment shows that this restriction also holds after /f/, then the question may be rephrased as follows: what makes fricatives and aspiration incompatible with each other? Most explanations of the distribution of aspiration have relied on syllable structure (see our own definition in (1)). This paper does not evaluate those approaches in detail; we mention just one problem with the syllabic approach. The syllabic explanation claims that stops are not aspirated after /s/ because they are in fact not word- or syllable-initial: it is /s/ that is initial; in other words, /s/ and the stop are tautosyllabic: port vs. sport, en.cóu.rage vs. di.scóver (syllable boundaries are indicated by dots). This syllabification, even though it violates the sonority sequencing principle (/s/ is more sonorous than the stop), may be supported by the fact that /s/ + stop clusters also occur word-initially. However, the syllabification fi.ftéen is difficult to justify on these grounds, as there are no words beginning with /ft/.
Non-syllabic phonological models, such as phonetically grounded approaches (especially those relying on the perception of contrast), exemplar models, and analogical models, seem to fare better at explaining why aspiration and fricatives are incompatible with each other. A perception-based approach that seems worth pursuing hypothesizes that the turbulent noise that acoustically characterizes fricatives, especially sibilants like [s], is similar to the noise that appears during aspiration; consequently, as Silverman (2006) writes, “it is probably not so easy to reliably distinguish [st] from [stʰ] in running speech, and languages may tend to eliminate this contrast should it arise, especially within a word, rather than between words” (Silverman 2006, 176–177). The finding of the present experiment that aspirated /t/ is often affricated as [ts], or even replaced by [s], supports the idea that fricative noise, especially that of sibilants, is similar to aspiration noise.
Some of the future research questions worth pursuing within these “usage-based” frameworks are the following: why is it the sibilant fricatives that seem to be the most perceptually incompatible with aspiration?
Boersma, Paul and David Weenink. 2015. “Praat: Doing Phonetics by Computer.” www.praat.org/.
Cruttenden, Alan. 2014. Gimson’s Pronunciation of English (8th Edition). London & New York: Routledge.
Docherty, Gerard J. 1992. The Timing of Voicing in British English Obstruents. Berlin & New York: Foris.
Gefferth, Katalin. 1997. “The Distribution of Aspiration in English.” The Odd Yearbook 4: 3–11.
González, José Antonio Mompeán. 2006. “The Phonological Status of English Oral Stops After Tautosyllabic /s/: Evidence from Speakers’ Classificatory Behaviour.” Language Design: Journal of Theoretical and Experimental Linguistics 8: 69–101.
Gussmann, Edmund. 2002. Phonology: Theory and Analysis. Cambridge: Cambridge University Press.
Harris, John. 1994. English Sound Structure. Oxford & Cambridge, MA: Blackwell.
Jansen, Wouter. 2004. “Laryngeal Contrast and Phonetic Voicing: A Laboratory Phonology Approach to English, Hungarian, and Dutch.” Doctoral dissertation, Rijksuniversiteit Groningen.
Kaye, Jonathan D. 1992. “Do You Believe in Magic? The Story of s+C Sequences.” SOAS Working Papers in Linguistics & Phonetics 2: 293–313.
Keating, Patricia A. 1984. “Phonetic and Phonological Representation of Stop Consonant Voicing.” Language 60: 286–319.
Ladefoged, Peter and Ian Maddieson. 1996. The Sounds of the World’s Languages. Oxford: Blackwell.
Liberman, Alvin M., Pierre Delattre, and Franklin S. Cooper. 1952. “The Role of Selected Stimulus-Variables in the Perception of the Unvoiced Stop Consonants.” The American Journal of Psychology 65: 497–516.
Lisker, Leigh and Arthur Abramson. 1964. “A Cross-Language Study of Voicing in Initial Stops: Acoustical Measurements.” Word 20: 384–422.
Nádasdy, Ádám. 2003. Practice Book in English Phonetics and Phonology. Budapest: Nemzeti Tankönyvkiadó.
Nádasdy, Ádám. 2006. Background to English Pronunciation. Budapest: Nemzeti Tankönyvkiadó.
R Development Core Team. 2008. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing. www.R-project.org.
Silverman, Daniel. 2006. A Critical Guide to Phonology: Of Sound, Mind, and Body (Continuum Critical Introductions to Linguistics). London & New York: Continuum.
Wells, John C. 2000. Longman Pronunciation Dictionary. Harlow: Longman/Pearson Education.
Wingate, Anne H. 1982. “A Phonetic Answer to a Phonological Problem.” UCLA Working Papers in Phonetics 54: 1–27.
Wright, Richard. 2004. “A Review of Perceptual Cues and Cue Robustness.” In Phonetically Based Phonology, edited by Bruce Hayes, Robert Kirchner, and Donca Steriade, 34–57. Cambridge: Cambridge University Press.
Zuraw, Kie and Sharon Peperkamp. 2015. “Aspiration and the Gradient Structure of English Prefixed Words.” In Proceedings of the 18th International Congress of Phonetic Sciences, edited by The Scottish Consortium for ICPhS 2015, Paper number 0382.1–5. Glasgow: University of Glasgow.