Workshop on practical and instrumental phonetics

Bantoid, 23 March 2022
Universität Hamburg

Matthew Faytak (Univ. at Buffalo)
Katie Franich (Univ. of Delaware)

Overview

Part 1:

Part 2:

Part 3:

Regular breaks will be interspersed

About the slides

These slides are a web page

The slides are hosted here on GitHub

References

Nearly all references mentioned during the workshop are linked at the end of the slides

Introduction

The situation

Africa has long been regarded as central to phonetic description (Ladefoged 1968; Maddieson & Sands 2019)

But relatively little of this scholarship involves African scholars

Participants

We presume that you, the participants, are:

If you are not in this group, we ask that you prioritize those in this group for questions and feedback

All of us

Let’s assume that we are all committed to:

Definitions

This workshop is an introduction to instrumental phonetics

“Phonetic” work may also refer to non-contrastive subphonemic detail

Why instrumental phonetics?

Several practical advantages over impressionistic approaches

Neutrality

In impressionistic phonological description, all presentation of data is filtered through the worker’s theoretical analysis

Phonetic recordings allow better testing of hypotheses about phonological structure

Neutrality

Even a trained phonetic ear is prone to making occasional mistakes based on perceptual bias

Precision

Not all contrasts can be easily identified by the analyst’s ear, and especially not transcribed quickly in the moment

Recordings allow for careful listening later

Community use

Recordings are required for instrumental phonetic work: many incidental benefits

Complementary methods

The aim is not to displace impression-based methods, but to complement them

Recording acoustic data

Desired qualities

We always want acoustic speech data to be:

Certain details of format are also important:

Good recording

Here is an example of a good recording

The following slides contain recordings which fail on one of the points above

Too noisy

Recordings should not contain excessive background noise

Any noise, however quiet it seems to your ears in the moment, will be much more prominent in the recording later

How to improve

Listen carefully to your surroundings, and avoid:

Speaker should also minimize non-speech noise:

Too much echo

If echo is strong, speech ends up overlapping itself; problem for listening and analysis later

How to improve: listen for echo and choose surroundings which have less

Too loud (clipped)

If the speaker is too loud and/or too close to the microphone, the device cannot register the full amplitude of the signal; clipping results

How to improve: make test recordings after you position your microphones
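To illustrate what clipping looks like in the samples themselves, here is a minimal sketch using only Python's standard library (the file name and the near-full-scale threshold are illustrative choices, not Praat settings):

```python
import struct
import wave

def clipping_ratio(path, threshold=32760):
    """Return the fraction of samples at or near full scale in a 16-bit PCM WAV."""
    with wave.open(path, "rb") as w:
        assert w.getsampwidth() == 2, "this sketch expects 16-bit PCM"
        frames = w.readframes(w.getnframes())
    # Each 16-bit sample is a signed integer in [-32768, 32767]
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    clipped = sum(1 for s in samples if abs(s) >= threshold)
    return clipped / len(samples)
```

If more than a tiny fraction of samples sit at full scale, lower the recording gain or move the microphone back and re-record.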

Equipment

At the most basic level, you only need a device to make recordings on. One of the below:

A way of transferring files off of the device and backing up for future analysis:

Headphones, to check recording quality

Equipment

External microphones can increase the quality of acoustic data by recording less echo and background noise

An acoustic baffle can reduce echo

Other tips

Do not use computer noise reduction/filtering in general

Recording over Zoom works surprisingly well, if all else fails (Ge, Xiong & Mok 2021; Sanker et al. 2021)

BREAK

Basic Praat (tutorial)

Downloading and configuring

Download Praat from fon.hum.uva.nl or praat.org

Windows and manual

Both the object window (shown below) and picture window appear when you open Praat (we’ll ignore the picture window for now)

Basic issues can often be solved using the manual

First recording

Let’s record ourselves saying a “good day, good afternoon” greeting in whichever language you would like

Stored sound

You should now have a Sound in your objects window

Saving your recording

We’ll end this by saving our work as a .WAV file (the standard format for phonetics work)
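As a sketch of what a .WAV file actually contains (uncompressed samples plus a sample rate and bit depth), the following standard-library Python writes a short mono 16-bit file; the tone parameters are arbitrary stand-ins for speech, not values Praat uses:

```python
import math
import struct
import wave

RATE = 44100   # samples per second
DUR = 0.5      # seconds
FREQ = 220.0   # Hz; a pure tone standing in for speech

# Synthesize a sine wave and scale it to the 16-bit integer range
samples = [int(0.5 * 32767 * math.sin(2 * math.pi * FREQ * t / RATE))
           for t in range(int(RATE * DUR))]

with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)    # mono
    w.setsampwidth(2)    # 16-bit samples
    w.setframerate(RATE)
    w.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

Saving as uncompressed .WAV keeps every sample intact, which is why it is the standard for phonetics work (lossy formats like .mp3 discard detail).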

Importing a sound file

We might also import sounds which have already been recorded

Making and using TextGrids

Viewing a sound file

We’ll start by viewing Babanki “hills of bushes”

Viewer window has button and keyboard controls

We may wish to know where the words and segments are, but we lack useful landmarks at this point

Making a TextGrid

TextGrids are one of the most useful features of Praat: annotate and organize your audio files

Interval tiers

Let’s create an interval tier called “sentence” and use it to transcribe the utterance

Point tiers

Let’s use the point tier “stop” to mark off where each [t] release happens

Adding and removing tiers

We might be dissatisfied with how the tone marks are displaying; we could make a new tier for tones (autosegmental style)

Amending the TextGrid using the new tier:

Saving TextGrids

When you are done, you must save your TextGrid, using CTRL+S or the menu shown below

Reading and interpreting data

Data displays

We may wish to provide further details in our TextGrids, but we encounter another problem here: how to interpret the data?

Waveforms

Waveforms show sound pressure (the pressure that sound waves make on the microphone) versus time
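A waveform is just this pairing of pressure values with times. A minimal standard-library sketch of recovering it from a file (assuming mono 16-bit PCM, the format produced above):

```python
import struct
import wave

def waveform(path):
    """Return parallel lists of times (s) and amplitudes from a mono 16-bit WAV."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        n = w.getnframes()
        data = struct.unpack("<%dh" % n, w.readframes(n))
    # One amplitude sample every 1/rate seconds
    times = [i / rate for i in range(n)]
    return times, list(data)
```

Plotting amplitude against time from these two lists reproduces exactly what Praat's waveform display shows.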

Sonority

Sounds produced with a more open mouth are louder and more sonorous (Parker 2008); these appear thicker on the waveform

Spectrograms

We can also show the spectrogram for our recording

Spectrograms are three-dimensional, and show time vs. frequency vs. acoustic energy (darkness or color)
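Conceptually, each column of a spectrogram is the magnitude spectrum of one short windowed stretch of the waveform. A naive standard-library sketch of one such column (real tools use the much faster FFT; this slow direct DFT is for illustration only):

```python
import cmath
import math

def frame_spectrum(frame, rate):
    """Magnitude spectrum of one analysis frame (naive DFT).

    A spectrogram stacks many of these columns side by side:
    x = time (frame index), y = frequency (bin), darkness = magnitude.
    """
    n = len(frame)
    # A Hann window reduces spectral leakage at the frame edges
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(frame)]
    mags = []
    for k in range(n // 2):  # bins up to the Nyquist frequency
        z = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        mags.append(abs(z))
    freqs = [k * rate / n for k in range(n // 2)]
    return freqs, mags
```

Feeding it a pure tone produces a single dark band at the tone's frequency, which is exactly the horizontal-striping logic behind reading formants off a spectrogram.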

Interpreting spectrograms

Vowels, semivowels, and approximants show characteristic striping: horizontal bands (formants) and vertical striations (individual glottal pulses)

Interpreting spectrograms

Nasals look somewhat like vowels, with “smudged” formants and less energy (darkness)

Interpreting spectrograms

Fricatives have high-frequency or low-frequency noise

Interpreting spectrograms

Stops show an absence of (most) energy followed by a burst across the whole spectrum

Affricates look similar, but as if they were followed by a fricative

Prenasalized segments

Prenasalized stops and affricates often have a long nasal closure followed by a short oral closure which we can see clearly in the spectrogram

Drawing figures

Why make figures?

Figures are an easy way to present a small amount of phonetic data in scientific papers

This section: professional-looking and informative data displays

Picture window

Waveforms and spectrograms can be “drawn” or “painted” (respectively) in the picture window

Drawing a waveform

“Draw” (in the object window) is for any line-based drawings, including waveforms

The result: a waveform drawn within the plot area

Drawing a TextGrid

TextGrids can be drawn as well, using the same menu as Sounds

Combining TextGrids and waveforms

A Sound and TextGrid can be drawn together very easily: simply select both and choose the Draw menu as before

Extracting a spectrogram

“Paint” is for drawing spectrograms and other objects, but: we need the right object to do this

Painting a spectrogram

This sends a Spectrogram object to the object list; when we select this we get a “Paint” option under Draw

The result: as expected, but a bit too tall for its width (we might make the plot area wider/shorter)

Spectrograms and TextGrids

Drawing a spectrogram and a TextGrid at the same time is a bit more complicated

  1. Paint the spectrogram, but uncheck “Garnish”
  2. Add the Y axis marks using the “Margins” menu
  3. Add a Y axis label, usually “Frequency (Hz)”

Spectrograms and TextGrids

  4. Resize the plot area to be taller than the spectrogram (pictured), and Draw the TextGrid

The result: TextGrid annotations on top of the spectrogram

Saving

As with everything else in Praat, you must save before closing the picture window or you will lose your work

BREAK

Measuring phonetic properties

Why numerical measurements?

Now we’ll turn to taking numerical measurements in Praat

Rather than displaying an entire sound file as a waveform or spectrogram, it is often more useful to focus on a specific phonetic property

Why numerical measurements?

Numerical measurement also lets us measure many utterances and summarize them, which helps us handle phonetic variation

Because of this, it’s best to collect many observations and average or model the data to smooth over noise and variation

Duration

One of the simplest measures: duration of segments or words

Model use

An example from Babanki (bbk-prenas.wav): proportion of prenasalized consonants which is nasal Faytak & Akumbu (2021)
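The measure itself is simple arithmetic over three boundary times taken from a TextGrid. A sketch (the boundary times below are hypothetical, not taken from bbk-prenas.wav):

```python
def nasal_proportion(nasal_start, oral_start, release):
    """Proportion of a prenasalized stop's closure that is nasal.

    Takes three boundary times in seconds: start of the nasal murmur,
    start of the oral closure, and the stop release.
    """
    total = release - nasal_start
    nasal = oral_start - nasal_start
    return nasal / total

# Hypothetical boundary times for one token
print(round(nasal_proportion(0.100, 0.180, 0.220), 2))  # prints 0.67
```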

Model use

Other phenomena studied with duration (not exhaustive):

Pitch (fundamental frequency, f0)

Pitch (like all other measurements which we’ll discuss) has a dedicated menu

Resulting pitch track:

Pitch settings

Settings for pitch are important

Pitch settings

A quick change to pitch range fixes the jump issue
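Praat’s pitch tracker is based on autocorrelation: it looks for the lag at which the waveform best matches a shifted copy of itself, considering only lags within the period range implied by the floor and ceiling. A heavily simplified standard-library sketch of that idea (not Praat’s actual algorithm), which also shows why the range settings matter:

```python
def estimate_f0(samples, rate, floor=75.0, ceiling=600.0):
    """Very simplified autocorrelation pitch estimate for one frame.

    floor/ceiling play the same role as Praat's pitch range settings:
    they bound which lags (candidate periods) are even considered.
    """
    lo = int(rate / ceiling)  # shortest period considered
    hi = int(rate / floor)    # longest period considered
    best_lag, best_r = lo, float("-inf")
    for lag in range(lo, min(hi, len(samples) // 2)):
        # Correlation of the frame with itself shifted by `lag` samples
        r = sum(samples[i] * samples[i + lag]
                for i in range(len(samples) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return rate / best_lag
```

A pitch range set too wide or too narrow lets the tracker pick a wrong lag (octave jumps), which is why a quick range adjustment fixes jumpy tracks.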

Model use

All tonal phenomena involve f0

A Babanki example:

Model use

Sub-phonemic effects of intonation on tonemes can be examined (figure from Rialland & Aborobongui 2016)

Other tone topics (not exhaustive):

Intensity

A measure of loudness, measured in decibels (dB)
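The decibel scale is logarithmic: a tenfold increase in pressure amplitude corresponds to +20 dB. A sketch of the conversion from raw amplitude to dB (the reference value is arbitrary here, so only differences between values are meaningful):

```python
import math

def intensity_db(samples, ref=1.0):
    """Root-mean-square amplitude of a frame, expressed in decibels re: ref."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / ref)
```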

Model use

Intensity is useful for measuring degree of constriction

Babanki example: stressed syllables (stems) and unstressed syllables (prefixes) don’t seem to be differentiated by intensity

Model use

Implosive vs. non-implosive voiced stops have different intensity profiles (figure from Nagano-Madsen & Thornell 2012)

f0 and intensity figures

f0 (pitch) and intensity tracks can easily be Drawn to figures as seen above

Formant frequencies

Formant frequencies cue vowel quality and a variety of other contrasts

Formant frequencies

Turn on “show formants”, and formant tracks for the first three formants (F1, F2, F3) appear

Formant settings

Estimating formant frequencies requires calibration for every individual speaker: low pitched voices need different settings compared to high pitched voices

The default settings work well for higher-pitched voices

Formant tracking

The default settings of 5 formants in 5500 Hz (for higher-pitched voices) don’t work well for the very low-pitched voice of the Babanki speaker:

Better: changed to 4.5 formants, 4200 Hz; much lower frequency range

Formant figures

Formant measurements are most informative in a plot of F1 against F2 (which is quite hard to make in Praat), but formant tracks can also be drawn like any other measure

F1-F2 scatterplots

F1-F2 scatterplots have F2 on the x axis, F1 on the y axis, with both axes reversed

Model use

Better scatterplot figures can be made using tabular data (we’ll discuss later) (figure from Faytak & Akumbu 2021)

Model use

Formant data is useful in figuring out harmony systems, ATR or otherwise

Other uses for formant measures (non-exhaustive):

Voicing (“Pulses”)

Praat also provides measures relating to voicing: these are grouped under the unintuitive name “Pulses”

The Pulses menu contains the same “showing”, measure-getting, and drawing functions as other menus

The result: each detected “voice pulse”, shown over waveform (not spectrogram)

Display pulses

The pulses themselves can be plotted with a TextGrid like any other similar object

Voicing report

If voicing has a predictable timing but varies in extent, the voicing report may be useful (access in the Pulses menu)

Model use

Voicing in unexpected places is common for labial-velars; this can be confirmed by looking for pulses (figure from Connell 1994)

Other voicing-related topics:

Text output and tabular data

Getting out of Praat

Praat is useful, but it has important limitations

Because of this, we often need to export data from Praat into other programs

Tabular data

The most effective way to export numerical measurements: tabular data, that is, spreadsheets

Google Sheets or Excel both work well (.xls, .txt, or .csv format)

Data construction: duration

We’ll use duration to demonstrate the most basic spreadsheet construction

  1. Select an interval
  2. Generate text output in Praat (“Query” menu)
  3. Copy-paste output into spreadsheet
  4. Add metadata

Repeat this process for each of your observations, starting a new row each time
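The copy-paste loop above builds up one row per observation, each with its metadata. Sketched in standard-library Python (the file name, words, and boundary times below are hypothetical examples):

```python
import csv

# Hypothetical observations: one dict per measured interval
rows = [
    {"file": "bbk-prenas.wav", "word": "mbə", "segment": "mb",
     "start_s": 0.412, "end_s": 0.534},
    {"file": "bbk-prenas.wav", "word": "ndɔ", "segment": "nd",
     "start_s": 1.208, "end_s": 1.317},
]

with open("durations.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(
        f, fieldnames=["file", "word", "segment", "start_s", "end_s", "dur_s"])
    writer.writeheader()
    for r in rows:
        # Duration is simply the interval's end time minus its start time
        r["dur_s"] = round(r["end_s"] - r["start_s"], 3)
        writer.writerow(r)
```

Saving as .csv keeps the table plain-text and readable by any spreadsheet or statistics program.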

Praat text output

Text output is generated using the menus above the data display

For any measure other than duration, menus behave slightly differently depending on whether your cursor has selected an interval or not

A shortcut exists to get the midpoint of any interval

“Get…” versus “Listing…”

“Get…” provides a single measurement, or a single mean measurement across an interval

“Listing…” provides every single measure within an interval

Exceptions:

Data construction: single pitch

Mean, max, and min f0 measures (from “Get…”) can be copy-pasted into their own cell just like our duration example

Listings (f0 over time) are trickier to paste:

Data construction: pitch listing

We’ll take pitch as an example, but these instructions will work for any listing

  1. Select interval
  2. Generate text output (“Pitch listing”) and copy entire contents of text box
    • If you’ve already got the header line, you don’t need the first line

Data construction: pitch listing

Continued:

  3. Paste into spreadsheet
  4. In the paste menu, split columns using “Custom”; use two spaces as the delimiter
  5. Fill in metadata

Times for listings

Unlike before, we need to contextualize the time values and normalize them

The best way to do this:

  1. Count rows in your series and add the total number of samples into each row in a total column
  2. Add an index value in a column indicating which number sample we are on (1, 2, 3 … total)
  3. Calculate percent of total duration by dividing index / total and adding this to a third column norm
  4. Be sure to include --undefined-- values as empty cells
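Steps 1-3 above amount to the following (a sketch, with None standing in for Praat’s --undefined-- values):

```python
def add_normalized_time(values):
    """Add total, index, and norm columns to a listing's values.

    values: measurements in time order; None marks an undefined sample.
    Returns (index, total, norm, value) tuples, where norm = index / total.
    """
    total = len(values)
    return [(i, total, i / total, v)
            for i, v in enumerate(values, start=1)]
```

The norm column lets you compare listings of different absolute durations on a common 0-100% time scale.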

Data construction: formant listing

Same as pitch listing, but note there are four columns for F1-F4, plus one for times

Much like above, but with more measurement columns:

Value of listings

Working with listings is more difficult than working with mean measures, but having time series of measurements is valuable

Coding

Beyond the scope of this tutorial, but more efficient in a number of areas: basic coding

While there is a steeper learning curve, the improvement to the process may pay off in the long run
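As one example of the payoff: instead of copy-pasting durations interval by interval, a short script can pull every labeled interval out of a saved TextGrid at once. A crude standard-library sketch for single-tier grids saved in Praat’s long text format (a real script should handle multiple tiers properly):

```python
import re

def read_intervals(textgrid_text):
    """Extract (start, end, label) triples from a Praat TextGrid (long text format).

    A crude sketch: it pulls every xmin/xmax/text triple and ignores
    tier structure, which is adequate for single-tier grids.
    """
    pattern = re.compile(
        r'xmin = ([\d.]+)\s*\n\s*xmax = ([\d.]+)\s*\n\s*text = "(.*)"')
    return [(float(a), float(b), t)
            for a, b, t in pattern.findall(textgrid_text)]
```

Running this over a folder of TextGrids replaces hours of copy-pasting with seconds of computation.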

BREAK

Articulatory data

Acoustics vs. articulation

Acoustics gives us an indirect idea of the movements of the articulators

Sometimes, though, we need to look directly at the articulators

Articulation measurement

We point out here that characterizing and measuring some articulations is much easier than usually expected

Lip articulation

Lips are easily seen moving since they are on the exterior of the face

Example: Babanki lip articulation during vowels (Faytak & Akumbu 2021)

Model use

Lip activity during Medumba [ʉ]: compressed in the direction of a bilabial stop (Olson & Meynadier 2015)

Subtle lip posture differences also known to occur on various fricatives and approximants

Model use

For faster articulations, short videos can also be useful

Palatographs

A “tongue-print” on the roof of the mouth which shows where articulatory activity takes place (Anderson 2008)

The method is more involved than simple photography:

  1. Paint tongue with mixture of oil and dark edible powder (chocolate or charcoal)
  2. Speaker produces a single token of a word with one lingual consonant
  3. Open mouth, insert mirror, photograph

Model use

Kom exhibits a contrast between two vowels which I will discuss further in my regular talk

Palatography reveals aspects of the lingual articulation of these two vowels

Model use

More canonically used for clear evidence of lingual place contrasts

Equipment

Everything that we’ve talked about in this section involves minimal equipment

Optional:

Figures vs. tabular data

The value of photographic evidence as a figure should be obvious: simply include the image

If you have larger numbers of images, you might convert these to tabular data for further analysis

Other articulations

It should be mentioned that certain articulations produced further back are not discussed here

However, ultrasound technology is gradually making it easier to image these articulations (Gick et al. 2006; Miller, Namaseb & Iskarous 2007; Allen, Pulleyblank & Ajíbóyè 2013; Hudu 2014)

Practical considerations

File naming and metadata

Name your recording files according to a consistent, logical pattern

To identify further details, speak them during the recording itself
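A tiny helper can enforce a naming pattern so it never drifts across sessions. The language_speaker_item_repetition scheme below is just one possible convention, not one prescribed by this workshop:

```python
def recording_name(lang, speaker, item, repetition, ext="wav"):
    """Build a predictable file name, e.g. 'bbk_s01_hills_r2.wav'.

    The scheme (language_speaker_item_repetition) is one hypothetical
    convention; any consistent, sortable pattern works.
    """
    return f"{lang}_s{speaker:02d}_{item}_r{repetition}.{ext}"
```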

File backups

Back up every recording in multiple locations if possible, to avoid technological problems or theft destroying your work

Processing tabular data

Tabular data, of whatever sort, needs to be averaged, summarized (mean/standard deviation), or submitted to a statistical model

Statistics

Simple statistical analysis and modeling are standard in phonetics for analysis of numerical data (t-testing, linear models, curve fitting, etc)
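As an illustration of the simplest case, a Welch two-sample t statistic (e.g. comparing mean vowel durations between two conditions) takes only a few lines of standard-library Python; real analyses are better done in dedicated statistics software:

```python
import math
import statistics as st

def welch_t(a, b):
    """Welch's two-sample t statistic (allows unequal variances).

    a, b: lists of measurements (at least two values each), e.g.
    durations in two conditions. Larger |t| = stronger evidence of a
    mean difference.
    """
    va, vb = st.variance(a) / len(a), st.variance(b) / len(b)
    return (st.mean(a) - st.mean(b)) / math.sqrt(va + vb)
```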

If you want to try stats yourself, it’s best to find software for handling tabular data:

Statistical power

If you plan to model the data, it’s important to have enough statistical power

For the average phonetics study, good power is obtained at 15-20 speakers (rule of thumb)

Conclusions

Summary

We’ve covered:

What remains to be seen:

An audit

In compiling these slides and the references they contain (as many papers which present phonetic evidence in African languages as possible), I reflected upon:

In the references provided here, it is clear that African linguists are underrepresented, regardless of how “African linguist” is defined

Open discussion

Some potential topics for discussion:

References

Akumbu, P. (2019). A featural analysis of mid and downstepped high tone in Babanki. In Clem, E., Jenks, P., & Sande, H., eds., Theory and description in African linguistics: Selected papers from the 47th Annual Conference on African Linguistics, 3–20. PDF

Allen, B., Pulleyblank, D., & Ajíbóyè, Ọ. (2013). Articulatory mapping of Yoruba vowels: an ultrasound study. Phonology, 30(2), 183-210. Abstract & PDF

Anderson, V. (2008). Static palatography for language fieldwork. Language Documentation & Conservation, 2(1), 1-27. PDF

Bendjaballah, S. & Le Gac, D. (2021). The acoustics of word-initial and word-internal voiced stops in Somali. Journal of the International Phonetic Association, first view. Abstract

Boyer, O., & Zsiga, E. (2013). Phonological devoicing and phonetic voicing in Setswana. In Ọla Orie, Ọ. and Sanders, K., eds., Selected Proceedings of the Annual Conference on African Linguistics, 43, 82-89. PDF

Connell, B. (2007). Mambila fricative vowels and Bantu spirantisation. Africana Linguistica, 13(1), 7-31. Article

Connell, B. (1994). The structure of labial-velar stops. Journal of Phonetics, 22(4), 441-476. Abstract

Danis, N. (2020). Yorùbá vowel deletion involves compensatory lengthening: Evidence from phonetics. Stellenbosch Papers in Linguistics Plus, 60(1), 1-12. Abstract & PDF

Faytak, M., & Akumbu, P. W. (2021). Kejom (Babanki). Journal of the International Phonetic Association, 51(2), 333-354. Article

Fransen, M. (1995). A Grammar of Limbum: A Grassfields Bantu Language Spoken in the North-West Province of Cameroon. PhD dissertation, Vrije Universiteit Amsterdam.

Gahl, S. (2008). Time and Thyme Are not Homophones: The Effect of Lemma Frequency on Word Durations in Spontaneous Speech. Language 84(3), 474-496. Abstract

Ge, C., Xiong, Y., & Mok, P. (2021). How reliable are phonetic data collected remotely? Comparison of recording devices and environments on acoustic measurements. In Proc Interspeech 2021, 1683-1687. PDF

Genzel, S. & Kügler, F. (2011). Phonetic realization of automatic (downdrift) and non-automatic downstep in Akan. Proceedings of ICPhS 17, Hong Kong. PDF

Gick, B., Pulleyblank, D., Campbell, F., & Mutaka, N. (2006). Low vowels and transparency in Kinande vowel harmony. Phonology, 23(1), 1-20. PDF

Gjersøe, S., Nformi, J., & Paschen, L. (2019). Hybrid falling tones in Limbum. In Clem, E., Jenks, P. & Sande, H., eds., Theory and Description in African Linguistics: Selected Papers from the 47th Annual Conference on African Linguistics, 95-118. PDF

Hamlaoui, F. & Makasso, E. (2019). Downstep and recursive phonological phrases in Bàsàá (Bantu A43). In Clem, E., Jenks, P. & Sande, H., eds., Theory and Description in African Linguistics: Selected Papers from the 47th Annual Conference on African Linguistics, 155-175. PDF.

Hudu, F. (2014). [ATR] feature involves a distinct tongue root articulation: Evidence from ultrasound imaging. Lingua, 143, 36-51. Abstract

Hudu, F. (2016). A phonetic inquiry into Dagbani vowel neutralisations. Journal of African Languages and Linguistics, (37)1, 59-89. Abstract

Hyman, L. (2014). How to study a tone language. Language Documentation & Conservation, 8, 525-562. PDF

Koffi, E. (2018). The acoustic vowel space of Anyi in light of the cardinal vowel system and the Dispersion Focalization Theory. In Kandybowicz, J., Major, T., Torrence, H., & Duncan, P., eds., African linguistics on the prairie: Selected papers from the 45th Annual Conference on African Linguistics. PDF

Ladefoged, P. (1990). What do we symbolize? Thoughts prompted by bilabial and labiodental fricatives. Journal of the International Phonetic Association, 20(2), 32-36. Abstract     Preprint PDF

Ladefoged, P. (1968). A phonetic study of West African languages: An auditory-instrumental survey. Cambridge University Press.

Lewis, D., & Shittu, S. (2014). Phonemic Status of Len Fricative-Vowels. Ibadan Journal of Humanities Studies, 24, 27-45. PDF

Lotven, S. & Berkson, K. (2019). The phonetics and phonology of depressor consonants in Gengbe. In Clem, E., Jenks, P. & Sande, H., eds., Theory and Description in African Linguistics: Selected Papers from the 47th Annual Conference on African Linguistics, 249-268. PDF

Maddieson, I. & Sands, B. (2019). The sounds of the Bantu languages. In Van de Velde, M., Bostoen, K., Nurse, D., & Philippson, G., eds., The Bantu Languages: Second Edition, 79-127. Routledge. Preprint PDF

Mathes, T. & Chebanne, A. (2018). High tone lowering and raising in Tsua. Stellenbosch Papers in Linguistics Plus, 54, 1-16. Abstract & PDF

McCollum, A. & Essegbey, J. (2020). Initial prominence and progressive vowel harmony in Tutrugbu Phonological Data and Analysis 2(3), 1-37. Abstract & PDF

McKinney, N. (1990). Temporal characteristics of fortis stops and affricates in Tyap and Jju. Journal of Phonetics, Abstract

McPherson, L. (2020). Seenku. Journal of the International Phonetic Association, 50(2), 220-239. Abstract

Miller, A., Namaseb, L., & Iskarous, K. (2007). Tongue body constriction differences in click types. Laboratory Phonology, 9, 643-656. PDF

Monaka, K. (2005). Shekgalagari stops and theories of phonological representation. Lwati: A Journal of Contemporary Research, 2, 24-42. Abstract & PDF

Myers, S., Namyalo, S., & Kiriggwajjo, A. (2019). F0 timing and tone contrasts in Luganda. Phonetica, 76(1), 55-81. Abstract

Nabirye, M., de Schryver, G., & Verhoeven, J. (2016). Lusoga (Lutenga). Journal of the International Phonetic Association, 46(2), 219-228. Abstract & PDF

Nagano-Madsen, Y. & Thornell, C. (2012). Acoustic properties of implosives in Bantu Mpiemo. In Eriksson, A. & Abelin, Å., eds., Proceedings of FONETIK 2012, Gothenburg, 73-76. PDF

Naidoo, S. (2012). A re-evaluation of the Zulu implosive [ɓ]. South African Journal of African Languages, 30(1), 1-10. Abstract

Nakagawa, H. (2008). Aspects of the phonetic and phonological structure of the G|ui language (Doctoral dissertation). Synopsis

Naumann, C. (2016). The phoneme inventory of Taa (West !Xoon dialect). In Vossen, R. & Haacke, W., eds., Lone Tree: Scholarship in Service of the Koon. Essays in memory of Anthony Traill. Köln: Rüdiger Köppe Verlag. PDF

Nforgwei, S. (2004). A study of the phonological and syntactic processes in the standardisation of Limbum. PhD dissertation, Université de Yaoundé. PDF

Olson, K. & Meynadier, Y. (2015) On Medumba bilabial trills and vowels. Proceedings of ICPhS 18, Glasgow. PDF

Oppong, O. (2021). Pitch reset in Asante Twi, a dialect of Akan. MA Thesis, University of Helsinki. Abstract & PDF

Parker, S. (2008). Sound level protrusions as physical correlates of sonority. Journal of Phonetics, 36(1), 55-90. Abstract

Rialland, A. & Aborobongui, M. (2016). How intonations interact with tones in Embosi (Bantu C25), a two-tone language without downdrift. In Downing, L. & Rialland, A., eds., Intonation in African tone languages 195-xxx. Berlin: Mouton de Gruyter. Abstract     PDF

Ritchart, A. & Rose, S. (2015). Schwas in Moro Vowel Harmony. In Kramer, R., Zsiga, E., & Tlale Boyer, O., eds., Selected Proceedings of the 44th Annual Conference on African Linguistics, 231-242. PDF

Sanker, C., Babinski, S., Burns, R., Evans, M., Johns, J., Kim, J., Smith, S., Weber, N., & Bowern, C. (2021). (Don’t) try this at home! The effects of recording devices and software on phonetic analysis. Language, 97(4), e360-e382. PDF

Solé, M. J., Hyman, L. M., & Monaka, K. C. (2010). More on post-nasal devoicing: The case of Shekgalagari. Journal of Phonetics, 38(4), 604-615. Abstract Preprint

Starwalt, C. (2008). The acoustic correlates of ATR harmony in seven-and nine-vowel African languages: A phonetic inquiry into phonological structure. PhD dissertation, The University of Texas at Arlington. PDF

Traill, A., Khumalo, J., & Fridjhon, P. (1987). Depressing facts about Zulu. African Studies 46(2), 255-274. Abstract & PDF

Utman, J., & Blumstein, S. (1994). The influence of language on the acoustic properties of phonetic features: A study of the feature [strident] in Ewe and English. Phonetica, 51(4), 221-238. Abstract

Whalen, D. H., DiCanio, C., & Dockum, R. (2020). Phonetic documentation in three collections: Topics and evolution. Journal of the International Phonetic Association, 52(1), 1-27. Abstract

Zee, E. (1981). Effect of vowel quality on perception of post–vocalic nasal consonants in noise. Journal of Phonetics, 9(1), 35-48. Abstract