Speech Recognition and Accents

I don’t have time for a lengthy post today, which is unfortunate: the way software deals with regional accents deserves a longer discussion (led by someone more knowledgeable than me, frankly). Regardless, I want to share a Slate article apropos of the passing of Steve Jobs. Apparently, Apple’s speech recognition software has a unique way of processing regional accents (thanks to Twitter friend ‘@opedr’ for pointing it out to me):

Take, for example, the plosive consonant T…British people tend to pronounce the T sound in butter much more clearly than Americans, who swallow it. Eventually, the program establishes a kind of bell curve for the phoneme, and it will interpret any sound whose frequencies and other physical characteristics fall within the parameters of that curve as a possible attempt to produce that phoneme.

You can read the whole thing here. Forgiving the inexactitude of the descriptor ‘swallow’ (how do Americans ‘swallow’ t’s?), it’s an interesting, if brief, read.

If Apple’s speech-recognition program creates a bell curve for the accents of English, this prompts an obvious question: what accent does the ‘middle’ of the curve resemble? The program must start off with some type of ‘standard’ set of pronunciations and treat other accents as deviations from it. Is the ‘standard’ accent American? British? Or some kind of computed ‘average’?
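
To make the ‘middle of the curve’ question a bit more concrete, here is a rough sketch in Python. It is not how Apple’s software works (the article doesn’t say); it only assumes that a phoneme is modelled as a single bell curve over one numeric acoustic feature. The feature, the accent labels and every number below are made up for illustration.

```python
import math
from statistics import mean, stdev

# Invented measurements of one acoustic feature (imagine something like the
# duration, in ms, of the consonant in "butter"). The accent labels and all
# values are hypothetical, chosen only to illustrate the idea.
samples = {
    "accent_a": [18.0, 20.0, 22.0, 19.0, 21.0, 20.0],  # six recordings
    "accent_b": [48.0, 52.0, 50.0],                    # three recordings
}


def fit_gaussian(values):
    """Return (mean, standard deviation) of a bell curve fitted to values."""
    return mean(values), stdev(values)


def log_likelihood(x, mu, sigma):
    """Log of the normal density at x: how plausible x is under the curve."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


# One curve per accent, plus one curve fitted to all recordings pooled together.
per_accent = {name: fit_gaussian(vals) for name, vals in samples.items()}
pooled = [v for vals in samples.values() for v in vals]
pooled_mu, pooled_sigma = fit_gaussian(pooled)

print("per-accent means:", {name: mu for name, (mu, _) in per_accent.items()})
print("pooled mean:", pooled_mu)
```

In this toy setup the centre of the pooled curve is just the average of whatever recordings went in, so it sits closer to the accent that contributed more of them. If real systems behave anything like this, the ‘standard’ would be less a chosen accent than a reflection of the training data.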


About Ben

Ben T. Smith launched his dialect fascination while working in theatre. He has worked as an actor, playwright, director, critic and dialect coach. Other passions include linguistics, urban development, philosophy and film.
This entry was posted in English Phonetics.

8 Responses to Speech Recognition and Accents

  1. Danny Ryan says:

    “Swallow” is just one of those inaccurate laypersons’ linguistic terms, comparable with “guttural” used for anything an English speaker considers “harsh” or unusual.
    The typical “American” /t/ in ‹butter› is usually a flap [ɾ].

  2. Sravana says:

    This approach is not novel or unique to the iPhone. Speech recognition in general models phonemes as probability distributions over sounds. Each distribution is something like a bell curve and is found by training over many examples of that phoneme. These prior distributions can be adapted to a different dialect (or speaker) by training on sentences in that dialect. I’m guessing that Apple’s systems start off with some general American dialect and adapt it to non-American accents.

    There is also an issue of whether it is the phoneme’s acoustic characteristics that vary between dialects, or the word’s pronunciation itself. Examples like t-flapping can be interpreted either way — you can modify the acoustic model for ‘t’, or modify the pronunciation of ‘butter’.

  3. Ellen K. says:

    The article says towards the end that there are settings for 5 different kinds of English. So it sounds to me like there are 5 different standards, not one, including separate British and US standards. Why it then gives that earlier example of British versus US t’s, I don’t know.

    • trawicks says:

      I believe they’re referring to Google’s Voice Search in that paragraph. The article is confusing in this regard, because it implies in paragraphs 2 and 3 that they’re discussing features of Apple’s Voice Recognition software, then follows this by saying that Apple is ‘tight lipped’ about the project. You’re right, though, it more or less negates everything written before.

  4. Eric Armstrong says:

    Apple isn’t releasing Siri in Canada (yet) as far as I can tell because they haven’t developed a model for Canadian speech yet. My understanding is that, though many Canadians would probably have success with the US version of the software, Apple (and Nuance, who is behind the Dragon engine for speech recognition that Siri uses now) isn’t releasing it to Canadians because they haven’t standardized for our regional variations. Whether we’ll be able to try Siri out (when the iPhone 4S is released) with the US model at all is yet to be seen.

  5. Andrej Bjelaković says:

    And what about all the British speakers who pronounce ‘butter’ with a glottal stop?
