Speech Recognition and Accents

I don’t have time for a lengthy post today, which is unfortunate: the way software deals with regional accents deserves a longer discussion (led by someone more knowledgeable than me, frankly). Regardless, I want to share a Slate article apropos of the passing of Steve Jobs. Apparently, Apple’s speech recognition software has a unique way of processing regional accents (thanks to Twitter friend ‘@opedr’ for pointing it out to me):

Take, for example, the plosive consonant T…British people tend to pronounce the T sound in butter much more clearly than Americans, who swallow it. Eventually, the program establishes a kind of bell curve for the phoneme, and it will interpret any sound whose frequencies and other physical characteristics fall within the parameters of that curve as a possible attempt to produce that phoneme.

You can read the whole thing here. Forgiving the inexactitude of the descriptor ‘swallow’ (how do Americans ‘swallow’ t’s?), it’s an interesting, if brief, read.

If Apple’s speech-recognition program creates a bell curve for the accents of English, this prompts an obvious question: what accent does the ‘middle’ of the curve resemble? The program must start off with some type of ‘standard’ set of pronunciations and treat other accents as deviations from it. Is the ‘standard’ accent American? British? Or some kind of computed ‘average’?
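To make the ‘middle of the curve’ question concrete, here is a minimal sketch in Python. It is not Apple’s method: the single ‘acoustic feature’, the numbers, and the function names are all invented for illustration, and real recognizers work with many features at once. It fits one bell curve to each accent’s /t/ and one to both accents pooled together, which is one way a computed ‘average’ could arise.

```python
import math
from statistics import mean, pstdev

# Hypothetical one-dimensional "acoustic feature" values for the /t/ in "butter".
# These numbers are invented for illustration only.
british_t = [8.0, 9.0, 10.0, 9.5, 8.5]
american_t = [3.0, 4.0, 3.5, 4.5, 5.0]


def fit_gaussian(samples):
    """Return (mean, standard deviation) of a bell curve fitted to the samples."""
    return mean(samples), pstdev(samples)


def log_likelihood(x, mu, sigma):
    """Log density of x under a normal distribution with mean mu and std sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


# One curve per accent, plus one curve trained on both accents together.
british_mu, british_sigma = fit_gaussian(british_t)
american_mu, american_sigma = fit_gaussian(american_t)
pooled_mu, pooled_sigma = fit_gaussian(british_t + american_t)

print("British mean: ", british_mu)
print("American mean:", american_mu)
print("Pooled mean:  ", pooled_mu)
```

In this toy setup the pooled curve’s centre sits between the two accents and matches neither, while a curve trained on one accent alone is centred on that accent. Which of these a given product actually ships with is exactly the open question.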

About Ben

Ben T. Smith launched his dialect fascination while working in theatre. He has worked as an actor, playwright, director, critic and dialect coach. Other passions include linguistics, urban development, philosophy and film.


8 Responses to Speech Recognition and Accents

Danny Ryan says:

October 8, 2011 at 5:39 pm

“Swallow” is just one of those inaccurate laypersons’ linguistic terms, comparable with “guttural” used for anything an English speaker considers “harsh” or unusual.
The typical “American” /t/ in ‹butter› is usually a flap [ɾ].
- m.m. says:
  
  October 9, 2011 at 12:09 am
  
  Don’t forget a slew of others that are not only inaccurate, but pretty much meaningless besides ‘foreign sounding’.
Sravana says:

October 8, 2011 at 8:08 pm

This approach is not novel or unique to the iPhone. Speech recognition in general models phonemes as probability distributions over sounds, with the distribution being something like a bell curve, with the distribution found by training over many examples of that phoneme. These prior distributions can be adapted to a different dialect (or speaker) by training on sentences in that dialect. I’m guessing that Apple’s systems start off with some general American dialect and modify to non-American accents.

There is also an issue of whether it is the phoneme’s acoustic characteristics that vary between dialects, or the word’s pronunciation itself. Examples like t-flapping can be interpreted either way — you can modify the acoustic model for ‘t’, or modify the pronunciation of ‘butter’.
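The two options in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not a description of Apple’s or Nuance’s systems: `map_adapt_mean` is a hypothetical helper using the standard MAP update for a Gaussian mean, the relevance factor `tau` and all feature values are made up, and the lexicon uses ARPAbet-style symbols with `dx` standing in for the flap.

```python
def map_adapt_mean(prior_mean, samples, tau=10.0):
    """Shift a phoneme's prior mean toward new dialect samples.

    tau (an assumed relevance factor) controls how strongly the prior is
    trusted: with few samples the result stays near prior_mean, with many
    samples it approaches the sample average.
    """
    n = len(samples)
    return (tau * prior_mean + sum(samples)) / (tau + n)


# Option 1: adapt the acoustic model for 't'.
general_american_t = 4.0          # hypothetical prior mean for the feature
british_samples = [9.0] * 10      # hypothetical adaptation data
adapted_t = map_adapt_mean(general_american_t, british_samples)

# Option 2: leave the acoustic models alone and list both pronunciations
# of the word in the lexicon instead.
lexicon = {
    "butter": [
        ["b", "ah", "t", "er"],   # released [t]
        ["b", "ah", "dx", "er"],  # flapped variant
    ],
}

print("Adapted mean for 't':", adapted_t)
print("Pronunciations of 'butter':", lexicon["butter"])
```

With ten adaptation samples and `tau=10.0`, the adapted mean lands halfway between the prior and the new data. With no samples it stays at the prior, which is the sense in which a system would ‘start off’ with one dialect and move toward another.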
Ellen K. says:

October 9, 2011 at 1:38 am

The article says towards the end that there are settings for 5 different kinds of English. So it sounds to me like there are 5 different standards, not one, including separate British and US standards. Why it then gives that earlier example of British versus US t’s, I don’t know.
- trawicks says:
  
  October 9, 2011 at 3:06 am
  
  I believe they’re referring to Google’s Voice Search in that paragraph. The article is confusing in this regard, because it implies in paragraphs 2 and 3 that they’re discussing features of Apple’s Voice Recognition software, then follows this by saying that Apple is ‘tight lipped’ about the project. You’re right, though, it more or less negates everything written before.
Eric Armstrong says:

October 9, 2011 at 12:37 pm

Apple isn’t releasing Siri in Canada (yet) as far as I can tell because they haven’t developed a model for Canadian speech yet. My understanding is that, though many Canadians would probably have success with the US version of the software, Apple (and Nuance, who is behind the Dragon engine for speech recognition that Siri uses now) isn’t releasing it to Canadians because they haven’t standardized for our regional variations. Whether we’ll be able to try Siri out (when the iPhone 4S is released) with the US model at all is yet to be seen.
Andrej Bjelaković says:

October 9, 2011 at 8:44 pm

And what about all the British speakers who pronounce ‘butter’ with a glottal stop?
