Lip-Reading Software Could Soon Become A Reality

Yeah, you heard it right. In the near future, we may know much more about what's happening at a football or basketball match, we could know what people at a political rally are saying about the speaker, and much more. A research team at the University of East Anglia in the UK is developing an exciting new piece of software that could read lips better than human lip-readers. It is still in the early stages of development — more precisely, still in the research phase — which means the final version of the software could be much, much more advanced than what exists now.

And even now, they have something pretty impressive. Their program classifies the visual aspects of sounds, meaning it uses the visual cues our mouths make to recognize what was said. The best thing about it is that it doesn't need any audio input in order to work. That may seem unimportant, but this feature — or better put, "ability" — could be exactly what enables machines to properly understand us.
You see, most research on speech understanding uses both audio and visual cues to recognize speech. That is hard to do, because you must combine two sources of information that often don't convey exactly the same thing. This is due to the McGurk effect, an interesting psychological phenomenon that shows how our senses can fool us on an everyday basis: our brain weighs the visual motion of the mouth more heavily than the accompanying audio. So when we hear someone say one thing while their mouth moves as if saying something else, we trust the visual cues more and don't perceive the actual sounds.

This is why the lip-reading project at the University of East Anglia is so thrilling. It relies solely on visual cues to understand speech — and there are far fewer visual cues than audio ones when we hear someone speak, even though we trust the visual ones more. The program was created by Dr. Helen Bear, who developed it as part of her Ph.D. and later decided to make it fully functional. She is aware of the obstacles to understanding speech without the audio component: "It turns out there are some visual distinctions between '/p/,' '/b/,' and '/m/' but it's not something that human lip-readers have been able to achieve. But with a machine we are showing that those distinctions are there, they do exist and our recognizers are much better at doing it."

It looks like they are on the right track. Dr. Bear says they don't fully know how the machine makes its distinctions, because it learns the differences between the pronunciations of different sounds on its own — and it is still learning. The machine's accuracy is around 15–20%, and while that sounds low, keep in mind that speech is extremely complicated, especially once you throw away the audio component.
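To illustrate the general idea — not the UEA system itself, whose details aren't described here — the sketch below shows how a classifier might separate visually similar phonemes like /p/, /b/, and /m/ from mouth-shape measurements. Everything in it is hypothetical: the feature names (lip aperture, lip width, closure duration), the per-class values, and the simple nearest-centroid classifier are illustrative assumptions, not the actual method or data.

```python
# Toy sketch (NOT the UEA system): telling /p/, /b/, /m/ apart from
# hypothetical mouth-shape features using a nearest-centroid classifier.
import random

random.seed(0)

# Hypothetical mean features per phoneme: (lip aperture, lip width, closure duration).
# Real visual features would come from tracking the speaker's mouth in video.
CLASS_MEANS = {
    "/p/": (0.10, 0.80, 0.30),
    "/b/": (0.20, 0.70, 0.15),
    "/m/": (0.30, 0.90, 0.45),
}
NOISE = 0.03  # per-feature Gaussian noise, standing in for natural variation


def sample(label):
    """Draw one noisy feature vector for the given phoneme class."""
    return tuple(m + random.gauss(0, NOISE) for m in CLASS_MEANS[label])


def make_dataset(n_per_class):
    """Build a labeled dataset of (features, phoneme) pairs."""
    data = []
    for label in CLASS_MEANS:
        data += [(sample(label), label) for _ in range(n_per_class)]
    return data


def centroids(train):
    """Compute the mean feature vector of each class from training data."""
    sums, counts = {}, {}
    for x, y in train:
        sums[y] = [s + v for s, v in zip(sums.get(y, [0.0] * len(x)), x)]
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(s / counts[y] for s in vec) for y, vec in sums.items()}


def predict(cents, x):
    """Assign x to the class whose centroid is closest (squared distance)."""
    return min(cents, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, cents[y])))


train, test = make_dataset(50), make_dataset(20)
cents = centroids(train)
accuracy = sum(predict(cents, x) == y for x, y in test) / len(test)
print(f"accuracy: {accuracy:.2f}")
```

On this cleanly separated synthetic data the toy classifier scores far above the article's reported 15–20%, which underlines the real difficulty: actual mouth shapes for /p/, /b/, and /m/ overlap heavily, and the machine must learn much subtler distinctions than these made-up numbers suggest.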

If this machine becomes fully functional, the possibilities are endless. Hearing-impaired people could finally understand everyone without any problems, movie subtitles could be generated automatically, and suspects could be understood just by looking at their mouths …

We hope the lip-reading machine will soon become part of our everyday lives, because there are just too many potential uses for it — and it could mean a whole new world for many people with hearing problems.
