
Helen

An Accessibility Device for AI-based Lip Reading

What it does

Helen is a wearable camera that performs automated lip reading using deep learning. It can supplement hearing aids in noisy environments (by lip-reading and transcribing spoken content), and enable audio-independent communication where speech recognition fails.


Your inspiration

Hearing-impaired people have a tough time communicating: basic hearing aids can't isolate voices in a crowd, and smart hearing aids that can cost around $3,000. A current solution to this problem is sign language, but its relatively small user base limits its reach. After verifying these claims with hearing institutes, we realised that we could enhance communication for the hearing impaired by using visual information to capture speech rather than relying solely on audio. We learned that automated lip reading could be achieved with recent AI research, and set out to package this research into an economical device.


How it works

Helen has a simple three-stage workflow: 1. Using a Raspberry Pi, Helen records a video of the speaker. 2. It transmits this video to a system running LipNet, the AI that performs the lip reading. 3. LipNet analyses the video and outputs a transcription of what was spoken. LipNet itself was proposed by researchers at Oxford, DeepMind and CIFAR, and implemented by us. Using spatiotemporal convolutional neural networks, it encodes the changes in visual information over time to map sequences of lip movements to words. It then uses bidirectional gated recurrent units to determine how much information to preserve and how much to forget, so that the beginnings and ends of words can be demarcated. Multilayer perceptrons then aggregate this information to output a transcription of the spoken content. This transcription can then be converted into audio, or even into braille (for those who are visually impaired as well).
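
For readers curious about what such a network looks like in code, below is a minimal sketch of a LipNet-style model in PyTorch. The layer sizes, input resolution (75 frames of 100x50 mouth crops) and vocabulary are illustrative assumptions, not the exact configuration used in Helen or in the original LipNet paper.

```python
# Minimal sketch of a LipNet-style network in PyTorch.
# Layer sizes, input resolution and vocabulary are illustrative assumptions,
# not the exact configuration used in Helen or in LipNet.
import torch
import torch.nn as nn

class LipReaderSketch(nn.Module):
    def __init__(self, vocab_size=28):  # e.g. 26 letters + space + CTC blank
        super().__init__()
        # Spatiotemporal (3D) convolutions: encode how lip shapes change over time.
        self.stcnn = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),   # downsample space, keep time
            nn.Conv3d(32, 64, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
        )
        # Bidirectional GRUs: decide how much temporal context to preserve or
        # forget, which helps demarcate where words begin and end.
        self.gru = nn.GRU(input_size=64 * 25 * 12, hidden_size=256,
                          num_layers=2, bidirectional=True, batch_first=True)
        # Final per-frame classifier over characters (the multilayer-perceptron stage).
        self.fc = nn.Linear(2 * 256, vocab_size)

    def forward(self, video):
        # video: (batch, channels, time, height, width), e.g. (B, 3, 75, 100, 50)
        x = self.stcnn(video)                               # (B, 64, 75, 25, 12)
        b, c, t, h, w = x.shape
        x = x.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
        x, _ = self.gru(x)
        return self.fc(x)  # per-frame character scores, decoded with CTC
```

In practice, the per-frame character scores are decoded with connectionist temporal classification (CTC), which is what lets the network output variable-length sentences without needing frame-level labels.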


Design process

We started in December 2018 by first implementing the AI, and then building a device that could efficiently transmit high-quality video to it for lip reading. Building the AI was an interesting challenge, made easier by concise, precise research papers from Oxford and by open-source implementations that served as references. While this was a complicated and at times tedious process, it took only a couple of months to get running. Making the hardware to record and stream video to the AI was harder, as we could not allow any dropped frames or jitter in the video, which would distort the lip reading. Initially, the quality of both the AI and the capture hardware was so poor that the only character successfully lip-read was 's'. However, as we iterated through several versions of our Raspberry Pi-based prototype, we continuously improved the video capture, streaming and processing algorithms. By March, we had a fully functioning prototype that could record high-quality video and stream it losslessly to a server running our implementation of LipNet. This prototype averaged a word accuracy between 60% and 80%, at times reaching 95%.
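
As an illustration of the capture-and-stream step, the sketch below shows one way frames could be read from the Pi camera and forwarded losslessly over TCP to the machine running the lip-reading model. The libraries, server address and port are assumptions made for the sake of the example, not a description of the prototype's exact pipeline.

```python
# Illustrative sketch only: the host, port and libraries here are assumptions,
# not the prototype's actual streaming pipeline. Frames are PNG-encoded
# (lossless) and sent length-prefixed over TCP so none are silently dropped.
import socket
import struct
import cv2

SERVER_HOST = "192.168.0.10"  # hypothetical address of the lip-reading server
SERVER_PORT = 5000            # hypothetical port

def stream_frames():
    cap = cv2.VideoCapture(0)              # Pi camera exposed as a video device
    cap.set(cv2.CAP_PROP_FPS, 25)          # steady frame rate for the model
    with socket.create_connection((SERVER_HOST, SERVER_PORT)) as sock:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            ok, png = cv2.imencode(".png", frame)  # lossless encoding
            if not ok:
                continue
            data = png.tobytes()
            # 4-byte big-endian length prefix, then the frame payload
            sock.sendall(struct.pack(">I", len(data)) + data)
    cap.release()

if __name__ == "__main__":
    stream_frames()
```

PNG encoding avoids lossy compression artifacts, and the length prefix lets the receiver reassemble every frame without ambiguity.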


How it is different

To our knowledge, there are no devices on the market that perform automated lip reading. Helen is therefore the first wearable device that hearing-impaired users can use to supplement their hearing aids and receive real-time transcriptions of what was spoken to them. Unlike conventional hearing aids, Helen extracts speech from visual lip movements, opening up an entirely new dimension of audio-independent communication. Building on this idea, we believe Helen can be used even by non-hearing-impaired individuals to communicate without audio. For example, when trying to key in a message on a crowded train, speech recognition might not work well because of the noisy surroundings. With Helen, one would only need to look into the device and mouth the message, and Helen would transcribe it in a flash. Helen's methodology, form factor and multiple use cases make it the first wearable device for AI-based lip reading.


Future plans

On the hardware side, we want to make the device much smaller, so that it can be clipped onto a person's spectacles. We also want to add a source of illumination to the device to make lip reading possible in dark surroundings. On the software side, we are looking into designing our own AI (rather than using the existing LipNet) that is more resistant to jitter from head movement and can also lip-read from side profiles. Gathering larger datasets is also essential to this process. Finally, for growth, we would like to put Helen in the hands of hearing-impaired people, implement their feedback and begin rolling out the product.


Awards

1. Semi-Finalist, Finalist and eventual Winner of the President's Cup, a university-wide innovation competition held by HKUST.
2. Most Innovative Device at the Institution of Engineering and Technology's YPEC Hong Kong round, with qualification for the China round.
3. Undergraduate Champion, IET YPEC (Hong Kong round).

