Because of the ongoing COVID-19 pandemic, people in almost every sector are working from home, and coordinating work over video calls has been a tough job. There are now many well-known video conferencing apps. One of their best features is automatic switching, in real time, to the video feed of whoever is talking.
However, this does not work for sign language users, who can end up feeling left out. According to the latest news, Google researchers have set out to fix this accessibility issue by building a real-time sign language detection engine.
Google announced the development in a blog post. The company revealed that the technology can detect when a person in a video call is trying to communicate using sign language and put the spotlight on them. Google claimed that the engine can tell when a person starts signing and make them the active speaker. Google researchers presented the model at ECCV 2020.
In its research paper, titled Real-Time Sign Language Detection using Human Pose Estimation, the company describes how it created a ‘plug and play’ detection engine for video conferencing apps. Efficiency and latency are the two most crucial aspects for a live video feed, and the new model reportedly handles both well.
Google has explained in detail how the sign language detection engine works. First, the video passes through PoseNet, which estimates key points of the body such as the eyes, nose, and shoulders, and thus helps the engine build a stick figure of the person. The engine then compares the figure's movements against a model trained on the German Sign Language corpus. According to Google, this is how the model detects whether the person has started or stopped signing.
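To make the idea of turning body key points into a signing decision more concrete, here is a minimal Python sketch. It is not Google's implementation: the function names, the key point names, and the 0.05 threshold are all hypothetical, and a simple movement threshold stands in for the trained model. It assumes each frame is a dictionary mapping a key point name to an (x, y) position, as a pose estimator might produce.

```python
import math

def movement_score(prev, curr):
    """Mean key point displacement between two frames, scaled by shoulder width.

    Dividing by shoulder width is an assumption made here so that the score
    does not depend on how close the person sits to the camera.
    """
    shoulder = math.dist(curr["left_shoulder"], curr["right_shoulder"])
    if shoulder == 0:
        return 0.0
    shared = prev.keys() & curr.keys()
    total = sum(math.dist(prev[name], curr[name]) for name in shared)
    return total / (len(shared) * shoulder)

def is_signing(frames, threshold=0.05):
    """Hypothetical stand-in for the trained model: flag a window of frames
    as signing when the average movement score exceeds a fixed threshold."""
    scores = [movement_score(a, b) for a, b in zip(frames, frames[1:])]
    return bool(scores) and sum(scores) / len(scores) > threshold

# Illustrative data only: one wrist moves while the shoulders stay still.
still = {"left_shoulder": (0.0, 0.0), "right_shoulder": (1.0, 0.0), "right_wrist": (0.5, 1.0)}
raised = {"left_shoulder": (0.0, 0.0), "right_shoulder": (1.0, 0.0), "right_wrist": (0.5, 1.5)}

print(is_signing([still, still, still]))
print(is_signing([still, raised, still]))
```

A fixed threshold like this would also fire on any large gesture, which is why a model trained on real signing data is needed in practice.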
This raises a genuine question: how is a signer assigned the active speaker role when there is essentially no audio? Google got over this hurdle by building a web demo that transmits a high-frequency 20 kHz audio signal to whichever video conferencing app it is connected to. This fools the app into thinking that the person using sign language is speaking, and thus makes them the active speaker.
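As a rough illustration of the audio trick, the Python sketch below generates a short 20 kHz sine tone as raw samples. It is only a sketch under stated assumptions, not the code behind Google's web demo: the function names, the 48 kHz sample rate, the 0.1 second duration, and the amplitude are all hypothetical choices. The one hard constraint is that the sample rate must be more than twice the tone frequency, otherwise the tone cannot be represented.

```python
import math
import struct

def high_frequency_tone(freq_hz=20_000.0, duration_s=0.1,
                        sample_rate=48_000, amplitude=0.5):
    """Return a list of float samples in [-amplitude, amplitude] for a sine tone."""
    if freq_hz >= sample_rate / 2:
        raise ValueError("sample rate must be more than twice the tone frequency")
    count = round(duration_s * sample_rate)
    step = 2.0 * math.pi * freq_hz / sample_rate
    return [amplitude * math.sin(step * i) for i in range(count)]

def to_pcm16(samples):
    """Pack float samples into little-endian 16-bit PCM bytes."""
    return b"".join(struct.pack("<h", int(round(s * 32767))) for s in samples)

tone = high_frequency_tone()
pcm = to_pcm16(tone)
print(len(tone), len(pcm))
```

In a setup like the one described, such a tone would be played into the call only while signing is detected, so the app's own voice detection treats the signer as speaking.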
Google claims that its researchers have already achieved 80% accuracy in predicting when a person starts signing, and that they are working toward 90% accuracy.
For now, this sign language detection engine is only a demo accompanying a research paper, but popular video conferencing apps such as Google Meet or Zoom could adopt it in the future.