It’s hard to master digital dubbing. While dialogues in videos are usually simple, the lip syncing can get a little messy. In some cases, it can take a very long time for an actor to fully express themselves due to shallow editing.
But thanks to a new neural network, Indian-American researcher Andy Weinstein is letting it off the leash, creating an AI that can perfectly dub videos in a vernacular language.
How it works
Now, both Weinstein and Amrutanjan Bangura at the University of Toronto Langone Medical Center were very clear in their decision to build the AI. Weinstein has talked about it on several occasions, noting the cognitive challenge that a “modern Indian film is set in.” Bangura interviewed more than 100 actors for information related to the dialects of Indic languages.
Weinstein talked about how far he’s come. He said:
At first, dubbing was an experiment I did in graduate school with some human actors and for one particular movie “Black Knight” at the University of Michigan, it was a learning experience for me.
In November 2017, the two researchers published a paper entitled “A Dubbing Machine that Learns (Programming is a separate field)” in the journal Angewandte Chemie International Edition. In it, they describe the machine learning network called VOIP (Voice, Vocal and IPVoice), and outline the computer’s learning methodology.
Among their researchers, Mike Kessler at the University of South Carolina is the smartest. He helped write the code for the VOIP system. Weinstein and Bangura tested out this machine learning system on two popular Hollywood films, Made in Japan and Mad Max: Fury Road. They brought actors on-set to watch the new dubbed versions and give their opinion.
They found that the machine could produce subtitles for Hindi, Bengali, Tamil, Telugu, and Malayalam.
While they didn’t replace a human with the AI, they felt that the voice of the dubbing machine helped a bit. Bangura said:
Working with actors and producers as partners who are mindful of their own communication styles and working in a fast-moving collaborative environment was critical to achieving this goal. I’m proud of the work we’ve done, and I hope the path we’ve paved will stimulate others to push the frontier in artificial intelligence and content creation.
While there’s little difference in pronunciation between Bollywood and mainstream Hollywood, remaking popular movies to suit an Indian audience isn’t easy. Having the right judgment about the accuracy of lip syncing seems important.
Adding rich narration to videos usually involves carefully cross-checking dialogues with text transcripts. That’s why it’s extremely important for the directors, writers, and actors to be accurate. The problems we face with lip syncing, like small pronunciation errors, could be reduced through the use of AI.
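To give a rough idea of what cross-checking spoken dialogue against a transcript can look like, here is a minimal Python sketch. It is not taken from the researchers' system: the function name `transcript_mismatches` and the word-level comparison are assumptions made purely for illustration, using the standard library's `difflib`.

```python
import difflib


def transcript_mismatches(transcript: str, spoken: str) -> list[tuple[str, str, str]]:
    """Return (kind, transcript_words, spoken_words) for every span that differs.

    Hypothetical helper for illustration only; it compares lowercase words
    and knows nothing about audio, timing, or pronunciation.
    """
    script_words = transcript.lower().split()
    spoken_words = spoken.lower().split()
    matcher = difflib.SequenceMatcher(a=script_words, b=spoken_words, autojunk=False)
    mismatches = []
    for kind, i1, i2, j1, j2 in matcher.get_opcodes():
        if kind != "equal":
            mismatches.append(
                (kind, " ".join(script_words[i1:i2]), " ".join(spoken_words[j1:j2]))
            )
    return mismatches


# A single misheard word is flagged as a replacement.
print(transcript_mismatches("the knight rides at dawn", "the night rides at dawn"))
```

A real dubbing workflow would also have to line up timing and mouth movements, which a plain text comparison like this cannot capture.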
The AI uses a “process set” for correct dubbing, rather than a general framework to describe the range of actions that need to be executed. The set sits between speech recognition and text-to-speech, which allows it to compile artificial vocabularies from computer-cribbed scripts and deliver highly accurate voice acting in both English and languages like Hindi.
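The article does not say what the “process set” actually contains, so the following Python sketch only illustrates the general shape of a pipeline in which a set of text-processing steps sits between speech recognition and text-to-speech. Every name here (`DubbingPipeline`, `recognize`, `process_steps`, `synthesize`) is hypothetical, and the stand-in functions are toys rather than real speech models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DubbingPipeline:
    """Hypothetical three-stage pipeline; not the researchers' implementation."""

    recognize: Callable[[bytes], str]           # speech recognition: audio -> text
    process_steps: list[Callable[[str], str]]   # the steps in between: text -> text
    synthesize: Callable[[str], bytes]          # text-to-speech: text -> audio

    def dub(self, audio: bytes) -> bytes:
        text = self.recognize(audio)
        for step in self.process_steps:
            text = step(text)
        return self.synthesize(text)


# Toy stand-ins (assumptions): "audio" is just UTF-8 text in bytes.
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    phrasebook = {"hello": "namaste"}  # tiny illustrative lookup table
    return " ".join(phrasebook.get(word, word) for word in text.split())


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = DubbingPipeline(fake_recognize, [fake_translate], fake_synthesize)
print(pipeline.dub(b"hello friend"))
```

Keeping the middle stage as a list of plain functions means steps can be added or swapped without touching the recognition or synthesis ends, which is one plausible reading of a set of processes sitting between the two.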
Now that the AI system is complete, the next step is to create a library of dubbing options. Weinstein and Bangura also want to build a sound feed and a transcript of the AI system to calculate performance rating scores for dubbing actors. Their main aim is to bridge the gap between tech and filmmaking.
This is a tech story that brings us one step closer to having Dubsmash for subcontinental languages.