NVIDIA CEO Jensen Huang addressed a virtual audience during his keynote at the GTC21 event the same way he had before: from his kitchen.
But this time around, the “kitchen keynote” (as it had been dubbed) got a bit sci-fi. For 14 seconds of the 1-hour-and-48-minute presentation, Huang wasn’t quite himself. Instead, a photorealistic digital clone of the CEO (and his kitchen) popped up on screen—and no one knew.
The GPU Technology Conference (GTC), which took place online this spring from April 12-16, showcases the latest in artificial intelligence, accelerated computing, intelligent networking, game development and more. So it makes sense that the company would show off its tech prowess there. Based in Santa Clara, Calif., NVIDIA designs graphics processing units for various industries, including gaming and automotive.
To create the virtual version of Huang, a full face and body scan was done to create a 3D model, then AI was trained to mimic his gestures and expressions, all via the company’s Omniverse technology, a multidisciplinary collaboration tool for creating 3D virtual spaces. “Unlike the common approach of creating a 3D digital replica of a real person where you scan and capture as much data of that person as possible, we set a very difficult goal of replicating Jensen's behavior and performance without much data of him,” explained David Wright, VP and executive creative director of NVIDIA.
Wright explained that the concept took shape around February; the final version of the virtual Huang, built using a voice recording of his keynote, was completed roughly a week before the event.
The use of virtual stand-ins at digital events isn’t necessarily new, and this past year has seen a sharp increase in the development and implementation of avatar-based platforms. But those characters don’t really look like you or me.
But what if they could?
Founded in 2017, U.K.-based Synthesia set out to make it easier to create synthetic video content. It’s now the world's largest platform for AI video generation, boasting the creation of six million videos to date.
“Just like Photoshop completely changed how we work with photos, keyboards and computers completely changed how we work with text from pen and paper, of course, and in music, synthesizers and software have also completely changed how we create songs today,” explained Victor Riparbelli, CEO and co-founder of Synthesia, about the technology’s impact on video production.
To create an AI-generated video on Synthesia, users either select an existing avatar or design a custom one by submitting three to four minutes of video footage; a text script is then used to build talking head-style videos.
Primarily used by companies for training, learning, marketing and sales, Synthesia’s API can be used to create personalized event invites, video chatbots or virtual facilitators, interactive videos and interstitial videos during conferences.
“We're working on making experiences that today are text-driven and making them video-driven,” Riparbelli said. “For example, a warehouse worker in a big tech company consuming their training as a two-minute video versus a five-page PDF is a much better experience.”
One of Synthesia's clients, EY (formerly known as Ernst & Young), uses AI avatars “not as a replacement for taking real meetings, but after they've had a call, instead of sending an email, they can now send the video,” he said. Volkswagen, meanwhile, uses the platform to train teams at its car dealerships around the world. The software can translate text into 55 languages, which is key since Synthesia works with many global companies that need to communicate with remote team members across borders.
The company is also currently working with a conference producer to create AI-generated content for upcoming in-person events, using interactive videos at kiosks to help navigate attendees throughout the space. Riparbelli also explained that the technology could be used to easily insert different data points such as location or industry into sponsored messages, similar to auto-generated email formats.
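The mail-merge-style personalization Riparbelli describes can be illustrated with a simple templating step. This is a minimal sketch, not Synthesia's actual API: the field names, script wording and `personalize` helper are all hypothetical, and the rendered scripts stand in for text that would then be submitted to a video-generation service.

```python
from string import Template

# Hypothetical sponsored-message script with per-attendee fields.
# (Field names and wording are illustrative, not Synthesia's API.)
SCRIPT = Template(
    "Welcome! We noticed you're joining us from $location and work in "
    "$industry. Stop by booth $booth for a demo tailored to your team."
)

def personalize(attendees):
    """Render one narration script per attendee, mail-merge style.

    In practice, each rendered script would be sent to a
    video-generation service to produce a personalized avatar video.
    """
    return [SCRIPT.substitute(a) for a in attendees]

attendees = [
    {"location": "Berlin", "industry": "automotive", "booth": "12"},
    {"location": "Austin", "industry": "logistics", "booth": "12"},
]

for script in personalize(attendees):
    print(script)
```

Swapping attendee data into a fixed script this way mirrors how auto-generated email campaigns work, which is the comparison Riparbelli draws.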
“I think there's never been a bigger need among people to consume information by video,” Riparbelli said. “I think businesses very much realized that if they communicate by text it's just not as effective. They want to communicate by video because they want to increase engagement. They want to increase conversion rates. They want to increase information retention. And video is just the natural way to do that.” But he noted that the cost and lengthy production process of shooting live-action videos make them unfeasible for most companies.
According to company research, Riparbelli said that nine out of 10 people don’t realize they're watching a synthetic video—probably because they’re not looking for it.
This brings up the question of the ethical use of such content. Several years ago, AI-generated imagery, commonly known as “deepfakes,” of Hollywood actors presented in compromising positions made headlines, which raised concerns over the potential dangers of this type of content. Riparbelli explained that Synthesia has safeguards in place to prevent users from abusing the platform. That includes requesting consent when creating custom avatars.
Despite the possible pitfalls, both Wright and Riparbelli emphasized the desire to make the technology easier to use.
“Regarding the future, we do not pause. We are always pushing the boundaries of what is possible today and creating something new,” Wright said. “We want to make it easier and faster for anyone to create digital characters. We will always be working on virtual humans, virtual avatars and the like, and we will continue to bridge the experience between the physical and virtual worlds closer together.”