The Silicon Trend Tech Bulletin

Nvidia Omniverse Avatar Combines AI Speech and 3D Models

Published Tue, Nov 09 2021, 6:05 pm
by The Silicon Trend



Nvidia Corp., the US multinational technology company, has announced Omniverse Avatar, a new platform for creating interactive AI avatars. It combines technologies such as AI speech, 3D animation, and facial tracking to power a range of virtual agents.



Nvidia Omniverse 

The new platform is a further step in the development of AI assistants that can be customized for virtually any sector. It is meant to streamline customer interactions such as reservations, restaurant orders, personal appointments, and banking transactions, with the aim of improving customer satisfaction and business growth.

Nvidia's founder and CEO, Jensen Huang, said, "The dawn of intelligent virtual assistants has arrived." He added that the platform combines Nvidia's technologies to create some of the most complex real-time applications. Omniverse Avatar is part of Nvidia Omniverse, a virtual world simulation and 3D collaboration platform with over 70K participants in open beta.



Technology Demo

At Nvidia GTC, Huang showed several avatar technology demos: Project Tokkio for customer support, Project Maxine for video conferencing, and NVIDIA DRIVE Concierge for vehicle services. In the first Project Tokkio demo, a cute anime character in a digital kiosk talks two people through a restaurant menu and helps them find vegetarian items. The avatar uses facial tracking and responds to the customers' facial expressions.

In the second demo, a toy version of Huang answers questions about climate change and protein production. In the Project Maxine demo, an English speaker on a video conference in a noisy environment can be heard clearly, with the background noise removed. Her words are transcribed and translated into Spanish, French, and German.



Omniverse Avatar Elements

• Avatar animation is powered by NVIDIA Audio2Face and Video2Face.

• Speech recognition is based on NVIDIA Riva, a software development kit that recognizes speech in multiple languages.

• The recommendation engine is provided by NVIDIA Merlin, a framework that lets businesses build deep learning recommender systems that handle large amounts of data.

• Natural language understanding is based on Megatron 530B, a large language model that can recognize, understand, and generate human language.

• Perception capabilities are enabled by NVIDIA Metropolis, a framework for video analytics.





