Whenever a new GPU or CPU hits the market, it boasts faster speeds, usually on the order of two or three times better than before, at most. But Nvidia on Thursday blew right past those normally strong performance increases. The company’s new GPU performs 20 times better than its predecessor, giving a major boost to the cloud computing companies that will use the chips in their data centers.
Nvidia on Thursday unveiled its A100 graphics processor. It’s the first GPU based on the Santa Clara, California, company’s new Ampere architecture.
“The Ampere architecture provides the greatest generational leap out of our eight generations of GPUs,” Paresh Kharya, Nvidia’s director of product management for data center and cloud platforms, said Wednesday in a briefing with reporters.
“Eight” comes up again in Thursday’s announcement: The DGX A100 board combines eight A100s into a super-GPU that can work as one giant processor or as separate GPUs for different users or tasks. It weighs 50 pounds and fits into Nvidia CEO Jensen Huang’s oven, as you can see in the video.
The A100 is aimed at intensive tasks like AI training, conversational AI, high-performance data analytics, genomics, scientific simulation, seismic modeling and financial forecasting. And it will be used to help explore cures and vaccines for the novel coronavirus, which has infected over 4.3 million people so far around the globe. Because of the GPU’s speed, it will let researchers crunch data in days or months instead of years.
Nvidia is one of the world’s biggest graphics chip makers, and it has built a cult following among gamers. In earlier days, GPUs mainly went into computers and gaming consoles, aimed at tasks that required high-quality, responsive graphics. Today, GPUs are used by anything that needs to crunch a lot of data quickly. Because of the way GPUs efficiently process information, they’re key to robots, self-driving cars and data centers powering artificial intelligence. They’re also typically more efficient and require less floor space than CPUs, which traditionally have served as the brains of systems.
The A100 chip is based on a 7-nanometer design. A key part of semiconductor manufacturing is shrinking the components called transistors, extraordinarily tiny electronic switches that process data for everything from microwave oven clocks to artificial intelligence algorithms running in our phones. The smaller the transistors, the better the battery life and performance. The A100 packs 54 billion transistors, “making it the world’s largest 7-nanometer chip,” Nvidia said.
With the A100, not only will machines be capable of crunching a lot of data quickly, but also the servers will be more flexible.
“It’s going to unify that infrastructure into something much more flexible, much more fungible and increase its utility makes it a lot easier to predict how much capacity you need,” Huang said Wednesday in a briefing with reporters.
Nvidia’s new A100 GPU is already shipping to customers around the globe. It will be used by the biggest names in cloud computing, including Alibaba, Amazon, Baidu, Google and Microsoft. The companies operate huge server farms that house the world’s data. Netflix, Reddit and most other online services rely on the cloud operators to keep their sites up and running. Nvidia said that Microsoft, with its Azure cloud, will be one of the first companies to use the A100.
“Azure will enable training of dramatically bigger AI models using Nvidia’s new generation of A100 GPUs to push the state-of-the-art on language, speech, vision and multi-modality,” Mikhail Parakhin, Microsoft corporate vice president, said in a press release.
Other organizations that plan to use the A100 include national laboratories, leading universities and research institutions like Indiana University; Germany’s Julich Supercomputing Centre, Karlsruhe Institute of Technology, and Max Planck Computing and Data Facility; and the US Department of Energy’s National Energy Research Scientific Computing Center.
The A100 will be particularly useful in training and operating AI systems. The technology is flexible, letting companies scale their servers up or down as needed, and the A100’s speed can reduce the amount of time it takes to teach an artificial intelligence program.
“Modern and complex AI training and inference workloads that require a large amount of data can benefit from state-of-the-art technology like Nvidia A100 GPUs, which help reduce model training time and speed up the machine learning development process,” Gary Ren, machine learning engineer at food delivery service DoorDash, said in a press release.
Along with cloud and supercomputer organizations, many tech companies will use the A100 in servers. That includes Atos, Dell, Fujitsu, Lenovo and Supermicro.
The new A100 pulls off five “miracles,” as Nvidia’s Kharya put it. First is the Ampere architecture.
“This is unquestionably the first time that we’ve unified the acceleration workload of the entire data center into one single platform,” Huang said. “Everything from video analytics to image processing through voice to training to inference to data processing is now on one unified server.”
Second is Nvidia’s third-generation Tensor cores, which improve high-performance computing applications. Third is a multi-instance GPU that lets a single A100 be partitioned into as many as seven separate GPUs to “deliver varying degrees of compute for jobs of different sizes, providing optimal utilization and maximizing return on investment.”
Fourth, the A100 uses the third generation of Nvidia’s NVLink, a high-speed, GPU-to-GPU interconnect. In the A100, the NVLink is twice as fast, letting multiple GPUs be connected to operate as one giant GPU. The fifth advancement in the A100 is something called structural sparsity. The efficiency technique “harnesses the inherently sparse nature of AI math to double performance.”
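For readers curious what structural sparsity can look like in practice, here is a minimal Python sketch. It assumes the 2:4 pattern Nvidia has described for Ampere, in which at most two of every four consecutive weights in a neural network are nonzero, so the hardware can skip the zeros. The function name `prune_2_of_4` and the sample weights are hypothetical, chosen only for illustration. This is not Nvidia’s code.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in each group of four.

    Illustrative only: assumes a 2:4 structured-sparsity pattern.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Positions of the two largest magnitudes in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.0]  # made-up values
sparse = prune_2_of_4(weights)
print(sparse)
```

With half the values in every group guaranteed to be zero, a chip built for this pattern has to do only half the multiplications, which is roughly where the “double performance” figure comes from.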
Along with the A100, Nvidia on Thursday introduced its DGX A100 system. It features eight A100 GPUs connected with Nvidia NVLink and is aimed at intensive AI computing. One DGX A100, which starts at $199,000, is capable of delivering 5 petaflops of AI performance and consolidates the power and capabilities of an entire data center into a single system.
The first organization using the DGX A100 is the US Energy Department’s Argonne National Laboratory. It plans to use the cluster’s AI and computing power to better understand and fight COVID-19.
“The compute power of the new DGX A100 systems coming to Argonne will help researchers explore treatments and vaccines and study the spread of the virus, enabling scientists to do years’ worth of AI-accelerated work in months or days,” Rick Stevens, associate laboratory director for computing, environment and life sciences at Argonne, said in a press release.
The company also updated its software linked to the A100 GPU. That includes Jarvis, a multimodal conversational AI system; Merlin, a deep recommender application framework; and Nvidia’s high-performance computing SDK to help supercomputer makers debug and optimize their code for the A100.
Jarvis provides a complete GPU software stack and tools to make it easy for developers to build and launch real-time conversational bots that can understand terminology unique to each company and its customers. For instance, a bank’s app built with Jarvis would understand what financial terms mean.
Nvidia expects Jarvis to be helpful during the pandemic, when more people are working from home and telemedicine and remote learning are becoming the norm.
“Conversational AI is central to the future of many industries, as applications gain the ability to understand and communicate with nuance and contextual awareness,” CEO Huang said in a press release. “Nvidia Jarvis can help the healthcare, financial services, education and retail industries automate their overloaded customer support with speed and accuracy.”
Companies that will use Jarvis include Voca, an AI agent for call center support; Kensho, which provides automatic speech transcriptions for finance and business; and Square, which has a virtual assistant for appointment scheduling.