IBM, Nvidia Build “World’s Fastest Supercomputer” for US Government

 

The US Department of Energy unveiled Friday what it says is the world’s fastest supercomputer.

The new system, called Summit, is eight times faster than Titan, which has until now been the fastest supercomputer in the US.

Designed by IBM and Nvidia, Summit is powered by the former’s Power9 processors and the latter’s Tensor Core GPUs. The machine takes up an area about the size of two tennis courts at the DOE’s Oak Ridge National Lab in Tennessee and can draw up to 13MW of power when firing on all cylinders.

The new computing architecture developed for the system combines AI computing capabilities with traditional high-performance computing ones. That means it can be used to build scientific models and simulations while at the same time processing machine-learning workloads, greatly opening up the possibilities in supercomputer-assisted scientific research.

The government’s total contract cost for the system is up to $280 million, including “maintenance and all potential options that could be exercised,” Katie Bethea, an outreach project manager for ORNL, wrote in an email to Data Center Knowledge.

The 9,000 square feet of data center space Summit occupies is part of the lab’s Computational Sciences building, which was renovated specifically to accommodate the new system. “We brought in 15MW of power and utilities and constructed evaporative cooling towers specifically for Summit,” she said.

The cooling system pumps more than 4,000 gallons of water per minute through its pipes to dissipate the heat the supercomputer produces, according to IBM.

Gunning for China’s HPC Lead

Summit can perform 200 quadrillion (200,000 trillion) calculations per second, or 200 petaflops. Until now, the world’s fastest supercomputer has been the Sunway TaihuLight system at the National Supercomputing Center in Wuxi, China, capable of 93.01 petaflops.
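The unit conversion and the gap between the two systems can be checked with a few lines of arithmetic. The sketch below uses only the two figures quoted above; the constant and function names are illustrative, and it assumes the two numbers are directly comparable, which may not hold if one is a peak figure and the other a benchmark result.

```python
# Figures quoted in the article; names are illustrative.
PETA = 10**15  # one petaflop = 10^15 floating-point operations per second

summit_flops = 200 * PETA        # "200 petaflops"
taihulight_flops = 93.01 * PETA  # "93.01 petaflops"


def speedup(new_flops, old_flops):
    """Return how many times faster the new system is than the old one."""
    return new_flops / old_flops


ratio = speedup(summit_flops, taihulight_flops)

# 200 quadrillion is the same as 200,000 trillion.
print(summit_flops == 200_000 * 10**12)
print(f"Summit vs. Sunway TaihuLight: about {ratio:.2f}x")
```

On these numbers Summit comes out a little more than twice as fast as Sunway TaihuLight.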

The world’s supercomputers are ranked twice a year by the non-profit Top500. Sunway TaihuLight has topped the four previous editions of the Top500 list consecutively, but it will likely be unseated for the first time in two years by Summit later this month, when the next edition is published.

Photo (ORNL): Racks full of storage drives for the Summit supercomputer at ORNL. The overhead piping delivers chilled water to cool the system's processors, pumping more than 4,000 gallons of water per minute through the system.

Summit is “the most powerful and the smartest supercomputer in the world,” Paresh Kharya, a product management and marketing director for accelerated computing at Nvidia, said on a conference call with reporters Friday. “It’s also the world’s largest GPU-accelerated supercomputer.”

GPUs have long been used to accelerate traditional HPC workloads. In recent years, however, they’ve also emerged as the go-to acceleration chip for training deep-learning models for AI applications. Being the number-one supplier of GPUs, Nvidia has been the primary beneficiary of this trend.

28,000 GPUs

The Santa Clara, California-based chipmaker spent more than $3 billion on development of the Volta Tensor Core GPU that powers Summit, an Nvidia spokesperson said. According to Kharya, the GPUs are responsible for 95 percent of the system’s performance.

The compute cluster consists of about 4,600 nodes, with two CPUs and six GPUs per node, for a total of 27,648 GPUs, according to Nvidia. The GPUs are interconnected with IBM’s most recent Power9 CPUs – designed specifically for HPC and AI workloads – by Nvidia’s new NVLink interconnect technology.
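The "about 4,600 nodes" figure can be recovered from the GPU total. This is a minimal sketch that assumes every node carries exactly the two CPUs and six GPUs stated above; the variable names are illustrative.

```python
# Per-node configuration and GPU total as quoted in the article.
GPUS_PER_NODE = 6
CPUS_PER_NODE = 2
TOTAL_GPUS = 27_648

# Assumption: all nodes are identical, so the GPU total divides evenly.
nodes, remainder = divmod(TOTAL_GPUS, GPUS_PER_NODE)
if remainder != 0:
    raise ValueError("GPU total is not a multiple of GPUs per node")

total_cpus = nodes * CPUS_PER_NODE

print(f"nodes: {nodes}")        # nodes: 4608
print(f"CPUs:  {total_cpus}")   # CPUs:  9216
```

Under that assumption the cluster has 4,608 nodes and 9,216 Power9 CPUs, which matches the rounded node count in the text.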

Nvidia and IBM worked together to tightly couple their chips via NVLink, which delivers speeds up to 300 gigabytes per second, or 10 times faster than PCIe, Kharya said.

Like other ORNL supercomputers, Summit is theoretically open for use by any research team with a proposal deemed important enough by the DOE. But the system’s schedule is already full for the foreseeable future.

According to a post on the Nvidia blog, projects it’s already booked for include:

  • Cancer Research: The DOE and National Cancer Institute are working on a program called CANcer Distributed Learning Environment (CANDLE). Their aim is to develop tools that can automatically extract, analyze, and sort existing health data to reveal previously hidden relationships between disease factors such as genes, biological markers, and the environment.
  • Fusion Energy: Fusion, the energy source powering the Sun, has long been touted for its promise of clean, abundant energy. Summit will be able to model a fusion reactor and its magnetically confined plasma, hastening commercial development.
  • Disease and Addiction: Researchers will use AI to identify patterns in the function and evolution of human proteins and cellular systems. These patterns can help us better understand Alzheimer’s, heart disease, or addiction, and inform the drug discovery process.

‘Highly Efficient Supercomputing’

While capable of using as much power as a hyperscale data center built by the likes of Facebook or Google, the system gets a lot out of every kilowatt-hour it consumes. More than 15 gigaflops per watt, to be precise.
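The efficiency figure is consistent with the other numbers in this article. The sketch below divides the quoted 200 petaflops by the quoted 13MW maximum power draw; treating both as simultaneous peak values is an assumption, and the function name is illustrative.

```python
# Figures quoted elsewhere in the article.
PEAK_FLOPS = 200e15       # 200 petaflops
PEAK_POWER_WATTS = 13e6   # 13MW maximum power draw


def gigaflops_per_watt(flops, watts):
    """Convert a flops figure and a power draw into gigaflops per watt."""
    return flops / watts / 1e9


efficiency = gigaflops_per_watt(PEAK_FLOPS, PEAK_POWER_WATTS)
print(f"{efficiency:.2f} gigaflops per watt")  # 15.38 gigaflops per watt
```

The result, roughly 15.4 gigaflops per watt, agrees with the "more than 15" claim.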

One of the fundamental supercomputing challenges for industry, government, and academia has always been “to do highly efficient supercomputing,” Kharya said. Summit is five times more efficient than Titan and 50 times more efficient than Jaguar, the supercomputer ORNL deployed about a decade ago.
