The launch of Oak Ridge National Laboratory’s Titan supercomputer was in many ways a turning point for NVIDIA’s GPU compute business. Though NVIDIA was already into its third generation of Tesla products by that time, getting Tesla into the world’s most powerful supercomputer is as much of a singular mark of “making it” as there can be. Supercomputer contracts are not just large orders in and of themselves; they also indicate that the HPC industry has accepted GPUs as reliable and performant, and is ready to invest significantly in them. Since then Tesla has been selected for several other supercomputer contracts, with Tesla K20 systems powering 2 of the world’s top 10 supercomputers, and Tesla sales overall for this generation have greatly surpassed those of the Fermi generation.

Of course, while landing its first supercomputer contract was a major accomplishment for NVIDIA, that alone does not make Tesla’s current success a sustainable one. To borrow a restaurant analogy, NVIDIA was able to get customers in the door, but could it get them to come back? As announced by the US Department of Energy at the end of last week, the answer is yes. The DoE is building 2 more supercomputers, and NVIDIA and IBM will be powering them.

The two supercomputers will be Summit and Sierra. With a combined price tag of $325 million, they will be built by IBM for Oak Ridge National Laboratory and Lawrence Livermore National Laboratory respectively, succeeding the laboratories’ respective current supercomputers, Titan and Sequoia.

Hardware

Both systems will be of similar design, with Summit being the more powerful of the two. Powering the systems will be a triumvirate of technologies: IBM POWER9 CPUs, NVIDIA Volta-based Tesla GPUs, and Mellanox EDR InfiniBand for the system interconnect.

Starting with the CPU, this announcement is the first real attention POWER9 has received. Relatively little information is available on the CPU, though IBM has previously mentioned that POWER9 will emphasize the use of accelerators (specialist hardware), which meshes well with what is being done for these supercomputers. Beyond that, we know little other than that it will build on IBM’s existing POWER8 technologies.

Meanwhile on the GPU side, this announcement marks the first time NVIDIA has discussed Volta since going quiet on it after announcing Pascal earlier this year. Volta was a blank slate then and remains one now, so as with POWER9 we don’t know what new functionality Volta will bring, only that it is a distinct product from Pascal that will build on Pascal. Pascal of course introduces support for 3D stacked memory and NVLink, both of which will be critical for these supercomputers.

Speaking of NVLink, as IBM’s POWER family is the first CPU family to support NVLink, it should come as no surprise that NVLink will be both the CPU-GPU and GPU-GPU interconnect for these computers. NVLink, NVIDIA’s high-speed PCIe replacement, is intended to allow faster, lower-latency, and lower-energy communication between processors, and is expected to play a big part in NVIDIA’s HPC performance goals. While GPU-GPU NVLink has been expected to reach production systems from day one, the DoE supercomputer announcement means that the CPU-GPU implementation is also becoming reality. Until now it was unknown whether an NVLink-equipped POWER CPU would be manufactured (NVLink was merely an option for licensees), so this confirms that we’ll be seeing NVLink CPUs as well as GPUs.

With NVLink in place for CPU-GPU communication, these supercomputers will be able to offer unified memory support, which should go a long way towards opening up these systems to tasks that require frequent CPU/GPU interaction, as opposed to the more homogeneous workloads suited to systems such as Titan. Meanwhile it is likely, though unconfirmed, that these systems will use NVLink 2.0, which as originally announced was expected for the GPU after Pascal. NVLink 2.0 introduces cache coherency, which would allow for further performance improvements and make it easier to execute programs in a heterogeneous manner.

Systems

US Department of Energy Supercomputers

|                    | Summit           | Titan                   | Sierra             | Sequoia            |
|--------------------|------------------|-------------------------|--------------------|--------------------|
| CPU Architecture   | IBM POWER9       | AMD Opteron (Bulldozer) | IBM POWER9         | IBM BlueGene/Q     |
| GPU Architecture   | NVIDIA Volta     | NVIDIA Kepler           | NVIDIA Volta       | N/A                |
| Performance (RPEAK)| 150-300 PFLOPS   | 27 PFLOPS               | 100+ PFLOPS        | 20 PFLOPS          |
| Power Consumption  | ~10MW            | ~9MW                    | N/A                | ~8MW               |
| Nodes              | 3,400            | 18,688                  | N/A                | N/A                |
| Laboratory         | Oak Ridge        | Oak Ridge               | Lawrence Livermore | Lawrence Livermore |
| Vendor             | IBM              | Cray                    | IBM                | IBM                |

Though Summit and Sierra are similar in design, their total computational power and respective workloads will differ. Sierra, the smaller of the two systems, is to be delivered to Lawrence Livermore National Laboratory to replace its current 20 PetaFLOPS Sequoia supercomputer. LLNL will be using Sierra for the National Nuclear Security Administration’s ongoing nuclear weapons simulations, with LLNL noting that “the machine will be dedicated to high-resolution weapons science and uncertainty quantification for weapons assessment.”


Sierra: 100+ PetaFLOPS

Due to its use in nuclear weapons simulations, information on Sierra is more restricted than it is for Summit. Publicly, Sierra is being quoted as offering 100+ PFLOPS of performance, five or more times the performance of Sequoia. As these supercomputers are still in development, the final performance figures are unknown (power consumption and clockspeeds cannot be guaranteed this early in the process, not to mention performance scaling on such a large system), and it is likely Sierra will exceed its 100 PFLOPS performance floor.

Meanwhile the more powerful of the systems, Summit, will be delivered to Oak Ridge National Laboratory. When it built Titan, ORNL expected to get 4-5 years out of the machine, and in keeping with that schedule Summit will be Titan’s replacement.


Summit: 150 to 300 PetaFLOPS

Summit’s performance is expected to be in the 150-300 PFLOPS range, once again depending on the final clockspeeds and attainable performance of the cluster. In 2012 ORNL wanted its next system to offer 10x the performance of Titan, and at this point Summit’s performance estimates range from roughly 5x to 10x Titan, so while it is not guaranteed at this time, it remains possible that Summit will hit that 10x goal.
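As a quick sanity check on those multiples, the ratio arithmetic fits in a few lines of Python. This is a back-of-the-envelope sketch of our own, not anything ORNL or NVIDIA has published: it simply divides the peak (RPEAK) figures quoted in this article, and the constant and function names are illustrative.

```python
# Back-of-the-envelope check of the Summit-vs-Titan multiples.
# Figures are the article's peak (RPEAK) numbers; the constant names
# are illustrative, and Summit's range is a pre-delivery estimate.
TITAN_RPEAK_PFLOPS = 27.0
SUMMIT_RPEAK_RANGE_PFLOPS = (150.0, 300.0)
ORNL_TARGET_MULTIPLE = 10  # ORNL's 2012 goal: 10x Titan


def speedup_range(new_range, baseline):
    """Return the (low, high) speedup of a performance range over a baseline."""
    low, high = new_range
    return low / baseline, high / baseline


low_x, high_x = speedup_range(SUMMIT_RPEAK_RANGE_PFLOPS, TITAN_RPEAK_PFLOPS)
needed_pflops = ORNL_TARGET_MULTIPLE * TITAN_RPEAK_PFLOPS  # peak needed for 10x

print(f"Summit vs. Titan: {low_x:.1f}x to {high_x:.1f}x")
print(f"Peak needed for {ORNL_TARGET_MULTIPLE}x Titan: {needed_pflops:.0f} PFLOPS")
```

Dividing the peak figures gives roughly 5.6x to 11.1x, and hitting the 10x goal on paper would take about 270 PFLOPS, toward the top of the quoted range. Delivered performance on real workloads will be lower than peak, so these should be read as upper-bound ratios.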

As Summit is geared towards public work, we know quite a bit more about its construction than we do about Sierra’s. Summit will be built out of roughly 3,400 nodes, with each node containing multiple CPUs and GPUs (as opposed to 1 of each per Titan node). Each node in turn will be backed by at least 512GB of memory, most likely composed of 512GB of DDR4 plus a to-be-determined amount of High Bandwidth Memory (stacked memory) on each GPU. Backing that in turn will be another 800GB of NVRAM per node.
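Multiplying those per-node figures out gives a sense of the machine’s aggregate capacity. The sketch below is our own arithmetic under stated assumptions: it takes the approximate 3,400-node count and the 512GB DDR4 minimum at face value, uses decimal units (1 PB = 1,000,000 GB), and leaves out the GPUs’ stacked memory, whose capacity hasn’t been announced.

```python
# Aggregate memory for Summit from the article's per-node figures.
# Assumptions: ~3,400 nodes, the 512GB DDR4 minimum per node, 800GB NVRAM
# per node, and decimal units. GPU stacked memory (HBM) is excluded
# because its capacity has not been announced.
NODES = 3_400
DDR4_GB_PER_NODE = 512
NVRAM_GB_PER_NODE = 800
GB_PER_PB = 1_000_000  # decimal units


def total_pb(per_node_gb, nodes=NODES):
    """Return the aggregate capacity in petabytes for a per-node capacity in GB."""
    return per_node_gb * nodes / GB_PER_PB


print(f"DDR4 (minimum): {total_pb(DDR4_GB_PER_NODE):.2f} PB")
print(f"NVRAM:          {total_pb(NVRAM_GB_PER_NODE):.2f} PB")
```

That works out to roughly 1.74 PB of DDR4 and 2.72 PB of NVRAM across the system; since both the node count and the memory capacity are approximate, so are the totals.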

From a power standpoint Summit is expected to draw 10MW at peak, roughly 10% more than Titan’s 9MW. However, despite the slight increase in power consumption, Summit is expected to be physically far smaller than Titan. Since Summit nodes will take up roughly the same amount of space as Titan nodes, and Summit will have fewer than a fifth as many of them (roughly 3,400 versus 18,688), Summit’s nodes will occupy only around 20% of the volume of Titan’s. Key to this of course is increasing the number of processors per node; along with multiple CPUs per node, NVIDIA’s new mezzanine form factor GPUs should play a large part here, as they allow GPUs to be installed and cooled in a fashion similar to socketed CPUs, as opposed to bulky PCIe cards.
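The ratios behind that power and size comparison are easy to reproduce. The sketch below uses the round figures from this article and, like the article, assumes a Summit node occupies about the same volume as a Titan node. Its efficiency numbers divide peak FLOPS by peak power, so they are on-paper estimates rather than measured results; conveniently, PFLOPS per MW is the same unit as GFLOPS per watt.

```python
# Ratios behind the Summit-vs-Titan power and size comparison.
# Uses the article's round figures; assumes equal per-node volume and
# divides peak FLOPS by peak power, so efficiency is peak-on-paper only.
TITAN_POWER_MW, SUMMIT_POWER_MW = 9.0, 10.0
TITAN_NODES, SUMMIT_NODES = 18_688, 3_400
TITAN_RPEAK_PFLOPS = 27.0
SUMMIT_RPEAK_RANGE_PFLOPS = (150.0, 300.0)


def gflops_per_watt(pflops, megawatts):
    """Peak efficiency; 1 PFLOPS / 1 MW = 1e15 / 1e6 FLOPS/W = 1 GFLOPS/W."""
    return pflops / megawatts


power_increase = SUMMIT_POWER_MW / TITAN_POWER_MW - 1  # fractional increase
volume_fraction = SUMMIT_NODES / TITAN_NODES  # equal node volume assumed
titan_eff = gflops_per_watt(TITAN_RPEAK_PFLOPS, TITAN_POWER_MW)
summit_eff = [gflops_per_watt(p, SUMMIT_POWER_MW) for p in SUMMIT_RPEAK_RANGE_PFLOPS]

print(f"Power increase over Titan: {power_increase:.0%}")
print(f"Summit node volume vs. Titan: {volume_fraction:.0%}")
print(f"Peak efficiency: Titan {titan_eff:.0f} GFLOPS/W, "
      f"Summit {summit_eff[0]:.0f}-{summit_eff[1]:.0f} GFLOPS/W")
```

On paper that is an 11% increase in power for a machine occupying about 18% of Titan’s node volume, with peak efficiency rising from about 3 GFLOPS/W to somewhere between 15 and 30 GFLOPS/W.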


NVIDIA Pascal Test Vehicle Showing New GPU Form Factor

Like Titan before it, Summit will be dedicated to what ORNL calls “open science.” Time on the supercomputer will be granted to researchers through application proposals. Much of the science expected to be done on Summit is similar to the science already done on Titan, such as climate simulations, (astro)physics, and nuclear research, with Summit’s greater performance allowing for more intricate simulations.

Finally, Summit is expected to come online in 2017, with trials and qualification leading up to the machine being opened to users in 2018. As things stand, Summit will be the most powerful supercomputer in the world when it launches: its 150 PFLOPS lower bound is roughly 3x the performance of the current record holder, China’s Xeon Phi-powered Tianhe-2, and no other supercomputers expected to surpass that number have been announced (yet).

Wrapping things up, securing new supercomputer contracts is a major win for both IBM and NVIDIA. With IBM indirectly scaling back its role in the supercomputer race (BlueGene/Q is the last of the BlueGenes), IBM will continue providing supercomputers by building heterogeneous systems powered by a mix of its own hardware and NVIDIA GPUs. NVIDIA of course is no less thrilled to be in not only Titan’s successor but another DoE lab’s supercomputer as well, and with a greater share of the underlying technology than before.

That said, it should be noted that this is not the last major supercomputer order the DoE will be placing. The CORAL project behind these supercomputers also includes a supercomputer for Argonne National Laboratory, which will be replacing its Mira supercomputer in the same timeframe. The details of that supercomputer will be announced at a later date, so there is still one more supercomputer contract to be awarded.
