    NVIDIA Announces H100 NVL – Max Memory Server Card for Large Language Models

    By bfteam | August 27, 2022

    While this year’s Spring GTC event doesn’t feature any new GPUs or GPU architectures from NVIDIA, the company is still in the process of rolling out new products based on the Hopper and Ada Lovelace GPUs it has introduced over the past year. At the high end of the market, the company today is announcing a new H100 accelerator variant specifically aimed at large language model users: the H100 NVL.

    The H100 NVL is an interesting variant on NVIDIA’s H100 PCIe card that, in a sign of the times and NVIDIA’s extensive success in the AI field, is aimed at a singular market: large language model (LLM) deployment. There are a few things that make this card atypical from NVIDIA’s usual server fare – not the least of which is that it’s two H100 PCIe boards that come already bridged together – but the big takeaway is the memory capacity. The combined dual-GPU card offers 188GB of HBM3 memory – 94GB per card – more memory per GPU than any other NVIDIA part to date, even within the H100 family.

    NVIDIA H100 Accelerator Specification Comparison
                              H100 NVL                    H100 PCIe                 H100 SXM
    FP32 CUDA Cores           2 x 16896?                  14592                     16896
    Tensor Cores              2 x 528?                    456                       528
    Boost Clock               1.98GHz?                    1.75GHz                   1.98GHz
    Memory Clock              ~5.1Gbps HBM3               3.2Gbps HBM2e             5.23Gbps HBM3
    Memory Bus Width          6144-bit                    5120-bit                  5120-bit
    Memory Bandwidth          2 x 3.9TB/sec               2TB/sec                   3.35TB/sec
    VRAM                      2 x 94GB (188GB)            80GB                      80GB
    FP32 Vector               2 x 67 TFLOPS?              51 TFLOPS                 67 TFLOPS
    FP64 Vector               2 x 34 TFLOPS?              26 TFLOPS                 34 TFLOPS
    INT8 Tensor               2 x 1980 TOPS               1513 TOPS                 1980 TOPS
    FP16 Tensor               2 x 990 TFLOPS              756 TFLOPS                990 TFLOPS
    TF32 Tensor               2 x 495 TFLOPS              378 TFLOPS                495 TFLOPS
    FP64 Tensor               2 x 67 TFLOPS?              51 TFLOPS                 67 TFLOPS
    Interconnect              NVLink 4 (600GB/sec)        NVLink 4 (600GB/sec)      NVLink 4, 18 Links (900GB/sec)
    GPU                       2 x GH100 (814mm2)          GH100 (814mm2)            GH100 (814mm2)
    Transistor Count          2 x 80B                     80B                       80B
    TDP                       700-800W                    350W                      700W
    Manufacturing Process     TSMC 4N                     TSMC 4N                   TSMC 4N
    Interface                 2 x PCIe 5.0 (Quad Slot)    PCIe 5.0 (Dual Slot)      SXM5
    Architecture              Hopper                      Hopper                    Hopper

    Driving this SKU is a specific niche: memory capacity. Large language models like the GPT family are in many respects memory capacity bound, as they’ll quickly fill up even an H100 accelerator just holding all of their parameters (175B in the case of the largest GPT-3 models). As a result, NVIDIA has opted to put together a new H100 SKU that offers a bit more memory per GPU than their usual H100 parts, which top out at 80GB per GPU.
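    To put the capacity problem in perspective, the weights alone for a GPT-3-class model outstrip any single accelerator. The back-of-the-envelope sketch below is ours, not NVIDIA’s; the parameter count and bytes-per-parameter figures are illustrative assumptions, and it ignores activations, KV caches, and framework overhead, all of which only add to the footprint.

    # Rough parameter-memory sizing (illustrative assumptions, weights only)
    def weights_gb(n_params: float, bytes_per_param: int) -> float:
        return n_params * bytes_per_param / 1e9

    N_PARAMS = 175e9  # GPT-3-class model, per the figure quoted above

    for label, bpp in [("FP16/BF16", 2), ("FP8/INT8", 1)]:
        gb = weights_gb(N_PARAMS, bpp)
        print(f"{label}: {gb:.0f} GB of weights -> "
              f"{gb / 80:.1f}x an 80GB H100, {gb / 94:.1f}x a 94GB H100 NVL GPU")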

    Under the hood, what we’re looking at is essentially a special bin of the GH100 GPU that’s being placed on a PCIe card. All GH100 GPUs come with 6 stacks of HBM memory – either HBM2e or HBM3 – with a capacity of 16GB per stack. However, for yield reasons, NVIDIA only ships their regular H100 parts with 5 of the 6 HBM stacks enabled. So while there is nominally 96GB of VRAM on each GPU, only 80GB is available on regular SKUs.

    The H100 NVL, in turn, is the mythical fully-enabled SKU with all 6 stacks enabled. By turning on the 6th HBM stack, NVIDIA is able to access the additional memory and additional memory bandwidth that it affords. It will have some material impact on yields – how much is a closely guarded NVIDIA secret – but the LLM market is apparently big enough and willing to pay a high enough premium for nearly perfect GH100 packages to make it worth NVIDIA’s while.

    Even then, it should be noted that customers aren’t getting access to quite all 96GB per card. Rather, at a total capacity of 188GB of memory, they’re getting effectively 94GB per card. NVIDIA hasn’t gone into detail on this design quirk in our pre-briefing ahead of today’s keynote, but we suspect this is also for yield reasons, giving NVIDIA some slack to disable bad cells (or layers) within the HBM3 memory stacks. The net result is that the new SKU offers 14GB more memory per GH100 GPU, a 17.5% memory increase. Meanwhile the aggregate memory bandwidth for the card stands at 7.8TB/second, which works out to 3.9TB/second for the individual boards.
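    For readers who want to retrace the arithmetic, the capacity and bandwidth figures above all fall out of the stack counts and per-stack numbers. The short sketch below is our own reconstruction; the roughly 5.1Gbps pin speed is inferred from the quoted 3.9TB/second per board rather than confirmed by NVIDIA.

    # Reconstructing the H100 NVL memory figures (our arithmetic, not NVIDIA's)
    HBM3_STACKS  = 6       # all six stacks enabled on the H100 NVL
    GB_PER_STACK = 16      # nominal capacity per stack
    NOMINAL_GB   = HBM3_STACKS * GB_PER_STACK   # 96GB physically on the package
    USABLE_GB    = 94                           # what customers actually get per GPU
    REGULAR_GB   = 80                           # standard H100, 5 of 6 stacks enabled

    print(USABLE_GB - REGULAR_GB)                          # 14GB more per GPU
    print(f"{(USABLE_GB - REGULAR_GB) / REGULAR_GB:.1%}")  # 17.5% increase

    # Bandwidth: a 6144-bit bus at roughly 5.1Gbps per pin, per GPU
    per_gpu_tb_s = 6144 * 5.1e9 / 8 / 1e12                 # ~3.9 TB/s per board
    print(f"{per_gpu_tb_s:.1f} TB/s per GPU, {2 * per_gpu_tb_s:.1f} TB/s aggregate")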

    Besides the memory capacity increase, in a lot of ways the individual cards within the larger dual-GPU/dual-card H100 NVL look a lot like the SXM5 version of the H100 placed on a PCIe card. Whereas the normal H100 PCIe is hamstrung somewhat by the use of slower HBM2e memory, fewer active SMs/tensor cores, and lower clockspeeds, the tensor core performance figures NVIDIA is quoting for the H100 NVL are all at parity with the H100 SXM5, indicating that this card isn’t further cut back like the normal PCIe card. We’re still waiting on the final, complete specifications for the product, but assuming everything here is as presented, then the GH100s going into the H100 NVL would represent the highest binned GH100s currently available.

    And an emphasis on the plural is called for here. As noted earlier, the H100 NVL is not a single GPU part, but rather it’s a dual-GPU/dual-card part, and it presents itself to the host system as such. The hardware itself is based on two PCIe form-factor H100s that are strapped together using three NVLink 4 bridges. Physically, this is virtually identical to NVIDIA’s existing H100 PCIe design – which can already be paired up using NVLink bridges – so the difference isn’t in the construction of the two-board/four-slot behemoth, but rather the quality of the silicon within. Put another way, you can strap together regular H100 PCIe cards today, but the result wouldn’t match the memory bandwidth, memory capacity, or tensor throughput of the H100 NVL.
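    Because the card enumerates as two ordinary GPUs, existing multi-GPU software stacks should pick it up without changes; the NVLink bridges simply make peer-to-peer transfers between the two boards faster. The snippet below is a hypothetical host-side illustration using PyTorch, not NVIDIA-supplied code, showing how the pair would appear to an application.

    import torch

    # The H100 NVL presents itself to the host as two PCIe GPUs,
    # so standard multi-GPU enumeration applies.
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 1e9:.0f} GB")

    # Peer-to-peer copies between the two boards ride over the NVLink bridges;
    # the check and the copy look the same as for any peer-capable pair.
    if torch.cuda.device_count() >= 2 and torch.cuda.can_device_access_peer(0, 1):
        src = torch.randn(1024, 1024, device="cuda:0")
        dst = src.to("cuda:1")  # direct GPU-to-GPU transfer when P2P is enabled
        print("peer copy complete:", tuple(dst.shape))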

    Surprisingly, despite the stellar specs, TDPs remain almost unchanged. The H100 NVL is a 700W to 800W part, which breaks down to 350W to 400W per board, the lower bound of which is the same TDP as the regular H100 PCIe. In this case NVIDIA looks to be prioritizing compatibility over peak performance, as few server chassis can handle PCIe cards over 350W (and fewer still over 400W), meaning that TDPs need to stand pat. Still, given the higher performance figures and memory bandwidth, it’s unclear how NVIDIA is affording the extra performance. Power binning can go a long way here, but it may also be a case where NVIDIA is giving the card a higher than usual boost clockspeed since the target market is primarily concerned with tensor performance and is not going to be lighting up the entire GPU at once.

    Otherwise, NVIDIA’s decision to release what’s essentially the best H100 bin is an unusual choice given their general preference for SXM parts, but it’s a decision that makes sense in context of what LLM customers need. Large SXM-based H100 clusters can easily scale up to 8 GPUs, but the amount of NVLink bandwidth available between any two is hamstrung by the need to go through NVSwitches. For just a two GPU configuration, pairing a set of PCIe cards is much more direct, with the fixed link guaranteeing 600GB/second of bandwidth between the cards.

    But perhaps more importantly than that is simply a matter of being able to quickly deploy H100 NVL in existing infrastructure. Rather than requiring installing H100 HGX carrier boards specifically built to pair up GPUs, LLM customers can just toss H100 NVLs in new server builds, or as a relatively quick upgrade to existing server builds. NVIDIA is going for a very specific market here, after all, so the normal advantage of SXM (and NVIDIA’s ability to throw its collective weight around) may not apply here.

    All told, NVIDIA is touting the H100 NVL as offering 12x the GPT3-175B inference throughput of a last-generation HGX A100 (8 H100 NVLs vs. 8 A100s), which for customers looking to deploy and scale up their systems for LLM workloads as quickly as possible is certainly going to be tempting. As noted earlier, H100 NVL doesn’t bring anything new to the table in terms of architectural features – much of the performance boost here comes from the Hopper architecture’s new transformer engines – but the H100 NVL will serve a specific niche as the fastest PCIe H100 option, and the option with the largest GPU memory pool.

    Wrapping things up, according to NVIDIA, H100 NVL cards will begin shipping in the second half of this year. The company is not quoting a price, but for what’s essentially a top GH100 bin, we’d expect them to fetch a top price. Especially in light of how the explosion of LLM usage is turning into a new gold rush for the server GPU market.

    Tags: GPUs, GTC2023, H100, HBM3, Hopper, LLMs, MachineLearning, NVIDIA