Designing for GPUs, AI, & HPC

442 words. 2-minute read

ChatGPT made Artificial Intelligence (AI) mainstream, but AI utilization in data centers was already exploding. 📈

The data backs it up: AI and GPU deployments in data centers are real and growing.

THE PROBLEM FOR DATA CENTER OWNERS: Densities are rising. HPC and AI are being implemented. CPUs are being swapped for GPUs. Yet owners can't design a data center without knowing how fast rack densities will increase.

  • The most common thing we hear from owners is "I can't predict the future."
    • They believe anything from 20kW/rack to 100kW/rack is possible in their data center. 
  • Owners must make strategic capital requests. Their investment in the data center to support HPC is a single investment that must provide growth. 
    • But owners don't want to overbuild without knowing the future. 
  • The industry lacks best practices on when to change from air to liquid cooling for HPC workloads. 

THE KEY TAKEAWAY IS SCALE: No one can predict what rack densities will be, but you can create an effective design for HPC and AI by focusing on scale.

  1. Scale = flexibility. It allows you to support your data center today and allows you to add higher-density workloads in the future. 

WHAT'S REQUIRED: Designing for scale requires creating design scenarios for different density profiles. How would a 20kW/rack design differ from a 40kW/rack design? How would 40kW/rack differ from 75kW/rack?


  1. By articulating scenarios, owners can map the commonalities in the infrastructure across all scenarios. The common infrastructure becomes the foundation that you can scale as the densities grow. 
    • For example, if an owner identifies chilled water as a foundation, their day 1 design uses that chilled water to support 20kW/rack. 
  2. In addition, the design shows how to leverage that same chilled water to scale beyond day 1, installing infrastructure and capacity up front so it is accessible in the future. 
  3. This allows scale:
    • 40kW/rack: Add a Coolant Distribution Unit (CDU) to support Rear Door Heat Exchangers. 
    • 75kW/rack: Add rack-mounted CDUs for liquid-cooled chips, layering direct-to-chip cooling on top of the Rear Door Heat Exchangers from the 40kW scenario. 
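The scaling path above can be sketched as a simple capacity model. This is an illustrative assumption, not an industry standard: the density thresholds, tier names, and the `cooling_plan` / `total_load_kw` helpers are hypothetical, chosen only to mirror the 20/40/75 kW scenarios in this article.

```python
# Illustrative sketch of the article's scaling path: the chilled water loop
# is the common foundation, and higher-density tiers layer cooling on top.
# Thresholds (20/40 kW) are assumptions for illustration only.

def cooling_plan(kw_per_rack: float) -> list[str]:
    """Return the cooling layers assumed for a given rack density."""
    plan = ["chilled water loop (day-1 foundation)"]
    if kw_per_rack > 20:
        # ~40 kW/rack scenario: CDU feeding Rear Door Heat Exchangers.
        plan.append("CDU + rear door heat exchangers")
    if kw_per_rack > 40:
        # ~75 kW/rack scenario: rack-mounted CDUs for liquid-cooled
        # chips, layered on top of the rear door heat exchangers.
        plan.append("rack-mounted CDU for liquid-cooled chips")
    return plan


def total_load_kw(racks: int, kw_per_rack: float) -> float:
    """IT heat load the chilled water plant must ultimately reject."""
    return racks * kw_per_rack


if __name__ == "__main__":
    for density in (20, 40, 75):
        print(f"{density} kW/rack -> {cooling_plan(density)}")
    # Sizing day-1 pipework for the highest scenario keeps scale open:
    print(f"100 racks @ 75 kW = {total_load_kw(100, 75)} kW plant capacity")
```

The point of the sketch is the design choice it encodes: every tier reuses the same chilled water foundation, so moving from 20 to 75 kW/rack adds layers rather than replacing infrastructure.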

CONCLUSION: It’s not all or nothing. Wholesale shifts to liquid cooling aren’t required. By planning for scale, owners make responsible up-front investments without limiting their future capacity to support HPC.


GO DEEPER:

HPC market size projections today – 2026.

CIOs discuss what’s new and what’s next for HPC.

5 Trends Shaping the Future of High-Performance Computing.

President Biden announces funding for Ethical AI.

Data Center Accelerator Market Share Report.