Ming-Chi Kuo: With Integration into the Nvidia Ecosystem, LPU Production Will Surge 10-Fold, With Significant Impact on the PCB Supply Chain

Nvidia Incorporates Groq LPU Technology into Rubin Platform, Triggering a Deep Supply Chain Transformation

At Nvidia GTC, CEO Jensen Huang announced the launch of the Nvidia Groq 3 LPU chip, officially integrating it into the Vera Rubin platform as the core inference acceleration component for next-generation AI data centers.

Renowned Apple supply chain analyst Ming-Chi Kuo immediately released a supply chain investigation report, pointing out that after Nvidia’s investment in Groq, the shipment forecast for LPUs has been significantly raised. The total shipments for 2026 to 2027 are expected to reach 4 to 5 million units, representing over a tenfold increase compared to historical annual production.

Kuo believes this explosive growth is driven by two main factors: first, the deep integration of LPU with Nvidia’s CUDA ecosystem greatly lowers development barriers; second, the rapid expansion of ultra-low latency inference scenarios such as AI agents, real-time consumer applications, and physical AI. He also notes that mass production of LPU/LPX rack systems will have a significant impact on the PCB supply chain, with WUS Printed Circuit potentially becoming a key beneficiary.

Jensen Huang's GTC Announcement: LPU Officially Becomes the Seventh Pillar of the Rubin Platform

In this year's GTC keynote, Huang revealed how Nvidia integrated the IP technology acquired from Groq last year into the Rubin platform. The Nvidia Groq 3 LPU, an inference acceleration chip, becomes the seventh core building block of the Rubin platform, joining the Rubin GPU, Vera CPU, NVLink 6 switch, ConnectX 9 smart network card, Bluefield 4 data processing unit, and Spectrum-X switch.

Technically, the Groq 3 LPU differs markedly from mainstream AI accelerators. Most rely on HBM as working memory, whereas each Groq 3 LPU carries 500MB of on-chip SRAM, similar to the high-speed cache used in CPUs and GPUs. Although this capacity is far below the 288GB of HBM4 on Rubin GPUs, its bandwidth reaches 150TB/s, far exceeding HBM's 22TB/s.

For bandwidth-sensitive AI decoding operations, Groq 3’s ultra-high bandwidth offers significant advantages in inference scenarios, especially suitable for deploying cutting-edge AI models requiring large-scale, low-latency, highly interactive outputs.
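The bandwidth advantage can be illustrated with a back-of-envelope calculation: autoregressive decoding is typically memory-bandwidth-bound, since generating each token requires streaming roughly all model weights from memory. A minimal sketch, using the 150TB/s and 22TB/s figures from the article; the 70GB model size is an illustrative assumption, not a vendor specification:

```python
# Rough, bandwidth-bound ceiling on decode throughput (tokens/s).
# Assumption: each generated token streams ~all model weights once,
# so tokens/s <= memory_bandwidth / model_size_in_bytes.

def decode_tokens_per_sec(mem_bandwidth_tb_s: float, model_bytes_gb: float) -> float:
    """Upper bound on tokens/s when decoding is memory-bandwidth-bound."""
    bytes_per_sec = mem_bandwidth_tb_s * 1e12
    bytes_per_token = model_bytes_gb * 1e9
    return bytes_per_sec / bytes_per_token

# Hypothetical 70B-parameter model at 8-bit precision ~= 70 GB of weights
sram_bound = decode_tokens_per_sec(150.0, 70.0)  # SRAM-class bandwidth, 150 TB/s
hbm_bound = decode_tokens_per_sec(22.0, 70.0)    # HBM-class bandwidth, 22 TB/s

print(f"SRAM-bound ceiling: {sram_bound:.0f} tok/s")
print(f"HBM-bound ceiling:  {hbm_bound:.0f} tok/s")
print(f"Ratio: {sram_bound / hbm_bound:.1f}x")
```

Under these assumptions the ceiling scales linearly with bandwidth, which is why a bandwidth gap of roughly 7x translates directly into latency headroom for interactive inference.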

Supply Chain Forecast: Shipments Expected to Reach 4-5 Million Units in 2026-2027

According to Kuo’s latest supply chain research, Nvidia’s investment in Groq has substantially raised the shipment forecast for LPUs. He predicts that total LPU shipments in 2026-2027 will reach 4 to 5 million units, with 30-40% in 2026 and 60-70% in 2027. Compared to historical annual production, this represents an increase of over ten times.

At the rack level, Nvidia plans to increase LPU density from 64 to 256 units per rack to maintain ultra-low latency during inference decoding and to meet the expanding KV cache requirements driven by long-context reasoning.
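The KV cache pressure behind that density increase is easy to estimate: the cache grows linearly with context length, and at long contexts it dwarfs the 500MB of per-chip SRAM, forcing scale-out across many LPUs. A minimal sketch using the standard KV cache formula; all model dimensions below are illustrative assumptions:

```python
# Back-of-envelope KV cache sizing for long-context decoding.
# KV cache bytes ~= 2 (K and V) * layers * kv_heads * head_dim
#                   * context_len * batch * bytes_per_element

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_len: int, batch: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GB; default 2 bytes/element assumes an FP16 cache."""
    total_bytes = 2 * layers * kv_heads * head_dim * context_len * batch * bytes_per_elem
    return total_bytes / 1e9

# Hypothetical 80-layer model with 8 KV heads of dimension 128,
# serving a single 128k-token context:
print(f"{kv_cache_gb(80, 8, 128, context_len=128_000, batch=1):.1f} GB")
```

On these assumed dimensions a single 128k-token context already needs tens of gigabytes of KV cache, roughly two orders of magnitude beyond one LPU's SRAM, which is consistent with packing 256 units per rack.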

Kuo expects the new rack architecture to enter mass production between Q4 2026 and Q1 2027, with shipments rising from 300-500 units in 2026 to 15,000-20,000 units in 2027.

Ecosystem Integration as a Key: Three Major Technical Nodes Determine Deployment Speed

Kuo points out that the rapid growth in LPU demand fundamentally stems from its deep integration within Nvidia’s ecosystem. The integration with Nvidia CUDA significantly reduces application development and deployment barriers, allowing developers to utilize LPU computing power without restructuring existing workflows. Meanwhile, the rapid expansion of AI agents (such as programming agents), real-time consumer applications, and physical AI for ultra-low latency inference further drives LPU demand.

He highlights three key technical integration nodes to watch: First, at the network architecture level, whether rack-level interconnects can seamlessly connect via NVLink Fusion and RealScale; second, at the developer interface level, whether Nvidia NIM can enable developers to deploy workloads directly without distinguishing between GPU and LPU; third, at the compiler level, whether TensorRT-LLM can support the “pre-compile” architecture of LPU. Kuo believes that the pace of advancing these three integrations will directly determine the speed and depth of LPU’s large-scale deployment.

PCB Supply Chain Enters a New Cycle: WUS Printed Circuits Could Become a Key Beneficiary

Kuo emphasizes that mass production of LPU/LPX rack systems has significant implications for the PCB supply chain. He notes that LPU/LPX racks represent the first large-scale commercial deployment of M9-grade CCL (copper-clad laminate) materials, with WUS Printed Circuit playing a critical role in that supply chain.

M9-grade CCL materials demand extremely advanced manufacturing processes, including breakthroughs in processing quartz-glass fabric for high-layer-count boards. Kuo believes that if LPU/LPX rack systems ramp smoothly, they will not only contribute substantially to WUS's 2027 results but also validate the company's technological capabilities in high-end manufacturing, potentially catalyzing a new growth cycle for the entire PCB industry.

Risk Warning and Disclaimer

Markets carry risk; invest with caution. This article does not constitute personal investment advice and does not take into account individual users' specific investment goals, financial situations, or needs. Users should consider whether any opinions, viewpoints, or conclusions herein are suitable for their particular circumstances. Invest at your own risk.
