The chip race in Silicon Valley has been underway since long before generative AI applications supercharged tech companies’ appetite for them. In 2015, Google’s AI system AlphaGo, powered by a Google-designed chip known as a tensor processing unit (TPU), beat a professional human player at the ancient Chinese game Go. Since then, Google has unveiled a series of chips it designed in-house to power AI systems in its data centers. Most recently, in December 2024, Google announced a new AI chip for quantum computing named Willow. The company says Willow can complete a standard benchmark computation in under 5 minutes, one that would take one of today’s fastest supercomputers 10 septillion, or 10^25, years.

Around the same time that Google was launching AlphaGo, researchers at IBM began exploring how to build AI hardware, too. By 2021, IBM had opened its AI Hardware Center in Albany, New York, to create a wider AI hardware-software ecosystem, and by 2022, IBM’s new Telum microprocessor chip had brought AI inferencing to IBM Z, the mainframes that run roughly 70% of the world’s transactions by value. In late 2024, IBM announced a new Spyre accelerator chip, which brought generative AI to IBM Z mainframes for enterprise users.

Meanwhile, AWS has been working on its own computer chips for AI projects since at least 2018. Fast forward to the 2024 AWS annual event, where Amazon announced its latest custom Trainium3 AI chip, which it is bringing to customers paired with partner Anthropic’s large language models. Many companies have snapped up AWS’s AI chips, including Apple, whose appearance at the event drew attention because Apple rarely discusses its vendors publicly.

Not to be outdone, Microsoft, which has made chips to power its gaming functions for years, announced its own custom AI chips in 2023, around the same time tech giant Meta announced its own silicon chip plans. OpenAI is the latest to join the custom silicon party, though it has not made an official announcement or shared details publicly. Reuters reported earlier this month that OpenAI was finalizing its chip designs, with plans to start fabricating them via TSMC in 2025.

Why has the chip race intensified recently? IBM’s Varshney says that when companies can customize chips to specific language models for the use cases they need, they can cut costs, reduce latency or speed up the movement of data from one network to another. He points to an example: historically, when companies were doing fraud detection and examining incoming invoices, they used classical computing techniques because the volume was high and they needed very low latency. “They also had to do this a million times a day, so the cost would add up really quickly,” Varshney says.

Now that companies can optimize their chips for specific models, the cost of high-volume use cases goes down, and it becomes more cost-effective to use these solutions in production at scale. “So from an enterprise perspective, the use cases don’t change,” Varshney says. “But now we start to go after the high-volume ones where earlier the ROI didn’t exist.”