AI-Driven Hardware and Compute Acceleration

Hardware and compute demands are exposing critical tooling gaps. The opportunity to build more robust design tools that meet rising demands for power, compute, and AI advancement has never been greater.

In recent months, as more and more conversations turn to inference, I've had some great debates over dinner with friends and experts in the industry about the mix of local on-device inference vs. cloud inference and the second-order effects of different mixes, such as the impact on infrastructure, power consumption, and network demands. Yet regardless of how these discussions shake out, one viewpoint is consistent: as workloads like reasoning and complex planning advance and memory requirements grow, compute and power will need to scale significantly. This observation may be somewhat obvious, but what is often overlooked is the outdated state of the tools we rely on for next-generation compute.

Over the past two decades, as compute demands have increased, the complexity of hardware has grown dramatically. The push for more compute has led to intricate trade-offs in power efficiency, size, and specialization (e.g., domain-specific accelerators). While the tooling ecosystem has advanced somewhat, Electronic Design Automation (EDA) tools for defect prediction, layout optimization, and power management still have critical gaps. Incumbent tools from vendors like Synopsys, Cadence, ANSYS, and Siemens (Questa) have steep learning curves, and certain AI-specific features, like dynamic workload modeling, remain immature. Current verification approaches often fall short for AI workloads because the dynamic nature of AI models can introduce edge cases that are difficult to predict or test exhaustively, limiting the potential for hardware-software co-design for AI models outside of the largest providers. This is compounded by the sheer scale of data required to validate domain-specific accelerators and advanced packaging, which can overwhelm existing tools.

The diminishing returns from traditional transistor scaling have shifted the focus from transistor density to architectural innovation and energy optimization. Scaling compute and power for AI isn’t just about adding more transistors or creating novel architectures; it demands rethinking energy efficiency across the entire stack—from silicon to software.

The need for energy-conscious design is particularly pressing given the rise in inference workloads. The distinction between cloud-based inference (optimized for large-scale, high-throughput environments) and on-device inference (which may prioritize low latency and power efficiency) creates divergent design challenges. Bridging these requirements adds further complexity in areas like hybrid inference, where workloads are split dynamically between edge and cloud to optimize for latency, energy, or cost. As more intensive computation, such as reasoning or reinforcement learning (RL) based workloads, scales up, power constraints are likely to become even more prevalent.
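To make the hybrid-inference trade-off a bit more concrete, here is a minimal routing sketch in Python. Everything in it, including the model names, the latency, energy, and cost figures, and the weighting, is a hypothetical placeholder rather than a description of any real system; a production router would draw these estimates from live telemetry.

```python
from dataclasses import dataclass

@dataclass
class Target:
    name: str
    latency_ms: float   # expected end-to-end latency for this request
    energy_j: float     # estimated energy drawn for this request (joules)
    cost_usd: float     # marginal dollar cost attributed to this request

def route(edge: Target, cloud: Target, latency_budget_ms: float,
          w_energy: float = 1.0, w_cost: float = 1000.0) -> Target:
    """Pick edge vs. cloud for one request: honor the latency budget first,
    then minimize a weighted blend of energy and cost."""
    candidates = [t for t in (edge, cloud) if t.latency_ms <= latency_budget_ms]
    if not candidates:
        # Nothing meets the budget; fall back to the fastest option.
        return min((edge, cloud), key=lambda t: t.latency_ms)
    return min(candidates, key=lambda t: w_energy * t.energy_j + w_cost * t.cost_usd)

# Hypothetical numbers purely for illustration.
edge = Target("on-device-8B", latency_ms=120, energy_j=8.0, cost_usd=0.0)
cloud = Target("cloud-70B", latency_ms=300, energy_j=40.0, cost_usd=0.002)
print(route(edge, cloud, latency_budget_ms=200).name)  # -> on-device-8B
```

Loosen the latency budget or change the weights and the same request flips to the cloud, which is the kind of dynamic splitting that makes infrastructure and power planning harder.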

Model optimization (e.g., model pruning and quantization) and architectural advancements also offer ways to ease these power constraints. Frameworks that employ smaller language models, such as Mixture of Experts (MoE) architectures and multi-agent workflows, are emerging as well. Smaller models inherently require fewer computations per generated token, leading to lower power consumption per inference. For simplicity, if Power ≈ Total Compute (FLOPs) + Memory Overhead, and shrinking the model reduces FLOPs far more than a longer prompt inflates memory overhead, the smaller model should demand less power. However, if many thousands of tokens in a multi-shot prompt are needed to match the larger model's performance, you end up using more total compute, and therefore more power, despite the smaller model. The trade-off between multiple inferences on smaller models vs. single inferences on large models is, of course, not always straightforward; caching, hardware utilization, and how well the inference setup is optimized can significantly impact net power consumption. Ultimately, however, while these optimizations and frameworks are important for different customer environments and use cases, I don't believe they alone will materially ease the power constraints that data centers and edge deployments will grapple with in the next five to ten years.
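As a rough illustration of that back-of-envelope reasoning, the snippet below compares a long multi-shot request on a small model against a short request on a large model using the simplified Power ≈ Compute + Memory Overhead framing. The energy-per-token constant, the overhead term, and the parameter and token counts are all assumed, illustrative values, not measurements.

```python
# Back-of-envelope comparison: many tokens on a small model vs. fewer tokens
# on a large model. All constants are illustrative assumptions, not
# measurements: per-token energy is treated as roughly proportional to active
# parameter count, plus a flat per-request memory/overhead term.

def request_energy_j(active_params_b: float, tokens: int,
                     j_per_b_param_token: float = 0.003,
                     overhead_j: float = 5.0) -> float:
    """Crude model: energy ~ compute term (params x tokens) + memory overhead."""
    return active_params_b * tokens * j_per_b_param_token + overhead_j

# Assume the small model needs a long multi-shot prompt to match the large
# model's single-shot answer quality.
small = request_energy_j(active_params_b=8, tokens=6000)   # ~149 J
large = request_energy_j(active_params_b=70, tokens=500)   # ~110 J
print(f"small-model request: {small:.0f} J, large-model request: {large:.0f} J")
```

Under these assumed numbers the "cheaper" small model actually burns more energy per request; shrink its prompt and the conclusion flips, which is exactly why the trade-off is workload-dependent.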

During that same period, we'll be entering a slower, post-Moore's Law scaling era. Although transistor scaling may continue, albeit more slowly and with greater complexity, gains in compute will come from architectural innovations, advanced packaging (as we've seen with 3D packaging and chiplets), and new specialized accelerators from entrants like Amazon, Meta, and OpenAI, which already have programs underway. I believe we'll also see continued advancements in robotics, which will further drive fresh demand for specialized hardware design. In both domains, high development costs and nascent supply chains pose significant barriers to large-scale hardware deployment.

To address these challenges, the industry needs robust new tools for hardware design management, planning, simulation, verification, and testing. AI-driven approaches to these problems, such as reinforcement learning for design optimization or generative models for architecture exploration, allow for faster design iterations, exploration of larger design spaces, and real-time simulations for interactive design and control. Shorter design cycles paired with more accurate planning and verification would help pave the way for truly next-generation hardware.
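To give a flavor of what automated design-space exploration looks like, here is a toy Python sketch that randomly samples accelerator configurations against a stand-in cost model and keeps the best design under a power cap. The parameter space, the scoring function, and the power cap are all invented for illustration; a real flow would swap the stand-in evaluator for a simulator or learned cost model, and could replace random sampling with reinforcement learning or Bayesian optimization.

```python
import random

# Toy design-space exploration: randomly sample accelerator configurations and
# keep the highest-throughput design under a power cap. The parameters and the
# scoring function are stand-ins for what a real EDA or architecture-exploration
# flow would provide (e.g., a cycle-accurate simulator or a learned cost model).

SPACE = {
    "pe_array":  [16, 32, 64, 128],        # processing elements per dimension
    "sram_kb":   [256, 512, 1024, 2048],   # on-chip buffer size
    "clock_mhz": [500, 800, 1000, 1200],
}

def evaluate(cfg):
    """Stand-in cost model: throughput grows with PEs and clock; so does power."""
    throughput = cfg["pe_array"] ** 2 * cfg["clock_mhz"] / 1e3
    power_w = 0.002 * cfg["pe_array"] ** 2 * cfg["clock_mhz"] / 1e3 + 0.01 * cfg["sram_kb"]
    return throughput, power_w

def explore(n_samples=2000, power_cap_w=40.0, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(n_samples):
        cfg = {k: rng.choice(v) for k, v in SPACE.items()}
        throughput, power_w = evaluate(cfg)
        if power_w <= power_cap_w and (best is None or throughput > best[1]):
            best = (cfg, throughput, power_w)
    return best

print(explore())
```

Even this crude loop shows the appeal: the search sweeps the whole space in seconds, and tightening the power cap immediately reshapes which architecture wins, the kind of iteration speed that today's manual flows struggle to match.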

Below is a market snapshot illustrating some of these emerging tools, alongside the key segments across chips, circuits, and broader hardware design where I'm seeing increased activity.

If you’re building in this space, I’d love to chat.

hg@flyingfish.vc

About the author
Heather Gorham
Principal
hg@flyingfish.vc