The number of AI processor (AIP) startups has more than doubled since 2018, and there are now 138 companies building dedicated AI silicon across 18 countries. The market is worth $85B today and is driven mainly by inference (cloud and local) and edge deployments (wearables to PCs).
Of the 138 AIP suppliers, 64% are privately held, and most were founded within the last 7 years. As might be expected, the companies are focused on cloud/local inference and the edge; training remains capital-intensive.
However, the startup wave has crested: the peak formation year was 2018, by which point 54% of the startups had already appeared. Interestingly, the rise in the number of startups began before Nvidia stunned the industry with its explosion in sales. One would think that Nvidia’s success was the honey that attracted all the ants, but most (54%) of the startups got going before that. Since 2022, the sector has averaged three acquisitions per year.
An AIP is a chip optimized to run neural-network workloads fast and efficiently by doing huge amounts of tensor math while moving data as little as possible. The offerings span GPUs, NPUs, CIM/PIM, neuromorphic processors, and matrix/tensor engines. (CPUs and FPGAs are excluded from market totals due to their functional generality.) Common architectural patterns include tensor/matrix engines, near-compute SRAM plus HBM/DDR, an NoC fabric, and PCIe/CXL/NVLink/Ethernet off-chip links.
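To make the data-movement point concrete, here is a minimal sketch, in plain NumPy and not tied to any vendor's architecture, of the tiled matrix multiply at the heart of most tensor engines: operands are processed in small blocks that would sit in near-compute SRAM, so each value is fetched from off-chip HBM/DDR as few times as possible. The tile size and matrix shapes are arbitrary placeholders.

```python
# Illustrative only: a tiled matrix multiply mimicking how a tensor engine works
# on small blocks held in near-compute SRAM, so each operand is fetched from
# off-chip memory (HBM/DDR) as few times as possible.
import numpy as np

def tiled_matmul(a: np.ndarray, b: np.ndarray, tile: int = 64) -> np.ndarray:
    m, k = a.shape
    k2, n = b.shape
    assert k == k2
    c = np.zeros((m, n), dtype=a.dtype)
    for i in range(0, m, tile):          # output-row tiles
        for j in range(0, n, tile):      # output-column tiles
            acc = np.zeros((min(tile, m - i), min(tile, n - j)), dtype=a.dtype)
            for p in range(0, k, tile):  # reduction tiles: partial sums stay "on chip"
                acc += a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
            c[i:i+tile, j:j+tile] = acc
    return c

# Sanity check against a plain matmul.
x = np.random.rand(256, 128).astype(np.float32)
y = np.random.rand(128, 192).astype(np.float32)
assert np.allclose(tiled_matmul(x, y), x @ y, atol=1e-3)
```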
The combination of LLM inference at scale, edge AI proliferation, and memory-bound workloads is reshaping silicon roadmaps.


Figure 1. Population of AIP suppliers. Source: Jon Peddie Research
Of the five major market segments, inference (cloud and local) and edge (wearables to PCs) are the areas where the companies are putting most of their effort.


Figure 2. Market segments by companies. Source: Jon Peddie Research
For the privately held companies, has the window closed? The peak in startup formation came in 2018, five years before Nvidia’s sales exploded, and by then 54% of the startups had already appeared.


Figure 3. The rate of AI processor startups being founded. Source: Jon Peddie Research
Since 2022, there has been an average of three acquisitions every year.
AIPs are being offered as GPUs, NPUs, CIMs, neuromorphic processors, CPUs, and even FPGAs. Our report does not include CPUs and FPGAs in its evaluation of the market because their generality makes them impossible to differentiate by function.


Figure 4. AIP distribution. Source: Jon Peddie Research
What an AIP looks like inside
The world of AI processors spans cloud services, data center chips, embedded IP, and neuromorphic hardware. Founders and engineers address the gaps that CPUs and GPUs can't fill: managing memory, maintaining high utilization with small batches, meeting latency goals on strict power budgets, and providing consistent throughput at scale. Companies develop products along two main dimensions: the type of workload—training, inference, or sensor-level signal processing—and the deployment tier, from hyperscale data centers to battery-powered devices.
Most technical work centers on memory and execution control. Compute-in-memory and analog techniques reduce data transfers by performing calculations within memory arrays and keeping partial sums nearby. Wafer-scale chips store activations in local SRAM and stream weights for long sequences. Reconfigurable fabrics alter data flow and tiling at compile time to optimize utilization across layers. Training chips emphasize interconnect bandwidth and collective communication, while inference chips prioritize batch-one latency, key-value caching for transformers, and power efficiency at the edge.
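As an illustration of why inference parts budget so heavily for key-value caching, the sketch below (plain NumPy, batch-one decoding for a single attention head, with arbitrary dimensions) shows how each generated token appends one key/value pair and attends over the cached rest instead of recomputing the whole prefix.

```python
# Illustrative sketch (not any vendor's implementation) of transformer KV caching
# during batch-one decoding: each step appends one key/value pair and reads the
# rest from the cache rather than recomputing attention inputs for the prefix.
import numpy as np

def attention_step(q, k_cache, v_cache, k_new, v_new):
    """One decode step for a single head: append new K/V, attend over the cache."""
    k_cache = np.concatenate([k_cache, k_new[None, :]], axis=0)  # (t+1, d)
    v_cache = np.concatenate([v_cache, v_new[None, :]], axis=0)
    scores = k_cache @ q / np.sqrt(q.shape[-1])                  # (t+1,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    out = weights @ v_cache                                      # (d,)
    return out, k_cache, v_cache

d = 64
k_cache = np.zeros((0, d), dtype=np.float32)
v_cache = np.zeros((0, d), dtype=np.float32)
for _ in range(16):  # generate 16 tokens at batch size 1
    q, k_new, v_new = (np.random.rand(d).astype(np.float32) for _ in range(3))
    out, k_cache, v_cache = attention_step(q, k_cache, v_cache, k_new, v_new)
# The cache grows with sequence length, which is why memory capacity and
# bandwidth, rather than raw TOPS, often bound batch-one latency.
```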


Figure 5. A typical AIP. Source: Jon Peddie Research
Adoption depends on go-to-market strategies and ecosystem backing. Cloud providers incorporate accelerators into managed services and model-serving frameworks. IP vendors collaborate with handset, auto, and industrial SoC teams, offering toolchains, models, and density roadmaps. Edge specialists release SDKs that compress models, quantize to INT8 or lower, and map operators onto sparse or analog units while achieving accuracy targets. Neuromorphic groups publish compilers for spiking networks and emphasize energy efficiency and latency on event streams. Refinements in compilers, kernel sets, and observability tools often outweigh peak TOPS.
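For readers unfamiliar with the quantization step those SDKs perform, here is a minimal sketch of per-tensor symmetric INT8 quantization. Real toolchains add calibration, per-channel scales, and operator-aware passes, so treat this only as an illustration of the idea rather than any specific vendor's recipe.

```python
# Illustrative sketch of symmetric INT8 quantization (per-tensor scale).
import numpy as np

def quantize_int8(w: np.ndarray):
    """Float32 weights -> int8 values plus one scale factor."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()
print(f"INT8 weights take 4x less memory; max round-trip error ~ {err:.4f}")
```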
Competition varies by tier. Training silicon focuses on cost per model trained, given network, memory, and compiler constraints. Inference silicon targets cost per token or frame within latency limits, using cache management and quantization as tools. Edge devices compete on milliwatts per inference and toolchain portability. IP vendors compete on tape-out time, PPA goals, and verification support. Research projects balance speed to market against experiments that may alter the trade-offs between memory, compute, and communication.
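The tier-specific metrics above reduce to simple arithmetic. The sketch below shows the two most common calculations, with placeholder numbers rather than measurements of any particular chip or cloud instance.

```python
# Back-of-the-envelope metrics for the inference and edge tiers.
# All input values are hypothetical placeholders.

def cost_per_million_tokens(instance_usd_per_hour: float, tokens_per_second: float) -> float:
    """Inference tier: dollars per 1M generated tokens at sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return instance_usd_per_hour / tokens_per_hour * 1_000_000

def energy_per_inference_mj(avg_power_mw: float, latency_ms: float) -> float:
    """Edge tier: millijoules per inference = milliwatts x seconds."""
    return avg_power_mw * (latency_ms / 1000.0)

print(cost_per_million_tokens(instance_usd_per_hour=4.0, tokens_per_second=2500))  # ~$0.44 per 1M tokens
print(energy_per_inference_mj(avg_power_mw=150.0, latency_ms=20.0))                # 3.0 mJ per inference
```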
Throughout this process, teams customize designs to meet specific needs, such as attention depth, parameter count, activation size, sparsity, and precision policies. When companies synchronize silicon, compiler, and deployment tools, they reduce integration costs and speed up the transition from models to high throughput. Customers then have multiple options: expand in the cloud, scale up with wafer-scale systems, embed NPUs in SoCs, or accelerate compute at sensors using analog and neuromorphic chips.