AI Infrastructure - Coggle Diagram
AI Infrastructure
Model
Data
Clouds/orchestration
Rents compute and schedules it
Hyperscalers (AWS, Azure, Google Cloud)
Neoclouds (CoreWeave, Nebius, Nscale) - purpose-built GPUaaS
Orchestration software (Kubernetes, Slurm, Ray) turns thousands of GPUs into one usable supercomputer
Constraints: requires advanced orchestration and monitoring, including traffic monitoring
small config issues create performance bottlenecks
Idle GPUs burn money -> utilisation is key
Dominance: Big-3 hyperscalers (AWS, Azure, Google Cloud), but neoclouds (Nvidia-backed) are fast-growing insurgents
Trend: capacity is supply-constrained
(Microsoft disclosed a US$80B backlog of Azure orders it cannot fulfil due to power constraints)
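The "idle GPUs burn money" point can be made concrete with a little arithmetic. The sketch below is illustrative only: the hourly rate, fleet size and utilisation figures are hypothetical placeholders, not quotes from any provider, and the function names are invented for this example.

```python
def cost_per_useful_gpu_hour(hourly_rate: float, utilisation: float) -> float:
    """Effective price of one GPU-hour of useful work.

    utilisation is the fraction of paid GPU-hours that do useful work.
    """
    if not 0.0 < utilisation <= 1.0:
        raise ValueError("utilisation must be in (0, 1]")
    return hourly_rate / utilisation


def idle_spend(hourly_rate: float, gpus: int, hours: float, utilisation: float) -> float:
    """Money paid for GPU-hours that did no useful work."""
    if not 0.0 <= utilisation <= 1.0:
        raise ValueError("utilisation must be in [0, 1]")
    return hourly_rate * gpus * hours * (1.0 - utilisation)


# Hypothetical inputs: rented rate in dollars per GPU-hour and fleet size.
RATE = 2.0
GPUS = 1_000
HOURS_PER_DAY = 24

for u in (0.9, 0.5, 0.25):
    effective = cost_per_useful_gpu_hour(RATE, u)
    wasted = idle_spend(RATE, GPUS, HOURS_PER_DAY, u)
    print(f"utilisation {u:.0%}: {effective:.2f} per useful GPU-hour, "
          f"{wasted:,.0f} per day spent on idle capacity")
```

Under these assumptions, halving utilisation doubles the effective price of every useful GPU-hour, which is why scheduling and orchestration quality matter as much as the headline rental rate.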
Fuel: enterprise data, synthetic data, labelling, curation, retrieval pipelines
Constraints: the industry is reaching peak data. High-quality human text is being exhausted, pushing reliance on synthetic data and proprietary data moats.
Data governance, copyright etc.
Dominance/trend: Scale AI (labelling), hyperscalers (proprietary data) and whoever owns unique real-world data
Applications
Users
Demand layer: consumers, enterprises, developers and now autonomous agents
Trend: scale multiplication. The shift from humans-in-the-loop to agentic consumption, where one task spawns thousands of model calls, multiplies the demand on every layer below it
Software products built on top of models (copilots, coding assistants e.g. Cursor, Claude Code), vertical tools (legal, medical, defence) and agent frameworks
Constraints
Unit-economics problems: agentic models require far more tokens per task, so even a 90% drop in inference cost may not produce cheaper enterprise AI. Token costs can exceed the cost of the human labour being replaced
Trend: no settled winner; the most contested, lowest-barrier layer. The trend is vertical specialisation and a race to prove ROI
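The unit-economics constraint above is simple multiplication, sketched here under stated assumptions. The token counts, prices and the 50x multiplier are hypothetical numbers chosen for illustration; only the 90% price drop comes from the notes.

```python
def cost_per_task(tokens_per_task: float, price_per_million_tokens: float) -> float:
    """Inference cost of one task at a given token price."""
    return tokens_per_task / 1_000_000 * price_per_million_tokens


def breakeven_token_multiplier(price_drop: float) -> float:
    """How many times more tokens a task can use before a price drop is cancelled out."""
    if not 0.0 <= price_drop < 1.0:
        raise ValueError("price_drop must be in [0, 1)")
    return 1.0 / (1.0 - price_drop)


# Hypothetical baseline: a single chat-style call.
BASE_TOKENS = 2_000
BASE_PRICE = 10.0          # dollars per million tokens (placeholder)

# Scenario from the notes: inference price falls by 90%...
PRICE_DROP = 0.90
new_price = BASE_PRICE * (1.0 - PRICE_DROP)

# ...but an agentic workflow uses many more tokens per task (assumed 50x here).
AGENT_MULTIPLIER = 50

before = cost_per_task(BASE_TOKENS, BASE_PRICE)
after = cost_per_task(BASE_TOKENS * AGENT_MULTIPLIER, new_price)

print(f"cost per task before: {before:.4f}")
print(f"cost per task after:  {after:.4f}")
print(f"break-even token multiplier: {breakeven_token_multiplier(PRICE_DROP):.1f}x")
```

With a 90% price drop, any task that uses more than about ten times the original tokens ends up costing more than before; at the assumed 50x, the cost per task is roughly five times higher despite the cheaper tokens.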
Foundation models (LLM, vision, multimodal, VLA). The engine.
Training (building the model) vs inference (running it) are economically distinct workloads
Constraints: training cost, data availability, capability gains per dollar getting harder to achieve.
DeepSeek is a warning (Jan 2025): it matched top-tier US performance at a fraction of the training cost, wiping about $600B off Nvidia's market cap in a day and showing that algorithmic efficiency can suddenly devalue brute-force compute spend
Dominance: OpenAI, Anthropic, Google DeepMind, Meta (Llama), xAI, DeepSeek/Qwen (China)
Trend: open-weight models are commoditising the middle; frontier labs are racing on reasoning and agentic capabilities
Useful POV
Scale = Capability
Four things coming together:
- Transformer architecture ("Attention Is All You Need", 2017): parallelism -> fits GPU hardware
- Scaling laws (OpenAI, 2020): predictable loss reduction -> makes scale a predictable investment
- ChatGPT (2022): proved consumer demand and justified the investment
- RLHF/instruction tuning: makes the model follow intent rather than just complete text
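The "predictable loss reduction" in the scaling-law bullet refers to a power-law relationship between model size and loss. A minimal sketch, assuming the form L(N) = (N_c / N)^alpha; the constants below are illustrative placeholders, not fitted values to rely on.

```python
# Illustrative placeholder constants for a power law L(N) = (N_C / N) ** ALPHA.
ALPHA = 0.076      # assumed exponent
N_C = 8.8e13       # assumed scale constant, in parameters


def predicted_loss(n_params: float, n_c: float = N_C, alpha: float = ALPHA) -> float:
    """Loss predicted by the power law for a model with n_params parameters."""
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return (n_c / n_params) ** alpha


def loss_ratio_for_scale_up(factor: float, alpha: float = ALPHA) -> float:
    """Multiplicative change in loss when the model is made `factor` times larger."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return factor ** (-alpha)


for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n):.3f}")

print(f"each 10x in parameters multiplies loss by {loss_ratio_for_scale_up(10):.3f}")
```

The property that matters for investment is that every tenfold increase in size multiplies loss by the same factor, so the payoff of the next scale-up can be estimated before the money is spent.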
Physical resource limits to
Compute, power, water
Single points of failure / bottlenecks exist at various parts of the AI stack:
- Memory: HBM capacity
- Networking: connectivity and data access
- Storage/data
- Power
but the actual bottleneck is location- and workload-specific
Race going on
Where is the money?
- Semiconductors (Nvidia, AMD, Micron, TSMC)
- Power (Constellation)
- Data centre real estate
- Model labs (OpenAI, Anthropic, Google DeepMind, xAI)
Societal concerns
Resources (energy, water) vs local community
Dominance
(a handful of firms hold a large share of market indices)
CEOs are entering the political scene as well