Not the cloud, not the algorithm: the silicon
AI doesn’t live in the cloud; it lives in silicon

AI, artificial intelligence, isn’t just another industry trend; it’s a turning point. Unlike previous tech revolutions, this one is not only about speed, scale, or ease of connectivity. It introduces something fundamentally different: a shift in how machines “think.”
It is no longer a proposed theory gathering dust in some research paper. It is being adopted in the real world in more ways than we can imagine, sometimes in ways we probably should have stress-tested before deploying.
Beyond algorithms and models, at its core AI runs on trillions of switches etched into silicon, so tightly synchronized that when that hardware falters, billion-dollar services can go dark within milliseconds, and the failure spreads faster than any wildfire.
To truly understand how AI works, we need to look beyond the code, and into the world of semiconductors.
And hence AI doesn't live in the cloud; it lives on silicon. Every 'thought' generated by a Transformer is actually a rhythmic chorus of billions of transistors switching on and off. It's less 2001: A Space Odyssey and more a very, very fast calculator that read the entire internet and never forgot a word of it.
These specialized silicon chips perform the massive parallel computations required for training and running neural networks. Far from the abstract idea of “the cloud,” AI ultimately depends on physical systems: data centers filled with hardware (consuming more electricity than a small nation) that form the backbone of modern intelligence.
What really is this intelligence?
This might be a controversial take: the intelligence we keep talking about is not rocket science. It is pure computation, logic and math, feeding on the abundance of data we consume and generate, left behind as digital footprints on the internet over decades. Every search, every post, every interaction that catches your attention, all of it, is data.
The entire concept rests on machine learning: training models on data, parameters, and algorithms. But go to the core of those fancy terms and it is just a very large number of mathematical operations, ranging from simple addition and subtraction to comparisons and complex matrix manipulations. The same operations kids used to dread back in school now sit at the heart of a huge share of engineering work.
This isn't magic, it is arithmetic, just done at a rate the human brain never could, faster than a blink.
These operations run at anywhere from thousands to millions to billions per second, and to back that kind of computation we need a rock-solid hardware foundation, which brings us back to where we started: not the cloud, not the algorithms.
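To make that concrete, here is a deliberately tiny sketch of what one layer of a neural network does under the hood, with made-up numbers purely for illustration: multiply, add, compare.

```python
import numpy as np

# One layer of a neural network, stripped of the fancy vocabulary:
# multiply inputs by weights, add a bias, keep the positive parts.
x = np.array([0.2, -1.3, 0.7])   # the "data": three input numbers
W = np.random.randn(4, 3)        # weights (random here; in a real model, learned)
b = np.zeros(4)                  # biases (same)

y = np.maximum(0, W @ x + b)     # matrix multiply, add, compare with zero
print(y)                         # four numbers out; stack layers and repeat
```

Scale those three inputs up to billions of parameters and the arithmetic stays just as dumb, there is simply an unimaginable amount of it.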
the Silicon.
The Silicon Arc
Every civilization is marked by the material it was built on. The Stone Age shaped flint into tools and called it progress. The Bronze Age melted copper and tin and built empires. The Industrial Age forged steel and rewired the physical world. Each age took a material, pushed it to its limits, and in doing so, pushed humanity somewhere new.
We are living in the Silicon Age. And we are only beginning to understand what that means.
Silicon: neither a great conductor nor a good insulator, it sits stubbornly in between, a semiconductor.
That middle ground is exactly where the power lies, and it is what makes silicon the most consequential material of the 21st century.
Introducing trace amounts of a supporting cast such as phosphorus or boron is called doping, and it gives you control over exactly how electricity passes through silicon. Add phosphorus, and you get extra electrons floating around, eager to move. Add boron, and you create "holes", absences of electrons that act as positive charge carriers. Two flavors of silicon, each behaving differently, precisely by design.
Up to this point there is not a single word of code or a single algorithm, only manipulation of the element at the atomic level to adjust how current flows through a crystal. But the programming has already begun, even before the computer has.
Silicon’s firstborn
Once the current flow can be tuned, the next step is to strike a balance between silicon conducting and blocking current, or better yet, to make it switch between the two.
Yes, of course it can, through something as simple as a transistor, the firstborn of silicon.
Layer doped silicon in the right order, apply a voltage, and voilà: the basis of a transistor is ready, and the flow of current through it is now in your control. It acts as a switch, on or off, or in silicon's language, 1 or 0, and that is the real breakthrough. Representing 0 and 1 reliably lets you represent any logic and perform any computation that has ever existed.
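As a toy model, idealized switches rather than real device physics, here is how little it takes to get from "controllable switch" to "logic":

```python
# Idealized model: a transistor is a switch controlled by its gate input.
def nmos(gate: int) -> bool:
    """Conducts (closes the switch) when the gate is 1."""
    return gate == 1

# Two switches in series: the output pulls low only when both conduct.
# That is a NAND gate, and NAND alone is enough to build any logic circuit.
def nand(a: int, b: int) -> int:
    return 0 if (nmos(a) and nmos(b)) else 1

def not_(a: int) -> int:
    return nand(a, a)

print([nand(a, b) for a in (0, 1) for b in (0, 1)])  # [1, 1, 1, 0]
```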
All of that, from a piece of sand doped with a few alien atoms, worth more than its weight in gold to this industry.
Scaling up or Scaling down?
The first transistor, built in 1947 at Bell Labs, was a chunky, hand-assembled thing nearly the size of your palm. Today, a single chip holds billions of transistors on a die about the size of a fingernail.
To put that in perspective: if each transistor on a modern chip were a person, the entire population of Earth could fit on that chip, with room to spare.
This isn’t incremental progress, it’s an entirely different class of achievement, one that unfolded so steadily and so quietly that we forgot to be amazed by it.
You know the problem with making things smaller forever? Physics does not bend to your plan.
Today the world is chasing 2 nm technology, down from the millimetre scale of the first transistor; we are now talking about dimensions comparable to the width of a DNA strand. At these scales, the neat, obedient rules of classical physics start to blur. In fact, not just classical physics; quantum mechanics also starts to become inconvenient.
The walls of the transistor stop being walls. Electrons leak through like ghosts walking through doors. And a leaky transistor is one you no longer control.
Physics itself is pushing back. The question driving the entire semiconductor industry right now isn't just how do we make it smaller, it's what do we do when smaller stops working?
Silicon’s secondborn
A single transistor on its own is useless, like a lone Lego piece with no set to belong to. The magic happens when you start connecting them, wiring them in the right configuration to build a logic gate, the basic building block of complex operations.
Wire millions of them together and you get a circuit; wire circuits together and you get a processor. Each layer becomes the building block for the next, and at this point silicon is no longer just a material but an architecture, a physical structure capable of supporting the “intelligence”.
This is where sand becomes silicon, and silicon becomes the brain the neural network runs on.
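A sketch of that layering, continuing the toy model from earlier: gates built from gates, and a tiny adder built from those, each level using only the one below it.

```python
# Each layer uses only the layer below it: NAND -> basic gates -> a 1-bit adder.
def nand(a, b):  return 0 if (a and b) else 1

def not_(a):     return nand(a, a)
def and_(a, b):  return not_(nand(a, b))
def or_(a, b):   return nand(not_(a), not_(b))
def xor(a, b):   return and_(or_(a, b), nand(a, b))

def half_adder(a, b):
    """Adds two bits: returns (sum, carry). Chain these and you can add any numbers."""
    return xor(a, b), and_(a, b)

print(half_adder(1, 1))  # (0, 1): one plus one is binary 10
```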
Special needs
For decades, the CPU, the central processing unit, was the powerhouse of computing, executing tasks sequentially, one after another, with precision and flexibility.
But here is the catch: artificial intelligence runs neural networks, which demand parallel computation, millions of matrix operations at once. The CPU struggled to keep up with that kind of multitasking. It did not give up, but the lag could not keep pace with the world.
To meet AI's special needs, NVIDIA's GPU, the graphics processing unit, was put to the test. Originally designed to render video game frames (another famously parallel problem), GPUs had their parallel compute unlocked for general use through CUDA. And almost by accident, they became the engine of AI.
So NVIDIA didn't invent AI; they just happened to have built the hardware that could support it. That's the evolution: accidental fit → deliberate design.
The evolution did not stop there. Google built custom silicon, the TPU (tensor processing unit), designed specifically for neural network math, followed by the NPU (neural processing unit), embedded in phones, laptops, cars, and more. Each new generation is not merely adapted to AI's needs but designed around them from the start.
| Type | What it’s good at | Example | Where it’s used |
|------|-------------------|---------|-----------------|
| GPU | Massive parallel computation | NVIDIA A100 / H100 | AI training, graphics |
| TPU | Optimized for tensor math | Google Trillium | Cloud AI workloads |
| NPU / ASIC | Efficient, task-specific | Apple Neural Engine | Phones, edge devices |
| FPGA | Reconfigurable hardware | Xilinx Versal | Custom acceleration |
The algorithm follows the hardware. And the hardware follows the math. It always has.
Is AI hungry?
AI’s hunger for hardware is not a side effect; it is an appetite built into the architecture.
ChatGPT isn't smart because of clever code alone: it's smart because someone built machines powerful enough to feed it.
At the heart of every large language model is a mechanism called attention. The idea is elegant: when processing any word in a sequence, the model doesn't just look at that word in isolation. It looks at every other word simultaneously and calculates: how relevant is each one to this one? What relationships matter here?
Now do the simple math: from one word to a sentence to a paragraph and beyond, the model is soon comparing thousands of tokens against each other at once, and the computation becomes almost violent at this stage. This isn't a software bottleneck; no algorithm or clever code can make this math fundamentally cheaper.
The only thing holding this up is the hardware.
To put things in perspective: when ChatGPT answers your question, somewhere in a data center GPU cores fire in concert to produce that response. And every time a token is generated, the entire context has to be carried along with it. Every word on your screen is an orchestra played across the tokens within a fraction of a second.
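A minimal sketch of that attention math (standard scaled dot-product attention, with made-up dimensions purely for illustration) shows where the cost comes from:

```python
import numpy as np

def attention(Q, K, V):
    # Every token's query is compared against every token's key:
    # an (n x n) score matrix, which is why the cost grows
    # quadratically with the sequence length n.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V                               # weighted mix of values

n, d = 1024, 64                 # 1,024 tokens, 64-dim heads (illustrative)
Q = np.random.randn(n, d)
K = np.random.randn(n, d)
V = np.random.randn(n, d)
out = attention(Q, K, V)        # on the order of n*n*d multiply-adds, per head, per layer
```

Double the context length and the score matrix quadruples; that is the appetite, written directly into the math.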
Why are semiconductors a power story now?
At some point in the last decade, chips stopped being a technology story and became a power story.
The geography of modern chip production looks like this: NVIDIA designs the GPUs. ARM designs the architecture inside almost every mobile chip. ASML, a Dutch company most people have never heard of, builds the only machines in the world capable of printing transistors at cutting-edge scales, extreme ultraviolet lithography machines that cost $200 million each and take years to build. And TSMC, in Taiwan, actually fabricates the most advanced chips on Earth, for nearly everyone.
This is a supply chain so concentrated, so specialized, and so fragile that a single factory fire, a single political miscalculation, a single blocked strait could ripple through every AI lab, every smartphone manufacturer, every data center on the planet.
Semiconductors are the new oil. Except oil is just carbon that pooled underground for millions of years; anyone with a drill can reach it. The scarcity in chips isn't the material, silicon is literally sand; it's the knowledge to process it at 2-nanometre precision, decades of accumulated engineering that cannot be downloaded or reverse-engineered overnight.
That's why the United States passed the CHIPS Act. That's why China has spent hundreds of billions trying to build a domestic semiconductor industry. That's why Taiwan's political status is not just a regional issue, it is a technology issue, and therefore a global one.
Whoever controls the silicon, controls what gets built next.
Is it saturated?
The industry swears by Moore’s Law: the number of transistors on a chip doubles roughly every two years. It is less a law of physics than a statement of ambition, a prophecy the industry willed into reality.
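As a rough back-of-the-envelope illustration (starting from the ~2,300 transistors of the 1971 Intel 4004, and treating the "doubling every two years" literally):

```python
# Moore's Law as back-of-the-envelope arithmetic:
# transistor count doubles roughly every two years.
start_year, start_count = 1971, 2_300          # Intel 4004, ~2,300 transistors
for year in range(start_year, 2026, 10):
    doublings = (year - start_year) / 2
    print(year, f"~{start_count * 2**doublings:,.0f} transistors")
# By the 2020s this lands in the tens of billions,
# the same order of magnitude as today's flagship GPUs.
```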
But what next? At 2 nm, quantum effects have become a real pain, and heat and power have turned into a crisis severe enough that some data centers have literally been sunk underwater for cooling. The returns on scaling are diminishing fast.
Is it time to ask whether scaling down is still the answer, or whether we need a change in approach? Maybe it is time to pull another research paper out of the lab and put it to the test.
Even with so much confusion and so many unanswered questions, three parallel paths are already being explored: 3D stacking, neuromorphic computing, and photonic computing.
3D stacking: when you can't spread out, you stack layers of silicon vertically, connected by vias, more compute in the same footprint without shrinking a single transistor.
Neuromorphic computing: chips that don't just run neural networks but mimic the biological brain itself, IBM's NorthPole, Intel's Loihi. Instead of the traditional fetch-execute cycle, these chips process information the way neurons do, asynchronously, efficiently, firing only when there's something worth firing for.
That also opens up a whole philosophy of low-power computing, but that's a story for another time.
And lastly, photonic computing takes aim at problems like speed and heat. Here electrons are replaced with photons: light travels faster, generates less heat, and can carry multiple signals simultaneously. Chips that compute at the speed of light, with a fraction of the energy, are too significant to ignore.
But the question remains: are we going to keep going smaller… or maybe not?
The Chips You've Never Seen, Running the World You Live In
Some of the real-world heroes of this movie are:
NVIDIA H100. The chip the AI industry runs on. One unit costs around $30,000. A training cluster costs hundreds of millions. When you hear that a model took six months to train, what you're actually hearing is: six months of thousands of H100s running flat out, drawing enough power to light up a small city. Somewhere in a windowless data center, the future is being computed. The electricity bill is somebody else's problem.
Apple M-series. A completely different bet. Instead of brute power, Apple went after efficiency, packing CPU, GPU, NPU, and memory onto a single package so data never has to travel far to be processed. The result is a laptop that runs AI locally, on battery, without asking a server for permission. The M-series didn't make AI more powerful. It made AI yours.
Google TPU v5. Built to do one thing, and to do it faster than anything else alive. Google doesn't sell you the chip; they use it themselves and rent you the output. You'll never touch a TPU. You've probably already benefited from one today without knowing it. That's the point.
Qualcomm Snapdragon NPU. The most invisible chip on this list, embedded so deep you'd need a schematic to find it. It's why your phone transcribes speech, sharpens photos, and runs on-device AI without breathing a word to the cloud. Not flashy. Just quietly everywhere.
And these are just the names you've heard of.
Each of them serves a completely different task, yet all of them answer the same question: is this what the “intelligence” lives in? And the fact that I just gave four answers to that one question shows how open to innovation this space still is.
In the end, it’s still just sand, engineered to think. And the next evolution belongs to whoever configures it better.