Can AI Catch a Killer Without a Brain?

A murder mystery that reveals how neural networks really work

Hi! I'm someone who's really passionate about writing and loves to use it as a medium to explore new ideas. Tech fascinates me and I'm always open to learning more.

Sweat trickled down his forehead, the salty taste reminding him of the blood left at the crime scene. He reached for his laptop, desperate for something, anything, to crack the case.

One would think he was about to examine the crime scene photographs he had received a week ago. Instead, frustrated with being unable to solve the case, he typed three letters.

They spelled "GPT".

It sounded ridiculous at first. After all, how could an AI solve a crime? It has never seen a crime scene, questioned a suspect, or understood what guilt feels like. Yet systems built on neural networks are increasingly being used to analyse evidence and identify patterns that humans might miss.

What Kind of AI Are We Even Talking About?

To understand whether AI can solve a crime, we first need to understand how it "thinks." Broadly speaking, AI can be divided into three categories: Artificial Narrow Intelligence (ANI), which performs specific tasks and powers products such as self-driving cars and smart speakers; Generative AI, which includes systems such as ChatGPT that can generate content; and Artificial General Intelligence (AGI), a hypothetical system capable of performing any intellectual task a human can perform. While AGI captures the public imagination, today's AI systems remain firmly within the first two categories. The tool sitting on the detective's screen belonged to the second.

When a neural network is trained on vast amounts of data, it can give rise to systems such as Large Language Models (LLMs), of which GPT is one example. A common approach to training neural networks is supervised learning, which involves learning relationships between inputs and outputs.

Teaching a Machine Without Teaching It

In supervised learning, a model is shown thousands of labelled examples, an input alongside its correct answer, until it begins to recognise the relationship between the two. A spam filter is trained this way. It is shown thousands of emails marked spam or not spam until it learns to tell them apart on its own. The detective, in his own way, had spent years doing something similar, learning to read a crime scene through hundreds of cases that came before this one. But unlike the detective, a neural network does not learn through instinct built over years. It learns through a far more mechanical process, and that process is worth slowing down for.
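The spam-filter idea can be sketched in a few lines of Python. This is only an illustration: the four emails are made up, and the "model" is a simple word-counting rule, not a neural network. The point is the shape of supervised learning, in which labelled examples go in and a rule for unseen inputs comes out.

```python
from collections import Counter

# Made-up labelled examples: (email text, label), where 1 = spam and 0 = not spam.
examples = [
    ("win a free prize now", 1),
    ("claim your free money", 1),
    ("meeting moved to monday", 0),
    ("lunch on monday with the team", 0),
]

def train(labelled):
    """Score each word by how much more often it appears in spam than in non-spam."""
    scores = Counter()
    for text, label in labelled:
        for word in text.split():
            scores[word] += 1 if label == 1 else -1
    return scores

def predict(scores, text):
    """Label an email as spam (1) when its words lean towards spam overall."""
    total = sum(scores[word] for word in text.split())  # unseen words count as 0
    return 1 if total > 0 else 0

scores = train(examples)
print(predict(scores, "free prize inside"))       # an email the model has never seen
print(predict(scores, "team meeting on monday"))  # another unseen email
```

Neither of the two test emails appears in the training data, yet the rule learned from the labels is enough to sort them.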

But how does a machine learn these relationships in the first place? This is where neural networks come in.

The Case

The case involved the mysterious death of an old man. He had two sons and a daughter, and had lived alone since his wife's death, a wealthy widower with time on his hands.

It was a long weekend, and his children had come to visit. His sons arrived with their wives, who were both engineers, one specialising in metallurgy and the other in electrical engineering. His eldest son was soon going to become a father. His daughter, meanwhile, had long compared her marriage to her parents', and felt it fell short.

It was when the family stepped out to buy groceries from a nearby supermarket that the murder took place. When they returned, they found the old man dead.

The crime scene offered very little to work with: a corpse, a few drops of blood, and nothing else.

The detective certainly didn't know where to begin. But perhaps AI would notice something he couldn't.

Neural Networks

Neural networks are a family of model architectures designed to identify complex, nonlinear patterns in data. During training, a neural network automatically learns the optimal combinations of features from the input data in order to minimise loss and improve its predictions. In the case of our detective, those inputs could include family relationships, witness statements, and evidence collected from the crime scene.

At the heart of every neural network lies an artificial neuron. Despite the name, it bears only a loose resemblance to a biological neuron. An artificial neuron simply receives numerical inputs, performs a calculation on them, and produces an output.

Neurons become truly useful when they are connected together in layers. In a dense layer, every neuron is connected to every neuron in the next layer. The output produced by one neuron becomes part of the input for neurons in the following layer.

Weights, Bias, and the Spark That Fires a Neuron

Not all pieces of information are equally important. A neural network captures this idea through values known as weights, which determine how much importance is assigned to each input. When information passes through the network, each input is multiplied by its corresponding weight. During training, the network gradually adjusts these weights, allowing it to improve its predictions over time. Alongside weights, neurons also possess a parameter known as a bias. While weights determine the influence of incoming information, the bias helps determine how easily a neuron becomes activated. Together, weights and biases shape the decisions made by the network.
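As a minimal sketch of what a single neuron computes before any activation is applied, the snippet below multiplies each input by its weight, adds the results together, and then adds the bias. The numbers are arbitrary and chosen only to keep the arithmetic easy to follow.

```python
def neuron_pre_activation(inputs, weights, bias):
    """Weighted sum of the inputs, plus the bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

# Arbitrary illustrative values, not taken from any trained network.
inputs = [1.0, 0.0, 1.0]
weights = [0.5, 2.0, -1.0]   # the second input would matter most, but it is 0 here
bias = 0.25

z = neuron_pre_activation(inputs, weights, bias)
print(z)  # (1.0 * 0.5) + (0.0 * 2.0) + (1.0 * -1.0) + 0.25 = -0.25
```

A larger bias would push the result upwards and make the neuron easier to activate; a more negative one would make it harder.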

There is one more ingredient at play here. After a neuron combines its inputs, weights, and bias, the result is passed through what is known as an activation function. This decides whether, and how strongly, the neuron should pass its output forward. Some activation functions, such as the sigmoid function, squeeze every value into a smooth range between 0 and 1. Others, such as ReLU (short for Rectified Linear Unit), simply let positive values through and block out the rest. Without an activation function of some kind, a network could only ever draw straight lines through data. With one, it can bend and curve around far messier patterns, much like a detective who has to weigh contradictory clues instead of following one tidy rule.
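The two activation functions named above are short enough to write out. This is a plain-Python sketch for single numbers; real libraries apply the same formulas to whole arrays at once and guard against overflow for extreme inputs.

```python
import math

def sigmoid(z):
    """Squeeze any number into the range between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Let positive values through unchanged and replace the rest with 0."""
    return max(0.0, z)

print(sigmoid(0.0))  # 0.5, the midpoint of the sigmoid curve
print(relu(-2.0))    # 0.0, negative values are blocked
print(relu(3.0))     # 3.0, positive values pass through
```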

The remarkable aspect of neural networks is that these values are not manually assigned by programmers. Instead, the network gradually learns appropriate weights and biases during training by repeatedly adjusting them in response to its mistakes.

Learning to Read a Digit

To understand how information flows through a neural network, consider a system trained to recognize handwritten digits. Each image is made up of 28 × 28 pixels, giving a total of 784 pixels. Every pixel is represented by a neuron in the input layer, with each neuron storing a value between 0 and 1 that reflects the brightness of its corresponding pixel. This value is known as the neuron's activation.

At the opposite end of the network lies the output layer, which contains ten neurons corresponding to the digits 0 through 9. The neuron with the highest activation represents the digit the network believes it is looking at.
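To make the flow from 784 input values to ten output activations concrete, here is a rough sketch of an untrained network. Several things are assumptions made for illustration: the single hidden layer of 16 neurons, the sigmoid activation, the random weights, and the random "image" standing in for real pixel data. Because nothing has been trained, the predicted digit is meaningless; only the mechanics matter.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One dense layer: every output neuron sees every input value."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]

def random_layer(n_in, n_out, rng):
    """Small random weights and zero biases (an untrained starting point)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

rng = random.Random(0)
hidden_w, hidden_b = random_layer(784, 16, rng)   # 16 hidden neurons is an assumption
output_w, output_b = random_layer(16, 10, rng)    # ten outputs, one per digit 0-9

image = [rng.random() for _ in range(784)]        # stand-in for 28 x 28 brightness values

hidden = dense(image, hidden_w, hidden_b)
output = dense(hidden, output_w, output_b)

# The network's "answer" is the output neuron with the highest activation.
predicted_digit = max(range(10), key=lambda d: output[d])
print(predicted_digit)
```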

The input and output layers are relatively straightforward to understand. The real mystery lies in the hidden layers between them. How can a collection of neurons transform a grid of pixel values into a confident prediction? The network learns to recognize increasingly complex patterns. Early neurons may respond to simple features such as edges, lines, or curves. Later neurons can combine these simpler features to identify more meaningful structures, such as loops and shapes, eventually allowing the network to recognize an entire digit.

Getting there takes more than a single glance at one digit. Researchers typically train such a network on tens of thousands of handwritten samples, nudging its weights slightly after each one, and often cycling through the entire dataset many times over. Each full cycle is known as an epoch. On its own, the adjustment after any single image is barely noticeable. Repeated across enough images and enough epochs, those small nudges accumulate into a network that can read a digit it has never encountered before.
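The rhythm of nudging after each example and cycling through the data in epochs can be shown on a toy problem. The sketch below swaps in the classic perceptron update rule and a four-example dataset (logical OR) in place of a digit recogniser, purely because it is small enough to read in one go. The learning rate and epoch count are arbitrary choices.

```python
# Toy labelled data for logical OR: (inputs, correct answer).
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

def predict(inputs, weights, bias):
    """Fire (1) if the weighted sum plus the bias is positive, otherwise output 0."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if z > 0 else 0

def train(data, epochs=10, learning_rate=1.0):
    weights, bias = [0.0, 0.0], 0.0
    for epoch in range(epochs):          # one epoch = one full pass over the data
        mistakes = 0
        for inputs, target in data:
            error = target - predict(inputs, weights, bias)
            if error != 0:
                mistakes += 1
                # Nudge each weight in the direction that reduces this error.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        print(f"epoch {epoch + 1}: {mistakes} mistakes")
    return weights, bias

weights, bias = train(data)
print([predict(inputs, weights, bias) for inputs, _ in data])
```

The mistake count printed per epoch falls to zero after a few passes, which is the small-scale version of many tiny adjustments adding up.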

A detective rarely solves a case using a single clue. Individual pieces of information may appear meaningless on their own, but when combined, they can reveal a pattern that was previously hidden.

How Does the Network Know Whether It Is Improving?

At the beginning of training, the network's weights and biases are initialised randomly, so its predictions are mostly inaccurate. To determine how far its predictions are from the correct answers, the network uses something known as a cost function.

For each training example, the cost function measures the difference between the network's output and the actual answer. A larger difference results in a higher cost, while a smaller difference results in a lower cost. By averaging this value across many training examples, we obtain a measure of how well the network is performing overall.
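One common way to measure that difference is the squared error, used here as an assumption since the text does not name a specific cost function. For simplicity the example uses two output neurons instead of ten.

```python
def example_cost(output, target):
    """Sum of squared differences between the network's output and the correct answer."""
    return sum((o - t) ** 2 for o, t in zip(output, target))

def average_cost(outputs, targets):
    """Average the per-example cost across a batch of training examples."""
    costs = [example_cost(o, t) for o, t in zip(outputs, targets)]
    return sum(costs) / len(costs)

# Two made-up predictions for the same correct answer [0, 1].
unsure = [0.5, 0.5]      # the network cannot decide
perfect = [0.0, 1.0]     # the network is exactly right
target = [0.0, 1.0]

print(example_cost(unsure, target))    # 0.25 + 0.25 = 0.5
print(example_cost(perfect, target))   # 0.0
print(average_cost([unsure, perfect], [target, target]))  # (0.5 + 0.0) / 2 = 0.25
```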

The objective of training is simple: to minimise the cost function. By repeatedly adjusting its weights and biases in ways that reduce the cost, the network gradually improves its predictions over time.

Rolling Downhill

Picture the cost function as a vast, hilly landscape, where height represents how wrong the network currently is. Training is the process of finding the lowest point in that landscape. At each step, the network measures the slope beneath it, known as the gradient, and takes a small step downhill. This method is called gradient descent. The network does this again and again, across thousands of examples, gradually descending towards a set of weights and biases that performs well. The trouble is, in a landscape with many hills and valleys, the network needs to know not just that it should move downhill, but exactly which of its weights to nudge, and by how much.
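The downhill walk is easiest to see with a single weight. In this sketch the "landscape" is an invented cost function, (w - 3)², whose lowest point sits at w = 3. The step size (learning rate) and number of steps are arbitrary choices. A real network does the same thing across thousands or millions of weights at once.

```python
def cost(w):
    """An invented one-dimensional landscape with its lowest point at w = 3."""
    return (w - 3.0) ** 2

def slope(w):
    """The gradient of the cost: positive means uphill lies to the right."""
    return 2.0 * (w - 3.0)

def gradient_descent(start, learning_rate=0.1, steps=100):
    w = start
    for _ in range(steps):
        w -= learning_rate * slope(w)   # take a small step against the slope
    return w

w = gradient_descent(start=0.0)
print(w, cost(w))   # w ends up very close to 3, where the cost is nearly 0
```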

Knowing that a prediction is wrong is not enough. The network must also determine which of its many connections contributed to the mistake. This is the job of an algorithm called backpropagation. It works by tracing the error backwards through the network and calculating how much each connection influenced the final prediction. Connections that contributed heavily to the error are adjusted more aggressively than those that had little impact.
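For a feel of what tracing the error backwards means, consider the smallest possible network: one input, one hidden neuron, and one output neuron. The sketch below assumes sigmoid activations and a squared-error cost, and the starting values are arbitrary. Each line of `backprop` applies the chain rule one step further back from the output.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, b1, w2, b2):
    """Input -> hidden neuron -> output neuron."""
    h = sigmoid(w1 * x + b1)
    y = sigmoid(w2 * h + b2)
    return h, y

def backprop(x, target, w1, b1, w2, b2):
    """Return how much the cost (y - target)^2 changes per unit change in each parameter."""
    h, y = forward(x, w1, b1, w2, b2)
    # Start at the output: how wrong was it, and how sensitive is the output neuron?
    delta_out = 2.0 * (y - target) * y * (1.0 - y)
    grad_w2 = delta_out * h
    grad_b2 = delta_out
    # Step back through the connection w2 to the hidden neuron.
    delta_hidden = delta_out * w2 * h * (1.0 - h)
    grad_w1 = delta_hidden * x
    grad_b1 = delta_hidden
    return grad_w1, grad_b1, grad_w2, grad_b2

# Arbitrary illustrative values.
x, target = 0.5, 1.0
w1, b1, w2, b2 = 0.4, -0.2, 0.7, 0.1
grads = backprop(x, target, w1, b1, w2, b2)
print(grads)
```

A parameter with a larger gradient (in absolute value) had more influence on the error, so a gradient descent step moves it further.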

Feeding the Machine the Case File

The detective typed in everything he knew about the crime. Why was this case so hard for him? If a neural network could learn to recognize handwritten digits by assigning different weights to different patterns, perhaps it could do the same with a murder investigation. There had been no one at home and no sign of forced entry. The murderer had left no fingerprints, and the only blood at the scene had come from the old man's mouth. He also entered details of the housekeeper, the results of the inquiries made with the neighbours, and an elaborate description of the family. The forensic team had reported a suspected acid poisoning. Finally, he pasted in a picture of the crime scene and the corpse.

To a human investigator, these details appeared unrelated. To a neural network, they were simply inputs. Individually, none of them pointed conclusively toward a killer.

The model breaks the information down into patterns it can process. The written details are converted into numerical representations, while the image is analysed for visual features. The absence of forced entry becomes one signal. The missing fingerprints become another. The acid poisoning report becomes a stronger forensic signal and so on.

These inputs then move through the network's layers. In the earlier layers, the model identifies simpler signals like who had access, who had motive, what evidence was physically present, and what details seemed unusual. In the deeper layers, these signals are combined. Motive alone is not enough. Opportunity alone is not enough. When it all appears together, the pattern becomes harder to ignore.

This is where weights matter. The model gives lower importance to clues that could apply to many people, such as general family conflict. It gives higher importance to clues that narrow the situation, such as forensic evidence, timing, and details that directly connect a suspect to the method of murder. Slowly, the activations begin to shift toward one possibility more strongly than the others.
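As a toy illustration of that weighting, the snippet below scores two anonymous suspects by multiplying each clue by a weight and adding up the results. Everything in it is hypothetical: the clue names echo the paragraph above, but the weights and suspect profiles are invented by hand, whereas a real network would learn its weights from data.

```python
# Hand-picked, hypothetical weights: broad clues count little, narrowing clues count a lot.
weights = {
    "family_conflict": 0.5,     # could apply to many people
    "timing": 2.0,              # narrows who could have acted
    "forensic_evidence": 3.0,   # ties a person to the scene
    "link_to_method": 3.0,      # ties a person to how it was done
}

# Hypothetical suspects: 1.0 means the clue is present, and a missing key means absent.
suspects = {
    "suspect_a": {"family_conflict": 1.0},
    "suspect_b": {"family_conflict": 1.0, "timing": 1.0, "link_to_method": 1.0},
}

def score(signals):
    """Weighted sum of the clues attached to one suspect."""
    return sum(weights[name] * value for name, value in signals.items())

for name, signals in suspects.items():
    print(name, score(signals))
```

Both suspects share the same broad clue, but only the one with the narrowing clues pulls ahead.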

The model began running through everyone who had been near the house that day, weighing each person the way it had been taught to: by how strongly their data lined up with motive, access, and evidence.

Running Through the Suspects

The housekeeper was the first to register a signal. She had keys to the house and was alone with the old man more often than anyone else. But when the inputs were checked, there was nothing: no financial strain, no history of conflict, nothing in her statement that pulled the network toward suspicion. Her signal rose for a moment and then settled back down, too weak to matter.

The neighbours came next, mostly because their statements contradicted each other on small details: what time they'd heard a door close, whether a car had left the building. Inconsistent testimony is the kind of thing that looks important to a human ear. But inconsistency alone carried very little weight. None of them had access to the house, and without that, the signal had nowhere to go.

Even the daughter-in-law who was an electrical engineer drew a brief flicker of attention. She'd argued with the old man over dinner the night before, loudly enough for the housekeeper to mention it. But an argument was not access, and access was not motive. The signal touched the surface and faded.

Then there was the daughter. Her dissatisfaction with her marriage had come up more than once while the detective was gathering statements, and it was tempting to read tension into every detail of her relationship with her father. But discontent in a marriage and motive for murder are not the same input, no matter how much they might look alike on the surface. The model held her activation low. There was nothing connecting her to the acid, nothing connecting her to the missing fingerprints, nothing beyond a strained marriage that had nothing to do with the old man's death.

The Signal That Wouldn't Fade

One signal, however, refused to fade. It kept reappearing, and each new piece of information made it stronger instead of weaker.

The old man had a special soft spot for his daughter. The shares in his will were not split equally, and his two sons received significantly less. This angered one of the sons' wives, the one who was pregnant, the one who was a metallurgy engineer.

Her resentment wasn't only about the money. She had grown used to her husband staying silent every time the subject of the will came up, the same quiet defeat she recognised in the old man's daughter, who had spent her own marriage measured against a standard she could never quite meet.

What turned that resentment into something else happened that very morning. While the family got ready for the grocery run, she overheard the old man telling the housekeeper that he intended to revise the will once more, not to even things out, but to leave his daughter an even larger share, convinced that her struggling marriage meant she needed it most.

For a woman who had already watched her husband's share shrink once, the thought of it shrinking again, with a child on the way, wasn't something she could simply resent from a distance anymore. The anger she'd been carrying for years finally reached its breaking point.

While questioning the family members, the detective learned that during the grocery trip, she had stepped away to rest in the car and take her medication. Access to acids would not have been difficult for someone with her professional background.

Now there were two signals reinforcing each other: motive, and the means to act on it. The network's prediction climbed higher, but it still hadn't crossed the threshold. A theory without proof was still just a theory.

The image pasted into the model from the crime scene hinted at a faint stain on the carpet. Initially, it appeared insignificant. It was the kind of detail a human eye might walk past without a second look. However, testing revealed that it contained the same ingredients found in the pregnant woman's medication.

Motive, opportunity, and forensic evidence had finally converged on a single point. Every smaller signal that had once seemed disconnected, the unequal shares, the time alone in the car, the faint stain on the carpet, was now pulling in the same direction, and pulling hard enough that one possibility outweighed all the rest combined.

It all added up. The detective closed the laptop. He had figured it out.

Can AI Catch a Killer Without a Brain?

Of course, the AI did not “solve” the case the way a detective would. It did not feel a flicker of suspicion, notice a tremor in someone’s voice, or sense that a story did not quite add up. It took every fact it was given, weighed each one against patterns it had learned from data, and pointed towards the explanation the numbers supported most strongly.

This is, in many ways, close to how AI is already being used in real investigations today: not to replace detectives, but to sift through volumes of evidence (financial records, call logs, forensic reports) faster, and to flag the patterns worth a closer look. The final judgement, the weighing of context, motive, and human behaviour that a courtroom demands, still rests with people.

So, can AI catch a killer without a brain? In this story, it got there by treating clues as numbers and patterns as evidence. It did not understand guilt, grief, or the quiet unfairness of an unequal inheritance the way a person does. It only knew how to weigh, combine, and converge in the direction the data pointed. Perhaps that is enough to narrow down a list of suspects. Whether it is enough to call it justice is a question only we get to answer.
