The world of computers can be very difficult to understand because there are so many big words. One big word is machine learning. Machine learning is just a way for a computer to learn things without a person telling it every single rule. Inside this world, there is a very simple and very special way for computers to learn called K-Nearest Neighbors, or KNN. This way of learning is very natural for humans. We do this every day. When we see someone new, we look at who their friends are to guess what they are like. We say “birds of a feather flock together,” which means things that are the same usually stay close to each other. KNN is the computer way of doing this. It is a method that uses closeness to make guesses about new information.
Quick Refresh: A Deep Review of KNN
- The Basic Concept
- Similarity Rule: Computers guess things by looking at what is nearby. This follows the idea that “birds of a feather flock together”.
- The “Lazy” Style: KNN is a “lazy learner.” It does not build a complex model during training; it just saves the data and waits until you ask a question to do any work.
- Distance: How we measure “Close”
- Euclidean (As the Crow Flies): This is a straight line between two points. It is good for smooth, continuous data but can be sensitive to “weird” data points (outliers).
- Manhattan (The Taxi Driver): This follows a grid, like a car driving in a city. It is better for data that moves in steps and is less bothered by outliers.
- The Number K
- Small K (like 1): The computer is very sensitive. One mistake in the data can change the whole guess (this is called overfitting).
- Large K: The computer becomes too simple and ignores small, important patterns (this is called underfitting).
- Odd Numbers: With two classes, we usually use an odd number for K (like 3 or 5) so that a vote can never end in a tie.
- Preparing the Information
- Scaling: This is the most important step. We must make all numbers fit in a small range (like 0 to 1) so that big numbers don’t “bully” small numbers in the distance math.
- Encoding: Computers only like numbers. We must change words into numbers using Label Encoding (for ordered words like Small/Large) or One-Hot Encoding (for unordered words like Red/Blue).
- The Tools for the Job
- Classification: Used to put things into groups (like “Spam” or “Not Spam”) using a majority vote.
- Regression: Used to guess a specific number (like a house price) by taking the average of the neighbors.
- Checking the Work
- Confusion Matrix: A table that shows when the computer was “right” and when it made a “mistake” (False Alarms or Missed Cases).
- Iris Dataset: A famous set of 150 flowers used by almost everyone to practice KNN because it is very clean and easy to understand.
- Modern Uses
- Vector Databases: Modern AI uses KNN-style searching to find similar text for tools like ChatGPT.
- Approximate Search: Because checking every single point is slow, big systems use “Approximate Nearest Neighbor” (ANN) to speed things up.
The Idea of the Lazy Learner
In most computer learning, the computer acts like a student who studies hard for a test. The computer looks at many examples, tries to find a secret rule, and then it remembers only the rule and forgets the examples. But KNN is not like that. KNN is called a lazy learner. It is called lazy because it does not do any work when you first give it information. It does not try to find a rule. It does not try to build a map. It just takes all the data you give it and saves it in its memory. It is like a student who does not study for the test at all, but instead, they bring all their books and notes into the test room and look for the answer only when they see the question.
Because the computer is lazy, it has a very fast start. You can give it a lot of information, and it just says “Okay, I saved it”. But there is a problem with being lazy. When you ask the computer a new question, it has to work very, very hard. It has to go through every single thing it saved in its memory to find the answer. This makes the computer slow to give an answer when it has too much data. This way of learning is also called instance-based learning or memory-based learning because it relies entirely on the specific examples it has saved in its memory.
| Learning Type | How It Acts | Training (Study Time) | Prediction (Test Time) |
| --- | --- | --- | --- |
| Eager Learning | Builds a rule or model. | Slow (must find rules). | Fast (uses the rule). |
| Lazy Learning (KNN) | Saves everything. | Very Fast (just saves). | Slow (searches all data). |
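The “lazy” part is easy to see in code. Here is a tiny Python sketch (a made-up `LazyKnn` class, separate from the C# example later in this article) showing that training is nothing more than saving the examples:

```python
class LazyKnn:
    """A 'lazy learner': training just memorizes the examples."""

    def __init__(self):
        self.points = []  # saved feature vectors
        self.labels = []  # saved answers

    def train(self, points, labels):
        # No rules are learned here -- the data is simply stored.
        self.points.extend(points)
        self.labels.extend(labels)


model = LazyKnn()
model.train([(100, 3), (250, 8)], ["orange", "grapefruit"])
print(len(model.points))  # 2 -- all the real work is deferred to prediction time
```

All of the expensive work (measuring distances) happens later, when you ask a question.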
How KNN Makes a Guess
When a computer uses KNN to guess something, it follows a very simple path. Let us imagine we want the computer to guess if a fruit is an orange or a grapefruit. We tell the computer two things about many fruits: how heavy they are and how bumpy their skin is. The computer saves all this. Now, we show the computer a new fruit. The computer does not know what it is. It follows these steps to find out.
Picking the Number K
The first thing we must do is pick a number called K. This number K tells the computer how many neighbors it should look at. If we say K equals 3, the computer will look for the 3 closest fruits it has in its memory. If we say K equals 5, it looks for the 5 closest. This number is very important. If the number is too small, like 1, the computer might look at one fruit that is weird or wrong and give a bad answer. If the number is too big, the computer might look at fruits that are too far away and are not really neighbors anymore. Usually, people pick an odd number like 3 or 5 so there is no tie when they vote.
Measuring the Distance
To find the neighbors, the computer must know how far away they are. The computer uses math to find the distance between the new fruit and all the fruits in its memory. This distance is not like a road distance. It is a number distance. If the new fruit weighs 100 grams and an old fruit weighs 105 grams, the weight distance is 5. The computer does this for every fact it knows. After it calculates all these distances, it looks for the ones that have the smallest numbers. These are the nearest neighbors.
Voting for Classification
If we want the computer to pick a name for the fruit, like “Orange,” this is called classification. The computer looks at the labels of the K neighbors it found. Let us say we picked K as 5. The computer finds the 5 closest fruits. If 4 of them are oranges and 1 is a grapefruit, the computer takes a vote. Since more neighbors are oranges, the computer says “This new fruit is an orange”. This is called majority voting.
Averaging for Regression
Sometimes we do not want a name. Sometimes we want a number. For example, we want to guess how much a house will cost. This is called regression. In this case, the computer does not vote. It takes the prices of the K closest houses and finds the average. If the 3 closest houses cost $100,000 and $110,000 and $120,000, the computer adds them up and divides by 3 to get $110,000.
| Goal | Method Name | How Decision Is Made | Example |
| --- | --- | --- | --- |
| Pick a label or group. | Classification | Majority Vote (Most common label). | Is it a cat or a dog? |
| Pick a specific number. | Regression | Average (Mean of neighbor values). | How much is this house worth? |
A Step-by-Step Walk Through Classification
To make it very clear, let us look at exactly how a computer does classification using KNN. We will follow these five steps.
- Choose the value for K. First, we decide how many neighbors to use. Let us say we choose K = 3.
- Compute distance. The computer looks at the new point and calculates the distance to every single point it has in its storage.
- Select neighbors. The computer finds the 3 points that have the smallest distance to the new point.
- Count labels. The computer looks at the names of those 3 points. Maybe it finds two points named “Group A” and one point named “Group B”.
- Assign prediction. Because “Group A” has more points (2 is more than 1), the computer says the new point belongs to “Group A”.
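The five steps above can be sketched in a few lines of Python (a toy example with made-up points, not production code):

```python
import math
from collections import Counter

def knn_classify(query, samples, k=3):
    """samples is a list of (point, label) pairs; returns the majority label."""
    # Steps 2-3: compute every distance and keep the k smallest.
    nearest = sorted(samples, key=lambda s: math.dist(query, s[0]))[:k]
    # Steps 4-5: count the labels and pick the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

samples = [((1, 1), "Group A"), ((1, 2), "Group A"),
           ((4, 4), "Group B"), ((6, 6), "Group B")]
print(knn_classify((2, 2), samples, k=3))  # -> Group A (two votes to one)
```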
Distance: How to Measure Closeness
In KNN, everything depends on how we measure distance. If we measure it wrong, the neighbors will be wrong, and the answer will be wrong. There are two main ways computers measure distance in KNN.
Euclidean Distance: The Bird Path
Euclidean distance is the most common way. People often call it “as the crow flies”. This is the straight line between two points. If you are a bird and you want to go from point A to point B, you just fly in a straight line. This is the shortest possible path.
To calculate this, the computer does some math. It takes the differences between two points, squares them (multiplies them by themselves), adds them together, and then takes the square root. This math makes big differences very important. If one number is very different, the whole distance becomes very large very fast.
Manhattan Distance: The Taxi Path
Manhattan distance is different. It is named after Manhattan because the streets there are like a grid. In a city, you cannot walk through buildings. You must walk along the streets. You go some blocks East and then some blocks North. This is also called “city block distance” or “taxicab distance”.
To calculate this, the computer does not square any numbers. It just takes the absolute difference (the positive number) for each fact and adds them up. This distance is better when the data is high-dimensional or when there are strange points (outliers) that might confuse the Euclidean math.
Calculating the Distance Example
Let us use a real example with numbers. Imagine we have two points. Point 1 is at (2, 3) and Point 2 is at (6, 7).
Euclidean Distance:
- Difference in first number: 6−2=4. Square it: 4×4=16.
- Difference in second number: 7−3=4. Square it: 4×4=16.
- Add them: 16+16=32.
- Take the square root of 32. It is about 5.65.
Manhattan Distance:
- Difference in first number: ∣6−2∣=4.
- Difference in second number: ∣7−3∣=4.
- Add them: 4+4=8.
In this example, the Euclidean distance (5.65) is shorter than the Manhattan distance (8). This is always true because a straight line is the shortest way to go.
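Both calculations can be checked with a few lines of Python:

```python
import math

p1, p2 = (2, 3), (6, 7)

# Euclidean: square the gaps, add them up, then take the square root.
euclidean = math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

# Manhattan: just add up the absolute gaps.
manhattan = sum(abs(a - b) for a, b in zip(p1, p2))

print(round(euclidean, 2))  # 5.66 (the square root of 32)
print(manhattan)            # 8
```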
| Metric Name | How it moves | Best for… | Calculation Style |
| --- | --- | --- | --- |
| Euclidean | Straight line (Bird flies). | Open space, physical things. | Squares differences, then square root. |
| Manhattan | Grid path (Taxi drives). | City-like data, many features. | Adds absolute differences. |
Why the K Value Matters
Picking the number for K is like a balancing act. If you choose a very small number, like K = 1, the computer is very sensitive. It looks at only one neighbor. If that one neighbor is a mistake in the data, the computer will make a mistake too. This is called high variance or overfitting. It means the computer is following the training data too closely, like a student who memorizes the mistakes in their textbook.
If you choose a very large number for K, the computer becomes very stable, but it might get lazy in a bad way. It looks at so many neighbors that it might ignore the small patterns that are actually there. This is called high bias or underfitting. It means the computer is being too simple and missing the point.
Most people try many different K values to see which one is the most accurate. They use something called “cross-validation,” which is just a fancy way of testing the computer many times with different K numbers to find the best one.
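Here is a simple leave-one-out version of that idea in Python: hide each point, predict it using the rest, and see which K gets the most right (a toy two-cluster dataset; real projects usually rely on a library for cross-validation):

```python
import math
from collections import Counter

def predict(query, samples, k):
    nearest = sorted(samples, key=lambda s: math.dist(query, s[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def leave_one_out_accuracy(samples, k):
    hits = 0
    for i, (point, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]  # hide this point from the model
        hits += (predict(point, rest, k) == label)
    return hits / len(samples)

samples = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
           ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]

for k in (1, 3, 5):
    print(k, leave_one_out_accuracy(samples, k))
# K = 5 scores 0.0 here: with one point hidden, the "wrong" cluster
# always outvotes the two remaining same-cluster neighbors.
```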
Real Life Uses for KNN
KNN is very simple, but it is used by many big companies and hospitals because it works well for certain things.
Movie and Product Recommendations
Have you ever noticed that Netflix or Amazon suggests things you might like? They use methods like KNN. They look at you as a data point. They look at what movies you liked. Then they find “nearest neighbors,” which are other people who liked the same movies. If those neighbors liked a new movie that you haven’t seen, the computer recommends it to you. This is called a recommendation system.
Medical Diagnosis
Doctors use KNN to help find out if people are sick. They save data about many past patients, like their blood pressure, age, and symptoms. When a new patient comes, the computer looks for the nearest neighbors—past patients who had very similar numbers. By looking at what happened to those past patients, the computer can help the doctor guess what is wrong with the new patient. For example, doctors have used KNN to look at images of knees to find diseases.
Image Recognition
Computers can use KNN to recognize what is in a picture, like a cat or a dog. The computer measures things in the picture, like how many dark pixels there are or how “fluffy” the patterns look. By comparing these measurements to many pictures it already knows, it can find the closest match and tell you what the animal is.
Predicting House Prices
In many places, house prices are similar to the prices of houses nearby. If you want to know how much a house will sell for, the KNN computer looks at the K closest houses that were sold recently. It considers things like:
- How many rooms are in the house.
- How old the house is.
- Where the house is located (latitude and longitude).
- How much money people in that area make.
The computer takes the average price of the K most similar houses to give you a guess.
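A minimal Python sketch of this averaging, with invented house data (features are rooms and age; the prices are made up for illustration):

```python
import math

# ([rooms, age in years], price) -- illustrative numbers only
houses = [
    ([3, 10], 100_000),
    ([3, 12], 110_000),
    ([4, 8], 120_000),
    ([8, 50], 400_000),
]

def knn_regress(query, samples, k=3):
    nearest = sorted(samples, key=lambda s: math.dist(query, s[0]))[:k]
    # Regression: average the neighbors' values instead of voting.
    return sum(price for _, price in nearest) / k

print(knn_regress([3, 11], houses, k=3))  # -> 110000.0
```

The far-away mansion `[8, 50]` never makes it into the top 3, so it does not distort the estimate.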
| Application | What it does | Benefit |
| --- | --- | --- |
| Netflix / Amazon | Finds similar users or products. | Helps you find things you like. |
| Hospitals | Finds patients with similar symptoms. | Helps doctors find diseases early. |
| Real Estate | Finds similar houses nearby. | Helps people know the right price. |
Code Example
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace Architecture.MachineLearning
{
    // --- Result Pattern ---
    // We avoid throwing exceptions for logic failures. It's cleaner for the caller.
    public record Result<T>(T Value, bool IsSuccess, string Error = "");

    // --- Abstractions ---
    // Strategy Pattern: allows switching between Euclidean, Manhattan, or Minkowski
    // without changing the engine.
    public interface IDistanceMetric
    {
        double Calculate(double[] vectorA, double[] vectorB);
    }

    // --- Implementations ---
    public class EuclideanDistance : IDistanceMetric
    {
        public double Calculate(double[] vectorA, double[] vectorB)
        {
            return Math.Sqrt(vectorA.Zip(vectorB, (a, b) => Math.Pow(a - b, 2)).Sum());
        }
    }

    public class ManhattanDistance : IDistanceMetric
    {
        public double Calculate(double[] vectorA, double[] vectorB)
        {
            return vectorA.Zip(vectorB, (a, b) => Math.Abs(a - b)).Sum();
        }
    }

    public record TrainingSample<TLabel>(double[] Features, TLabel Label);

    /// <summary>
    /// A robust KNN classifier following SOLID principles.
    /// Includes feature normalization to prevent high-magnitude features
    /// from dominating the distance calculation.
    /// </summary>
    public class KnnClassifier<TLabel> where TLabel : notnull
    {
        private readonly int _k;
        private readonly IDistanceMetric _metric;
        private readonly List<TrainingSample<TLabel>> _samples = new();
        private double[] _minFeatures = Array.Empty<double>();
        private double[] _maxFeatures = Array.Empty<double>();

        public KnnClassifier(int k, IDistanceMetric metric)
        {
            _k = k > 0 ? k : throw new ArgumentOutOfRangeException(nameof(k));
            _metric = metric ?? throw new ArgumentNullException(nameof(metric));
        }

        /// <summary>
        /// Asynchronously trains the model and calculates normalization boundaries.
        /// </summary>
        public async Task<Result<bool>> TrainAsync(IEnumerable<TrainingSample<TLabel>> data)
        {
            var dataList = data.ToList();
            if (!dataList.Any()) return new Result<bool>(false, false, "No training data provided.");

            _samples.Clear();
            _samples.AddRange(dataList);

            // Optimization: pre-calculate min/max per feature so inputs can be scaled to [0, 1].
            int featureCount = _samples[0].Features.Length;
            _minFeatures = new double[featureCount];
            _maxFeatures = new double[featureCount];

            await Task.Run(() =>
            {
                for (int i = 0; i < featureCount; i++)
                {
                    _minFeatures[i] = _samples.Min(s => s.Features[i]);
                    _maxFeatures[i] = _samples.Max(s => s.Features[i]);
                }
            });

            return new Result<bool>(true, true);
        }

        /// <summary>
        /// Predicts the label for an unscaled input vector.
        /// </summary>
        public Result<TLabel> Predict(double[] rawInput)
        {
            if (_minFeatures.Length == 0)
                return new Result<TLabel>(default!, false, "Model has not been trained.");
            if (_samples.Count < _k)
                return new Result<TLabel>(default!, false, "Insufficient training data for K neighbors.");
            if (rawInput.Length != _minFeatures.Length)
                return new Result<TLabel>(default!, false, "Feature dimension mismatch.");

            // 1. Normalize the input using the stored boundaries.
            double[] normalizedInput = Normalize(rawInput);

            // 2. Compute distances to all (normalized) samples and keep the K nearest.
            var neighbors = _samples
                .Select(s => new
                {
                    s.Label,
                    Distance = _metric.Calculate(normalizedInput, Normalize(s.Features))
                })
                .OrderBy(n => n.Distance)
                .Take(_k)
                .ToList();

            // 3. Majority vote, breaking ties by the closest average distance.
            var winner = neighbors
                .GroupBy(n => n.Label)
                .OrderByDescending(g => g.Count())
                .ThenBy(g => g.Average(x => x.Distance))
                .First()
                .Key;

            return new Result<TLabel>(winner, true);
        }

        private double[] Normalize(double[] vector)
        {
            var normalized = new double[vector.Length];
            for (int i = 0; i < vector.Length; i++)
            {
                double range = _maxFeatures[i] - _minFeatures[i];
                // Prevent division by zero if all values of a feature are identical.
                normalized[i] = range == 0 ? 0 : (vector[i] - _minFeatures[i]) / range;
            }
            return normalized;
        }
    }

    // --- Demonstration ---
    public class Program
    {
        public static async Task Main()
        {
            // Example: predicting patient health from [HeartRate, Age].
            // Notice the difference in scale (HeartRate ~60-120 vs Age ~20-80):
            // normalization is critical here.
            var data = new List<TrainingSample<string>>
            {
                new(new[] { 60.0, 20.0 }, "Healthy"),
                new(new[] { 110.0, 75.0 }, "At Risk"),
                new(new[] { 65.0, 25.0 }, "Healthy"),
                new(new[] { 120.0, 80.0 }, "At Risk")
            };

            var knn = new KnnClassifier<string>(k: 3, new EuclideanDistance());
            var trainResult = await knn.TrainAsync(data);
            if (!trainResult.IsSuccess) return;

            var prediction = knn.Predict(new[] { 105.0, 70.0 });
            Console.WriteLine(prediction.IsSuccess
                ? $"Assessment: {prediction.Value}"
                : $"Error: {prediction.Error}");
        }
    }
}
```
Understanding the KNN Logic
1. The Data Used in the Code
In the C# example, we used “Medical Assessment” data.
- Features (Input):
  - Heart Rate: Usually between 60 and 120 bpm.
  - Age: Usually between 20 and 80 years.
- Label (Output): “Healthy” or “At Risk”.
Why Normalization Matters: Because Heart Rate (up to 120) is a bigger number than Age (up to 80), the computer thinks the Heart Rate is “more important” if we don’t scale them. My code scales both to a range of 0 to 1 so they have equal power.
2. The Step-by-Step Walkthrough
Let’s look at the query: Heart Rate 105, Age 70.
Step 1: Storage. The algorithm looks at its memory. It has 4 people saved:
- (60, 20) -> Healthy
- (110, 75) -> At Risk
- (65, 25) -> Healthy
- (120, 80) -> At Risk
Step 2: Distance Calculation. The code calculates how far the new person (105, 70) is from everyone else.
- Distance to Person 2 is very small (they are both old with high heart rates).
- Distance to Person 4 is also small.
- Distance to Person 1 is very large (they are very different).
Step 3: Sorting. It ranks the neighbors from “Closest” to “Farthest.”
3. Interpretation with different K values
If K = 1 (The Specialist)
The algorithm looks at only the single closest neighbor.
- The closest person is Person 2 (110 bpm, 75 years).
- Person 2 is “At Risk.”
- Output: “At Risk.”
- Problem: If Person 2 was a “weird” case (an outlier), the algorithm would be wrong. It is too sensitive.
If K = 3 (The Committee)
The algorithm looks at the 3 closest neighbors.
- Neighbor 1: Person 2 (At Risk) – Distance: about 0.12 (measured in the normalized 0-to-1 space)
- Neighbor 2: Person 4 (At Risk) – Distance: about 0.30
- Neighbor 3: Person 3 (Healthy) – Distance: about 1.00
- Why skip Person 1? Because Person 1 has a distance of about 1.12. It is the fourth-closest point, and we only asked for 3.
- The Vote: 2 votes for “At Risk”, 1 vote for “Healthy.”
- Output: “At Risk.”
- Benefit: Even though one neighbor (Person 3) was different, the majority fixed the mistake.
4. Expected Output
For the input [105.0, 70.0], the expected output is: Assessment: At Risk
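If you want to double-check the walkthrough without running the C# program, this small Python sketch mirrors the same logic (min-max scaling learned from the four training rows, then Euclidean distance and a majority vote):

```python
import math
from collections import Counter

data = [([60, 20], "Healthy"), ([110, 75], "At Risk"),
        ([65, 25], "Healthy"), ([120, 80], "At Risk")]

# Learn the min/max of each feature column from the training data.
columns = list(zip(*(features for features, _ in data)))
mins = [min(col) for col in columns]
maxs = [max(col) for col in columns]

def normalize(vector):
    return [(x - lo) / (hi - lo) if hi != lo else 0.0
            for x, lo, hi in zip(vector, mins, maxs)]

def predict(raw_query, k=3):
    query = normalize(raw_query)
    nearest = sorted(data, key=lambda s: math.dist(query, normalize(s[0])))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(f"Assessment: {predict([105, 70])}")  # Assessment: At Risk
```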
Summary of Decisions
- Small K (like 1): Very sharp but follows “noise” or mistakes in the data too easily.
- Large K (like 11): Very smooth but might ignore small, important patterns because it listens to too many distant neighbors.
- The “Right” K: Usually an odd number like 3 or 5, so there are no tie votes when there are two classes.
The “Rulers” of KNN: How We Measure Distance
In the C# code, the IDistanceMetric interface allows us to swap between two primary ways of measuring how “different” two pieces of data are.
1. Euclidean Distance (The Straight Line)
This is the distance most people know from school ($a^2 + b^2 = c^2$). In the code, it looks like this:
`Math.Sqrt(vectorA.Zip(vectorB, (a, b) => Math.Pow(a - b, 2)).Sum())`
The Step-by-Step Logic:
- Subtraction (`a - b`): We find the “gap” between each feature. For example, if Patient A’s heart rate is 100 and Patient B’s is 80, the gap is 20.
- Squaring (`Math.Pow(..., 2)`): This does two things:
  - It makes every number positive (distance can’t be negative).
  - It punishes large gaps. A gap of 10 becomes 100, but a gap of 20 becomes 400. This makes the algorithm very sensitive to outliers.
- Summing (`Sum()`): We add up all those squared gaps across all features (Age, Heart Rate, etc.).
- Square Root (`Math.Sqrt`): We take the square root to bring the final number back to a scale we can understand.
2. Manhattan Distance (The City Block)
Think of a taxi in a city like New York. You can’t drive through buildings (a straight line); you have to follow the grid of streets.
`vectorA.Zip(vectorB, (a, b) => Math.Abs(a - b)).Sum()`
The Step-by-Step Logic:
- Absolute Difference (`Math.Abs(a - b)`): We find the gap, but we just throw away the negative sign. A gap of -20 becomes 20.
- Summing (`Sum()`): We just add them all up.
- Why use this? It is less sensitive to outliers than Euclidean because we don’t square the numbers. It’s a “gentler” ruler.
3. The Secret Sauce: Normalization
Remember the Normalize() function in my code? This is where the “Architect” beats the “Amateur.”
If you don’t normalize, here is what happens:
- Heart Rate Gap: 100 to 80 = 20 units.
- Age Gap: 70 to 71 = 1 unit.
Without normalization, the computer thinks the heart rate gap (20) is 20 times more important than the age gap (1). By scaling everything to a range of 0 to 1, a 20-bpm difference and a 20-year age difference have the exact same impact on the distance.
Comparison Table
| Feature | Raw Data | Normalized (0 to 1) |
| --- | --- | --- |
| Calculation | $100 - 80 = 20$ | $0.8 - 0.6 = 0.2$ |
| Result | Distance is dominated by big numbers. | Distance is “fair” across all features. |
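The table’s point can be measured directly. This Python sketch compares the same two patients with and without scaling (assuming the demo data’s ranges: heart rate 60 to 120, age 20 to 80):

```python
import math

# Patient A and Patient B: [heart rate, age]
a, b = [100, 70], [80, 71]

def scale(vector, mins=(60, 20), maxs=(120, 80)):
    return [(x - lo) / (hi - lo) for x, lo, hi in zip(vector, mins, maxs)]

raw_dist = math.dist(a, b)                   # dominated by the 20-bpm gap
scaled_dist = math.dist(scale(a), scale(b))  # both features weighed fairly

print(round(raw_dist, 2))    # 20.02 -- the 1-year age gap barely registers
print(round(scaled_dist, 3)) # 0.334
```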
Challenges and Problems for KNN
Even though KNN is good, it is not perfect. There are some things that make it struggle.
The Problem of Scale
Imagine we are looking at houses. One fact is the number of rooms (usually 1 to 5). Another fact is the price (usually $100,000 to $500,000). Because the price numbers are so big, the distance calculation will only care about the price. A difference of 2 rooms is very important, but to the computer, it looks tiny compared to a difference of $1,000 in price. To fix this, we have to “scale” the data, which means making all the numbers stay between 0 and 1 so they are all treated as equally important.
The Curse of Dimensionality
This is a very big name for a simple problem. KNN works best when there are only a few facts (dimensions), like height and weight. But if we give the computer too many facts—like 100 different things about a person—the “space” where the data lives becomes very, very big. In this huge space, every point becomes far away from every other point. It is like trying to find a neighbor in a giant, empty universe. If everyone is far away, the idea of a “nearest neighbor” does not work anymore. This makes the computer’s guesses very poor.
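You can watch this happen with random data. In this Python sketch, the gap between the nearest and the farthest point (relative to the nearest) collapses as the number of dimensions grows:

```python
import math
import random

random.seed(0)

def contrast(dims, n=200):
    """(farthest - nearest) / nearest for one random query vs. n random points."""
    query = [random.random() for _ in range(dims)]
    points = [[random.random() for _ in range(dims)] for _ in range(n)]
    dists = sorted(math.dist(query, p) for p in points)
    return (dists[-1] - dists[0]) / dists[0]

for dims in (2, 10, 100):
    print(dims, round(contrast(dims), 2))
# In 2 dimensions the nearest point is far closer than the farthest;
# in 100 dimensions nearly every point is roughly the same distance away.
```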
Large Datasets
Because the computer is a lazy learner, it must look at every single piece of data every time you ask it a question. If you have a million points, it has to calculate a million distances for every new question. This takes a lot of time and a lot of computer memory. This is why KNN is usually best for smaller datasets.
Good and Bad Things About KNN
Before using KNN, we should look at the good and bad parts to see if it is the right tool for us.
The Pros (Good Things)
- It is simple. You can explain it to almost anyone, and it is easy to build.
- No training phase. You don’t have to wait for the computer to study. You just give it data, and it is ready.
- It can do two jobs. It works for classification (labels) and regression (numbers).
- It is flexible. If you get new data, you don’t have to start over. You just add it to the memory.
The Cons (Bad Things)
- It is slow for big data. Calculating distances to everything takes a long time.
- It uses a lot of memory. You have to store every single piece of data you have ever seen.
- It is sensitive to “noise.” If there are errors or weird points in your data, KNN can get confused easily.
- It needs scaling. You must do extra work to make sure large numbers don’t drown out small numbers.
| Good Things (Pros) | Bad Things (Cons) |
| --- | --- |
| Simple and easy to understand. | Slow when there is a lot of data. |
| No time spent on “studying” data. | Uses a lot of computer storage/memory. |
| Can pick names or guess numbers. | Gets confused by big and small scales. |
| Easily takes in new information. | Fails if there are too many types of facts. |
Conclusion: Final Thoughts on Neighbors
K-Nearest Neighbors is a very friendly way to start with machine learning. It reminds us that often, the best way to understand something new is to look at what is already around it. Even though it is simple, it is a powerful tool for finding patterns in smaller groups of data. It helps doctors find diseases, it helps you find your next favorite movie, and it helps you know how much a house should cost.
If you remember to keep your data scaled, pick a good number for K, and don’t give the computer too many irrelevant facts, KNN can be a very accurate and reliable friend for making guesses. It is the most “human” way for a computer to think, relying on the simple idea that closeness means similarity. As long as the computer has enough memory and the dataset is not too huge, this lazy learner will continue to be one of the most important methods in the world of data.