Linear Regression, finally!

For Day 11, I have waited many days to present this model. It marks the start of a new journey in this “Advent Calendar”.

Until now, we mostly looked at models based on distances, neighbors, or local density. As you may know, for tabular data, decision trees, and especially ensembles of decision trees, are very performant.

But starting today, we switch to another viewpoint: the weighted approach.

Linear Regression is our first step into this world.
It looks simple, but it introduces the core ingredients of modern ML: loss functions, gradients, optimization, scaling, collinearity, and the interpretation of coefficients.

Now, when I say Linear Regression, I mean Ordinary Least Squares Linear Regression. As we progress through this “Advent Calendar” and explore related models, you will see why it is important to specify this, because the name “linear regression” can be confusing.

Some people say that Linear Regression is not machine learning.

Their argument is that machine learning is a “new” field, whereas Linear Regression existed long before it, so it cannot be considered ML.

This is misleading.
Linear Regression fits perfectly within machine learning because:

  • it learns parameters from data,
  • it minimizes a loss function,
  • it makes predictions on new data.

In other words, Linear Regression is one of the oldest models, but also one of the most fundamental in machine learning.

This is the approach used in:

  • Linear Regression,
  • Logistic Regression,
  • and, later, Neural Networks and LLMs.

For deep learning, this weighted, gradient-based approach is the one used everywhere.

And in modern LLMs, we are no longer talking about a handful of parameters. We are talking about billions of weights.

In this article, our Linear Regression model has exactly 2 weights.

A slope and an intercept.

That’s all.

But we have to start somewhere, right?

And here are a few questions you can keep in mind as we progress through this article, and in the ones to come.

  • We will try to interpret the model. With one feature, y = a·x + b, everyone knows that a is the slope and b is the intercept. But how do we interpret the coefficients when there are 10, 100, or more features?
  • Why is collinearity between features such a problem for linear regression? And what can we do to solve this issue?
  • Is scaling important for linear regression?
  • Can Linear Regression overfit?
  • And how are the other models of this weighted family (Logistic Regression, SVM, Neural Networks, Ridge, Lasso, etc.) all linked to the same underlying ideas?

These questions form the thread of this article and will naturally lead us toward future topics in the “Advent Calendar”.

Understanding the Trend Line in Excel

Starting with a Simple Dataset

Let us begin with a very simple dataset that I generated, with one feature.

In the graph below, you can see the feature variable x on the horizontal axis and the target variable y on the vertical axis.

The goal of Linear Regression is to find two numbers, a and b, such that we can write the relationship:

y = a·x + b

Once we know a and b, this equation becomes our model.

Linear regression in Excel – all images by author

Creating the Trend Line in Excel

In Google Sheets or Excel, you can simply add a trend line to visualize the best linear fit.

That already gives you the result of Linear Regression.

Linear regression in Excel – all images by author

But the purpose of this article is to compute these coefficients ourselves.

If we want to use the model to make predictions, we need to implement it directly.

Linear regression in Excel – all images by author
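If you want to check the trend line outside the spreadsheet, here is a minimal sketch in Python, using a small made-up dataset, since the generated data itself is not listed in the article:

```python
import numpy as np

# Hypothetical stand-in for the generated one-feature dataset
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3, 13.9, 16.2, 18.1, 19.8])

# Degree-1 polynomial fit: returns the least-squares slope a and intercept b,
# the same coefficients Excel displays on its trend line
a, b = np.polyfit(x, y, 1)
print(f"y = {a:.3f} * x + {b:.3f}")

# Once we know a and b, the model is just this equation
y_hat = a * x + b
```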

Introducing Weights and the Cost Function

A Note on Weight-Based Models

This is the first time in the Advent Calendar that we introduce weights.

Models that learn weights are often called parametric discriminant models.

Why discriminant?
Because they learn a rule that directly separates or predicts, without modeling how the data was generated.

Before this chapter, we already saw models that had parameters, but they were not discriminant, they were generative.

Let us recap quickly.

  • Decision Trees use splits, or rules, so there are no weights to learn: they are non-parametric models.
  • k-NN is not a model: it keeps the whole dataset and uses distances at prediction time.

However, when we move from Euclidean distance to Mahalanobis distance, something interesting happens…

LDA and QDA do estimate parameters:

  • the mean of each class
  • covariance matrices
  • priors

These are real parameters, but they are not weights.
These models are generative because they model the density of each class, and then use it to make predictions.

So although they are parametric, they do not belong to the weight-based family.

And as you can see, these are all classifiers, and they estimate parameters for each class.

Linear regression in Excel – all images by author

Linear Regression is our first example of a model that learns weights to build a prediction.

This is the beginning of a new family in the Advent Calendar:
models that rely on weights + a loss function to make predictions.

The Cost Function

How do we obtain the parameters a and b?

Well, the optimal values for a and b are the ones that minimize the cost function, which is the Squared Error of the model.

So for each data point, we can calculate the Squared Error:

Squared Error = (prediction − real value)² = (a·x + b − real value)²

Then we can calculate the MSE, or Mean Squared Error.

As we can see in Excel, the trend line gives us the optimal coefficients. If you manually change these values, even slightly, the MSE will increase.

That is exactly what “optimal” means here: any other combination of a and b makes the error worse.

Linear regression in Excel – all images by author
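To make this concrete, here is a minimal sketch (reusing the hypothetical dataset from above) that computes the MSE of any candidate pair (a, b); nudging the optimal coefficients, just like in the spreadsheet, makes the error grow:

```python
import numpy as np

def mse(a, b, x, y):
    # Mean Squared Error of the line y = a*x + b over the data points
    return float(np.mean((a * x + b - y) ** 2))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3, 13.9, 16.2, 18.1, 19.8])

a_opt, b_opt = np.polyfit(x, y, 1)
print(mse(a_opt, b_opt, x, y))          # the minimum
print(mse(a_opt + 0.05, b_opt, x, y))   # slightly off: the MSE increases
```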

The classic closed-form solution

Now that we know what the model is, and what it means to minimize the squared error, we can finally answer the key question:

How do we compute the two coefficients of Linear Regression, the slope a and the intercept b?

There are two ways to do it:

  • the exact algebraic solution, called the closed-form solution,
  • or gradient descent, which we will explore just after.

If we take the definition of the MSE and differentiate it with respect to a and b, something beautiful happens: everything simplifies into two very compact formulas.

Linear regression in Excel – all images by author

These formulas only use:

  • the averages of x and y,
  • how x varies (its variance),
  • and how x and y vary together (their covariance).

So even without knowing any calculus, and with only basic spreadsheet functions, we can reproduce the exact solution used in statistics textbooks.
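In code, the closed-form solution is just as compact. Here is a minimal sketch of the textbook formulas a = cov(x, y) / var(x) and b = ȳ − a·x̄, on the same hypothetical data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3, 13.9, 16.2, 18.1, 19.8])

# Slope: how x and y vary together, divided by how x varies
a = np.cov(x, y, bias=True)[0, 1] / np.var(x)
# Intercept: forces the line through the point of averages (x̄, ȳ)
b = y.mean() - a * x.mean()

print(f"a = {a:.3f}, b = {b:.3f}")  # matches np.polyfit(x, y, 1)
```

Here bias=True selects the population covariance, matching np.var’s default, so the normalization factors cancel in the ratio.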

How to interpret the coefficients

With one feature, interpretation is easy and intuitive:

The slope a
It tells us how much y changes when x increases by one unit.
If the slope is 1.2, it means:
“when x goes up by 1, the model expects y to go up by about 1.2.”

The intercept b
It is the predicted value of y when x = 0.
Often, x = 0 does not exist in the real context of the data, so the intercept is not always meaningful on its own.
Its role is mostly to position the line correctly, to match the center of the data.

This is usually how Linear Regression is taught:
a slope, an intercept, and a straight line.

With one feature, interpretation is easy.

With two, it is still manageable.

But as soon as we start adding many features, it becomes harder.

Tomorrow, we will discuss interpretation further.

Today, we will do gradient descent.

Gradient Descent, Step by Step

After seeing the classic algebraic solution for Linear Regression, we can now explore the other essential tool behind modern machine learning: optimization.

The workhorse of optimization is Gradient Descent.

Understanding it on a very simple example makes the logic much clearer once we apply it to Linear Regression.

A Gentle Warm-Up: Gradient Descent on a Single Variable

Before implementing gradient descent for Linear Regression, we will first do it for a simple function: (x − 2)².

Everyone knows the minimum is at x = 2.

But let us pretend we do not know that, and let the algorithm discover it on its own.

The idea is to find the minimum of this function using the following process:

  • First, we randomly choose an initial value.
  • Then, at each step, we calculate the value of the derivative df at the current value of x: df(x).
  • The next value of x is obtained by subtracting the derivative multiplied by a step size: x = x − step_size·df(x)

You can adjust the two parameters of the gradient descent: the initial value of x and the step size.

Does it still converge when the initial value is far from the minimum? Yes, even with 100, or 1000. It is quite surprising to see how well it works.

Linear regression in Excel – all images by author
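Here is a minimal sketch of this warm-up in code, with the derivative df(x) = 2(x − 2) written out by hand:

```python
def df(x):
    # Derivative of f(x) = (x - 2)**2
    return 2 * (x - 2)

x = 100.0        # initial value, deliberately far from the minimum
step_size = 0.1

for _ in range(50):
    x = x - step_size * df(x)  # the gradient descent update

print(x)  # very close to 2.0
```

Each update multiplies the error (x − 2) by a factor of 1 − 2·step_size, which is why the convergence is geometric.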

However, in some cases, gradient descent will not work. For example, if the step size is too large, the value of x can explode: here, since df(x) = 2(x − 2), any step size above 1 makes each update overshoot the minimum by more than the previous error, and x diverges.

Linear regression in Excel – all images by author

Gradient descent for linear regression

The principle of the gradient descent algorithm is the same for linear regression: we have to calculate the partial derivatives of the cost function with respect to the parameters a and b. Let us note them da and db.

Squared Error = (prediction − real value)² = (a·x + b − real value)²

da = 2·(a·x + b − real value)·x

db = 2·(a·x + b − real value)

Linear regression in Excel – all images by author

And then, we can update the coefficients.

Linear regression in Excel – all images by author

With this tiny update, step by step, the optimal values are found after a few iterations.
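Here is a minimal sketch of this loop (same hypothetical 10-point dataset as above), where da and db average the per-point derivatives over all observations:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3, 13.9, 16.2, 18.1, 19.8])

a, b = 0.0, 0.0   # initial coefficients
step_size = 0.01

for _ in range(5000):
    residuals = a * x + b - y        # prediction minus real value, per point
    da = np.mean(2 * residuals * x)  # partial derivative of the MSE w.r.t. a
    db = np.mean(2 * residuals)      # partial derivative of the MSE w.r.t. b
    a -= step_size * da
    b -= step_size * db

print(f"a = {a:.3f}, b = {b:.3f}")  # converges to the closed-form solution
```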

In the following graph, you can see how a and b converge towards the target values.

Linear regression in Excel – all images by author

We can also see all the details of y hat, the residuals, and the partial derivatives.

We can fully appreciate the beauty of gradient descent, visualized in Excel.

For these two coefficients, we can observe how quick the convergence is.

Linear regression in Excel – all images by author

Now, in practice, we have many observations, and this has to be done for every data point. That is where things become crazy in Google Sheets. So, we use only 10 data points.

You will see that I first created a sheet with long formulas to calculate da and db, which contain the sum of the derivatives over all the observations. Then I created another sheet to show all the details.

Categorical Features in Linear Regression

Before concluding, there is one last important idea to introduce:
how a weight-based model like Linear Regression handles categorical features.

This topic is essential because it reveals a fundamental difference between the models we studied earlier (like k-NN) and the weighted models we are entering now.

Why distance-based models struggle with categories

In the first part of this Advent Calendar, we used distance-based models such as k-NN, DBSCAN, and LOF.
But these models rely solely on measuring distances between points.

For categorical features, this becomes impossible:

  • a category encoded as 0 or 1 has no quantitative meaning,
  • the numerical scale is arbitrary,
  • Euclidean distance cannot capture category differences.

This is why k-NN cannot handle categories correctly without heavy preprocessing.

Weight-based models solve the problem differently

Linear Regression does not compare distances.
It learns weights.

To include a categorical variable in a weight-based model, we use one-hot encoding, the most common approach.

Each category becomes its own feature, and the model simply learns one weight per category, as in the sketch below.
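As a minimal illustration, with a made-up “city” column (both the data and the column name are hypothetical), pandas can do the encoding in one call:

```python
import pandas as pd

# Hypothetical dataset with one categorical feature
df = pd.DataFrame({
    "city": ["Paris", "Lyon", "Paris", "Nice", "Lyon"],
    "y":    [210.0, 180.0, 205.0, 195.0, 175.0],
})

# One-hot encoding: each category becomes its own 0/1 column,
# and a linear model then learns one weight per column
encoded = pd.get_dummies(df, columns=["city"], dtype=float)
print(encoded.columns.tolist())  # ['y', 'city_Lyon', 'city_Nice', 'city_Paris']
```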

Why this works so well

Once encoded:

  • the scaling problem disappears (everything is 0 or 1),
  • each category receives an interpretable weight,
  • the model can adjust its prediction depending on the group.

A simple two-category example

When there are only two categories (0 and 1), the model becomes very simple:

  • one value is used when x = 0,
  • another when x = 1.

One-hot encoding is not even necessary:
the numeric encoding already works, because Linear Regression will learn the appropriate difference between the two groups.

Gradient Descent still works

Even with categorical features, Gradient Descent works exactly as usual.

The algorithm only manipulates numbers, so the update rules for a and b are identical.

In the spreadsheet, you can see the parameters converge smoothly, just like with numerical data.

However, in this special two-category case, we also know that a closed-form solution exists: Linear Regression essentially computes two group averages and the difference between them.
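A minimal sketch to verify this, with hypothetical 0/1 group data: the fitted intercept lands on the mean of group 0, and the slope on the difference between the two group means:

```python
import numpy as np

# Hypothetical binary feature (two groups) and target values
x = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
y = np.array([10.2, 9.8, 10.5, 9.5, 14.9, 15.3, 15.1, 14.7])

a, b = np.polyfit(x, y, 1)
print(f"a = {a:.3f}, b = {b:.3f}")

# The closed form in this case: two group averages
mean_0 = y[x == 0].mean()  # equals b
mean_1 = y[x == 1].mean()
print(mean_0, mean_1 - mean_0)  # b and a again
```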

Conclusion

Linear Regression may look simple, but it introduces almost everything that modern machine learning relies on.
With just two parameters, a slope and an intercept, it teaches us:

  • how to define a cost function,
  • how to find optimal parameters numerically,
  • and how optimization behaves when we adjust learning rates or initial values.

The closed-form solution shows the elegance of the mathematics.
Gradient Descent shows the mechanics behind the scenes.
Together, they form the foundation of the “weights + loss function” family that includes Logistic Regression, SVM, Neural Networks, and even today’s LLMs.

New Paths Ahead

You may think Linear Regression is simple, but with its foundations now clear, you can extend it, refine it, and reinterpret it through many different perspectives:

  • Change the loss function
    Replace squared error with logistic loss, hinge loss, or other functions, and new models appear.
  • Move to classification
    Linear Regression itself can separate two classes (0 and 1), but more robust variants lead to Logistic Regression and SVM. And what about multiclass classification?
  • Model nonlinearity
    Through polynomial features or kernels, linear models suddenly become nonlinear in the original space.
  • Scale to many features
    Interpretation becomes harder, regularization becomes essential, and new numerical challenges appear.
  • Primal vs dual
    Linear models can be written in two ways. The primal view learns the weights directly. The dual view rewrites everything using dot products between data points.
  • Understand modern ML
    Gradient Descent and its variants are the core of neural networks and large language models.
    What we learned here with two parameters generalizes to billions.

Everything in this article stays within the boundaries of Linear Regression, yet it prepares the ground for a whole family of future models.
Day after day, the Advent Calendar will show how all these ideas connect.
