# Linear Regression in Go - Part 1

Fri, Nov 6, 2015

Python is becoming the de facto standard for Big Data and Machine Learning, in particular because of some amazing tools like IPython Notebook, which helps you visualize your data, or scikit-learn, which implements some of the most popular machine learning algorithms.

So implementing an ML algorithm in Go is purely an exercise.

#### What is Linear Regression

Linear regression is a *supervised* Machine Learning algorithm used to predict a continuous value; for
example, it can be used to predict prices in the market.

The term *supervised* refers to the fact that the algorithm needs to be *trained* with a learning dataset;
we'll see more examples of supervised algorithms in the future.

In the following example I've plotted a graph with some real data about house prices in the UK
(data taken from here, it's about the City of Windsor).
The `X` axis is the *lot size* while the `Y` axis is the *price*.

As we can see from the graph, there's a connection between the lot size and the price: bigger houses have higher prices.

I've drawn a red line in the graph: that's my best guess. We can train the linear regression to find that line, so that when a new house comes on the market, given the lot size we can estimate the house price.

This red line is called *hypothesis function* (or *prediction*) and looks like:

$$h_\theta(x) = \theta_0 + \theta_1 x$$
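With a single feature, this hypothesis is plain arithmetic; here's a minimal sketch in Go (the `theta` values are made up, just for illustration):

```go
package main

import "fmt"

// hypothesis computes h(x) = theta0 + theta1*x for a single feature x.
func hypothesis(theta0, theta1, x float64) float64 {
	return theta0 + theta1*x
}

func main() {
	// Hypothetical weights: a base price of 20000 plus 6 per unit of lot size.
	theta0, theta1 := 20000.0, 6.0
	fmt.Println(hypothesis(theta0, theta1, 5000)) // prediction for a lot size of 5000
}
```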

where `$x$` is our *feature* (the `lot size`) and the result `$h_\theta(x)$` is our price prediction.

But we could have more *features*, like the number of *bathrooms* or the number of *bedrooms*; we can even use polynomial functions of the features. A more complicated example is:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_1 ^ 2 + \theta_3 x_2 + \theta_4 x_2 ^ 2$$
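Evaluating this polynomial hypothesis is again just arithmetic on the two features; a sketch in Go (the weight vector and inputs here are invented for illustration):

```go
package main

import "fmt"

// polyHypothesis evaluates
// h(x) = t0 + t1*x1 + t2*x1^2 + t3*x2 + t4*x2^2
// where x1 is the lot size and x2 the number of bedrooms.
func polyHypothesis(theta [5]float64, x1, x2 float64) float64 {
	return theta[0] + theta[1]*x1 + theta[2]*x1*x1 + theta[3]*x2 + theta[4]*x2*x2
}

func main() {
	// Hypothetical weight vector theta = <t0, t1, t2, t3, t4>.
	theta := [5]float64{10000, 5, 0.001, 2000, -50}
	fmt.Println(polyHypothesis(theta, 5000, 3)) // lot size 5000, 3 bedrooms
}
```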

Here `$x_1$` is still our `lot size`, but it now also appears as a quadratic term, and `$x_2$` might be the number of `bedrooms`.

In this case what the machine learning algorithm will do is find the right **weights** for this
function to give the best results, i.e. it will find the vector `$\theta = \langle\theta_0,\theta_1,\theta_2,\theta_3,\theta_4\rangle$`.

#### Using Matrices and Vectors

If we arbitrarily define a new value `$x_0$` to be equal to 1, we can rewrite the hypothesis function as follows:

$$h_\theta(x) = \bbox[yellow,5px]{\theta_0 x_0} + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4 = \displaystyle\sum_{j=0}^{n}\theta_j x_j $$

where `$n$` is the number of features and `$x$` is something like this:

$$x = \begin{bmatrix} x_0 = 1\newline x_1 \newline \vdots \newline x_n \end{bmatrix} $$

But since `$x_0=1$` the two equations are equivalent, and we can get to the vectorized format:

$$h_\theta(x) = \theta^T x$$
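In plain Go, before reaching for an optimized library, this vectorized form is just a dot product over the feature vector; a minimal sketch:

```go
package main

import "fmt"

// dot computes theta^T x, i.e. the sum of theta[j]*x[j] over all features.
func dot(theta, x []float64) float64 {
	sum := 0.0
	for j := range theta {
		sum += theta[j] * x[j]
	}
	return sum
}

func main() {
	// x[0] is fixed to 1, so theta[0] acts as the intercept term.
	x := []float64{1, 5000}
	theta := []float64{20000, 6} // hypothetical weights
	fmt.Println(dot(theta, x))   // same as theta0 + theta1*5000
}
```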

This is not just easier to read, it's also independent of the number of *features* and can benefit from computationally
optimized functions like the ones you can find in packages like gonum matrix.
The **gonum** package uses BLAS and LAPACK implementations; you can find more details here.

This first post ends with the hypothesis function written in Go, taking advantage of the `mat64.Dot` function:

```go
func Hypothesis(x, theta *mat64.Vector) float64 {
	return mat64.Dot(x, theta)
}
```

You can find the whole file here and the related test here.

In the next post about linear regression we'll implement the cost function and gradient descent:
the cost function is used to measure the error of a specific set of `$\theta$`, while gradient descent
is an algorithm that will converge `$\theta$` to the optimal values.

You can find part 2 here.