Enrico Rubboli


Software Engineer and Entrepreneur
Playing with Bitcoin - machine learning - the Go language - Ruby - Python
Preparing tortellini on demand


Linear Regression in Go - Part 1

Python is becoming the de facto standard for Big Data and Machine Learning, in particular thanks to some amazing tools like the IPython Notebook, which helps you visualize your data, and scikit-learn, which implements some of the most popular machine learning algorithms.

So implementing an ML algorithm in Go is a pure exercise.

What is Linear Regression

Linear regression is a supervised Machine Learning algorithm used to predict a continuous value; for example, it can be used to predict prices in the market.

The term supervised refers to the fact that the algorithm needs to be trained with a learning dataset. We’ll see more examples of supervised algorithms in the future.

In the following example I’ve plotted a graph with some real data about house prices in the UK (data taken from here; it’s about the City of Windsor). The X axis is the lot size, while the Y axis is the price.

As we can see from the graph, there’s a relationship between lot size and price: bigger houses have higher prices.

I’ve drawn a red line in the graph; that’s my best guess. We can train the linear regression to find that line, so that when a new house comes on the market we can estimate its price from its lot size.

This red line is called the hypothesis function (or prediction) and looks like this:

$$h_\theta(x) = \theta_0 + \theta_1 x$$

where $x$ is our feature (the lot size) and the result $h_\theta(x)$ is our price prediction.
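To make this concrete, here’s a tiny sketch of the one-feature case in plain Go (the function name and the numbers in the comment are mine, made up for illustration):

// hypothesis1D computes h(x) = theta0 + theta1*x for a single feature.
func hypothesis1D(theta0, theta1, x float64) float64 {
    return theta0 + theta1*x
}

For example, with made-up values theta0 = 20000 and theta1 = 6.5, a lot size of 5000 would predict a price of 20000 + 6.5*5000 = 52500.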

But we could have more features, like the number of bathrooms or the number of bedrooms, and we can even use polynomial functions of the features. A more complicated example is:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_1 ^ 2 + \theta_3 x_2 + \theta_4 x_2 ^ 2$$

Here $x_1$ is still our lot size, but it now appears both as a linear and as a quadratic term, and $x_2$ might be the number of bedrooms.

In this case, what the machine learning algorithm does is find the weights of this function that give the best results, i.e. it finds the vector $\theta = \langle\theta_0,\theta_1,\theta_2,\theta_3,\theta_4\rangle$.
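To see what that feature mapping looks like in code, here is a small sketch in Go (the helper name polynomialFeatures is mine, not from the post’s code):

// polynomialFeatures maps the raw inputs (lot size and number of bedrooms)
// to the feature values [x1, x1^2, x2, x2^2] used by the hypothesis above.
func polynomialFeatures(lotSize, bedrooms float64) []float64 {
    return []float64{
        lotSize,             // x1
        lotSize * lotSize,   // x1^2
        bedrooms,            // x2
        bedrooms * bedrooms, // x2^2
    }
}

The learning algorithm never sees the raw inputs directly, only this expanded feature vector, which is what lets a linear model fit curves.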

Using Matrices and Vectors

If we arbitrarily define a new value $x_0$ to be equal to 1, we can rewrite the hypothesis function as follows:

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4 = \displaystyle\sum_{j=0}^{n}\theta_j x_j $$

where $n$ is the number of features and $x$ looks like this:

$$x = \begin{bmatrix} x_0 = 1\newline x_1 \newline \vdots \newline x_n \end{bmatrix} $$

But since $x_0=1$, the two expressions are equivalent, and we arrive at the vectorized form:

$$h_\theta(x) = \theta^T x$$
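In code, this convention just means prepending a constant 1 to the features. A minimal sketch (withBias is a hypothetical helper, not from the post):

// withBias prepends the constant feature x0 = 1, so that the dot product
// theta^T x also includes the intercept term theta0.
func withBias(features []float64) []float64 {
    return append([]float64{1}, features...)
}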

This is not just easier to read, it’s also independent of the number of features, and it can benefit from computationally optimized functions like the ones you can find in packages like gonum matrix. The gonum package uses BLAS and LAPACK implementations; you can find more details here.

This first post ends with the hypothesis function written in Go, taking advantage of the mat64.Dot function:

func Hypothesis(x, theta *mat64.Vector) float64 {
    return mat64.Dot(x, theta)
}
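To see it in action, here’s a minimal, self-contained sketch; the theta values are made up, and the import path assumes the old gonum/matrix repository the mat64 code above comes from:

package main

import (
    "fmt"

    "github.com/gonum/matrix/mat64"
)

func Hypothesis(x, theta *mat64.Vector) float64 {
    return mat64.Dot(x, theta)
}

func main() {
    // x0 = 1 (bias), x1 = lot size, x2 = number of bedrooms.
    x := mat64.NewVector(3, []float64{1, 5850, 3})
    // Made-up weights, just for illustration.
    theta := mat64.NewVector(3, []float64{10000, 5, 2000})
    fmt.Println(Hypothesis(x, theta)) // 10000 + 5*5850 + 2000*3 = 45250
}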

You can find the whole file here and the related test here.

In the next post about linear regression we’ll implement the cost function and gradient descent: the cost function measures the error of a specific set of $\theta$, while gradient descent is an iterative procedure that drives $\theta$ toward the optimal values.

You can find part 2 here.