Overfitting and Regularization
It has been a long time since I last wrote something. Things have been good: I got a job as a Machine Learning Expert in San Francisco, where I am exploring new things. Today I would like to talk a little bit about regularization and overfitting.

Overfitting, as it is generally defined, happens when your model becomes more sure of the training data than it should be. In other words, the model will not generalize well. Overfitting is a serious issue in maximum-likelihood estimation of model parameters. For example, in logistic regression you are trying to find the normal vector of the hyperplane separating your two classes. The usual recipe is to write down the likelihood of your training data, maximize it with respect to the unknown parameter vector, and derive an update equation for that vector. You then iterate that update and say "voilà!" when you see it converging. However, this convergence is often a deception. A thing or two to help detect overfitting:
1) Look at your parameter vector: if its components swing wildly from one extreme to the other, be warned of overfitting.
2) If your training set is small, overfitting will almost certainly come into play (both symptoms show up in the small sketch below).
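To make the symptom concrete, here is a minimal sketch of fitting a logistic regression by plain maximum likelihood with gradient ascent. None of this is from the original post: the toy data, step size, and iteration count are all made up for illustration. On (nearly) separable data the weight norm just keeps growing, which is exactly the "too sure of the data" behaviour described above.

```python
import numpy as np

# Toy two-class data: two well-separated Gaussian blobs in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(20, 2)),
               rng.normal(+2.0, 1.0, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)

w = np.zeros(2)       # the normal vector we are estimating
lr = 0.5              # step size for gradient ascent

for step in range(2001):
    z = np.clip(X @ w, -30, 30)            # clip to keep exp() from overflowing
    p = 1.0 / (1.0 + np.exp(-z))           # P(y = 1 | x) under the current w
    grad = X.T @ (y - p) / len(y)          # gradient of the mean log-likelihood
    w += lr * grad                         # plain maximum-likelihood update
    if step % 500 == 0:
        print(f"step {step:5d}   ||w|| = {np.linalg.norm(w):.2f}")

# On (nearly) separable toy data like this, ||w|| keeps climbing: the updates
# "converge" in the sense that the likelihood barely improves any more, yet
# the parameter components run off to extremes.
```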
In the non-Bayesian setting, the easiest way to avoid overfitting is to add a regularizer that penalizes the objective function when the components of the parameter vector become too large. Simply put, the regularizer keeps the model from clinging too tightly to the training data, and in the process you may see your training error increase.
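Continuing the toy sketch above (it reuses X, y, lr, and w from that block; the penalty strength lam is a made-up value, not a recommendation), an L2 penalty on the log-likelihood is a one-line change to the update, and it stops the components from running off:

```python
lam = 0.1             # L2 penalty strength, chosen arbitrarily for illustration
w_reg = np.zeros(2)

for step in range(2001):
    z = np.clip(X @ w_reg, -30, 30)
    p = 1.0 / (1.0 + np.exp(-z))
    # Gradient of the penalized objective: mean log-likelihood - (lam/2) * ||w||^2
    grad = X.T @ (y - p) / len(y) - lam * w_reg
    w_reg += lr * grad

print("maximum likelihood  ||w|| =", round(np.linalg.norm(w), 2))
print("L2-regularized      ||w|| =", round(np.linalg.norm(w_reg), 2))

# The regularized weights settle at a modest norm. A few training points may
# now be classified less confidently (or even wrongly), so training error can
# go up slightly, but the fit generalizes better.
```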
Why does overfitting occur? A simple explanation is that the features we collect in our data are generally not uncorrelated with each other. If you arrange your data points as the rows of a matrix and plot its singular values (via an SVD), you will find that they fall off to zero very quickly. The singular-value spectrum (top 400 values) of a data matrix with more than 5 million rows and 2000 columns is shown below.
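You can reproduce this kind of spectrum on a synthetic matrix whose columns are deliberately correlated. All the sizes and the noise level in this sketch are made up and far smaller than the matrix from the post; the point is only the shape of the spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)
factors = rng.normal(size=(2000, 20))      # 20 underlying independent factors
mixing  = rng.normal(size=(20, 400))       # mixed into 400 correlated features
A = factors @ mixing + 0.01 * rng.normal(size=(2000, 400))   # small noise on top

s = np.linalg.svd(A, compute_uv=False)     # singular values, sorted large -> small
print(s[:3])       # a handful of large values carry nearly all the energy ...
print(s[30:33])    # ... while the rest of the spectrum is already close to zero
```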