Why is the Gaussian curve so popular?
It was initially used in physics, but over more than a century scientists realised that repeated observations from many different datasets, when plotted, followed a Gaussian distribution. It became so normal to see datasets following a Gaussian pattern that the curve itself began to be called the ‘Normal’ distribution.
What is the difference between probability theory and decision theory?
Probability theory is a framework for quantifying uncertainty, while decision theory uses those probabilities to make optimal decisions (such as predictions) about uncertain events.
How can overfitting be avoided while fitting complex models to small datasets?
1) As overfitting is less severe with more data, it is best to have at least 5 to 10 times as many observations as variables
2) Use a Bayesian approach
3) Use a regularisation constant to penalise complex models and prevent the coefficients from reaching large values
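The third point can be illustrated with a minimal sketch, assuming an L2 (ridge) penalty and a degree-9 polynomial fitted to 10 noisy points. The function name `fit_polynomial`, the synthetic sine data and the value of `lam` are all assumptions made for illustration.

```python
import numpy as np

def fit_polynomial(x, y, degree, lam=0.0):
    """Least-squares polynomial fit with an optional L2 penalty.

    lam is the regularisation constant; lam = 0 gives ordinary least squares.
    """
    X = np.vander(x, degree + 1, increasing=True)  # columns: 1, x, x^2, ...
    # Minimising ||Xw - y||^2 + lam * ||w||^2 is the same as ordinary least
    # squares on a system augmented with sqrt(lam) * I rows and zero targets.
    X_aug = np.vstack([X, np.sqrt(lam) * np.eye(degree + 1)])
    y_aug = np.concatenate([y, np.zeros(degree + 1)])
    w, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return w

# Hypothetical small dataset: 10 noisy samples of a sine curve.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

w_plain = fit_polynomial(x, y, degree=9)            # no penalty
w_ridge = fit_polynomial(x, y, degree=9, lam=1e-3)  # penalised fit

print("largest |coefficient| without penalty:", np.abs(w_plain).max())
print("largest |coefficient| with penalty   :", np.abs(w_ridge).max())
```

The penalised coefficients come out smaller in norm than the unpenalised ones, which is the shrinkage effect described above. A suitable value of `lam` would normally be chosen by validation rather than fixed by hand.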
What is the difference between Linear Regression, Curve fitting and Least Squares?
Linear Regression is a type of model used in statistical inference, i.e. predicting y from x with a function that is linear in its coefficients. Curve fitting is the geometric problem of fitting the best curve through the data. Least Squares is a commonly used method of finding that best fit by minimising the sum-of-squares error (i.e. the squared difference between each data point and the value predicted by the fitted curve).
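To make the distinction concrete, here is a small sketch in which the model is a straight line (linear regression), the task is to pass it through the points (curve fitting), and the method used to pick the coefficients is least squares. The data values are made up for illustration.

```python
import numpy as np

# Hypothetical data: y is roughly 2x + 1 plus a little noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Design matrix with an intercept column, so the model is y = b0 + b1 * x.
X = np.column_stack([np.ones_like(x), x])

# Least squares: choose beta to minimise sum((y - X @ beta) ** 2).
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ beta
sse = np.sum((y - fitted) ** 2)  # errors are measured against the fitted values

print("intercept, slope:", beta)
print("sum of squared errors:", sse)
```

Note that the error is computed against the fitted line, not against the mean of y. The spread around the mean is the total sum of squares, which the fitted line should improve upon.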
Why do we take root mean square?
Root mean square (RMS) is a mathematical average that is used when the data set contains both positive and negative values. It is a workaround to stop the values being nullified by their opposite arithmetic signs; squaring them avoids the cancellation and helps in estimating the actual magnitude of the values.
I remember using it extensively during my bachelor's in electrical engineering for alternating current (A.C.). As the electric wave is sinusoidal, a simple average cancels the positive and negative halves out, giving a zero value, when the power is actually positive.
In a nutshell, RMS is the square root of the arithmetic mean of the squared values of the data set. An application in statistics is the standard deviation, which is the square root of the mean of the squared deviations of the values from the mean. As mentioned earlier, squaring prevents deviations above and below the mean from cancelling each other out.
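The sine-wave case and the link to the standard deviation can both be checked numerically. This is a minimal sketch; the helper name `rms` and the sample values in `data` are assumptions for illustration.

```python
import numpy as np

def rms(values):
    """Square root of the arithmetic mean of the squared values."""
    values = np.asarray(values, dtype=float)
    return np.sqrt(np.mean(values ** 2))

# One full cycle of a sine wave with peak amplitude 1. The endpoint is
# excluded so that the samples cover exactly one period.
t = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
wave = np.sin(t)

print("mean:", wave.mean())  # close to 0: positive and negative halves cancel
print("rms :", rms(wave))    # close to 1 / sqrt(2), about 0.707

# The (population) standard deviation is the RMS of the deviations from the mean.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(rms(data - data.mean()), np.std(data))  # both are 2.0
```

`np.std` divides by the number of values by default, which matches the RMS definition; the sample standard deviation (dividing by n - 1) would differ slightly.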
The most common application in machine learning is error minimisation for linear regression, where the root mean square error (RMSE) measures how well the model fits.
When can Linear Regression be used?
1) Causal analysis, i.e. establishing the relationship between independent and dependent variables
2) Forecasting future values with the linear regression model (i.e. the equation)
3) Explaining the trend with regression coefficients
What is the difference between standardisation and normalisation?
Standardisation rescales the values such that the mean = 0 and the variance = 1. When the data is standardised, the covariance matrix has 1 along its diagonal and the correlations between variables as its off-diagonal elements. It also changes the distances between data points.
Normalisation, on the other hand, rescales all values into the range [0,1]. It can also be mapped to any other range, say [-1,1].
Because both techniques typically shrink the spread of the values, they should be avoided when the variance is already so low that the classes are hard to tell apart. For example, a pathologist may classify people into infected and non-infected based on a high WBC count. As there is high variance in the values, standardisation and normalisation may be applied to this dataset.
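A short sketch of both rescalings, assuming column-wise scaling with NumPy. The function names `standardise` and `normalise` and the two-feature dataset are made up for illustration.

```python
import numpy as np

def standardise(X):
    """Rescale each column to mean 0 and variance 1 (z-scores).

    Assumes no column is constant, otherwise the division is by zero.
    """
    return (X - X.mean(axis=0)) / X.std(axis=0)

def normalise(X, low=0.0, high=1.0):
    """Min-max scale each column into the range [low, high]."""
    X_min, X_max = X.min(axis=0), X.max(axis=0)
    return low + (X - X_min) * (high - low) / (X_max - X_min)

# Hypothetical dataset: columns are WBC count and age; the values are invented.
X = np.array([[4500.0, 25.0],
              [7000.0, 40.0],
              [11000.0, 31.0],
              [15000.0, 58.0]])

Z = standardise(X)
N = normalise(X)

# The covariance matrix of the standardised data equals the correlation
# matrix of the raw data: ones on the diagonal, correlations off it.
print(np.cov(Z, rowvar=False, ddof=0))
print(np.corrcoef(X, rowvar=False))

# Each normalised column now spans exactly [0, 1].
print(N.min(axis=0), N.max(axis=0))
```

Calling `normalise(X, -1.0, 1.0)` maps the same columns into [-1, 1] instead.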