Accuracy, F Measure (aka F1 score), precision, recall
If you have gone through machine learning or statistics-related research literature, I am pretty sure you have come across cases where something called the F measure is reported instead of accuracy in some tests. In this short blog, let's clear up what the F measure is and why it is needed rather than simple accuracy.
Let's say a binary classification task predicts some rare incident. (It can be predicting the occurrence of a landslide given the weather conditions; landslides are rare :-P) If that rare incident occurs only 1% of the time, a binary classification model that predicts all negatives (or 0) will get an accuracy of 99%. Does that make the model a good one? No! (Usually, a model with 99% accuracy would be dope!) Now you see there is a problem with accuracy.
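To see this in action, here is a minimal Python sketch. The dataset and the 1% positive rate are made up for illustration:

```python
# Toy imbalanced dataset: 10 positives (landslides) out of 1000 samples.
y_true = [1] * 10 + [0] * 990
# A "model" that always predicts negative.
y_pred = [0] * 1000

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.99 -- looks impressive, but the model never finds a landslide
```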
The solution is to use the F measure, also known as the F1 score or F score. Let's look at the equation for calculating it.
\[F1 = \frac{2}{\frac{1}{Precision} + \frac{1}{Recall}}\]
If you know a bit more about statistics, you'll recognize that the F measure is the harmonic mean (a type of average) of precision and recall. Now let's see what precision and recall are.
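A quick sketch of why the harmonic mean is the right average here: it punishes an imbalance between the two components far more than an arithmetic mean would. (The function name `f1` and the zero-guard are my own conventions, not from any particular library.)

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 if either is 0)."""
    if precision == 0 or recall == 0:
        return 0.0
    return 2 / (1 / precision + 1 / recall)

print(f1(0.9, 0.9))  # 0.9  -- balanced components give a high score
print(f1(1.0, 0.1))  # ~0.18 -- one weak component drags the mean down
```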
\[Precision = \frac{TruePositives}{TruePositives + FalsePositives}\]
So in simple terms, the precision value indicates, out of all the predicted positives, how many are real positives. Hence it measures how much we can trust a positive predicted by our model. Higher precision implies that out of all the positives predicted by our model, most are real positives.
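Here is a minimal sketch of precision computed from confusion-matrix counts (the naming and the guard against zero predicted positives are just for illustration):

```python
def precision(true_positives, false_positives):
    """Out of all predicted positives, the fraction that are real positives."""
    predicted_positives = true_positives + false_positives
    return true_positives / predicted_positives if predicted_positives else 0.0

print(precision(1, 0))  # 1.0 -- one prediction, and it happened to be right
```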
Why does precision alone not make sense? Let's say our model predicts only one positive out of all the real positives, and nothing else. The precision of the model becomes 1 (the highest possible), yet this does not ensure that our model captures all the positives. Then comes recall!
\[Recall = \frac{TruePositives}{TruePositives + FalseNegatives}\]
The recall value tells, out of all the real positives, how many are captured by our model. Higher recall implies our model managed to predict most of the real positive cases. Still, recall alone is not complete either, because recall does not consider false positives. A model that predicts many real negatives as positives can still have a high recall (a model that predicts everything as positive has a recall of 1).
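And the same kind of sketch for recall, again with illustrative naming:

```python
def recall(true_positives, false_negatives):
    """Out of all real positives, the fraction the model captured."""
    real_positives = true_positives + false_negatives
    return true_positives / real_positives if real_positives else 0.0

print(recall(10, 0))  # 1.0 -- an always-positive model misses nothing
print(recall(1, 9))   # 0.1 -- the one-lucky-guess model from above
```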
To capture both properties, the harmonic mean of the two is taken to calculate the F measure.
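Putting it all together on the toy landslide data from before, assuming a hypothetical model that flags just one example, which happens to be one of the ten real positives:

```python
# The one-lucky-guess model: one true positive, nine missed positives.
y_true = [1] * 10 + [0] * 990
y_pred = [1] + [0] * 9 + [0] * 990

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 1
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 0
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 9

p = tp / (tp + fp)       # 1.0  -- perfect precision
r = tp / (tp + fn)       # 0.1  -- terrible recall
f = 2 / (1 / p + 1 / r)  # ~0.18 -- the F measure exposes the weakness
print(p, r, f)
```

Note that this model's accuracy is 0.991, even higher than the all-negative model's, while its F1 is only about 0.18. In real projects you would normally use a library implementation such as sklearn.metrics.f1_score(y_true, y_pred) rather than hand-rolling these.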
So, as a summary, the F measure is better than accuracy in cases where the positive incident is rare. (There is more to this; actually, the F measure is better in cases where the non-positive (negative) class is not clearly definable. I will share the details in a later blog.)
Thanks for reading.