Wednesday, April 14, 2010

AUC: a better measurement for prediction performance

While working on the JBN project with Oliver and his PhD student, I came across a more complex measurement of prediction performance, the Area under the ROC curve (AUC). Normally, people would just use accuracy to evaluate an inference algorithm, i.e. the percentage of predictions that turn out to be correct.
But AUC is far more complex than accuracy and measures prediction performance from another angle. It can be interpreted as the probability that, when we randomly pick one positive and one negative example, the classifier will assign a higher score to the positive example than to the negative one.
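To make that interpretation concrete, here is a tiny Python sketch (the scores below are made up for illustration, not output from our project) that estimates AUC by comparing every positive score against every negative score:

# Probabilistic view of AUC: over all (positive, negative) pairs, how often does
# the positive example get the higher score? Ties count as half a "win".
# The scores are invented illustration data.
pos_scores = [0.9, 0.8, 0.55, 0.4]   # classifier scores for positive examples
neg_scores = [0.7, 0.5, 0.3, 0.2]    # classifier scores for negative examples

wins = 0.0
for p in pos_scores:
    for n in neg_scores:
        if p > n:
            wins += 1.0      # positive ranked above negative
        elif p == n:
            wins += 0.5      # tie counts as half

auc = wins / (len(pos_scores) * len(neg_scores))
print(auc)   # 0.8125: probability a random positive outscores a random negative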

First, I need to explain what ROC is. A receiver operating characteristic (ROC) curve is a graphical plot of the true positive rate vs. the false positive rate for a binary classifier system as its discrimination threshold is varied.
For a binary classification problem, where each instance is labeled as belonging to either the positive (p) or negative (n) class, there are four possible outcomes.
• True positive (TP): the prediction outcome is positive and the actual value is also positive
• False positive (FP): the prediction outcome is positive but the actual value is negative
• True negative (TN): the prediction outcome is negative and the actual value is also negative
• False negative (FN): the prediction outcome is negative but the actual value is positive
The true positive rate (TPR) is the percentage of correctly classified positive instances out of all real positive instances, i.e. TPR = TP / P = TP / (TP + FN). The false positive rate (FPR), on the other hand, defines the percentage of incorrect positive results among all negative instances, i.e. FPR = FP / N = FP / (FP + TN).
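To make the four counts and the two rates concrete, here is a small Python sketch (the labels, scores, and the 0.5 threshold are made up for illustration):

# Count the four outcomes at one threshold and derive TPR and FPR from them.
labels = [1, 1, 1, 0, 0, 0, 0]                 # 1 = actual positive, 0 = actual negative
scores = [0.9, 0.6, 0.3, 0.8, 0.4, 0.2, 0.1]   # predicted probability of being positive
threshold = 0.5                                # scores >= threshold are predicted positive

tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)

tpr = tp / (tp + fn)   # TP / P
fpr = fp / (fp + tn)   # FP / N
print(tpr, fpr)        # about 0.667 and 0.25 for this toy data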
In ROC space, with FPR and TPR as the x and y axes respectively, each point corresponds to a threshold value. For a given threshold, if probability values at or above it are assigned to the positive class and values below it to the negative class, a pair of TPR and FPR values can be calculated. The lower leftmost point, (0%, 0%), corresponds to a threshold at which nothing is classified as positive, and the upper rightmost point, (100%, 100%), to one at which everything is. The closer a point lies to the upper-left corner, the better; the closer it lies to the lower-right corner, the worse. Plotting the point for every possible threshold value and connecting them traces out the ROC curve.
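Here is a sketch of how sweeping the threshold traces out the curve, reusing the same toy labels and scores as above (again, purely illustrative data):

# Each threshold yields one (FPR, TPR) point; lowering the threshold moves the
# point from (0, 0) toward (1, 1).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.8, 0.4, 0.2, 0.1]

def roc_point(labels, scores, threshold):
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    pos = labels.count(1)
    neg = labels.count(0)
    return fp / neg, tp / pos            # (FPR, TPR)

# thresholds 100%, 90%, ..., 0% -- the same 11 cut-offs mentioned below
for t in range(10, -1, -1):
    print(t / 10, roc_point(labels, scores, t / 10))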
In our project, we use the area under the ROC curve (AUC) to summarize the ROC curve into a single number as one metric of classifier performance. We used 11 thresholds spaced at 10% intervals to form an ROC curve with 11 points, and the AUC is calculated using a form of the trapezoidal rule, i.e. by summing the areas of the trapezoids under the curve.
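And a minimal sketch of that trapezoidal-rule step itself; the 11 (FPR, TPR) points below are hypothetical stand-ins for what a real classifier would give at the 11 thresholds:

# Summing the trapezoids under an 11-point ROC curve (points sorted by FPR).
roc_points = [(0.00, 0.00), (0.05, 0.30), (0.10, 0.50), (0.20, 0.65),
              (0.30, 0.75), (0.40, 0.82), (0.50, 0.88), (0.60, 0.92),
              (0.70, 0.95), (0.85, 0.98), (1.00, 1.00)]

auc = 0.0
for (x0, y0), (x1, y1) in zip(roc_points, roc_points[1:]):
    auc += (x1 - x0) * (y0 + y1) / 2   # area of one trapezoid
print(auc)   # a single number summarizing the whole curve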


I found it a very interesting and sophisticated measurement, and it was the most interesting thing I learned from this project. I know it is really hard to explain in plain text here and I probably did a really bad job, but I was happy to give it a try. :p
