TechTalks from event: ICML 2011

Evaluation Metrics

  • Brier Curves: a New Cost-Based Visualisation of Classifier Performance. Authors: Jose Hernandez-Orallo; Peter Flach; Cèsar Ferri
    It is often necessary to evaluate classifier performance over a range of operating conditions, rather than as a point estimate. This is typically assessed through the construction of ‘curves’ over a ‘space’, visualising how one or two performance metrics vary with the operating condition. For binary classifiers in particular, cost space is a natural way of showing this range of performance, visualising loss against operating condition. However, the curves which have traditionally been drawn in cost space, known as cost curves, show the optimal loss, and hence assume knowledge of the optimal decision threshold for a given operating condition. Clearly, this leads to an optimistic assessment of classifier performance. In this paper we propose a more natural way of visualising classifier performance in cost space, which is to plot probabilistic loss on the y-axis, i.e., the loss arising from the probability estimates. This new curve provides new ways of understanding classifier performance and new tools to compare classifiers. In addition, we show that the area under this curve is exactly the Brier score, one of the most popular performance metrics for probabilistic classifiers. (See the first code sketch after this list.)
  • A Coherent Interpretation of AUC as a Measure of Aggregated Classification Performance. Authors: Peter Flach; Jose Hernandez-Orallo; Cèsar Ferri
    The area under the ROC curve (AUC), a well-known measure of ranking performance, is also often used as a measure of classification performance, aggregating over decision thresholds as well as class and cost skews. However, David Hand has recently argued that AUC is fundamentally incoherent as a measure of aggregated classifier performance and proposed an alternative measure. Specifically, Hand derives a linear relationship between AUC and expected minimum loss, where the expectation is taken over a distribution of the misclassification cost parameter that depends on the model under consideration. Replacing this distribution with a Beta(2,2) distribution, Hand derives his alternative measure H. In this paper we offer an alternative, coherent interpretation of AUC as linearly related to expected loss. We use a distribution over the cost parameter and a distribution over data points, both uniform and hence model-independent. Should one wish to consider only optimal thresholds, we demonstrate that a simpler and more intuitive alternative to Hand’s H measure is already available in the form of the area under the cost curve. (See the second code sketch after this list.)
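
To make the Brier-curve construction in the first abstract concrete, here is a minimal Python sketch. It is not the authors' code: the function name brier_curve and the cost convention (at cost proportion c, a false positive costs 2c and a false negative costs 2(1 - c)) are assumptions for illustration. The sketch thresholds the probability estimates at t = c for each operating condition c and numerically checks the abstract's claim that the area under the resulting curve equals the Brier score.

    import numpy as np

    def brier_curve(y_true, p_hat, grid=1001):
        """Probabilistic loss at each cost proportion c, using threshold t = c.

        Assumed convention: at cost proportion c a false positive costs 2c
        and a false negative costs 2(1 - c); under this scaling the area
        under the curve integrates to the Brier score.
        """
        y = np.asarray(y_true, dtype=float)
        p = np.asarray(p_hat, dtype=float)
        n = len(y)
        cs = np.linspace(0.0, 1.0, grid)
        q = np.empty_like(cs)
        for i, c in enumerate(cs):
            fp = np.sum((y == 0) & (p >= c))  # predicted positive, truly negative
            fn = np.sum((y == 1) & (p < c))   # predicted negative, truly positive
            q[i] = 2.0 * (c * fp + (1.0 - c) * fn) / n
        return cs, q

    # Numerical check of the abstract's claim: the area under the Brier
    # curve matches the Brier score (up to grid discretisation error).
    rng = np.random.default_rng(0)
    p = rng.uniform(size=5000)                    # probability estimates
    y = (rng.uniform(size=5000) < p).astype(int)  # labels drawn to match p
    cs, q = brier_curve(y, p)
    area = np.sum(0.5 * (q[1:] + q[:-1]) * np.diff(cs))  # trapezoid rule
    print(f"area under Brier curve: {area:.4f}")
    print(f"Brier score:            {np.mean((p - y) ** 2):.4f}")

The per-example view explains why the areas match: for a positive with estimate p, the loss 2(1 - c) is incurred exactly when c > p, and integrating over c in [0, 1] gives (1 - p)^2; for a negative, the integral gives p^2.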
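A companion sketch for the second abstract, again not the authors' code: auc and area_under_cost_curve are hypothetical names, and the same cost convention is assumed. It computes AUC as the Mann-Whitney statistic and the area under the optimal cost curve (the minimum expected loss over all thresholds at each cost proportion), the simpler alternative to Hand's H measure that the abstract points to.

    import numpy as np

    def auc(y_true, scores):
        """AUC as the Mann-Whitney statistic P(s_pos > s_neg); ties count 1/2."""
        y = np.asarray(y_true)
        s = np.asarray(scores, dtype=float)
        pos, neg = s[y == 1], s[y == 0]
        greater = np.sum(pos[:, None] > neg[None, :])
        ties = np.sum(pos[:, None] == neg[None, :])
        return (greater + 0.5 * ties) / (pos.size * neg.size)

    def area_under_cost_curve(y_true, scores, grid=1001):
        """Area under the optimal cost curve: at each cost proportion c,
        the minimum expected loss over all decision thresholds (same cost
        convention as in the Brier-curve sketch above)."""
        y = np.asarray(y_true)
        s = np.asarray(scores, dtype=float)
        n = len(y)
        ts = np.concatenate(([-np.inf], np.unique(s), [np.inf]))
        # error counts at every candidate threshold (predict positive iff score >= t)
        fp = np.array([np.sum((y == 0) & (s >= t)) for t in ts])
        fn = np.array([np.sum((y == 1) & (s < t)) for t in ts])
        cs = np.linspace(0.0, 1.0, grid)
        q = np.array([np.min(2.0 * (c * fp + (1.0 - c) * fn)) / n for c in cs])
        return np.sum(0.5 * (q[1:] + q[:-1]) * np.diff(cs))

    rng = np.random.default_rng(1)
    scores = rng.uniform(size=2000)
    labels = (rng.uniform(size=2000) < scores).astype(int)
    print(f"AUC:                   {auc(labels, scores):.4f}")
    print(f"area under cost curve: {area_under_cost_curve(labels, scores):.4f}")

Unlike the Brier curve, which evaluates the loss at the fixed threshold t = c, the cost curve takes the best threshold at each c, which is exactly the optimistic assumption the first abstract criticises; the second abstract argues that the area under this optimal-loss curve is nonetheless a natural aggregate when only optimal thresholds are of interest.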