Interpreting the ROC Curve

2021-05-02

Key points

ROC stands for Receiver Operating Characteristic.

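In the 1940s, after the attack on Pearl Harbor, ROC curves were first used in the study of detector systems whose job was to pick out radio signals in the presence of noise.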

Today ROC curves are frequently used in clinical work to show the relationship between sensitivity and specificity: for every possible cut-off of a test or a combination of tests, the curve shows the resulting true positive and false positive rates. In addition, the area under the ROC curve gives an idea of the benefit of using the test(s) in question.

The same idea appears in forecast verification. A description of a function that creates Receiver Operating Characteristic (ROC) plots for one or more models puts it this way: a ROC curve plots the hit rate against the false alarm rate for a probabilistic forecast over a range of thresholds. The area under the curve is viewed as a measure of the forecast's accuracy. A value of 1 indicates a perfect model; a value of 0.5 indicates a random forecast.
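To make the "range of thresholds" concrete, here is a minimal base R sketch that sweeps every threshold over a small made-up forecast, collects the (false alarm rate, hit rate) points, and integrates them with the trapezoidal rule. The data and variable names are hypothetical and only meant as an illustration.

```r
# Hypothetical toy data: 1 = event occurred, 0 = no event
labels <- c(0, 0, 1, 0, 1, 1)
scores <- c(0.1, 0.3, 0.35, 0.6, 0.7, 0.9)

# Sweep thresholds from high to low; forecast "event" when score >= threshold
thresholds <- c(Inf, sort(unique(scores), decreasing = TRUE))
hit_rate <- sapply(thresholds, function(t) {
  sum(scores >= t & labels == 1) / sum(labels == 1)
})
false_alarm_rate <- sapply(thresholds, function(t) {
  sum(scores >= t & labels == 0) / sum(labels == 0)
})

# Trapezoidal rule over the (false alarm rate, hit rate) points
widths  <- diff(false_alarm_rate)
heights <- (head(hit_rate, -1) + tail(hit_rate, -1)) / 2
auc_trapezoid <- sum(widths * heights)
print(auc_trapezoid)
```

For this toy forecast the area comes out at 8/9, between the random value of 0.5 and the perfect value of 1.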

Meaning of the ROC curve axes:

The x-axis shows 1 – specificity (= false positive fraction = FP/(FP+TN)). This is the false positive rate, which equals 1 minus the true negative rate.
The y-axis shows sensitivity (= true positive fraction = TP/(TP+FN)). This is the true positive rate.
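As a small illustration of the two formulas, the sketch below computes a single point of the curve for one cutoff. The helper `roc_point()`, the test values, and the convention that a result at or above the cutoff counts as positive are all assumptions made for this example.

```r
# Hypothetical helper: one point of the ROC curve at a given cutoff.
# `truth` is TRUE for sick subjects; a result >= cutoff counts as a positive test.
roc_point <- function(values, truth, cutoff) {
  positive <- values >= cutoff
  TP <- sum(positive & truth)
  FN <- sum(!positive & truth)
  FP <- sum(positive & !truth)
  TN <- sum(!positive & !truth)
  c(x = FP / (FP + TN),  # 1 - specificity
    y = TP / (TP + FN))  # sensitivity
}

# Made-up test results for 4 healthy and 4 sick subjects
values <- c(2.1, 3.4, 3.9, 4.2, 5.0, 5.8, 6.3, 7.1)
truth  <- c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)

pt <- roc_point(values, truth, cutoff = 4.5)
print(pt)
```

With a cutoff of 4.5, one of the four healthy subjects tests positive (x = 0.25) and three of the four sick subjects test positive (y = 0.75). Moving the cutoff moves the point along the curve.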

Special ROC curves

The perfect test: a perfect test is able to discriminate between the healthy and the sick with 100 % sensitivity and 100 % specificity.
The worthless test: when the results from the healthy population and the results from the sick population overlap completely, we have a worthless test. A worthless test has a discriminating ability equal to flipping a coin.
Comparing ROC curves: the closer an ROC curve is to the upper left corner, the more efficient the test.
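The two extremes can be checked numerically. The sketch below relies on the well-known fact that the area under the ROC curve equals the fraction of (sick, healthy) pairs in which the sick subject has the higher result, with ties counted as one half. The function name `pairwise_auc()` and the data are hypothetical.

```r
# Rank-based AUC: the fraction of (sick, healthy) pairs in which the sick
# subject has the higher result; ties count as half.
pairwise_auc <- function(sick, healthy) {
  cmp <- outer(sick, healthy, function(s, h) (s > h) + 0.5 * (s == h))
  mean(cmp)
}

healthy <- c(1, 2, 3, 4)

# No overlap between the groups: every sick result exceeds every healthy one
auc_perfect <- pairwise_auc(sick = c(5, 6, 7, 8), healthy = healthy)

# Complete overlap: both groups have exactly the same results
auc_worthless <- pairwise_auc(sick = c(1, 2, 3, 4), healthy = healthy)

print(c(perfect = auc_perfect, worthless = auc_worthless))
```

The non-overlapping groups give an area of 1, and the fully overlapping groups give 0.5, the coin-flip value.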

How the metrics are computed

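#error metrics -- Confusion Matrix
#CM is assumed to have actual classes in rows and predicted classes in columns
#(negative class first), e.g. CM <- table(actual, predicted)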
err_metric=function(CM)
{
  TN =CM[1,1]
  TP =CM[2,2]
  FP =CM[1,2]
  FN =CM[2,1]
  precision =(TP)/(TP+FP)
  #recall = sensitivity = true positive rate
  recall_score =(TP)/(TP+FN)
  f1_score=2*((precision*recall_score)/(precision+recall_score))
  accuracy_model  =(TP+TN)/(TP+TN+FP+FN)
  False_positive_rate =(FP)/(FP+TN)
  False_negative_rate =(FN)/(FN+TP)
  print(paste("Precision value of the model: ",round(precision,2)))
  print(paste("Accuracy of the model: ",round(accuracy_model,2)))
  print(paste("Recall value of the model: ",round(recall_score,2)))
  print(paste("False Positive rate of the model: ",round(False_positive_rate,2)))
  print(paste("False Negative rate of the model: ",round(False_negative_rate,2)))
  print(paste("f1 score of the model: ",round(f1_score,2)))
}

The code above shows how each of the following model evaluation metrics is computed:

precision
recall_score
f1_score
accuracy_model
False_positive_rate
False_negative_rate
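A worked example may help here. The sketch below builds a confusion matrix from made-up labels with `table()` and computes the same six metrics by hand, using the standard definition of recall, TP/(TP+FN). The data and variable names are hypothetical, and the matrix is assumed to have actual classes in rows and predicted classes in columns.

```r
# Hypothetical labels: 1 = positive, 0 = negative
actual    <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1)
predicted <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 0)

# Rows = actual, columns = predicted, so CM[1,2] is FP and CM[2,1] is FN
CM <- table(actual, predicted)
TN <- CM[1, 1]; FP <- CM[1, 2]
FN <- CM[2, 1]; TP <- CM[2, 2]

precision <- TP / (TP + FP)
recall    <- TP / (TP + FN)
f1        <- 2 * precision * recall / (precision + recall)
accuracy  <- (TP + TN) / sum(CM)
fpr       <- FP / (FP + TN)
fnr       <- FN / (FN + TP)

print(round(c(precision = precision, recall = recall, f1 = f1,
              accuracy = accuracy, fpr = fpr, fnr = fnr), 2))
```

Here TP = 3, FP = 2, FN = 1 and TN = 4, so precision is 0.6, recall is 0.75 and accuracy is 0.7. Note that recall and the false negative rate always sum to 1.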

Plotting the ROC curve with ggplot2

library(ISLR)  #provides the Default dataset
data <- ISLR::Default
#divide dataset into training and test set
set.seed(1)
sample <-
  sample(c(TRUE, FALSE),
         nrow(data),
         replace = TRUE,
         prob = c(0.7, 0.3))
train <- data[sample,]
test <- data[!sample,]

#fit logistic regression model to training set
model <-
  glm(default ~ student + balance + income,
      family = "binomial",
      data = train)

#use model to make predictions on test set
predicted <- predict(model, test, type = "response")
#load necessary packages
library(ggplot2)
library(pROC)

#define object to plot and calculate AUC
rocobj <- roc(test$default, predicted)
auc_value <- round(auc(rocobj), 4)

#create ROC plot with minimal theme
#ggroc() draws specificity on a reversed x-axis by default;
#pass legacy.axes = TRUE to show 1 - specificity instead
ggroc(rocobj, colour = 'steelblue', size = 2) +
  ggtitle(paste0('ROC Curve ', '(AUC = ', auc_value, ')')) +
  theme_minimal()

Acknowledgements

ROC curves – what are they and how are they used

Plotting ROC curve in R Programming

How to Plot a ROC Curve Using ggplot2 (With Examples)