要点总结
ROC这个术语代表Receiver Operating Characteristic,接受器的工作性能。
20世纪40年代,珍珠港袭击之后,ROC曲线首次被用于鉴别器系统的研究,这些鉴别器就是要在存在噪音的情况下检测出无线电信号。
现在,ROC曲线经常在临床上用来显示敏感性和特异性之间的联系。对于一个测试或一个组合的测试,对于每一个可能的切点cutoff,测试结果所呈现出来的真阳性和假阳性率。此外,ROC曲线下的面积给出了使用相关测试的好处。(Now ROC curves are frequently used to show the connection between clinical sensitivity and specificity for every possible cut-off for a test or a combination of tests. In addition, the area under the ROC curve gives an idea about the benefit of using the test(s) in question.)
该函数为一个或多个模型创建接收者工作特征(ROC)图。ROC曲线绘制虚警率与命中率之间的关系,用于对阈值范围进行概率预测。曲线下的面积被看作是预测准确性的衡量标准。1的测量值表示一个完美的模型。0.5表示的是随机预测。(This function creates Receiver Operating Characteristic (ROC) plots for one or more models. A ROC curve plots the false alarm rate against the hit rate for a probablistic forecast for a range of thresholds. The area under the curve is viewed as a measure of a forecast’s accuracy. A measure of 1 would indicate a perfect model. A measure of 0.5 would indicate a random forecast. )
ROC 曲线的坐标轴含义:
- The x-axis showing 1 – specificity (= false positive fraction = FP/(FP+TN)) 。x轴是假阳性率,也就是真阴性率。
- The y-axis showing sensitivity (= true positive fraction = TP/(TP+FN))。y轴是真阳性率。
特殊的ROC
- THE PERFECT TEST A perfect test is able to discriminate between the healthy and sick with 100 % sensitivity and 100 % specificity.
- THE WORTHLESS TEST When we have a complete overlap between the results from the healthy and the results from the sick population, we have a worthless test. A worthless test has a discriminating ability equal to flipping a coin.
- COMPARING ROC CURVES The closer an ROC curve is to the upper left corner, the more efficient is the test.
参数计算方法
#error metrics -- Confusion Matrix
err_metric=function(CM)
{
TN =CM[1,1]
TP =CM[2,2]
FP =CM[1,2]
FN =CM[2,1]
precision =(TP)/(TP+FP)
recall_score =(FP)/(FP+TN)
f1_score=2*((precision*recall_score)/(precision+recall_score))
accuracy_model =(TP+TN)/(TP+TN+FP+FN)
False_positive_rate =(FP)/(FP+TN)
False_negative_rate =(FN)/(FN+TP)
print(paste("Precision value of the model: ",round(precision,2)))
print(paste("Accuracy of the model: ",round(accuracy_model,2)))
print(paste("Recall value of the model: ",round(recall_score,2)))
print(paste("False Positive rate of the model: ",round(False_positive_rate,2)))
print(paste("False Negative rate of the model: ",round(False_negative_rate,2)))
print(paste("f1 score of the model: ",round(f1_score,2)))
}
从上述代码可以看出,如下这些评估模型的参数所对应的计算方法:
- precision
- recall_score
- f1_score
- accuracy_model
- False_positive_rate
- False_negative_rate
plot ROC with ggplot2
check_pkg("ISLR")
data <- ISLR::Default
#divide dataset into training and test set
set.seed(1)
sample <-
sample(c(TRUE, FALSE),
nrow(data),
replace = TRUE,
prob = c(0.7, 0.3))
train <- data[sample,]
test <- data[!sample,]
#fit logistic regression model to training set
model <-
glm(default ~ student + balance + income,
family = "binomial",
data = train)
#use model to make predictions on test set
predicted <- predict(model, test, type = "response")
#load necessary packages
library(ggplot2)
check_pkg("pROC")
#define object to plot and calculate AUC
rocobj <- roc(test$default, predicted)
auc <- round(auc(test$default, predicted),4)
#create ROC plot with minimal theme
ggroc(rocobj, colour = 'steelblue', size = 2) +
ggtitle(paste0('ROC Curve ', '(AUC = ', auc, ')')) +
theme_minimal()
致谢
ROC curves – what are they and how are they used