The difference among TrainSet, ValidationSet and TestSet
之前一直没搞明白这三者的区别,尤其是验证集和测试集。后来,google了一下,看了一些资料,下面这个最靠谱吧。
- Training set: A set of examples used for learning, which is to fit the parameters [i.e., weights] of the classifier.
- Validation set: A set of examples used to tune the parameters [i.e., architecture, not weights] of a classifier, for example to choose the number of hidden units in a neural network.
- Test set: A set of examples used only to assess the performance [generalization] of a fully specified classifier.
这是Ripley, B.D(1996)在他的经典专著Pattern Recognition and Neural Networks中给出了这三个词的定义。
简单来说就是:
- 训练集:用于决定模型参数(如神经网络中,各层之间的权重系数)
- 验证集:用于选择模型(如神经网络的结构,隐藏节点的个数)
- 测试集:用于测试模型的泛化性能
References:
- http://blog.sina.com.cn/s/blog_4d2f6cf201000cjx.html
- http://www.cppblog.com/guijie/archive/2008/07/29/57407.html
- http://stats.stackexchange.com/questions/19048/what-is-the-difference-between-test-set-and-validation-set
- http://blog.sciencenet.cn/blog-397960-666113.html
- http://stackoverflow.com/questions/2976452/whats-is-the-difference-between-train-validation-and-test-set-in-neural-networ