Exploring scikit-learn
1. The difference between KFold and StratifiedKFold
class sklearn.model_selection.StratifiedKFold(n_splits=3, shuffle=False, random_state=None)
(This is the signature from older releases. Since scikit-learn 0.22 the default is n_splits=5.)
Stratified K-Folds cross-validator. Provides train/test indices to split data in train/test sets. This cross-validation object is a variation of KFold that returns stratified folds. The folds are made by preserving the percentage of samples for each class.
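In other words, StratifiedKFold builds its folds so that each fold keeps the relative proportion of every class label.
Let's look at some code: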
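To make the idea concrete, here is a minimal from-scratch sketch of stratified fold assignment. It assumes a simple round-robin rule: the indices of each class are dealt out to the folds one at a time, so every fold receives about count / n_splits samples of every class. The helper `stratified_fold_ids` is hypothetical and for illustration only. It is not scikit-learn's actual implementation.

```python
from collections import Counter, defaultdict


def stratified_fold_ids(labels, n_splits):
    """Hypothetical helper: return the fold id assigned to each sample.

    The indices of each class are dealt round-robin across the folds,
    so every fold gets (almost) the same share of every class.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    fold_of = [0] * len(labels)
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            fold_of[idx] = position % n_splits
    return fold_of


if __name__ == "__main__":
    # Same labels as in this post's experiment: 30 x 0, 30 x 1, 30 x 2, 10 x 3
    b = [0] * 30 + [1] * 30 + [2] * 30 + [3] * 10
    folds = stratified_fold_ids(b, n_splits=10)
    for k in range(10):
        # Label counts of the samples assigned to fold k (the test fold)
        test_counts = Counter(b[i] for i in range(len(b)) if folds[i] == k)
        print(k, dict(sorted(test_counts.items())))
```

Because every class count here is divisible by 10, each fold receives exactly 3, 3, 3 and 1 samples. With uneven counts this naive rule would favour the first folds. A real splitter also balances the overall fold sizes.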
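import sklearn.model_selection as skmodsel

# 100 samples; the single feature is just the index, so fold membership is easy to read
a = [[i] for i in range(100)]
# Labels in sorted order: 30 x 0, 30 x 1, 30 x 2, 10 x 3 (ratio 3:3:3:1)
b = [0] * 30
b.extend([1] * 30)
b.extend([2] * 30)
b.extend([3] * 10)

K_strafold = skmodsel.StratifiedKFold(n_splits=10)
K_folds = skmodsel.KFold(n_splits=10)

# KFold ignores the labels: count the labels in each training set
for train_indices, test_indices in K_folds.split(a):
    d = {}
    for i in train_indices:
        d[b[i]] = d.setdefault(b[i], 0) + 1
    print(d)
print('------------------')

# StratifiedKFold needs the labels b in split() so it can stratify on them
for train_indices, test_indices in K_strafold.split(a, b):
    d = {}
    for i in train_indices:
        d[b[i]] = d.setdefault(b[i], 0) + 1
    print(d)
print('------------------')

# Label counts of the whole data set, for comparison
d = {}
for i in range(100):
    d[b[i]] = d.setdefault(b[i], 0) + 1
print(d)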
The output is as follows:
{0: 20, 1: 30, 2: 30, 3: 10}
{0: 20, 1: 30, 2: 30, 3: 10}
{0: 20, 1: 30, 2: 30, 3: 10}
{0: 30, 1: 20, 2: 30, 3: 10}
{0: 30, 1: 20, 2: 30, 3: 10}
{0: 30, 1: 20, 2: 30, 3: 10}
{0: 30, 1: 30, 2: 20, 3: 10}
{0: 30, 1: 30, 2: 20, 3: 10}
{0: 30, 1: 30, 2: 20, 3: 10}
{0: 30, 1: 30, 2: 30}
------------------
{0: 27, 1: 27, 2: 27, 3: 9}
{0: 27, 1: 27, 2: 27, 3: 9}
{0: 27, 1: 27, 2: 27, 3: 9}
{0: 27, 1: 27, 2: 27, 3: 9}
{0: 27, 1: 27, 2: 27, 3: 9}
{0: 27, 1: 27, 2: 27, 3: 9}
{0: 27, 1: 27, 2: 27, 3: 9}
{0: 27, 1: 27, 2: 27, 3: 9}
{0: 27, 1: 27, 2: 27, 3: 9}
{0: 27, 1: 27, 2: 27, 3: 9}
------------------
{0: 30, 1: 30, 2: 30, 3: 10}
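We asked for ten folds. That means a is split into ten parts, and each time one part is used as the test set while the other nine form the training set. Each dictionary above counts the labels in one training set.
The result is clear. The labels contain 30 samples each of classes 0, 1 and 2, plus 10 samples of class 3, so their proportions are 3:3:3:1.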
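With k = 10 folds, we can predict what a perfectly stratified split should look like. Here is a short worked check, writing n_c for the number of samples of class c. Each class contributes n_c / k samples to every test fold, and the rest to the matching training set:

```latex
\[
n_c^{\text{test}} = \frac{n_c}{k}, \qquad
n_c^{\text{train}} = n_c - \frac{n_c}{k} = n_c \, \frac{k-1}{k}
\]
\[
n_0^{\text{train}} = n_1^{\text{train}} = n_2^{\text{train}} = 30 \cdot \tfrac{9}{10} = 27,
\qquad
n_3^{\text{train}} = 10 \cdot \tfrac{9}{10} = 9
\]
```

These are exactly the {0: 27, 1: 27, 2: 27, 3: 9} rows in the StratifiedKFold output. The numbers come out whole here only because every class count is divisible by 10.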
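With plain KFold, each test fold is simply the next ten consecutive samples, which here all carry the same label. KFold just follows the order of b: indices 0-9 form the first fold, 10-19 the second, and so on. As a result, the training sets are unbalanced, with one class dropping to 20 samples. In the last fold, whose test set is indices 90-99, the training set contains no class-3 samples at all.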
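The ordered behaviour comes from shuffle=False, the default. KFold just cuts the index range into consecutive blocks. The short check below reuses the same a and b data as the experiment. It prints each test block and confirms what the last KFold output line shows: the final training set lacks class 3 entirely.

```python
from sklearn.model_selection import KFold

a = [[i] for i in range(100)]
b = [0] * 30 + [1] * 30 + [2] * 30 + [3] * 10

kf = KFold(n_splits=10)  # shuffle=False: folds follow the original order
splits = list(kf.split(a))
for fold, (train_idx, test_idx) in enumerate(splits):
    # Each test set is one contiguous block of 10 indices
    print(fold, test_idx.min(), "-", test_idx.max())

# In the last fold the test block is 90-99, i.e. every class-3 sample,
# so the training set never sees label 3
last_train, last_test = splits[-1]
print(sorted({b[i] for i in last_train}))  # [0, 1, 2]
```

Passing shuffle=True (together with a random_state for reproducibility) breaks up these blocks. Plain KFold still does not look at the labels, though, so the class ratio in each fold is left to chance.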
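With StratifiedKFold, by contrast, every split respects these proportions. Each training set holds 27:27:27:9, i.e. 3:3:3:1, and so does each test set, with 3, 3, 3 and 1 samples.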
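The claim about the test sets can be checked directly. The sketch below reuses the post's a and b and counts the labels in each *test* fold with collections.Counter. As a variation beyond the original experiment, it repeats the check with shuffle=True, random_state=0. For this data, shuffling changes which samples of each class land in a fold, but the per-class counts stay at 3:3:3:1.

```python
from collections import Counter

from sklearn.model_selection import StratifiedKFold

a = [[i] for i in range(100)]
b = [0] * 30 + [1] * 30 + [2] * 30 + [3] * 10


def fold_label_counts(splitter):
    """Label counts (sorted by label) of every test fold produced by splitter."""
    return [dict(sorted(Counter(b[i] for i in test_idx).items()))
            for _, test_idx in splitter.split(a, b)]


plain = fold_label_counts(StratifiedKFold(n_splits=10))
shuffled = fold_label_counts(StratifiedKFold(n_splits=10, shuffle=True, random_state=0))
print(plain[0])     # {0: 3, 1: 3, 2: 3, 3: 1}
print(shuffled[0])  # {0: 3, 1: 3, 2: 3, 3: 1}
```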