Exploring scikit-learn

2019-06-25

1. The difference between KFold and StratifiedKFold

class sklearn.model_selection.StratifiedKFold(n_splits=3, shuffle=False, random_state=None)
Stratified K-Folds cross-validator. Provides train/test indices to split data in train/test sets. This cross-validation object is a variation of KFold that returns stratified folds. The folds are made by preserving the percentage of samples for each class.

In other words, StratifiedKFold builds its folds according to the relative proportion of each class label.

Let's look at the code:

import sklearn.model_selection as skmodsel

# 100 samples; the labels are sorted: 30 each of 0, 1, 2 and 10 of 3
a = [[i] for i in range(100)]
b = [0] * 30 + [1] * 30 + [2] * 30 + [3] * 10

K_strafold = skmodsel.StratifiedKFold(n_splits=10)
K_folds = skmodsel.KFold(n_splits=10)

# KFold: count the labels in each training set
for train_indices, test_indices in K_folds.split(a):
    d = {}
    for i in train_indices:
        d[b[i]] = d.get(b[i], 0) + 1
    print(d)

print('------------------')
# StratifiedKFold: split() also needs the labels b to stratify on
for train_indices, test_indices in K_strafold.split(a, b):
    d = {}
    for i in train_indices:
        d[b[i]] = d.get(b[i], 0) + 1
    print(d)

print('------------------')
# Label counts over the whole dataset, for reference
d = {}
for i in range(100):
    d[b[i]] = d.get(b[i], 0) + 1
print(d)

The output is as follows:

{0: 20, 1: 30, 2: 30, 3: 10}
{0: 20, 1: 30, 2: 30, 3: 10}
{0: 20, 1: 30, 2: 30, 3: 10}
{0: 30, 1: 20, 2: 30, 3: 10}
{0: 30, 1: 20, 2: 30, 3: 10}
{0: 30, 1: 20, 2: 30, 3: 10}
{0: 30, 1: 30, 2: 20, 3: 10}
{0: 30, 1: 30, 2: 20, 3: 10}
{0: 30, 1: 30, 2: 20, 3: 10}
{0: 30, 1: 30, 2: 30}
------------------
{0: 27, 1: 27, 2: 27, 3: 9}
{0: 27, 1: 27, 2: 27, 3: 9}
{0: 27, 1: 27, 2: 27, 3: 9}
{0: 27, 1: 27, 2: 27, 3: 9}
{0: 27, 1: 27, 2: 27, 3: 9}
{0: 27, 1: 27, 2: 27, 3: 9}
{0: 27, 1: 27, 2: 27, 3: 9}
{0: 27, 1: 27, 2: 27, 3: 9}
{0: 27, 1: 27, 2: 27, 3: 9}
{0: 27, 1: 27, 2: 27, 3: 9}
------------------
{0: 30, 1: 30, 2: 30, 3: 10}

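We asked for ten folds, so a is divided into ten parts and each part serves as the test set once. The counts printed above are for the training set of each split.
The result is clear. The labels contain 30 samples each of classes 0, 1 and 2, plus 10 samples of class 3, so the class ratio is 3:3:3:1.
With plain KFold, each fold takes ten consecutive samples, which here all share the same label. Because shuffle=False, it simply follows the order of b: indices 0-9 form the first fold, 10-19 the second, and so on. That is why one class drops by 10 in every training set, and class 3 is missing entirely from the last one.
With StratifiedKFold, every fold is drawn according to the class proportions: each training set keeps the 3:3:3:1 ratio (27:27:27:9), and so does each test set.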
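
The output above only counts labels in the training sets. As a complementary check, here is a minimal sketch that counts the labels in each test fold instead. It assumes the same toy data as above; the names `X`, `y` and `label_counts_per_test_fold` are illustrative and not part of the original code.

```python
from collections import Counter

from sklearn.model_selection import KFold, StratifiedKFold

# Same toy data as above: labels sorted as 30 x 0, 30 x 1, 30 x 2, 10 x 3.
X = [[i] for i in range(100)]
y = [0] * 30 + [1] * 30 + [2] * 30 + [3] * 10


def label_counts_per_test_fold(splitter):
    """Return one {label: count} dict per test fold (illustrative helper)."""
    return [
        dict(sorted(Counter(y[i] for i in test_idx).items()))
        for _, test_idx in splitter.split(X, y)  # KFold ignores y
    ]


kfold_counts = label_counts_per_test_fold(KFold(n_splits=10))
strat_counts = label_counts_per_test_fold(StratifiedKFold(n_splits=10))

print(kfold_counts[0])   # {0: 10}: ten consecutive samples, all class 0
print(strat_counts[0])   # {0: 3, 1: 3, 2: 3, 3: 1}: the 3:3:3:1 ratio
```

With the sorted labels used here, every KFold test fold contains a single class, while every StratifiedKFold test fold reproduces the overall class ratio.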
