A Brief Look at the Relationship Between shift and diff in pandas
Use ?pandas.DataFrame.shift to view the help documentation:
Signature: pandas.DataFrame.shift(self, periods=1, freq=None, axis=0)
Docstring: Shift index by desired number of periods with an optional time freq
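The main job of this function is to move the data in a DataFrame. When freq=None, the row index labels stay where they are and, depending on axis, the data moves up or down along the rows (axis=0) or left or right along the columns (axis=1). When the row index is a time series, you can also set freq: the row index is then shifted by an offset of periods*freq, and the data itself does not move.
① When the DataFrame's row index is a date index, the index moves and the data stays unchanged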
In [2]: import pandas as pd
   ...: import numpy as np
   ...: df = pd.DataFrame(np.arange(24).reshape(6,4),index=pd.date_range(start=
   ...: '20170101',periods=6),columns=['A','B','C','D'])
   ...: df
   ...:
Out[2]:
             A   B   C   D
2017-01-01   0   1   2   3
2017-01-02   4   5   6   7
2017-01-03   8   9  10  11
2017-01-04  12  13  14  15
2017-01-05  16  17  18  19
2017-01-06  20  21  22  23

In [3]: df.shift(2,axis=0,freq='2D')
Out[3]:
             A   B   C   D
2017-01-05   0   1   2   3
2017-01-06   4   5   6   7
2017-01-07   8   9  10  11
2017-01-08  12  13  14  15
2017-01-09  16  17  18  19
2017-01-10  20  21  22  23

In [4]: df.shift(2,axis=1,freq='2D')
Out[4]:
             A   B   C   D
2017-01-05   0   1   2   3
2017-01-06   4   5   6   7
2017-01-07   8   9  10  11
2017-01-08  12  13  14  15
2017-01-09  16  17  18  19
2017-01-10  20  21  22  23

In [5]: df.shift(2,freq='2D')
Out[5]:
             A   B   C   D
2017-01-05   0   1   2   3
2017-01-06   4   5   6   7
2017-01-07   8   9  10  11
2017-01-08  12  13  14  15
2017-01-09  16  17  18  19
2017-01-10  20  21  22  23
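Conclusion: for a time index with freq set, shift moves the index itself (here by 2 × 2 days = 4 days) while the data stays exactly as it was, and in the outputs above the axis setting made no difference.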
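The conclusion above can be checked programmatically. The following is a minimal sketch that rebuilds the same frame and only tests the default axis=0 case; the variable names `shifted`, `offset`, and `values_unchanged` are illustrative and not part of the original session.

```python
import numpy as np
import pandas as pd

# Same frame as in the session above: 6 daily rows, 4 integer columns.
df = pd.DataFrame(
    np.arange(24).reshape(6, 4),
    index=pd.date_range(start="20170101", periods=6),
    columns=["A", "B", "C", "D"],
)

shifted = df.shift(2, freq="2D")

# Each index label should move forward by periods * freq = 2 * 2 days.
offset = shifted.index - df.index
print(offset.unique())

# The values are not relocated, so the underlying arrays should be identical.
values_unchanged = np.array_equal(shifted.to_numpy(), df.to_numpy())
print(values_unchanged)
```

Because only the labels change, no positions are vacated and no NaN values are introduced, which is why the integers in Out[3] to Out[5] are not converted to floats.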
② When the DataFrame's row index is not a time series, the row index stays in place and the data moves
In [6]: import pandas as pd
   ...: import numpy as np
   ...: df = pd.DataFrame(np.arange(24).reshape(6,4),index=['r1','r2','r3','r4'
   ...: ,'r5','r6'],columns=['A','B','C','D'])
   ...: df
   ...:
Out[6]:
     A   B   C   D
r1   0   1   2   3
r2   4   5   6   7
r3   8   9  10  11
r4  12  13  14  15
r5  16  17  18  19
r6  20  21  22  23

In [7]: df.shift(periods=2,axis=0)
Out[7]:
       A     B     C     D
r1   NaN   NaN   NaN   NaN
r2   NaN   NaN   NaN   NaN
r3   0.0   1.0   2.0   3.0
r4   4.0   5.0   6.0   7.0
r5   8.0   9.0  10.0  11.0
r6  12.0  13.0  14.0  15.0

In [8]: df.shift(periods=-2,axis=0)
Out[8]:
       A     B     C     D
r1   8.0   9.0  10.0  11.0
r2  12.0  13.0  14.0  15.0
r3  16.0  17.0  18.0  19.0
r4  20.0  21.0  22.0  23.0
r5   NaN   NaN   NaN   NaN
r6   NaN   NaN   NaN   NaN

In [9]: df.shift(periods=2,axis=1)
Out[9]:
      A    B     C     D
r1  NaN  NaN   0.0   1.0
r2  NaN  NaN   4.0   5.0
r3  NaN  NaN   8.0   9.0
r4  NaN  NaN  12.0  13.0
r5  NaN  NaN  16.0  17.0
r6  NaN  NaN  20.0  21.0

In [10]: df.shift(periods=-2,axis=1)
Out[10]:
       A     B    C    D
r1   2.0   3.0  NaN  NaN
r2   6.0   7.0  NaN  NaN
r3  10.0  11.0  NaN  NaN
r4  14.0  15.0  NaN  NaN
r5  18.0  19.0  NaN  NaN
r6  22.0  23.0  NaN  NaN
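The four outputs above share one pattern: the labels stay put, the values slide by `periods` positions, and the vacated positions are filled with NaN. Here is a small sketch that checks this for the two positive shifts. The helper variable names are illustrative and do not come from the original session.

```python
import numpy as np
import pandas as pd

# Same frame as in the session above, with plain string row labels.
df = pd.DataFrame(
    np.arange(24).reshape(6, 4),
    index=["r1", "r2", "r3", "r4", "r5", "r6"],
    columns=["A", "B", "C", "D"],
)

down = df.shift(periods=2, axis=0)   # values move down two rows
right = df.shift(periods=2, axis=1)  # values move right two columns

# Row and column labels are the same as before the shift.
labels_kept = down.index.equals(df.index) and right.columns.equals(df.columns)

# The vacated positions hold NaN.
vacated_rows_nan = down.iloc[:2].isna().all().all()
vacated_cols_nan = right.iloc[:, :2].isna().all().all()

# The remaining values are the original ones, only relocated.
rows_relocated = np.array_equal(down.iloc[2:].to_numpy(), df.iloc[:-2].to_numpy())
cols_relocated = np.array_equal(right.iloc[:, 2:].to_numpy(), df.iloc[:, :-2].to_numpy())

print(labels_kept, vacated_rows_nan, vacated_cols_nan, rows_relocated, cols_relocated)
```

The NaN fill is also why the integer values are printed as floats (0.0, 1.0, ...) in the outputs above: NaN is a floating-point value, so the affected columns are upcast.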
Use ?pandas.DataFrame.diff to view the help documentation. Its signature has the same form as that of shift, minus the freq parameter:
Signature: pd.DataFrame.diff(self, periods=1, axis=0)
Docstring: 1st discrete difference of object
Now let's look at how diff and shift relate to each other, using the df with the r1 to r6 index from above:
In [13]: df.diff(periods=2,axis=0)
Out[13]:
      A    B    C    D
r1  NaN  NaN  NaN  NaN
r2  NaN  NaN  NaN  NaN
r3  8.0  8.0  8.0  8.0
r4  8.0  8.0  8.0  8.0
r5  8.0  8.0  8.0  8.0
r6  8.0  8.0  8.0  8.0

In [14]: df - df.diff(periods=2,axis=0)
Out[14]:
       A     B     C     D
r1   NaN   NaN   NaN   NaN
r2   NaN   NaN   NaN   NaN
r3   0.0   1.0   2.0   3.0
r4   4.0   5.0   6.0   7.0
r5   8.0   9.0  10.0  11.0
r6  12.0  13.0  14.0  15.0

In [15]: df.shift(periods=2,axis=0)
Out[15]:
       A     B     C     D
r1   NaN   NaN   NaN   NaN
r2   NaN   NaN   NaN   NaN
r3   0.0   1.0   2.0   3.0
r4   4.0   5.0   6.0   7.0
r5   8.0   9.0  10.0  11.0
r6  12.0  13.0  14.0  15.0
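Out[14] and Out[15] are identical, which suggests that `df.diff(n)` is the same as `df - df.shift(n)`. The sketch below checks that relationship for a few values of `periods` on both axes, using the same example frame. The loop and the `checked` counter are illustrative additions, and `pd.testing.assert_frame_equal` is used here because it treats NaN values in matching positions as equal.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(24).reshape(6, 4),
    index=["r1", "r2", "r3", "r4", "r5", "r6"],
    columns=["A", "B", "C", "D"],
)

checked = 0
for periods in (1, 2, -2):
    for axis in (0, 1):
        via_diff = df.diff(periods=periods, axis=axis)
        shifted = df.shift(periods=periods, axis=axis)

        # diff is the element-wise difference between the frame and its shifted copy.
        pd.testing.assert_frame_equal(via_diff, df - shifted)

        # Equivalently, subtracting the diff recovers the shifted frame (In [14] vs In [15]).
        pd.testing.assert_frame_equal(df - via_diff, shifted)

        checked += 1

print(f"relationship held in all {checked} cases")
```

This only covers a small integer frame without freq, so it is a sanity check of the pattern seen above rather than a proof for every dtype.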