pandas: pd.concat([df1, df3]) (axis defaults to 0, vertical concatenation); concat is mostly used for vertical concatenation and defaults to an outer join

Date: 2021-06-04 15:58:36

Concatenating two Series: by default they are stacked along the rows (downward), axis = 0. To concatenate side by side (to the right), use axis = 1.

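concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=None, copy=True)

(This is the signature of the pandas version used in this post; join_axes was removed in pandas 1.0.)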

    import numpy as np
    import pandas as pd

    s1 = pd.Series(np.arange(10,13))
    s2 = pd.Series(np.arange(100,103))
    pd.concat([s1,s2])
    Out[13]:
    0     10
    1     11
    2     12
    0    100
    1    101
    2    102
    dtype: int32

    pd.concat([s1,s2], keys = [1,2])
    Out[14]:
    1  0     10
       1     11
       2     12
    2  0    100
       1    101
       2    102
    dtype: int32

    pd.concat([s1,s2], keys = [1,2],names = ['from','ID'])
    Out[16]:
    from  ID
    1     0      10
          1      11
          2      12
    2     0     100
          1     101
          2     102
    dtype: int32
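The point of the extra index level is that each source can be pulled back out of the result. A minimal sketch, assuming the same s1 and s2 as above; the variable names combined and from_s2 are illustrative and not from the original post:

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(10, 13))
s2 = pd.Series(np.arange(100, 103))

# keys adds an outer index level; names labels the two levels
combined = pd.concat([s1, s2], keys=[1, 2], names=["from", "ID"])

# Selecting on the outer level returns only the rows that came from s2
from_s2 = combined.loc[2]

print(list(combined.index.names))  # ['from', 'ID']
print(from_s2.tolist())            # [100, 101, 102]
```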

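Horizontal concatenation: axis = 1

To add an extra level of keys while concatenating, so that you can tell which table each piece of data came from, pass the keys parameter.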

    s1 = pd.Series(np.arange(10,15))
    s2 = pd.Series(np.arange(100,103))
    pd.concat([s1,s2], axis = 1,keys = ['s1','s2'],names = ['from','ID'])
    Out[21]:
       s1     s2
    0  10  100.0
    1  11  101.0
    2  12  102.0
    3  13    NaN
    4  14    NaN
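The output above shows two side effects of horizontal concatenation with Series of unequal length: the shorter one is padded with NaN, and its integer column becomes float. A small sketch that checks both, assuming the same inputs; the name wide is illustrative:

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(10, 15))    # 5 values
s2 = pd.Series(np.arange(100, 103))  # 3 values

# With axis=1 the keys become the column labels
wide = pd.concat([s1, s2], axis=1, keys=["s1", "s2"])

print(wide.shape)                    # (5, 2)
print(int(wide["s2"].isna().sum()))  # 2
print(wide["s2"].dtype)              # float64
```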

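Concatenating two DataFrames that have the same columns (in the words of the documentation: "Combine two DataFrame objects with identical columns").

Practice: create the DataFrames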

    idx = 'this is a fake data'.split()
    df1 = pd.DataFrame({'Country':['China','Japan','Germany','USA','UK'],'Team':['A','B','A','C','D']},index = idx)
    col = 'Country Team'.split()
    idx_2 = ['fake','world']
    values = [['KLR',100],['abc',200]]
    df2 = pd.DataFrame(values,index = idx_2, columns = col)
    df1
    Out[43]:
          Country Team
    this    China    A
    is      Japan    B
    a     Germany    A
    fake      USA    C
    data       UK    D
    df2
    Out[44]:
          Country  Team
    fake      KLR   100
    world     abc   200

By default the concatenation is vertical:

    pd.concat([df1,df2])
    Out[45]:
          Country Team
    this    China    A
    is      Japan    B
    a     Germany    A
    fake      USA    C
    data       UK    D
    fake      KLR  100
    world     abc  200

With axis = 1 the concatenation is horizontal. Rows that share an index label are aligned on that label:

    pd.concat([df1,df2],axis = 1)
    Out[46]:
           Country Team Country   Team
    a      Germany    A     NaN    NaN
    data        UK    D     NaN    NaN
    fake       USA    C     KLR  100.0
    is       Japan    B     NaN    NaN
    this     China    A     NaN    NaN
    world      NaN  NaN     abc  200.0
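The alignment can be checked programmatically: only the label that exists in both frames ends up with every cell filled. A sketch under the assumption that df1 and df2 are built as above; wide and complete_rows are illustrative names, and the row order of the result may differ between pandas versions, so the check does not depend on it:

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Country": ["China", "Japan", "Germany", "USA", "UK"],
     "Team": ["A", "B", "A", "C", "D"]},
    index="this is a fake data".split(),
)
df2 = pd.DataFrame(
    [["KLR", 100], ["abc", 200]],
    index=["fake", "world"],
    columns=["Country", "Team"],
)

wide = pd.concat([df1, df2], axis=1)

# Rows where all four cells are non-null: only the shared label 'fake'
complete_rows = wide.index[wide.notna().all(axis=1)].tolist()

print(wide.shape)                 # (6, 4)
print(complete_rows)              # ['fake']
print(wide.loc["fake"].tolist())  # ['USA', 'C', 'KLR', 100.0]
```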

Concatenating frames with different columns:

Create a df3 whose columns differ:

    col = ['Team','SBF']
    idx_3= ['true','world']
    values3 = [['red','pm'],['orange','pl']]
    df3 = pd.DataFrame(values3,index = idx_3, columns = col)
    df3
    Out[51]:
             Team SBF
    true      red  pm
    world  orange  pl

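Concatenation aligns on column names. By default the rows are still stacked vertically, and columns with the same name are merged into one: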

    pd.concat([df1,df3])
          Country  SBF    Team
    this    China  NaN       A
    is      Japan  NaN       B
    a     Germany  NaN       A
    fake      USA  NaN       C
    data       UK  NaN       D
    true      NaN   pm     red
    world     NaN   pl  orange
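The column order Country, SBF, Team in the output above is alphabetical, which is what the sort parameter from the signature controls. A short sketch that passes sort explicitly so the result does not depend on the default of the installed pandas version; unsorted_cols and sorted_cols are illustrative names:

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Country": ["China", "Japan", "Germany", "USA", "UK"],
     "Team": ["A", "B", "A", "C", "D"]},
    index="this is a fake data".split(),
)
df3 = pd.DataFrame(
    [["red", "pm"], ["orange", "pl"]],
    index=["true", "world"],
    columns=["Team", "SBF"],
)

# sort=False keeps columns in order of appearance; sort=True sorts them
unsorted_cols = pd.concat([df1, df3], sort=False)
sorted_cols = pd.concat([df1, df3], sort=True)

print(list(unsorted_cols.columns))  # ['Country', 'Team', 'SBF']
print(list(sorted_cols.columns))    # ['Country', 'SBF', 'Team']

# Cells with no source value are filled with NaN
print(pd.isna(sorted_cols.loc["true", "Country"]))  # True
```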

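Again the frames are aligned on column names and stacked vertically, with same-named columns merged, but rows that share an index label are not merged: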

    pd.concat([df2,df3])
    Out[59]:
          Country  SBF    Team
    fake      KLR  NaN     100
    world     abc  NaN     200
    true      NaN   pm     red
    world     NaN   pl  orange
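Because same-labelled rows are kept separately, the result can have duplicate index labels ('world' appears twice above). The verify_integrity parameter from the signature turns that into an error. A minimal sketch, assuming df2 and df3 as defined earlier; stacked and raised are illustrative names:

```python
import pandas as pd

df2 = pd.DataFrame(
    [["KLR", 100], ["abc", 200]],
    index=["fake", "world"],
    columns=["Country", "Team"],
)
df3 = pd.DataFrame(
    [["red", "pm"], ["orange", "pl"]],
    index=["true", "world"],
    columns=["Team", "SBF"],
)

stacked = pd.concat([df2, df3])
print(stacked.index.is_unique)    # False
print(len(stacked.loc["world"]))  # 2

# verify_integrity=True refuses to build an index with duplicates
try:
    pd.concat([df2, df3], verify_integrity=True)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```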

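When axis = 1, rows with the same index label are aligned, but columns with the same name are not merged; the left and right frames are simply placed side by side: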

    pd.concat([df2,df3],axis = 1)
    Out[62]:
          Country   Team    Team  SBF
    fake      KLR  100.0     NaN  NaN
    true      NaN    NaN     red   pm
    world     abc  200.0  orange   pl

Extract a single column from each frame and concatenate:

    pd.concat([df1.Team,df2.Team,df3.Team])
    Out[64]:
    this          A
    is            B
    a             A
    fake          C
    data          D
    fake        100
    world       200
    true        red
    world    orange
    Name: Team, dtype: object

Writing it like this, without wrapping the objects in a list, raises an error:

    pd.concat(df1['Team'],df2['Team'],df3['Team'])
    TypeError: first argument must be an iterable of pandas objects, you passed an object of type "Series"

    pd.concat(df1[['Team']],df2[['Team']],df3[['Team']])
    TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"
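The fix is to pass one list (or other iterable) as the first argument. The sketch below reproduces the failure and the corrected call, using small stand-in Series rather than the frames above; a, b, c, error_type and fixed are illustrative names. The exact error message may differ between pandas versions, so only the exception type is checked:

```python
import pandas as pd

a = pd.Series(["A", "B"], name="Team")
b = pd.Series([100, 200], name="Team")
c = pd.Series(["red", "orange"], name="Team")

# Wrong: the objects are passed as separate positional arguments
try:
    pd.concat(a, b, c)
    error_type = None
except TypeError as exc:
    error_type = type(exc).__name__
print(error_type)  # TypeError

# Right: wrap the objects in a single list
fixed = pd.concat([a, b, c])
print(len(fixed))   # 6
print(fixed.name)   # Team
```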

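When concatenating data, pandas offers operations similar to a database's inner and outer joins. The default is an outer join (union of the columns); the join parameter can switch this to an inner join (intersection of the columns).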

    pd.concat([df2,df3],join = 'inner')
    Out[73]:
             Team
    fake      100
    world     200
    true      red
    world  orange
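The difference between the two join modes comes down to which columns survive. A small sketch that compares them directly, assuming df2 and df3 as defined earlier; inner and outer are illustrative names:

```python
import pandas as pd

df2 = pd.DataFrame(
    [["KLR", 100], ["abc", 200]],
    index=["fake", "world"],
    columns=["Country", "Team"],
)
df3 = pd.DataFrame(
    [["red", "pm"], ["orange", "pl"]],
    index=["true", "world"],
    columns=["Team", "SBF"],
)

inner = pd.concat([df2, df3], join="inner")  # intersection of columns
outer = pd.concat([df2, df3], join="outer")  # union of columns

print(list(inner.columns))    # ['Team']
print(sorted(outer.columns))  # ['Country', 'SBF', 'Team']
print(inner.shape)            # (4, 1)
```

Note that join only affects the columns here; all four rows are kept in both cases.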

The default is join = 'outer', so these two calls are equivalent:

    pd.concat([df2,df3],join = 'outer')
    pd.concat([df2,df3])
    Out[74]:
          Country  SBF    Team
    fake      KLR  NaN     100
    world     abc  NaN     200
    true      NaN   pm     red
    world     NaN   pl  orange

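Concatenating while ignoring the index: if the index of neither table carries any real meaning, set ignore_index=True. The two tables are then aligned on their column names and merged, and a fresh index is generated for the result.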

    pd.concat([df2,df3], ignore_index = True)
    Out[77]:
      Country  SBF    Team
    0     KLR  NaN     100
    1     abc  NaN     200
    2     NaN   pm     red
    3     NaN   pl  orange
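A quick check that ignore_index=True also removes the duplicate 'world' label seen in the earlier results. This is a sketch assuming df2 and df3 as defined earlier; renumbered is an illustrative name:

```python
import pandas as pd

df2 = pd.DataFrame(
    [["KLR", 100], ["abc", 200]],
    index=["fake", "world"],
    columns=["Country", "Team"],
)
df3 = pd.DataFrame(
    [["red", "pm"], ["orange", "pl"]],
    index=["true", "world"],
    columns=["Team", "SBF"],
)

# The original labels are dropped and replaced by 0..n-1
renumbered = pd.concat([df2, df3], ignore_index=True)

print(renumbered.index.tolist())    # [0, 1, 2, 3]
print(renumbered.index.is_unique)   # True
print(renumbered["Team"].tolist())  # [100, 200, 'red', 'orange']
```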
