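When concatenating two Series, the default is axis=0, which stacks them vertically (rows are appended downward). To place them side by side instead, use axis=1.

pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=None, copy=True)

(This is the signature of the pandas version used here; the join_axes parameter was removed in pandas 1.0.)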
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(10, 13))
s2 = pd.Series(np.arange(100, 103))
pd.concat([s1, s2])
Out[13]:
0     10
1     11
2     12
0    100
1    101
2    102
dtype: int32

pd.concat([s1, s2], keys=[1, 2])
Out[14]:
1  0     10
   1     11
   2     12
2  0    100
   1    101
   2    102
dtype: int32

pd.concat([s1, s2], keys=[1, 2], names=['from', 'ID'])
Out[16]:
from  ID
1     0      10
      1      11
      2      12
2     0     100
      1     101
      2     102
dtype: int32
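The keyed result above can also be queried by its outer level. The following is a minimal sketch of that idea, reusing s1 and s2 from the transcript; the variable names combined and from_second are illustrative only.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(10, 13))
s2 = pd.Series(np.arange(100, 103))

# keys adds an outer index level; names labels the outer and inner levels.
combined = pd.concat([s1, s2], keys=[1, 2], names=["from", "ID"])

# Selecting on the outer level recovers the values that came from s2.
from_second = combined.loc[2]

print(list(combined.index.names))  # ['from', 'ID']
print(from_second.tolist())        # [100, 101, 102]
```

Because the outer level records the source, the duplicated inner labels 0, 1, 2 no longer make the rows ambiguous.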
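Horizontal concatenation: axis=1

To add an extra level of keys during concatenation, so that you can tell which table each piece of data came from, pass the keys parameter.

s1 = pd.Series(np.arange(10, 15))
s2 = pd.Series(np.arange(100, 103))
pd.concat([s1, s2], axis=1, keys=['s1', 's2'], names=['from', 'ID'])
Out[21]:
   s1     s2
0  10  100.0
1  11  101.0
2  12  102.0
3  13    NaN
4  14    NaN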
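The NaN rows and the 100.0-style values in that output follow from index alignment: the shorter Series has no labels 3 and 4, so those cells are filled with NaN and the column is upcast to float. A small sketch that checks this, with wide as an illustrative name:

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(10, 15))    # index 0..4
s2 = pd.Series(np.arange(100, 103))  # index 0..2

# Side-by-side concatenation aligns on the index labels.
wide = pd.concat([s1, s2], axis=1, keys=["s1", "s2"])

missing = int(wide["s2"].isna().sum())

print(wide.shape)         # (5, 2)
print(missing)            # 2
print(wide["s2"].dtype)   # float64
```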
Concatenating two DataFrames that have the same columns ("Combine two DataFrame objects with identical columns.")

First, create some DataFrames to practice on:

idx = 'this is a fake data'.split()
df1 = pd.DataFrame({'Country': ['China', 'Japan', 'Germany', 'USA', 'UK'],
                    'Team': ['A', 'B', 'A', 'C', 'D']}, index=idx)
col = 'Country Team'.split()
idx_2 = ['fake', 'world']
values = [['KLR', 100], ['abc', 200]]
df2 = pd.DataFrame(values, index=idx_2, columns=col)

df1
Out[43]:
      Country Team
this    China    A
is      Japan    B
a     Germany    A
fake      USA    C
data       UK    D

df2
Out[44]:
      Country  Team
fake      KLR   100
world     abc   200
By default the concatenation is vertical:

pd.concat([df1, df2])
Out[45]:
      Country Team
this    China    A
is      Japan    B
a     Germany    A
fake      USA    C
data       UK    D
fake      KLR  100
world     abc  200
With axis=1 the concatenation is horizontal. Rows that share an index label are aligned onto the same row by default:

pd.concat([df1, df2], axis=1)
Out[46]:
       Country Team Country   Team
a      Germany    A     NaN    NaN
data        UK    D     NaN    NaN
fake       USA    C     KLR  100.0
is       Japan    B     NaN    NaN
this     China    A     NaN    NaN
world      NaN  NaN     abc  200.0
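As a quick check of that alignment, the sketch below rebuilds df1 and df2 and looks for rows that received values from both frames. The names side_by_side and complete_rows are illustrative, and the row order of the result may differ between pandas versions, so it is not relied on here.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Country": ["China", "Japan", "Germany", "USA", "UK"],
     "Team": ["A", "B", "A", "C", "D"]},
    index="this is a fake data".split(),
)
df2 = pd.DataFrame([["KLR", 100], ["abc", 200]],
                   index=["fake", "world"], columns=["Country", "Team"])

side_by_side = pd.concat([df1, df2], axis=1)

# "fake" is the only label present in both indexes,
# so it is the only row without any NaN.
has_all_values = side_by_side.notna().all(axis=1)
complete_rows = side_by_side.index[has_all_values].tolist()

print(side_by_side.shape)  # (6, 4)
print(complete_rows)       # ['fake']
```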
Concatenating DataFrames with different columns:

Create df3, which has a different set of columns:

col = ['Team', 'SBF']
idx_3 = ['true', 'world']
values3 = [['red', 'pm'], ['orange', 'pl']]
df3 = pd.DataFrame(values3, index=idx_3, columns=col)
df3
Out[51]:
         Team SBF
true      red  pm
world  orange  pl
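Concatenation aligns on column names. The default is still vertical (axis=0), and columns with the same name are combined into a single column:

pd.concat([df1, df3])
      Country  SBF    Team
this    China  NaN       A
is      Japan  NaN       B
a     Germany  NaN       A
fake      USA  NaN       C
data       UK  NaN       D
true      NaN   pm     red
world     NaN   pl  orange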
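Again the frames are aligned on column names and stacked vertically, with same-named columns combined. Rows that share an index label, however, are not merged; both rows are kept:

pd.concat([df2, df3])
Out[59]:
      Country  SBF    Team
fake      KLR  NaN     100
world     abc  NaN     200
true      NaN   pm     red
world     NaN   pl  orange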
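If duplicated index labels such as world are not wanted, the verify_integrity parameter from the signature can be used to reject them. A minimal sketch, with stacked and rejected as illustrative names:

```python
import pandas as pd

df2 = pd.DataFrame([["KLR", 100], ["abc", 200]],
                   index=["fake", "world"], columns=["Country", "Team"])
df3 = pd.DataFrame([["red", "pm"], ["orange", "pl"]],
                   index=["true", "world"], columns=["Team", "SBF"])

# Vertical concatenation keeps both "world" rows.
stacked = pd.concat([df2, df3])
print(stacked.index.is_unique)    # False
print(len(stacked.loc["world"]))  # 2

# verify_integrity=True raises ValueError when index labels overlap.
rejected = False
try:
    pd.concat([df2, df3], verify_integrity=True)
except ValueError as exc:
    rejected = True
    print("duplicate labels rejected:", exc)
```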
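With axis=1, rows that share an index label are aligned, but columns that share a name are not combined; the frames are simply placed side by side, left then right:

pd.concat([df2, df3], axis=1)
Out[62]:
      Country   Team    Team  SBF
fake      KLR  100.0     NaN  NaN
true      NaN    NaN     red   pm
world     abc  200.0  orange   pl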
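The result contains two columns named Team, which makes plain column selection ambiguous. One possible way around this, sketched here under the assumption that hierarchical column labels are acceptable, is to pass keys together with axis=1; the labels "df2" and "df3" are chosen only for illustration.

```python
import pandas as pd

df2 = pd.DataFrame([["KLR", 100], ["abc", 200]],
                   index=["fake", "world"], columns=["Country", "Team"])
df3 = pd.DataFrame([["red", "pm"], ["orange", "pl"]],
                   index=["true", "world"], columns=["Team", "SBF"])

# keys becomes the outer level of the column index.
labeled = pd.concat([df2, df3], axis=1, keys=["df2", "df3"])
print(labeled.columns.tolist())

# Each Team column can now be addressed unambiguously.
team_from_df3 = labeled.loc["world", ("df3", "Team")]
print(team_from_df3)  # orange
```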
Concatenating a single column taken from each DataFrame:

pd.concat([df1.Team, df2.Team, df3.Team])
Out[64]:
this          A
is            B
a             A
fake          C
data          D
fake        100
world       200
true        red
world    orange
Name: Team, dtype: object
Writing it as follows raises an error, because the objects must be passed together as one list (or other iterable) rather than as separate arguments:

pd.concat(df1['Team'], df2['Team'], df3['Team'])
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "Series"

pd.concat(df1[['Team']], df2[['Team']], df3[['Team']])
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"
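For contrast, here is a sketch of the failing call next to the working one, where the three columns are wrapped in a list. The exact wording of the TypeError can vary between pandas versions, so only the exception type is relied on; teams and failed are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Country": ["China", "Japan", "Germany", "USA", "UK"],
     "Team": ["A", "B", "A", "C", "D"]},
    index="this is a fake data".split(),
)
df2 = pd.DataFrame([["KLR", 100], ["abc", 200]],
                   index=["fake", "world"], columns=["Country", "Team"])
df3 = pd.DataFrame([["red", "pm"], ["orange", "pl"]],
                   index=["true", "world"], columns=["Team", "SBF"])

# Separate positional arguments: rejected with a TypeError.
failed = False
try:
    pd.concat(df1["Team"], df2["Team"], df3["Team"])
except TypeError:
    failed = True

# A single list of objects: accepted.
teams = pd.concat([df1["Team"], df2["Team"], df3["Team"]])
print(len(teams))         # 9
print(teams.loc["true"])  # red
```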
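When concatenating, pandas offers operations similar to a database's inner and outer joins. The default is an outer join (the union of the column labels); the join parameter can be used to request an inner join (the intersection) instead.

pd.concat([df2, df3], join='inner')
Out[73]:
         Team
fake      100
world     200
true      red
world  orange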
The default is join='outer', so the following two calls give the same result:

pd.concat([df2, df3], join='outer')
pd.concat([df2, df3])
Out[74]:
      Country  SBF    Team
fake      KLR  NaN     100
world     abc  NaN     200
true      NaN   pm     red
world     NaN   pl  orange
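The difference between the two join modes comes down to which column labels survive. A short sketch comparing them on the same df2 and df3 (inner and outer are illustrative variable names):

```python
import pandas as pd

df2 = pd.DataFrame([["KLR", 100], ["abc", 200]],
                   index=["fake", "world"], columns=["Country", "Team"])
df3 = pd.DataFrame([["red", "pm"], ["orange", "pl"]],
                   index=["true", "world"], columns=["Team", "SBF"])

inner = pd.concat([df2, df3], join="inner")  # intersection of columns
outer = pd.concat([df2, df3], join="outer")  # union of columns (default)

print(inner.columns.tolist())  # ['Team']
print(sorted(outer.columns))   # ['Country', 'SBF', 'Team']
```

Both results have four rows; only the set of columns changes.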
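Concatenating while ignoring the index: if the indexes of the two tables carry no real meaning, set ignore_index=True. The two tables are then aligned on their column names and combined, and a new index is generated for the result.

pd.concat([df2, df3], ignore_index=True)
Out[77]:
  Country  SBF    Team
0     KLR  NaN     100
1     abc  NaN     200
2     NaN   pm     red
3     NaN   pl  orange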
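A small sketch confirming the effect of ignore_index=True: the original labels, including the duplicated world, are discarded and replaced by consecutive integers. The name renumbered is illustrative.

```python
import pandas as pd

df2 = pd.DataFrame([["KLR", 100], ["abc", 200]],
                   index=["fake", "world"], columns=["Country", "Team"])
df3 = pd.DataFrame([["red", "pm"], ["orange", "pl"]],
                   index=["true", "world"], columns=["Team", "SBF"])

renumbered = pd.concat([df2, df3], ignore_index=True)

print(renumbered.index.tolist())    # [0, 1, 2, 3]
print(renumbered.index.is_unique)   # True
print(renumbered.loc[3, "Team"])    # orange
```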