[pandas Tutorial] The header Parameter of pandas.read_excel() Explained

2021/03/16 22:03

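In practice, Excel workbooks hold all kinds of tabular data, and their headers (titles) vary: some sheets have a single header row, others have several. When reading such sheets with pandas' read_excel(), use the header and index_col parameters to specify the column index and the row index.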

The header parameter of read_excel() determines the DataFrame's column index. It accepts the following kinds of values:

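Default: 0
None: no row is used as the header, and pandas generates integer column labels
int: for example header=0, which uses the first row as the column index (the table header)
list: for example [0,1], which combines several rows into a multi-level column index (MultiIndex)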

The index_col parameter of read_excel() determines the DataFrame's row index. It accepts the following kinds of values:

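Default: None. pandas automatically adds a positional index (0, 1, 2, 3, 4, ...) to the DataFrame.
int: 0, 1, 2 refer to the first, second, and third columns respectively
list: several columns combined into a multi-level row index (MultiIndex)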
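
The three kinds of index_col values listed above can be tried out on a tiny in-memory workbook. This is a minimal sketch, not the article's own data: the column names and rows are hypothetical, and it assumes the openpyxl package is installed so that pandas can read .xlsx content.

```python
import io

import pandas as pd
from openpyxl import Workbook

# Hypothetical sample data; the column names are illustrative only.
rows = [
    ["start_date", "branch", "students"],
    ["2021-03-01", "Beijing", 12],
    ["2021-03-08", "Shanghai", 9],
]

# Build a one-sheet workbook in memory so the example needs no file on disk.
wb = Workbook()
ws = wb.active
for row in rows:
    ws.append(row)
buf = io.BytesIO()
wb.save(buf)
data = buf.getvalue()

# index_col=None (default): pandas adds a positional index 0, 1, ...
df_default = pd.read_excel(io.BytesIO(data), engine="openpyxl")

# index_col=0: the first column becomes the row index.
df_single = pd.read_excel(io.BytesIO(data), index_col=0, engine="openpyxl")

# index_col=[0, 1]: the first two columns together form a MultiIndex.
df_multi = pd.read_excel(io.BytesIO(data), index_col=[0, 1], engine="openpyxl")

print(df_default.index.tolist())
print(df_single.index.name)
print(list(df_multi.index.names))
```

Columns used as the row index no longer appear among the DataFrame's regular columns.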

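The rest of this article walks through the header parameter of read_excel() using the workbook below (file name: header.xlsx), whose sheets cover the most common header layouts:
The first sheet (Sheet1) has data only, no header;
The second sheet (Sheet2) has a single-row header;
The third sheet (Sheet3) has a blank row above the header;
The fourth sheet (Sheet4) has a multi-row header (see the figure below).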
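
To follow along without the original file, a workbook with the same four layouts can be generated. The sketch below is only an approximation of header.xlsx: the function name build_sample_workbook, the column names, and all cell values are hypothetical, the file is written to a temporary directory, and openpyxl is assumed to be installed.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook

# Hypothetical rows; the real header.xlsx from the article is not reproduced here.
HEADER = ["start_date", "branch", "students"]
DATA = [
    ["2021-03-01", "Beijing", 12],
    ["2021-03-08", "Shanghai", 9],
]


def build_sample_workbook(path):
    """Write a four-sheet workbook that mimics the layouts described above."""
    wb = Workbook()

    # Sheet1: data only, no header row.
    sheet1 = wb.active
    sheet1.title = "Sheet1"
    for row in DATA:
        sheet1.append(row)

    # Sheet2: a single header row followed by data.
    sheet2 = wb.create_sheet("Sheet2")
    for row in [HEADER] + DATA:
        sheet2.append(row)

    # Sheet3: a blank first row, then the header, then data.
    sheet3 = wb.create_sheet("Sheet3")
    for row in [[]] + [HEADER] + DATA:
        sheet3.append(row)

    # Sheet4: two header rows (every cell filled in, no merged cells).
    sheet4 = wb.create_sheet("Sheet4")
    two_level_header = [
        ["course", "course", "enrollment"],
        ["start_date", "branch", "students"],
    ]
    for row in two_level_header + DATA:
        sheet4.append(row)

    wb.save(path)
    return path


tmp_dir = tempfile.mkdtemp()
sample_path = build_sample_workbook(os.path.join(tmp_dir, "header.xlsx"))

# Confirm the sheet order that sheet_name=0..3 will refer to.
with pd.ExcelFile(sample_path, engine="openpyxl") as xls:
    sheet_names = xls.sheet_names
print(sheet_names)
```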

1. Call pd.read_excel(r'header.xlsx') directly, without any other arguments.

 
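import pandas as pd

# Neither sheet_name nor header is passed, so pandas reads the first sheet and
# uses its first row as the column index, i.e. sheet_name=0, header=0.
# Row and column positions are numbered from 0.
pd.read_excel(r'header.xlsx')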

<figure class="wp-block-image size-large"><img src="http://www.dszhp.com/wp-content/uploads/2021/03/read_excel函数header参数01.png" alt="read_excel()函数header参数01" class="wp-image-359"/></figure>

Because no arguments are passed, pandas reads the first sheet of header.xlsx by default and uses that sheet's first row as the column index. Since Sheet1 contains data only, its first data row ends up as the column labels.

2. Pass header=None: pd.read_excel(r'header.xlsx', header=None). No row is used as the column index, so pandas fills in integer column labels instead.
 
# sheet_name is not passed, so the first sheet is read; with header=None,
# pandas generates integer column labels automatically.
pd.read_excel(r'header.xlsx', header=None)
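
The difference between cases 1 and 2 on a sheet without a header can be checked with a small in-memory workbook. This is a sketch under stated assumptions: the rows are hypothetical stand-ins for Sheet1, the helper workbook_bytes is made up for this example, and openpyxl is assumed to be installed.

```python
import io

import pandas as pd
from openpyxl import Workbook


def workbook_bytes(rows):
    """Return the bytes of a one-sheet .xlsx workbook holding the given rows."""
    wb = Workbook()
    ws = wb.active
    for row in rows:
        ws.append(row)
    buf = io.BytesIO()
    wb.save(buf)
    return buf.getvalue()


# Hypothetical data-only sheet (no header row), similar in spirit to Sheet1.
data = workbook_bytes([
    ["2021-03-01", "Beijing", 12],
    ["2021-03-08", "Shanghai", 9],
])

# Default header=0: the first data row is consumed as column labels.
df_default = pd.read_excel(io.BytesIO(data), engine="openpyxl")

# header=None: every row stays in the data and columns are labelled 0, 1, 2.
df_no_header = pd.read_excel(io.BytesIO(data), header=None, engine="openpyxl")

print(df_default.shape)
print(df_no_header.shape)
print(list(df_no_header.columns))
```

With the default, one of the two data rows is lost to the header, which is why header=None is the safer choice for sheets that carry no header row.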

3. Pass sheet_name=1 with header=0 (the default): pd.read_excel(r'header.xlsx', sheet_name=1, header=0). This reads the second sheet (Sheet2) and uses its first row as the header.

 
# sheet_name=1, header=0 (default): read the second sheet (Sheet2) and use its first row as the header.
pd.read_excel(r'header.xlsx', sheet_name=1, header=0)

4. Pass sheet_name=2 and header=1: pd.read_excel(r'header.xlsx', sheet_name=2, header=1). This reads the third sheet (Sheet3), skips the blank first row, and uses the second row as the header.

 
# sheet_name=2, header=1: read the third sheet (Sheet3), skip the blank first row, and use the second row as the header.
pd.read_excel(r'header.xlsx', sheet_name=2, header=1)
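
The blank-row case can be reproduced on a small in-memory sheet. The following is a minimal sketch with hypothetical column names and values standing in for Sheet3; the helper workbook_bytes is made up for this example and openpyxl is assumed to be installed.

```python
import io

import pandas as pd
from openpyxl import Workbook


def workbook_bytes(rows):
    """Return the bytes of a one-sheet .xlsx workbook holding the given rows."""
    wb = Workbook()
    ws = wb.active
    for row in rows:
        ws.append(row)  # an empty list leaves that worksheet row blank
    buf = io.BytesIO()
    wb.save(buf)
    return buf.getvalue()


# Hypothetical sheet: blank first row, header in the second row, then data.
data = workbook_bytes([
    [],
    ["start_date", "branch", "students"],
    ["2021-03-01", "Beijing", 12],
    ["2021-03-08", "Shanghai", 9],
])

# header=1 points at the second row (zero-based), so the blank row is skipped.
df = pd.read_excel(io.BytesIO(data), header=1, engine="openpyxl")

print(list(df.columns))
print(len(df))
```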

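5. Pass sheet_name=3 and a list, header=[0,1]: pd.read_excel(r'header.xlsx', sheet_name=3, header=[0,1]). This reads the fourth sheet (Sheet4) and uses its first and second rows as the column index. The list may contain consecutive rows or non-consecutive rows, for example [0,3]. In the author's run, the 分公司 (branch office) column became the row index by default in this case.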

 
# sheet_name=3, header=[0,1]: read the fourth sheet (Sheet4) and use its rows 1-2 as the header.
pd.read_excel(r'header.xlsx', sheet_name=3, header=[0,1])
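
A two-row header read with header=[0,1] produces MultiIndex columns, which are then addressed by tuples. The sketch below illustrates this on hypothetical data; it assumes openpyxl is installed and, to keep things simple, fills in every header cell instead of using merged cells as a real Sheet4 might.

```python
import io

import pandas as pd
from openpyxl import Workbook

# Hypothetical sheet with a two-row header; all names and values are illustrative.
rows = [
    ["course", "course", "enrollment", "enrollment"],
    ["start_date", "branch", "enrolled", "graduated"],
    ["2021-03-01", "Beijing", 12, 10],
    ["2021-03-08", "Shanghai", 9, 9],
]

# Build the workbook in memory.
wb = Workbook()
ws = wb.active
for row in rows:
    ws.append(row)
buf = io.BytesIO()
wb.save(buf)

# header=[0, 1]: rows 0 and 1 are combined into two-level column labels.
df = pd.read_excel(io.BytesIO(buf.getvalue()), header=[0, 1], engine="openpyxl")

print(df.columns.tolist())

# A single column is selected with a (level 0, level 1) tuple.
enrolled = df[("enrollment", "enrolled")]
print(enrolled.tolist())
```

How the row index comes out in this case may depend on the pandas version and on the sheet layout, so it is worth inspecting df.index after reading.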

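6. Pass sheet_name=1, header=0 (the default) and index_col=0: pd.read_excel(r'header.xlsx', sheet_name=1, header=0, index_col=0). This reads the second sheet (Sheet2), uses its first row as the header, and uses the 开课日期 (course start date) column as the row index.

# sheet_name=1, header=0 (default), index_col=0: read the second sheet (Sheet2), use its first row as the header and the 开课日期 column as the row index.
pd.read_excel(r'header.xlsx', sheet_name=1, header=0, index_col=0)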

The header and index_col parameters of read_excel() work in similar ways. Once you understand both, you should be able to read most Excel sheets with pandas, whatever their layout, which makes everyday data processing easier.

 

Original or compiled by o郭二爷o. Please credit the source when reposting: http://www.dszhp.com/read_excel_header.html
