[For beginners] Read Excel / CSV files into DataFrame with Google Colaboratory

Hi, this is CE Sabo.

This is my first post on Qiita.

Suppose you want to analyze data using Python.

The first place beginners get stuck is "reading the data." (I stumbled there at first, too.)

What should you do when the data you want to analyze is tabular data (Excel data, CSV data, etc.)?

This time, I will briefly explain how to read Excel files (.xlsx) and CSV files (.csv), the formats you will probably use most often.

The actual code is just **2 lines**. Let's get it done quickly and move on to the world of data analysis.

Development environment

・ Google Colaboratory

We will use Google Colaboratory, which anyone with a Google account can use.

First, import the required libraries

Python has many libraries for data analysis.

They make this kind of task relatively easy to implement.

This time, "pandas" is the only one we need.

# Import pandas
import pandas as pd

Adding "as ~" to the import lets you refer to the imported library by any name you like.

Generally, pandas is abbreviated as pd.
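As a small illustration of how "as ~" works, the sketch below imports pandas under three names and checks that they all point to the same library. The alias my_pandas is a made-up name used only for this example.

```python
import pandas
import pandas as pd
import pandas as my_pandas  # hypothetical alias, just to show any name works

# All three names refer to the same module object
print(pd is pandas)  # True
print(my_pandas.DataFrame is pd.DataFrame)  # True
```

Any alias works, but sticking with pd makes your code easier for other people to read.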

Upload the file to Google Colaboratory

Upload the file you want to read to Google Colaboratory.

There are other approaches as well, such as ① uploading with code, ② reading a local file, and ③ mounting Google Drive and loading from it (which I personally recommend), but this time I will introduce the easiest method.
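For reference, method ③ (mounting Google Drive) can be sketched roughly as follows. This is only a sketch under some assumptions: the google.colab module exists only inside Colab, so the code falls back to the current folder elsewhere, and it assumes the file sits directly under "MyDrive". Colab will ask you to authorize access when the mount runs.

```python
import os

# google.colab is only available inside Google Colaboratory
try:
    from google.colab import drive
    in_colab = True
except ImportError:
    in_colab = False

if in_colab:
    # Mount Google Drive; Colab asks for permission the first time
    drive.mount("/content/drive")
    base_dir = "/content/drive/MyDrive"  # assumed location of the file
else:
    base_dir = "."  # fallback so the sketch also runs outside Colab

# Build the path you would later pass to pd.read_excel
path = os.path.join(base_dir, "date_2020.xlsx")
print(path)
```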

Procedure

① Click the file icon on the far left.

② Click upload (red frame in the image) and select the file you want to read, or drag and drop it.

google colab.png

Unless the file is very large, the upload finishes quickly, and then you are ready to go.

Use pd.read_excel and pd.read_csv to read data

Let's do it now. The code is one line per file.

Use the pandas functions read_excel and read_csv.

How to use

For Excel files: pd.read_excel(file path)

For CSV files: pd.read_csv(file path)

This time, we will load the Excel and CSV files into DataFrames, so let's name them df and df2.

I uploaded the 2020 date data, date_2020.xlsx and date_2020.csv, directly to Google Colaboratory this time, so the file name alone works as the path.

With methods ①②③ mentioned above, the path will be a little longer.
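If you are unsure whether the file name alone is enough, a quick check like the one below can help. It is only a sketch: it assumes the file was uploaded into the notebook's working folder (in Colab this is typically /content), and it prints whether the file can be found there.

```python
import os

# File name from this article; replace it with the file you uploaded
file_name = "date_2020.csv"

# A bare file name is looked up relative to the current working directory
cwd = os.getcwd()
full_path = os.path.join(cwd, file_name)

print("Working directory:", cwd)
print("Looking for:", full_path)
print("Found:", os.path.exists(full_path))
```

If "Found" is False, the upload probably went to a different folder, or the file name has a typo.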


# Load the Excel / CSV files into DataFrames
df = pd.read_excel("date_2020.xlsx")
df2 = pd.read_csv("date_2020.csv")
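If you do not have a file at hand, you can try the same flow with a small file made on the spot. The sketch below writes a tiny CSV and reads it back with pd.read_csv. The file name sample_dates.csv and the column names are made up for this example, and the file is written to a temporary folder so nothing important is overwritten.

```python
import os
import tempfile

import pandas as pd

# Made-up sample data: three dates in 2020 with their weekdays
sample = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "weekday": ["Wed", "Thu", "Fri"],
})

# Write the sample to a CSV file in a temporary folder (hypothetical file name)
csv_path = os.path.join(tempfile.gettempdir(), "sample_dates.csv")
sample.to_csv(csv_path, index=False)

# Read it back exactly as in the article
df2 = pd.read_csv(csv_path)
print(df2.shape)  # (3, 2)
```

Excel files work the same way with to_excel and pd.read_excel, provided an Excel engine such as openpyxl is installed in your environment.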

Check that the data was actually read with head()

Huh? Was that really enough to read the files?

If there were no errors, the data was read, but let's check just in case.

You can display the first 5 rows by calling .head() on the DataFrame you defined.

# Show the first 5 rows
df.head()

Output result ↓

df.head.png

It looks like the data was read in properly.
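A few more quick checks can be done the same way. The sketch below uses made-up data (10 rows built in memory, not the article's file) to show that head() returns 5 rows by default, that head(3) returns 3, and that shape gives the total number of rows and columns.

```python
import io

import pandas as pd

# Build a made-up CSV with 10 data rows in memory
lines = [f"2020-01-{day:02d},{day}" for day in range(1, 11)]
csv_text = "date,value\n" + "\n".join(lines)

# read_csv also accepts a file-like object instead of a path
df = pd.read_csv(io.StringIO(csv_text))

print(len(df.head()))   # 5 (head() shows the first 5 rows by default)
print(len(df.head(3)))  # 3 (pass a number to change how many rows)
print(df.shape)         # (10, 2) (total rows, total columns)
```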

References

The following articles cover details and more advanced usage ↓

  1. Read csv / tsv file with pandas (read_csv, read_table)
  2. Read Excel file (xlsx, xls) with pandas (read_excel)
  3. How to read an Excel file with read_excel of Pandas
