Overview
This post summarizes the basic ways to select data from a pandas DataFrame when retrieving aggregation results.
Throughout, data refers to a DataFrame object.
data.iloc[<position>]
# Get the row at position 1
# (Positions start from 0, so this is the second row.)
# Note: data[1] would look up a column labeled 1, not a row.
data.iloc[1]
data[<column_name>]
# Get the column named 'name'
data['name']
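As a minimal sketch of the two access patterns above, assume a small hypothetical DataFrame with name and age columns (these values are made up for illustration and are not part of the original data). A bare integer inside data[...] is treated as a column label, so positional row access is written with .iloc here.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
data = pd.DataFrame({
    "name": ["Robert", "Alice", "Roy"],
    "age": [25, 17, 30],
})

# Positional row access: the second row, returned as a Series.
second_row = data.iloc[1]

# Column access by label: the 'name' column, returned as a Series.
names = data["name"]

print(second_row["name"])
print(names.tolist())
```

Both calls return a Series: the row Series is indexed by column names, and the column Series is indexed by row labels.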
data[<boolean per row>]
A boolean Series with one value per index label looks like the following. In this example, only the rows with index 0 and 2 are extracted.
0     True
1    False
2     True
3    False
Examples of generating such a per-row boolean Series are shown below. Ordinary Python comparison expressions can be used as they are.
# True where the value of the age column is 20 or more
data['age'] >= 20
# True where the name column contains the string 'Ro'
data['name'].str.contains('Ro')
# Keep only the rows whose name has not already appeared (the mask is applied directly)
data[~data['name'].duplicated()]
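The following sketch shows how these masks behave and how they can be combined. The names and ages are hypothetical, chosen only so that each mask gives a different result.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
data = pd.DataFrame({
    "name": ["Robert", "Alice", "Roy", "Alice"],
    "age": [25, 17, 30, 41],
})

# Each expression yields a boolean Series aligned with the row index.
is_adult = data["age"] >= 20
has_ro = data["name"].str.contains("Ro")   # case-sensitive by default
first_seen = ~data["name"].duplicated()    # False for repeated names

# Masks combine element-wise with & (and) and | (or).
adults_with_ro = data[is_adult & has_ro]
unique_names = data[first_seen]

print(adults_with_ro)
print(unique_names)
```

Python's and / or keywords do not work element-wise on a Series, which is why & and | are used; wrap each comparison in parentheses when writing them inline, for example data[(data['age'] >= 20) & (data['age'] < 40)].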
Building on "getting only the rows that meet a condition", covered at the end of the basics above, let's look at the different formats in which the result can be returned.
Here, the sample data stored in data is as follows.
| index | height | class | grade | weight | 
|---|---|---|---|---|
| 0 | 178.820262 | a | 2 | 65.649452 | 
| 1 | 172.000786 | b | 5 | 55.956723 | 
| 2 | 179.337790 | a | 4 | 56.480630 | 
| 3 | 181.204466 | b | 1 | 62.908190 | 
| 4 | 169.483906 | a | 4 | 65.768826 | 
| 5 | 174.893690 | b | 4 | 56.188545 | 
First, let's review the basics.
data[data['class'] == 'a']
       height class  grade     weight
0  178.820262     a      2  65.649452
2  179.337790     a      4  56.480630
4  169.483906     a      4  65.768826
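To reproduce this filter locally, the table above can be rebuilt as a DataFrame. This is a sketch that assumes the values rounded to six decimals as printed in the table, so the full-precision numbers shown in later outputs will differ slightly.

```python
import pandas as pd

# Sample data copied from the table above (rounded to six decimals).
data = pd.DataFrame({
    "height": [178.820262, 172.000786, 179.337790,
               181.204466, 169.483906, 174.893690],
    "class": ["a", "b", "a", "b", "a", "b"],
    "grade": [2, 5, 4, 1, 4, 4],
    "weight": [65.649452, 55.956723, 56.480630,
               62.908190, 65.768826, 56.188545],
})

# Boolean-mask filtering: keep only the rows where class is 'a'.
class_a = data[data["class"] == "a"]
print(class_a)
```

The result is still a DataFrame, and the original index labels (0, 2, 4) are preserved rather than renumbered.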
Accessing .values returns a two-dimensional NumPy array: each record becomes one inner array of its values, and the outer array holds one such element per record. The index and column names are dropped.
data[data['class'] == 'a'].values
[[178.8202617298383 'a' 2 65.64945209116877]
 [179.33778995074982 'a' 4 56.48062978465752]
 [169.4839057410322 'a' 4 65.76882607944115]]
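As a sketch of what this returns, the snippet below uses the class 'a' rows from the table (rounded values) and the to_numpy() method, which the pandas documentation recommends over .values for new code. Because the columns mix floats, strings, and integers, the array typically has the generic object dtype.

```python
import pandas as pd

# The three class 'a' rows from the table, with their original index labels.
data = pd.DataFrame(
    {
        "height": [178.820262, 179.337790, 169.483906],
        "class": ["a", "a", "a"],
        "grade": [2, 4, 4],
        "weight": [65.649452, 56.480630, 65.768826],
    },
    index=[0, 2, 4],
)

rows = data.to_numpy()          # 2-D NumPy array, index labels are dropped
print(type(rows).__name__)
print(rows.shape)               # (number of rows, number of columns)

# Convert to a genuine nested Python list when one is needed.
nested = rows.tolist()
print(nested[0])
```

If a real list of lists is required, for example for JSON serialization, tolist() is the extra step; the array itself is not a Python list.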
Use .loc with a row condition and a column name to get a Series that holds the index and the values of the specified column.
A Series in this form can be passed directly to matplotlib's plot.
data.loc[data['class'] == 'a', 'height']
0     178.820262
2     179.337790
4     169.483906
# Plot the selected Series, stored in args
import matplotlib.pyplot as plt

args = data.loc[data['class'] == 'a', 'height']
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(args)
plt.show()
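The sketch below looks at the object that .loc returns, without plotting. It assumes the rounded sample values from the table. Passing a single column label gives a Series, while passing a list of labels gives a DataFrame.

```python
import pandas as pd

# Sample data from the table above (rounded to six decimals).
data = pd.DataFrame({
    "height": [178.820262, 172.000786, 179.337790,
               181.204466, 169.483906, 174.893690],
    "class": ["a", "b", "a", "b", "a", "b"],
    "weight": [65.649452, 55.956723, 56.480630,
               62.908190, 65.768826, 56.188545],
})

mask = data["class"] == "a"

# One column label -> Series that keeps the original index labels.
heights_a = data.loc[mask, "height"]

# A list of column labels -> DataFrame.
subset = data.loc[mask, ["height", "weight"]]

# Renumber from 0 if the original labels are not wanted.
renumbered = heights_a.reset_index(drop=True)

print(heights_a.index.tolist())
print(renumbered.index.tolist())
```

Because the index labels travel with the values, it is worth deciding whether the original labels or a fresh 0-based numbering is more useful before handing the Series to other code.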

Accessing .values on the result of .loc returns only the values of the specified column, as a one-dimensional NumPy array without the index.
This form can be passed to matplotlib's hist.
data.loc[data['class'] == 'a', 'height'].values
[178.82026173 179.33778995 169.48390574]
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.hist(data.loc[data['class'] == 'a', 'height'].values)
plt.show()
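To see what a histogram does with these values without opening a plot window, the sketch below bins them with numpy.histogram. The choice of 3 bins is an arbitrary assumption for this tiny sample, and the heights are the rounded values from the table.

```python
import numpy as np
import pandas as pd

# Heights of the class 'a' rows, with their original index labels.
heights_a = pd.Series([178.820262, 179.337790, 169.483906], index=[0, 2, 4])

values = heights_a.to_numpy()   # 1-D array, index labels are dropped
print(values.shape)

# Count how many values fall into each of 3 equal-width bins.
counts, edges = np.histogram(values, bins=3)
print(counts.tolist())
print(edges.tolist())
```

With only three records the bins are mostly empty, which is why a histogram becomes more useful as the number of records grows.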

The more records you have, the more informative the graph becomes!