I tried scraping food recall information with Python to create a pandas data frame

I decided to keep a learning record of python, so I started Qitta. Since my job is a non-IT company, python is really a hobby ... I mean, I'm learning with interest.

Right now, I'm working on food hygiene, so I wonder if I can do something with python, so let's analyze the food recall information data! I thought.

As the first step, I tried to create a data frame of food recall information by scraping. The source of the data is a site called Recall Plus.

food_recall_info.py


from bs4 import BeautifulSoup
import requests
import re
import csv
import time
import pandas as pd

def recalls(url):
    res = requests.get(url)
    soup = BeautifulSoup(res.text, 'html.parser')
    recall_soup = soup.findAll("tr",{"class":{"return","info","apology"}})
    
    campany_list = []
    recall_list = []
    action_list = []
    recall_date = []

    for j in range(len(recall_soup)):
        #Get company name
        campany_list.append(recall_soup[j].find("a", href=re.compile("/company/*")).get_text())
        #Recall details
        recall_list.append(recall_soup[j].find("a", {"style":"float:left"}).get_text())
        #How to respond
        keyword = re.compile(r'Recovery|Recovery&Refund|Recovery&Refund/Exchange|Recovery&Exchange|Refund|Exchange|点検&Exchange|Notice|Violation of the prize labeling law|apology|Refund/Exchange|Send')
        action_list.append(re.search(keyword, str(recall_soup[j])).group())
        #Accrual date
        recall_date.append(recall_soup[j].find("td", {"class":"day"}).get_text().replace("\n        ","20"))
    return campany_list,recall_list,action_list,recall_date

campany_lists = []
recall_lists = []
action_lists = []
recall_dates = []

for i in range(1,20):
    resl = recalls("https://www.recall-plus.jp/category/1?page={}".format(i))
    campany_lists.extend(resl[0])
    recall_lists.extend(resl[1])
    action_lists.extend(resl[2])
    recall_dates.extend(resl[3])
    
recall_df = pd.DataFrame({'company name':campany_lists,'Recall details':recall_lists,'Correspondence':action_lists,'Accrual date':recall_dates})

Execution result


recall_df.head()
Company name Recall details Correspondence date
0 Kobe Bussan Business Supermarket Imo Stick Some products are mixed with resin pieces Recovery 2020/03/17
1 Marubun Marubun Domestic soybeans used Yose tofu Expiration date mislabeled Collection 2020/03/17
2 AEON Hitachiomiya store...Allergen for pork loin seasoned tonteki(milk)Missing display Apology 2020/03/18
3 Tsuruya Karuizawa store Delicious white fish Fry Allergen Milk ingredient display missing Recovery 2020/03/16
4 Hatanaka Koiya Hatanaka Koiya Koiya Sweet boiled allergen "wheat" missing display Recovery 2020/03/13

Apparently, I think it could be stored in the pandas data frame.

At first, I thought about writing the listing in the data frame each time, but I gave up because I didn't know how to do it. First of all, I made a list of each column and then tried to incorporate it into pandas.

I'm an amateur, so I remembered one thing and added the value with append () at first. However, it was added in list format and I couldn't import it into pandas.

After a lot of research, I found that I could use extend () to add only the values in the list.

I learned one thing.

Now that I have created the data safely, I would like to analyze the data. What can be analyzed with this data now is (1) Percentage of collections and returns when recalls occur ② Is there a time when recalls are likely to occur?

Is not it. I would like to make various trials and errors.

Recommended Posts

I tried scraping food recall information with Python to create a pandas data frame
I tried to create a list of prime numbers with python
[Pandas] I tried to analyze sales data with Python [For beginners]
I tried to get CloudWatch data with Python
I tried to create a program to convert hexadecimal numbers to decimal numbers with python
I tried fMRI data analysis with python (Introduction to brain information decoding)
[Outlook] I tried to automatically create a daily report email with Python
I tried scraping with Python
I tried scraping with python
I want to give a group_id to a pandas data frame
I tried to draw a route map with Python
I tried to automatically generate a password with Python3
I tried to analyze J League data with Python
When I tried to create a virtual environment with Python, it didn't work
I tried to automatically create a report with Markov chain
I tried web scraping with python.
[Python] I tried to automatically create a daily report of YWT with Outlook mail
I tried to make a function to retrieve data from database column by column using sql with sqlite3 of python [sqlite3, sql, pandas]
[5th] I tried to make a certain authenticator-like tool with python
[2nd] I tried to make a certain authenticator-like tool with python
[3rd] I tried to make a certain authenticator-like tool with python
[Python] A memo that I tried to get started with asyncio
I tried to make a periodical process with Selenium and Python
I tried to create Bulls and Cows with a shell program
I tried to make a todo application using bottle with python
[4th] I tried to make a certain authenticator-like tool with python
[1st] I tried to make a certain authenticator-like tool with python
I tried to create a linebot (implementation)
I tried to create a linebot (preparation)
I tried scraping Yahoo News with Python
Python: I tried to make a flat / flat_map just right with a generator
[Data science basics] I tried saving from csv to mysql with python
I tried to communicate with a remote server by Socket communication with Python.
I tried to create a plug-in with HULFT IoT Edge Streaming [Development] (2/3)
I tried to make a traffic light-like with Raspberry Pi 4 (Python edition)
I tried to discriminate a 6-digit number with a number discrimination application made with python
I tried to create CSV upload, data processing, download function with Django
I tried to get the movie information of TMDb API with Python
I tried to create a plug-in with HULFT IoT Edge Streaming [Setup] (1/3)
I tried to build a Mac Python development environment with pythonz + direnv
I tried to create a sample to access Salesforce using Python and Bottle
Create a frame with transparent background with tkinter [Python]
I tried to save the data with discord
I tried to output LLVM IR with Python
Steps to create a Twitter bot with python
I tried to automate sushi making with python
Make holiday data into a data frame with pandas
I want to write to a file with Python
I tried to make a periodical process with CentOS7, Selenium, Python and Chrome
I tried to make a simple mail sending application with tkinter of Python
I tried to create a class that can easily serialize Json in Python
[Patent analysis] I tried to make a patent map with Python without spending money
[ES Lab] I tried to develop a WEB application with Python and Flask ②
I tried to create a button for Slack with Raspberry Pi + Tact Switch
I tried to create a model with the sample of Amazon SageMaker Autopilot
I tried to implement Minesweeper on terminal with python
I tried to get started with blender python script_Part 01
I tried to touch the CSV file with Python
I tried to solve the soma cube with python
I tried to implement a pseudo pachislot in Python
I tried to get started with blender python script_Part 02