This article summarizes the steps for extracting an element from a specified URL using Python and Selenium on EC2.
- Install ChromeDriver
- Install Chrome
- Install Selenium
- Install Japanese fonts
- Extract the text of the specified URL (text.py)
- Get a screen capture of the specified URL (capture.py)
Prerequisites:
- You are connected to an EC2 instance over ssh.
- Python 3 is already installed.
See also: "How to connect to an EC2 instance using ssh" and "How to build a python3 environment on EC2".
① On the official ChromeDriver page, open the download page for the version you want. Its major version should match the Chrome version you will install.
② Copy the link address for linux64.
③ Download and unzip it.
# Move to the tmp directory
$ cd /tmp/
# Download chromedriver (use the URL you copied)
$ wget https://chromedriver.storage.googleapis.com/83.0.4103.39/chromedriver_linux64.zip
# Unzip the archive
$ unzip chromedriver_linux64.zip
# Move the unzipped file under /usr/bin
$ sudo mv chromedriver /usr/bin/chromedriver
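To confirm that the move worked, a small check like the one below can be run with python3. It is a minimal sketch that uses only the standard library; the helper name `locate_executable` is made up for illustration and is not part of Selenium.

```python
import shutil
from typing import Optional


def locate_executable(name: str) -> Optional[str]:
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


# "chromedriver" is the file that was moved to /usr/bin in the step above.
path = locate_executable("chromedriver")
if path is None:
    print("chromedriver was not found on PATH")
else:
    print("chromedriver found at", path)
```

Selenium looks for `chromedriver` on PATH when no explicit driver path is given, so a "not found" result here usually means `webdriver.Chrome()` will fail to start as well.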
# Install Chrome with a single command
$ curl https://intoli.com/install-google-chrome.sh | bash
Complete! <- Installation succeeded
Successfully installed Google Chrome!
# Rename the file
$ sudo mv /usr/bin/google-chrome-stable /usr/bin/google-chrome
# Check the version
$ google-chrome --version && which google-chrome
Google Chrome 83.0.4103.61 <- Output of --version
/usr/bin/google-chrome <- Output of which
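ChromeDriver is generally expected to share its major version with the installed Chrome (83 in both cases here). The sketch below compares the two version strings that appear in this article. The function name `major_version` is hypothetical, and the regular expression assumes the usual four-part version format.

```python
import re


def major_version(version_output: str) -> int:
    """Extract the major version number from a four-part version string."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError("no version number found in: " + repr(version_output))
    return int(match.group(1))


# Strings taken from this article: the output of `google-chrome --version`
# and the version embedded in the chromedriver download URL.
chrome_output = "Google Chrome 83.0.4103.61"
driver_version = "83.0.4103.39"

if major_version(chrome_output) == major_version(driver_version):
    print("major versions match")
else:
    print("major versions differ; download a matching chromedriver")
```

If the two differ, repeat the ChromeDriver download with a version whose first number equals Chrome's.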
$ pip3 install selenium
Install Japanese fonts as well. Without them, Japanese characters will appear garbled in screen captures.
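One way to check whether the system can see any Japanese fonts is to ask fontconfig. This is a rough sketch that assumes the `fc-list` command from fontconfig is installed on the instance; the function name `list_japanese_fonts` is made up for illustration.

```python
import shutil
import subprocess
from typing import List, Optional


def list_japanese_fonts(fc_list: str = "fc-list") -> Optional[List[str]]:
    """Return fontconfig entries that declare Japanese support.

    Returns None when the fc-list command itself is not available.
    """
    executable = shutil.which(fc_list)
    if executable is None:
        return None
    result = subprocess.run(
        [executable, ":lang=ja"],  # fontconfig pattern: fonts covering Japanese
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
        check=False,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


fonts = list_japanese_fonts()
if fonts is None:
    print("fc-list is not installed; cannot check fonts")
elif not fonts:
    print("no Japanese fonts found; screenshots may show garbled text")
else:
    print(len(fonts), "Japanese font entries found")
```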
① Create a text.py file in your home directory
$ cd ~
$ touch text.py
$ vi text.py
② The vim editor starts; paste in the following code.
└ Press the "i" key to enter insert mode.
└ Paste with "shift + ins" (or right-click and select paste).
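# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
options = Options()
# Run Chrome without a display; the add_argument form works in both Selenium 3 and 4
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)
# Specify the URL
driver.get("https://www.google.co.jp/")
# Specify the element to scrape
# (find_element_by_id was removed in Selenium 4.3; find_element with By.ID works in Selenium 3 and 4)
element_text = driver.find_element(By.ID, "hptl").text
print(element_text)
driver.quit()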
③ After pasting, save the file and quit vim with the following keys.
esc, then :wq, then Enter
④ Execute the created file
$ python3 text.py
# Success if the following is displayed
About Google Store
This completes scraping the text at the top right of the Google top page.
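If the element is not found, the script above stops before `driver.quit()` and can leave a Chrome process running on the instance. The sketch below wraps the same steps in `try`/`finally` so the browser is always closed. The names `chrome_arguments` and `fetch_element_text` are hypothetical helpers, and `find_element(By.ID, ...)` is used because that form is available in both Selenium 3 and Selenium 4.

```python
from typing import List


def chrome_arguments(headless: bool = True) -> List[str]:
    """Build the Chrome command-line switches used for scraping."""
    args = []
    if headless:
        args.append("--headless")
    return args


def fetch_element_text(url: str, element_id: str) -> str:
    """Open `url` in headless Chrome and return the text of one element."""
    # Imported here so chrome_arguments() can be used without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    options = Options()
    for arg in chrome_arguments():
        options.add_argument(arg)
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.find_element(By.ID, element_id).text
    finally:
        # Always close the browser, even if the element is not found.
        driver.quit()
```

With the URL and element id from text.py, the call would be `print(fetch_element_text("https://www.google.co.jp/", "hptl"))`.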
① Create a capture.py file in your home directory
$ cd ~
$ touch capture.py
$ vi capture.py
② The vim editor starts; paste in the following code.
└ Press the "i" key to enter insert mode.
└ Paste with "shift + ins" (or right-click and select paste).
# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
# Run Chrome without a display; the add_argument form works in both Selenium 3 and 4
options.add_argument('--headless')
# Specify the screen size to capture
options.add_argument('--window-size=1280,1024')
driver = webdriver.Chrome(options=options)
# Specify the URL
driver.get("https://www.google.co.jp/")
# Specify the capture file name and extension
driver.save_screenshot('googletop.png')
driver.quit()
③ After pasting, save the file and quit vim with the following keys.
esc, then :wq, then Enter
④ Execute the created file
$ python3 capture.py
# Success if the following file is created in the same directory
$ ls
googletop.png
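Beyond checking that the file exists, it can be useful to confirm that it really is a PNG and to see what size was captured. The sketch below reads the width and height from the PNG header using only the standard library. `png_size` is a made-up helper name, and the sketch assumes the file begins with the standard PNG signature followed by an IHDR chunk.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path: str) -> Tuple[int, int]:
    """Return (width, height) read from a PNG file's IHDR header."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Bytes 0-7: signature, 8-11: chunk length, 12-15: chunk type "IHDR",
    # 16-19: width, 20-23: height (both big-endian unsigned integers).
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(path + " does not look like a PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height
```

After running capture.py, `print(png_size("googletop.png"))` shows the captured size, which can be compared with the `--window-size` value passed to Chrome.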