Scraping using Python 3.5 async / await

There was an earlier article on this topic: http://postd.cc/fast-scraping-in-python-with-asyncio/ . Let's redo it with the new Python 3.5 syntax.

Oh, and the original article scraped HTML, but I wanted to hit several URLs that return responses of about the same size at the same time, so I used RSS feeds instead. So it isn't really scraping at all. Well, what I'm doing is essentially the same thing...

import asyncio
import aiohttp
import feedparser
import time

async def print_first_title(url):
    response = await aiohttp.request('GET', url)
    body = await response.text()
    d = feedparser.parse(body)
    print(d.entries[0].title)

rss = []  # A list of RSS feed URLs. I used about 10 Yahoo! News feeds

if __name__ == '__main__':
    start = time.time()
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.wait([print_first_title(url) for url in rss]))
    end = time.time()
    print("{0} ms".format((end - start) * 1000))
390.4871940612793 ms

Well, it is certainly easier to read. But on its own it just looks like the old decorator-plus-yield from style got its own dedicated syntax, and so far that's all I can say about it. Is that really so amazing?
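For reference, here is what the same coroutine looks like in that pre-3.5 style, with the asyncio.coroutine decorator and yield from (just a sketch, using the same old module-level aiohttp call as the snippet above):

import asyncio
import aiohttp
import feedparser

@asyncio.coroutine
def print_first_title(url):
    # Equivalent to the async def version, written with the old decorator + yield from
    response = yield from aiohttp.request('GET', url)
    body = yield from response.text()
    d = feedparser.parse(body)
    print(d.entries[0].title)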

More than the syntax, though, what struck me is aiohttp, the library the original article also uses to make the HTTP communication asynchronous. It's really convenient! I had no idea it existed!!
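The module-level aiohttp.request call used above matches the aiohttp versions of that era; newer aiohttp releases prefer a session-based API. A minimal sketch of the same fetch in the session-based style, assuming a recent aiohttp:

import asyncio
import aiohttp
import feedparser

async def print_first_title(url):
    # One session per call keeps the sketch simple; sharing a session across requests is more efficient
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            body = await response.text()
    d = feedparser.parse(body)
    print(d.entries[0].title)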

For comparison, here is the speed of a version that doesn't use coroutines at all:

import urllib.request
import feedparser
import time

def print_first_title(url):
    response = urllib.request.urlopen(url)
    body = response.read()
    d = feedparser.parse(body)
    print(d.entries[0].title)

rss = []  # The same list of RSS feed URLs as above

if __name__ == '__main__':
    start = time.time()
    [print_first_title(url) for url in rss]
    end = time.time()
    print("{0} ms".format((end - start) * 1000))
1424.4353771209717 ms

Slow! That's it!
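By the way, if you want the titles back as values rather than just printing them, asyncio.gather collects each coroutine's result in the order the coroutines are passed. A minimal sketch along the same lines (first_title is just a renamed variant of the function above, still using the same old-style aiohttp call):

import asyncio
import aiohttp
import feedparser

async def first_title(url):
    # Fetch the feed and return the first entry's title instead of printing it
    response = await aiohttp.request('GET', url)
    body = await response.text()
    return feedparser.parse(body).entries[0].title

rss = []  # The same list of RSS feed URLs as above

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    # gather preserves the input order and returns a list of results
    titles = loop.run_until_complete(asyncio.gather(*[first_title(url) for url in rss]))
    print(titles)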
