Get the source of an infinitely loading page with Python

Overview

In a separate article, I created a page that loads infinitely. The response from such a page never finishes, so the usual approach of a plain curl or requests call never returns the source, and a little extra work is needed. Here I write code that gets the contents of that page.

Environment

* Python 3.8.1

Code

This code gets the source of the page created by the code in the infinite-loading-page article. When either the time limit or the byte limit is reached, it prints the content acquired so far.

get_inf_page.py


import requests
import timeout_decorator

r_bytes = b""
def main():
    url = "http://localhost:8000"

    r = requests.get(url, stream=True, timeout=20)

    byte_limit = 30
    @timeout_decorator.timeout(100)
    def load_bytes(r):
        global r_bytes
        # iter_content() defaults to chunk_size=1, so this reads one byte at a time
        for chunk in r.iter_content():
            r_bytes += chunk
            if len(r_bytes) % 500 == 0:
                print(f"loaded:{len(r_bytes)}/{byte_limit}")
            if len(r_bytes) > byte_limit:
                r.close()
                print("reached size limit")
                break

    try:
        load_bytes(r)
    except timeout_decorator.timeout_decorator.TimeoutError:
        # the time limit fired before the byte limit; keep what was read so far
        print("timeout")
        r.close()

    print(r_bytes)

if __name__ == "__main__":
    main()
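The same two limits can also be sketched without timeout_decorator, by checking a deadline inside the read loop. This is only a rough alternative, not the approach used above: the names read_limited and fake_page are hypothetical, and fake_page is an in-memory stand-in for the endless response so that the sketch runs without a server.

```python
import itertools
import time


def read_limited(chunks, byte_limit, time_limit):
    """Collect bytes from `chunks` until a byte limit or time limit is hit.

    Returns (data, reason), where reason is "size", "time", or "eof".
    The deadline is only checked when a chunk arrives.
    """
    deadline = time.monotonic() + time_limit
    buf = bytearray()
    for chunk in chunks:
        buf += chunk
        if len(buf) > byte_limit:
            return bytes(buf), "size"
        if time.monotonic() >= deadline:
            return bytes(buf), "time"
    return bytes(buf), "eof"


def fake_page():
    """Hypothetical stand-in for an endless response body."""
    for i in itertools.count():
        yield f"<p>Hello World ! {i}</p>".encode()


data, reason = read_limited(fake_page(), byte_limit=30, time_limit=5)
print(reason, data)
```

With requests, the iterable would presumably be r.iter_content(chunk_size=None) from a response opened with stream=True, followed by r.close(). Because the deadline is only checked when data arrives, a server that stops sending is not interrupted by this loop; the timeout argument of requests.get has to cover that case. Unlike the one-byte-at-a-time loop above, whole chunks are appended, so the result can exceed byte_limit by up to one chunk.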

Operation check (stops when the byte limit is exceeded)

Run the code above while the server code from the infinite-loading-page article is running in another terminal. The output looks like this.

reached size limit
b'<p>Hello World ! 0</p><p>Hello '
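If the server code from the other article is not at hand, a stand-in can be sketched with the standard library. This is not the original server: the names hello_chunks, InfiniteHandler, and serve are hypothetical, the one-second interval is an assumption, and only the response format is taken from the output shown above.

```python
import itertools
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


def hello_chunks(interval=0.0):
    """Yield an endless series of paragraphs, pausing `interval` seconds after each."""
    for i in itertools.count():
        yield f"<p>Hello World ! {i}</p>".encode()
        time.sleep(interval)


class InfiniteHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # No Content-Length header: the body simply never ends.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        try:
            for chunk in hello_chunks(interval=1.0):  # assumed interval
                self.wfile.write(chunk)
                self.wfile.flush()
        except (BrokenPipeError, ConnectionResetError):
            # The client closed the connection (for example via r.close()).
            pass


def serve(port=8000):
    """Serve the endless page on localhost until interrupted."""
    HTTPServer(("localhost", port), InfiniteHandler).serve_forever()
```

Calling serve() in another terminal makes http://localhost:8000 available to get_inf_page.py. The timing of the paragraphs may differ from the original server, so the amount received before a timeout may not match the outputs shown in this article.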

Operation check (stops when the time limit is exceeded)

Change the byte_limit line and the @timeout_decorator.timeout line as follows, then check the operation in the same way as above.

    byte_limit = 1000
    @timeout_decorator.timeout(5)

Only the content received within 5 seconds of startup is displayed.

timeout
b'<p>Hello World ! 0</p><p>Hello World ! 1</p><p>Hello World ! 2</p>'

That's all.
