The standard library for handling time in Python is datetime.
Used normally it causes no problems at all, and if you only call it occasionally you don't need to worry about performance.
However, when you need to create datetime objects tens of thousands or tens of millions of times, it becomes a visible bottleneck.
I found that paying attention to one small thing improves performance a little, so I'd like to share it.
The conclusion: when creating datetime objects, pass a timezone from the standard library.
I'm putting the conclusion first; read on if you're interested in the details.
I think it is better to create datetime objects as follows.
The only point is whether or not you specify a timezone. That's all.
from datetime import datetime, timedelta, timezone
import time

# Create the time zone once (UTC+9, named 'JST')
JST = timezone(timedelta(hours=+9), 'JST')

unix_time = time.time()  # any UNIX timestamp (seconds since the epoch)

# GOOD: time zone specified. Fast
datetime.now(JST)
datetime.fromtimestamp(unix_time, JST)

# NG: result depends on the machine's local time zone, and it is slower than specifying one
datetime.now()
datetime.fromtimestamp(unix_time)
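Apart from speed, the two forms also return different kinds of objects: with a tz argument you get an aware datetime carrying its offset, without one you get a naive datetime in whatever the machine's local zone is. A small check, using timestamp 0 purely as an arbitrary example:

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=+9), 'JST')

# With a tz argument: an aware datetime that carries a +09:00 offset
aware = datetime.fromtimestamp(0, JST)
print(aware.isoformat())  # 1970-01-01T09:00:00+09:00
print(aware.tzname())     # JST

# Without one: a naive datetime expressed in the machine's local time zone
naive = datetime.fromtimestamp(0)
print(naive.tzinfo)       # None
```

The aware result is the same on every machine; the naive one changes with the environment's TZ setting.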
Now let's try creating datetime objects in various ways.
I measured the processing time for creating a datetime 10 million times, to see how much performance changes depending on whether a time zone is specified. I hope you find it useful as a reference.
Execution environment:
- OS: Mac
- CPU: Core i5 1.6 GHz
- Memory: 4 GB DDR3
- Language: Python 3.6.2
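The measurements below use the shell's `time` on a whole script. If you want to compare variants inside one process instead, the standard `timeit` module can do it; this is a minimal sketch, where `compare` and its parameters are just illustrative names, and the numbers it prints will of course depend on your machine:

```python
import timeit
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=+9), 'JST')

def compare(number=1_000_000, repeat=3):
    """Best-of-`repeat` seconds for `number` calls of each variant."""
    env = {"datetime": datetime, "JST": JST}
    return {
        "now(JST)": min(timeit.repeat("datetime.now(JST)", globals=env,
                                      number=number, repeat=repeat)),
        "now()": min(timeit.repeat("datetime.now()", globals=env,
                                   number=number, repeat=repeat)),
    }

if __name__ == "__main__":
    for label, seconds in compare().items():
        print(f"{label:>9}: {seconds:.3f} s")
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes.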
This is the fastest pattern (as far as I know).
zikan1.py
from datetime import datetime, timedelta, timezone
JST = timezone(timedelta(hours=+9), 'JST')
for _ in range(10000000):
    datetime.now(JST)
$ time python zikan1.py
real 0m7.581s
user 0m7.167s
sys 0m0.114s
Looping 10 million times took about 7 seconds.
If you don't specify a time zone, it is slightly slower. Only slightly...
zikan2.py
from datetime import datetime
for _ in range(10000000):
    datetime.now()
$ time python zikan2.py
real 0m9.609s
user 0m9.149s
sys 0m0.111s
About 9 seconds. A little slower.
pytz is a library that was commonly used to specify time zones in Python 2, because Python 2 does not have the timezone class...
zikan3.py
import pytz
from datetime import datetime
# third-party package (pip install pytz)
JST = pytz.timezone('Asia/Tokyo')
# performance test
for _ in range(10000000):
    datetime.now(JST)
$ time python zikan3.py
real 1m9.173s
user 1m6.999s
sys 0m0.584s
It was much slower than I expected. We will discuss this later.
While I was at it, I also benchmarked the Python 2 approach.
Python 2 provides only tzinfo, an abstract base class for time zones, so you have to implement the concrete class yourself. Tedious. That hassle may be why pytz became popular.
zikan4.py
from datetime import datetime, timedelta, tzinfo
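class JST(tzinfo):
    """Fixed UTC+9 zone, implemented by hand."""

    def utcoffset(self, dt):
        return timedelta(hours=9)

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return 'JST'

for _ in range(10000000):
    datetime.now(JST())  # note: a new JST instance is created on every iteration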
$ time python zikan4.py
real 0m55.416s
user 0m51.131s
sys 0m1.532s
Slow...
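One detail worth noting is that zikan4.py constructs a new JST() on every iteration, while zikan1.py reuses one object. A sketch of how you could separate that cost from the rest, assuming the same hand-written class; `JST_INSTANCE` and `per_iteration` are hypothetical names, and I have not measured this on the environment above:

```python
import timeit
from datetime import datetime, timedelta, tzinfo

class JST(tzinfo):
    # Same fixed-offset zone as zikan4.py
    def utcoffset(self, dt):
        return timedelta(hours=9)

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return 'JST'

JST_INSTANCE = JST()  # created once and reused, as zikan1.py does with its timezone

def per_iteration(number=100_000):
    """Best-of-3 seconds for `number` calls: new instance each time vs. reused."""
    new_each_time = min(timeit.repeat(lambda: datetime.now(JST()),
                                      number=number, repeat=3))
    reused = min(timeit.repeat(lambda: datetime.now(JST_INSTANCE),
                               number=number, repeat=3))
    return new_each_time, reused
```

Even with the instance reused, the default fromutc still has to call the Python-level utcoffset() and dst() methods on every conversion, so some overhead remains.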
Ordered by user time, the result is: time zone specified (7 s) < no time zone (9 s) < Python 2 style (51 s) < pytz (66 s).
Both the standard library's timezone class and pytz's time zone class are implementations of tzinfo. Yet their performance differs like night and day.
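To put those totals on a per-call scale, divide each user time by the 10 million iterations. The figures below are copied from the `time` output above; the per-call numbers include the loop overhead, not just the datetime call:

```python
CALLS = 10_000_000  # each script created a datetime 10 million times

# user times (seconds) from the `time` output of each run above
USER_SECONDS = {
    "timezone specified (zikan1.py)": 7.167,
    "no timezone (zikan2.py)": 9.149,
    "hand-written tzinfo (zikan4.py)": 51.131,
    "pytz (zikan3.py)": 66.999,
}

def ns_per_call(seconds, calls=CALLS):
    """Average cost of one loop iteration in nanoseconds."""
    return seconds / calls * 1e9

for label, seconds in USER_SECONDS.items():
    print(f"{label}: {ns_per_call(seconds):.0f} ns per call")
```

That works out to roughly 0.7 µs, 0.9 µs, 5.1 µs and 6.7 µs per iteration respectively.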
Why?
I don't have a definitive answer, but profiling makes the difference between the two clear.
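The profiles below come from `python -m cProfile -s cumtime`. If you prefer to profile just one piece of code from inside a script, `cProfile.Profile` together with `pstats` produces the same kind of report; a sketch with a smaller, arbitrary iteration count (`workload` and `profile_report` are illustrative names):

```python
import cProfile
import io
import pstats
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=+9), 'JST')

def workload(n):
    # The code under test: the same loop as zikan1.py, with a configurable count
    for _ in range(n):
        datetime.now(JST)

def profile_report(n=100_000, top=5):
    """Profile workload(n) and return the report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    workload(n)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()

if __name__ == "__main__":
    print(profile_report())
```

Swapping the loop body for `datetime.now(pytz.timezone('Asia/Tokyo'))` (with pytz installed) should show the extra fromutc and replace rows discussed below.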
$ python -m cProfile -s cumtime zikan1.py
10001072 function calls (10001061 primitive calls) in 9.107 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
2/1 0.000 0.000 9.107 9.107 {built-in method builtins.exec}
1 3.149 3.149 9.107 9.107 zikan1.py:1(<module>)
10000000 5.950 0.000 5.950 0.000 {built-in method now}
3/1 0.000 0.000 0.009 0.009 <frozen importlib._bootstrap>:958(_find_and_load)
3/1 0.000 0.000 0.009 0.009 <frozen importlib._bootstrap>:931(_find_and_load_unlocked)
$ python -m cProfile -s cumtime zikan3.py
70022021 function calls (70021903 primitive calls) in 83.138 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
14/1 0.001 0.000 83.138 83.138 {built-in method builtins.exec}
1 3.185 3.185 83.138 83.138 zikan5.py:1(<module>)
10000000 9.225 0.000 79.868 0.000 {built-in method now}
10000000 17.973 0.000 70.643 0.000 tzinfo.py:179(fromutc)
20000000 43.347 0.000 43.347 0.000 {method 'replace' of 'datetime.datetime' objects}
What I want to draw attention to are these lines from the zikan3.py run:
10000000 17.973 0.000 70.643 0.000 tzinfo.py:179(fromutc)
20000000 43.347 0.000 43.347 0.000 {method 'replace' of 'datetime.datetime' objects}
pytz calls datetime.replace() 20 million times, that is, twice for each of the 10 million now() calls. On top of that, its Python-level fromutc function runs on every call as well.
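A zone with daylight-saving or historical offset changes has to look the offset up in a table of transitions, which is the kind of Python-level work the profile shows. The following is a minimal sketch of that shape, assuming a sorted table of UTC transition times; `ToyTransitionZone` and its parameters are hypothetical, and this is NOT pytz's actual code, only an illustration of where two replace() calls per conversion can come from:

```python
from bisect import bisect_right
from datetime import datetime, timedelta, tzinfo

class ToyTransitionZone(tzinfo):
    """Hypothetical table-driven zone (illustration only, not pytz)."""

    def __init__(self, utc_transitions, offsets, names):
        # utc_transitions: sorted naive UTC datetimes where a new offset starts
        # offsets[i] / names[i]: offset and abbreviation in effect from utc_transitions[i]
        self._utc = utc_transitions
        self._offsets = offsets
        self._names = names
        # The same transition instants expressed as local wall-clock times
        self._local = [t + o for t, o in zip(utc_transitions, offsets)]

    def _local_index(self, dt):
        # Simplification: ignores ambiguous or skipped wall-clock times
        return max(0, bisect_right(self._local, dt.replace(tzinfo=None)) - 1)

    def utcoffset(self, dt):
        return self._offsets[self._local_index(dt)]

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return self._names[self._local_index(dt)]

    def fromutc(self, dt):
        # datetime.now(tz) passes the current UTC time with tzinfo=tz attached
        naive_utc = dt.replace(tzinfo=None)                           # replace() no. 1
        idx = max(0, bisect_right(self._utc, naive_utc) - 1)          # table lookup
        return (naive_utc + self._offsets[idx]).replace(tzinfo=self)  # replace() no. 2

# A made-up zone: UTC+9 from 1970, UTC+10 from 2000
toy = ToyTransitionZone(
    [datetime(1970, 1, 1), datetime(2000, 1, 1)],
    [timedelta(hours=9), timedelta(hours=10)],
    ["T9", "T10"],
)
print(datetime.now(toy).tzname())  # T10
```

Every datetime.now(toy) call runs fromutc in Python, does a bisect, and calls replace() twice, which is the pattern the pytz profile suggests.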
In other words, something like tz.fromutc(datetime.utcnow().replace(tzinfo=tz)) is being executed.
That is: create a datetime for the current UTC time => attach the time zone => convert it to that zone's local time.
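The same three steps can be written out explicitly with the standard library's timezone, using a fixed UTC instant instead of the current time so the result is reproducible (2017-01-01 00:00 UTC is an arbitrary choice):

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=+9), 'JST')

# Step 1: a naive datetime holding a UTC wall-clock reading
utc_naive = datetime(2017, 1, 1, 0, 0)
# Step 2: attach the target zone; fromutc() requires dt.tzinfo to be that zone
attached = utc_naive.replace(tzinfo=JST)
# Step 3: convert the UTC reading into JST wall-clock time
local = JST.fromutc(attached)
print(local.isoformat())  # 2017-01-01T09:00:00+09:00

# fromtimestamp() with a tz argument yields an equal result for the same instant
ts = datetime(2017, 1, 1, tzinfo=timezone.utc).timestamp()
print(datetime.fromtimestamp(ts, JST) == local)  # True
```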
On the other hand, when a standard library timezone is passed as the argument, the profile attributes all of the work to {built-in method now}. I don't know the internal details, but it clearly handles the conversion efficiently. (In CPython, datetime.timezone is implemented in C, so its conversion runs without any Python-level calls that cProfile could list separately.)
So, in short: use the standard library's timezone! That's all!
If you notice any inaccuracies, including typos, please don't hesitate to point them out.