There are several ways to check whether a string starts with a given prefix in Python. This article compares the speed of three typical ones: a regular expression, the `startswith` method, and slicing.
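Here is a minimal sketch of the three approaches written as standalone functions. The names `prefix_regex`, `prefix_startswith`, and `prefix_slice` are hypothetical helpers. They are not part of the benchmark program below.

The `re.escape` call matters. Without it, a prefix such as `a.b` would be treated as a pattern, and it would also match `axb_4`.

```python
import re


def prefix_regex(word, prefix):
    # re.match only tries to match at the beginning of the string;
    # re.escape makes metacharacters such as '.' in the prefix literal.
    return re.match(re.escape(prefix), word) is not None


def prefix_startswith(word, prefix):
    # Dedicated built-in method for prefix checks.
    return word.startswith(prefix)


def prefix_slice(word, prefix):
    # Compare the first len(prefix) characters with the prefix.
    # If word is shorter than prefix, the slice is shorter too, so the result is False.
    return word[:len(prefix)] == prefix


samples = [("foo_1", "foo"), ("bar_2", "foo"), ("fo", "foo"), ("a.b_3", "a.b"), ("axb_4", "a.b")]

for word, prefix in samples:
    # All three functions give the same answer for every sample.
    print(word, prefix, prefix_regex(word, prefix), prefix_startswith(word, prefix), prefix_slice(word, prefix))
```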
The measurements were run in the following environment:
| item | value | 
|---|---|
| Python Version | 3.8.2 | 
| OS | Ubuntu 20.04 | 
The measurements use the program below. The following table lists the role of each variable and function. Change the variables to match the characteristic you want to measure.
| Variable/function | Description |
|---|---|
| `time_logging` | Decorator that measures and prints execution time |
| `compare_regex` | Checks each string in the argument list against a regular expression |
| `compare_startswith` | Checks each string in the argument list with the `startswith` method |
| `compare_str` | Checks whether the first `len(target_word)` characters of each string in the argument list equal `target_word` |
| `target_word` | Prefix to look for |
| `match_word` | String prefix that matches `target_word` |
| `not_match_word` | String prefix that does not match `target_word` |
| `compare_word_num` | Total number of strings to compare |
| `match_rate` | Approximate percentage of strings that start with `match_word` |
| `compare_func` | Function to measure |
| `main` | Entry point: builds the word list and calls `compare_func` |
```python
import functools
import re
import time


def time_logging(func):
    """Decorator that prints how long the wrapped function took."""
    @functools.wraps(func)
    def deco(*args, **kwargs):
        stime = time.perf_counter()
        res = func(*args, **kwargs)
        etime = time.perf_counter()
        print(f'Finish {func.__name__}. Takes {round(etime - stime, 3)}s.', flush=True)
        return res
    return deco


@time_logging
def compare_regex(compare_words):
    # re.escape makes target_word match literally even if it contains
    # regex metacharacters such as '.' or '*'.
    pattern = re.compile('^' + re.escape(target_word))
    for word in compare_words:
        if pattern.match(word):
            pass


@time_logging
def compare_startswith(compare_words):
    for word in compare_words:
        if word.startswith(target_word):
            pass


@time_logging
def compare_str(compare_words):
    length = len(target_word)
    for word in compare_words:
        if word[:length] == target_word:
            pass


target_word = 'foo'
match_word = target_word
not_match_word = 'bar'
compare_word_num = 100_000_000  # note: a list this large needs several GB of memory
match_rate = 50  # percentage of words that start with match_word
compare_func = compare_regex  # swap for compare_startswith or compare_str


def main():
    compare_words = []
    for index in range(compare_word_num):
        # '<' (not '<=') so that exactly match_rate percent of the words match
        if index % 100 < match_rate:
            compare_words.append(f'{match_word}_{index}')
        else:
            compare_words.append(f'{not_match_word}_{index}')
    compare_func(compare_words)


if __name__ == '__main__':
    main()
```
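The program above builds one very large list and times a single run of each function. At 100,000,000 strings, the list alone needs several gigabytes of memory.

A lighter alternative uses the standard `timeit` module, repeats each measurement, and keeps the fastest run. It assumes that a much smaller hypothetical workload of 100,000 strings is representative enough. The names `by_regex`, `by_startswith`, `by_slice`, and `best_of` are made up for this sketch.

```python
import re
import timeit

TARGET_WORD = "foo"  # same prefix as target_word in the program above
# Hypothetical workload: 100,000 strings, half of them starting with TARGET_WORD.
WORDS = [f"foo_{i}" if i % 2 == 0 else f"bar_{i}" for i in range(100_000)]

PATTERN = re.compile("^" + re.escape(TARGET_WORD))
LENGTH = len(TARGET_WORD)


def by_regex():
    # Count matches so the three methods can be checked against each other.
    return sum(1 for w in WORDS if PATTERN.match(w))


def by_startswith():
    return sum(1 for w in WORDS if w.startswith(TARGET_WORD))


def by_slice():
    return sum(1 for w in WORDS if w[:LENGTH] == TARGET_WORD)


def best_of(func, repeat=5):
    # timeit.repeat calls func once per trial (number=1) and returns one
    # duration per trial; the minimum is the least noisy estimate.
    return min(timeit.repeat(func, repeat=repeat, number=1))


if __name__ == "__main__":
    for func in (by_regex, by_startswith, by_slice):
        print(f"{func.__name__}: {best_of(func):.4f}s")
```

Taking the minimum rather than the mean follows the usual `timeit` advice. Slower trials mostly reflect interference from other processes, not the code being measured.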
The relative speed may depend on the length of the prefix being compared. I therefore measured the execution time of `compare_regex`, `compare_startswith`, and `compare_str` with `target_word` set to 5, 10, 50, 100, and 500 characters.
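The article does not show how the 5- to 500-character values of `target_word` were produced. The sketch below makes the following assumptions:

- The prefix is one repeated character (`'a' * n`).
- Non-matching words start with `'b' * n`.
- `WORD_COUNT` is a hypothetical value, much smaller than the original, so the sweep finishes quickly.

`make_words` and `time_all` are illustrative helpers, not part of the original program.

```python
import re
import timeit
from typing import Dict, List

LENGTHS = [5, 10, 50, 100, 500]  # prefix lengths used in the article
WORD_COUNT = 100_000  # hypothetical; far smaller than the article's 100_000_000


def make_words(prefix: str, count: int) -> List[str]:
    # Even indices start with the prefix, odd indices with a same-length
    # string of 'b', so half of the words match.
    other = "b" * len(prefix)
    return [f"{prefix if i % 2 == 0 else other}_{i}" for i in range(count)]


def time_all(prefix: str, words: List[str]) -> Dict[str, float]:
    pattern = re.compile(re.escape(prefix))
    length = len(prefix)
    candidates = {
        "regex": lambda: [w for w in words if pattern.match(w)],
        "startswith": lambda: [w for w in words if w.startswith(prefix)],
        "slice": lambda: [w for w in words if w[:length] == prefix],
    }
    # Best of three single runs per method.
    return {name: min(timeit.repeat(fn, repeat=3, number=1)) for name, fn in candidates.items()}


if __name__ == "__main__":
    for n in LENGTHS:
        prefix = "a" * n  # hypothetical n-character prefix
        timings = time_all(prefix, make_words(prefix, WORD_COUNT))
        print(n, {name: round(t, 4) for name, t in timings.items()})
```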
Execution time (seconds):
| Function \ `target_word` length (characters) | 5 | 10 | 50 | 100 | 500 |
|---|---|---|---|---|---|
| compare_regex | 11.617 | 12.044 | 16.126 | 18.837 | 66.463 | 
| compare_startswith | 6.647 | 6.401 | 6.241 | 6.297 | 6.931 | 
| compare_str | 5.941 | 5.993 | 4.87 | 5.449 | 8.875 | 
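To put the table in relative terms, the following snippet computes how many times slower the regular expression was than the other two methods. It uses only the values measured above.

```python
# Timings in seconds, copied from the results table.
lengths = [5, 10, 50, 100, 500]
regex = [11.617, 12.044, 16.126, 18.837, 66.463]
startswith = [6.647, 6.401, 6.241, 6.297, 6.931]
slicing = [5.941, 5.993, 4.87, 5.449, 8.875]

for n, r, s, c in zip(lengths, regex, startswith, slicing):
    # Ratio > 1 means the regex took that many times longer.
    print(f"{n:>3} chars: regex/startswith = {r / s:.1f}x, regex/slice = {r / c:.1f}x")
```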

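In terms of speed, use `startswith` or slicing (`word[:length] == target_word`) whatever the prefix length. In every case they were roughly 2 to 10 times faster than the regular expression, whose execution time grew with the prefix length. Slicing was slightly faster up to 100 characters. However, `startswith` was the least affected by prefix length, so it is the one I recommend most.
It is also my favorite in terms of readability.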
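Besides speed and readability, `str.startswith` offers two conveniences that the slicing approach lacks:

- It accepts a tuple of candidate prefixes.
- It accepts optional start and end positions.

A short example (the word `foo_123` is arbitrary):

```python
word = "foo_123"

# A tuple matches if any of its prefixes matches.
print(word.startswith(("bar", "foo")))  # True
# The check begins at index 4, so "123" is compared with word[4:].
print(word.startswith("123", 4))        # True
# Only word[0:2] == "fo" is examined, which is shorter than "foo".
print(word.startswith("foo", 0, 2))     # False
```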