I tried OmniSci (formerly MapD), one of the pioneering GPU-accelerated SQL databases, on CentOS 7, using the sample database flights_2008_7M. The GPU is a PNY RTX 2080 Ti (11 GB).
The host machine has a 7th-generation Core i5, 16 GB of DDR4 RAM, and a 2 TB Crucial SATA SSD.
The installation instructions for OmniSci are here: https://docs.omnisci.com/v4.4.1/4_centos7-yum-gpu-ce-recipe.html
The database contains about 7 million flight records. The query extracts the rows whose plane model is a Boeing 777 or 737, using the prefix LIKE patterns '777%' and '737%'.
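To make the WHERE clause concrete, here is a small pure-Python sketch of how a prefix LIKE pattern matches: '%' stands for any run of characters (including none) and '_' for exactly one character. It assumes case-sensitive matching and no ESCAPE clause; the plane_model values are hypothetical samples, and `like_to_regex` / `sql_like` are illustrative helpers, not part of OmniSci or pymapd.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # any run of characters, possibly empty
        elif ch == "_":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts), re.DOTALL)


def sql_like(value, pattern):
    """True if the whole value matches the LIKE pattern (case-sensitive, no ESCAPE)."""
    return like_to_regex(pattern).fullmatch(value) is not None


# Hypothetical plane_model values, for illustration only
models = ["737-7H4", "777-222", "A320-232", "757-2B7", "737-3Q8"]

# Equivalent of: WHERE plane_model LIKE '777%' OR plane_model LIKE '737%'
selected = [m for m in models if sql_like(m, "777%") or sql_like(m, "737%")]
print(selected)
```

Because both patterns only fix a prefix, the same filter could be written with `str.startswith(("777", "737"))`; the regex translation is shown so that '_' and mid-string '%' are handled too.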
The code I used (Python 3):
omnisci-test.py
```python
from pymapd import connect
import pandas as pd
import time

start = time.time()

# Connect to the local OmniSci server (replace the password with your own)
con = connect(user="admin", password="PasswordHere", host="localhost", dbname="omnisci")

# Fetch up to 1,000 flights whose plane model starts with 777 or 737
query = ("SELECT plane_model, uniquecarrier, deptime, arrtime FROM flights_2008_7M "
         "WHERE plane_model LIKE '777%' OR plane_model LIKE '737%' LIMIT 1000")
df = pd.read_sql(query, con)
print(df.to_string())

# Save the same table as text (this write is included in the measured time)
path = './omnisci-data.txt'
with open(path, 'w') as f:
    print(df.to_string(), file=f)

end = time.time()
print("Time elapsed: " + str(end - start))
con.close()
```
This measures the execution time of a query that combines two prefix LIKE conditions with OR. It took 0.324 seconds, because the measured time also includes writing the result to the SSD (as well as connecting and printing). Measuring only the part before the SSD write gave 0.2803 seconds.
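One way to report the query and the file write separately from a single run is to time each phase on its own. The sketch below is an assumption-laden illustration: `fake_query` is a hypothetical stand-in for `pd.read_sql(...)` plus `df.to_string()`, and the output goes to a temporary file instead of `./omnisci-data.txt`. It also calls `os.fsync`, since without it the "write" time may mostly reflect copying into the OS page cache rather than the SSD itself.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock seconds spent inside the with-block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def fake_query():
    # Hypothetical stand-in for pd.read_sql(...) followed by df.to_string()
    return "\n".join(f"{i} 737-7H4 WN" for i in range(1000))


timings = {}
with timed("query", timings):
    text = fake_query()

path = os.path.join(tempfile.mkdtemp(), "omnisci-data.txt")
with timed("write", timings):
    with open(path, "w") as f:
        print(text, file=f)
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to flush its cache to the disk

timings["total"] = timings["query"] + timings["write"]
for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f} s")
```

In the real script, the `with timed("query", ...)` block would wrap the `connect` and `pd.read_sql` calls, so the two numbers correspond to the "before writing" and "including writing" figures above.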
Execution result:
```
...
985     737-7H4            WN     1029     1126
986     737-7H4            WN     1720     1817
987     737-5H4            WN      754     1154
988     737-3H4            WN     1408     1511
989     737-7H4            WN     2039     2141
990     737-76Q            WN      622      813
991     737-5H4            WN     2125     2226
992     737-7H4            WN     2040     2156
993     737-3H4            WN      757      945
994     737-7H4            WN      641      739
995     737-3Q8            WN     1617     1903
996     737-3A4            WN     1304     1455
997     737-7H4            WN      737      843
998     737-7H4            WN     1952       42
999     737-5H4            WN     1729     1925
Time elapsed: 0.3249530792236328
```
Next, I tried a single LIKE condition (Boeing 777 only), changing only the df line:
```python
df = pd.read_sql("SELECT plane_model, uniquecarrier, deptime, arrtime from flights_2008_7M where plane_model like '777%' limit 1000", con)
```
This took 0.2931640148162842 seconds measured before writing to the SSD, and 0.29114460945129395 seconds including the SSD write. The execution time barely changes between one LIKE condition and two.
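Since the one-LIKE and two-LIKE timings differ by only a few hundredths of a second, single runs are easily dominated by noise. A sketch of a fairer comparison runs each query a few times after a warm-up and compares medians; the warm-up matters because OmniSci typically compiles a query and loads the needed columns into GPU memory on its first execution. `one_like` and `two_likes` below are hypothetical CPU stand-ins for the two `pd.read_sql(...)` calls, and `benchmark` is an illustrative helper.

```python
import statistics
import time


def benchmark(fn, repeat=5, warmup=1):
    """Call fn `warmup` times untimed, then `repeat` times timed; return the durations."""
    for _ in range(warmup):
        fn()
    durations = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def one_like():
    # Hypothetical stand-in for the single-LIKE pd.read_sql(...) call
    return [i for i in range(20_000) if str(i).startswith("777")]


def two_likes():
    # Hypothetical stand-in for the two-LIKE (OR) pd.read_sql(...) call
    return [i for i in range(20_000) if str(i).startswith(("777", "737"))]


for name, fn in [("one LIKE", one_like), ("two LIKEs", two_likes)]:
    runs = benchmark(fn)
    print(f"{name}: median {statistics.median(runs):.4f} s, min {min(runs):.4f} s")
```

Reporting the median (or the minimum) of several runs makes a difference of a few milliseconds easier to tell apart from run-to-run variation.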
Tsubasa Kato, CEO, Inspire Search Co., Ltd.