Create "operation log" CSV formatting tool in 5 days with Python Pandas PyInstaller

A record of making one tool in 5 days without punch lines or mountains

I want to do something new. What you can use at work anyway → I wonder if I should shape the OPLOG.

Oplog is our product term. It is an operation log that contains extra data such as date and time, operator, operation function in csv. Sometimes I get it when something goes wrong. "Operation log".

Pursuing your own happiness is the best motivation. For the time being, I'm happy just to sort by date and time. I decided to start at my own pace.

This is our calendar: Works Human Intelligence Advent Calendar 2020 that embodies Develop fun! Works Human Intelligence # 2 Advent Calendar 2020 that embodies Develop fun!

What to use

Candidates are around here

1. Go because I have never touched it

https://qiita.com/noborus/items/f253961cca6f4465f20c https://golang.org/

2. Data processing seems to be python, so Python

https://qiita.com/ysdyt/items/9ccca82fc5b504e7913a

3. Java normally

https://www.delftstack.com/ja/howto/java/how-to-sort-objects-in-arraylist-by-date-in-java/

That's all for the first day.

the 2nd day

https://www.python.org/downloads/ I installed Python 3.9 for the time being.

https://qiita.com/AI_Academy/items/b97b2178b4d10abe0adb I don't really want to use it for 5 hours, but let's take a look at it.

test.py


#This line is a comment. This line will not be executed.

print("Hello, Python") 

#This line is a comment. This line will not be executed.
#This line is a comment. This line will not be executed.

I got angry when I made and typed test.py at the command prompt

C:\workspace\Python\playground>test.py
SyntaxError: Non-UTF-8 code starting with '\x82' in file C:\Users\works\Desktop\workspace\Python\playground\test.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Is it an amateur? Resave in UTF-8 and Hello, Python succeed from?

py -m pip install pandas

https://data-flair.training/blogs/install-pandas-on-windows/ Did you install Pandas now?

https://note.nkmk.me/python-pandas-to-csv/ I want to try this area for the time being, but before that, what is a good python editor? → I put VS code. https://code.visualstudio.com/

Without thinking

firstpandas.py


import pandas as pd

I tried to write, but I got an error

RuntimeError: The current Numpy installation ('C:\\Users\\works\\AppData\\Local\\Programs\\Python\\Python39\\lib\\site-packages\\numpy\\__init__.py') fails to pass a sanity check due to a bug in the windows runtime. See this issue for more information: https://tinyurl.com/y3dm3h86

Third day

RuntimeError: The current Numpy installation (Omission) fails to pass a sanity check due to a bug in the windows runtime. See this issue for more information: https://tinyurl.com/y3dm3h86

Investigate the cause of. See for more information below. https://developercommunity.visualstudio.com/content/problem/1207405/fmod-after-an-update-to-windows-2004-is-causing-a.html use numpy==1.19.3 works

https://qiita.com/bear_montblanc/items/b4b75dfd77da98076da5 If you google this I don't know much about what I believe → The error disappeared. Eh.

I want to sort CSV

https://note.nkmk.me/python-pandas-sort-values-sort-index/ Return to. As per the tutorial

firstpandas.py


import pandas as pd
df = pd.read_csv('sample_pandas_normal.csv', index_col=0)
print(df)

Then run.

C:\workspaces\playground>firstpandas.py 
         age state  point 
name 
Alice     24    NY     64 
Bob       42    CA     92 
Charlie   18    CA     70 
Dave      68    TX     70 
Ellen     24    CA     88 
Frank     30    NY     57 

Oh, it was. Let's sort this. Is it okay to do this?

firstpandas.py


import pandas as pd 
df = pd.read_csv('sample_pandas_normal.csv', index_col=0) 
df.sort_values('age')
print(df)

As a result, it doesn't change. It's not good to do it with a sense. .. First of all, the official document is around here https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html

once again. Writing from the conclusion, this gave me an image.

firstpandas.py


import pandas as pd

df = pd.read_csv('sample_pandas_normal.csv')
print(df)

df_s = df.sort_values('age')
print(df_s)
C:\workspaces\playground>firstpandas.py
      name  age state  point
0    Alice   24    NY     64
1      Bob   42    CA     92
2  Charlie   18    CA     70
3     Dave   68    TX     70
4    Ellen   24    CA     88
5    Frank   30    NY     57
      name  age state  point
2  Charlie   18    CA     70
0    Alice   24    NY     64
4    Ellen   24    CA     88
5    Frank   30    NY     57
1      Bob   42    CA     92
3     Dave   68    TX     70

It ’s better to review the Python grammar.

Python3 cheat sheet (basic) Python cheat sheet basic elements (@IT) Translated Pandas Official Cheat Sheet

Once here.

Try sorting by date

Now, I changed the CSV used a little and added the date column.

sample_pandas_date.csv


name,age,state,point,birthday
Alice,24,NY,64,1996/1/2
Bob,42,CA,92,1978/2/2
Charlie,18,CA,70,2002/3/4
Dave,68,TX,70,1952/1/1
Ellen,24,CA,88,1996/1/5
Frank,30,NY,57,1990/5/15

firstpandas.py


import pandas as pd

df = pd.read_csv('sample_pandas_date.csv')
print(df)

df_s = df.sort_values('birthday')
print(df_s)

What happens with this is below the result.

C:\workspaces\playground>firstpandas.py
      name  age state  point   birthday
0    Alice   24    NY     64   1996/1/2
1      Bob   42    CA     92   1978/2/2
2  Charlie   18    CA     70   2002/3/4
3     Dave   68    TX     70   1952/1/1
4    Ellen   24    CA     88   1996/1/5
5    Frank   30    NY     57  1990/5/15
      name  age state  point   birthday
3     Dave   68    TX     70   1952/1/1
1      Bob   42    CA     92   1978/2/2
5    Frank   30    NY     57  1990/5/15
0    Alice   24    NY     64   1996/1/2
4    Ellen   24    CA     88   1996/1/5
2  Charlie   18    CA     70   2002/3/4

It's becoming like that.

It's time to use the real operation log. When I put the actual CSV, I got an error. This is the end of the excitement.

C:\workspaces\playground>firstpandas.py
Traceback (most recent call last):
  File "C:\workspaces\playground\firstpandas.py", line 3, in <module>
    df = pd.read_csv('oplog20201112.csv')
  File "C:\Users\works\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\works\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "C:\Users\works\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\parsers.py", line 948, in __init__
    self._make_engine(self.engine)
  File "C:\Users\works\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\parsers.py", line 1180, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "C:\Users\works\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\parsers.py", line 2010, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas\_libs\parsers.pyx", line 537, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas\_libs\parsers.pyx", line 740, in pandas._libs.parsers.TextReader._get_header
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x83 in position 0: invalid start byte

Day 4

I got an error when I read the CSV

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x83 in position 0: invalid start byte

CSV is like this (masked)

oplog20201112.csv


User ID,client name,Windows login ID,Terminal ID,IP Address,MAC Address,Domain name,Login time,Logout time,Login status,action,Function name,Executable file name(Shell),argument(Command line),Execution time,Execution state
"all","client-name","works","client-name","xx.xx.xx.xx","xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx     xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx  xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx ","123-456","2020/11/12 13:18:56","2020/11/12 13:23:38","success","Program","Job monitoring","Job.exe","-context:*****","2020/11/12 13:19:23","success"
"all","client-name","works","client-name","xx.xx.xx.xx","xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx     xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx  xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx ","123-456","2020/11/12 13:18:56","2020/11/12 13:23:38","success","Program","Main menu","Companyxx.exe","-cfg","2020/11/12 13:18:56","success"
"all","client-name","works","client-name","xx.xx.xx.xx","xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx     xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx  xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx ","123-456","2020/11/12 13:18:56","2020/11/12 13:23:38","success","Program","System setting","Maintenance.exe","-context:*****","2020/11/12 13:19:19","success"
"all","client-name","works","client-name","xx.xx.xx.xx","xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx     xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx  xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx ","123-456","2020/11/12 13:18:56","2020/11/12 13:23:38","success","Batch job","mst13","svc.sh","userid/password","2020/11/12 13:19:32","success"
"all","client-name","works","client-name","xx.xx.xx.xx","xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx     xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx  xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx ","123-456","2020/11/12 13:18:56","2020/11/12 13:23:38","success","Batch job","mst13","test.sh","userid/password 0 0","2020/11/12 13:19:29","success"
"all","client-name","works","client-name","xx.xx.xx.xx","xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx     xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx  xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx ","123-456","2020/11/12 13:18:56","2020/11/12 13:23:38","success","Batch job","mst13","test.sh","userid/password 0 0","2020/11/12 13:19:30","success"
"all","client-name","works","client-name","xx.xx.xx.xx","xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx     xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx  xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx ","123-456","2020/11/12 13:18:56","2020/11/12 13:23:38","success","Batch job","mst13","out.sh","userid/password %JAVA% 0","2020/11/12 13:19:31","success"
"all","client-name","works","client-name","xx.xx.xx.xx","xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx     xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx  xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx ","123-456","2020/11/12 13:18:56","2020/11/12 13:23:38","success","Program","Job management","quevw.exe","-context:*****","2020/11/12 13:19:20","success"
"all","client-name","works","client-name","xx.xx.xx.xx","xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx     xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx  xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx ","123-456","2020/11/12 13:18:56","2020/11/12 13:23:38","success","Program","batchjob jobid:498298","test.sh","userid/password 0 0","2020/11/12 13:19:56","success"
"all","client-name","works","client-name","xx.xx.xx.xx","xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx     xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx  xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx ","123-456","2020/11/12 13:18:56","2020/11/12 13:23:38","success","Program","batchjob jobid:498301","svc.sh","userid/password","2020/11/12 13:21:39","success"

Well, Japanese was included. Encoding required.

https://techacademy.jp/magazine/21128

It doesn't matter at all

Process data with pandas based on the csv file of the new Python corona

That's why it looks interesting.

Once the text was changed to UTF-8 with an editor for confirmation, the reading was successful. Oh, would you like to sort by date and time column as it is?

firstpandas.py


import pandas as pd

df = pd.read_csv('oplog20201112.csv')
print(df)

df_s = df.sort_values('Execution time')
print(df_s)

As a result, they sorted by execution time quite normally. So that's it. Let's save the CSV with a different name.

Return to this page. https://note.nkmk.me/python-pandas-to-csv/

firstpandas.py


import pandas as pd

df = pd.read_csv('oplog20201112.csv')
# print(df)

df_s = df.sort_values('Execution time')
# print(df_s)

df_s.to_csv('out.csv')

The encoding problem mentioned earlier is dealt with below. https://note.sngklab.jp/?p=435

firstpandas.py


import pandas as pd

df = pd.read_csv('oplog20201112.csv',encoding="SHIFT-JIS")
print(df)

df_s = df.sort_values('Execution time')
print(df_s)

df_s.to_csv('out.csv')

I want to bring the execution time column to the far left

Then how to move the column. https://note.nkmk.me/python-pandas-reindex/

As you can see, it looks like that.

firstpandas.py


import pandas as pd

df = pd.read_csv('oplog20201112.csv',encoding="SHIFT-JIS")
# print(df)
df_s = df.sort_values('Execution time')
# print(df_s)
df_s = df_s.reindex(columns=['Execution time', 
                             'Function name', 
                             'User ID', 
                             'client name', 
                             'Windows login ID', 
                             'Terminal ID', 
                             'Login time', 
                             'Logout time'])
df_s.to_csv('out.csv')

Tool distribution

I also want to see how to make it easier to use. There is something called pyintaller.

pyInstaller # Let's actually create it

set path=C:\Users\works\AppData\Local\Programs\Python\Python39\Scripts;%path%

Please note that is required first. At the command prompt

C:\workspaces\playground>pyinstaller firstpandas.py --onefile
67 INFO: PyInstaller: 4.0
67 INFO: Python: 3.9.0
69 INFO: Platform: Windows-10-10.0.19041-SP0
70 INFO: wrote C:\workspaces\playground\firstpandas.spec
(Omission)

Then, Exe was successfully completed.

image.png

https://news.mynavi.jp/article/python-28/

Create * .bat

test.bat


cd /d %~dp0

call firstpandas.exe

The same folder looks like this image.png

There is a firstpandas.exe that just reads in.csv and makes it out.csv.

The file I actually get is not in.csv, so I wonder if I should change the csv received as an argument with * .bat to in.csv.

Column existence check

By the way, there were several types of file formats, so I changed them.

How to check if a column exists in Pandas https://stackoverflow.com/questions/24870306/how-to-check-if-a-column-exists-in-pandas

firstpandas.py


import pandas as pd

df = pd.read_csv('oplog20201112.csv',encoding="SHIFT-JIS")
# print(df)

if 'Execution time' in df:
    df_s = df.sort_values('Execution time')
    df_s = df_s.reindex(columns=['Execution time', 
                                 'Function name', 
                                 'User ID', 
                                 'client name', 
                                 'Windows login ID', 
                                 'Terminal ID', 
                                 'Login time', 
                                 'Logout time'])
if 'PRC_DATE' in df:
    df_s = df.sort_values('PRC_DATE')
    df_s = df_s.reindex(columns=['PRC_DATE', 
                                 'DETAIL1', 
                                 'USERID', 
                                 'TERM_ID'])


df_s.to_csv('out.csv')

File encoding

The following may appear.

UnicodeDecodeError: 'shift_jis' codec can't decode byte 0x87 in position 22224: illegal multibyte sequence

I thought,

Points to note when letting pandas read csv of excel output

https://minus9d.hatenablog.com/entry/2015/07/30/225841 https://stackoverflow.com/questions/6729016/decoding-shift-jis-illegal-multibyte-sequence

The encoding is further changed according to.

firstpandas.py


import pandas as pd

# df = pd.read_csv('in.csv',encoding="SHIFT-JIS")
df = pd.read_csv('in.csv',encoding="shift_jisx0213")
# print(df)

if 'Execution time' in df:
    df_s = df.sort_values('Execution time')
    df_s = df_s.reindex(columns=['Execution time', 
                                 'Function name', 
                                 'User ID', 
                                 'client name', 
                                 'Windows login ID', 
                                 'Terminal ID', 
                                 'Login time', 
                                 'Logout time'])
if 'PRC_DATE' in df:
    df_s = df.sort_values('PRC_DATE')
    df_s = df_s.reindex(columns=['PRC_DATE', 
                                 'DETAIL1', 
                                 'USERID', 
                                 'TERM_ID'])


df_s.to_csv('out.csv')

Other reference https://techacademy.jp/magazine/23367

Finally

formatter.bat


cd /d %~dp0

copy %1 in.csv

call ofmt.exe

echo "see out.csv!"

pause

It looks like this image.png

Recommended Posts

Create "operation log" CSV formatting tool in 5 days with Python Pandas PyInstaller
Make a CSV formatting tool with Python Pandas PyInstaller
Read csv with python pandas
Make apache log csv with python
Load csv with duplicate columns in pandas
Automatic operation of Chrome with Python + Selenium + pandas
Read Python csv data with Pandas ⇒ Graph with Matplotlib
Create a virtual environment with conda in Python
Log in to Yahoo Business with Selenium Python
Create an image with characters in python (Japanese)
Create a new page in confluence with Python
Csv in python
Create an exe file that works in a Windows environment without Python with PyInstaller
[Python] Read Japanese csv with pandas without garbled characters (and extract columns written in Japanese)
Create Amazon Linux with AWS EC2 and log in
How to convert JSON file to CSV file with Python Pandas
Output log in JSON format with Python standard logging
[Xlsxwriter] Create conditional formatting Excel sheet with pandas + xlsxwriter [pandas] Memo
[Python] A memo to write CSV vertically with Pandas
Process csv data with python (count processing using pandas)
Create a fake Minecraft server in Python with Quarry
Create SpatiaLite in Python
Create a new csv with pandas based on the local csv
[Python] Open the csv file in the folder specified by pandas
Create a list in Python with all followers on twitter
Let's create a script that registers with Ideone.com in Python.
[Python] Plotly draws Pandas dataframes in one shot with Cufflinks
Create a simple video analysis tool with python wxpython + openCV
Create youtube ad auto skip tool with python and OCR
Create Heroku, Flask, Python, Nyanko bulletin boards with "csv files"
Create a tool to check scraping rules (robots.txt) in Python
[Python Kivy] How to create an exe file with pyinstaller
How to create dataframes and mess with elements in pandas
How to log in to AtCoder with Python and submit automatically
Scraping with selenium in Python
Create a function in Python
Create a dictionary in Python
Working with LibreOffice in Python
Debugging with pdb in Python
Create 3d gif with python3
Create gif video in Python
S3 operation with python boto3
Working with sounds in Python
[Python] Change dtype with pandas
Scraping with Tor in Python
Tweet with image in Python
Combined with permutations in Python
Write to csv with Python
Create a directory with python
Difference in behavior of transparent Frame made with tkinter in pyinstaller [Python]
How to create a heatmap with an arbitrary domain in Python
Python script written in PyTorch is converted to exe with PyInstaller