Extract only lines that do not contain a specific string.
Python's in operator is convenient for this.
Whether a string A contains a string B can be checked with the following syntax.
[String B] in [String A]
The result is a Boolean value (True or False).
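As a quick illustration of the syntax above, here is a minimal sketch. The variable names and the sample string are made up for this example. Note that the check is case-sensitive, which is why a list of words to exclude may need both Apple and apple.

```python
# Sample string chosen only for illustration.
string_a = "There is an apple on the desk."

has_apple = "apple" in string_a      # substring is present
has_banana = "banana" in string_a    # substring is absent
has_Apple = "Apple" in string_a      # case differs, so this does not match

print(has_apple, has_banana, has_Apple)  # True False False

# "not in" gives the opposite answer in a single expression.
lacks_banana = "banana" not in string_a
print(lacks_banana)  # True
```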
This makes it possible to extract only the lines that do not contain a particular string.
For example, suppose you want to keep only the lines of the following text that do not contain certain words.
example.txt
There is an apple on the desk.
There is an apple on the desk.
There is an apple on the table.
There is a banana on the desk.
There is a banana on the desk.
There is a banana on the table.
The list of words you want to exclude is as follows.
filter.txt
Apple
apple
banana
The script for this is as follows.
fitrHavingLine.py
#!/usr/local/bin/python3
# -*- coding: utf-8 -*-
"""
Output the lines that do not contain any string listed in the reference file.
"""
__author__  = "Kazuki Nakamae <[email protected]>"
__version__ = "0.00"
__date__    = "2 Jun 2017"
import sys
def fitrHavingLine(infn,reffn,outfn):
    """
    @function   fitrHavingLine();
    Output the lines that do not contain any string listed in the reference file.
    @param  {string} infn :Input file
    @param  {string} reffn :Reference file
    @param  {string} outfn :Output file
    """
    inf = open(infn, 'r')
    for infline in inf:
        isNothing = True
        ref = open(reffn, 'r')
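        #If a string from the reference file appears in the line, set the flag to False.
        for refline in ref:
            #Skip blank reference lines, because an empty string is contained in every line.
            if not refline.strip():
                continue
            if refline.strip() in infline:
                isNothing=False
                break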
        ref.close()
        if isNothing:
            outf    =   open(outfn, 'a')
            outf.write(infline)
            outf.close()
    inf.close()
if __name__ == '__main__':
    argvs = sys.argv
    argc = len(argvs)
    if (argc != 4):   #Argument check
        print("USAGE : python3 fitrHavingLine.py <INPUT FILE> <REFERENCE FILE> <OUTPUT FILE>")
        sys.exit(1)
    fitrHavingLine(argvs[1],argvs[2],argvs[3])
Enter the following in bash:
python3 fitrHavingLine.py example.txt filter.txt out.txt
The output is as follows.
out.txt
There is an apple on the desk
There is a banana on the desk
There is a banana on the table.
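For comparison, the same kind of filtering can be sketched more compactly with any(). This is only an illustrative variant, not the script above: the function name filter_lines and the sample data are hypothetical, and it works on lists already held in memory instead of reading files.

```python
def filter_lines(lines, words):
    """Return the lines that contain none of the given words."""
    # Drop blank entries, because an empty string is "in" every line.
    words = [w.strip() for w in words if w.strip()]
    return [line for line in lines if not any(w in line for w in words)]


# Hypothetical sample data, not the files used in this post.
sample_lines = [
    "There is an apple on the desk.\n",
    "There is a pen on the desk.\n",
    "There is a banana on the table.\n",
]
sample_words = ["apple\n", "\n", "banana\n"]

kept = filter_lines(sample_lines, sample_words)
print(kept)  # ['There is a pen on the desk.\n']
```

Because the words are collected once before the loop, the reference data does not have to be read again for every input line.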
Conversely, if you change the following part of the script, you can process only the lines that contain a specific string, or only specific lines.
            if refline.strip() in infline:
                isNothing=False
                break
This may be useful when handling data with many fields, such as CSV files.
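One way to picture the inverted condition is the sketch below, which keeps only the rows that contain at least one of the words. The function name keep_matching and the CSV-like rows are assumptions made for this example.

```python
def keep_matching(lines, words):
    """Return only the lines that contain at least one of the given words."""
    # Drop blank entries, because an empty string is "in" every line.
    words = [w.strip() for w in words if w.strip()]
    return [line for line in lines if any(w in line for w in words)]


# Hypothetical CSV-like rows used only for illustration.
rows = [
    "id,item,place",
    "1,apple,desk",
    "2,pen,desk",
    "3,banana,table",
]

selected = keep_matching(rows, ["apple", "banana"])
print(selected)  # ['1,apple,desk', '3,banana,table']
```

Keep in mind that in matches anywhere in the row, not in one particular column, so a word that also appears in another field will match as well.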
That's all. Thank you very much.