Hello, this is Misuda, an engineer. This time, I will describe how I tried writing and reading UTF-8 with BOM for a CSV import!
When developing a business system, we often get the request "I want to create data in bulk!" When that happens, CSV import is the first thing I think of. Building an API would also push development cost onto the other party.
**The problem here is how the CSV gets edited.** Are you using *Microsoft Excel*? After all, it's easy to edit with!
If you assume *Microsoft Excel*, the usual approach is to produce the CSV in Shift-JIS so that it can be edited. But if the DB is UTF-8, the server side then has to convert the character encoding. At that point it becomes a fight with character encodings, and to be honest, I don't expect to win.
In such a case, UTF-8 with a BOM (byte order mark) apparently opens in *Microsoft Excel* without garbled characters!
This time, the file is generated with Java. For UTF-8, the BOM is the three bytes [0xEF 0xBB 0xBF] at the very beginning of the file.
import java.io.*;
import java.util.Arrays;
import java.util.List;
public class Main {
    
    /**
     * Create a CSV file with a BOM (character encoding is UTF-8).
     *
     * @param args not used
     */
    public static void main(String[] args) {
        File file = new File("File path");
        List<String> header = Arrays.asList("Apple","Mandarin orange","banana","Strawberry","melon","Grape");
        try(FileOutputStream fos = new FileOutputStream(file);
            OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");
            PrintWriter writer = new PrintWriter(osw)){
            // Write the BOM directly to the stream, before anything goes through the writer
            fos.write(0xef);
            fos.write(0xbb);
            fos.write(0xbf);
            // Join with commas so the header row has no trailing comma
            writer.println(String.join(",", header));
        } catch (IOException e) {
            System.out.println("Failed to generate the file.");
        }
    }
}
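As a variation on the code above, here is a minimal sketch that writes the BOM through the writer itself as the character U+FEFF, which the UTF-8 encoder turns into the same three bytes. The class name `BomCsvWriter` and the method `write` are names I made up for this illustration, and the sketch assumes the values contain no commas, quotes, or line breaks, since it does no CSV escaping.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the original article.
public class BomCsvWriter {

    /** U+FEFF is encoded as the bytes EF BB BF in UTF-8. */
    private static final String BOM = "\uFEFF";

    /** Writes one BOM, then each row as a comma-separated line ending in CRLF. */
    public static void write(Path path, List<List<String>> rows) throws IOException {
        try (Writer writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            writer.write(BOM);
            for (List<String> row : rows) {
                // Assumption: values need no quoting or escaping.
                writer.write(String.join(",", row));
                writer.write("\r\n");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("fruits", ".csv");
        write(path, Arrays.asList(
                Arrays.asList("Apple", "Mandarin orange", "banana"),
                Arrays.asList("1", "2", "3")));
        byte[] bytes = Files.readAllBytes(path);
        // Show the first three bytes of the generated file in hex.
        System.out.printf("%02x %02x %02x%n",
                bytes[0] & 0xFF, bytes[1] & 0xFF, bytes[2] & 0xFF);
    }
}
```

Going through a single writer means there is no ordering concern between raw stream writes and buffered writer output, at the cost of hiding the actual byte values that the original code makes explicit.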
That would be enough if the uploaded file were always UTF-8 with a BOM, but sometimes it isn't. So when reading, check whether a BOM is present before processing the data.
import java.io.*;
import java.nio.charset.StandardCharsets;
import org.apache.commons.codec.binary.Hex;
public class Main {
    /**
     * Read a CSV file that may start with a BOM (character encoding is UTF-8).
     *
     * @param args not used
     */
    public static void main(String[] args) {
        File file = new File("File path");
        try (FileInputStream fs = new FileInputStream(file);
             InputStreamReader isr = new InputStreamReader(fs, StandardCharsets.UTF_8);
             LineNumberReader lnr = new LineNumberReader(isr)) {
            //The first line
            String row = lnr.readLine();
            if (row != null && !row.isEmpty()) {
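                // Get the first character (a UTF-8 BOM is decoded as the single character U+FEFF)
                String bom = row.substring(0, 1);
                // Encode it back to UTF-8 bytes and convert them to a hex string
                // (uses the Hex class from Apache Commons Codec)
                String bomByte = new String(Hex.encodeHex(bom.getBytes(StandardCharsets.UTF_8)));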
                if ("efbbbf".equals(bomByte)) {
                    // Strip the BOM
                    row = row.substring(1);
                }
                System.out.println(row);
            }
            // From the second line on, read each row and split it into fields
        } catch (IOException e) {
            System.out.println("Failed to read the file.");
        }
    }
}
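The same check can also be done on the raw bytes before decoding, which removes the Apache Commons Codec dependency. The following is only a sketch under my own assumptions: `BomCsvReader` and `open` are hypothetical names, and the input is assumed to be UTF-8 either with or without a BOM.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the original article.
public class BomCsvReader {

    /** Returns a UTF-8 reader positioned after the BOM if the stream starts with one. */
    public static BufferedReader open(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // Read up to three bytes; a single read() may return fewer than requested.
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            // Not a BOM: push the bytes back so they are read as data.
            pin.unread(head, 0, n);
        }
        return new BufferedReader(new InputStreamReader(pin, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'A', ',', 'B', '\n'};
        byte[] withoutBom = {'A', ',', 'B', '\n'};
        for (byte[] data : new byte[][] {withBom, withoutBom}) {
            try (BufferedReader reader = open(new ByteArrayInputStream(data))) {
                System.out.println(reader.readLine()); // prints "A,B" for both inputs
            }
        }
    }
}
```

For a real file, the stream passed to `open` would come from something like `Files.newInputStream(path)`. Either way, the rest of the import code sees the same first line whether or not the file carried a BOM.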
On both macOS and Windows, the file opened in *Microsoft Excel* without garbled characters and could be edited! Beyond that, I expect some people will edit the file in a text editor instead. I suppose there is no choice but to support that too.