Go CSV

The encoding/csv package in the standard library provides functionality for reading and writing CSV files.

Reading records from a CSV file

Let’s read stock quotes from a CSV file:

date,open,high,low,close,volume,Name
2013-02-08,15.07,15.12,14.63,14.75,8407500,AAL
2013-02-11,14.89,15.01,14.26,14.46,8882000,AAL
2013-02-12,14.45,14.51,14.1,14.27,8126000,AAL
2013-02-13,14.3,14.94,14.25,14.66,10259500,AAL

// csvData is a string holding the CSV text shown above
buf := bytes.NewBufferString(csvData)

r := csv.NewReader(buf)
var record []string
nRecords := 0
var err error
for {
    record, err = r.Read()
    if err != nil {
        if err == io.EOF {
            err = nil
        }
        break
    }
    nRecords++
    if nRecords < 5 {
        fmt.Printf("Record: %#v\n", record)
    }
}
if err != nil {
    log.Fatalf("r.Read() failed with '%s'\n", err)
}
fmt.Printf("Read %d records\n", nRecords)

Record: []string{"date", "open", "high", "low", "close", "volume", "Name"}
Record: []string{"2013-02-08", "15.07", "15.12", "14.63", "14.75", "8407500", "AAL"}
Record: []string{"2013-02-11", "14.89", "15.01", "14.26", "14.46", "8882000", "AAL"}
Record: []string{"2013-02-12", "14.45", "14.51", "14.1", "14.27", "8126000", "AAL"}
Read 5 records

Following Go conventions, the CSV reader operates on the io.Reader interface, which lets it work with files, network connections, bytes in memory, and so on.

The Read() method reads one CSV record at a time and returns a []string slice with all fields in that record, plus an error.

An io.EOF error signals that the end of the input was reached successfully.

Reading all records from a CSV file

Instead of calling Read() in a loop, we could read all records in one call:

r := csv.NewReader(f)
records, err := r.ReadAll()
if err != nil {
    log.Fatalf("r.ReadAll() failed with '%s'\n", err)
}
// records is [][]string
fmt.Printf("Read %d records\n", len(records))

This time we don’t have to special-case io.EOF as ReadAll does that for us.

Reading all records at once is simpler but will use more memory, especially for large CSV files.
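
Since ReadAll returns a [][]string, a header row can be separated from the data rows with ordinary slicing. A small sketch, assuming the first record is a header; splitHeader is a hypothetical helper name, not part of encoding/csv:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"strings"
)

// splitHeader is a hypothetical helper: it reads everything with ReadAll
// and separates the first record (assumed to be a header) from the rest.
func splitHeader(data string) (header []string, rows [][]string, err error) {
	r := csv.NewReader(strings.NewReader(data))
	records, err := r.ReadAll()
	if err != nil {
		return nil, nil, err
	}
	if len(records) == 0 {
		// Empty input: no records and no error
		return nil, nil, nil
	}
	return records[0], records[1:], nil
}

func main() {
	data := "date,open,Name\n2013-02-08,15.07,AAL\n2013-02-11,14.89,AAL\n"
	header, rows, err := splitHeader(data)
	if err != nil {
		log.Fatalf("splitHeader failed with '%s'\n", err)
	}
	fmt.Printf("Header: %#v\n", header)
	fmt.Printf("Read %d data rows\n", len(rows))
}
```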

Writing records to a CSV file

Now let’s write simplified stock quotes to a CSV file:

func writeCSV() error {
    f, err := os.Create("stocks_tmp.csv")
    if err != nil {
        return err
    }

    w := csv.NewWriter(f)
    records := [][]string{
        {"date", "price", "name"},
        {"2013-02-08", "15,07", "GOOG"},
        {"2013-02-09", "15,09", "GOOG"},
    }
    for _, rec := range records {
        err = w.Write(rec)
        if err != nil {
            f.Close()
            return err
        }
    }

    // csv.Writer might buffer writes for performance so we must
    // Flush to ensure all data has been written to underlying
    // writer
    w.Flush()

    // Flush doesn't return an error. If it failed to write, we
    // can get the error with Error()
    err = w.Error()
    if err != nil {
        // don't leak the open file on this error path
        f.Close()
        return err
    }
    // Close might also fail due to flushing out buffered writes
    err = f.Close()
    return err
}

date,price,name
2013-02-08,"15,07",GOOG
2013-02-09,"15,09",GOOG

Error handling here is not trivial.

We need to remember to Flush() at the end of writing, check if Flush() failed with Error() and also check that Close() didn’t fail.

The need to check Close() errors is why we didn’t use a simpler defer f.Close(). Correctness and robustness sometimes require more code.

Values that contain a , were quoted because the comma is used as the field separator.

In production code we would also delete the CSV file in case of errors. There is no need to keep a corrupt file around.
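
One possible way to combine these checks with cleanup is a named error result and a deferred function. This is a sketch under assumptions, not the only approach; writeCSVFile and run are hypothetical names, and the demo writes into a temporary directory:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"os"
	"path/filepath"
)

// writeCSVFile is a hypothetical helper that writes records to path and
// removes the partially written file if any step fails.
func writeCSVFile(path string, records [][]string) (err error) {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	// On any error below, close the file (ignoring a second Close error)
	// and delete it so that no corrupt file is left behind.
	defer func() {
		if err != nil {
			f.Close()
			os.Remove(path)
		}
	}()

	w := csv.NewWriter(f)
	for _, rec := range records {
		if err = w.Write(rec); err != nil {
			return err
		}
	}
	w.Flush()
	if err = w.Error(); err != nil {
		return err
	}
	// Close can fail too; its error is returned to the caller.
	return f.Close()
}

// run writes a small file into a temporary directory and reads it back.
func run() (string, error) {
	dir, err := os.MkdirTemp("", "csvdemo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "stocks_tmp.csv")
	records := [][]string{
		{"date", "price", "name"},
		{"2013-02-08", "15,07", "GOOG"},
	}
	if err := writeCSVFile(path, records); err != nil {
		return "", err
	}
	data, err := os.ReadFile(path)
	return string(data), err
}

func main() {
	out, err := run()
	if err != nil {
		log.Fatalf("run failed with '%s'\n", err)
	}
	fmt.Print(out)
}
```

The deferred function only acts when the named result err is non-nil, so the success path still returns the result of f.Close() directly.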

Writing all records to a CSV file

Just like we can read all records at once, we can write multiple records at once:

w := csv.NewWriter(f)
err = w.WriteAll(records)
if err != nil {
    f.Close()
    return err
}
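
WriteAll calls Flush itself and returns any error from it, so no separate Flush() and Error() calls are needed after it. A minimal sketch that writes to an in-memory buffer instead of a file; encodeCSV is a hypothetical helper name:

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"log"
)

// encodeCSV is a hypothetical helper that renders records as CSV text.
func encodeCSV(records [][]string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	// WriteAll writes every record and then flushes the writer,
	// returning any error encountered along the way.
	if err := w.WriteAll(records); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	records := [][]string{
		{"date", "price", "name"},
		{"2013-02-08", "15,07", "GOOG"},
	}
	out, err := encodeCSV(records)
	if err != nil {
		log.Fatalf("encodeCSV failed with '%s'\n", err)
	}
	fmt.Print(out)
}
```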

Configuring CSV parsing and writing

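CSV is not a strictly defined format. RFC 4180 describes a common form, but there is no single binding specification and many variants exist.

The encoding/csv package supports the format described in RFC 4180, which covers the most common CSV files, and allows tweaking the reading and writing process.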

Configuring CSV Reader

After you create a csv.Reader with csv.NewReader(), you can set the following fields to change its behavior.

Comma

Most CSV files use , to separate fields but other characters are used too. If you have a file that uses ; as a separator you can configure a reader with r.Comma = ';'.
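
A short sketch of reading semicolon-separated data; parseSemicolons is a hypothetical helper name and the sample data is made up for illustration:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"strings"
)

// parseSemicolons is a hypothetical helper that parses data in which
// fields are separated by ';' instead of ','.
func parseSemicolons(data string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.Comma = ';'
	return r.ReadAll()
}

func main() {
	// With ';' as the separator, a ',' inside a field needs no quoting.
	data := "date;price;name\n2013-02-08;15,07;GOOG\n"
	records, err := parseSemicolons(data)
	if err != nil {
		log.Fatalf("parseSemicolons failed with '%s'\n", err)
	}
	for _, rec := range records {
		fmt.Printf("Record: %#v\n", rec)
	}
}
```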

Comment

If you want to treat some lines of a CSV file as comments and ignore them during reading, you can set a comment character.

For example, if the CSV file is:

# Comment
2013-02-08,15.07,AAL

you can ignore comment lines by setting r.Comment = '#'.

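By default the CSV reader doesn’t detect comments. It parses a comment line as an ordinary record, which typically leads to a field-count error because the comment has a different number of fields than the data.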
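
A minimal sketch of skipping comment lines; parseSkippingComments is a hypothetical helper name:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"strings"
)

// parseSkippingComments is a hypothetical helper that ignores lines
// starting with '#'.
func parseSkippingComments(data string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	// The comment character must be the first character of the line.
	r.Comment = '#'
	return r.ReadAll()
}

func main() {
	data := "# Comment\n2013-02-08,15.07,AAL\n"
	records, err := parseSkippingComments(data)
	if err != nil {
		log.Fatalf("parseSkippingComments failed with '%s'\n", err)
	}
	fmt.Printf("Read %d records\n", len(records))
	for _, rec := range records {
		fmt.Printf("Record: %#v\n", rec)
	}
}
```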

FieldsPerRecord

Each line in a CSV file (a record) can have a different number of fields.

If you know that, for example, the CSV file you’re parsing always has 5 fields in each record, set r.FieldsPerRecord = 5. Read() will return an error if there’s a mismatch.

If you don’t know how many fields there are but know that it’s always the same number, use r.FieldsPerRecord = 0. This is the default so you don’t have to set it explicitly.

In that case csv.Reader will use the first line to detect number of fields and will return an error if subsequent records have a different number of fields.

If you want to allow a variable number of fields per record, set r.FieldsPerRecord = -1.
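
The three modes can be compared on the same input. A sketch with made-up data in which the second record is one field short; countWithFieldsPerRecord is a hypothetical helper name:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// countWithFieldsPerRecord is a hypothetical helper that parses data with
// the given FieldsPerRecord setting and reports how many records were read.
// ReadAll returns no records when it fails, so the count is 0 on error.
func countWithFieldsPerRecord(data string, n int) (int, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.FieldsPerRecord = n
	records, err := r.ReadAll()
	return len(records), err
}

func main() {
	// First record has 3 fields, second has only 2.
	data := "date,price,name\n2013-02-08,15.07\n"
	for _, n := range []int{0, 3, -1} {
		count, err := countWithFieldsPerRecord(data, n)
		fmt.Printf("FieldsPerRecord=%d: %d records, err=%v\n", n, count, err)
	}
}
```

With this input, only the -1 setting reads both records; the other two settings report an error for the short second record.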

LazyQuotes

By default false.

If true, csv.Reader is more lax about parsing of quoted values i.e. a quote may appear in an unquoted field and a non-doubled quote may appear in a quoted field.
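
As an illustration, a quote character inside an unquoted field is rejected by default but accepted with LazyQuotes. A sketch with made-up data; parseQuotes is a hypothetical helper name:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseQuotes is a hypothetical helper that parses data with LazyQuotes
// switched on or off.
func parseQuotes(data string, lazy bool) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.LazyQuotes = lazy
	return r.ReadAll()
}

func main() {
	// The first field contains a bare " in an unquoted field.
	data := "6\" pipe,in stock\n"

	_, err := parseQuotes(data, false)
	fmt.Printf("LazyQuotes=false: err=%v\n", err)

	records, err := parseQuotes(data, true)
	fmt.Printf("LazyQuotes=true: %#v, err=%v\n", records, err)
}
```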

TrimLeadingSpace

By default false.

If true, csv.Reader ignores leading white space in a field.
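
A short sketch showing the difference on data with spaces after the separators; parseTrim is a hypothetical helper name:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"strings"
)

// parseTrim is a hypothetical helper that parses data with
// TrimLeadingSpace switched on or off.
func parseTrim(data string, trim bool) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.TrimLeadingSpace = trim
	return r.ReadAll()
}

func main() {
	data := "AAL, 14.75,  8407500\n"
	for _, trim := range []bool{false, true} {
		records, err := parseTrim(data, trim)
		if err != nil {
			log.Fatalf("parseTrim failed with '%s'\n", err)
		}
		fmt.Printf("TrimLeadingSpace=%v: %#v\n", trim, records[0])
	}
}
```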

ReuseRecord

By default false.

If true, the []string slice returned by Read() might be reused across Read() calls.

This is faster because it avoids allocating a new slice for every record, but a record you want to keep past the next Read() call must be copied first.
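
A sketch of the copying pattern, assuming the caller wants to keep every record; readAllReusing is a hypothetical helper name:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"log"
	"strings"
)

// readAllReusing is a hypothetical helper that enables ReuseRecord and
// copies each record before storing it.
func readAllReusing(data string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.ReuseRecord = true
	var rows [][]string
	for {
		record, err := r.Read()
		if err == io.EOF {
			return rows, nil
		}
		if err != nil {
			return nil, err
		}
		// record may be overwritten by the next Read call,
		// so store a copy rather than the slice itself.
		rows = append(rows, append([]string(nil), record...))
	}
}

func main() {
	rows, err := readAllReusing("date,price\n2013-02-08,15.07\n")
	if err != nil {
		log.Fatalf("readAllReusing failed with '%s'\n", err)
	}
	for _, row := range rows {
		fmt.Printf("Record: %#v\n", row)
	}
}
```

Copying every record gives up the allocation savings, so ReuseRecord pays off mainly when each record is processed and discarded inside the loop.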

Configuring CSV writer

After you create a csv.Writer with csv.NewWriter(), you can set the following fields to change its behavior.

Comma

Field delimiter. A comma (,) by default.

UseCRLF

False by default.

If true, uses Windows style line terminator (CRLF i.e. \r\n).

By default uses Unix style line terminator (LF i.e. \n).
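
Both writer fields can be set before the first write. A minimal sketch that writes to an in-memory buffer; encodeWith is a hypothetical helper name:

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"log"
)

// encodeWith is a hypothetical helper that renders records using the
// given field delimiter and line terminator style.
func encodeWith(records [][]string, comma rune, useCRLF bool) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	w.Comma = comma
	w.UseCRLF = useCRLF
	if err := w.WriteAll(records); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	records := [][]string{
		{"date", "price"},
		{"2013-02-08", "15,07"},
	}
	out, err := encodeWith(records, ';', true)
	if err != nil {
		log.Fatalf("encodeWith failed with '%s'\n", err)
	}
	// %q makes the \r\n line terminators visible
	fmt.Printf("%q\n", out)
}
```

With ; as the delimiter, the value 15,07 is written without quotes because the comma is no longer special.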

