Python CSV Module Oddity

A Python Oddity

I was recently using Python to write a CSV file with a custom dialect when I noticed something odd. The csv writer takes an optional lineterminator argument that lets you change the line terminator. That’s fine; however, the csv reader does not honor the argument. According to the csv documentation, the reader is hard-coded to recognise either ‘\r’ or ‘\n’ as end-of-line and ignores lineterminator. So, using only the standard API, you can create CSV that you cannot read back in. According to the documentation, this applies to Python 2.7 through 3.7. It probably also applies to versions before 2.7, but that documentation is no longer online.
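Since I ran into this with a custom dialect, here is a minimal sketch of the same behavior using a dialect class instead of a keyword argument. PipeDialect is a hypothetical name made up for this example; it subclasses the built-in csv.excel dialect and only overrides lineterminator.

```python
import csv
import io


class PipeDialect(csv.excel):
    """Hypothetical dialect: the standard excel dialect, but rows end in '|'."""
    lineterminator = '|'


rows = [['a', 'b'], ['c', 'd']]

buffer = io.StringIO()
csv.writer(buffer, dialect=PipeDialect).writerows(rows)
text = buffer.getvalue()  # the writer honors the dialect: 'a,b|c,d|'

# The reader gets the same dialect but only splits on '\r' or '\n',
# so the whole string comes back as a single row.
round_trip = list(csv.reader(io.StringIO(text), dialect=PipeDialect))
print(text)        # a,b|c,d|
print(round_trip)  # [['a', 'b|c', 'd|']]
```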

Python.org Documentation

A Simple Script

I wrote the following simple script to illustrate this oddity. It converts a list of lists to CSV and back using several different line terminators; if the output differs from the source, it prints the difference.

import csv
import io


def convert_to_csv_and_back(rows: list, lineterminator: str) -> list:
    """Write rows to CSV with the given line terminator, then read them back."""
    with io.StringIO() as o:
        # The writer honors lineterminator and ends every row with it.
        writer = csv.writer(o, lineterminator=lineterminator)
        for row in rows:
            writer.writerow(row)
        with io.StringIO(o.getvalue()) as i:
            # The reader accepts lineterminator but ignores it: it only
            # treats '\r' and '\n' as end-of-line.
            reader = csv.reader(i, lineterminator=lineterminator)
            return list(reader)


source = [
    ['this', 'is', 'row', '1'],
    ['this', 'is', 'row', '2'],
    ['this', 'is', 'row', '3'],
]
lineterminators = ['|', ':', '\t', '\r\n']

for terminator in lineterminators:
    output = convert_to_csv_and_back(source, terminator)
    # Rows present on only one side end up in the symmetric difference.
    source_set = set(map(tuple, source))
    output_set = set(map(tuple, output))
    difference = source_set.symmetric_difference(output_set)
    to_from_csv_failed = bool(difference)
    print(f'Line Terminator: {terminator!r}')
    print(f'To/From CSV Worked: {"No" if to_from_csv_failed else "Yes"}')
    print(f'Source: {source}')
    print(f'Output: {output}')
    if to_from_csv_failed:
        print(f'Difference: {difference}')
    print('--------------------------')

Results

As the results below show, every line terminator except ‘\r\n’ failed: because the reader never saw a ‘\r’ or ‘\n’, it returned all three rows as a single row.

Line Terminator: '|'
To/From CSV Worked: No
Source: [['this', 'is', 'row', '1'], ['this', 'is', 'row', '2'], ['this', 'is', 'row', '3']]
Output: [['this', 'is', 'row', '1|this', 'is', 'row', '2|this', 'is', 'row', '3|']]
Difference: {('this', 'is', 'row', '1|this', 'is', 'row', '2|this', 'is', 'row', '3|'), ('this', 'is', 'row', '1'), ('this', 'is', 'row', '2'), ('this', 'is', 'row', '3')}
--------------------------
Line Terminator: ':'
To/From CSV Worked: No
Source: [['this', 'is', 'row', '1'], ['this', 'is', 'row', '2'], ['this', 'is', 'row', '3']]
Output: [['this', 'is', 'row', '1:this', 'is', 'row', '2:this', 'is', 'row', '3:']]
Difference: {('this', 'is', 'row', '1'), ('this', 'is', 'row', '2'), ('this', 'is', 'row', '1:this', 'is', 'row', '2:this', 'is', 'row', '3:'), ('this', 'is', 'row', '3')}
--------------------------
Line Terminator: '\t'
To/From CSV Worked: No
Source: [['this', 'is', 'row', '1'], ['this', 'is', 'row', '2'], ['this', 'is', 'row', '3']]
Output: [['this', 'is', 'row', '1\tthis', 'is', 'row', '2\tthis', 'is', 'row', '3\t']]
Difference: {('this', 'is', 'row', '1'), ('this', 'is', 'row', '1\tthis', 'is', 'row', '2\tthis', 'is', 'row', '3\t'), ('this', 'is', 'row', '3'), ('this', 'is', 'row', '2')}
--------------------------
Line Terminator: '\r\n'
To/From CSV Worked: Yes
Source: [['this', 'is', 'row', '1'], ['this', 'is', 'row', '2'], ['this', 'is', 'row', '3']]
Output: [['this', 'is', 'row', '1'], ['this', 'is', 'row', '2'], ['this', 'is', 'row', '3']]
--------------------------
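Because the reader only looks for ‘\r’ and ‘\n’, a plain ‘\n’ terminator should round-trip just like ‘\r\n’ does. The helper below, round_trips, is a hypothetical name used only for this check, and it assumes the CSV is held in an io.StringIO with its default newline handling.

```python
import csv
import io


def round_trips(rows: list, lineterminator: str) -> bool:
    """Hypothetical helper: True if rows survive a write/read cycle."""
    with io.StringIO() as out:
        csv.writer(out, lineterminator=lineterminator).writerows(rows)
        text = out.getvalue()
    # StringIO's default newline='\n' splits lines only on '\n' and leaves
    # any '\r' in place for the csv reader to handle.
    return list(csv.reader(io.StringIO(text))) == rows


rows = [['this', 'is', 'row', '1'], ['this', 'is', 'row', '2']]
for terminator in ['\n', '\r\n', '|']:
    print(repr(terminator), round_trips(rows, terminator))
# '\n' True
# '\r\n' True
# '|' False
```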

Closing Thoughts

So, why would you want to specify the line terminator with the CSV module at all? There are probably only a handful of reasons. Mine was a custom dialect where I wanted to ensure that the output contained no carriage returns or line feeds. 99.9% of the time, you want ‘\r\n’.
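If you do need a custom terminator, one possible workaround (my own sketch, not something the csv module provides) is to split the text into records yourself and hand them to csv.reader, which accepts any iterable of strings. It relies on the writer’s default QUOTE_MINIMAL quoting, which quotes any field containing a character from lineterminator, so the terminator only has to be ignored inside quotes. split_records and read_with_terminator are hypothetical helpers, and the sketch assumes excel-style quoting (doublequote=True, no escapechar).

```python
import csv
import io
from typing import Iterator


def split_records(text: str, terminator: str, quotechar: str = '"') -> Iterator[str]:
    """Hypothetical helper: split text on terminator, skipping quoted regions.

    Assumes excel-style quoting, where a literal quote inside a quoted
    field is written as two quote characters.
    """
    if not terminator:
        raise ValueError('terminator must be non-empty')
    in_quotes = False
    start = pos = 0
    while pos < len(text):
        if text[pos] == quotechar:
            # A doubled quote toggles twice, leaving the state unchanged.
            in_quotes = not in_quotes
            pos += 1
        elif not in_quotes and text.startswith(terminator, pos):
            yield text[start:pos]
            pos += len(terminator)
            start = pos
        else:
            pos += 1
    if start < len(text):
        # Trailing record with no terminator after it.
        yield text[start:]


def read_with_terminator(text: str, terminator: str) -> list:
    # csv.reader accepts any iterable of strings, one record per string.
    return list(csv.reader(split_records(text, terminator)))


rows = [['a|b', 'c'], ['d', 'e']]
out = io.StringIO()
csv.writer(out, lineterminator='|').writerows(rows)
text = out.getvalue()  # '"a|b",c|d,e|': the field containing '|' is quoted
print(read_with_terminator(text, '|') == rows)  # True
```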
