Convert file encoding using Python

1. Introduction

In this post, I will demonstrate how to convert a text file’s encoding using Python.

2. Environments

  • Python 2.7

3. The requirement

  • There is a text file named a.dat whose encoding is not utf-8 (latin_1 in this example)
  • a.dat contains lines of text
  • You want to convert the whole file to utf-8

4. The code

#coding:utf-8
import io

fname = 'a.dat'


def process_file():
    # Decode the source file as latin_1; newline='\n' leaves line endings untranslated.
    with io.open(fname, 'r', encoding='latin_1', newline='\n') as fin:
        text = fin.read()
    # Encode the same text as utf-8 and write it to a new file, a.dat_out.
    with io.open(fname + '_out', 'w', encoding='utf-8', newline='\n') as fout:
        fout.write(text)


if __name__ == '__main__':
    process_file()

  • We read the file into the variable text, decoding it as latin_1
  • We write text to a new file, a.dat_out, encoding it as utf-8
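
The script above reads the whole file into memory at once. For larger files, the same idea can be sketched as a chunked copy. This is only a minimal sketch: the function name convert_encoding and the chunk_size parameter are hypothetical names chosen for illustration, and it assumes you already know the source encoding.

```python
import io


def convert_encoding(src_path, dst_path, src_encoding,
                     dst_encoding='utf-8', chunk_size=64 * 1024):
    """Re-encode src_path into dst_path, chunk_size characters at a time.

    convert_encoding and chunk_size are illustrative names, not part of
    the original script.
    """
    # newline='' keeps the original line endings untouched on both sides.
    with io.open(src_path, 'r', encoding=src_encoding, newline='') as fin:
        with io.open(dst_path, 'w', encoding=dst_encoding, newline='') as fout:
            while True:
                # In text mode, read(n) returns up to n decoded characters.
                chunk = fin.read(chunk_size)
                if not chunk:
                    break
                fout.write(chunk)
```

Calling convert_encoding('a.dat', 'a.dat_out', 'latin_1') should produce the same output file as the script above, without holding the whole text in memory.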

5. The io.open

Here we used io.open to read and write the file with explicit encodings. Its signature is:

io.open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True)

Open file and return a corresponding stream. If the file cannot be opened, an IOError is raised.

encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any encoding supported by Python can be used. See the codecs module for the list of supported encodings.
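
To make the encoding and errors parameters more concrete, here is a small sketch of what happens when the encoding guess is wrong. The sample file and its content are made up for illustration: it writes the latin_1 bytes for "café" to a temporary file, then reads them back three ways.

```python
import io
import locale
import os
import tempfile

# The encoding io.open falls back to when encoding=None (platform dependent).
print(locale.getpreferredencoding())

# Hypothetical sample file: the bytes for "café" encoded as latin_1.
path = os.path.join(tempfile.mkdtemp(), 'sample.dat')
with io.open(path, 'wb') as f:
    f.write(b'caf\xe9')

# Wrong guess with the default strict error handling: decoding fails.
try:
    with io.open(path, 'r', encoding='utf-8') as f:
        f.read()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Wrong guess with errors='replace': undecodable bytes become U+FFFD.
with io.open(path, 'r', encoding='utf-8', errors='replace') as f:
    replaced = f.read()

# Correct guess: the text decodes cleanly.
with io.open(path, 'r', encoding='latin_1') as f:
    correct = f.read()
```

The strict failure is usually the safer behaviour for a conversion script, since errors='replace' silently loses the original characters.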

It’s that easy, don’t you think?