UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 360: character maps to <undefined> #34

narasimha1805 · 2020-04-19T20:03:03Z

I get "UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 360: character maps to <undefined>" when importing word_topic_vectors from nlpia.book.examples.ch04_catdog_lsa_3x6x16.

Below is the error:

UnicodeDecodeError Traceback (most recent call last)
in <module>
----> 1 from nlpia.book.examples.ch04_catdog_lsa_3x6x16 import word_topic_vectors

d:\python\lib\site-packages\nlpia\book\examples\ch04_catdog_lsa_3x6x16.py in <module>
68 tfidfer = TfidfVectorizer(min_df=2, max_df=.6, stop_words=None, token_pattern=r'(?u)\b\w+\b')
69
---> 70 corpus = get_data('cats_and_dogs')[:NUM_DOCS]
71 docs = normalize_corpus_words(corpus, stemmer=None)
72 tfidf_dense = pd.DataFrame(tfidfer.fit_transform(docs).todense())

d:\python\lib\site-packages\nlpia\loaders.py in get_data(name, nrows, limit)
1111 return filepaths[name]
1112 elif name in DATASET_NAME2FILENAME:
-> 1113 return read_named_csv(name, nrows=nrows)
1114 elif name in DATA_NAMES:
1115 return read_named_csv(DATA_NAMES[name], nrows=nrows)

d:\python\lib\site-packages\nlpia\loaders.py in read_named_csv(name, data_path, nrows, verbose)
1003 name = DATASET_NAME2FILENAME[name]
1004 if name.lower().endswith('.txt') or name.lower().endswith('.txt.gz'):
-> 1005 return read_text(os.path.join(data_path, name), nrows=nrows)
1006 else:
1007 return read_csv(os.path.join(data_path, name), nrows=nrows)

d:\python\lib\site-packages\nlpia\futil.py in read_text(forfn, nrows, verbose)
416 """
417 tqdm_prog = tqdm if verbose else no_tqdm
--> 418 nrows = wc(forfn, nrows=nrows) # not necessary when nrows==None
419 lines = np.empty(dtype=object, shape=nrows)
420 with ensure_open(forfn) as f:

d:\python\lib\site-packages\nlpia\futil.py in wc(f, verbose, nrows)
48 tqdm_prog = tqdm if verbose else no_tqdm
49 with ensure_open(f, mode='r') as fin:
---> 50 for i, line in tqdm_prog(enumerate(fin)):
51 if nrows is not None and i >= nrows - 1:
52 break

d:\python\lib\encodings\cp1252.py in decode(self, input, final)
21 class IncrementalDecoder(codecs.IncrementalDecoder):
22 def decode(self, input, final=False):
---> 23 return codecs.charmap_decode(input,self.errors,decoding_table)[0]
24
25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1592: character maps to <undefined>
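Byte 0x9d is one of the few values that cp1252 leaves undefined, which is why the 'charmap' codec gives up instead of producing a wrong character. A plausible cause (an assumption; the traceback does not show the file contents) is that the data file is UTF-8 and contains typographic quotes: the right double quotation mark U+201D is encoded in UTF-8 as the bytes E2 80 9D. This small sketch reproduces the same error with those bytes:

```python
# A right double quotation mark (U+201D) encoded as UTF-8 is the three bytes E2 80 9D
data = "\u201d".encode("utf-8")

try:
    # cp1252 is the locale encoding open() often falls back to on Windows
    data.decode("cp1252")
    cp1252_error = None
except UnicodeDecodeError as err:
    cp1252_error = err
    print(err)  # 'charmap' codec can't decode byte 0x9d in position 2: character maps to <undefined>

# Decoding with the encoding the bytes were actually written in succeeds
text = data.decode("utf-8")  # '\u201d'
```

The first two bytes decode to harmless characters in cp1252 ('â' and '€'); only the third one has no mapping, so the error position points at it.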

woo9904 · 2020-12-02T09:26:36Z

It is a UnicodeDecodeError, which means the file is being read with the wrong text encoding.
This example may help you understand the error:

file = open(filename, encoding="utf8")
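When open() gets no encoding= argument, Python falls back to the locale's preferred encoding, which is cp1252 on many Western-European Windows installs; naming the encoding makes the result the same on every machine. Python 3.7+ also offers a UTF-8 mode (python -X utf8, or the PYTHONUTF8=1 environment variable) that makes UTF-8 the default, and sys.flags.utf8_mode reports whether it is active. A minimal round-trip sketch, where the temporary file and its text are made up for illustration:

```python
import locale
import os
import sys
import tempfile

# Roughly what open() falls back to when no encoding= is given (e.g. 'cp1252' on many Windows setups)
print("locale encoding:", locale.getpreferredencoding(False))
print("UTF-8 mode:", bool(sys.flags.utf8_mode))

# Illustrative data: a line with curly quotes, written as UTF-8
sample = "\u201cCats\u201d and \u201cdogs\u201d\n"
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # Naming the encoding explicitly gives the same result on every machine
    with open(path, encoding="utf-8") as fin:
        line = fin.readline()
finally:
    os.remove(path)
```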

To fix it, find the futil.py file installed on your computer (for example, d:\python\lib\site-packages\nlpia\futil.py).

Then find the function named ensure_open and edit it as follows:

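import gzip
import io

# nlpia/futil.py: basestring, MAX_LEN_FILEPATH and find_filepath are already defined in that module;
# the signature is inferred from the ensure_open(f, mode='r') call in the traceback
def ensure_open(f, mode='r'):
    # open() rejects encoding= in binary mode, so only pass it for text modes
    encoding = None if 'b' in mode else 'UTF-8'
    fin = f
    if isinstance(f, basestring):
        if len(f) <= MAX_LEN_FILEPATH:
            f = find_filepath(f) or f
            if f and (not hasattr(f, 'seek') or not hasattr(f, 'readlines')):
                if f.lower().endswith('.gz'):
                    # gzip.open() treats mode 'r' as binary, where encoding= raises ValueError,
                    # so this call is left without an encoding
                    return gzip.open(f, mode=mode)
                return open(f, mode=mode, encoding=encoding)
            f = fin  # reset path in case it is the text that needs to be opened with StringIO
        else:
            f = io.StringIO(f)
    elif f and getattr(f, 'closed', None):
        if hasattr(f, '_write_gzip_header'):
            return gzip.open(f.name, mode=mode)  # binary, as above
        else:
            return open(f.name, mode=mode, encoding=encoding)
    return f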

I only added encoding='UTF-8' to the open() calls.
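When adding encoding= to these calls, note that gzip.open() behaves differently from open(): it opens files in binary mode unless the mode contains 't', and in binary mode it raises ValueError if an encoding is given. Reading gzipped UTF-8 text therefore needs mode 'rt'. The sketch below uses a hypothetical helper, open_text (not part of nlpia), to show both cases on made-up sample files:

```python
import gzip
import os
import shutil
import tempfile


def open_text(path, encoding="utf-8"):
    """Hypothetical helper (not part of nlpia): open a plain or gzipped file as decoded text."""
    if path.lower().endswith(".gz"):
        # gzip.open() defaults to binary mode; "rt" is required for encoding= to be accepted
        return gzip.open(path, mode="rt", encoding=encoding)
    return open(path, mode="r", encoding=encoding)


# Illustrative data written once as plain UTF-8 text and once gzipped
text = "\u201ccats\u201d and \u201cdogs\u201d\n"
tmpdir = tempfile.mkdtemp()
plain_path = os.path.join(tmpdir, "docs.txt")
gz_path = os.path.join(tmpdir, "docs.txt.gz")
with open(plain_path, "w", encoding="utf-8") as fout:
    fout.write(text)
with gzip.open(gz_path, "wt", encoding="utf-8") as fout:
    fout.write(text)

try:
    # Both files decode to the same string through the helper
    results = []
    for p in (plain_path, gz_path):
        with open_text(p) as fin:
            results.append(fin.read())

    # gzip's default mode "r" is binary, so adding encoding= there raises ValueError
    try:
        gzip.open(gz_path, mode="r", encoding="utf-8")
        binary_mode_error = None
    except ValueError as err:
        binary_mode_error = err
finally:
    shutil.rmtree(tmpdir)
```

Keeping the .gz check in one place means callers never have to remember which mode each file type needs.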

danielgran · 2021-05-16T05:16:59Z

Doesn't work for me either. What's the problem?

danielgran · 2021-05-16T05:17:47Z

Unfortunately, after that change I get this error instead:
File "gensim/_matutils.pyx", line 1, in init gensim._matutils
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

danielgran · 2021-05-16T05:26:33Z

Ah, never mind, this fixes it. Thank you!

hsang · 2021-06-12T20:19:37Z

Thanks, it works!
