I was getting the following error, which was fixed by installing pdfminer.six instead of pdfminer.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/fmheu/miniconda/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/fmheu/miniconda/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "analyze_papers.py", line 157, in article_worker
    pdf_result, text, pdf_log = process_pdf(metadata)
  File "analyze_papers.py", line 116, in process_pdf
    original_page_count, pages = pdf_to_text_list(first_pdf)
  File "analyze_papers.py", line 36, in pdf_to_text_list
    pages = layout_scanner.get_pages(file_loc, images_folder=None) # you can try os.path.abspath("output/imgs")
  File "/home/fmheu/git/citation_map/layout_scanner.py", line 214, in get_pages
    return with_pdf(pdf_doc, _parse_pages, pdf_pwd, *tuple([images_folder]))
  File "/home/fmheu/git/citation_map/layout_scanner.py", line 37, in with_pdf
    result = fn(doc, *args)
  File "/home/fmheu/git/citation_map/layout_scanner.py", line 204, in _parse_pages
    interpreter.process_page(page)
  File "/home/fmheu/miniconda/lib/python3.7/site-packages/pdfminer/pdfinterp.py", line 841, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "/home/fmheu/miniconda/lib/python3.7/site-packages/pdfminer/pdfinterp.py", line 854, in render_contents
    self.execute(list_value(streams))
  File "/home/fmheu/miniconda/lib/python3.7/site-packages/pdfminer/pdfinterp.py", line 869, in execute
    name = keyword_name(obj).decode('ascii')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 1: ordinal not in range(128)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "analyze_papers.py", line 243, in <module>
    result = pool.map(list_worker, list(titles_dict.items()), chunksize=5)
  File "/home/fmheu/miniconda/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/fmheu/miniconda/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 1: ordinal not in range(128)
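For anyone hitting the same message, the failing step in the traceback is a strict ASCII decode of a PDF operator name. A minimal sketch of that failure mode, independent of pdfminer, is below. The byte string `raw_keyword` is a made-up stand-in (I did not extract the actual bytes from the offending PDF); it only needs a `0xa0` byte at index 1 to match the reported error.

```python
# Hypothetical operator bytes: an ASCII letter followed by 0xa0.
# The real bytes from the failing PDF are not known.
raw_keyword = b"T\xa0"

try:
    # Same kind of call as in the traceback: a strict ASCII decode.
    raw_keyword.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    # Strict decoding rejects any byte >= 0x80.
    message = str(exc)

print(message)
```

Any byte outside the 0x00-0x7f range in an operator name would trigger the same exception at that line.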
Thanks for this nice tool.
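Since pdfminer and pdfminer.six both install a top-level module named `pdfminer`, `import pdfminer` alone does not show which one is in the environment. A rough way to check is to ask for the installed distribution by name. This is only a sketch: the helper `installed_version` is my own name, and `importlib.metadata` is in the standard library from Python 3.8, so it would not run as-is on the 3.7 interpreter shown in the traceback.

```python
from importlib import metadata  # standard library from Python 3.8


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Report both distributions; ideally only pdfminer.six has a version.
for name in ("pdfminer", "pdfminer.six"):
    print(name, "->", installed_version(name))
```

If the plain `pdfminer` distribution shows up, running `pip uninstall pdfminer` followed by `pip install pdfminer.six` is the swap that fixed the error for me.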