Funbucket

Summary

This PR resolves a runtime error that pdf2zh raises on certain PDF files. The error comes from accessing the ncs and scs attributes directly on the PDFPageInterpreterEx class, where they are not defined.

When running the following command on macOS with Python 3.9 and LibreSSL:

pdf2zh "[Reading] Exploring the spillover effects of WeChat using graphical model.pdf" -li en -lo ko -s ollama

The following error is raised:

Traceback (most recent call last):
  File ".../pdf2zh", line 8, in <module>
    sys.exit(main())
  File ".../pdf2zh.py", line 264, in main
    translate(model=ModelInstance.value, **vars(parsed_args))
  File ".../high_level.py", line 355, in translate
    s_mono, s_dual = translate_stream(
  File ".../high_level.py", line 215, in translate_stream
    obj_patch: dict = translate_patch(fp, **locals())
  File ".../high_level.py", line 156, in translate_patch
    interpreter.process_page(page)
  File ".../pdfinterp.py", line 269, in process_page
    ops_base = self.render_contents(page.resources, page.contents, ctm=ctm)
  File ".../pdfinterp.py", line 299, in render_contents
    return self.execute(list_value(streams))
  File ".../pdfinterp.py", line 345, in execute
    targs = func()
  File ".../pdfinterp.py", line 178, in do_scn
    if self.ncs:
AttributeError: 'PDFPageInterpreterEx' object has no attribute 'ncs'

Cause

The ncs (non-stroking color space) and scs (stroking color space) attributes are not defined directly on PDFPageInterpreterEx but are part of the internal graphicstate object. This structure is consistent with the original pdfminer.six design. However, pdf2zh incorrectly uses self.ncs and self.scs, resulting in an AttributeError.
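
The failure mode can be illustrated with a minimal sketch. The two classes below are stand-ins written for this illustration only, not the real pdfminer.six or pdf2zh classes; they assume nothing beyond the layout described above, where the color spaces live on a nested graphicstate object.

```python
# Stand-in classes for illustration; NOT the real pdfminer.six / pdf2zh code.
class GraphicState:
    def __init__(self):
        self.scs = None  # stroking color space
        self.ncs = None  # non-stroking color space


class Interpreter:
    def __init__(self):
        # Color spaces are stored on the nested graphics state,
        # not on the interpreter itself.
        self.graphicstate = GraphicState()


interp = Interpreter()
interp.graphicstate.ncs = "DeviceRGB"

# Direct access on the interpreter fails, as in the traceback above.
try:
    interp.ncs
    lookup_error = None
except AttributeError as exc:
    lookup_error = str(exc)

print(lookup_error)
print(interp.graphicstate.ncs)
```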

Fix

  • All references to self.ncs and self.scs are replaced with self.graphicstate.ncs and self.graphicstate.scs
  • Incorrect assignments like self.ncs = interpreter.ncs in do_Do are updated to self.graphicstate.ncs = interpreter.graphicstate.ncs
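
For code that might need to run against either attribute layout, one possible approach is a small accessor. This is a sketch only and is not part of this PR: get_color_space is a hypothetical helper name, and the objects in the demo are SimpleNamespace stand-ins rather than real interpreter instances.

```python
from types import SimpleNamespace


def get_color_space(interp, name):
    """Return the color space `name` ("ncs" or "scs") from `interp`.

    Hypothetical helper: prefers interp.graphicstate.<name> and falls
    back to interp.<name> when the graphics state does not carry it.
    """
    gs = getattr(interp, "graphicstate", None)
    if gs is not None and hasattr(gs, name):
        return getattr(gs, name)
    return getattr(interp, name)


# Stand-in objects for the two layouts (not real interpreter instances).
nested_layout = SimpleNamespace(
    graphicstate=SimpleNamespace(ncs="DeviceRGB", scs="DeviceGray")
)
flat_layout = SimpleNamespace(
    graphicstate=SimpleNamespace(), ncs="DeviceCMYK", scs="DeviceGray"
)

nested_ncs = get_color_space(nested_layout, "ncs")
flat_ncs = get_color_space(flat_layout, "ncs")
```

The PR itself takes the simpler route of replacing each access directly, which is enough when only one layout has to be supported.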

Result

After this fix, pdf2zh runs successfully and processes all pages without error:

100%|████████████████████████████████████████████████████████████████████████████| 21/21 [00:06<00:00,  3.21it/s]

Environment

  • macOS (Apple Silicon)
  • Python 3.9
  • pdf2zh installed via uv
  • LibreSSL 2.8.3 (a non-blocking warning from urllib3 appears, unrelated to this PR)

Notes

  • This is a bugfix-only change; apart from removing the crash, it has no effect on output or behavior
  • The fix aligns with how upstream pdfminer.six handles color state in the SCN and scn operators

@Byaidu
Owner

Byaidu commented Aug 15, 2025

We have pinned pdfminer-six==20250416, so this error is not expected to occur.

We will consider updating the version of pdfminer in the future. Thank you for the PR.
