use MODSReader on string-Objects #15

Open
alexander-winkler opened this issue Oct 29, 2020 · 1 comment

Comments

@alexander-winkler

Hello!

I'm trying to apply MODSReader not to an XML file (as in the examples provided) but to requests.get responses. I've tried turning the XML string into a file-like object using io.StringIO (which would be the usual way to handle this in etree, I guess), but I'm getting a ValueError:

  File "mods_parse.py", line 6, in <module>
    MODSReader(io.StringIO(request_opac("pica.sys=j2017").text))
  File "/home/alex/.local/lib/python3.6/site-packages/pymods/reader.py", line 58, in __init__
    super(MODSReader, self).__init__(file_location, '{0}mods'.format(NAMESPACES['mods']), parser=mods_parser)
  File "/home/alex/.local/lib/python3.6/site-packages/pymods/reader.py", line 27, in __init__
    self.iterator = parse(file_location, parser=parser).iter(iter_elem)
  File "/home/alex/.local/lib/python3.6/site-packages/pymods/reader.py", line 8, in parse
    return etree.parse(source, parser=parser)
  File "src/lxml/etree.pyx", line 3469, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1856, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1871, in lxml.etree._parseMemoryDocument
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

Could you suggest a way to pipe the XML string directly into the parser?

Thank you very much!

@mrmiguez
Owner

Hi Alexander,

I've always meant to make the pymods parser more flexible to various inputs. My workflows have always involved local XML files, so I never got around to implementing that feature. I'm happy to hear that someone else is using pymods, so that will bump the priority up for me a bit.

My family just welcomed our first child recently, so unfortunately I don't have much time to work on this at the moment. If you're comfortable submitting a PR implementing string parsing, I'll consider it for merging. Otherwise, it might be a little bit until I'm back in the office and ready to spend time on this.
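For anyone considering such a PR, the error message in the traceback already hints at one direction: hand the parser bytes rather than a decoded string. The following is only a minimal sketch of that idea, not pymods code. The helper name `as_binary_source` is hypothetical, and the sketch assumes that any `str` input either declares UTF-8 or carries no encoding declaration.

```python
import io


def as_binary_source(xml):
    """Wrap str or bytes XML in a binary file-like object (hypothetical helper)."""
    if isinstance(xml, str):
        # Assumption: the document declares UTF-8 (or declares nothing),
        # so encoding to UTF-8 matches what the parser will expect.
        xml = xml.encode("utf-8")
    return io.BytesIO(xml)


# Possible usage (untested against pymods; judging by the traceback,
# MODSReader passes its first argument through to lxml's etree.parse):
#
#   response = requests.get("<your request url>")
#   records = MODSReader(as_binary_source(response.content))
```

Passing `response.content` (raw bytes) rather than `response.text` avoids the decode step entirely, so the bytes stay consistent with whatever encoding the XML declaration names.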

If you need a short-term solution, you can write requests.get(<your request url>).text out to a local file and pass that file to the parser. If you're working with OAI-PMH data, I've had a lot of success with Mark Phillips' pyoaiharvester. It's a Python 2 utility, but it's very helpful for getting OAI-PMH data to where you can use it.
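The short-term workaround could look roughly like the sketch below. The helper name `write_xml_to_tempfile` is hypothetical, and the sketch writes the raw response bytes (`response.content`) instead of `.text`, on the assumption that keeping the original bytes is the safer way to stay consistent with the encoding named in the XML declaration.

```python
import os
import tempfile


def write_xml_to_tempfile(xml_bytes, suffix=".xml"):
    """Write raw XML bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    # os.fdopen takes ownership of the descriptor and closes it on exit.
    with os.fdopen(fd, "wb") as handle:
        handle.write(xml_bytes)
    return path


# Possible usage (assumes requests and pymods are installed):
#
#   response = requests.get("<your request url>")
#   path = write_xml_to_tempfile(response.content)
#   records = MODSReader(path)
#   ...
#   os.remove(path)  # the caller is responsible for cleanup
```

`mkstemp` leaves the file on disk until it is removed explicitly, which keeps the path valid for as long as the reader needs it.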

Best,
-MM
