Add --locale option to specify fallback encoding to decode data_in with #553

Open
eMPee584 opened this issue Mar 24, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@eMPee584

When running stat on a directory containing some latin1-encoded filenames, I was getting this error:

  Traceback (most recent call last):
    File "/usr/bin/jc", line 33, in <module>
      sys.exit(load_entry_point('jc==1.25.1', 'console_scripts', 'jc')())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/lib/python3/dist-packages/jc/cli.py", line 965, in main
      JcCli().run()
    File "/usr/lib/python3/dist-packages/jc/cli.py", line 947, in run
      self._run()
    File "/usr/lib/python3/dist-packages/jc/cli.py", line 911, in _run
      self.standard_parse_and_print()
    File "/usr/lib/python3/dist-packages/jc/cli.py", line 803, in standard_parse_and_print
      self.create_normal_output()
    File "/usr/lib/python3/dist-packages/jc/cli.py", line 755, in create_normal_output
      self.data_out = self.parser_module.parse(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/lib/python3/dist-packages/jc/parsers/stat.py", line 234, in parse
      jc.utils.input_type_check(data)
    File "/usr/lib/python3/dist-packages/jc/utils.py", line 460, in input_type_check
      raise TypeError("Input data must be a 'str' object but is %s.", [str(type(data))])
  TypeError: ("Input data must be a 'str' object but is %s.", ["<class 'bytes'>"])

Piping stat's output through iconv -f ISO-8859-1 avoids the error but breaks the Unicode characters in UTF-8 filenames instead, so I crudely patched my local jc with this hack:

diff --git jc/cli.py jc/cli.py
index 41c8358d..62c63cf3 100644
--- jc/cli.py
+++ jc/cli.py
@@ -792,7 +792,7 @@ class JcCli():
             if isinstance(self.data_in, bytes):
                 self.data_in = self.data_in.decode('utf-8')
         except UnicodeDecodeError:
-            pass
+            self.data_in = self.data_in.decode('iso-8859-1')
 
         self.slicer()

I looked into adding a generic --locale option, but as I'm not familiar with the jc code base and there is no other option taking a parameter yet, I'm filing this report instead.
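To make the request concrete, here is a minimal sketch of what a configurable fallback could look like. It is not jc's actual code: `decode_input` and its `fallback_encoding` parameter are hypothetical names, standing in for whatever value a `--locale` option would supply in place of the hard-coded 'iso-8859-1' in the patch above.

```python
def decode_input(data, fallback_encoding="iso-8859-1"):
    """Decode bytes as UTF-8; if that fails, decode the whole input
    with the caller-supplied fallback encoding (hypothetical helper)."""
    if isinstance(data, str):
        return data
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback_encoding)


# Sample inputs: the same name encoded two different ways.
utf8_name = "caf\u00e9".encode("utf-8")          # b'caf\xc3\xa9'
latin1_name = "caf\u00e9".encode("iso-8859-1")   # b'caf\xe9'

# Each input on its own decodes to the same string.
only_utf8 = decode_input(utf8_name)
only_latin1 = decode_input(latin1_name)

# Mixed input falls back for the whole buffer, so the UTF-8 name is garbled.
mixed = decode_input(utf8_name + b"\n" + latin1_name)
```

Because the fallback applies to the entire input, a single latin1 filename causes every UTF-8 filename in the same input to be decoded as latin1 too, which is the same breakage seen with iconv.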

P.S.: Ouch, I just realized that I'll now have to invoke a separate jc instance for every single file, which incurs heavy interpreter overhead. I also tried to speed that up with pypy3, but that was actually even worse because very little computation takes place. Ah meh, at least it works™. I "just" wanted to recursively back up a partition's ctime values. 😅💦

@kellyjonbrazil
Owner

kellyjonbrazil commented Mar 24, 2024

Thanks for reporting this. I’ll have to think about how this might be addressed in a general fashion.

For cases like these it might make sense to use jc as a Python library (import jc) and write a quick Python script to do what you want.

The stat parser also has a streaming variant, stat_s, which returns a Python generator and can help with memory utilization.
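A rough sketch of such a script, under some assumptions: `decode_line` and `stat_records` are made-up helper names, the fallback encoding is guessed to be ISO-8859-1, and filenames are assumed not to contain newlines. It runs stat once for all paths, decodes each output line separately so UTF-8 names stay intact, and feeds the lines to the stat_s streaming parser through `jc.parse`.

```python
import subprocess


def decode_line(raw_line, fallback="iso-8859-1"):
    """Decode one line as UTF-8, using the fallback only for that line."""
    try:
        return raw_line.decode("utf-8")
    except UnicodeDecodeError:
        return raw_line.decode(fallback)


def stat_records(paths, fallback="iso-8859-1"):
    """Run stat once for all paths and yield one parsed dict per file."""
    import jc  # third-party; imported here so decode_line works without it

    # One stat process for every path, output captured as raw bytes.
    proc = subprocess.run(
        ["stat", "--", *paths], check=True, capture_output=True
    )
    # Decode lazily, line by line, so mixed encodings do not clash.
    lines = (decode_line(line, fallback) for line in proc.stdout.splitlines())
    # stat_s is the streaming stat parser; it takes an iterable of str lines.
    yield from jc.parse("stat_s", lines)
```

Looping over `stat_records(paths)` then needs only one Python interpreter and one stat process for the whole batch, instead of one jc invocation per file.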

kellyjonbrazil added the enhancement (New feature or request) label on Apr 1, 2024