ArgumentError: invalid byte sequence in UTF-8 #7
Comments
What kind of document is that? Plaintext or CSV? In that case it's quite clear because the
@jkraemer In this particular case it's an MS Word document. It also happens with PDF files.
Got this with a TIFF file as well:
Would you mind providing sample files to reproduce these errors (the UTF-8 error as well as the tesseract error)?
When I try to parse a document containing non-UTF-8 characters, I get this:
irb(main):283:0> content = Plaintext::Resolver.new(file, document_file.upload_content_type).text
ArgumentError: invalid byte sequence in UTF-8
    from /var/deploy/sdm/web_head/shared/bundle/ruby/2.1.0/gems/activesupport-4.1.7/lib/active_support/multibyte/chars.rb:172:in `codepoints'
    from /var/deploy/sdm/web_head/shared/bundle/ruby/2.1.0/gems/activesupport-4.1.7/lib/active_support/multibyte/chars.rb:172:in `compose'
    from /var/deploy/sdm/web_head/shared/bundle/ruby/2.1.0/gems/plaintext-0.1.0/lib/plaintext/resolver.rb:37:in `text'
    from (irb):283
    from /var/deploy/sdm/web_head/shared/bundle/ruby/2.1.0/gems/railties-4.1.7/lib/rails/commands/console.rb:90:in `start'
    from /var/deploy/sdm/web_head/shared/bundle/ruby/2.1.0/gems/railties-4.1.7/lib/rails/commands/console.rb:9:in `start'
    from /var/deploy/sdm/web_head/shared/bundle/ruby/2.1.0/gems/railties-4.1.7/lib/rails/commands/commands_tasks.rb:69:in `console'
    from /var/deploy/sdm/web_head/shared/bundle/ruby/2.1.0/gems/railties-4.1.7/lib/rails/commands/commands_tasks.rb:40:in `run_command!'
    from /var/deploy/sdm/web_head/shared/bundle/ruby/2.1.0/gems/railties-4.1.7/lib/rails/commands.rb:17:in `<top (required)>'
    from bin/rails:8:in `require'
    from bin/rails:8:in
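For anyone hitting the same trace: the backtrace suggests the extracted text contains bytes that are not valid UTF-8 by the time `codepoints` is called on it. As a possible caller-side workaround (not a fix inside the gem), the raw string can be scrubbed before anything normalizes it. The sketch below is an assumption about how that might look; `sanitize_utf8` is a hypothetical helper name, not part of plaintext or ActiveSupport, and the sample bytes are made up for illustration.

```ruby
# Hypothetical helper (not part of the plaintext gem): tag the bytes as
# UTF-8 and replace any invalid sequences with U+FFFD.
def sanitize_utf8(raw)
  text = raw.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding?

  # String#scrub is available from Ruby 2.1 onwards.
  text.scrub("\uFFFD")
end

# Made-up sample: "é" encoded as Latin-1 (0xE9), which is invalid in UTF-8.
bad = "caf\xE9 menu".b

clean = sanitize_utf8(bad)
clean.codepoints # no longer raises ArgumentError
```

Scrubbing loses the original character, so if the source encoding is actually known (for example ISO-8859-1), transcoding with `String#encode` would likely preserve more of the text.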