Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod") when using extract_text() #117

Open
1 of 3 tasks
ercrema opened this issue Mar 28, 2020 · 1 comment

ercrema commented Mar 28, 2020

Hi,
I have a problem reading a PDF file with extract_text(). The document is not corrupted: pdftools::pdf_text() reads it fine, but I prefer tabulizer because some pages have double columns. Similar PDFs from the same repository do not produce the error. Thanks for developing the package!

Enrico


Please specify whether your issue is about:

  • a possible bug
  • a question about package functionality
  • a suggested code or documentation change, improvement to the code, or feature request

If you are reporting (1) a bug or (2) a question about code, please supply:

  • ensure that you can install and successfully load rJava
  • a fully reproducible example using a publicly available dataset (or provide your data)
  • if an error is occurring, include the output of traceback() run immediately after the error occurs
  • the output of sessionInfo()

Put your code here:

## rJava loads successfully
# install.packages("rJava")
library("rJava")

## load package
library("tabulizer")

## code goes here
x <- extract_text("https://sitereports.nabunken.go.jp//files/attach/21/21232/15892_1_立石遺跡+大鶴A遺跡+上揚遺跡+前畑遺跡.pdf")
traceback()
## session info for your system
sessionInfo()

The full error message is:

Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl,  : 
  java.lang.ClassCastException: class org.apache.pdfbox.cos.COSObject cannot be cast to class org.apache.pdfbox.cos.COSNumber (org.apache.pdfbox.cos.COSObject and org.apache.pdfbox.cos.COSNumber are in unnamed module of loader RJavaClassLoader @372f7a8d)

Traceback

6: stop(list(message = "java.lang.ClassCastException: class org.apache.pdfbox.cos.COSObject cannot be cast to class org.apache.pdfbox.cos.COSNumber (org.apache.pdfbox.cos.COSObject and org.apache.pdfbox.cos.COSNumber are in unnamed module of loader RJavaClassLoader @372f7a8d)", 
       call = .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", 
           cl, .jcast(if (inherits(o, "jobjRef") || inherits(o, 
               "jarrayRef")) o else cl, "java/lang/Object"), .jnew("java/lang/String", 
               method), j_p, j_pc, use.true.class = TRUE, evalString = simplify, 
           evalArray = FALSE), jobj = new("jobjRef", jobj = <pointer: 0x3bd8f90>, 
           jclass = "java/lang/ClassCastException")))
5: .jcheck(silent = FALSE)
4: .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, 
       .jcast(if (inherits(o, "jobjRef") || inherits(o, "jarrayRef")) o else cl, 
           "java/lang/Object"), .jnew("java/lang/String", method), 
       j_p, j_pc, use.true.class = TRUE, evalString = simplify, 
       evalArray = FALSE)
3: .jrcall(x, name, ...)
2: stripper$getText(pdfDocument)
1: extract_text("https://sitereports.nabunken.go.jp//files/attach/21/21232/15892_1_立石遺跡+大鶴A遺跡+上揚遺跡+前畑遺跡.pdf")

SessionInfo:

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 
 
locale:
 [1] LC_CTYPE=en_GB.UTF-8          LC_NUMERIC=C                  LC_TIME=en_GB.UTF-8          
 [4] LC_COLLATE=en_GB.UTF-8        LC_MONETARY=en_GB.UTF-8       LC_MESSAGES=en_GB.UTF-8      
 [7] LC_PAPER=en_GB.UTF-8          LC_NAME=en_GB.UTF-8           LC_ADDRESS=en_GB.UTF-8       
[10] LC_TELEPHONE=en_GB.UTF-8      LC_MEASUREMENT=en_GB.UTF-8    LC_IDENTIFICATION=en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     methods   base     

other attached packages:
[1] tabulizer_0.2.2 rJava_0.9-11   

loaded via a namespace (and not attached):
[1] tabulizerjars_1.0.1 compiler_3.6.3      tools_3.6.3         yaml_2.2.1          png_0.1-7     

@tpaskhalis tpaskhalis added the bug label Mar 30, 2020

dbampoh commented Nov 26, 2020

Did you ever figure this issue out?
