t-SNE package does not seem to work with Spark 2.1 #7

Open
kartha01 opened this issue May 31, 2017 · 7 comments

@kartha01

Hi,

It looks like the t-SNE package does not work with Spark 2.1. After importing the com.github.saurfang.* package, even a simple snippet that computes column statistics (mean, variance, etc.) fails to compile:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}

val observations = sc.parallelize(
  Seq(
    Vectors.dense(1.0, 10.0, 100.0),
    Vectors.dense(2.0, 20.0, 200.0),
    Vectors.dense(3.0, 30.0, 300.0)
  )
)

// Compute column summary statistics.
val summary: MultivariateStatisticalSummary = Statistics.colStats(observations)
println(summary.mean)  // a dense vector containing the mean value for each column
println(summary.variance)  // column-wise variance
println(summary.numNonzeros)  // number of nonzeros in each column

and that fails with:

Name: Compile Error
Message: <console>:50: error: type mismatch;
 found   : org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]
 required: org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]
       val summary: MultivariateStatisticalSummary = Statistics.colStats(observations)
                                                                         ^
StackTrace: 

Any thoughts on resolving this?

Thanks,
Rajesh

@erwinvaneijk
Contributor

I've looked into this, and am currently working on a patch.

@kartha01
Author

kartha01 commented Jun 2, 2017

Thanks for replying, Erwin.
That is great! Please let me know how it goes; I'd be glad to test it out.

Regards,
Rajesh

@erwinvaneijk
Contributor

erwinvaneijk commented Jun 4, 2017 via email

@kartha01
Author

kartha01 commented Jun 8, 2017

Thanks EJ, will try your patch out and let you know.

-Rajesh

@kartha01
Author

kartha01 commented Jun 9, 2017

I wonder why X2Helper.scala resides in the org/apache/spark/mllib package. Is that really needed? Any thoughts?
I am not yet sure whether that could be causing issues in our environment. The MNIST example in the code still seems to throw the same old error.

@erwinvaneijk
Contributor

Hi Rajesh - no idea, but it shouldn't give you that message. Your exact code is in the new test, so there's probably something else wrong in your code or build. I can take a look if you want.

@kartha01
Author

kartha01 commented Jun 13, 2017

Thanks, Erwin.

I was trying to run it in the cloud environment and gave up. Now I am trying it on a regular Hadoop + Spark 2.0 cluster with spark-shell, and while running the MNIST example I am getting:

java.lang.NoSuchMethodError: breeze.linalg.DenseMatrix$.ones$mDc$sp(IILscala/reflect/ClassTag;Lbreeze/storage/Zero;Lbreeze/math/Semiring;)Lbreeze/linalg/DenseMatrix;
  at com.github.saurfang.spark.tsne.impl.BHTSNE$.tsne(BHTSNE.scala:38)
  ... 92 elided

The breeze jars that I have in my Spark 2.0 instance are:
breeze_2.11-0.11.2.jar
breeze-macros_2.11-0.11.2.jar

I wonder whether there is a specific jar that I need to pick up for this.
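
A NoSuchMethodError like the one above generally means that the class found at runtime comes from a different library version than the one the calling code was compiled against. Which breeze version this package was built against is not stated in this thread, so the following is only a minimal diagnostic sketch for spark-shell: it reports which jar a class was actually loaded from, so it can be compared with the jars listed above. The helper name `codeSourceOf` is hypothetical and is not part of Spark, breeze, or this package.

```scala
import scala.util.Try

// Hypothetical helper (not part of Spark, breeze, or spark-tsne):
// returns the location (usually a jar URL) a class was loaded from, if known.
def codeSourceOf(className: String): Option[String] =
  for {
    cls    <- Try(Class.forName(className)).toOption // None if the class is not on the classpath
    domain <- Option(cls.getProtectionDomain)
    source <- Option(domain.getCodeSource)           // can be null, e.g. for JDK bootstrap classes
    loc    <- Option(source.getLocation)
  } yield loc.toString

// In spark-shell, check where the breeze class from the stack trace comes from.
println(codeSourceOf("breeze.linalg.DenseMatrix"))
```

If the printed location is the breeze_2.11-0.11.2.jar shipped with Spark, then that is the version the MNIST example is running against, whatever version the package itself was compiled with.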
