Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Remove dependency on scipy #256

Open
marcio-ao opened this issue Jun 7, 2017 · 26 comments
Open

RFC: Remove dependency on scipy #256

marcio-ao opened this issue Jun 7, 2017 · 26 comments

Comments

@marcio-ao
Copy link
Contributor

marcio-ao commented Jun 7, 2017

Hello,

I have been looking into simplying the build process for Cura. At present, it is quite difficult to build Cura from scratch. One large part of this is the requirement for Scipy, which not only is an enormous library, but requires, among other things, a Fortran compiler.

I have investigated the problem and have identified there are only a handful of places where Cura uses scipy. These are:

  • Uranium/UM/Math/Polygon.py
  • Uranium/UM/Mesh/MeshData.py
  • Uranium/plugins/Tools/ScaleTool/ScaleTool.py

The ScaleTool uses "scipy.linalg.svd", but this function is now provided by numpy, so could trivially be replaced with "numpy.linalg.svd".

The Polygon and MeshData files rely on "scipy.spatial.ConvexHull" and I can see two possible ways to eliminate this dependency:

  • Implement a Convex hull algorithm in pure Python.
  • Use Pyhull, which is based on QHull, the exact same underlying code as scipy's ConvexHull.

Obviously option 1 would perform less efficiently, but would not require a compilation step. Option 2 still requires compilation, but it would be significantly less painful than compiling the entirety of Scipy.

Please let me know what your thoughts are on this and whether, if we make this change, you would be willing to accept a pull request for it. This is not good change for to make unless you are on board and can assure us that additional dependencies on Scipy would not be added in the future.

Thank you,

-- Marcio

@awhiemstra
Copy link
Contributor

One of the reasons for starting to depend on SciPy ~1 year ago is that we needed a fast convex hull algorithm done in C. Back then, I evaluated some options but in the end chose SciPy because it was the only one that was clearly being maintained. If someone stepped up to maintain PyHull again we could evaluate switching.

@marcio-ao
Copy link
Contributor Author

I understand your reasoning, but it seems to me as if a ConvexHull algorithm is not something that is prone to evolve over time, so it would seem as if the lack of updates to PyHull would not necessarily be a red flag.

Anyhow, since we are encountering significant issues building Scipy for Windows (related to Fortran), I have begun to investigate building with alternative libraries. I am looking to refactor the calls to ConvexHull into a separate "ConvexHull.py" file, which could then can be customized as needed to use whatever libraries may be available (one option could be to provide a straight Python implementation as a fallback, although I have not yet evaluated the performance impact of doing so).

One option I will consider is whether it makes sense for us to become maintainers for PyHull. This may prove to be a better alternative than working through the problems with SciPy.

Thank you for the feedback!

-- Marcio

@nallath
Copy link
Member

nallath commented Jun 8, 2017

We really need the some form of linalg.svd. Getting the right scale & rotation from a matrix is not possible otherwise. If Numpy provides the same results, i don't have any qualms with switching.

If someone is going to pick that up, make very very sure that rotating an object (say 20 deg on one axis) and then scaling on one axis works. That was the problem that was fixed by introducing the singular value decomposition.

As for using a codebase that isn't getting any updates; I don't see a very big issue with that. Calculating a hull is something that won't change much over time. PyHull is also pretty limited in what it does (especially compared to scipy), so that could explain the lack of updates.

@awhiemstra
Copy link
Contributor

awhiemstra commented Jun 8, 2017

While the algorithm for a convex hull does not change much, the environment this algorithm lives in does change. In case of Python modules, there are Python version changes that need to be kept up with. The change from Python 3.4 to Python 3.5 required quite some modules to adapt to the new version, for example. That is why unmaintained Python modules are a problem.

And yes, the situation with SciPy on Windows is ugly, because there is no proper free Fortran compiler that supports the same ABI as MSVC and Python does not build without MSVC. I tried for quite some time to get things to build on Windows and while NumPy can be achieved, SciPy just cannot be built right now.

@marcio-ao
Copy link
Contributor Author

marcio-ao commented Jun 8, 2017

The concern about the Python binding changes makes sense, which is one thing in favor of using a Python implementation whenever possible rather than relying on C code. Although for a 3D mesh, I do suspect that may be impractical. I found a Python-only ConvexHull implementation and I may drop it in just to get a sense of how it performs, although my expectations are not very high.

For the time being, I have been able to write a shim that allows Cura to compile against Pyhull with no code modifications. It does so by implementing a "fake" scipy module and routing the ConvexHull calls to Pyhull. So far seems to work, although since I do not know all the places that Cura relies on ConvexHulls, I may not have tested all the corner cases.

A third possibility may be to only pull the required code out of scipy and make a "scipyLite" just for Cura. I initially started with this approach, but ran into some difficulties breaking the code apart (although this probably has more to do with my relative inexperience with Python; I've only been using it for a few weeks). I might revisit that approach, as it probably the best way to benefit from whatever work is being done upstream in scipy, while avoiding the nightmare of Fortran.

@nallath
Copy link
Member

nallath commented Jun 9, 2017

I think pulling it apart is going to be a big challenge.

As for the python only hull; Don't get your hopes up. I don't think it will perform well at all.

There is another option that we could try; Find a light c++ lib that can handle convex hulls and create some bindings (sip?)

@marcio-ao
Copy link
Contributor Author

As for C implementations, the QHull authors write "If you need a short code for convex hull, Delaunay triangulation, or Voronoi volumes consider Clarkson's hull program."

My initial attempts with a Python-only implementation were not promising. The code I found works okay for 3D printing simple polyhedron -- anything more than that and I gave up waiting. It's a bit surprising how slowly it performed; it's possible the implementation just needs some tuning, or possibly it's not so much python at fault, but that I need to try a better algorithm, as the one I found may be exponential with regard to vertex count.

@marcio-ao
Copy link
Contributor Author

FYI, I created a minor pull request regarding the convex hull code. The pull request resolves an issue I ran into when testing alternative ConvexHull algorithms.

#257

@awhiemstra
Copy link
Contributor

I am not surprised the Python implementation is slow. My experience with Python is that it performs pretty badly when iterating over large lists, even if it is just a linear progression. So unless you can push things to C/C++ with libraries like Numpy, things are not going to perform too well.

@nallath
Copy link
Member

nallath commented Jun 12, 2017

Math related things (in my experience) are easily 10 times as fast in C++ compared to python.

@marcio-ao
Copy link
Contributor Author

I am a relatively new Python developer, so I do not know the ins and outs of optimizing Python, but I've done quite a bit of 3D and VR programming in Javascript. My experience there taught me that generally things can be made very fast, but memory allocation and garbage collection are the big killers. Most high level languages such as Javascript (and I presume Python) make it easy to allocate small objects, which is where a lot of code gets tripped up, with things like allocating vectors as objects, for example. Hence I believe that it may be possible to come up with an algorithm that performs fast, but it may require odd programming tricks. C and C++ make it easier to write fast code.

@marcio-ao
Copy link
Contributor Author

marcio-ao commented Jun 13, 2017

BTW, I received the following notification on GitHub:

"Note that we already have a fallback for computing the convex hull based on Numpy: https://github.com/Ultimaker/Uranium/blob/master/UM/Math/Polygon.py#L327"

I can't find the original thread online, so I will reply here.

Yes, this is true. Polygon.py has a 2D implementation of a ConvexHull. However, MeshData.py relies on Scipy for computing the hull of a 3D mesh. There is no alternative implementation there.

That said, it appeared to me as if Cura was only relying on the ConvexHull to determine placement of objects on the bed, and that as such the 3D convex hull appears to get flattened onto a plane at some point in the code. As a hack, I did try replacing the 3D algorithm in MeshData.py with an implementation that simply ignored the Z coordinate and performed a 2D convex hull on the points. As far as I could tell, it worked well enough for Cura to run and performed well, even using a Python implementation.

So one question may be whether Cura really needs a full 3D convex hull at all.

@fieldOfView
Copy link
Contributor

As far as I remember, @sedwards2009 should be able to tell you all about the switch from the 2d python implementation to the 3d scipy implementation.

@awhiemstra
Copy link
Contributor

We also use the convex hull to calculate the bounding box, I believe. Since the bounding box is 3D we need a 3D convex hull for that.

@marcio-ao
Copy link
Contributor Author

You could just as easily create a bounding box from a non-convex hull. Just a few max/min operations on the points.

@marcio-ao
Copy link
Contributor Author

Furthermore, I noticed Cura only uses the points and vertices array from the ConvexHull object, not the faces (or simplex info, as scipy likes to call it). Those two arrays just give you a point cloud in 3D and is not sufficient to recreate a ConvexHull mesh. So it begs the question of why Cura is performing such a complex operation only to discard much of the useful information.

@nallath
Copy link
Member

nallath commented Jun 14, 2017

You can; but it's just way more faster to do it with the convex hull.

The reason we create a convex hull is twofold;

  1. We need a quick way to find the "shadows" that an object drop on the build plate (you don't need points for that).
  2. We need to find the bounding box. As we can just use the convex hull verts with some clever min max magic, we again don't need the faces.

The reason why we do this is because we need to recompute the bounding box & shadow after every transformation. If we were to do this with all points, this gets expensive real fast. So instead we calculate a convex hull on load (and accept a hit there) and use the convex hull to recompute.

@marcio-ao
Copy link
Contributor Author

So, if I understand this correctly, the ConvexHull is simply used as a decimation routine to generate a smaller point cloud for subsequent shadow and bounding box operations? I suppose if the ConvexHull operation didn't involve bringing in the dependency of scipy, I would agree that this is a clever solution.

This does bring up the possibility that there may be other algorithms that may give better results than a ConvexHull for this purpose. It sounds like the shadows being a hull is not a requirement, but a side effect of the choice of algorithms. I can tell you that the current solution does lead to problems. I've heard one complaint, for example, that in multi-part prints, Cura does not allow objects to be placed within nooks and crannies of another. This is a symptom of using the convex hull to compute overlap on the bed.

It would seem like a general mesh decimation routine could be better suited for this purpose. Since there is no requirement that the decimated points be re-assembled into a mesh, such a decimation could be simplified over standard algorithms that need to keep track of face connectivity. Such an algorithm might even be fast enough to run in Python.

-- Marcio

@marcio-ao
Copy link
Contributor Author

marcio-ao commented Jun 14, 2017

Incidentally, one place where it occurs to me the 3D ConvexHull faces would be useful is in figuring out how to lay an object flat. It would merely be the case of choosing a face from the hull closest to the bed and using that face to level the object.

In so far as the current version of Cura does not read face information from the ConvexHull, could someone explain to me how the "Lay flat" feature currently works?

@awhiemstra
Copy link
Contributor

awhiemstra commented Jun 14, 2017

This does bring up the possibility that there may be other algorithms that may give better results than a ConvexHull for this purpose.

It might, however the convex hull guarantees the outer boundary of the object stays intact, which is why the convex hull is also suitable for calculating the bounding box. Using a different algorithm may not have the same guarantee so you might end up with an incorrect bounding box.

This is a symptom of using the convex hull to compute overlap on the bed.

Actually, no, it is a symptom of our collision code requiring convex objects to properly calculate collision responses.

Such an algorithm might even be fast enough to run in Python.

Highly unlikely. Remember that we need to be able to deal with meshes with >1 million vertices. This is going to be slow in Python, no matter what kind of operation you do per vertex.

@nallath
Copy link
Member

nallath commented Jun 15, 2017

We use convex objects for collision detection because those are faster (in case you were wondering about that one :))

@marcio-ao
Copy link
Contributor Author

It sounds like there are good reasons for leaving a 3D ConvexHull code in Cura. I guess the only question right now would be whether there would be a way to have Cura not depend on Scipy for that implementation. Scipy's ConvexHull is merely a wrapper around the C++ QHull library, which is not hard to compile in itself. Would it be possible to add the Python wrappers directly to Cura so that the only external dependency is on QHull?

Perhaps the bindings from Pyhull could be used as a launching point. I have already managed to run Cura using Pyhull, but at the moment I build the Pyhull module on its own, and then add some glue code to fake the scipy.spatial module and redirect those calls to the the Pyhull module. There is a slight inefficiency in doing so since Pyhull uses an array of tuples rather than Numpy arrays so I had to do the conversion in a Python loop. It seems like the best solution overall would be for Cura to have its own bindings for QHull.

@hroncok
Copy link
Contributor

hroncok commented Jun 21, 2018

Was this ever solved?

@nallath
Copy link
Member

nallath commented Jul 24, 2018

Nope. It's somewhere far down on our priority list.

@Ghostkeeper
Copy link
Collaborator

This will become somewhat easier now that we have a dependency on Trimesh, which also provides these features.

@fieldOfView
Copy link
Contributor

Except that Trimesh has a dependency on... scipy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants