Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Symlinking instead of copying tbb libs #161

Closed
Enchufa2 opened this issue Apr 12, 2021 · 25 comments
Closed

Symlinking instead of copying tbb libs #161

Enchufa2 opened this issue Apr 12, 2021 · 25 comments

Comments

@Enchufa2
Copy link
Member

The new TBB_INC and TBB_LIB variables are great, but I noticed that RcppParallel copies tbb's .so files into the package directory. Is this really required? This is a problem for system packaging, because RPM generates a unique ID for each binary (see this), and thus copying a shared library makes these IDs collide, and the package cannot be installed. Would it be possible to symlink TBB_LIB instead?

@kevinushey
Copy link
Contributor

That seems like a good idea to me. I need to resubmit for Solaris anyhow so I'll see if we can slipstream that into the next submission.

@kevinushey
Copy link
Contributor

I now have the following on master:

kevinushey@cdrv-2:~/r/pkg/RcppParallel [master]
$ TBB_ROOT=/usr/local/opt/tbb R CMD INSTALL --preclean .
* installing to library ‘/Users/kevinushey/Library/R/4.0/library’
* installing *source* package ‘RcppParallel’ ...
** using staged installation
** preparing to cleanup package 'RcppParallel' ...
** finished cleanup for package 'RcppParallel'
** preparing to configure package 'RcppParallel' ...
*** configured file: 'src/Makevars.in' => 'src/Makevars'
** finished configure for package 'RcppParallel'
** libs
clang++ -mmacosx-version-min=10.13 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I../inst/include -I/usr/local/opt/tbb/include  -I/usr/local/include  -std=gnu++11 -DRCPP_PARALLEL_USE_TBB=1 -DTBB_SUPPRESS_DEPRECATED_MESSAGES=1 -fPIC  -g -O3 -pipe -Wall -pedantic -c init.cpp -o init.o
clang++ -mmacosx-version-min=10.13 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I../inst/include -I/usr/local/opt/tbb/include  -I/usr/local/include  -std=gnu++11 -DRCPP_PARALLEL_USE_TBB=1 -DTBB_SUPPRESS_DEPRECATED_MESSAGES=1 -fPIC  -g -O3 -pipe -Wall -pedantic -c options.cpp -o options.o
(tbb) Using system (Intel/OneAPI) TBB library ...
(tbb) TBB_LIB = /usr/local/opt/tbb/lib
(tbb) TBB_INC = /usr/local/opt/tbb/include
clang++ -mmacosx-version-min=10.13 -std=gnu++11 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/usr/local/lib -o RcppParallel.so init.o options.o -Wl,-L,/usr/local/opt/tbb/lib -Wl,-rpath,/usr/local/opt/tbb/lib -ltbb -ltbbmalloc -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
installing via 'install.libs.R' to /Users/kevinushey/Library/R/4.0/library/00LOCK-RcppParallel/00new/RcppParallel
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (RcppParallel)
kevinushey@cdrv-2:~/r/pkg/RcppParallel [master]
$ tree ~/Library/R/4.0/library/RcppParallel/libs
/Users/kevinushey/Library/R/4.0/library/RcppParallel/libs
├── RcppParallel.so
└── tbb
    ├── libtbb.dylib -> /usr/local/opt/tbb/lib/libtbb.dylib
    ├── libtbbmalloc.dylib -> /usr/local/opt/tbb/lib/libtbbmalloc.dylib
    └── libtbbmalloc_proxy.dylib -> /usr/local/opt/tbb/lib/libtbbmalloc_proxy.dylib

1 directory, 4 files

Does that look okay?

@Enchufa2
Copy link
Member Author

LGTM!

@hsbadr
Copy link
Contributor

hsbadr commented Apr 12, 2021

@kevinushey I remember that we copy the libs to keep the version linked, avoiding breaking RcppParallel when the original files are moved, removed, or upgraded to an incompatible version. But, I do prefer the symlinks too. I suggest that we store the md5 hash of the shared files and check that before loading them. If they don't match, we continue with a warning that recommends reinstalling the package. This way, if the new version causes loading failure, the users will know that they've to reinstall and link to the new TBB libraries. Let me know if I can help with fixing the CRAN issues.

@Enchufa2
Copy link
Member Author

I suggest that we store the md5 hash of the shared files and check that before loading them. If they don't match, we continue with a warning that recommends reinstalling the package. This way, if the new version causes loading failure, the users will know that they've to reinstall and link to the new TBB libraries.

That won't happen with system builds, where the .rpm, .deb is always linked to the correct tbb version, but in general it's a good idea. In this way, you can ensure integrity in user installations too.

@kevinushey
Copy link
Contributor

RcppParallel 5.1.2 is now on CRAN, which uses symlinks by default. Let me know if you're still having trouble after that release propagates on CRAN.

@kevinushey
Copy link
Contributor

@hsbadr: would you be willing to file a separate issue for us to track the MD5 hashes of the TBB libraries, so we can detect if they have changed?

@hsbadr
Copy link
Contributor

hsbadr commented Apr 15, 2021

RcppParallel 5.1.2 is now on CRAN, which uses symlinks by default. Let me know if you're still having trouble after that release propagates on CRAN.

I don't run into those issues b/c I always reinstall the development version of RcppParallel, especially after TBB updates. However, loading RcppParallel will fail if, for example, it can't find a reference in libtbb after version upgrade. This can easily get fixed with relinking to the new TBB library with reinstalling the package.

@hsbadr: would you be willing to file a separate issue for us to track the MD5 hashes of the TBB libraries, so we can detect if they have changed?

Sure. We may also add a function to store the TBB version info (e.g., TBB_INTERFACE_VERSION) at compilation time and check the runtime version.

An example for this is rgdal, which tests the runtime version of GDAL:
https://github.com/cran/rgdal/blob/49b03833bce8ff907a06ced5be0b875eb79f88e1/R/AAA.R#L99-L100

@hsbadr
Copy link
Contributor

hsbadr commented Apr 15, 2021

@kevinushey #164 is not applicable to bundled dependencies, as @Enchufa2 mentioned. It's important to ensure integrity of the shared objects, before loading them, though. This will also inform the users if the package loading failed due to this issue.

@Enchufa2
Copy link
Member Author

RcppParallel 5.1.2 is now on CRAN, which uses symlinks by default. Let me know if you're still having trouble after that release propagates on CRAN.

It was successfully auto-updated a few hours ago and looks ok. There's a difference with the macOS installation above though, as you can see:

$ ll /usr/local/lib/R/library/RcppParallel/libs/
total 20
-rwxr-xr-x. 1 root root 19816 abr 16 02:41 RcppParallel.so
$ ll /usr/local/lib/R/library/RcppParallel/lib/
total 0
lrwxrwxrwx. 1 root root 34 abr 16 02:41 libtbbmalloc_proxy.so.2 -> /usr/lib64/libtbbmalloc_proxy.so.2
lrwxrwxrwx. 1 root root 28 abr 16 02:41 libtbbmalloc.so.2 -> /usr/lib64/libtbbmalloc.so.2
lrwxrwxrwx. 1 root root 22 abr 16 02:41 libtbb.so.2 -> /usr/lib64/libtbb.so.2

The links are in lib, instead of libs/tbb. Is this intended? As long as RcppParallel finds them, it doesn't matter much.

@kevinushey
Copy link
Contributor

That is intentional. I wanted to move the library paths, but unfortunately some packages (rstan / StanHeaders) hard-code the path to the bundled version of TBB in RcppParallel and so trying to move it breaks those packages. 😞

@hsbadr
Copy link
Contributor

hsbadr commented Apr 16, 2021

That is intentional. I wanted to move the library paths, but unfortunately some packages (rstan / StanHeaders) hard-code the path to the bundled version of TBB in RcppParallel and so trying to move it breaks those packages. 😞

I could update that in the development version of rstan/StanHeaders. But, why do you wanna move the TBB libs?

@kevinushey
Copy link
Contributor

Just to be consistent with the way R normally installs libraries; e.g. R packages normally install their libraries into the libs/<arch> folder, and I think it'd make sense to adhere to that for TBB as well (potentially in a sub-folder tbb).

@hsbadr
Copy link
Contributor

hsbadr commented Apr 16, 2021

I see. I had the impression that libs/ include the package library (e.g., <package_name>.so) while lib/ contains the other shared objects (e.g., dependencies such as TBB). For example:

rhdf5filters/lib
├── libH5Zblosc.so
├── libH5Zbz2.so
└── libH5Zlzf.so
rhdf5filters/libs
└── rhdf5filters.so

@kevinushey
Copy link
Contributor

kevinushey commented Apr 16, 2021

I think you're right actually. R-exts has something similar described in 5.8.1:

It is possible to link a shared object in package packA to a library provided by package packB under limited circumstances on a Unix-alike OS. There are severe portability issues, so this is not recommended for a distributed package.

This is easiest if packB provides a static library packB/lib/libpackB.a. (Note using directory lib rather than libs is conventional, and architecture-specific sub-directories may be needed and are assumed in the sample code below. The code in the static library will need to be compiled with PIC flags on platforms where it matters.) Then as the code from package packB is incorporated when package packA is installed, we only need to find the static library at install time for package packA. The only issue is to find package packB, and for that we can ask R by something like (long lines broken for display here)

This is for a static library, but the same logic holds for shared libraries I would think.

@hsbadr
Copy link
Contributor

hsbadr commented Apr 16, 2021

That makes sense to me. R automatically builds the NeedsCompilation package library into libs/ while lib$(ARCH)/ is the default for other shared objects.

@eddelbuettel
Copy link
Member

eddelbuettel commented Apr 16, 2021

@hsbadr can you check, or did you check, whether rhdf5filters is using either one or both of a) explicit dlopen() (which would surprise me) or b) explicit export declarations as described in Writing R Extension, Section 5.4 "Registering native routines"?

Also note that WRE 5.8 "Linking to other packages" says

It is not in general possible to link a DLL in package packA to a DLL
provided by package packB (for the security reasons mentioned in *note
dyn.load and dyn.unload::, and also because some platforms distinguish
between shared objects and dynamic libraries), but it is on Windows.

@hsbadr
Copy link
Contributor

hsbadr commented Apr 16, 2021

@eddelbuettel It relies on setting the HDF5_PLUGIN_PATH environment variable on load:
https://github.com/grimbough/rhdf5filters/blob/master/R/zzz.R#L2-L6

Setting the environment variable HDF5_PLUGIN_PATH to this value will allow other applications that require the the filters to use the versions distributed with this package.

Does this answer your question?

@eddelbuettel
Copy link
Member

eddelbuettel commented Apr 16, 2021

I think so. Because it points to another plugin mechanism used "with hdf5" roughly similar to what R itself uses (i.e. my a) above).

I still think the generic ability to mix and match shared library from a shared library package is still out of reach. Many of have looked for this and at this issue and I think the best approach we know is to feed static libraries to the shared per-package library when it is being built. But I could of course be overly pessimistic...

@Enchufa2
Copy link
Member Author

Enchufa2 commented May 4, 2021

The automatic build of v5.1.3 from 19 hours ago again has the tbb libs copied into the package, instead of symlinked. :-\

@kevinushey
Copy link
Contributor

Ugh. Caused by 53bb834. Which was intended to work around r-lib/devtools#2343.

@kevinushey
Copy link
Contributor

@kevinushey
Copy link
Contributor

@Enchufa2, are you able to use the patch from 2f71e52#diff-6c04d3704e1e2e4b906b5d456931358fc0128147d57b0cdfe8a56614c590d5faR20-R22 as a stop-gap fix? I'd prefer to avoid re-submitting RcppParallel to CRAN yet again ...

@Enchufa2
Copy link
Member Author

Enchufa2 commented May 6, 2021

Is there something very important in releases 5.1.3 or 5.1.4? I'd say it's not, but just to confirm. I removed the automatic builds for 5.1.3 and 5.1.4 and left 5.1.2 in the repo, because it's easier for me. But otherwise, I can make an ad-hoc 5.1.4 build with this patch, yes.

@kevinushey
Copy link
Contributor

If it's easier to just use 5.1.2 then I'd recommend doing that -- 5.1.3 and 5.1.4 just work around various bugs with Solaris / older Windows R.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants