
A way to exclude individual files from a module/package #8128

Open
carlosperate opened this issue Nov 26, 2023 · 3 comments
Labels
feature Feature request

Comments

@carlosperate

carlosperate commented Nov 26, 2023

Is your feature request related to a problem? Please describe.
In my specific use case I need to add a Python package as a hidden import, which pulls in all the files from the package. This particular package comes with a large zip file that is not needed for my application, so I would like to exclude it from the build.

So, I was looking for a way to "exclude" individual files. The excludes option is for Python packages/modules, so that won't work for individual files.

This kind of feature would also help in situations like this, where a lot of .so files (maybe dlls as well?) are included, which might not be needed:

Describe the solution you'd like
Something similar to the --exclude-module flag or the Analysis(excludes=...) option for spec files, but for individual files. Perhaps a --exclude-file/Analysis(exclude-files=...) version?

Describe alternatives you've considered
At the moment, a workaround is to manipulate the output of a = Analysis(...) in the spec file: create a new list without the unwanted entries and use that to replace a.datas.
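
The workaround described above could look roughly like the sketch below. It assumes that each entry in a.datas is a (dest_name, src_name, typecode) tuple; the helper name drop_entries and the pattern "some_package/*.zip" are hypothetical and only serve as illustration.

```python
import fnmatch


def drop_entries(toc, patterns):
    """Return the entries whose destination name matches none of the patterns.

    Assumes each entry is a (dest_name, src_name, typecode) tuple.
    """
    return [
        entry
        for entry in toc
        if not any(fnmatch.fnmatch(entry[0], pattern) for pattern in patterns)
    ]


# Hypothetical usage in a .spec file, after `a = Analysis(...)`:
# a.datas = drop_entries(a.datas, ["some_package/*.zip"])
```

Matching on the destination name keeps the spec file independent of where site-packages lives on the build machine.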

Additional context
N/A.

@carlosperate carlosperate added feature Feature request triage Please triage and relabel this issue labels Nov 26, 2023
@rokm
Member

rokm commented Nov 26, 2023

Yeah, it'd be nice to have an --exclude-file (and maybe rename/alias the --exclude to --exclude-module). It's one of the things on my list of things to look into, although it's not a high-priority one, since you can already achieve the same by post-processing a.binaries and a.datas as you described.


That said, in your specific case, if a hidden import pulls in an unnecessary large zip file, this means that we have a hook for that package (either here, or in the contributed hooks repository) that performs this collection and should be optimized. That is, either the data collection part should be split into a sub-hook for the module that actually requires that data file (if any), or the data collection itself should be made more selective.

So, can you tell us which package it is, and which is the problematic zip file?

@rokm rokm removed the triage Please triage and relabel this issue label Nov 26, 2023
@bwoodsend
Member

I'm for this, but there are some things to consider:

  1. Do you give --exclude-file the source path or the destination path? The source path is more intuitive but it depends on the location of the Python environment's site-packages directory so it would be a menace to script or use in CI/CD or even just to explain to someone else how to build your application. Maybe the parameter shouldn't even be a filename – something like --exclude-file=numpy.testing:some-unwanted-data-file.txt although we'd need a variant that does use filenames for the Linux users wanting to exclude certain system packages?
  2. What do we do if there's an overlap between --exclude-file and --add-data? Do we feel obliged to diligently track the order of the arguments and create a gitignore-style hierarchical file filter, or can we get away with just collecting all the data files into a list and then throwing out anything matching an exclude pattern?
  3. How do we handle case sensitivity? Presumably we match the OS and use case-insensitive matching on Windows and macOS and case-sensitive matching everywhere else? That's always a joyless process...
  4. Presumably, we'll need glob support for packages with lots of files? We'll have the choice of using glob.glob() or pathlib.Path().glob(), neither of which matches UNIX shell-style globbing. (I'm not looking forward to the invalid bug report that boils down to the user not quoting their glob.)
  5. Should an --exclude-file=/some/directory affect Python files added as data files (e.g. to keep torch happy)? I'd say no but that's not how the code is likely to organically work out.
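
To make points 3 and 4 concrete, here is a small sketch of how two standard-library matchers disagree on the same pattern. It uses fnmatch and pathlib's pure-path match() as stand-ins for whatever matcher an --exclude-file option might end up using; the file names are made up.

```python
import fnmatch
from pathlib import PurePosixPath, PureWindowsPath

# fnmatch: "*" also matches "/", so the pattern reaches into subdirectories.
crosses_dirs = fnmatch.fnmatchcase("pkg/sub/a.txt", "pkg/*.txt")  # True

# pathlib: "*" stays within a single path component...
stays_in_dir = PurePosixPath("pkg/sub/a.txt").match("pkg/*.txt")  # False

# ...but a relative pattern is matched from the right, so it is not anchored.
right_anchored = PurePosixPath("vendor/pkg/a.txt").match("pkg/*.txt")  # True

# Case handling follows the path flavour rather than the host OS.
posix_case = PurePosixPath("PKG/A.TXT").match("pkg/*.txt")  # False
windows_case = PureWindowsPath("PKG/A.TXT").match("pkg/*.txt")  # True
```

Whichever matcher is chosen, its behaviour around directory separators, anchoring, and case would need to be documented explicitly.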

@rokm
Member

rokm commented Nov 26, 2023

My current idea is that --exclude-file should take destination file paths (or rather, patterns). As you noted, source paths may vary between environments, but the destination path should be invariant. It is also what the user sees in the generated frozen application bundle (unless they are using onefile). Furthermore, this gives us a tool to manually resolve potential duplicates caused by the user explicitly adding a binary via --add-binary while the same binary is also collected via binary dependency analysis (with a different destination path).

I imagine the filter would need to be applied at several different stages: first, after the initial binaries and datas are collected from input parameters and hooks. Then, it would also need to be applied during binary dependency analysis (because an excluded binary should not trigger collection of its dependencies). And then, for good measure, at the end of analysis as well.

So exclusion takes precedence over anything added via --add-data/--add-binary, over anything added via hooks, over anything that would be pulled in during binary dependency analysis (e.g., a dependency of a binary that was not excluded), and also over .py files that are collected due to module collection mode settings. This way, it would behave as if the binaries / datas TOCs were post-processed in the .spec, with the important distinction that excluded binaries would not trigger collection of their dependencies.
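
The distinction between filtering during dependency analysis and post-processing the final lists can be shown with a toy model. This is not PyInstaller code: the dependency graph, file names, and function names below are all hypothetical.

```python
import fnmatch


def is_excluded(dest_name, patterns):
    """True if the destination name matches any exclude pattern."""
    return any(fnmatch.fnmatchcase(dest_name, p) for p in patterns)


def collect_binaries(roots, deps, patterns):
    """Walk a toy dependency graph; excluded entries are never followed."""
    collected, seen, pending = [], set(), list(roots)
    while pending:
        name = pending.pop(0)
        if name in seen or is_excluded(name, patterns):
            continue
        seen.add(name)
        collected.append(name)
        pending.extend(deps.get(name, []))
    return collected


def post_process(roots, deps, patterns):
    """Collect everything first, then filter, as a .spec workaround would."""
    everything = collect_binaries(roots, deps, [])
    return [name for name in everything if not is_excluded(name, patterns)]


# Hypothetical graph: app.so needs libbig.so, which needs libhelper.so.
deps = {"app.so": ["libbig.so", "libsmall.so"], "libbig.so": ["libhelper.so"]}
during_analysis = collect_binaries(["app.so"], deps, ["libbig.so"])
after_analysis = post_process(["app.so"], deps, ["libbig.so"])
```

In this model, filtering during the walk leaves out libhelper.so along with libbig.so, while post-processing still ships libhelper.so even though nothing left in the bundle needs it.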
