Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[IO-783] Remove NTFS ADS separator check in FilenameUtils.getExtension #629

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

baezzys
Copy link

@baezzys baezzys commented May 19, 2024

Summary

This PR addresses issue IO-783 by removing the NTFS ADS separator (colon) check from the FilenameUtils.getExtension method to ensure OS independence.

Changes

  • Removed the NTFS ADS separator (colon) validation from the indexOfExtension method in FilenameUtils.
  • Updated the FilenameUtils.getExtension method to process filenames regardless of the operating system.

…nsion on Windows

- Removed the NTFS ADS separator (colon) validation from the indexOfExtension method.
- Ensured FilenameUtils.getExtension method works OS independently.

This fix addresses the issue where the presence of a colon in filenames caused IllegalArgumentException on Windows only.
@garydgregory
Copy link
Member

I don't think we should ping back and forth between releases in behavior... While this class suffers from some fundamental issues, this change will cause more problems since a feature now disappeared...

@baulea
Copy link

baulea commented Jan 11, 2025

@garydgregory
In https://issues.apache.org/jira/projects/IO/issues/IO-783 Jochen Wiedmann pointed out, that filenames like "C:file.txt" without backslash are valid under Windows (relative path from current directory).

But currently FilenameUtils.getExtension("C:file.txt") and FilenameUtils.indexOfExtension("C:file.txt") throw an IllegalArgumentException because of the NTFS ADS separator check.

Would you please rethink, if that shouldn't be fixed?

If you agree, I could create a revised pull request, which adds unit tests for this case and also removes private methods now being superfluous. You might check the diff here: https://github.com/baulea/commons-io/compare/a9a6fd607fb4c8def8e0f781b9ee45368efb5120..250ebec60fbe1acb30282f593bbd95c34bc6be86

@jkbkupczyk
Copy link
Contributor

@baulea @baezzys
What about marking this method as deprecated and making a new one, that is OS independent? Currently you are breaking code - some programs may rely on that feature.

@baulea
Copy link

baulea commented Jan 16, 2025

@jkbkupczyk You are right, deprecating the current method and adding a new one is the smarter solution.
@garydgregory Do you agree?
@baezzys Would you adjust your pull request or should I provide a new one?

@garydgregory
Copy link
Member

garydgregory commented Jan 17, 2025

-1:

  • The current PR is completely wrong on Windows:
    • The PR returns part of the stream name or type, NOT the file extension!
    • The format is <filename>:<stream name>:<stream type>
    • The file name extension for "foo.exe:bar.txt" is "exe", not "txt".
  • The test needs an explicit check for "C:file.txt" since this is the common example given in the Jira and here.
  • The test should check for each combination of <filename>:<stream name>:<stream type>
  • Also, with and without driver letters.

Tangent: Note that we also have the class FileSystems for OS-specific implementations but it does not have this kind of API.

See:

@baulea
Copy link

baulea commented Feb 1, 2025

@garydgregory If I understand NTFS Streams correctly, then NTFS streams always contain two colons (or none) inside the filename and may not contain only a single colon.

You may just use <filename> without anything else or <filename>:<stream name>:<stream type> where you may omit <stream name> and <stream type>, but not the colons.

Currently the IllegalArgumentException is thrown, if a single colon is found after the last separator (/ or \). Couldn't we just throw the IllegalArgumentException in org.apache.commons.io.FilenameUtils#indexOfExtension only on two (or more) colons after the last separator?

For the same reason "foo.exe:bar.txt" in the unit test is no valid NTFS stream name. But it is a valid filename under Linux and therefore it should not cause an exception and org.apache.commons.io.FilenameUtils#getExtension("foo.exe:bar.txt") should always return "txt".

What do you think?

@garydgregory
Copy link
Member

Hello @baezzys

This all feels confusing because I don't think we follow strict behavior to match the names we use in methods and parameter names.

For example, strictly speaking, on Windows, "foo.txt" is a file name but "c:dir\foo.txt" is not a file name, is a path pointing to a file. Then you have components, including drive volumes also know as drive designator, directory names, and so on.

It's going to be delicate to change this code without breaking existing apps to provide bug fixes. So let's be careful.

IMO, moving this along is going to need a pragmatic approach with unit tests checking all these variations. Just talking about tweaking some method in PR comments isn't going to work for me, you have to run the whole test suite to check for unintended effects.

It's a valuable line of experimentation and I'm sure there's room for improvement.

HTH

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants