Set LC_ALL=C for all usages of comm(1), sort(1), tr(1) #133

Closed
wants to merge 1 commit into from

Conversation

sideeffect42
Member

For tr(1) I only set $LC_ALL when character classes or ranges are used.
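A minimal sketch of that distinction, not the code from this PR (the function names `to_upper` and `colon_to_newline` are made up for illustration): a call that uses a character class or range gets `LC_ALL=C`, while a call that lists literal characters is left alone.

```shell
# Hypothetical helper: uses character classes, so the result depends on the
# locale's definition of "lower" and "upper". Pin it to the POSIX locale.
to_upper() {
    LC_ALL=C tr '[:lower:]' '[:upper:]'
}

# Hypothetical helper: only literal characters, no class and no range,
# so no locale override is set here.
colon_to_newline() {
    tr ':' '\n'
}

printf 'abc\n' | to_upper
printf 'a:b\n' | colon_to_newline
```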

All three utilities depend on the locale setting. Because the locale can differ between the local system and the target, the sort order can vary, leading to unexpected bugs.
Regex character classes and ranges can also vary depending on the locale.
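To illustrate the sort order point, here is a small sketch (the helper name `common_lines` and the use of `mktemp` are assumptions for the example, not code from this PR). comm(1) expects its inputs sorted in the collation order it uses itself, so sort(1) and comm(1) are both pinned to the POSIX locale.

```shell
# In the POSIX locale, sorting is by byte value: upper case before lower case.
printf '%s\n' b A a B | LC_ALL=C sort
# prints A, B, a, b (one per line)

# Hypothetical helper: print the lines common to two files, independent of
# the locale of the calling environment.
common_lines() {
    _tmp=$(mktemp -d) || return 1
    LC_ALL=C sort "$1" >"$_tmp/a"
    LC_ALL=C sort "$2" >"$_tmp/b"
    # -12 suppresses the lines unique to either file
    LC_ALL=C comm -12 "$_tmp/a" "$_tmp/b"
    rm -rf "$_tmp"
}
```

Without the override, the same input could be ordered differently on the local system and on the target, and comm(1) could then be fed input it considers unsorted.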

@sideeffect42 sideeffect42 requested a review from 4nd3r April 11, 2025 18:06
@sideeffect42
Member Author

Hmm, now that I'm thinking about it, wouldn't the locale selection also affect globs?

@4nd3r
Member

4nd3r commented Apr 12, 2025

wouldn't the locale selection also affect globs

Sort order.

Want to do that separately? Approved the PR, tho.

@sideeffect42
Member Author

Want to do that separately?

The question is what to do if we had to set the locale for every glob.
I think that would be going overboard, so I was thinking about setting the locale to C in skonfig instead, for all scripts.
I think the POSIX locale is what most people expect, and if some other locale is required it can always be overridden.
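Roughly what I have in mind, as a sketch only: `run_with_posix_locale` is a made-up name and this is not how skonfig currently runs scripts. The runner sets `LC_ALL=C` for the script, and a script that really needs another locale overrides it itself.

```shell
# Hypothetical runner: execute a script body with the POSIX locale as default.
run_with_posix_locale() {
    LC_ALL=C sh -c "$1"
}

# Default case: the script sees LC_ALL=C without doing anything.
default_locale=$(run_with_posix_locale 'printf %s "$LC_ALL"')

# Override case: the script sets the locale it needs on its own.
overridden_locale=$(run_with_posix_locale \
    'LC_ALL=de_CH.UTF-8; export LC_ALL; printf %s "$LC_ALL"')

printf '%s\n' "$default_locale" "$overridden_locale"
```

Doing it in one place would mean the individual comm(1), sort(1) and tr(1) calls would not need their own `LC_ALL=C` prefix.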

I think I need to set up an Estonian VM to test this a bit 😃.

@sideeffect42
Member Author

A quick test suggests that only the Korn shell (ksh, but not the MirBSD Korn Shell, mksh) respects the locale in globs.
Though for Estonian, A-Z also matches T U V W X Y Z. 😕
bash, dash and zsh seem to always treat globs as in the POSIX locale.

To test, e.g.:

# alternatively: LC_ALL=et_EE.UTF-8
LC_ALL=de_CH.UTF-8 ksh -c 'for c in A B C D E F R S T U V W X Y Z Ä Ö Ü; do case $c in ([A-Z]) echo "$c";; esac; done'
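The same check can be wrapped in a small loop to compare several shells at once. This is only a sketch: `range_matches` is a made-up helper, the list of shells is an assumption about what is installed, and the locale has to exist on the test system.

```shell
# Hypothetical helper: print the candidates that match the glob [A-Z]
# in the given shell under the given locale.
# $1 = shell, $2 = locale, remaining arguments = candidate characters
range_matches() {
    _shell=$1 _locale=$2
    shift 2
    LC_ALL=$_locale "$_shell" -c '
        for c in "$@"; do
            case $c in ([A-Z]) printf "%s " "$c";; esac
        done' _ "$@"
}

# Run the check in every shell from the list that is installed.
for s in ksh mksh bash dash zsh; do
    command -v "$s" >/dev/null 2>&1 || continue
    printf '%s: ' "$s"
    range_matches "$s" de_CH.UTF-8 A R S T Z Ä Ö Ü
    echo
done
```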

@4nd3r
Member

4nd3r commented Apr 12, 2025

setting the locale to C in skonfig instead

I think we should do this.

@sideeffect42
Member Author

setting the locale to C in skonfig instead

I think we should do this.

Yes, the more I think about this problem, the more I come to the same conclusion.

I'll close this PR and then try to modify skonfig accordingly.
