-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Add common systemd extensions to INI #7267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…yntax highlighting
This appears to have overlap with #7163 and like that PR I think this should really be rolled into INI. As you're moving the |
Sounds like a plan. Updated to:
The addition of |
|
Take 3! Added
Requiring both narrows down the list a bit vs the last commit — looks like it eliminates more false negatives than false positives, but that’s probably where we’d want to lean. Latest counts: |
lib/linguist/heuristics.yml
Outdated
# e.g., "[foobar]" or "# [commented out]" | ||
- pattern: '^[#\s]*\[[^\]]+\]\s*$' | ||
# e.g., "fizz = buzz" or "# disabled=i am" | ||
- pattern: '^[#\s]*\S+\s?=.+$' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This regex is vulnerable to ReDoS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Both now check out as safe according to this ReDoS checker. Narrows .conf
candidates a tad to ~360k files.
lib/linguist/heuristics.yml
Outdated
# e.g., "[foobar]" or "# [commented out]" | ||
- pattern: '^[#\s]*\[[^\]]+\]\s*$' | ||
# e.g., "fizz = buzz" or "# disabled=i am" | ||
- pattern: '^(?:#\s*)?[^\s=]+\s*=.+$' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really have to check if there is something after the "="? Should a file like this be considered INI?
https://github.com/Coaixy/weiban-tool/blob/c6a7dfc871eab9eb2549c8f694e2ee100c024791/ai.conf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we already have the named pattern "key_equals_value", it's probably better to use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed… changed to use “key_equals_value”
Regarding the heuristics for the Now, some of these files might benefit from being highlighted as an INI file since a lot of their content matches the usual INI syntax, but if we want to avoid mixing formats here, we could include the following rule after the 2 patterns currently used to match INI format.
This regex simply tries to match any non-valid INI syntax ie. a row that is neither a comment, an empty row, a section or a key-value pair. Adding this would mean that any file that contains an invalid line would not be considered INI. Notes on the regexFirst term: /[^#;=\[]+/ This simply matches any character except comments characters ("#" and ";") as well as "=" and "[" which are needed for key-value pairs and sections. Second term: /[^ \n=]/ This is just to make sure that we don't match: |
Agreed on both the negative pattern and using the existing |
It's looking good for the The heuristics could be the exact same as the ones for |
Sounds good! Since all of the files (including the existing ones) are “just” INI files, I included all of the ones listed in |
Woah 😳, when I said "the others", I only meant the ones that are being added in this PR 😅. To keep the scope of this PR more managable, I'd suggest only including a heuristics for those new extensions. Also note that some of the existing ones like |
I got a little excited there! 😉 Switched to only the new file extensions in the heuristic. |
Looking at the Desktop Entry Specification, there could be an argument made that it should remain a seperate entry from the generic INI language. I'd suggest keeping it as a seperate language, but adding it to the INI group instead (which might have been what @Alhadis meant by his comment). desktop:
+ group: INI
type: data
extensions:
- ".desktop"
- ".desktop.in"
- - ".service"
tm_scope: source.desktop
ace_mode: text
language_id: 412 An example of where the Desktop Entry format is different from the generic INI format is with Localized values for keys (see an example file). These aren't valid in the generic INI format since |
Oh yes. Nice find @DecimalTurn. I think your suggestion makes sense with that in mind. |
@@ -181,6 +181,14 @@ disambiguations: | |||
rules: | |||
- language: Gerber Image | |||
pattern: '^[DGMT][0-9]{2}\*(?:\r?\n|\r)' | |||
- extensions: ['.conf', '.container', '.mount', '.network', '.path', '.service', '.socket', '.target', '.timer'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep in mind, this heuristic will not take effect for those extensions that are only associated with one language in Linguist.
We also need a test for this heuristic, especially as .conf
is so generic and needs a pretty concise match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, they would have to be added to generic.yml. 🤔 Maybe not all of them should after all. @lildude, is there a rule of thumb we can use to determine which ones should be marked as generic in a case like this? Less than 50% of all files with the same extension maybe?
Extension | Number of INI files | Total | INI as share of total |
---|---|---|---|
.container | ~2.3k | ~2.6k | ~88%🟢 |
.mount | ~1.5k | ~2.4k | ~63% 🟢 |
.network | ~3.8k | ~17.1k | ~22% 🔴 |
.path | ~1.1k | ~78.6k | ~1% 🔴 |
.socket | ~2.7k | ~3.7k | ~73% 🟢 |
.target | ~17.4K | ~40.7k | ~43% 🔴 |
.timer | ~9.4K | ~11.1k | ~84% 🟢 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't really thought of a rule of thumb other than noting if an extension is generic enough to not easily convey a language or association with a language and is easily confirmed by the search results returning all sort of file formats as is the case with .conf
. I don't really want to put a value to this as it's already a battle with the usage requirements.
Generic extensions are supposed to be used when the filetype in question has an unambiguous pattern to differentiate it from clearly-unrelated files with the same extension. INI files, already being a poorly-standardised file format with innumerable variations in syntax, can't be identified based on patterns that aren't likely to occur in other ad-hoc configuration formats. This means that adding
I know it's tempting to have |
Add a “systemd unit” language for common file extensions related to
systemd
andpodman
. Utilizes the existing.ini
syntax highlighting rules, and reclassifies the most common.service
file extension from “dictionary”.Description
systemd unit files are used by
systemd
andpodman
for configuration. Their syntax is essentially the same as the old DOS/Windows.ini
syntax. The most common file extensions related,.service
, is currently classified as “desktop”. This PR reclassifies.service
as a new “systemd unit file” language and adds the most commonsystemd
andpodman
file extensions.This fixes #7161 .
@vorburger apologies — I noticed that you had opened PR #7163 after I had created this one.
Checklist:
.service
(reclassified; ~209k files).container
(new; ~2k files).mount
(new; ~1.5k files).network
(new; ~3.7k files).path
(new; ~1.1k files).socket
(new; ~2.7k files).target
(new; ~17.4K files).timer
(new; ~9.4K files)