Skip to content

Add common systemd extensions to INI #7267

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

dave-atx
Copy link

@dave-atx dave-atx commented Mar 6, 2025

Add a “systemd unit” language for common file extensions related to systemd and podman. Utilizes the existing .ini syntax highlighting rules, and reclassifies the most common .service file extension from “dictionary”.

Description

systemd unit files are used by systemd and podman for configuration. Their syntax is essentially the same as the old DOS/Windows .ini syntax. The most common file extensions related, .service, is currently classified as “desktop”. This PR reclassifies .service as a new “systemd unit file” language and adds the most common systemd and podman file extensions.

This fixes #7161 .

@vorburger apologies — I noticed that you had opened PR #7163 after I had created this one.

Checklist:

  • I am adding a new extension to a language.
    • The new extension is used in hundreds of repositories on GitHub.com
    • I have included a real-world usage sample for all extensions added in this PR:
      • Sample source(s):
        • 2 samples per new extension included. Each file has a link to the original at the top.
      • Sample license(s):
        • GPL (from the systemd project) and Apache 2.0 (from the podman project)
    • I have included a change to the heuristics to distinguish my language from others using the same extension.
      • not needed

@dave-atx dave-atx requested a review from a team as a code owner March 6, 2025 02:22
@lildude
Copy link
Member

lildude commented Mar 6, 2025

This appears to have overlap with #7163 and like that PR I think this should really be rolled into INI. As you're moving the .service extension @Alhadis's comment might be worth considering and take this opportunity to roll Desktop into INI too along with adding all of these to INI rather than creating a new language - normally so only create a new language with nearly identical formats if they need a different grammar. This isn't the case here… you're straight out duplicating INI with a different name.

@dave-atx
Copy link
Author

dave-atx commented Mar 6, 2025

Sounds like a plan. Updated to:

  • Merge desktop into INI
  • Add the below extensions to INI:

The addition of .conf reconciles this PR and #7163, so this PR now covers all extensions that hit the usage minimums between the two.

@lildude
Copy link
Member

lildude commented Mar 6, 2025

The addition of .conf reconciles this PR and #7163, so this PR now covers all extensions that hit the usage minimums between the two.

.conf is faaar too generic and isn't necessarily INI or related to systemd. If you want to add the extension, I recommend adding it to generic.yaml and add a heuristic, but this might be very difficult as the heuristic needs to be bulletproof so it doesn't catch other languages.

@dave-atx
Copy link
Author

dave-atx commented Mar 6, 2025

Take 3! Added .conf to generic.yml. Heuristic boils down to: a .conf INI file must have both:

  1. A line of the form [section] (either commented out or not)
  2. A line of the form key = value (either commented out or not)

Requiring both narrows down the list a bit vs the last commit — looks like it eliminates more false negatives than false positives, but that’s probably where we’d want to lean. Latest counts:

# e.g., "[foobar]" or "# [commented out]"
- pattern: '^[#\s]*\[[^\]]+\]\s*$'
# e.g., "fizz = buzz" or "# disabled=i am"
- pattern: '^[#\s]*\S+\s?=.+$'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex is vulnerable to ReDoS.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Both now check out as safe according to this ReDoS checker. Narrows .conf candidates a tad to ~360k files.

# e.g., "[foobar]" or "# [commented out]"
- pattern: '^[#\s]*\[[^\]]+\]\s*$'
# e.g., "fizz = buzz" or "# disabled=i am"
- pattern: '^(?:#\s*)?[^\s=]+\s*=.+$'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really have to check if there is something after the "="? Should a file like this be considered INI?
https://github.com/Coaixy/weiban-tool/blob/c6a7dfc871eab9eb2549c8f694e2ee100c024791/ai.conf

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we already have the named pattern "key_equals_value", it's probably better to use it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed… changed to use “key_equals_value”

@DecimalTurn
Copy link
Contributor

DecimalTurn commented Mar 7, 2025

Regarding the heuristics for the .conf extension, with the current configuration, there are potentially file types that are similar to INI that could be mixed-in with INI (false positives). For instance, in these search results (44k+ files), there are files that have at least a section and a key-value pair, but they include other lines with non-INI syntax as well.

Now, some of these files might benefit from being highlighted as an INI file since a lot of their content matches the usual INI syntax, but if we want to avoid mixing formats here, we could include the following rule after the 2 patterns currently used to match INI format.

- negative_pattern: '^[^#;=\[]+[^ \n=]$'

This regex simply tries to match any non-valid INI syntax ie. a row that is neither a comment, an empty row, a section or a key-value pair. Adding this would mean that any file that contains an invalid line would not be considered INI.

Notes on the regex

First term:

/[^#;=\[]+/

This simply matches any character except comments characters ("#" and ";") as well as "=" and "[" which are needed for key-value pairs and sections.

Second term:

/[^ \n=]/

This is just to make sure that we don't match:
a) A row with only whitespace
b) Two empty rows (without \n in there, we would match \n\n)
c) A key-value pair without the value specified

@dave-atx
Copy link
Author

dave-atx commented Mar 7, 2025

Agreed on both the negative pattern and using the existing key_equals_value pattern. Made both of those two changes. Now at ~414k .conf candidates for anyone scoring at home.

@DecimalTurn
Copy link
Contributor

It's looking good for the .conf extension! For the others, I'd say a heuristic is probably also warranted, especially .target.

The heuristics could be the exact same as the ones for .conf or the ones used in the search queries for each extension.

@dave-atx
Copy link
Author

dave-atx commented Mar 8, 2025

Sounds good! Since all of the files (including the existing ones) are “just” INI files, I included all of the ones listed in languages.yml in the heuristic. If we have file format name collisions with those extensions, we should also ignore those.

@DecimalTurn
Copy link
Contributor

Woah 😳, when I said "the others", I only meant the ones that are being added in this PR 😅. To keep the scope of this PR more managable, I'd suggest only including a heuristics for those new extensions. Also note that some of the existing ones like .url and .pro already have their own heuristic and we don't want to interfere with that.

@dave-atx
Copy link
Author

dave-atx commented Mar 8, 2025

I got a little excited there! 😉 Switched to only the new file extensions in the heuristic.

@lildude lildude changed the title Add "systemd unit" language for common file extensions Add common systemd extensions to INI Mar 10, 2025
@DecimalTurn
Copy link
Contributor

DecimalTurn commented Mar 10, 2025

Looking at the Desktop Entry Specification, there could be an argument made that it should remain a seperate entry from the generic INI language. I'd suggest keeping it as a seperate language, but adding it to the INI group instead (which might have been what @Alhadis meant by his comment).

desktop:
+ group: INI
  type: data
  extensions:
  - ".desktop"
  - ".desktop.in"
- - ".service"
  tm_scope: source.desktop
  ace_mode: text
  language_id: 412

An example of where the Desktop Entry format is different from the generic INI format is with Localized values for keys (see an example file). These aren't valid in the generic INI format since [ and ] can't appear in the key.

@lildude
Copy link
Member

lildude commented Mar 11, 2025

Oh yes. Nice find @DecimalTurn. I think your suggestion makes sense with that in mind.

@@ -181,6 +181,14 @@ disambiguations:
rules:
- language: Gerber Image
pattern: '^[DGMT][0-9]{2}\*(?:\r?\n|\r)'
- extensions: ['.conf', '.container', '.mount', '.network', '.path', '.service', '.socket', '.target', '.timer']
Copy link
Member

@lildude lildude Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind, this heuristic will not take effect for those extensions that are only associated with one language in Linguist.

We also need a test for this heuristic, especially as .conf is so generic and needs a pretty concise match.

Copy link
Contributor

@DecimalTurn DecimalTurn Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, they would have to be added to generic.yml. 🤔 Maybe not all of them should after all. @lildude, is there a rule of thumb we can use to determine which ones should be marked as generic in a case like this? Less than 50% of all files with the same extension maybe?

Extension Number of INI files Total INI as share of total
.container ~2.3k ~2.6k ~88%🟢
.mount ~1.5k ~2.4k ~63% 🟢
.network ~3.8k ~17.1k ~22% 🔴
.path ~1.1k ~78.6k ~1% 🔴
.socket ~2.7k ~3.7k ~73% 🟢
.target ~17.4K ~40.7k ~43% 🔴
.timer ~9.4K ~11.1k ~84% 🟢

Copy link
Member

@lildude lildude Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't really thought of a rule of thumb other than noting if an extension is generic enough to not easily convey a language or association with a language and is easily confirmed by the search results returning all sort of file formats as is the case with .conf. I don't really want to put a value to this as it's already a battle with the usage requirements.

@Alhadis
Copy link
Collaborator

Alhadis commented Mar 11, 2025

Generic extensions are supposed to be used when the filetype in question has an unambiguous pattern to differentiate it from clearly-unrelated files with the same extension. INI files, already being a poorly-standardised file format with innumerable variations in syntax, can't be identified based on patterns that aren't likely to occur in other ad-hoc configuration formats.

This means that adding .conf as an INI extension—generic or not—is not only going to lead to inconsistent language (mis)classification, it'll also result in ugly-looking highlighting for INI-ish formats that don't conform to the syntax expected of the INI highlighting grammar. In other words, the same problem alluded to by @DecimalTurn:

An example of where the Desktop Entry format is different from the generic INI format is with Localized values for keys (Example file). These aren't valid in the generic INI format since [ and ] can't appear in the key.

I know it's tempting to have .conf files highlighted as INI, but the margin for error here isn't worth the prettier-looking files (which authors can always identify as INI using a .gitattributes file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing additional extensions for systemd INI format style files
4 participants