Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: Check of knowledge document exists #1192

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ckadner
Copy link
Collaborator

@ckadner ckadner commented Jun 13, 2024

Add a primitive check to linter to verify knowledge document exists with the specified commit sha

FYI @bjhargrave -- I thought I could do this with 5 lines of Python code. Turned into a few lines more. I mark it a draft as this still needs more thought 😊

Resolves #1188

@ckadner ckadner requested a review from bjhargrave June 13, 2024 08:06
@github-actions github-actions bot added the ci label Jun 13, 2024
Signed-off-by: Christian Kadner <[email protected]>
Signed-off-by: Christian Kadner <[email protected]>
Copy link
Contributor

@bjhargrave bjhargrave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this is worth the complexity since it only covers some specific scenarios and not the case in general.

"https://raw.githubusercontent.com/"
)
for file_path in patterns:
if "*" in file_path:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this only works if no glob is used on a github repo?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. I agree it's usefulness is limited

for file_path in patterns:
if "*" in file_path:
continue
url = f"{repo}/{commit}/{file_path}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will probably fail if the user supplies a proper git URL ending in .git.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, I looked at our summit bounty submissions and thought we could start with something simple, but agree this is a narrow use case

commit = parsed["document"].get("commit")
patterns = parsed["document"].get("patterns")

line = self.get_yaml_line(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could end up reporting no yq command twice for the same yaml file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right and then we would fix our CI setup ;-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is not on the CI where we already know yq is installed. It is on the user's workstation where they run this script that they may not have yq installed.

@@ -20,6 +20,9 @@
from referencing.exceptions import NoSuchResource
from referencing.jsonschema import DRAFT202012
from referencing.typing import URI
from urllib.request import Request, urlopen
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a 3rd party library, then we probably also need to add it to requirements.txt, no? If it is not, then these statements should be moved under Standard.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, built in

@@ -253,12 +247,65 @@ def check(self) -> int:
message="The file must be non-empty",
)

if "document" in parsed:
repo = parsed["document"].get("repo")
commit = parsed["document"].get("commit")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to validate the commit value is a valid SHA value. This is probably best in the schema were we can use a regex pattern.

https://json-schema.org/understanding-json-schema/reference/string#regexp

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened instructlab/schema#30 for this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, that is probably the most/more cost effective change here

@ckadner ckadner marked this pull request as draft June 13, 2024 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ci: Check commit sha for knowledge contributions
2 participants