`diff` functionality #75

venkat-raman-wday · 2024-10-28T22:44:42Z

Hi, this is a question/discussion regarding the diff functionality.

I see that the diff functionality takes into account the Location and Line property to compute the list of new findings per severity. However, per the SARIF spec, the fingerprints and partialFingerprints provide a one-stop check for uniqueness. I understand what's provided as part of diff functionality partially makes sense in the scenario perhaps where none of the fingerprints attributes are present.

What are your thoughts on the other result attributes such as codeFlow, stacks, different attributes of location attribute etc that can be used to distinguish if results are unique? Would you suggest they be considered outside the scope of this project's implementation? Any ideas on how to handle those cases? Perhaps accept a list of attributes that would serve as the order to compare results against?

The text was updated successfully, but these errors were encountered:

debonte · 2024-10-29T05:38:19Z

I agree that supporting diff based on the fingerprints property, if present, makes sense.

Checking partialFingerprints doesn't seem right to me, since this data is not a stable fingerprint for a result, but rather data that can be used by a result management system to compute a stable fingerprint. See https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html#AppendixFingerprints

Perhaps accept a list of attributes that would serve as the order to compare results against?

I'm hesitant to add this level of complexity without feedback from more users.

venkat-raman-wday · 2024-10-29T19:14:47Z

Thanks for answering.

I'm coming from the requirement to produce a SARIF with baselineState attribute annotated for the results given a set of SARIF files. And the requirement is to do this generically to accommodate SARIF files from any tool that produce SARIF. As we know, SARIF output from tools don't carry the baselienState out of the box unless may be if it comes with a result management system (it's a long shot still to get baselineState SARIF) and these tools produce different attributes for a run depending on their domain. In this context, while I agree the spec says partialFingerprints is to be used by result management systems to differentiate logically similar findings, I see it with the same capacity as the fingerprint attribute - to logically distinguish results.

The appendix linked states,

The SARIF format provides the partialFingerprints property to allow analysis tools and other components in the SARIF ecosystem to provide additional information which a result management system can incorporate into the fingerprint that it constructs for each result.

So, perhaps using a SARIF with baselineState attribute to be consumed by result management systems wasn't thought of? But I believe it's a valid use case - compute a SARIF file with baselineState where the partialFingerprints if present is used with any fingerprint attribute present, feed the resultant SARIF to a result management system that consumes it. Thoughts on this?

I'm hesitant to add this level of complexity without feedback from more users.

I'm with you on this. We've a requirement that we handle SARIF files from any tool that supports this output. So, while it could potentially be a SARIF with any of the available attributes, it's likely to end up being only a subset of these multiple attributes.

debonte · 2024-10-29T19:41:44Z

Are you requesting that we provide a way to diff two files ("new" and "old") and update the "new" file with baselineState properties based on our diffing results? That seems like a reasonable request, but it's not clear to me if that's what you're getting at.

compute a SARIF file with baselineState where the partialFingerprints if present is used with any fingerprint attribute present, feed the resultant SARIF to a result management system that consumes it. Thoughts on this?

My read of the spec is that partialFingerprints is only intended to be used as an input for computing fingerprints, and the partialFingerprints are not (necessarily) sufficient to determine equality of two results. The following language which I copied from section 3.27.17 seems to support that:

A result management system MAY use any algorithm to combine the information contained in the various partial fingerprints. (For example, it might decide that two results are logically identically if any one of their partial fingerprints match, or only if a majority of them match, or only if all of them match.)

To make use of the information, if any, embodied in the property names, a result management system requires knowledge of the naming convention used by the SARIF producer. A result management system with that knowledge MAY use the property names to decide which partial fingerprints to include in its fingerprint computation. A result management system lacking that knowledge SHOULD NOT attempt to interpret the information embodied in the partial fingerprint names.

venkat-raman-wday · 2024-10-29T21:19:15Z

Yeah. The goal is to produce a SARIF file 'C', from SARIF files 'A' and 'B' where 'C' contains the results with baselineState property combining 'A' and 'B'. 'A' can be new and 'B' an old run. What I'm also saying is that this functionality has to support any SARIF file - with/without fingerprints, partialFiingerprints, graphs, graphTraversals or any other attributes that can be deemed appropriate to help distinguish two results logically.

I'm a bit confused about the usage of partialFingerprints - the spec says it can only be used as an input for computing fingerprints. However, fingerprints are used to distinguish results without looking at other attributes (stable identifier). So, a result management system or diff functionality for that matter can compute fingerprints for a result should partialFingerprints be present (and they can make sense of the property names) and use that to distinguish results correct?

debonte · 2024-10-29T21:31:41Z

Yeah. The goal is to produce a SARIF file 'C', from SARIF files 'A' and 'B' where 'C' contains the results with baselineState property combining 'A' and 'B'. 'A' can be new and 'B' an old run. What I'm also saying is that this functionality has to support any SARIF file - with/without fingerprints, partialFiingerprints, graphs, graphTraversals or any other attributes that can be deemed appropriate to help distinguish two results logically.

Ok, that makes sense. However, I think the initial implementation would likely only consider fingerprints (if present) plus the existing Location/Line behavior. As I mentioned before, I'd want to see for feedback from users before adding a way to specify fields to be compared as part of the diff.

So, a result management system or diff functionality for that matter can compute fingerprints for a result should partialFingerprints be present (and they can make sense of the property names) and use that to distinguish results correct?

partialFingerprints is just additional data to include when computing the fingerprint. It's not the complete set of data used to compute the fingerprint, and therefore comparing partialFingerprints is insufficient to determine equality in my mind.

venkat-raman-wday · 2024-10-29T21:51:49Z

Ok, that makes sense. However, I think the initial implementation would likely only consider fingerprints (if present) plus the existing Location/Line behavior. As I mentioned before, I'd want to see for feedback from users before adding a way to specify fields to be compared as part of the diff.

Thanks, I think we're on the same page here. I agree that you'd need feedback from users before adding a feature to specify additional fields. However, I doubt there would be such requirements from the general users right now :)

partialFingerprints is just additional data to include when computing the fingerprint. It's not the complete set of data used to compute the fingerprint, and therefore comparing partialFingerprints is insufficient to determine equality in my mind.

I agree with you on this as well. And this is not something that can be exposed generally because the properties within partialFingerprints require additional context that's not easy to express/expose to this library.

debonte added the enhancement New feature or request label Oct 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`diff` functionality #75

`diff` functionality #75

venkat-raman-wday commented Oct 28, 2024 •

edited

Loading

debonte commented Oct 29, 2024

venkat-raman-wday commented Oct 29, 2024

debonte commented Oct 29, 2024

venkat-raman-wday commented Oct 29, 2024

debonte commented Oct 29, 2024

venkat-raman-wday commented Oct 29, 2024

diff functionality #75

diff functionality #75

Comments

venkat-raman-wday commented Oct 28, 2024 • edited Loading

debonte commented Oct 29, 2024

venkat-raman-wday commented Oct 29, 2024

debonte commented Oct 29, 2024

venkat-raman-wday commented Oct 29, 2024

debonte commented Oct 29, 2024

venkat-raman-wday commented Oct 29, 2024

`diff` functionality #75

`diff` functionality #75

venkat-raman-wday commented Oct 28, 2024 •

edited

Loading