Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FASTEN server] Fix paths in input records on-the-fly #445

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

mir-am
Copy link
Contributor

@mir-am mir-am commented Apr 22, 2022

Description

This PR makes the following changes to the FASTEN server:

  • Adds an optional argument --pi, which is a base directory for reading input records. E.g., the MetadataDBExtension requires this argument as it reads CG files from the disk.
  • Adds the method fixPathInRecord, which adds the base directory to the relative paths so that the plug-ins can read input records.
  • Stores relative path in the dir field of payload in the output records.

This is a backward-compatible change. Therefore, output records with absolute paths can still be processed.

Motivation and context

As discussed in the previous dev. call, in output records, we would like to store relative paths to the FASTEN data on the disk such as CGs and GID graphs.

Testing

Tested with the DC locally.

Task list

  • Fix paths in input records on-the-fly
  • Add the --pi argument to the Docker compose setup.
  • Add the --pi argument to the production deployments

@mir-am mir-am self-assigned this Apr 22, 2022
@MagielBruntink
Copy link
Member

@mir-am I am a bit confused, didn't the FASTEN server have the --po option already?

@MagielBruntink
Copy link
Member

Should it say that an optional --pi argument is added? What does it mean though optional? What happens if you don't add it, will it still work with absolute paths as before?

@proksch
Copy link
Contributor

proksch commented Apr 24, 2022

I am not 100% sure yet, whether the proposed fix addresses the problem, as the broken path might be in different locations in the JsonObject and it is not necessarily in the root... I tend to say that an easy (yet more expensive) fix would be to do a replacement directly on the Json string, before the JsonObject is even created. For example, replace("/mnt/fasten/pypi-test/", "/mnt/fasten/pypi/") or replace("/mnt/fasten/pypi/pypi/", "/mnt/fasten/pypi/") or maybe we should simply remove this base path altogether (replace with "") in favor of using an environment variable.

@gdrosos or @michelescarlato: Could you chime in here and help to clarify a bit, in which cases messages currently contain broken paths and how these paths should be fixed?

One fundamental problem with our current file handling is that we have multiple base paths and it is impossible to distinguish them... we have the FASTEN base folder that is mounted (/mnt/fasten/), plus we have base folders for every plugin (e.g., the folder into which repos are checked out or call graphs are placed). If a message now contains a full path to a repository checkout, it is not clear where the FASTEN file structure ends and the repo content starts.

What I would therefore advocate for is that we should either 1) have multiple of these base path variables (e.g., baseDir, baseRepoCloner, baseCallGraphs), so plugins can use these config variables instead of constructing paths themselves or 2) what I would find even better would be to introduce an abstraction that will take care of finding the right folder... for example, we could have a RepoClonerIO that could be asked for the right path to the checked out folder of a particular Revision.

The challenge that we are facing is that replacing file paths is a fundamental change which might break multiple components in the system, so we need to be extra careful before we put this into production.

@mir-am As a minor request, please stop introducing these ultra-short names, as it becomes really hard to decipher what they mean. There is no harm in using a full name to describe the option, the only moment in which they are used is in a config file, so "readability" > "saved keystrokes".

@mir-am
Copy link
Contributor Author

mir-am commented Apr 25, 2022

@mir-am I am a bit confused, didn't the FASTEN server have the --po option already?

Sorry, I've corrected the PR text. The introduced CLI arg is --pi. The --po option was already there.

Should it say that an optional --pi argument is added? What does it mean though optional? What happens if you don't add it, will it still work with absolute paths as before?

Only plug-ins that have dir in the payload of their input records like metadataDBExtension and CallableIndexerPlugins must add --pi to specify the base path where their input records are stored.

So, if a plug-in has dir in the payload of its input record and the --pi is not given, then the FASTEN server throws RuntimeException and reminds the user to specify --pi.

It also works with absolute paths for backward compatibility.

@MagielBruntink
Copy link
Member

I agree with Sebastian that this is quite a fundamental change, which can impact many components. What path-containing-fields are we actually changing here? There is dir for plugins that output but there is also eg. sourcePath and perhaps others.

We should keep in mind that other plugins, also non-Java ones, should be able to be configured easily to account for paths becoming relative.

Since we are trying to synchronise these paths across plugins, I always thought have a sort-of "shared root" was not a bad idea, and it seems /mnt/fasten currently has this role. I would like to understand what issue it actually causes, and whether there are not other solutions.

@mir-am
Copy link
Contributor Author

mir-am commented Apr 25, 2022

I agree with Sebastian that this is quite a fundamental change, which can impact many components. What path-containing-fields are we actually changing here? There is dir for plugins that output but there is also eg. sourcePath and perhaps others.

We should keep in mind that other plugins, also non-Java ones, should be able to be configured easily to account for paths becoming relative.

Since we are trying to synchronise these paths across plugins, I always thought have a sort-of "shared root" was not a bad idea, and it seems /mnt/fasten currently has this role. I would like to understand what issue it actually causes, and whether there are not other solutions.

At the moment, there are no issues, but the issue arises, for example, when we want to move fasten data (/mnt/fasten/data) to another place, then the paths in the input records are no longer valid.

That said, I agree that this is a fundamental change and we should consider all the possible consequences and involve partners.

@gdrosos
Copy link
Member

gdrosos commented Apr 26, 2022

@gdrosos or @michelescarlato: Could you chime in here and help to clarify a bit, in which cases messages currently contain broken paths and how these paths should be fixed?

Some messages of the output topic of the cscout which is employed to produce C Call Graphs contained broken paths due to a minor bug described here. I have fixed the issue and I am redeploying the plugin on the mnt/fasten/data/debian folder.

@mir-am
Copy link
Contributor Author

mir-am commented Apr 26, 2022

@gdrosos or @michelescarlato: Could you chime in here and help to clarify a bit, in which cases messages currently contain broken paths and how these paths should be fixed?

Some messages of the output topic of the cscout which is employed to produce C Call Graphs contained broken paths due to a minor bug described here. I have fixed the issue and I am redeploying the plugin on the mnt/fasten/data/debian folder.

@gdrosos, is the path to C call graphs absolute or relative?

@gdrosos
Copy link
Member

gdrosos commented Apr 26, 2022

@gdrosos or @michelescarlato: Could you chime in here and help to clarify a bit, in which cases messages currently contain broken paths and how these paths should be fixed?

Some messages of the output topic of the cscout which is employed to produce C Call Graphs contained broken paths due to a minor bug described here. I have fixed the issue and I am redeploying the plugin on the mnt/fasten/data/debian folder.

@gdrosos, is the path to C call graphs absolute or relative?

The path specifying the sourcePath is absolute in both CScout and PyCG. Also I noticed that the messages of the MetadataBBCExtension also contain aboslute paths in the payload field

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants