-
Notifications
You must be signed in to change notification settings - Fork 683
Include remote path in FileHolder #5911
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Robrecht Cannoodt <[email protected]>
✅ Deploy Preview for nextflow-docs-staging canceled.
|
Signed-off-by: Robrecht Cannoodt <[email protected]>
ab13b83
to
0a2ae72
Compare
Looking at the constructor of FileHolder, it's not straight forward to just add a FileHolder( Path inputFile ) {
assert inputFile
this.sourceObj = inputFile
this.storePath = real(inputFile)
this.stageName = norm(inputFile.getFileName())
}
FileHolder( def origin, Path path ) {
assert origin != null
assert path != null
this.sourceObj = origin
this.storePath = path
this.stageName = norm(path.getFileName())
}
protected FileHolder( def source, Path store, def stageName ) {
this.sourceObj = source
this.storePath = store
this.stageName = norm(stageName)
} I just added a class RemoteFileHolder which has a different set of constructors. However there are many more alternative solutions -- e.g. by adding a setter to FileHolder so it doesn't need to get passed to the constructor. @bentsherman do you have any suggestions on which solution you think would be best? |
modules/nextflow/src/main/groovy/nextflow/processor/TaskProcessor.groovy
Outdated
Show resolved
Hide resolved
This reverts commit 0a2ae72. Signed-off-by: Robrecht Cannoodt <[email protected]>
Signed-off-by: Robrecht Cannoodt <[email protected]>
4cf7ad5
to
1793efc
Compare
Looks good. Thanks for your help 😄 Do you understand how to use this in your setup? I will also try it with nf-prov. Let me know if you need help with how to test on your end |
I wonder whether something needs to be updated in the constructors: FileHolder( Path inputFile ) {
assert inputFile
this.sourceObj = inputFile
this.storePath = real(inputFile)
this.stageName = norm(inputFile.getFileName())
}
FileHolder( def origin, Path path ) {
assert origin != null
assert path != null
this.sourceObj = origin
this.storePath = path // ← does this need to become `real(path)`?
this.stageName = norm(path.getFileName())
}
protected FileHolder( def source, Path store, def stageName ) {
this.sourceObj = source
this.storePath = store // ← does this need to become `real(store)`?
this.stageName = norm(stageName)
} Cfr the use case: I'm working on a nf-lamin plugin which allows users to run pre-existing Nextflow workflows and create records in the LaminDB regarding:
This package will use the observer to detect various events and then pass that information on to LaminDB. |
I think you should have a look at this on-going effort #5715 |
I think this PR will benefit all of us -- the provenance store, nf-prov, and nf-lamin. Fundamentally it's about being able to trace staged files to their remote source, which we currently cannot do The only snag is that this PR currently will change the task hash. When a task has a foreign input file, the locally staged path is used for the hash, not the remote path. This might actually be better, as it would decouple the task hash from the foreign file staging. But I wonder if the locally staged path was used for a reason |
I have it working with nf-prov! nextflow-io/nf-prov#45 You can see the before/after in the linked PR. Turns out it doesn't affect the task hash. I think this is because the FilePorter uses the original URL to construct the hash for the stage directory, and since the file is hashed by either base name + size + timestamp (standard) or contents (deep), the remote file and staged file produce the same hash @rcannood to your point about doing |
Signed-off-by: Ben Sherman <[email protected]>
I see, thanks! @bentsherman Any updates on the state of this PR? 🙇 |
Signed-off-by: Robrecht Cannoodt <[email protected]>
@@ -1942,7 +1942,7 @@ class TaskProcessor { | |||
if( item instanceof Path || coerceToPath ) { | |||
def path = normalizeToPath(item) | |||
def target = executor.isForeignFile(path) ? foreignFiles.addToForeign(path) : path | |||
def holder = new FileHolder(target) | |||
def holder = new FileHolder(path, target) | |||
files << holder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking this, and I'm bit concerned on the different logic of the two constructors
FileHolder( Path inputFile ) {
assert inputFile
this.sourceObj = inputFile
this.storePath = real(inputFile)
this.stageName = norm(inputFile.getFileName())
}
FileHolder( def origin, Path path ) {
assert origin != null
assert path != null
this.sourceObj = origin
this.storePath = path
this.stageName = norm(path.getFileName())
}
Above all, note the different logic for storePath
. I believe this will change the handle on symlink above all.
If the goal is to bring the source path with could try using the first constructor and adding a method to update the source path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your concern.
Would you rather this get implemented as follows?
def holder = new FileHolder(target)
holder.setSourceObj(origin)
I've tried adding this
The problem however is that this invalid existing caches for pipelines depending on staged files |
Alternative solution to #5907 to implement #5905.
I didn't change any other calls of FileHolder except the one @bentsherman suggested in #5907.