dependencies is a set not a list #254

Open
dvot197007 opened this issue Apr 17, 2018 · 13 comments · May be fixed by #430
dvot197007 commented Apr 17, 2018

Hello,
The documentation section titled "keywords with task metadata" says:

dependencies: list of file_dep

In trying dependencies[0] in a task, the interpreter told me that dependencies is actually a set and can't be indexed. Hence, the documentation should be changed to reflect that.

By the way, the workaround suggested on Stack Exchange is to use list(dependencies)[0].
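A minimal sketch of the behavior described above, in plain Python without doit. The file names are hypothetical, and the set is only a stand-in for the `dependencies` value an action receives. It illustrates why indexing fails and why the `list(...)` workaround runs but does not pick a predictable element.

```python
# Hypothetical stand-in for the `dependencies` value passed to an action.
dependencies = {'a.txt', 'b.txt'}

try:
    dependencies[0]
except TypeError as exc:
    # Sets do not support indexing, which is the error reported in this issue.
    error = str(exc)

# The workaround runs, but which element comes first is arbitrary.
first = list(dependencies)[0]

# Sorting gives a deterministic result, though not the declared order.
first_sorted = sorted(dependencies)[0]
```

Sorting is only useful when alphabetical order happens to be what the action needs.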

I haven't checked, but it may be that the keyword 'changed' also refers to a set rather than a list as stated in the documentation?

Thanks,
David

@schettino72
Member

I think it's better to change the code 😄

Internally doit converts file_dep to a set, but I think it makes more sense to pass a list as mentioned in the docs.

Should be an easy fix...

@schettino72 schettino72 added this to the 0.32 milestone May 26, 2018
@averagehat

Is the order of dependencies passed to the command or python actions guaranteed?

I would think it would match the order it was declared in the task's dictionary. i.e. if I only specify file_dep to be a list of [a, b, c] then %(dependencies)s should guarantee a b c and not b c a.

@valrus

valrus commented Nov 24, 2019

I just ran into this - the order seems to be arbitrary. I specified a file_dep as ['a.txt', 'b.txt'] and then in the task assigned a, b = dependencies. I ended up with a == 'b.txt' and b == 'a.txt'.
This was quite surprising.

@schettino72
Member

Can you try master? I will do a release next week (hopefully).

@valrus

valrus commented Nov 25, 2019

Just checked and it looks like the order of dependencies is maintained in master. Thanks @schettino72!

@valrus

valrus commented Nov 25, 2019

Oops, I was mistaken - sorry. dependencies is now a list, not a set, but apparently it's still a set somewhere in the process because the order of the list isn't stable. I stuck a debugger call in a Task and ran it twice, inspecting a two-element dependencies list during each run, and the first run had ['a.txt', 'b.txt'] and the second had ['b.txt', 'a.txt'].
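One possible way to cope with the unstable order reported here is to re-order whatever arrives against the list the task author declared. This is only a sketch in plain Python: `DECLARED` and `in_declared_order` are hypothetical names, not part of doit, and it assumes the names in `dependencies` are spelled exactly as declared in file_dep.

```python
# Hypothetical file_dep, in the order the task author intends.
DECLARED = ['a.txt', 'b.txt']


def in_declared_order(dependencies, declared=DECLARED):
    """Return the given dependencies re-ordered to follow `declared`."""
    present = set(dependencies)
    return [name for name in declared if name in present]


# Works whether the names arrive as a shuffled list or as a set.
a, b = in_declared_order(['b.txt', 'a.txt'])
```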

@schettino72 schettino72 reopened this Nov 25, 2019
@schettino72 schettino72 removed this from the 0.32 milestone Mar 4, 2021
tillahoffmann added a commit to tillahoffmann/doit that referenced this issue Jun 30, 2022
@tillahoffmann tillahoffmann linked a pull request Jun 30, 2022 that will close this issue
@gambolputty

Just ran into this. I have a task that has several dependencies of different types (data structures). How am I supposed to process them when the order from the file_dep field is not preserved? I can't split it into multiple tasks.
Are file dependencies supposed to be of the same type in doit, because of set()?

tillahoffmann added a commit to tillahoffmann/doit that referenced this issue Aug 16, 2022
@sdahdah
Contributor

sdahdah commented Oct 10, 2023

@schettino72 if the order of file_dep is not guaranteed, what's the expected/correct way to handle multiple file dependencies? I'm still struggling with this today, and I've resorted to searching the file paths for substrings.

@schettino72
Member

@sdahdah why would you need the order of file_dep to be guaranteed?

@gambolputty just save your structure of which filename is what in a separate place. I guess you could use the meta task parameter for this.

@sdahdah
Contributor

sdahdah commented Oct 10, 2023

@schettino72 Thanks for answering so fast. I have two different datasets from different sensors that are both required by the algorithm I'm running. One is a CSV and one is a pickle.

My current workaround is to search the dependency list for *.pickle to load the first file, and *.csv to load the second file. How would you recommend I handle this? I'd appreciate any insight you have
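The suffix-matching workaround described above could look roughly like the sketch below. `pick_by_suffix` and the file paths are hypothetical, and the approach only works when each extension appears exactly once among the dependencies.

```python
from pathlib import Path


def pick_by_suffix(dependencies, suffix):
    """Return the single dependency whose file extension equals `suffix`."""
    matches = [dep for dep in dependencies if Path(dep).suffix == suffix]
    if len(matches) != 1:
        raise ValueError(f'expected exactly one {suffix} file, got {matches}')
    return matches[0]


# Hypothetical dependency list, in an arbitrary order.
deps = ['data/sensor_b.csv', 'data/sensor_a.pickle']
pickle_path = pick_by_suffix(deps, '.pickle')
csv_path = pick_by_suffix(deps, '.csv')
```

The explicit error on zero or multiple matches keeps a second CSV file from being picked up silently.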

@schettino72
Member

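def task_hello():
    # The explicit parameters are bound by position; doit fills in `dependencies`.
    def python_hello(pickle, json, dependencies):
        print(f'Dependencies are: {dependencies}')
        print(f'pickle is: {pickle}')
        print(f'json is: {json}')

    pickle = 'foo.pickle'
    json = 'bar.json'
    return {
        # Pass each path as its own argument instead of relying on the order of `dependencies`.
        'actions': [(python_hello, [pickle, json])],
        'file_dep': [pickle, json],
        'verbosity': 2,
    }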

@sdahdah
Contributor

sdahdah commented Oct 11, 2023

@schettino72 This makes sense! I was making all my Python actions have the signature python_hello(dependencies, targets). I did not realize adding extra parameters was the way to go. Thank you!
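A variation on the same idea, sketched for the two-sensor case: the paths are passed by keyword, so the action never depends on argument position or on the order of `dependencies`. It relies on doit's `(callable, args, kwargs)` tuple form for Python actions; the task name, function name, and file names are hypothetical.

```python
def load_sensors(pickle_path, csv_path):
    """Placeholder action: a real one would load and process both files."""
    print(f'pickle dataset: {pickle_path}')
    print(f'csv dataset: {csv_path}')


def task_load_sensors():
    # Hypothetical input files from the two sensors.
    pickle_path = 'sensor_a.pickle'
    csv_path = 'sensor_b.csv'
    return {
        # Keyword arguments make it explicit which file is which.
        'actions': [(load_sensors, [], {'pickle_path': pickle_path,
                                        'csv_path': csv_path})],
        # file_dep is still declared so doit can track up-to-date status.
        'file_dep': [pickle_path, csv_path],
        'verbosity': 2,
    }
```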

@cosineblast

I think dependency order preservation is great when the order of the input files to a program affects the hash of the output, and you expect doit to compile your files and produce an output with a specific hash.

7 participants