Starting to work on bi-directional Overleaf support #161

Draft
wants to merge 16 commits into
base: main

dfm
Member

@dfm dfm commented Aug 13, 2022

This is a WIP implementation of bi-directional Overleaf support. So far I just have the logic for bringing in changes from Overleaf when there have also been local changes. It will try to rebase automatically, but if it can't, the user is required to complete the rebase manually.

Notes to self:

  • This doesn't currently handle things like figures that shouldn't be tracked in SYW. We can handle those using --exclude patterns when applying the initial diff.
  • How do we handle pushing changes back to Overleaf? I think it's nearly identical, but we'll need to think about handling merge conflicts. We could save a temporary directory with the checked out Overleaf repo.
  • Still need to think about how to handle migration from existing Overleaf integrations, and start up for new projects.
  • What about the front end? I like the idea of adding a showyourwork sync command to execute this update, and perhaps a showyourwork build --sync command that would work like the current showyourwork build, syncing every time.
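
The --exclude idea from the first note could look roughly like the sketch below. It only builds the argument list for git apply, whose --exclude=<path-pattern> option does exist. The helper name git_apply_command and the example patch file and patterns are hypothetical and not part of showyourwork.

```python
from typing import Iterable, List


def git_apply_command(patch_path: str, exclude: Iterable[str] = ()) -> List[str]:
    """Build a `git apply` invocation that skips paths matching `exclude`.

    Hypothetical helper: it does not run git, it only assembles the arguments.
    """
    cmd = ["git", "apply"]
    # One --exclude flag per pattern; git matches these against paths in the patch.
    cmd += [f"--exclude={pattern}" for pattern in exclude]
    cmd.append(patch_path)
    return cmd
```

The result can be passed to subprocess.run(..., check=True), for example git_apply_command("overleaf.patch", ["figures/*"]) to leave untracked figures out of the initial diff.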

closes #131 (eventually)

@dfm dfm marked this pull request as draft August 13, 2022 12:41
@MilesCranmer
Contributor

Just to check: you are planning to use merges, rather than rebases, right? I think rebases won't work on the overleaf side since you aren't allowed to force push to their hosted repos.

> How do we handle pushing changes back to Overleaf? I think it's nearly identical, but we'll need to think about handling merge conflicts. We could save a temporary directory with the checked out Overleaf repo.

The merge conflicts would be solved when merging to the SYW repo, right? If so I don't think fixing them would be necessary going back. By merge conflicts here, do you mean new merge conflicts due to other users editing the overleaf repo during the SYW build?

> Still need to think about how to handle migration from existing Overleaf integrations, and start up for new projects.

Since we are assuming the repos have distinct histories, I think this part actually won't be difficult. An existing overleaf project should look basically the same as a newly created one, right?

> What about the front end? I like the idea of adding a showyourwork sync command to execute this update, and perhaps a showyourwork build --sync command that would work like the current showyourwork build, syncing every time.

Those sound great to me!

@MilesCranmer
Contributor

How will this handle auto-generated LaTeX snippets like tables (i.e., the \variable{} command)? I'm assuming those are to be treated basically the same as figures, right?

@dfm
Member Author

dfm commented Aug 16, 2022

Thanks @MilesCranmer!

> The merge conflicts would be solved when merging to the SYW repo, right? If so I don't think fixing them would be necessary going back. By merge conflicts here, do you mean new merge conflicts due to other users editing the overleaf repo during the SYW build?

Yes - this is exactly the issue I was referring to! I wasn't aware of the force push restriction, but I don't think there's a problem with that. The way I was imagining the workflow would only ever fast forward the temporary local copy of the Overleaf repo, which gets cloned at the beginning of the build. There would be an issue if we hit the race condition you mention when pushing back, and we just want to handle it gracefully. But, since that merge is always of common histories, we'd never need to rebase.
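
A minimal sketch of handling that race gracefully, assuming a merge-then-retry loop. This is not the code in this PR. The function name push_with_retry, the injectable run callable, the retry count, and the master branch name are all assumptions made for illustration.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes git arguments and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def run_git(args: Sequence[str]) -> int:
    """Run git in the current directory and return its exit code."""
    return subprocess.run(["git", *args]).returncode


def push_with_retry(
    run: Runner = run_git,
    remote: str = "origin",
    branch: str = "master",  # assumed branch name on the Overleaf side
    attempts: int = 3,
) -> bool:
    """Push HEAD without forcing; if rejected, merge the remote and retry."""
    for _ in range(attempts):
        if run(["push", remote, f"HEAD:{branch}"]) == 0:
            return True
        # Push rejected: someone probably edited on Overleaf during the build.
        if run(["fetch", remote, branch]) != 0:
            return False
        # Histories are shared at this point, so a plain merge is enough.
        if run(["merge", "--no-edit", f"{remote}/{branch}"]) != 0:
            return False  # real conflict: stop and leave it to the user
    return False
```

Because nothing is ever force pushed, this stays within the restriction on Overleaf-hosted repos mentioned above.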

> Since we are assuming the repos have distinct histories, I think this part actually won't be difficult. An existing overleaf project should look basically the same as a newly created one, right?

Not quite! The way syw currently handles new repos is that it wipes them, or refuses to push anything if it thinks there may have been changes. I think that's a fine approach to keep for new repos, but it would be nice to have a sensible migration flow for projects that are currently using the existing integration scheme. Certainly doable, but it will take some thought to make a reasonable interface.

> How will this handle auto-generated LaTeX snippets like tables (i.e., the \variable{} command)? I'm assuming those are to be treated basically the same as figures, right?

Yes! I haven't implemented that logic yet, but the idea is that, like in the current implementation, you would also have a set of glob patterns to push (and possibly pull, but probably not). Thinking about it more, syw should be able to generate that list automatically since it knows which files are generated and required by the manuscript. @rodluger: Is it straightforward to get that list (the paths to all the generated manuscript dependencies: figures, tables, snippets, etc) via the Python interface?
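
As a rough illustration of the glob-pattern idea, here is a sketch that assumes patterns are matched with Python's fnmatch module; the current implementation may match differently. The function name select_files_to_push and the example paths are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def select_files_to_push(files: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Return the files matching at least one glob pattern, in input order.

    Note that in fnmatch a `*` also matches `/`, so "figures/*" matches
    files in nested subdirectories as well.
    """
    patterns = list(patterns)
    return [f for f in files if any(fnmatchcase(f, p) for p in patterns)]
```

For example, select_files_to_push(["figures/a.pdf", "ms.tex"], ["figures/*"]) keeps only the figure.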

@rodluger
Collaborator

rodluger commented Aug 17, 2022

> @rodluger: Is it straightforward to get that list (the paths to all the generated manuscript dependencies: figures, tables, snippets, etc) via the Python interface?

You can get the list of all dependencies of the PDF generation step from

deps = set(config["dag_dependencies"][config["ms_pdf"]])

where config = snakemake.workflow.config.

If you need recursive dependencies (i.e., dependencies of those dependencies, which may not be explicit dependencies of the PDF generation step), you can instead do

deps = set(config["dag_dependencies_recursive"][config["ms_pdf"]])

(I don't think that's the case here, but thought I'd mention it.)

Then, you can get all files that are programmatically generated by the workflow from

outputs = set([file for job in dag.jobs for file in job.output])

where you'll need to hackily grab the workflow DAG from

from showyourwork.patches import get_snakemake_variable
dag = get_snakemake_variable("dag")

What you want, then, is the intersection of these sets:

generated_deps = deps & outputs

The easiest place to do this is in the WORKFLOW_GRAPH input function, which blocks the PDF generation rule until the DAG has been built.

EDIT: This approach will also output some hidden dependencies of the workflow, like the .showyourwork/flags/SYW__DAG flag. You could add a filter and select only files under src/tex.
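
Put together, the recipe above might be wrapped as in the sketch below. It takes the two collections as plain iterables so that it does not depend on Snakemake internals. The function name generated_manuscript_deps and the assumption that all paths are relative to the repository root are illustrative, not showyourwork API.

```python
from pathlib import PurePosixPath
from typing import Iterable, Set


def generated_manuscript_deps(
    deps: Iterable[str],
    outputs: Iterable[str],
    root: str = "src/tex",
) -> Set[str]:
    """Intersect PDF dependencies with workflow outputs, keeping files under `root`.

    Assumes both collections hold POSIX-style paths relative to the repo root.
    """
    root_path = PurePosixPath(root)
    # Files that the manuscript needs AND that the workflow generates.
    generated = set(map(str, deps)) & set(map(str, outputs))
    # Drop hidden workflow files such as flags by requiring `root` as an ancestor.
    return {p for p in generated if root_path in PurePosixPath(p).parents}
```

Here deps and outputs would be the sets computed from config["dag_dependencies"] and dag.jobs as described above.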

exclusions

better handling of exclusions and inclusions

rename apply exception

starting to work on overleaf integration

exclusions

better handling of exclusions and inclusions

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

hackin

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
@MilesCranmer
Contributor

@dfm let me know if you want a guinea pig to test this out! What's the current status of things?

@dfm
Member Author

dfm commented Sep 5, 2022

@MilesCranmer — Awesome and thanks! It's getting close and all the git logic more or less works, but it's not actually properly linked into the workflow yet. I'll ping you as soon as I have a working prototype!

@dfm dfm mentioned this pull request Feb 19, 2023
5 tasks
Development

Successfully merging this pull request may close these issues.

Bidirectional overleaf support
3 participants