Storing generated PDF creates rebase conflicts #91
-
After some googling I found https://gist.github.com/tmaybe/4c9d94712711229cd506 which explains
-
We could keep the generated PDFs only in
I think that
-
Thanks for the ideas. Yeah, we could do away with the PDF in the dev-branch. The "branching" model we decided upon was, at least how I see it, to maintain two versions of the documentation
We would continuously rebase (Maybe that needs to go somewhere, for posterity, and maybe myself ;-), the wiki?)
Removing the PDF from one, or both, branches would help. But that leaves us with no "dev documents" to point to.
There is a CI job, which currently only does asciidoc validation as far as I can see. Would it be a possible route to expand on that to build the documentation too? Maybe that was our intention all along. Setting up the toolchain in Travis would be a hassle, esp. with
Then we could possibly upload the results from the branches to two different "releases", like "Official" and "Snapshot". Those would then be stable places to point to, and would contain corresponding, and updated, versions of the various documents. (OK, only the manual would actually need the "Snapshot" version...)
I think I tried something like this (uploading to GitHub) in some other project but did not finish it, probably for lack of time. I would be interested in digging into this. Maybe a new issue for it?
-
ALAN Manual: Dev Snapshots
Exactly. The point is that not every commit to
The PDF is really only required in
ALAN Manual: Dev Strategies
After the first conflict problems, we've been careful to ensure that the dev branches are always rebaseable onto
Workflow Notes
We did jot down some notes in
Since we've been diligently using Issue labels, milestones and Dashboard Projects, it should be fairly easy to find any topic via filtered searches. But this only covers Issues, not in-repo documents or Wiki pages...
Indexing the Existing Knowledgebase
We should strengthen the Wiki of each repository, creating some sort of smart index that allows us to retrieve all these guidelines and notes. The Wiki of the alan-if/alan repository might deserve more attention in this respect, and could become the main "official Wiki", making it easier to use it as a directory to find all knowledgebase articles (or links to Issues); otherwise this knowledge will be scattered across many repositories (and their Wikis).
The only problem here is redundancy: some articles naturally belong to the Wiki of their repository, but having to keep an index in every Wiki of the alan-if org might become cumbersome, unless we can come up with some sort of database solution that can auto-generate and update all these indexes.
It might seem overkill, but the problem here is that our projects are actually over-documented (whereas usually the opposite is true). Writing guidelines and tutorials takes time, so it's a pity if these articles go unread due to lack of visibility.
CI Automation
It only validates code style consistency via EditorConfig, actually.
I remember the problem being the asciidoctor-fopub part, which only works on Windows (see #66) due to Win-specific paths in our custom configuration (for fonts or styles, I don't remember). The fopub configurations are not very friendly, and quite hostile toward collaborative editing and version control. Probably the best solution would be to check Asciidoctor's native PDF backend, which has been updated a lot in the meantime and has probably solved all the issues that were preventing us from using it here (problems with footnotes inside tables, and a few other missing features, see: #9).
ALAN Manual Dev Snapshots: Release Strategy
To achieve this we'd need to define a snapshot release strategy. Sometimes only one of the two formats needs updating (e.g. because we improved the CSS of the HTML doc, or the template of the PDF); other times there might be just typo fixes, which don't necessarily call for an updated version. Possible solutions could be:
The problem with (1) is that end users might have to wait to gain access to new juicy stuff; whereas with (2) the problem is ensuring that we update the contents before releasing the new Alpha, since the build jobs would be automated via some CI cross-repo communications, based on release tags or branch merges.
Personally, I'm for the manual approach, after all we are always aware when new juicy contents are added to the Manual, so each one of us is free to update the HTML snapshot (and even the PDF) based on whether he thinks it's worth sharing it with bleeding-edge authors — it just boils down to deciding whether to add the built docs to the commit stage or not, so no big deal there. Since the snapshot preview links are pointing to the
We've had a long discussion on this in #6 (now closed); we could reuse some of the text material from that thread if it saves us some typing.
About PDF Rebase Conflicts
Back to the original question of this Issue, which type of conflicts does rebasing on
What I usually do when these conflicts come up (inevitably for both HTML and PDF builds, due to date changes or just Asciidoctor template changes) is to simply resolve the conflict with either "ours" or "theirs", then rebuild both docs on the fly and add them to the stage before committing the rebase. Bear in mind that when rebasing or merging we should always rebuild the docs from scratch (if we include them) because of the way dates and version numbers are handled, and because Asciidoctor updates often introduce CSS changes. So the best strategy is to always manually rebuild the docs. This can probably be achieved by some Git hooks/filters (when carrying out specific operations on
-
Thanks for digging up #6, and some good thoughts on information handling, although the concrete approach for that is still up for discovery, decision and implementation ;-)
Good. But I feel we have slightly differing ideas about how to manage the actual "results" of the branches. I read your comments as thinking around releases. I also think "releases" are important, but I also strive for "continuous deployment" to lessen the cognitive load on us to remember to create new "builds" only for "releases", and deliver the value of the change as soon as possible. E.g. before extracting the manual to this ...
As already indicated I'd suggest "generate new documentation on every commit". To me, the release process is different from the continuous builds and deployment. The important thing is that the information in a document generated from A release of Alan will thus also render some additional, manual, work when it comes to
Just stating this, to ensure we are on the same page here, but I'm confident we are. (I'm ignoring the actual build here, since that is implied, being manual or not. I'm also ignoring the actual merge-branch of
So when it comes to releases, to me, only the update of the documentation for the new functionality is important. Any other types of changes are inconsequential when it comes to releases and can be done, and published, at any time. An improvement should not need to wait for the next release.
But again, I think we think about this in slightly different ways. So let me put it this way: What would be the worst thing that could happen if we build a new set of fully usable documentation on every commit? It's not like there are API incompatibilities that break things, as for a software release. We also use SemVer-like semantics for the two branches, as does the Alan SDK with the official releases and the development snapshots, so the correlation between them is clear.
To be very clear, I'm not forcing the issue. Instead I think it is interesting that we seem to have differing reasoning here, and I'm interested in learning more about your viewpoint.
At the end of writing all this I realise that my concern is primarily the content, and even more so, specifically the manual. But you have worked hard, even struggled, with the toolchain, layout and such things, and also for the other documents. A random change in a tool configuration or version might actually trash things, warranting a "proof-reading" before actually releasing. Is this your concern? If so, I think I can feel where you are coming from...
-
As far as I can remember, the reason we didn't manage to come up with a solution was because of a number of unsolved questions that were preventing a CI solution:
Also, so far we had only a single Manual release on
But let's recapitulate the problematic points of the above list...
asciidoctor-fopub Problems
Unless we can find a solution to the fopub problem, it's going to be hard to come up with any CI solution. The current problem prevents using the configuration files on both Windows and Linux, and we're now using Windows for the builds. Possible solutions:
Alternatively, switch to Asciidoctor's native PDF backend (if it now supports all the needed features), but this would require some extra work before it becomes usable in CI production:
Manual Date
I remember you proposed that the
It would mean that if the docs are rebuilt at every commit, the date would also change in the docs, regardless of whether the contents have changed. This matters especially for the PDF edition (which is usually intended for download, whereas the HTML doc is more for online consultation), because end users might end up downloading it again even if no real changes occurred (which translates to downloading and replacing their local copy, wherever that is stored). IMO the last updated
Dynamic Examples
Although right now the Manual doesn't use these (other docs here do!), in the near future we'll be adopting this approach more and more, since it proved successful for the StdLib Manual (see AnssiR66/AlanStdLib#82). The idea is that ALAN code snippets in the docs should be extracted directly from a real source file (via
This would ensure that:
This reduces the maintenance work of the examples, by allowing us to "set them and forget them", and have the toolchain automatically produce the correct results. But it also means that we must ensure that we're using the correct ALAN binaries, both locally and on the CI server, which introduces the problem of having to use the latest Beta on
You mean on every commit the docs should be rebuilt and committed to the
I'm 100% with you on this, and I'm also a great fan of all things "auto-magic". It's just that I think that there are still some major problems with the Asciidoctor build that need to be addressed before we can set up a CI toolchain of this type. Also, I believe you use Circle CI, which I don't know anything about (I use only Travis CI). In the meantime, GitHub Actions have also entered the scene, which seem an interesting way to handle CI tasks, especially with the Marketplace offering ready-made solutions which are maintained by the creators of these GH Actions, particularly when the actions are built by the creators of the tools involved.
I don't think we have different visions on this; it's just that each of us is focusing on different problems that are preventing this from happening. Whereas you're more focused on how to interconnect this with the ALAN release cycle, I'm more focused on the current unsolved problems of the Asciidoctor toolchain (which prevent any reliable CI build, right now). Bear in mind that I've been spending quite some time trying to find solutions to a number of Asciidoctor toolchain related problems, and a good solution that would work across the different repositories that use Asciidoctor for ALAN, so I tend to be more aware of how far off these solutions are.
Just to mention briefly one problem: ISO-8859-1 validation! ECLint simply fails to validate ISO-8859-1 files, raising false positives for valid files. There doesn't seem to be a bullet-proof way to validate files for ISO-8859-1 encoding; you can mostly prove that they are not UTF-8, or that they are single-byte encoded. Yet we need assurances that our sources are valid ISO files, especially since modern editors tend to break ISO encoding with almost any paste operation. The problem gets even worse when we don't have ALAN-specific file extensions for ALAN-related files (e.g. transcripts and solutions), because extensions like
Not really, I mean ... the contents are usually well polished whenever we commit them, and the
If you had been struggling with the "spurious whitespace bug" that has afflicted the StdLib and Alan Italian repos (it suddenly disappeared in the latter, for unknown reasons) you would know how frustrating it can be to work with Git and ALAN sources when things go wrong: at every run the transcripts change, even if nothing was changed, so there's some complex interaction at play between the ISO encoding and Git's lack of support for it, possibly due to a small bug that spits out a char sequence which to Git is broken UTF-8. The problem is that these changes show up in Git's work space, and they are a nightmare on any CI job, since they prevent many Git operations.
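To make the ISO-8859-1 point above a bit more concrete, here is a minimal heuristic sketch in Python. It cannot prove that a file is ISO-8859-1; it only flags the two most common signs of breakage: content that decodes as multi-byte UTF-8, and bytes in the C1 control range 0x80-0x9F (which in practice usually mean Windows-1252 or stray UTF-8). The function name `looks_like_iso_8859_1` is made up for this example, and this is not what ECLint does.

```python
def looks_like_iso_8859_1(data: bytes) -> bool:
    """Heuristic only: True if `data` is plausibly ISO-8859-1 text."""
    # Pure ASCII is valid in ISO-8859-1 (and in UTF-8), so accept it.
    if all(b < 0x80 for b in data):
        return True
    # 0x80-0x9F are C1 control codes in ISO-8859-1; real text almost
    # never contains them, so treat them as a sign of another encoding.
    if any(0x80 <= b <= 0x9F for b in data):
        return False
    # High bytes that form valid UTF-8 sequences most likely mean the
    # file was silently converted to UTF-8 by an editor.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False
```

A check like this could run over the ALAN sources (read in binary mode) before the build. The known weakness is that a short Latin-1 string which happens to be valid UTF-8 would be rejected, so it is a tripwire, not a validator.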
-
Publishing Dev Snapshots on an Orphan Branch
I've been giving some thought to the whole problem of how and where to "publish" the dev snapshots of the ALAN Manual (both PDF and HTML). I think that committing them to the
A possible solution would be to tweak the build scripts so that they are branch-aware (via a simple
We could create this special branch as an orphan branch, which only stores snapshot preview documents and doesn't share any history with the main repo, so no conflicts could come from it, but we'd still be able to offer live preview links of the latest dev docs to end users. This approach should lend itself well to the various CI tasks, and then we'd only have to focus on auto-rebuilding the Manual on
Also, being an orphan branch, we could simply force-commit at each document build, effectively resetting the branch at every new snapshot, since we won't really need a commit history there; this means that its size would never bloat, even if we rebuild them at every single commit on
In any case, I think it's important that dev snapshots of the Manual should be built with different names from those on
Does it make sense to you?
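A rough sketch of the idea above, in Python, assuming nothing about the real build scripts. It shows the two pieces: a branch-aware output name, and the Git commands that would publish a freshly built output folder as a single history-less commit, force-pushed over the previous snapshot. The branch names `main` and `snapshots`, the helper names and the file name are all hypothetical; the functions only build the command lists, they don't run anything.

```python
from pathlib import PurePosixPath


def snapshot_filename(filename, branch, stable_branch="main"):
    """Suffix docs built on a non-stable branch so they can't be
    mistaken for the stable edition (branch names are assumptions)."""
    if branch == stable_branch:
        return filename
    p = PurePosixPath(filename)
    return f"{p.stem}-{branch}{p.suffix}"


def publish_commands(remote_url, publish_branch="snapshots"):
    """Commands to run inside the folder holding the built docs.

    A fresh `git init` means the commit has no parent, and the forced
    push replaces whatever the publishing branch held before, so the
    branch never accumulates history.
    """
    return [
        ["git", "init"],
        ["git", "add", "--all"],
        ["git", "commit", "-m", "Update docs snapshot"],
        ["git", "push", "--force", remote_url,
         f"HEAD:refs/heads/{publish_branch}"],
    ]
```

For example, `snapshot_filename("manual.pdf", "dev")` gives `manual-dev.pdf`, while the same call on the stable branch leaves the name untouched. A CI job could feed each list from `publish_commands` to `subprocess.run`.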
-
Here's a quick attempt for some starting points for the discussion about the "release process" and other related matters in the Alan documentation project, but they are primarily from the perspective of releases of the Alan SDK and The Manual. I decided to do this in a more structured manner so that we can pick one of the items and, thanks to the threading of discussions, focus on one item in one thread. Hopefully much better than our previous comment tennis.
Principles
Implications
1. Don't store generated files in the repository
A. Firstly this means that we need to generate the output at some point. Automatically or manually, with a preference for the former ;-) The natural suggestion would be on each commit. (Practical problems will be discussed later ;-)
B. We need to figure out where to store the generated output. I initially thought we could upload them somewhere, The Home Page maybe. I experimented with GitHub Releases (don't know why that was so hard...), but that would only solve the problem of discrete files, like the PDF, not the HTML file tree. Orphan branches, a.k.a. GH Pages, sure look like one solution we can try. (I don't understand why you would even need a branch for this, but I suppose it's just the way GitHub decided to do it at one time...)
C. We need to decide on which "releases" of which documents and in which formats to store, and how to separate them. As usual, I mostly think about The Manual. And from that viewpoint I truly believe that we only need two simultaneous versions, the currently best version matching the latest official Alan SDK release, and the currently best version of the manual matching the development snapshots for the upcoming release of the SDK. This means that we should allow improvements in both those two branches and their respective generated output, possibly (preferably?) overwriting the previous.
D. At some point, probably at the release of a new SDK, we also need to "archive" a version of the manual for the just outdated official release.
2. Generated output should be possible to identify
Again from the perspective of the manual, there are two different identifications that are relevant for a document.
A. The version of the "thing" it describes. For the manual this is fairly obvious, namely the release of the Alan SDK. The same goes for documentation of the standard library. It is more interesting to think about what that means for The Guide and other such documents. Maybe the Alan SDK is the primary reference here, but in these cases is it the first release that supports what's in the guide?
B. The version of the document itself. The date of the generation or the commit would probably suffice. The date for the latest of the "content bearing" files would be another alternative. It would depend on your view of both what are meaningful changes to a document, and does a reader actually care about that change, but also would a reader use that information for anything? I personally think that the build date (or commit) would be good enough here.
3. A reader of a document is always interested in getting the best version possible as long as it still describes the same "thing"
A. The fundamental implication of this is that users prefer an updated online version to downloading.
B. There need not be a "stable" version of a document, it can always be updated.
C. A user is not interested in getting back to an earlier version since the one she is looking at is the best ever ;-) (This might not be an implication but an assumption that could be discussed if we find a use case that contradicts it.)
D. The best version of a document is the latest and the best version for a previous version of the SDK is the last version before introducing descriptions of new features. This implies that archiving for the previous release can be done at release of the new SDK (or whatever the thing it describes).
E. Pointers to documentation (e.g. on The Home Pages) should always go to generated output at the same place/url.
Problems
I suggest that we discuss problems with, and suggestions for implementation for, problematic points above in separate threads under a suitable heading.
-
OK, I'm commenting here in the root of the thread, just to cover the overall changes to the system, and make sure I'm getting it all right. We can later on extract the main points in a more concise manner and add them to the sub-threads, to keep the conversation cleaner.
Unfortunately, subsequent posts from the same author are being joined together; that's the current limitation. I should post on GitHub Community and ask for a new option to allow splitting or joining such follow-up comments.
Now, regarding the new system ... It will introduce some subtle but significant changes, which in some respects change everything in the current workflow. I agree with all that you wrote, and will now share some extra thoughts, details and perplexities.
ALAN Manual Releases Scheme
(Let's focus only on the ALAN Manual right now, since it's the central publication of the project, and because other docs are not updated so often, or are not available in all formats, or are not SDK dependent. Also, if it works for the Manual, it will work for them too.)
In the old workflow, it was up to us to decide when to commit (and therefore publish) an updated version of the Manual. With the new system, any commit will update the documents on the new branch which serves them to end users. And this is fine. I mean, there's really no point in tracking a Manual release; we only need (as you said) to provide two versions: one for the Beta SDK (which currently means stable), and one for the Alpha SDK (which currently means whatever latest Alpha SDK is available).
If every commit results in these two documents being updated, it also means that we've solved the original problem of developers having access to a commonly shared built document (PDF or HTML) which they can use to check problems and point them out: they'll simply be available online, via GitHub Pages (at some point), and/or via the dedicated branch anyway.
Version Number/Release Date
As for the release versioning scheme, it seems we don't need one after all: the version of each document is the ALAN SDK to which it refers.
The build script should just define the date/timestamp attribute which the AsciiDoc source will incorporate into the final document, and thanks to Asciidoctor conditional evaluations, we'll be able to add any required text, e.g. for the Alpha version being a developer snapshot, or provide the correct SDK download links, etc. So that's not a problem; we are free to handle it whichever way we like.
So, as far as the version info goes, each document (PDF and HTML) should mention the ALAN SDK and the last-updated timestamp, which corresponds to the CI's time of execution. There's no risk of an older version replacing a newer one here, since it's all automated. As for end users' consumption, let's assume that whenever they decide to work on a new adventure they'll just re-download the latest documents (Beta or Alpha), and they'll be good with them for quite a while.
Nothing prevents us from having different build scripts in the repository, e.g. one used by CI and one used by developers to quickly produce a local preview. Indeed, this might be a desirable feature, since when building a local preview one wishes for the HTML doc not to embed images (so it will refresh them if they are tweaked), and wants a single-document output (whereas in the future we might also build a chunked HTML online version, for consultation, at least for the Beta Manual).
Alan repo vs Alan-Docs
Here I'm not sure I've understood what the CI will be doing... Historically, the ALAN repository also included the Manual sources (as ODT) and will have to automate creating the new PDF and including it in the various packages, at every new release. Now that the Manual (and other docs) have been moved to the ALAN Docs repository, things have changed.
I understand that whenever a new ALAN Beta is released, there's the need to merge
Since the build script acts on branch-awareness, merging should be sufficient, and the script will handle all the correct SDK references.
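As a rough illustration of the attribute idea (a sketch, not the actual build script), a Python helper could assemble the Asciidoctor command line so that the date and SDK reference come from the build and never from the sources. `revdate` is a built-in AsciiDoc attribute and `-a` is the standard Asciidoctor CLI option for setting attributes; the attribute names `sdk-version` and `dev-snapshot`, the helper name and the file name are invented for this example.

```python
from datetime import datetime, timezone


def asciidoctor_command(source, sdk_version, is_snapshot, now=None):
    """Build the argv for one Asciidoctor run (nothing is executed)."""
    now = now or datetime.now(timezone.utc)
    cmd = [
        "asciidoctor",
        "-a", f"revdate={now:%Y-%m-%d}",      # build date shown in the doc
        "-a", f"sdk-version={sdk_version}",   # hypothetical attribute name
    ]
    if is_snapshot:
        # Hypothetical flag attribute; the sources could test it with
        # ifdef::dev-snapshot[] to add the "developer snapshot" notice.
        cmd += ["-a", "dev-snapshot"]
    cmd.append(source)
    return cmd
```

The CI script and a local preview script could both call the same helper and differ only in the extra options they append.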
All we need to do is ensure that inside the Manual source we never explicitly mention the current SDK, but use an attribute instead (e.g.
This, of course, also has some implications for the Manual's workflow, for it means that we must ensure that the
What I don't understand is whether you're also planning to auto-include the PDF and/or HTML Manual in the distribution packages. Again, this shouldn't be a problem, since the CI can simply build them on Alan-Docs or extract them from the (updated) publishing branch. In this case, again, the version number shouldn't be a problem: the latest Beta will contain the latest Beta Manual from the updated
So, basically, we have the CI jobs on
Do you foresee some complications in this process? Or some implications that might restrict the workflow on ALAN Docs?
Some considerations on the finer details being discussed...
Branch-Aware Scripts
The introduction of branch-awareness in the build scripts will not only solve the CI problem, but also the aforementioned problem of dynamic code examples and transcripts requiring the correct ALAN SDK at build time. Since these dynamic examples are most likely going to be used more in the ALAN Manual in the future, let's just summarize how they now work on the StdLib.
In the StdLib repo we have the same problem:
The way I envision this feature being supported is by having inside the
I believe that the same scripts can be used for the CI, and that they shouldn't pose any problem at all; after all, the CI builds also want to ensure that the correct SDK binaries are being used, and these will also have to be downloaded on the CI virtual machine in order to test and build the documents. So, this is another problem that should be considered fixed, both for the build toolchain and the CI.
Re-Building at Every Commit
I've understood that your approach is that with every new commit the docs should be rebuilt.
I'm a bit concerned about the free minutes limitation that was recently introduced by Travis CI (and GitHub Actions), and the possibility that a full tests-plus-conversion(s) job might actually exceed these free minutes (I don't remember exactly how many free minutes there are, but they are not too many when dealing with big jobs, and they apply to the whole account). Also, when it comes to
I might be exaggerating my worries here; it's just that since Travis introduced the free minutes limitation I've started to be a bit paranoid about it. It's one thing if you exceed these minutes and just give up the EditorConfig validation (the only Travis CI job currently in Alan Docs); it's another if you end up skipping the update process when a new ALAN Beta release is out!
The asciidoctor-fopub toolchain is quite slow and time-consuming, and if we add up all the validation tests and conversions that may take place at every commit, I'm not sure how many Travis minutes each commit will amount to. There are periods when there isn't much activity on the repo, and others when all the effort is concentrated in a single time span, so it's hard to tell.
The point is that if you build the whole toolchain on a CI system, it's worth considering the consequences that exceeding the free minutes might have on the whole ALAN ecosystem (e.g. including the wrong version of the documentation in the generated packages). Furthermore, whereas with
-
Renaming the
-
I know there are some advantages to storing a generated version of the PDFs in the repo, e.g. there is always a version that can be pointed to from the website.
But having just been through the hassle of rebasing the dev-man branch onto a recent chunk of text changes, and been forced to handle a conflict for each and every commit (and probably also forcing a lot of that onto whoever wants to pull now), I think we need to re-think this. (I had no other problems...)
I propose that we figure out a way to upload the generated files to repo releases or somewhere else (website), so that we can get rid of the generated binary files from the repo to streamline the editing of the two branches.
I quickly looked for a git option to always ignore conflicts for some files but didn't find much. Possibly we could use a (custom?) merge-driver. Never heard about it before so that's why I added the "research" tag...
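For reference, a minimal sketch of what such a custom merge driver could look like, written as a small Python helper that only generates the two pieces of configuration (it does not touch any repository). The approach is the "keep my version" driver described in the Git documentation on attributes: a `.gitattributes` line assigns a named driver to a path pattern, and the driver itself is just the `true` command, which exits successfully without modifying the file. The driver name `ours` and the `*.pdf` pattern are choices made for this example; whether this behaves as wanted during a rebase, where the "ours" side is the branch being rebased onto, would still need testing.

```python
def gitattributes_lines(patterns, driver="ours"):
    """Lines for .gitattributes: route matching paths to a named driver."""
    return [f"{pattern} merge={driver}" for pattern in patterns]


def driver_config_command(driver="ours"):
    """Command that defines the driver in the local Git config.

    The driver command is `true`: it exits 0 and leaves the current
    version of the file in place, so Git records no conflict.
    """
    return ["git", "config", f"merge.{driver}.driver", "true"]
```

Note that the driver definition lives in each clone's local config, so every contributor would have to run the `git config` command once; only the `.gitattributes` file travels with the repository.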