I think this repository can support more languages #326

Open
Jzow opened this issue Aug 9, 2022 · 21 comments · May be fixed by #356

Comments

@Jzow

Jzow commented Aug 9, 2022

Is there a plan for this at present?

@Jzow
Author

Jzow commented Aug 9, 2022

#126 I can do some translation into Chinese. At present, many people in China are studying driver development, and Rust happens to have advantages in this area. I think translating the book into Chinese would make it easier for them to read.

@eldruin
Member

eldruin commented Aug 9, 2022

We would be glad for translations of the book into Chinese and any other language!
mdbook does not have multilingual support yet, so I think an easy way to do this for now would be to fork this repository and then translate it. You can then publish it to GitHub Pages and we will link it here.
This is what @tomoyuki-nakabayashi has done for the Japanese version of the discovery book.
Does that sound reasonable to you?
We can help you if some steps do not work.

@burrbull
Member

burrbull commented May 4, 2023

https://crates.io/crates/mdbook-i18n-helpers

@mgeisler
Contributor

Hi there, I wrote mdbook-i18n-helpers and we've been using it for some months now for Comprehensive Rust 🦀. I would be happy to help with setting it up for this book.

I see that there are already two hand-written translations available. Converting those back to the Gettext PO format will be an imperfect process. I can try writing a conversion tool which takes two Markdown files as input and outputs a PO file. If the translations have kept the Markdown structure intact (same number of paragraphs, same number of list items, ...), it should be possible to convert them back to a PO file.

@eldruin
Member

eldruin commented May 12, 2023

Sounds great @mgeisler, thank you. It would be great to draw more people into embedded Rust :)
However, please be aware that we generally do not have much time to invest, so the setup would need to be easy to manage.

@mgeisler
Contributor

Sounds great @mgeisler, thank you. It would be great to draw more people into embedded Rust :)

Yeah, I would hope so!

However, please be aware that we generally do not have much time to invest, so the setup would need to be easy to manage.

Makes sense! In general, the setup I use for the course favors long-term maintainability at the cost of some simplicity. The easiest no-setup-needed idea is to clone the repository and do the translations directly inside the English files. However, it's hard, if not impossible, to track changes to the original files that way.

The Gettext approach helps here since it decouples the translation from the source. You extract source text into PO files and translators work on those. When you publish a translation, you read the translation from the PO files. The PO files are small version-controlled databases and there is tooling available which can merge new strings into these databases. This is described in detail in mdbook-i18n-helpers.

@burrbull
Member

@mgeisler Could you write CI for this repository to update the POT file and release translated books?

@mgeisler
Contributor

@mgeisler Could you write CI for this repository to update the POT file and release translated books?

Yes, I would be happy to do this! It might be a week or two before I get to it.

@mgeisler mgeisler linked a pull request May 16, 2023 that will close this issue
@mgeisler
Contributor

It might be a week or two before I get to it.

I nerd-sniped myself and couldn't help working on it... please see #356 😄

@Jzow
Author

Jzow commented May 17, 2023

Excuse me, I have two questions.

Firstly, I have briefly reviewed @mgeisler's description. The mdbook-i18n-helpers repository is designed to help convert Markdown to PO, but I don't quite understand what PO means here. What is its purpose?

The second question is: do we need to wait for this plugin to be finished before translating, and do we need a guide before we start translating?

@burrbull
Member

burrbull commented May 17, 2023

.po is the translation file format of Gettext: https://en.wikipedia.org/wiki/Gettext

It is supported by several editors, for example
https://poedit.net/

@Jzow
Author

Jzow commented May 17, 2023

.po is the translation file for https://en.wikipedia.org/wiki/Gettext

Supported by several editors, for example https://poedit.net/

thanks!

@mgeisler
Contributor

Hey all, I just wanted to say that the offer in #356 still stands:

[...] I hope people can give it a try. If someone (such as @DownyBehind from #350 and @Jzow from #326) wants to play with a Gettext-based translation, then they're free to open PRs against https://github.com/mgeisler/rust-embedded-book. That repository can be a temporary "translation playground" so that you don't have to deal with anything in this repository.

I have no connection with this repository and I think it's up to the Embedded Rust community to decide if they want to embark on a translation. If you do want to try, then feel free to add things to https://github.com/mgeisler/rust-embedded-book — I'm happy to give people write access so they can experiment freely there. The results will show up in the #356 PR automatically 🙂

For Comprehensive Rust, we've just embarked on adding Chinese (Simplified) and Chinese (Traditional) translations and the PRs are coming in now. That might help show the system in action.

@eldruin
Member

eldruin commented May 30, 2023

Thank you @mgeisler! We will discuss this again in tonight's meeting.

@eldruin
Member

eldruin commented May 31, 2023

There is some support for going this route; however, the discussion yesterday highlighted a couple of missing pieces in this PR:

  1. Extraction of the PO templates
  2. Merging of the PO files

Furthermore, the upstream Rust docs also link to externally hosted copies of the book, so if we used PO files, we would probably have to coordinate with the upstream Rust docs.

Please do not be discouraged by our hesitance. We are just a diverse team of time-constrained people.

@mgeisler
Contributor

mgeisler commented Jun 7, 2023

Hey all,

Thanks for discussing this!

There is some support for going this route; however, the discussion yesterday highlighted a couple of missing pieces in this PR:

  1. Extraction of the PO templates

That is done by enabling the mdbook-xgettext renderer on demand. The README has an example:

MDBOOK_OUTPUT='{"xgettext": {"pot-file": "messages.pot"}}' mdbook build -d po

This generates a po/messages.pot file. Since it is completely auto-generated, you should not commit it to the repository. Instead, translators generate it as needed when they intend to merge in new strings from English text to their translation.

  2. Merging of the PO files

I think you're referring to the updating of existing translations step? There translators use msgmerge to add new strings (from the messages.pot file) to their translation.

If you're instead talking about "merging" as in "version control merges", then most of the time you just treat the PO files as source files. If you do run into a tricky situation where a PO file has been edited in different conflicting ways, then I would reach for msgcat, which knows how to accumulate messages from two or more PO files. I would basically concatenate the two conflicting versions and then use msgmerge --update to massage this into a new clean file.

Furthermore, the upstream Rust docs also link to externally hosted copies of the book, so if we used PO files, we would probably have to coordinate with the upstream Rust docs.

I'm not sure what the impact of this would be. Could the upstream docs not just link to the translations like they do today, regardless of how those translations are produced?


Taking a step back, we're only talking about how the translation is stored:

  • it can be stored in a parallel set of Markdown files
  • it can be stored in a more structured way with PO files

The first approach is easy to understand, but the translators are left hanging with no clear way to incorporate fixes from the original files. It's effectively a massive fork: you could just as well have copied the files from src/ to src-da/ for Danish and then updated the two sets independently. The only help translators get is to look at the diffs of src/ and then apply those manually (while translating!) to src-da/. Notice that it's hard to track which updates still need to be applied, unless translators replicate the commits one-by-one in strict order.

The second approach lets you manage the translation via the PO files and their fuzzy mechanism. Translators can update the PO files as they like, starting with the messages that look easiest. The translators only see the current version of the book in the PO files: they don't see every individual commit or typo fix.

While I didn't present it like this, both approaches let people publish the resulting books anywhere they like. For Comprehensive Rust, I'm trying to merge all translations back to the main repository, but people could actually update the PO files in their own forks and publish the translation somewhere else. They would still get the advantages of using PO files: since they never touch the files under src/, they will never get merge conflicts when they merge in changes from the main repository. When changes are made in the main repository, msgmerge will tell them about it by flagging the existing translations as fuzzy. Translators could for example import new messages weekly or monthly and fix the fuzzy entries that have appeared since the last import.

Please do not be discouraged by our hesitance. We are just a diverse team of time-constrained people.

Oh that's totally fine — we all are 😄

I hope the above cleared things up a bit, but if not, please ask me more questions 😄 I'm also happy to come to the chat since that can be much faster.

@burrbull
Member

burrbull commented Jun 7, 2023

renderer on-demand

This is the issue.
If you can't see missing translations, you don't know that you need to update them.
I would rather see an incomplete translation than an outdated one.

@mgeisler
Contributor

mgeisler commented Jun 7, 2023

renderer on-demand

This is the issue. If you can't see missing translations, you don't know that you need to update them. I would rather see an incomplete translation than an outdated one.

Okay, I think I understand what you mean!

The messages.pot file is not used when rendering a translation into HTML, but it is used when updating the xx.po file for the xx language.

Were you hoping to have all the PO files updated in lockstep when the English source changes? Something like a GitHub action which runs the following?

MDBOOK_OUTPUT='{"xgettext": {"pot-file": "messages.pot"}}' mdbook build -d po
for pofile in po/*.po; do
    msgmerge --update "$pofile" po/messages.pot
done
git add po/
git commit -m "Sync all PO files"

That would be possible, but awkward: updating the PO files on behalf of the translators takes away the possibility for the translators to see the diff after a msgmerge.

What I would suggest instead would be to run

for pofile in po/*.po; do
    msgmerge --quiet --update "$pofile" po/messages.pot
    echo -n "$pofile "
    msgfmt -o /dev/null --stat "$pofile"
done

in a GitHub action after every merge. The output will show the number of translated and untranslated messages for each language.

That's a passive way to have the information available. If you want a way to notify translators, then you have to look into using a translation platform such as https://www.transifex.com/. I used them years ago and it looks like they still support Gettext PO files.

@Jzow
Author

Jzow commented Aug 1, 2023

I'm glad to see this translation tool, but it may first need to be evaluated by the members of the embedded Resources team. Until it is officially adopted, the solution I can think of is to open an issue or PR against the other translation repositories whenever updates are synced here, to alert the maintainers of those language repositories.

@Jzow
Author

Jzow commented Aug 1, 2023

I would rather see an incomplete translation than an outdated one.

Agreed, this matches how most people prefer to read.

@mgeisler
Contributor

mgeisler commented Aug 2, 2023

renderer on-demand

This is the issue. If you can't see missing translations, you don't know that you need to update them. I would rather see an incomplete translation than an outdated one.

Agreed, this matches how most people prefer to read.

Reading this again, I realize that I might not have explained myself well above. If you look at https://google.github.io/comprehensive-rust/ko/, you will see that the page has a mix of Korean and English. This is because we re-publish all languages whenever something changes in main. So if a typo is fixed in the English source text, we immediately invalidate the affected paragraph in all translations.

So readers of the book certainly notice that something has become outdated 🙂

I'm considering changing this approach since it's hard work for translators to keep up to date. One approach would be to reuse an existing translation as long as it is more than NN% complete. If set to 95%, the translation would be allowed to degrade from 100% down to 95% after which it would stop updating. Degrading means that more and more paragraphs revert back to English. Stop updating means that the text is frozen at this point — so typo fixes won't flip paragraphs back to English, but actual fixes and additions won't show up either.
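The threshold idea could be prototyped on top of the `msgfmt --statistics` output from the earlier loop. This is a hypothetical sketch: `completion_percent` and `should_publish` are illustrative names, fuzzy entries are assumed to count as not translated, 95 mirrors the example threshold, and the loop assumes the PO files live in `po/`. It only decides which languages to rebuild; actually freezing a language at its last published version is left to the publishing workflow.

```shell
#!/bin/sh
# Hypothetical sketch of publishing a language only while it is at least
# NN% complete. completion_percent and should_publish are made-up names.

# completion_percent STATS: convert a `msgfmt --statistics` summary such as
# "18 translated messages, 1 fuzzy translation, 1 untranslated message."
# into the integer percentage of translated messages. Fuzzy entries count
# as not yet translated.
completion_percent() {
    printf '%s\n' "$1" | awk '{
        translated = 0
        total = 0
        n = split($0, parts, ", ")
        for (i = 1; i <= n; i++) {
            split(parts[i], words, " ")   # words[1] is the count
            total += words[1]
            if (words[2] == "translated") translated += words[1]
        }
        if (total == 0) print 0
        else print int(100 * translated / total)
    }'
}

# should_publish STATS THRESHOLD: succeed if the translation is at least
# THRESHOLD percent complete.
should_publish() {
    [ "$(completion_percent "$1")" -ge "$2" ]
}

# Example run over po/*.po with a 95% threshold.
for pofile in po/*.po; do
    [ -e "$pofile" ] || continue   # the glob matched nothing
    stats=$(LC_ALL=C msgfmt --statistics -o /dev/null "$pofile" 2>&1)
    if should_publish "$stats" 95; then
        echo "$pofile: rebuild from the latest English text"
    else
        echo "$pofile: keep the last published version"
    fi
done
```

With the 95% example, a language with 19 of 20 messages translated would still be rebuilt, while one with 18 of 20 (90%) would stay at its last published version.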

I don't know if this was clear from the above.

Until it is officially adopted, the solution I can think of is to open an issue or PR against the other translation repositories whenever updates are synced here, to alert the maintainers of those language repositories.

I think this is a good idea. The whole thing is very independent of the source repository — interested contributors can easily fork the repository and maintain their translations using mdbook-i18n-helpers without any help needed from this repository. If they like the system, then it's easy to merge things back later.
