Consider rename Chinese translations from zh-cn/zh-tw style to zh-Hans/zh-Hant #793

Open
HolgerHuo opened this issue Jan 23, 2024 · 16 comments · May be fixed by #865
Labels
i18n (Issue with i18n translations), wip (Work in progress)

Comments

@HolgerHuo

Language

Chinese

What's the issue?

In Hugo's i18n processor (used for localized datetime strings), the language codes for Chinese follow the zh-Hans/zh-Hant style, while Congo's i18n config file names follow the zh-CN/zh-TW style. This creates a discrepancy in the user's choice of language code and results in the following scenario:
[image]
where the datetime (which is handled by Hugo) is in English while other items such as the word count (which are handled by Congo) are in Chinese.
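
The mismatch can be illustrated with a minimal sketch. It assumes, purely for illustration, that the date localizer looks up the site's language code in a table of known locales and falls back to English when the code is missing. The table contents and the function name are hypothetical and do not reproduce Hugo's actual lookup code.

```python
# Hypothetical subset of locale codes known to the date localizer.
# The month names are illustrative values only.
KNOWN_LOCALES = {
    "en": "January",
    "zh-hans": "一月",
    "zh-hant": "1月",
}


def localized_month(language_code: str) -> str:
    """Return the first month's name, falling back to English (assumed behaviour)."""
    key = language_code.lower()
    return KNOWN_LOCALES.get(key, KNOWN_LOCALES["en"])


print(localized_month("zh-CN"))    # not in the table, so the English name is used
print(localized_month("zh-Hans"))  # found, so the Chinese name is used
```

Under these assumptions a site configured as zh-CN gets English dates, while the theme's own strings, which are looked up by file name, still come out in Chinese.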

Theme version

main

Hugo version

latest

Relevant Hugo log output

No response

HolgerHuo added the i18n (Issue with i18n translations) label Jan 23, 2024
@tomy0000000
Contributor

tomy0000000 commented Feb 5, 2024

I don't see how this requirement has anything to do with the code.

I couldn't find anything in Hugo's docs saying that localized dates and times should use zh-Hans/zh-Hant, but my guess is that it's implemented deeper in Golang rather than in Hugo itself.

Can you maybe cite your source for this information?

Anyway, I did find this snippet on Hugo's forum, which seems like what you're trying to achieve. Hope this helps.

@HolgerHuo
Author

Hi, sorry for not having cited the source.

Localization of datetime strings (and also currencies, etc.) is done in this package: https://github.com/gohugoio/locales, where the language codes for Chinese are in zh-Hans/zh-Hant format.

image

By the way, in PR #805, zh-han should be renamed to zh-hans to conform to the code used in gohugoio/locales.

@tomy0000000
Contributor

Ok, I ran some tests and can confirm this does in fact interfere with how Hugo renders dates and times.

However, I want to propose that we should change zh-CN to zh-Hans-CN and zh-TW to zh-Hant-TW for better adaptability to fit customs in different regions.

In addition, we may want to add some documentation to hint to future translation contributors that they should use codes available in gohugoio/locales so we won't bump into the same issue again.

@HolgerHuo
Author

> In addition, we may want to add some documentation to hint to future translation contributors that they should use codes available in gohugoio/locales so we won't bump into the same issue again.

True. This is a really common mistake, because most websites use zh-cn as the de facto language code for Simplified Chinese, although the W3C suggests zh-Hans (https://www.w3.org/International/articles/language-tags/). There are also styles like zh-cmn (same for zh-Hans), so things are a little messy here.

> However, I want to propose that we should change zh-CN to zh-Hans-CN and zh-TW to zh-Hant-TW for better adaptability to fit customs in different regions.

In my experience, zh-Hans (Simplified Chinese) is normally consistent across the regions where it is used (China Mainland, Macau and Singapore), whereas zh-Hant (Traditional Chinese) differs slightly between the Hong Kong dialect (which includes Cantonese words and phrases) and the rest of its usage. Since the current translations only cover two of them, naming them zh-Hans and zh-Hant should be enough, because most language processors will fall back to these two, and we can create regional translations when we have them.
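
The fallback idea can be sketched as follows. This is a simplified version of the truncation-style lookup described in RFC 4647 (it skips the special handling of single-character subtags), not the behaviour of any particular language processor. The function names and the list of available translations are made up for the example.

```python
from typing import Iterable, List, Optional


def fallback_chain(tag: str) -> List[str]:
    """List the tag followed by its progressively shorter prefixes."""
    parts = tag.split("-")
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]


def pick_translation(requested: str, available: Iterable[str]) -> Optional[str]:
    """Return the first available translation in the chain, ignoring case."""
    by_lower = {name.lower(): name for name in available}
    for candidate in fallback_chain(requested):
        match = by_lower.get(candidate.lower())
        if match is not None:
            return match
    return None


available = ["zh-Hans", "zh-Hant"]  # hypothetical set of translation files
print(fallback_chain("zh-Hant-HK"))                # ['zh-Hant-HK', 'zh-Hant', 'zh']
print(pick_translation("zh-Hant-HK", available))   # zh-Hant
print(pick_translation("zh-CN", available))        # None
```

The last call shows why the old names are awkward: truncating zh-CN yields zh, never zh-Hans, so a region-only code cannot reach a script-based translation by truncation alone.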

@tomy0000000
Contributor

> In my experience, zh-Hans (Simplified Chinese) is normally consistent across the regions where it is used (China Mainland, Macau and Singapore), whereas zh-Hant (Traditional Chinese) differs slightly between the Hong Kong dialect (which includes Cantonese words and phrases) and the rest of its usage. Since the current translations only cover two of them, naming them zh-Hans and zh-Hant should be enough, because most language processors will fall back to these two, and we can create regional translations when we have them.

Hmm...that's different from my experience.

I cloned a copy of gohugoio/locales, dug into all the zh locales, and found multiple differences in how they format dates, times, and currencies. To name just a few examples:

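| Locale | Era | Month | Chinese Yuan | Taiwan Dollar |
| --- | --- | --- | --- | --- |
| zh_Hans | 公元 | 一月 | CNY | TWD |
| zh_Hans_CN | 公元 | 一月 | CNY | TWD |
| zh_Hans_HK | 公元 | 一月 | CN¥ | TWD |
| zh_Hant | 西元 | 1月 | CN¥ | $ |
| zh_Hant_HK | 公元 | 一月 | CNY | NT$ |
| zh_Hant_TW | 公元 | 一月 | CNY | TWD |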

As the author of zh-TW.yaml, I cannot speak for other regions using Traditional Chinese. In addition, I would suggest that they create their own translation to fit their needs.

In this case, I would say that we create four translations: zh_Hans, zh_Hans_CN, zh_Hant, zh_Hant_TW, and let the user choose their favorite.

@jpanther
Owner

jpanther commented Mar 9, 2024

This is definitely a topic where I do not have relevant knowledge or experience to make a call on the best way forward. My only concern is that the codes chosen work with Hugo and don't cause any implementation issues but it seems like that is not a problem with this issue.

@HolgerHuo
Author

> In this case, I would say that we create four translations: zh_Hans, zh_Hans_CN, zh_Hant, zh_Hant_TW, and let the user choose their favorite.

@tomy0000000 Of course, it would be best to have them separated. Since we currently have no authors for other dialects, we could put up a notice so that people using other dialects can use the fallback ones.

@HolgerHuo
Author

> This is definitely a topic where I do not have relevant knowledge or experience to make a call on the best way forward. My only concern is that the codes chosen work with Hugo and don't cause any implementation issues but it seems like that is not a problem with this issue.

@jpanther Hi! By the codes chosen, did you mean the previously chosen ones (zh-TW, zh-CN)? They did work for the most part. But when translating with Hugo's built-in i18n module (datetime functions, etc.), zh-CN and zh-TW aren't recognized as valid languages (although, being common practice, these two should arguably be listed as well), and it falls back to EN, as shown above.

@jpanther
Owner

I'm happy with the proposal to use zh_Hans, zh_Hans_CN, zh_Hant, zh_Hant_TW if that is the consensus here?

@tomy0000000
Contributor

> I'm happy with the proposal to use zh_Hans, zh_Hans_CN, zh_Hant, zh_Hant_TW if that is the consensus here?

Sounds good to me, let me know if you need help with this.

@jpanther
Owner

Looking at this in more detail now, I'm already confused due to my lack of understanding of the dialects involved. Is the plan then to duplicate zh-CN to both zh-Hans and zh-Hans-CN? If so, what is the difference between the two? I then presume we are moving zh-TW to zh-Hant-TW and then creating a new zh-Hant?

@tomy0000000 I might need to take you up on your offer to help with this!

@tomy0000000
Contributor

> Is the plan then to duplicate zh-CN to both zh-Hans and zh-Hans-CN?

Yes

> If so, what is the difference between the two

Users who prefer a translation localized to China's customs can choose zh_Hans_CN, while others can choose zh_Hans for a more generalized translation. Congo will likely maintain multiple translations with exactly the same content, but the differences will still be observable in, for instance, dates and times.

> I then presume we are moving zh-TW to zh-Hant-TW and then creating a new zh-Hant?

Yes, that's also the case.


I'm willing to work on this, but I'm a bit busy for the next couple of days, so please allow me to pick this up in about 2 to 3 days.

@jpanther
Owner

I'm happy to do a simple move and replace. The part I'm confused about is what changes need to be made to distinguish between zh-Hant and zh-Hant-TW, or can these both be duplicates, just like the zh-CN change?

@jpanther
Owner

I guess what I'm proposing is, would this be sufficient? ie. making 2 copies of each of the existing translations...

zh-CN -> zh-Hans
zh-CN -> zh-Hans-CN
zh-TW -> zh-Hant
zh-TW -> zh-Hant-TW

@tomy0000000
Contributor

> I guess what I'm proposing is, would this be sufficient? ie. making 2 copies of each of the existing translations...
>
> zh-CN -> zh-Hans
> zh-CN -> zh-Hans-CN
> zh-TW -> zh-Hant
> zh-TW -> zh-Hant-TW

Yes, this should be sufficient at the moment
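
For reference, the copy plan above could be scripted roughly like this. It is only a sketch: the helper name, the directory argument, and the assumption that both source files use the `.yaml` extension are illustrative, and the optional removal of the old files is a guess at what "move" would mean here.

```python
import shutil
from pathlib import Path
from typing import List

# Mapping taken from the plan discussed above: each old file is copied twice.
COPY_PLAN = {
    "zh-CN": ["zh-Hans", "zh-Hans-CN"],
    "zh-TW": ["zh-Hant", "zh-Hant-TW"],
}


def duplicate_translations(i18n_dir: Path, remove_old: bool = False) -> List[Path]:
    """Copy each old translation file to its new names and return the new paths."""
    created = []
    for old_code, new_codes in COPY_PLAN.items():
        source = i18n_dir / f"{old_code}.yaml"  # assumed file naming
        for new_code in new_codes:
            target = i18n_dir / f"{new_code}.yaml"
            shutil.copyfile(source, target)
            created.append(target)
        if remove_old:
            source.unlink()  # optional: drop the old zh-CN/zh-TW file
    return created
```

Calling `duplicate_translations(Path("i18n"))` from the theme root would produce the four new files, assuming the translations live in a directory of that name.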

jpanther added a commit that referenced this issue Apr 30, 2024
jpanther linked a pull request Apr 30, 2024 that will close this issue
@jpanther
Owner

I've started a PR for this and have made the changes above. Have a look and let me know if you think this is sufficient.

jpanther added the wip (Work in progress) label May 1, 2024
3 participants