Bump release (#654)

* reviewed ep.59 (#406)

* docs(zh-cn): Reviewed 60_what-is-the-bleu-metric.srt (#407)

* finished review (#408)

* docs(zh-cn): Reviewed 61_data-processing-for-summarization.srt (#409)

* Fix subtitle - translation data processing (#411)

* [FR] Final PR (#412)

* [ko] Add chapter 8 translation (#417)

* docs(zh-cn): Reviewed 62_what-is-the-rouge-metric.srt (#419)

* finished review

* fixed errors in original english subtitle

* fixed errors (#420)

* docs(zh-cn): Reviewed 63_data-processing-for-causal-language-modeling.srt (#421)

* Update 63_data-processing-for-causal-language-modeling.srt

* finished review

* Update 63_data-processing-for-causal-language-modeling.srt

* docs(zh-cn): Reviewed 65_data-processing-for-question-answering.srt (#423)

* finished review

* finished review

* finished review (#422)

* Add Ko chapter2 2.mdx (#418)

* Add Ko chapter2 2.mdx

* [ko] Add chapter 8 translation (#417)

* docs(zh-cn): Reviewed 62_what-is-the-rouge-metric.srt (#419)

* finished review

* fixed errors in original english subtitle

* fixed errors (#420)

* docs(zh-cn): Reviewed 63_data-processing-for-causal-language-modeling.srt (#421)

* Update 63_data-processing-for-causal-language-modeling.srt

* finished review

* Update 63_data-processing-for-causal-language-modeling.srt

* docs(zh-cn): Reviewed 65_data-processing-for-question-answering.srt (#423)

* finished review

* finished review

* finished review (#422)

* Add Ko chapter2 2.mdx

Co-authored-by: IL-GU KIM <[email protected]>
Co-authored-by: Yuan <[email protected]>

* update textbook link (#427)

* Visual fixes (#428)

* finish first round review (#429)

* Fix French subtitles + refactor conversion script (#431)

* Fix subtitles and scripts

* Fix subtitle

* Add tokenizer to MLM Trainer (#432)

* Fix FR video descriptions (#433)

* Fix FR video descriptions

* Rename file

* Fix dead GPT model docs link. (#430)

* Translate into Korean: 2-3 (#434)

Co-authored-by: “Ryan” <“[email protected]”>

* Add korean translation of chapter5 (1,2) (#441)

update toctree for chapter 5 (1, 2)
ensure same title for 5-2
add updates from upstream English with custom anchors

Co-Authored-By: Minho Ryu <[email protected]>

Co-authored-by: Meta Learner응용개발팀 류민호 <[email protected]>
Co-authored-by: Minho Ryu <[email protected]>

* Update 3.mdx (#444)

* docs(zh-cn): Reviewed 67_the-post-processing-step-in-question-answering-(tensorflow).srt (#447)

* Update 67_the-post-processing-step-in-question-answering-(tensorflow).srt

* finished review

* docs(zh-cn): Reviewed 66_the-post-processing-step-in-question-answering-(pytorch).srt (#448)

* Update 66_the-post-processing-step-in-question-answering-(pytorch).srt

* finished review

* refined translation

* docs(zh-cn): Reviewed 01_the-pipeline-function.srt (#452)

* finish review

* Update subtitles/zh-CN/01_the-pipeline-function.srt

Co-authored-by: Luke Cheng <[email protected]>

Co-authored-by: Luke Cheng <[email protected]>

* finish review (#453)

* Revise some unnatural translations (#458)

Some unnatural translations have been revised to use expressions more popular with Chinese readers

* Fix chapter 5 links (#461)

* fix small typo (#460)

* Add Ko chapter2 3~8.mdx & Modify Ko chapter2 2.mdx typo (#446)

* Add captions for tasks videos (#464)

* Add captions for tasks videos

* Fix script

* [FR] Add 🤗  Tasks videos (#468)

* Synchronous Chinese course update

Update the Chinese Course document to
sha:f71cf6c3b4cb235bc75a14416c6e8a57fc3d00a7
sha date: 2023/01/06 00:02:26 UTC+8

* review sync

* Update 3.mdx

* format zh_CN

* format all mdx

* Remove temp folder

* finished review (#449)

* docs(zh-cn): Reviewed 31_navigating-the-model-hub.srt (#451)

* docs(zh-cn): Reviewed No. 08 - What happens inside the pipeline function? (PyTorch) (#454)

* docs(zh-cn): Reviewed 03_what-is-transfer-learning.srt (#457)

* docs(zh-cn): 32_managing-a-repo-on-the-model-hub.srt (#469)

* docs(zh-cn): Reviewed No. 10 - Instantiate a Transformers model (PyTorch) (#472)

* update Chinese translation

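Some English sentences have the opposite word order from Chinese. I arranged them directly in the final Chinese word order; is that acceptable?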

* finish first round review

* finish second round review

* finish second round review

* branch commit

* Update subtitles/zh-CN/10_instantiate-a-transformers-model-(pytorch).srt

Co-authored-by: Luke Cheng <[email protected]>

* Update subtitles/zh-CN/10_instantiate-a-transformers-model-(pytorch).srt

Co-authored-by: Luke Cheng <[email protected]>

---------

Co-authored-by: Luke Cheng <[email protected]>

* docs(zh-cn): 33_the-push-to-hub-api-(pytorch).srt (#473)

* docs(zh-cn): Reviewed 34_the-push-to-hub-api-(tensorflow).srt (#479)

* running python utils/code_formatter.py

* review 05 cn translations

* review 06 cn translations

* Review No.11

* translate no.24

* review 06 cn translations

* review 07 cn translations

* Update 23_what-is-dynamic-padding.srt

* Update 23_what-is-dynamic-padding.srt

* Update 23_what-is-dynamic-padding.srt

* Update subtitles/zh-CN/23_what-is-dynamic-padding.srt

Co-authored-by: Luke Cheng <[email protected]>

* Update subtitles/zh-CN/23_what-is-dynamic-padding.srt

Co-authored-by: Luke Cheng <[email protected]>

* add blank

* Review No. 11, No. 12

* Review No. 13

* Review No. 12

* Review No. 14

* finished review

* optimized translation

* optimized translation

* docs(zh-cn): Reviewed No. 29 - Write your training loop in PyTorch

* Review 15

* Review 16

* Review 17

* Review 18

* Review ch 72 translation

* Update 72 cn translation

* To be reviewed No.42-No.54

* No.11 check-out

* No.12 check-out

* No. 13 14 check-out

* No. 15 16 check-out

* No. 17 18 check-out

* Add note for "token-*"

* Reviewed No.8, 9, 10

* Reviewed No.42

* Review No.43

* finished review

* optimized translation

* finished review

* optimized translation

* Review 44 (needs refinement)

* Review 45 (needs refinement)

* Review No. 46 (needs refinement)

* Review No.47

* Review No.46

* Review No.45

* Review No.44

* Review No.48

* Review No.49

* Review No.50

* Modify Ko chapter2 8.mdx (#465)

* Add Ko chapter2 2.mdx

* Add Ko chapter2 2.mdx

* Add Ko chapter2 3.mdx & 4.mdx

* Modify Ko chapter2 3.mdx & 4.mdx

* Modify Ko chapter2 3.mdx & 4.mdx

* Modify Ko chapter2 3.mdx & 4.mdx

* Modify _toctree.yml

* Add Ko chapter2 5.mdx

* Modify Ko chapter2 4.mdx

* Add doc-builder step

* Add Ko chapter2 6~8.mdx & Modify Ko chapter2 2.mdx typo

* Modify Ko _toctree.yml

* Modify Ko chapter2 8.mdx & README.md

* Fixed typo (#471)

* fixed subtitle errors (#474)

timestamp: 00:00:26,640 --> 00:00:28,620
modification: notification --> authentication

timestamp: 00:04:21,113 --> 00:04:22,923
modification: of --> or
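Each correction above is located by the cue's SRT timing line. As a minimal sketch (the helpers `to_ms`, `parse_range` and `fix_cue` are hypothetical, not part of the course's conversion scripts, and the sample cues are invented), a fix like this can be scoped to a single cue so the same word elsewhere in the file is left untouched:

```python
import re
from typing import Tuple

# SRT timestamps have the form HH:MM:SS,mmm (a comma separates the milliseconds)
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def to_ms(stamp: str) -> int:
    """Convert an SRT timestamp such as '00:00:26,640' to milliseconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_range(timing: str) -> Tuple[int, int]:
    """Parse a timing line like '00:00:26,640 --> 00:00:28,620' into (start, end) ms."""
    start, end = timing.split("-->")
    return to_ms(start), to_ms(end)


def fix_cue(srt_text: str, timing: str, old: str, new: str) -> str:
    """Replace `old` with `new` only in the cue whose timing matches `timing`.

    Assumes Unix line endings and cues separated by a single blank line.
    """
    target = parse_range(timing)
    fixed_blocks = []
    for block in srt_text.strip().split("\n\n"):
        lines = block.split("\n")
        # lines[0] is the cue number, lines[1] the timing, the rest is caption text
        if len(lines) >= 3 and "-->" in lines[1] and parse_range(lines[1]) == target:
            lines[2:] = [line.replace(old, new) for line in lines[2:]]
        fixed_blocks.append("\n".join(lines))
    return "\n\n".join(fixed_blocks) + "\n"


# Invented two-cue sample: only the first cue should change
sample = (
    "1\n00:00:26,640 --> 00:00:28,620\nEnter your notification token\n\n"
    "2\n00:00:28,620 --> 00:00:30,000\nThe notification stays here\n"
)
print(fix_cue(sample, "00:00:26,640 --> 00:00:28,620", "notification", "authentication"))
```

Matching on the parsed (start, end) pair rather than the raw string makes the lookup tolerant of extra whitespace around `-->`.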

* Fixed a typo (#475)

* Update 3.mdx (#526)

Fix typo

* [zh-TW] Added chapters 1-9 (#477)

The translation is based on the Simplified Chinese version, converted via OpenCC, with some formatting issues fixed.

* finished review

* Explain why there are more tokens than reviews (#476)

* Explain why there are more tokens than reviews

* Update chapters/en/chapter5/3.mdx

---------

Co-authored-by: lewtun <[email protected]>
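Assuming the explanation refers to tokenizing with overflowing tokens (in 🤗 Transformers, fast tokenizers accept `return_overflowing_tokens=True` and return an `overflow_to_sample_mapping`), the row count grows because one long review can be split into several windows. The pure-Python sketch below imitates that behaviour with a hypothetical `chunk_with_overflow` helper; it ignores special tokens and stride, which the real tokenizer handles:

```python
from typing import List, Sequence, Tuple


def chunk_with_overflow(
    token_lists: Sequence[Sequence[int]], max_length: int
) -> Tuple[List[List[int]], List[int]]:
    """Split every input into windows of at most `max_length` tokens.

    Returns the windows plus, for each window, the index of the input it came
    from (the role played by `overflow_to_sample_mapping`).
    """
    chunks: List[List[int]] = []
    sample_mapping: List[int] = []
    for sample_idx, tokens in enumerate(token_lists):
        # An empty input still produces one (empty) window
        for start in range(0, max(len(tokens), 1), max_length):
            chunks.append(list(tokens[start : start + max_length]))
            sample_mapping.append(sample_idx)
    return chunks, sample_mapping


# Three "reviews" of 5, 2 and 9 tokens produce 2 + 1 + 3 = 6 windows
reviews = [list(range(5)), list(range(2)), list(range(9))]
chunks, mapping = chunk_with_overflow(reviews, max_length=4)
print(len(reviews), len(chunks), mapping)  # 3 6 [0, 0, 1, 2, 2, 2]
```

Because the mapping points each window back to its review, other per-review columns (labels, for example) can be repeated to match the new row count.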

* [RU] Subtitles for Chapter 1 of the video course (#489)

* Created a directory for the russian subtitles.

Created a folder for Russian subtitles for the video course and published a translation of the introductory video from chapter 1.

* Uploaded subtitles for chapter 1

Uploaded subtitles for the remaining videos for chapter 1 of the video course.

* Added subtitles for chapter 2 of the video course

Added SRT subtitle files for the second chapter of the YouTube video course.

* Delete subtitles/ru directory

Removed the old translation. Incorrect timestamping.

* Create 00_welcome-to-the-hugging-face-course.srt

Create a directory and upload a subtitle file for the introductory video of the course.

* Add files via upload

Upload subtitle files for the first chapter of the course.

* Review No.52

* [ru] Added the glossary and translation guide (#490)

* Added the glossary and translation guide

* Fixed casing

* Minor fixes

* Updated glossary

* Glossary update

* Glossary update

* Glossary update

* [ru] Chapters 0 and 1 proofreading, updating and translating missing sections (#491)

* Chapter 0 proofreading

* Chapter 1 Section 1 proofreading
- Added new people from English version;
- Added links to creator's pages;
- Added FAQ translation;

* Chapter 1 Sections 2-5 proofreading

* Chapter 1 Sections 6-9 proofreading

* Final proofreading and added missing quiz section

* Minor spelling corrections

* Review No.51

* Review No.53

* Review No.54

* finished review

* modified translation

* modified translation

* modified subtitle

use the same text that appears in the video

* translated

* Fix typo (#532)

* review chapter4/2

* review chapter4/2

* review chapter4/2

* Review 75

* Review No.20, need review some

* docs(zh-cn): Reviewed Chapter 7/1

* Update 1.mdx

* Review No.22

* Review No.21 (need refinement)

* Review No.30, need review: 26 27 28 30 73 74

* Review 30 (good)

* Review 20

* Review 21 (refine)

* Review 21

* Review 22

* Review 26

* Review 27

* Review 28

* Review 30

* Review 73

* Review 74

* Fix typo

* Review 26-28, 42-54, 73-75

* The GPT2 link is broken

The link `/course/en/chapter7/section6` does not exist in the course.  Corrected to `/course/en/chapter7/6`.

* typo in `Now your turn!` section

Duplicated `the` was removed

* `chunk_size` should be used instead of `block_size`

`chunk_size` should be used instead of `block_size` (`block_size` was never mentioned before)
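For context, `chunk_size` is the window length used when the tokenized texts are concatenated and re-split for masked language modeling. A minimal sketch of that grouping step, modelled on the course's approach but with details (batch shape as a dict of lists of token lists, dropping the incomplete final chunk) treated as assumptions:

```python
from itertools import chain
from typing import Dict, List


def group_texts(
    examples: Dict[str, List[List[int]]], chunk_size: int
) -> Dict[str, List[List[int]]]:
    """Concatenate every column, then cut it into chunks of exactly `chunk_size`."""
    # Flatten each column (input_ids, attention_mask, ...) into one long list
    concatenated = {k: list(chain.from_iterable(v)) for k, v in examples.items()}
    total_length = len(next(iter(concatenated.values())))
    # Drop the incomplete final chunk so every chunk has the same length
    total_length = (total_length // chunk_size) * chunk_size
    result = {
        k: [t[i : i + chunk_size] for i in range(0, total_length, chunk_size)]
        for k, t in concatenated.items()
    }
    # For masked language modeling the labels start as a copy of the inputs
    result["labels"] = [list(chunk) for chunk in result["input_ids"]]
    return result


batch = {
    "input_ids": [[1, 2, 3], [4, 5, 6, 7], [8, 9]],
    "attention_mask": [[1, 1, 1], [1, 1, 1, 1], [1, 1]],
}
print(group_texts(batch, chunk_size=4)["input_ids"])  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

Dropping the remainder (token `9` here) keeps every example the same length, at the cost of discarding a few tokens per batch.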

* refactor: rephrase text to improve clarity and specificity

In the context of "training with a dataset specific to your task" and "train directly for the final task", it was not easy to infer that "directly" here implies training from scratch.

* Demo link fixes (#562)

* demo link fixes

* minor demo fix

* Bump release (#566)

* Add note about `remove_unused_columns` for whole word masking
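By default the 🤗 Transformers `Trainer` drops dataset columns that the model's `forward` does not accept, so a `word_ids` column needed by a whole word masking collator would presumably be removed unless `remove_unused_columns=False` is set in `TrainingArguments`. The idea such a collator relies on is sketched below with a hypothetical `whole_word_mask` function; the token IDs and mask ID are illustrative, and -100 follows the usual convention for positions ignored by the loss:

```python
import random
from typing import Dict, List, Optional, Tuple


def whole_word_mask(
    input_ids: List[int],
    word_ids: List[Optional[int]],
    mask_token_id: int,
    probability: float,
    seed: Optional[int] = None,
) -> Tuple[List[int], List[int]]:
    """Mask whole words: every token of a selected word is masked together."""
    rng = random.Random(seed)
    # Group token positions by word index; None marks special tokens such as [CLS]
    positions_by_word: Dict[int, List[int]] = {}
    for pos, word in enumerate(word_ids):
        if word is not None:
            positions_by_word.setdefault(word, []).append(pos)

    masked = list(input_ids)
    labels = [-100] * len(input_ids)  # -100 = position ignored by the loss
    for positions in positions_by_word.values():
        if rng.random() < probability:
            for pos in positions:
                labels[pos] = input_ids[pos]  # predict the original token here
                masked[pos] = mask_token_id
    return masked, labels


# Tokens 7 and 8 belong to the same word (word 0), so they are masked together
ids = [101, 7, 8, 9, 10, 102]
word_ids = [None, 0, 0, 1, 2, None]
print(whole_word_mask(ids, word_ids, mask_token_id=103, probability=1.0))
```

Without the `word_ids` information the collator could only mask individual tokens, which is why keeping that column matters.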

* Merge pull request #24 from huggingface/fix-typo

Fix typo

* Merge pull request #26 from huggingface/fix-qa-offsets

Fix inequalities in answer spans for QA chapter
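In the QA preprocessing, an answer's character span is mapped onto token positions through the tokenizer's offset mapping, and whether each comparison is strict decides which token is picked at the boundaries. A minimal sketch of that mapping, assuming one (start, end) character offset pair per token with exclusive ends and a known range of context tokens; `answer_token_span` is a hypothetical name and this is not the exact diff of that PR:

```python
from typing import List, Tuple


def answer_token_span(
    offsets: List[Tuple[int, int]],
    context_start: int,
    context_end: int,
    start_char: int,
    end_char: int,
) -> Tuple[int, int]:
    """Map a character span [start_char, end_char) onto token indices.

    Returns (0, 0) when the answer is not fully inside the context tokens
    context_start..context_end (inclusive).
    """
    if offsets[context_start][0] > start_char or offsets[context_end][1] < end_char:
        return 0, 0
    # Walk right past every token that starts at or before the answer start
    idx = context_start
    while idx <= context_end and offsets[idx][0] <= start_char:
        idx += 1
    start_token = idx - 1
    # Walk left past every token that ends at or after the answer end
    idx = context_end
    while idx >= context_start and offsets[idx][1] >= end_char:
        idx -= 1
    end_token = idx + 1
    return start_token, end_token


# Simplified: context "the cat sat" as [CLS] the cat sat [SEP], no question tokens
offsets = [(0, 0), (0, 3), (4, 7), (8, 11), (0, 0)]
print(answer_token_span(offsets, 1, 3, 4, 7))   # "cat" -> (2, 2)
print(answer_token_span(offsets, 1, 3, 4, 11))  # "cat sat" -> (2, 3)
```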

* Merge pull request #30 from huggingface/fix-modelcard-url

Update model card URL

* Merge pull request #69 from yulonglin/patch-1

Correct typo mixing up space and newline symbols
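Assuming the symbols in question are the byte-level BPE markers used by GPT-2-style tokenizers, the mix-up is easy to make: bytes that are not printable are remapped to other Unicode characters, which is why a space is displayed as `Ġ` and a newline as `Ċ`. The sketch below rebuilds that byte-to-character table in the style of GPT-2's `bytes_to_unicode`:

```python
from typing import Dict


def bytes_to_unicode() -> Dict[int, str]:
    """Map each of the 256 byte values to a printable Unicode character."""
    # Bytes that are already printable ('!'..'~', 0xA1..0xAC, 0xAE..0xFF) keep their code point
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Remaining bytes (control bytes, newline, space, ...) move to 256 + shift, in order
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


table = bytes_to_unicode()
print(repr(table[ord(" ")]), repr(table[ord("\n")]))  # 'Ġ' 'Ċ'
```

Newline (byte 10) lands on U+010A and space (byte 32) on U+0120, so the two markers differ by a single accent and are easy to swap in prose.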

* Bump release (#99)

* Bump release
* Update author list

Co-authored-by: DOOHAE JUNG <[email protected]>
Co-authored-by: m_khandaker <[email protected]>
Co-authored-by: Md. Al-Amin Khandaker <[email protected]>
Co-authored-by: ftarlaci <[email protected]>
Co-authored-by: Doohae Jung <[email protected]>
Co-authored-by: melaniedrevet <[email protected]>

* Bump release  (#115)

* ko-chapter1/1

* ko _toctree.yml created

* Fix the issue #80

* Single expression changed

* ko/chapter1 finished

* ko/chapter0 finished

* ko/chapter0 finished

* reviewed by @bzantium ko/chapter0

* reviewed by @bzantium chapter0 & fixed typo

* reviewed by @rainmaker712

* maximize Korean expressions

* [Chapter 1] Bangla translation initial commit

* Update 1.mdx

update and fix formatting

* Fix formatting and typos

* translate _toctree.yml 0-1 chapter

* Add Korean to CI

* [tr] Translated chapter1/2.mdx

* remove translation from sec titles not yet translated

* Add authors [th ru]

* [FIX] _toctree.yml

* Update chapters/bn/chapter0/1.mdx

[FIX] syntax formatting

Co-authored-by: lewtun <[email protected]>

* tag typos & indentation & unnatural expressions

* modified toctree.yml for chapter1/2

* modified toctree.yml for chapter1/2 & fix typo

* French Translation - Chapter 5

* Add Bengali to CI

* Update author list

* Adding translations for 2/4 and 2/5 🚀 (#74)

* Adding translations for 2/4 and 2/5 🚀

* Remove English content

Co-authored-by: lewtun <[email protected]>

* Translation to Russian (#97)

* translation of chapter 2/section 1

* add section 1 / chapter 2 to _toctree.yml

* Translation of Chapter0 to Hindi (#86)

* Hindi/Chapter0-Part_1

* Hindi/Chapter0-Part_2

* Chapter 0 Persian Translation First Draft (#95)

* merged branch0 into main. no toctree yet.

* updated toctree.

* Updated the glossary with terms from chapter0.

* Second draft in collab w/ @schoobani. Added empty chapter1 for preview.

* Glossary typo fix.

* Translation of Chapter0 (setup) to Arabic (#104)

* Add AR translation for `introduction`

* Fix alignment & format

* Add Arabic to CI build

* Russian - Chapter 1 finished (#98)

* 01/4 start

* 1/4 finished

* 1/5 finished

* 1/5 update toc

* 1/6 finished

* 1/7 finished

* 1/8 finished

* 1/9 finished

* 1/4 fix

* toc update

* Chinese - Chapter 1 finished (#113)

* Chinese - Chapter 1 finished

* Add zh to the languages field

Co-authored-by: petrichor1122 <[email protected]>
Co-authored-by: zhlhyx <[email protected]>

* [PT] Translation of chapter 2 (#107)

* add PT translate to 2.1

* add PT translate to 2.2

* add portuguese translation to 2.3

* WIP portuguese translation to 2.4

* add portuguese translation to 2.4

* add portuguese translation to 2.5

* add portuguese translation to 2.6

* add _toctree infos

Co-authored-by: lewtun <[email protected]>

* [FR] Translation of chapter 2 & event + Review of chapters 0 & 5 (#106)

* Update _toctree.yml

Add chapter 2 
+ little fix of chapter 5

* Update 1.mdx

Review of chapter 0

* Create 1.mdx

* Create 2.mdx

* Create 3.mdx

* Create 4.mdx

* Create 5.mdx

* Create 6.mdx

* Create 7.mdx

* Create 8.mdx

* Update 8.mdx

Since AutoNLP has recently been renamed to AutoTrain, let me make the correction on the English file

* Update 1.mdx

Review of chapter 5/1

* Update 2.mdx

Review of chapter 5/2

* Update 3.mdx

Review of chapter 5/3

* Update 4.mdx

Review of chapter 5/4

* Update 5.mdx

Review of chapter 5/5

* Update 6.mdx

Review of chapter 5/6

* Update 7.mdx

Review of chapter 5/7

* Update 8.mdx

Review of chapter 5/8

* Create 1.mdx

event's translation

* Update _toctree.yml

add event to the tree

* Update _toctree.yml

Delete the files that prevent the checks from passing; they will be resubmitted in another PR

* Delete 1.mdx

Delete the files that prevent the checks from passing; they will be resubmitted in another PR

* make style correction

* Update _toctree.yml

the -

* Fix spacing

Co-authored-by: Lewis Tunstall <[email protected]>

* [th] Translated Chapter2/1 (#83)

* Finish chapter2/1

* Update _toctree.yml

* ko-chapter1/1

* ko _toctree.yml created

* Fix the issue #80

* Single expression changed

* ko/chapter1 finished

* ko/chapter0 finished

* ko/chapter0 finished

* reviewed by @bzantium ko/chapter0

* reviewed by @bzantium chapter0 & fixed typo

* reviewed by @rainmaker712

* maximize Korean expressions

* [Chapter 1] Bangla translation initial commit

* Update 1.mdx

update and fix formatting

* Fix formatting and typos

* translate _toctree.yml 0-1 chapter

* Add Korean to CI

* remove translation from sec titles not yet translated

* Add authors [th ru]

* [FIX] _toctree.yml

* Update chapters/bn/chapter0/1.mdx

[FIX] syntax formatting

Co-authored-by: lewtun <[email protected]>

* tag typos & indentation & unnatural expressions

* modified toctree.yml for chapter1/2

* modified toctree.yml for chapter1/2 & fix typo

* Add Bengali to CI

* Update author list

* Adding translations for 2/4 and 2/5 🚀 (#74)

* Adding translations for 2/4 and 2/5 🚀

* Remove English content

Co-authored-by: lewtun <[email protected]>

* Translation to Russian (#97)

* translation of chapter 2/section 1

* add section 1 / chapter 2 to _toctree.yml

* Translation of Chapter0 to Hindi (#86)

* Hindi/Chapter0-Part_1

* Hindi/Chapter0-Part_2

* Chapter 0 Persian Translation First Draft (#95)

* merged branch0 into main. no toctree yet.

* updated toctree.

* Updated the glossary with terms from chapter0.

* Second draft in collab w/ @schoobani. Added empty chapter1 for preview.

* Glossary typo fix.

* Translation of Chapter0 (setup) to Arabic (#104)

* Add AR translation for `introduction`

* Fix alignment & format

* Add Arabic to CI build

* Russian - Chapter 1 finished (#98)

* 01/4 start

* 1/4 finished

* 1/5 finished

* 1/5 update toc

* 1/6 finished

* 1/7 finished

* 1/8 finished

* 1/9 finished

* 1/4 fix

* toc update

* Chinese - Chapter 1 finished (#113)

* Chinese - Chapter 1 finished

* Add zh to the languages field

Co-authored-by: petrichor1122 <[email protected]>
Co-authored-by: zhlhyx <[email protected]>

* [PT] Translation of chapter 2 (#107)

* add PT translate to 2.1

* add PT translate to 2.2

* add portuguese translation to 2.3

* WIP portuguese translation to 2.4

* add portuguese translation to 2.4

* add portuguese translation to 2.5

* add portuguese translation to 2.6

* add _toctree infos

Co-authored-by: lewtun <[email protected]>

* [FR] Translation of chapter 2 & event + Review of chapters 0 & 5 (#106)

* Update _toctree.yml

Add chapter 2 
+ little fix of chapter 5

* Update 1.mdx

Review of chapter 0

* Create 1.mdx

* Create 2.mdx

* Create 3.mdx

* Create 4.mdx

* Create 5.mdx

* Create 6.mdx

* Create 7.mdx

* Create 8.mdx

* Update 8.mdx

Since AutoNLP has recently been renamed to AutoTrain, let me make the correction on the English file

* Update 1.mdx

Review of chapter 5/1

* Update 2.mdx

Review of chapter 5/2

* Update 3.mdx

Review of chapter 5/3

* Update 4.mdx

Review of chapter 5/4

* Update 5.mdx

Review of chapter 5/5

* Update 6.mdx

Review of chapter 5/6

* Update 7.mdx

Review of chapter 5/7

* Update 8.mdx

Review of chapter 5/8

* Create 1.mdx

event's translation

* Update _toctree.yml

add event to the tree

* Update _toctree.yml

Delete the files that prevent the checks from passing; they will be resubmitted in another PR

* Delete 1.mdx

Delete the files that prevent the checks from passing; they will be resubmitted in another PR

* make style correction

* Update _toctree.yml

the -

* Fix spacing

Co-authored-by: Lewis Tunstall <[email protected]>

* [th] Translated Chapter2/1 (#83)

* Finish chapter2/1

* Update _toctree.yml

* Add Hindi to CI (#116)

Co-authored-by: DOOHAE JUNG <[email protected]>
Co-authored-by: m_khandaker <[email protected]>
Co-authored-by: Md. Al-Amin Khandaker <[email protected]>
Co-authored-by: ftarlaci <[email protected]>
Co-authored-by: Doohae Jung <[email protected]>
Co-authored-by: melaniedrevet <[email protected]>
Co-authored-by: Jose M Munoz <[email protected]>
Co-authored-by: svv73 <[email protected]>
Co-authored-by: Vedant Pandya <[email protected]>
Co-authored-by: Bahram Shamshiri <[email protected]>
Co-authored-by: Giyaseddin Bayrak <[email protected]>
Co-authored-by: Pavel <[email protected]>
Co-authored-by: 1375626371 <[email protected]>
Co-authored-by: petrichor1122 <[email protected]>
Co-authored-by: zhlhyx <[email protected]>
Co-authored-by: João Gustavo A. Amorim <[email protected]>
Co-authored-by: lbourdois <[email protected]>
Co-authored-by: Cherdsak Kingkan <[email protected]>

* Bump release 4 (#133)

* Bump release (#138)

* ko-chapter1/1

* ko _toctree.yml created

* Fix the issue #80

* Single expression changed

* ko/chapter1 finished

* ko/chapter0 finished

* ko/chapter0 finished

* reviewed by @bzantium ko/chapter0

* reviewed by @bzantium chapter0 & fixed typo

* reviewed by @rainmaker712

* maximize Korean expressions

* [Chapter 1] Bangla translation initial commit

* Update 1.mdx

update and fix formatting

* Fix formatting and typos

* translate _toctree.yml 0-1 chapter

* Add Korean to CI

* [tr] Translated chapter1/2.mdx

* remove translation from sec titles not yet translated

* Add authors [th ru]

* [FIX] _toctree.yml

* Update chapters/bn/chapter0/1.mdx

[FIX] syntax formatting

Co-authored-by: lewtun <[email protected]>

* tag typos & indentation & unnatural expressions

* modified toctree.yml for chapter1/2

* modified toctree.yml for chapter1/2 & fix typo

* French Translation - Chapter 5

* Add Bengali to CI

* Update author list

* Adding translations for 2/4 and 2/5 🚀 (#74)

* Adding translations for 2/4 and 2/5 🚀

* Remove English content

Co-authored-by: lewtun <[email protected]>

* Translation to Russian (#97)

* translation of chapter 2/section 1

* add section 1 / chapter 2 to _toctree.yml

* Translation of Chapter0 to Hindi (#86)

* Hindi/Chapter0-Part_1

* Hindi/Chapter0-Part_2

* Chapter 0 Persian Translation First Draft (#95)

* merged branch0 into main. no toctree yet.

* updated toctree.

* Updated the glossary with terms from chapter0.

* Second draft in collab w/ @schoobani. Added empty chapter1 for preview.

* Glossary typo fix.

* Translation of Chapter0 (setup) to Arabic (#104)

* Add AR translation for `introduction`

* Fix alignment & format

* Add Arabic to CI build

* Russian - Chapter 1 finished (#98)

* 01/4 start

* 1/4 finished

* 1/5 finished

* 1/5 update toc

* 1/6 finished

* 1/7 finished

* 1/8 finished

* 1/9 finished

* 1/4 fix

* toc update

* Chinese - Chapter 1 finished (#113)

* Chinese - Chapter 1 finished

* Add zh to the languages field

Co-authored-by: petrichor1122 <[email protected]>
Co-authored-by: zhlhyx <[email protected]>

* [PT] Translation of chapter 2 (#107)

* add PT translate to 2.1

* add PT translate to 2.2

* add portuguese translation to 2.3

* WIP portuguese translation to 2.4

* add portuguese translation to 2.4

* add portuguese translation to 2.5

* add portuguese translation to 2.6

* add _toctree infos

Co-authored-by: lewtun <[email protected]>

* [FR] Translation of chapter 2 & event + Review of chapters 0 & 5 (#106)

* Update _toctree.yml

Add chapter 2 
+ little fix of chapter 5

* Update 1.mdx

Review of chapter 0

* Create 1.mdx

* Create 2.mdx

* Create 3.mdx

* Create 4.mdx

* Create 5.mdx

* Create 6.mdx

* Create 7.mdx

* Create 8.mdx

* Update 8.mdx

Since AutoNLP has recently been renamed to AutoTrain, let me make the correction on the English file

* Update 1.mdx

Review of chapter 5/1

* Update 2.mdx

Review of chapter 5/2

* Update 3.mdx

Review of chapter 5/3

* Update 4.mdx

Review of chapter 5/4

* Update 5.mdx

Review of chapter 5/5

* Update 6.mdx

Review of chapter 5/6

* Update 7.mdx

Review of chapter 5/7

* Update 8.mdx

Review of chapter 5/8

* Create 1.mdx

event's translation

* Update _toctree.yml

add event to the tree

* Update _toctree.yml

Delete the files that prevent the checks from passing; they will be resubmitted in another PR

* Delete 1.mdx

Delete the files that prevent the checks from passing; they will be resubmitted in another PR

* make style correction

* Update _toctree.yml

the -

* Fix spacing

Co-authored-by: Lewis Tunstall <[email protected]>

* [th] Translated Chapter2/1 (#83)

* Finish chapter2/1

* Update _toctree.yml

* Add Hindi to CI (#116)

* Update README.md (#87)

* Update authors on README (#120)

* Update authors

* French translation `Chapter1` full (#56)


* translation of 1st part of chapter1

* fix typo

* fix job titles and encoder-decoder translation

* add part 2 for 1st chapter

* fix some typo part2

* fix Transformer -> Transformers

* add part 3 not totally ended

* end of part3 of chapter1

* part9 chapter 1

* add part7 chapter 1

* add part5 chapter 1

* part 6 chapter 1

* add part8 chapter 1

* end quiz of chapter

* add last part of chapter 1

Co-authored-by: ChainYo <[email protected]>

* Translate to Japanese Chapter0 (#123)

* start working

* translate chapter0/1.mdx

* [FA] First draft of Chapter2/Page2 (#129)

* merged branch0 into main. no toctree yet.

* updated toctree.

* Updated the glossary with terms from chapter0.

* Second draft in collab w/ @schoobani. Added empty chapter1 for preview.

* Glossary typo fix.

* Added missing backticks.

* Removed a couple of bad indefinite articles I forgot.

* First draft of ch2/p2. Adds to glossary. Trans. guidelines moved out.

* Fixed missing diacritics, fixed the py/tf switch placing. Other fixes.

* Changed the equivalent for prediction per @kambizG 's direction.

* Redid ambiguous passage in translation per @lewtun 's direction.

* [th] Finished whole Chapter 2 translation (#127)

* Finish chapter2/1

* delete untranslated files

* chapter2/2 WIP

* Delete WIP files

* WIP chapter2/2

* Fixed conflict

* Update _toctree.yml

* Update _toctree.yml

* Finished Chapter2/2

* Finish all chapter2/n

* Finish all chapter2/n

* Fixed Ch2/8 as PR run failed

* [de] Translation Chapter 0 (#130)

* Copy files to newly created german dir (de)

* Add translation for chapter 0

* Clean up english files for chapter 1

* Change _toctree.yml for chapter 0

* Fix whitespaces

* Fix whitespaces again

* Adjust _toctree.yml - leave only chapter 0

* Add German language (de) to workflow yaml files

* [de] German Translation Guide (#132)

* German Translation Guide

* Add German Glossary to TOC

* Chapter 1, Section 1 Bengali translation (#124)

* [ADD] Chapter 1, Section 1 Bengali translation

* [FIX] toc

* [FIX] commit mistakes

* [FIX] remove the Eng duplicates

Co-authored-by: m_khandaker <[email protected]>

* [FR] Review of chapters 0, 2 & 5 + add chapters 6, 7, 8 & event (#125)

* Create 1.mdx

Event translation

* Create 1.mdx

* Chapter 6 in French

* Update 1.mdx

fix italic

* Update 9.mdx

fix italic

* Update 3.mdx

fix italic

* Update 4.mdx

fix italic

* Update 4.mdx

* Update 1.mdx

little fix

* Update 2.mdx

little fix

* Update 4.mdx

fix italic

* Update 8.mdx

fix italic

* Update 1.mdx

little fix

* Update 2.mdx

little fix

* Update 3.mdx

little fix

* Update 5.mdx

little fix

* Update 7.mdx

little fix

* Update 8.mdx

little fix

* add chapter8

* Update 6.mdx

fix italic

* Update 3.mdx

fix, fix everywhere

* Update 2.mdx

fix, fix everywhere

* Update 4.mdx

fix, fix everywhere

* Update 4_tf.mdx

fix, fix everywhere

* Add files via upload

add chapter 7

* Update 1.mdx

fix links

* Update 2.mdx

fix, fix everywhere

* Update 3.mdx

fix, fix everywhere

* Update 4.mdx

fix, fix everywhere

* Update 5.mdx

* Update 6.mdx

fix, fix everywhere

* Update 7.mdx

fix, fix everywhere

* Update 3.mdx

fix link

* Update 8.mdx

fix link

* Update 2.mdx

fix link

* Update 4.mdx

little fix

* Update 6.mdx

* Update 7.mdx

* Update 8.mdx

fix

* Update 2.mdx

little fix

* Update 3.mdx

little fix

* Update 5.mdx

* Update 4_tf.mdx

little fix

* Update _toctree.yml

Forgot the toctree

* Update _toctree.yml

fix local fields

* Update _toctree.yml

My bad, I forgot some 🙃

* Update 7.mdx

I don't know why it was there...

* Update 1.mdx

* [de] Chapter 3 translation (#128)

* chapter 3 part 1 DE

* [DE] Chapter 3 - Part 2

* Prepare TOC-Tree

* Fein-tuning

* Initial translation

* Glossary additions for C3P3

* C3P2 style

* [de] Chapter 3 P3-TF initial translation

* [de] Chapter 3 P4 initial translation

* [de] Chapter 3 Part 5 initial translation

* [de] Chapter 3 P6 Initial translation

* Missing commas

* fixing quotes

* Mark third option on chapter 8, question 8 as correct (#135)

* doc_change(translation): translating course from english to gujarati (#126)

* change(translation): chapter0 to gujarati

content translated: Chapter0/1.mdx - Introduction

commit-by: [email protected]

* Revert "change(translation): chapter0 to gujarati"

This reverts commit c27e06992af8892687f343a19368ce322d69e8b2.

* doc_change(translation): translation to gj

translated content: chapters/gj/chapter0.mdx - introduction

* doc_change(translation): translation to gj

translated content: chapters/gj/chapter0.mdx - introduction

* Delete _toctree.yml

* change: adding gj to github workflow

* nit: fix heading

* Update authors (#136)

* [FA] First draft of Chapter4/Page1 (#134)

* added chapter4 title and its first section

* added first draft of Chapter4/Page1

* minor fix

* updated the title according to convention

* applied some updates according to convention

* added footnotes, minor improvements

* applied tweaks according to review points

* the new draft of glossary according to PR #134

* fixed an inconsistent title

* minor fix for better compatibility with T points

* applied final touches for this round of G updates

* [FR] End of chapter 3 + chapter 4  (#137)

* add chapters 3 & 4

* Update 2.mdx

fix links

* Update 3.mdx

some fix

* Update 6.mdx

fix tag

* Update 3.mdx

add link to chapter 7

* Update 3_tf.mdx

add link to chapter 7

* Update _toctree.yml

Co-authored-by: DOOHAE JUNG <[email protected]>
Co-authored-by: m_khandaker <[email protected]>
Co-authored-by: Md. Al-Amin Khandaker <[email protected]>
Co-authored-by: ftarlaci <[email protected]>
Co-authored-by: Doohae Jung <[email protected]>
Co-authored-by: melaniedrevet <[email protected]>
Co-authored-by: Jose M Munoz <[email protected]>
Co-authored-by: svv73 <[email protected]>
Co-authored-by: Vedant Pandya <[email protected]>
Co-authored-by: Bahram Shamshiri <[email protected]>
Co-authored-by: Giyaseddin Bayrak <[email protected]>
Co-authored-by: Pavel <[email protected]>
Co-authored-by: 1375626371 <[email protected]>
Co-authored-by: petrichor1122 <[email protected]>
Co-authored-by: zhlhyx <[email protected]>
Co-authored-by: João Gustavo A. Amorim <[email protected]>
Co-authored-by: lbourdois <[email protected]>
Co-authored-by: Cherdsak Kingkan <[email protected]>
Co-authored-by: Thomas Chaigneau <[email protected]>
Co-authored-by: ChainYo <[email protected]>
Co-authored-by: hiromu <[email protected]>
Co-authored-by: Cherdsak Kingkan <[email protected]>
Co-authored-by: Marcus Fraaß <[email protected]>
Co-authored-by: Jesper Dramsch <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: Ash <[email protected]>
Co-authored-by: Hamed Homaei Rad <[email protected]>

* Bump release (#147)

* Bump release (#161)

* Fix typos in chapter 9 (#176) (#180)

Co-authored-by: regisss <[email protected]>

* Bump release (#187)

* Chapter 2 Section 1 Bengali Translation (huggingface#72) (#168)

* [TH] Chapter 6 Section 1 and 2 (#171)

Co-authored-by: Suteera <[email protected]>

* [FA] CH1 / P1-2 (#142)

* Spanish Chapter 3: sections 1 & 2 (#162)

* fix typos in bpe, wordpiece, unigram (#166)

* [FR] French Review (#186)

* Part 7: Training a causal... fixes (#179)

* typo & error mitigation

* consistency

* Trainer.predict() returns 3 fields

* ran make style

* [TR] Translated Chapter 1.6 🤗 (#185)

* added chapter 1/6 to _toctree.yml

* [TR] Translated Chapter 1.6 🤗

Co-authored-by: Avishek Das <[email protected]>
Co-authored-by: Suteera  Seeha <[email protected]>
Co-authored-by: Suteera <[email protected]>
Co-authored-by: Saeed Choobani <[email protected]>
Co-authored-by: Fermin Ordaz <[email protected]>
Co-authored-by: Kerem Turgutlu <[email protected]>
Co-authored-by: lbourdois <[email protected]>
Co-authored-by: Sebastian Sosa <[email protected]>
Co-authored-by: tanersekmen <[email protected]>

* Bump release 10 (#194)

* Bump release (#195)

* Bump release 12 (#209)

* Bump release (#224)

* Bump release (#229)

* Bump release (#236)

* Bump release (#258)

* Bump release (#270)

* Bump release (#274)

* Bump release (#286)

* Bump release (#288)

* Chapter 2 Section 1 Bengali Translation (huggingface#72) (#168)

* [TH] Chapter 6 Section 1 and 2 (#171)

Co-authored-by: Suteera <[email protected]>

* [FA] CH1 / P1-2 (#142)

* Spanish Chapter 3: sections 1 & 2 (#162)

* fix typos in bpe, wordpiece, unigram (#166)

* [FR] French Review (#186)

* Part 7: Training a causal... fixes (#179)

* typo & error mitigation

* consistency

* Trainer.predict() returns 3 fields

* ran make style

* [TR] Translated Chapter 1.6 🤗 (#185)

* added chapter 1/6 to _toctree.yml

* [TR] Translated Chapter 1.6 🤗

* [PT][Chapter 01 - 2.mdx] - issue #51 (#170)

* Fix Gradio ToC (#193)

* Add Gradio authors and Blocks event (#189)

* Update 6.mdx (#188)

Correct link to Transformer XL doc

* Add translating notes and glossary to Spanish (#192)

* Add translating notes and glossary to Spanish

* Adding glossary to the toc

* add pt 4.3 (#191)

* [FR] Visual corrections (#190)

* [PT] add chapter 4.4 and 4.5 (#196)

* fix invite discord link (#197)

* [FA] Second draft of CH2/P1-2 (#139)

* added chapter3 in hindi (#198)

* [TR] Chapter 3/1 (#165)

* [RU] Ch3-1/2/3 (#200)

* [PT] add 5.1 and 5.2 (#204)

* [FA] - Ch3 - P1 and P2 (#199)

* [PT] add `end-of-chapter quiz` for chapter 4 (4.6) (#201)


Co-authored-by: lewtun <[email protected]>

* Chapter1: 2.mdx Translated. (#206)

* Remove comments from Persian ToC (#210)

* Fix CI URL for PRs (#211)

* code fragment & english syntax and meaning (#203)

* Updated Ch1/1 with Emoji (#214)

* Add missing numpy import (#217)

* [ES] translate sections 8.1 and 8.2 (#215)

* Fix path to datasets (#216)

* [PT] add 5.3 (#218)

* fix 4.3 (#223)

* Fix notebook generation (#227)

* Add Gradio nb links

* add 5.4 (#226)

* add pt wip (#225)

* Added Gujarati List. (#221)

* Add Gradio nbs links to fr (#228)

* Chinese - Chapter 3finished (#219)

* add ch7 at _toctree and translate 7.1 (#222)

* add 5.5 (#235)

* [FR] Review of chapter 7 (#233)

* Italian translation - chapter 4 (#230)

* Added Thai translation of chapter 3 (#231)

* [Ru] Add part 2, chapter 2 (#234)

* Update 8.mdx (#237)

- Remove Gradio Blocks Party
- Add, Where to next? section

* Created HI/Chapter1/5.mdx (#232)

* Add Spanish chapter3/section4, update toc and glossary (#238)

* [RU] Chapter 3 finished (#239)

* [PT] add 5.6 and 5.7 (#240)

* [EN] Visual corrections (#245)

* Translation for 1/4, 1/5 and 1/6. (#247)

* add event in PT (#250)

* Pin version of black (#252)

* Translate ja event (#241)

* [PT] add quiz chapter 5 (#243)

* Update 5.mdx (#253)

inconsistent naming with line 327

* Translation for Traditional Chinese (zh-tw) chapter0  (#251)


Co-authored-by: Lewis Tunstall <[email protected]>

* Translated the whole Chapter 3 to Thai  (#255)

* Japanese chapter 4 (#244)

* Translation of 1/7, 1/8, and 1/9. (#263)

* [PT] add chapter  8.1 and 8.2 (#265)

* [RU] Chapter 4  (#269)

* Add Thai translation for chapter 6.3b to 6.10 (#268)

* add 8.3 (#266)

* 3.mdx of chapter 01 (#260)

Co-authored-by: Lewis Tunstall <[email protected]>

* Fix typo (#271)

* [PT] add chapter 6.1 (#273)

* add Japanese chapter7 (#267)

* replace `load_metric` with `evaluate.load` (#285)

* update `load_metric` refs to `evaluate.load`

Co-authored-by: lewtun <[email protected]>

* [GJ] Translation to Gujarati - Ch0 Setup (#287)

* [PT] add chapter 6.2 and 6.3 (#279)

* zh-CN - Chapter 4,5finished (#281)

Co-authored-by: Lewis Tunstall <[email protected]>

* Chapter 01 - Done [PT] #51 (#280)

Co-authored-by: Lewis Tunstall <[email protected]>

Co-authored-by: Avishek Das <[email protected]>
Co-authored-by: Suteera  Seeha <[email protected]>
Co-authored-by: Suteera <[email protected]>
Co-authored-by: Saeed Choobani <[email protected]>
Co-authored-by: Fermin Ordaz <[email protected]>
Co-authored-by: Kerem Turgutlu <[email protected]>
Co-authored-by: lbourdois <[email protected]>
Co-authored-by: Sebastian Sosa <[email protected]>
Co-authored-by: tanersekmen <[email protected]>
Co-authored-by: Victor Costa <[email protected]>
Co-authored-by: Camille Couturier <[email protected]>
Co-authored-by: João Gustavo A. Amorim <[email protected]>
Co-authored-by: Bahram Shamshiri <[email protected]>
Co-authored-by: Kavya <[email protected]>
Co-authored-by: Batuhan Ayhan <[email protected]>
Co-authored-by: Pavel <[email protected]>
Co-authored-by: Kambiz Ghoorchian <[email protected]>
Co-authored-by: Vedant Pandya <[email protected]>
Co-authored-by: Diego Vargas <[email protected]>
Co-authored-by: Thomas O'Brien <[email protected]>
Co-authored-by: Lincoln V Schreiber <[email protected]>
Co-authored-by: 1375626371 <[email protected]>
Co-authored-by: Giorgio Severi <[email protected]>
Co-authored-by: svv73 <[email protected]>
Co-authored-by: Ömer Faruk Özdemir <[email protected]>
Co-authored-by: Caterina Bonan <[email protected]>
Co-authored-by: Hiromu Hota <[email protected]>
Co-authored-by: trtd56 <[email protected]>
Co-authored-by: Mehrdad Nezamdoost <[email protected]>
Co-authored-by: Wolvz <[email protected]>
Co-authored-by: a-krirk <[email protected]>
Co-authored-by: atgctg <[email protected]>
Co-authored-by: Thiago Medeiros <[email protected]>
Co-authored-by: webbigdata-jp <[email protected]>
Co-authored-by: Leandro von Werra <[email protected]>
Co-authored-by: Bhadresh Savani <[email protected]>

* Bump release (#295)

* Bump release (#296)

* Bump release (#299)

* Bump release (#305)

* Chinese - Chapter 1 finished

* Add zh to the languages field

Add zh to the languages field in the build_documentation.yml and build_pr_documentation.yml files

* Remove untranslated chapters in _toctree.yml

Remove all these sections that haven't been translated yet
Remove Chapter 0 from the table of contents since it hasn't been translated yet

* Fixed an error in the translation format

Fixed an error in the translation format of Chapter 1, Section 3

* Added a small part of the missing content

* Fix style

* Complete the translation of Chapters 0 and 2

* Fixed some bugs

- Fixed some formatting errors
- Moved Chapters 0 and 2 to Simplified Chinese

* Add files via upload

Formatting revisions and some translation corrections

* run make style to format chapter1 section3

* run make style to format code

* run make style to format code

* Fix style

* Chapter 2 Section 1 Bengali Translation (huggingface#72) (#168)

* [TH] Chapter 6 Section 1 and 2 (#171)

Co-authored-by: Suteera <[email protected]>

* [FA] CH1 / P1-2 (#142)

* Spanish Chapter 3: sections 1 & 2 (#162)

* fix typos in bpe, wordpiece, unigram (#166)

* [FR] French Review (#186)

* Part 7: Training a causal... fixes (#179)

* typo & error mitigation

* consistency

* Trainer.predict() returns 3 fields

* ran make style

* [TR] Translated Chapter 1.6 🤗 (#185)

* added chapter 1/6 to _toctree.yml

* [TR] Translated Chapter 1.6 🤗

* [PT][Chapter 01 - 2.mdx] - issue #51 (#170)

* Fix Gradio ToC (#193)

* Add Gradio authors and Blocks event (#189)

* Update 6.mdx (#188)

Correct link to Transformer XL doc

* Add translating notes and glossary to Spanish (#192)

* Add translating notes and glossary to Spanish

* Adding glossary to the toc

* add pt 4.3 (#191)

* [FR] Visual corrections (#190)

* [PT] add chapter 4.4 and 4.5 (#196)

* fix invite discord link (#197)

* [FA] Second draft of CH2/P1-2 (#139)

* added chapter3 in hindi (#198)

* [TR] Chapter 3/1 (#165)

* [RU] Ch3-1/2/3 (#200)

* [PT] add 5.1 and 5.2 (#204)

* Add placeholders for audio chapters (#208)

* [FA] - Ch3 - P1 and P2 (#199)

* [PT] add `end-of-chapter quiz` for chapter 4 (4.6) (#201)


Co-authored-by: lewtun <[email protected]>

* Chapter1: 2.mdx Translated. (#206)

* Remove comments from Persian ToC (#210)

* Fix CI URL for PRs (#211)

* code fragment & english syntax and meaning (#203)

* Updated Ch1/1 with Emoji (#214)

* Add missing numpy import (#217)

* Update chapter3

* Code format for chapter3

* Update yml file of chapter3

* Update yml file of chapter3

* Fix yml file bug

* [ES] translate sections 8.1 and 8.2 (#215)

* Fix path to datasets (#216)

* [PT] add 5.3 (#218)

* fix 4.3 (#223)

* Run make style

* Fix notebook generation (#227)

* Add Gradio nb links

* add 5.4 (#226)

* add pt wip (#225)

* Added Gujarati List. (#221)

* Fix quality

* Add Gradio nbs links to fr (#228)

* Fix ToC tree

* Remove audio templates

* Fix fr section

* Fix fr chapter

* Chinese - Chapter 3finished (#219)

* add ch7 at _toctree and translate 7.1 (#222)

* add 5.5 (#235)

* [FR] Review of chapter 7 (#233)

* Italian translation - chapter 4 (#230)

* Added Thai translation of chapter 3 (#231)

* [Ru] Add part 2, chapter 2 (#234)

* Update 8.mdx (#237)

- Remove Gradio Blocks Party
- Add, Where to next? section

* Created HI/Chapter1/5.mdx (#232)

* Add Spanish chapter3/section4, update toc and glossary (#238)

* [RU] Chapter 3 finished (#239)

* [PT] add 5.6 and 5.7 (#240)

* [EN] Visual corrections (#245)

* Translation for 1/4, 1/5 and 1/6. (#247)

* add event in PT (#250)

* Pin version of black (#252)

* Translate ja event (#241)

* [PT] add quiz chapter 5 (#243)

* Update 5.mdx (#253)

inconsistent naming with line 327

* Translation for Traditional Chinese (zh-tw) chapter0  (#251)


Co-authored-by: Lewis Tunstall <[email protected]>

* Translated the whole Chapter 3 to Thai  (#255)

* Japanese chapter 4 (#244)

* Translation of 1/7, 1/8, and 1/9. (#263)

* [PT] add chapter  8.1 and 8.2 (#265)

* [RU] Chapter 4  (#269)

* Add Thai translation for chapter 6.3b to 6.10 (#268)

* add 8.3 (#266)

* 3.mdx of chapter 01 (#260)

Co-authored-by: Lewis Tunstall <[email protected]>

* Fix typo (#271)

* [PT] add chapter 6.1 (#273)

* add Japanese chapter7 (#267)

* zh-CN - Chapter 4,5finished

* replace `load_metric` with `evaluate.load` (#285)

* update `load_metric` refs to `evaluate.load`

Co-authored-by: lewtun <[email protected]>

* [GJ] Translation to Gujarati - Ch0 Setup (#287)

* [PT] add chapter 6.2 and 6.3 (#279)

* Fix formatting

* Debug formatting

* Debug FR formatting

* zh-CN - Chapter 4,5finished (#281)

Co-authored-by: Lewis Tunstall <[email protected]>

* Chapter 01 - Done [PT] #51 (#280)

Co-authored-by: Lewis Tunstall <[email protected]>

* tf_default_data_collator seems to have moved

* zh-CN - Chapter 6finished

* Revert "Merge branch 'huggingface:main' into main"

This reverts commit aebb46e12f9f87a4303f8bb4f0f2cf545eb83b21, reversing
changes made to 69187a3789e8d3d2d0de821ebe495f111d1cc73d.

* Revert "zh-CN - Chapter 6finished"

This reverts commit e69fce28d3a7b35b76c4f768a6cedf295b37d8c9.

* zh-CN - Chapter 6finished

* fix style

* undo bad commit

* Chapter5it (#278)

* added the Italian translation for unit 1 chapter5

Co-authored-by: Leandro von Werra <[email protected]>

* Vietnamese translation (#293)

* Update .github/workflows/build_pr_documentation.yml

Co-authored-by: lewtun <[email protected]>

* Translate JP chapter 8 (#249)

* Italian translation - Chapter 8 (#272)

* Translation to Vietnamese - chapter 5 (#297)

* Add course contributors (#298)

* Add CourseFloatingBanner component

* DocNotebookDropdown -> CourseFloatingBanner

* Italian translation Ch 2/1, 2/2 (#300)

* Add contributors (#304)

* Add forum button (#306)

Co-authored-by: 1375626371 <[email protected]>
Co-authored-by: 1375626371 <[email protected]>
Co-authored-by: Avishek Das <[email protected]>
Co-authored-by: Suteera  Seeha <[email protected]>
Co-authored-by: Suteera <[email protected]>
Co-authored-by: Saeed Choobani <[email protected]>
Co-authored-by: Fermin Ordaz <[email protected]>
Co-authored-by: Kerem Turgutlu <[email protected]>
Co-authored-by: lbourdois <[email protected]>
Co-authored-by: Sebastian Sosa <[email protected]>
Co-authored-by: tanersekmen <[email protected]>
Co-authored-by: Victor Costa <[email protected]>
Co-authored-by: Camille Couturier <[email protected]>
Co-authored-by: João Gustavo A. Amorim <[email protected]>
Co-authored-by: Bahram Shamshiri <[email protected]>
Co-authored-by: Kavya <[email protected]>
Co-authored-by: Batuhan Ayhan <[email protected]>
Co-authored-by: Pavel <[email protected]>
Co-authored-by: Kambiz Ghoorchian <[email protected]>
Co-authored-by: Vedant Pandya <[email protected]>
Co-authored-by: Diego Vargas <[email protected]>
Co-authored-by: Thomas O'Brien <[email protected]>
Co-authored-by: Lincoln V Schreiber <[email protected]>
Co-authored-by: Giorgio Severi <[email protected]>
Co-authored-by: svv73 <[email protected]>
Co-authored-by: Ömer Faruk Özdemir <[email protected]>
Co-authored-by: Caterina Bonan <[email protected]>
Co-authored-by: Hiromu Hota <[email protected]>
Co-authored-by: trtd56 <[email protected]>
Co-authored-by: Mehrdad Nezamdoost <[email protected]>
Co-authored-by: Wolvz <[email protected]>
Co-authored-by: a-krirk <[email protected]>
Co-authored-by: atgctg <[email protected]>
Co-authored-by: Thiago Medeiros <[email protected]>
Co-authored-by: webbigdata-jp <[email protected]>
Co-authored-by: Leandro von Werra <[email protected]>
Co-authored-by: Bhadresh Savani <[email protected]>
Co-authored-by: Andreas Ehrencrona <[email protected]>
Co-authored-by: leandro <[email protected]>
Co-authored-by: Matt <[email protected]>
Co-authored-by: Nolanogenn <[email protected]>
Co-authored-by: Hồng Hạnh <[email protected]>
Co-authored-by: Younes Belkada <[email protected]>
Co-authored-by: Edoardo Abati <[email protected]>
Co-authored-by: Mishig Davaadorj <[email protected]>
Co-authored-by: Acciaro Gennaro Daniele <[email protected]>

* Bump release (#307)

* Bump release (#308)

* Bump release (#314)

* Bump release (#320)

* Bump release (#328)

* Bump release (#333)

* Bump release (#335)

* Bump release (#343)

* Bump release (#355)

* Bump release (#358)

* Bump release (#371)

* Bump release (#381)

* Bump release (#387)

* Bump release (#404)

* Bump release (#413)

* Bump release (#426)

* Bump release (#463)

---------

Co-authored-by: DOOHAE JUNG <[email protected]>
Co-authored-by: m_khandaker <[email protected]>
Co-authored-by: Md. Al-Amin Khandaker <[email protected]>
Co-authored-by: ftarlaci <[email protected]>
Co-authored-by: Doohae Jung <[email protected]>
Co-authored-by: melaniedrevet <[email protected]>
Co-authored-by: Jose M Munoz <[email protected]>
Co-authored-by: svv73 <[email protected]>
Co-authored-by: Vedant Pandya <[email protected]>
Co-authored-by: Bahram Shamshiri <[email protected]>
Co-authored-by: Giyaseddin Bayrak <[email protected]>
Co-authored-by: Pavel <[email protected]>
Co-authored-by: 1375626371 <[email protected]>
Co-authored-by: petrichor1122 <[email protected]>
Co-authored-by: zhlhyx <[email protected]>
Co-authored-by: João Gustavo A. Amorim <[email protected]>
Co-authored-by: lbourdois <[email protected]>
Co-authored-by: Cherdsak Kingkan <[email protected]>
Co-authored-by: Thomas Chaigneau <[email protected]>
Co-authored-by: ChainYo <[email protected]>
Co-authored-by: hiromu <[email protected]>
Co-authored-by: Cherdsak Kingkan <[email protected]>
Co-authored-by: Marcus Fraaß <[email protected]>
Co-authored-by: Jesper Dramsch <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: Ash <[email protected]>
Co-authored-by: Hamed Homaei Rad <[email protected]>
Co-authored-by: Dawood Khan <[email protected]>
Co-authored-by: regisss <[email protected]>
Co-authored-by: Avishek Das <[email protected]>
Co-authored-by: Suteera  Seeha <[email protected]>
Co-authored-by: Suteera <[email protected]>
Co-authored-by: Saeed Choobani <[email protected]>
Co-authored-by: Fermin Ordaz <[email protected]>
Co-authored-by: Kerem Turgutlu <[email protected]>
Co-authored-by: Sebastian Sosa <[email protected]>
Co-authored-by: tanersekmen <[email protected]>
Co-authored-by: Victor Costa <[email protected]>
Co-authored-by: Camille Couturier <[email protected]>
Co-authored-by: Kavya <[email protected]>
Co-authored-by: Batuhan Ayhan <[email protected]>
Co-authored-by: Kambiz Ghoorchian <[email protected]>
Co-authored-by: Diego Vargas <[email protected]>
Co-authored-by: Thomas O'Brien <[email protected]>
Co-authored-by: Lincoln V Schreiber <[email protected]>
Co-authored-by: Giorgio Severi <[email protected]>
Co-authored-by: Ömer Faruk Özdemir <[email protected]>
Co-authored-by: Caterina Bonan <[email protected]>
Co-authored-by: Hiromu Hota <[email protected]>
Co-authored-by: trtd56 <[email protected]>
Co-authored-by: Mehrdad Nezamdoost <[email protected]>
Co-authored-by: Wolvz <[email protected]>
Co-authored-by: a-krirk <[email protected]>
Co-authored-by: atgctg <[email protected]>
Co-authored-by: Thiago Medeiros <[email protected]>
Co-authored-by: webbigdata-jp <[email protected]>
Co-authored-by: Leandro von Werra <[email protected]>
Co-authored-by: Bhadresh Savani <[email protected]>
Co-authored-by: 1375626371 <[email protected]>
Co-authored-by: Andreas Ehrencrona <[email protected]>
Co-authored-by: leandro <[email protected]>
Co-authored-by: Matt <[email protected]>
Co-authored-by: Nolanogenn <[email protected]>
Co-authored-by: Hồng Hạnh <[email protected]>
Co-authored-by: Younes Belkada <[email protected]>
Co-authored-by: Edoardo Abati <[email protected]>
Co-authored-by: Mishig Davaadorj <[email protected]>
Co-authored-by: Acciaro Gennaro Daniele <[email protected]>

* Revert "Bump release (#566)" (#567)

This reverts commit cccc2c91ac8e702e5e14bbb0419dbf0490c7aaaf.

* updated documentation links

* [doc build] Use secrets (#581)

* docs: fix broken links

* changed 'perspires' to 'persists' in chapter 1 quiz

solves issue #585

* Update 4.mdx

You forgot to write a return for this function.

* Update 4.mdx : Fix Typo

Should be "course"

* fix link

* Update 2.mdx

updated loading datasets link

* Update 2.mdx

updated loading datasets link

* Update 2.mdx

updated loading datasets link

* Update 2.mdx

updated loading datasets link

* Update 2.mdx

updated loading datasets link

* Update 2.mdx

updated loading datasets link

* Update 2.mdx

updated loading datasets link

* Update 2.mdx

updated loading datasets link

* Update 2.mdx

updated loading datasets link

* Update 2.mdx

updated loading datasets link

* Update 2.mdx

updated loading datasets link

* Update 2.mdx

updated loading datasets link

* Fix syntax in vi/chapter7/7.mdx

There was an unnecessary `</Tip>`

* Remove `get_lr()`, which refers to a nonexistent function, from logs

`get_lr()` is called as part of this function, but it is not declared anywhere in the script. This change removes that portion of the code, since it is unnecessary.

* Update 4.mdx

removed judgmental argument

* Update en-version

* fix: remove useless token

* fix: remove useless token (#635)

* Translate Chapter 3 to Spanish (#510)

* translate Chapter 3 to Spanish

* translate code comments to Spanish and fix typos

* Translating Chapter 6 to Spanish (#523)

* Translating sections 1 and 2 to Spanish

* Translating section 3 to Spanish

* Translating section 3b to Spanish

* Translating section 4 to Spanish

* Translating section 5 to Spanish

* Translating section 6 to Spanish

* Translating section 7 to Spanish

* Translating section 8 to Spanish

* Translating section 9 to Spanish

* Translating section 10 to Spanish

* Adding Sections to _toctree.yml

* Fixing Typos after second review

---------

Co-authored-by: datacubeR <[email protected]>

* Update 5.mdx

Adjusted the translation of "encoders": they are "codificadores", not "decodificadores". Decoders are "decodificadores".

* Update doc CI (#643)

* Committing the current results.

* Committing the current state.

* Fixing today's translation results.

* Translated files 3b and partially 4. Fixing the result.

* Fixing today's translation.

* fix typos in Spanish translation (#511)

* Fixing today's translation. Files: 6.mdx, 7.mdx and half of 8.mdx.

* The translation of chapter 6 has been completed.

* Delete chapters/en/.ipynb_checkpoints/_toctree-checkpoint.yml

This is a backup created by JupyterLab.

* Delete chapters/en/chapter5/.ipynb_checkpoints/8-checkpoint.mdx

This is a backup created by JupyterLab.

* Delete chapters/en/chapter6/.ipynb_checkpoints/1-checkpoint.mdx

This is a backup created by JupyterLab.

* Delete chapters/en/chapter6/.ipynb_checkpoints/2-checkpoint.mdx

This is a backup created by JupyterLab.

* Delete chapters/en/chapter6/.ipynb_checkpoints/8-checkpoint.mdx

This is a backup created by JupyterLab.

* Delete chapters/en/chapter6/.ipynb_checkpoints/9-checkpoint.mdx

This is a backup created by JupyterLab.

* Delete chapters/ru/.ipynb_checkpoints/TRANSLATING-checkpoint.txt

This is a backup created by JupyterLab.

* Delete chapters/ru/.ipynb_checkpoints/_toctree-checkpoint.yml

This is a backup created by JupyterLab.

* Delete chapters/ru/chapter5/.ipynb_checkpoints/8-checkpoint.mdx

This is a backup created by JupyterLab.

* Update 10.mdx

Minor fix.

* Update 10.mdx

Trying to solve the markup problem.

* Update 10.mdx

Correcting the syntax of some markup again)

* Update chapters/ru/chapter6/4.mdx

Yes, that space is redundant here. You're right about that.

Co-authored-by: Maria Khalusova <[email protected]>

* Update chapters/ru/chapter6/4.mdx

Extra space. I overlooked it. My mistake.

Co-authored-by: Maria Khalusova <[email protected]>

* Update chapters/ru/chapter6/3.mdx

There's an extra space here. You're right.

Co-authored-by: Maria Khalusova <[email protected]>

* Update chapters/ru/chapter6/3.mdx

There's an extra space here. You're right.

Co-authored-by: Maria Khalusova <[email protected]>

* Update chapters/ru/chapter6/3b.mdx

Yeah, there's no need for a space here.

Co-authored-by: Maria Khalusova <[email protected]>

* Update chapters/ru/chapter6/3.mdx

Co-authored-by: Maria Khalusova <[email protected]>

* Update 3.mdx

* Update 7.mdx

Translated the comments noted in the review.

* Update 3.mdx

Translated the missing comments in the code.

* Update chapters/ru/chapter6/3b.mdx

Yes, an extra space.

Co-authored-by: Maria Khalusova <[email protected]>

* Update chapters/ru/chapter6/5.mdx

Minor fix.

Co-authored-by: Maria Khalusova <[email protected]>

---------

Co-authored-by: Yuan <[email protected]>
Co-authored-by: lbourdois <[email protected]>
Co-authored-by: IL-GU KIM <[email protected]>
Co-authored-by: Kim Bo Geum <[email protected]>
Co-authored-by: Bartosz Szmelczynski <[email protected]>
Co-authored-by: Shawn Lee <[email protected]>
Co-authored-by: Naveen Reddy D <[email protected]>
Co-authored-by: rainmaker <[email protected]>
Co-authored-by: “Ryan” <“[email protected]”>
Co-authored-by: Wonhyeong Seo <[email protected]>
Co-authored-by: Meta Learner응용개발팀 류민호 <[email protected]>
Co-authored-by: Minho Ryu <[email protected]>
Co-authored-by: richardachen <[email protected]>
Co-authored-by: Luke Cheng <[email protected]>
Co-authored-by: beyondguo <[email protected]>
Co-authored-by: bsenst <[email protected]>
Co-authored-by: 1375626371 <[email protected]>
Co-authored-by: yaoqih <[email protected]>
Co-authored-by: 李洋 <[email protected]>
Co-authored-by: PowerChina <[email protected]>
Co-authored-by: chenglu99 <[email protected]>
Co-authored-by: iCell <[email protected]>
Co-authored-by: Tiezhen WANG <[email protected]>
Co-authored-by: Qi Zhang <[email protected]>
Co-authored-by: researcher <[email protected]>
Co-authored-by: simpleAI <[email protected]>
Co-authored-by: FYJNEVERFOLLOWS <[email protected]>
Co-authored-by: zhangchaosd <[email protected]>
Co-authored-by: TK Buristrakul <[email protected]>
Co-authored-by: Acciaro Gennaro Daniele <[email protected]>
Co-authored-by: Carlos Aguayo <[email protected]>
Co-authored-by: ateliershen <[email protected]>
Co-authored-by: Pavel Nesterov <[email protected]>
Co-authored-by: Artyom Boyko <[email protected]>
Co-authored-by: Kirill Milintsevich <[email protected]>
Co-authored-by: jybarnes21 <[email protected]>
Co-authored-by: gxy-gxy <[email protected]>
Co-authored-by: iLeGend <[email protected]>
Co-authored-by: sj <[email protected]>
Co-authored-by: Sureshkumar Thangavel <[email protected]>
Co-authored-by: Andrei Shirobokov <[email protected]>
Co-authored-by: Pranav <[email protected]>
Co-authored-by: Maria Khalusova <[email protected]>
Co-authored-by: DOOHAE JUNG <[email protected]>
Co-authored-by: m_khandaker <[email protected]>
Co-authored-by: Md. Al-Amin Khandaker <[email protected]>
Co-authored-by: ftarlaci <[email protected]>
Co-authored-by: Doohae Jung <[email protected]>
Co-authored-by: melaniedrevet <[email protected]>
Co-authored-by: Jose M Munoz <[email protected]>
Co-authored-by: svv73 <[email protected]>
Co-authored-by: Vedant Pandya <[email protected]>
Co-authored-by: Bahram Shamshiri <[email protected]>
Co-authored-by: Giyaseddin Bayrak <[email protected]>
Co-authored-by: Pavel <[email protected]>
Co-authored-by: 1375626371 <[email protected]>
Co-authored-by: petrichor1122 <[email protected]>
Co-authored-by: zhlhyx <[email protected]>
Co-authored-by: João Gustavo A. Amorim <[email protected]>
Co-authored-by: Cherdsak Kingkan <[email protected]>
Co-authored-by: Thomas Chaigneau <[email protected]>
Co-authored-by: ChainYo <[email protected]>
Co-authored-by: hiromu <[email protected]>
Co-authored-by: Cherdsak Kingkan <[email protected]>
Co-authored-by: Marcus Fraaß <[email protected]>
Co-authored-by: Jesper Dramsch <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: Ash <[email protected]>
Co-authored-by: Hamed Homaei Rad <[email protected]>
Co-authored-by: Dawood Khan <[email protected]>
Co-authored-by: regisss <[email protected]>
Co-authored-by: Avishek Das <[email protected]>
Co-authored-by: Suteera  Seeha <[email protected]>
Co-authored-by: Suteera <[email protected]>
Co-authored-by: Saeed Choobani <[email protected]>
Co-authored-by: Fermin Ordaz <[email protected]>
Co-authored-by: Kerem Turgutlu <[email protected]>
Co-authored-by: Sebastian Sosa <[email protected]>
Co-authored-by: tanersekmen <[email protected]>
Co-authored-by: Victor Costa <[email protected]>
Co-authored-by: Camille Couturier <[email protected]>
Co-authored-by: Kavya <[email protected]>
Co-authored-by: Batuhan Ayhan <[email protected]>
Co-authored-by: Kambiz Ghoorchian <[email protected]>
Co-authored-by: Diego Vargas <[email protected]>
Co-authored-by: Thomas O'Brien <[email protected]>
Co-authored-by: Lincoln V Schreiber <[email protected]>
Co-authored-by: Giorgio Severi <[email protected]>
Co-authored-by: Ömer Faruk Özdemir <[email protected]>
Co-authored-by: Caterina Bonan <[email protected]>
Co-authored-by: Hiromu Hota <[email protected]>
Co-authored-by: trtd56 <[email protected]>
Co-authored-by: Mehrdad Nezamdoost <[email protected]>
Co-authored-by: Wolvz <[email protected]>
Co-authored-by: a-krirk <[email protected]>
Co-authored-by: atgctg <[email protected]>
Co-authored-by: Thiago M…
Showing 85 changed files with 8,245 additions and 256 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build_documentation.yml
@@ -16,4 +16,4 @@ jobs:
additional_args: --not_python_module
languages: ar bn de en es fa fr gj he hi id it ja ko pt ru th tr vi zh-CN zh-TW
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
1 change: 0 additions & 1 deletion .github/workflows/build_pr_documentation.yml
@@ -17,4 +17,3 @@ jobs:
path_to_docs: course/chapters/
additional_args: --not_python_module
languages: ar bn de en es fa fr gj he hi id it ja ko pt ru th tr vi zh-CN zh-TW
hub_base_path: https://moon-ci-docs.huggingface.co
13 changes: 0 additions & 13 deletions .github/workflows/delete_doc_comment.yml

This file was deleted.

17 changes: 17 additions & 0 deletions .github/workflows/upload_pr_documentation.yml
@@ -0,0 +1,17 @@
name: Upload PR Documentation

on:
  workflow_run:
    workflows: ["Build PR Documentation"]
    types:
      - completed

jobs:
  build:
    uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@main
    with:
      package_name: course
      hub_base_path: https://moon-ci-docs.huggingface.co
    secrets:
      hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
      comment_bot_token: ${{ secrets.COMMENT_BOT_TOKEN }}
2 changes: 1 addition & 1 deletion chapters/de/chapter3/2.mdx
@@ -84,7 +84,7 @@ In diesem Abschnitt verwenden wir den MRPC-Datensatz (Microsoft Research Paraphr
<Youtube id="W_gMJF0xomE"/>
{/if}

Das Hub enthält nicht nur Modelle; Es hat auch mehrere Datensätze in vielen verschiedenen Sprachen. Du kannst die Datensätze [hier](https://huggingface.co/datasets) durchsuchen, und wir empfehlen, einen weiteren Datensatz zu laden und zu verarbeiten, sobald Sie diesen Abschnitt abgeschlossen haben (die Dokumentation befindet sich [hier](https: //huggingface.co/docs/datasets/loading_datasets.html#from-the-huggingface-hub)). Aber jetzt konzentrieren wir uns auf den MRPC-Datensatz! Dies ist einer der 10 Datensätze, aus denen sich das [GLUE-Benchmark](https://gluebenchmark.com/) zusammensetzt. Dies ist ein akademisches Benchmark, das verwendet wird, um die Performance von ML-Modellen in 10 verschiedenen Textklassifizierungsaufgaben zu messen.
Das Hub enthält nicht nur Modelle; Es hat auch mehrere Datensätze in vielen verschiedenen Sprachen. Du kannst die Datensätze [hier](https://huggingface.co/datasets) durchsuchen, und wir empfehlen, einen weiteren Datensatz zu laden und zu verarbeiten, sobald Sie diesen Abschnitt abgeschlossen haben (die Dokumentation befindet sich [hier](https://huggingface.co/docs/datasets/loading)). Aber jetzt konzentrieren wir uns auf den MRPC-Datensatz! Dies ist einer der 10 Datensätze, aus denen sich das [GLUE-Benchmark](https://gluebenchmark.com/) zusammensetzt. Dies ist ein akademisches Benchmark, das verwendet wird, um die Performance von ML-Modellen in 10 verschiedenen Textklassifizierungsaufgaben zu messen.

Die Bibliothek 🤗 Datasets bietet einen leichten Befehl zum Herunterladen und Caching eines Datensatzes aus dem Hub. Wir können den MRPC-Datensatz wie folgt herunterladen:

2 changes: 1 addition & 1 deletion chapters/en/chapter1/10.mdx
@@ -241,7 +241,7 @@ result = classifier("This is a course about the Transformers library")
choices={[
{
text: "The model is a fine-tuned version of a pretrained model and it picked up its bias from it.",
explain: "When applying Transfer Learning, the bias in the pretrained model used perspires in the fine-tuned model.",
explain: "When applying Transfer Learning, the bias in the pretrained model used persists in the fine-tuned model.",
correct: true
},
{
4 changes: 2 additions & 2 deletions chapters/en/chapter1/4.mdx
@@ -97,7 +97,7 @@ By the way, you can evaluate the carbon footprint of your models' training throu

This pretraining is usually done on very large amounts of data. Therefore, it requires a very large corpus of data, and training can take up to several weeks.

*Fine-tuning*, on the other hand, is the training done **after** a model has been pretrained. To perform fine-tuning, you first acquire a pretrained language model, then perform additional training with a dataset specific to your task. Wait -- why not simply train directly for the final task? There are a couple of reasons:
*Fine-tuning*, on the other hand, is the training done **after** a model has been pretrained. To perform fine-tuning, you first acquire a pretrained language model, then perform additional training with a dataset specific to your task. Wait -- why not simply train the model for your final use case from the start (**scratch**)? There are a couple of reasons:

* The pretrained model was already trained on a dataset that has some similarities with the fine-tuning dataset. The fine-tuning process is thus able to take advantage of knowledge acquired by the initial model during pretraining (for instance, with NLP problems, the pretrained model will have some kind of statistical understanding of the language you are using for your task).
* Since the pretrained model was already trained on lots of data, the fine-tuning requires way less data to get decent results.
@@ -144,7 +144,7 @@ We will dive into those architectures independently in later sections.

A key feature of Transformer models is that they are built with special layers called *attention layers*. In fact, the title of the paper introducing the Transformer architecture was ["Attention Is All You Need"](https://arxiv.org/abs/1706.03762)! We will explore the details of attention layers later in the course; for now, all you need to know is that this layer will tell the model to pay specific attention to certain words in the sentence you passed it (and more or less ignore the others) when dealing with the representation of each word.

To put this into context, consider the task of translating text from English to French. Given the input "You like this course", a translation model will need to also attend to the adjacent word "You" to get the proper translation for the word "like", because in French the verb "like" is conjugated differently depending on the subject. The rest of the sentence, however, is not useful for the translation of that word. In the same vein, when translating "this" the model will also need to pay attention to the word "course", because "this" translates differently depending on whether the associated noun is masculine or feminine. Again, the other words in the sentence will not matter for the translation of "this". With more complex sentences (and more complex grammar rules), the model would need to pay special attention to words that might appear farther away in the sentence to properly translate each word.
To put this into context, consider the task of translating text from English to French. Given the input "You like this course", a translation model will need to also attend to the adjacent word "You" to get the proper translation for the word "like", because in French the verb "like" is conjugated differently depending on the subject. The rest of the sentence, however, is not useful for the translation of that word. In the same vein, when translating "this" the model will also need to pay attention to the word "course", because "this" translates differently depending on whether the associated noun is masculine or feminine. Again, the other words in the sentence will not matter for the translation of "course". With more complex sentences (and more complex grammar rules), the model would need to pay special attention to words that might appear farther away in the sentence to properly translate each word.

The same concept applies to any task associated with natural language: a word by itself has a meaning, but that meaning is deeply affected by the context, which can be any other word (or words) before or after the word being studied.

10 changes: 5 additions & 5 deletions chapters/en/chapter1/5.mdx
@@ -15,8 +15,8 @@ Encoder models are best suited for tasks requiring an understanding of the full

Representatives of this family of models include:

- [ALBERT](https://huggingface.co/transformers/model_doc/albert.html)
- [BERT](https://huggingface.co/transformers/model_doc/bert.html)
- [DistilBERT](https://huggingface.co/transformers/model_doc/distilbert.html)
- [ELECTRA](https://huggingface.co/transformers/model_doc/electra.html)
- [RoBERTa](https://huggingface.co/transformers/model_doc/roberta.html)
- [ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)
- [BERT](https://huggingface.co/docs/transformers/model_doc/bert)
- [DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)
- [ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)
- [RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)
2 changes: 1 addition & 1 deletion chapters/en/chapter2/5.mdx
@@ -329,7 +329,7 @@ With Transformer models, there is a limit to the lengths of the sequences we can
- Use a model with a longer supported sequence length.
- Truncate your sequences.

Models have different supported sequence lengths, and some specialize in handling very long sequences. [Longformer](https://huggingface.co/transformers/model_doc/longformer.html) is one example, and another is [LED](https://huggingface.co/transformers/model_doc/led.html). If you're working on a task that requires very long sequences, we recommend you take a look at those models.
Models have different supported sequence lengths, and some specialize in handling very long sequences. [Longformer](https://huggingface.co/docs/transformers/model_doc/longformer) is one example, and another is [LED](https://huggingface.co/docs/transformers/model_doc/led). If you're working on a task that requires very long sequences, we recommend you take a look at those models.

Otherwise, we recommend you truncate your sequences by specifying the `max_sequence_length` parameter:

2 changes: 1 addition & 1 deletion chapters/en/chapter3/2.mdx
@@ -84,7 +84,7 @@ In this section we will use as an example the MRPC (Microsoft Research Paraphras
<Youtube id="W_gMJF0xomE"/>
{/if}

The Hub doesn't just contain models; it also has multiple datasets in lots of different languages. You can browse the datasets [here](https://huggingface.co/datasets), and we recommend you try to load and process a new dataset once you have gone through this section (see the general documentation [here](https://huggingface.co/docs/datasets/loading_datasets.html#from-the-huggingface-hub)). But for now, let's focus on the MRPC dataset! This is one of the 10 datasets composing the [GLUE benchmark](https://gluebenchmark.com/), which is an academic benchmark that is used to measure the performance of ML models across 10 different text classification tasks.
The Hub doesn't just contain models; it also has multiple datasets in lots of different languages. You can browse the datasets [here](https://huggingface.co/datasets), and we recommend you try to load and process a new dataset once you have gone through this section (see the general documentation [here](https://huggingface.co/docs/datasets/loading)). But for now, let's focus on the MRPC dataset! This is one of the 10 datasets composing the [GLUE benchmark](https://gluebenchmark.com/), which is an academic benchmark that is used to measure the performance of ML models across 10 different text classification tasks.

The 🤗 Datasets library provides a very simple command to download and cache a dataset on the Hub. We can download the MRPC dataset like this:

2 changes: 1 addition & 1 deletion chapters/en/chapter3/6.mdx
@@ -211,7 +211,7 @@ Test what you learned in this chapter!
explain: "This is what we did with <code>Trainer</code>, not the 🤗 Accelerate library. Try again!"
},
{
text: "It makes our training loops work on distributed strategies",
text: "It makes our training loops work on distributed strategies.",
explain: "Correct! With 🤗 Accelerate, your training loops will work for multiple GPUs and TPUs.",
correct: true
},
2 changes: 1 addition & 1 deletion chapters/en/chapter6/5.mdx
@@ -97,7 +97,7 @@ Let's take the example we used during training, with the three merge rules learn
("h", "ug") -> "hug"
```

The word `"bug"` will be tokenized as `["b", "ug"]`. `"mug"`, however, will be tokenized as `["[UNK]", "ug"]` since the letter `"m"` was not in the base vocabulary. Likewise, the word `"thug"` will be tokenized as `["[UNK]", "hug"]`: the letter `"t"` is not in the base vocabulary, and applying the merge rules results first in `"u"` and `"g"` being merged and then `"hu"` and `"g"` being merged.
The word `"bug"` will be tokenized as `["b", "ug"]`. `"mug"`, however, will be tokenized as `["[UNK]", "ug"]` since the letter `"m"` was not in the base vocabulary. Likewise, the word `"thug"` will be tokenized as `["[UNK]", "hug"]`: the letter `"t"` is not in the base vocabulary, and applying the merge rules results first in `"u"` and `"g"` being merged and then `"h"` and `"ug"` being merged.

<Tip>

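To check the corrected walkthrough of `"thug"` in the chapter6/5.mdx hunk, here is a minimal Python sketch that applies BPE merge rules in the order they were learned. Only the `("h", "ug") -> "hug"` rule is visible in this hunk. The full rule list `("u", "g")`, `("u", "n")`, `("h", "ug")` and the base vocabulary below are assumptions, chosen so that `"m"` and `"t"` are missing as the text states. `bpe_tokenize` is a hypothetical helper, not the course's code.

```python
def bpe_tokenize(word, base_vocab, merges, unk_token="[UNK]"):
    """Tokenize one word by applying learned BPE merges in order (illustrative sketch)."""
    # Split into characters; characters outside the base vocabulary become the unknown token
    tokens = [ch if ch in base_vocab else unk_token for ch in word]
    # Apply each merge rule once over the sequence, in the order it was learned
    for left, right in merges:
        merged = []
        i = 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens


# Assumed toy setup: base vocabulary without "m" or "t", merge rules in learned order
base_vocab = {"b", "g", "h", "n", "p", "s", "u"}
merges = [("u", "g"), ("u", "n"), ("h", "ug")]

for word in ["bug", "mug", "thug"]:
    print(word, bpe_tokenize(word, base_vocab, merges))
```

For `"thug"`, the first rule turns `["[UNK]", "h", "u", "g"]` into `["[UNK]", "h", "ug"]`, and only then can `("h", "ug")` fire, which matches the new wording ("`"h"` and `"ug"` being merged") rather than the old one.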
2 changes: 1 addition & 1 deletion chapters/en/chapter6/7.mdx
@@ -58,7 +58,7 @@ So, the sum of all frequencies is 210, and the probability of the subword `"ug"`

<Tip>

✏️ **Now your turn!** Write the code to compute the the frequencies above and double-check that the results shown are correct, as well as the total sum.
✏️ **Now your turn!** Write the code to compute the frequencies above and double-check that the results shown are correct, as well as the total sum.

</Tip>

4 changes: 2 additions & 2 deletions chapters/en/chapter7/3.mdx
@@ -347,7 +347,7 @@ print(f"'>>> Concatenated reviews length: {total_length}'")
'>>> Concatenated reviews length: 951'
```

Great, the total length checks out -- so now let's split the concatenated reviews into chunks of the size given by `block_size`. To do so, we iterate over the features in `concatenated_examples` and use a list comprehension to create slices of each feature. The result is a dictionary of chunks for each feature:
Great, the total length checks out -- so now let's split the concatenated reviews into chunks of the size given by `chunk_size`. To do so, we iterate over the features in `concatenated_examples` and use a list comprehension to create slices of each feature. The result is a dictionary of chunks for each feature:

```python
chunks = {
@@ -1035,7 +1035,7 @@ Neat -- our model has clearly adapted its weights to predict words that are more

<Youtube id="0Oxphw4Q9fo"/>

This wraps up our first experiment with training a language model. In [section 6](/course/en/chapter7/section6) you'll learn how to train an auto-regressive model like GPT-2 from scratch; head over there if you'd like to see how you can pretrain your very own Transformer model!
This wraps up our first experiment with training a language model. In [section 6](/course/en/chapter7/6) you'll learn how to train an auto-regressive model like GPT-2 from scratch; head over there if you'd like to see how you can pretrain your very own Transformer model!

<Tip>

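The chapter7/3.mdx hunk renames the chunk length to `chunk_size`. Below is a minimal sketch of that chunking step on toy data. `group_into_chunks` and the input values are hypothetical, and the sketch assumes every feature list has the same length after concatenation, as the surrounding course text implies.

```python
def group_into_chunks(concatenated_examples, chunk_size):
    """Split every feature list into consecutive slices of length chunk_size."""
    # Assumes all features (input_ids, attention_mask, ...) share one length after concatenation
    total_length = len(next(iter(concatenated_examples.values())))
    return {
        k: [t[i : i + chunk_size] for i in range(0, total_length, chunk_size)]
        for k, t in concatenated_examples.items()
    }


# Hypothetical toy input standing in for the tokenized, concatenated reviews
concatenated_examples = {
    "input_ids": list(range(10)),
    "attention_mask": [1] * 10,
}
chunks = group_into_chunks(concatenated_examples, chunk_size=4)
for chunk in chunks["input_ids"]:
    print(len(chunk))
```

The last chunk is shorter than `chunk_size` whenever the total length is not a multiple of it; dropping or padding that remainder is a separate design choice.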
2 changes: 1 addition & 1 deletion chapters/en/chapter7/4.mdx
@@ -114,7 +114,7 @@ split_datasets["train"][1]["translation"]
'fr': 'Par défaut, développer les fils de discussion'}
```

We get a dictionary with two sentences in the pair of languages we requested. One particularity of this dataset full of technical computer science terms is that they are all fully translated in French. However, French engineers are often lazy and leave most computer science-specific words in English when they talk. Here, for instance, the word "threads" might well appear in a French sentence, especially in a technical conversation; but in this dataset it has been translated into the more correct "fils de discussion." The pretrained model we use, which has been pretrained on a larger corpus of French and English sentences, takes the easier option of leaving the word as is:
We get a dictionary with two sentences in the pair of languages we requested. One particularity of this dataset full of technical computer science terms is that they are all fully translated in French. However, French engineers leave most computer science-specific words in English when they talk. Here, for instance, the word "threads" might well appear in a French sentence, especially in a technical conversation; but in this dataset it has been translated into the more correct "fils de discussion." The pretrained model we use, which has been pretrained on a larger corpus of French and English sentences, takes the easier option of leaving the word as is:

```py
from transformers import pipeline
3 changes: 1 addition & 2 deletions chapters/en/chapter7/6.mdx
@@ -870,7 +870,6 @@ for epoch in range(num_train_epochs):
if step % 100 == 0:
accelerator.print(
{
"lr": get_lr(),
"samples": step * samples_per_step,
"steps": completed_steps,
"loss/train": loss.item() * gradient_accumulation_steps,
@@ -912,4 +911,4 @@ And that's it -- you now have your own custom training loop for causal language

</Tip>

{/if}
{/if}
2 changes: 1 addition & 1 deletion chapters/en/events/3.mdx
@@ -6,4 +6,4 @@ You can find all the demos that the community created under the [`Gradio-Blocks`

**Natural language to SQL**

<iframe src="https://huggingface.co/spaces/Curranj/Words_To_SQL" frameBorder="0" height="640" title="Gradio app" class="container p-0 flex-grow space-iframe" allow="accelerometer; ambient-light-sensor; autoplay; battery; camera; document-domain; encrypted-media; fullscreen; geolocation; gyroscope; layout-animations; legacy-image-formats; magnetometer; microphone; midi; oversized-images; payment; picture-in-picture; publickey-credentials-get; sync-xhr; usb; vr ; wake-lock; xr-spatial-tracking" sandbox="allow-forms allow-modals allow-popups allow-popups-to-escape-sandbox allow-same-origin allow-scripts allow-downloads"></iframe>
<iframe src="https://curranj-words-to-sql.hf.space" frameBorder="0" height="640" title="Gradio app" class="container p-0 flex-grow space-iframe" allow="accelerometer; ambient-light-sensor; autoplay; battery; camera; document-domain; encrypted-media; fullscreen; geolocation; gyroscope; layout-animations; legacy-image-formats; magnetometer; microphone; midi; oversized-images; payment; picture-in-picture; publickey-credentials-get; sync-xhr; usb; vr ; wake-lock; xr-spatial-tracking" sandbox="allow-forms allow-modals allow-popups allow-popups-to-escape-sandbox allow-same-origin allow-scripts allow-downloads"></iframe>
39 changes: 38 additions & 1 deletion chapters/es/_toctree.yml
@@ -39,15 +39,25 @@
title: ¡Haz completado el uso básico!
- local: chapter2/8
title: Quiz de final de capítulo
quiz: 2

- title: 3. Ajuste (fine-tuning) de un modelo preentrenado
sections:
- local: chapter3/1
title: Introducción
- local: chapter3/2
title: Procesamiento de los datos
- local: chapter3/3
title: Ajuste de un modelo con la API Trainer
- local: chapter3/3_tf
title: Ajuste de un modelo con Keras
- local: chapter3/4
title: Entrenamiento completo
- local: chapter3/5
title: Ajuste de modelos, ¡hecho!
- local: chapter3/6
title: Quiz de final de capítulo
quiz: 3

- title: 5. La librería 🤗 Datasets
sections:
@@ -66,9 +76,36 @@
- local: chapter5/7
title: 🤗 Datasets, ¡listo!
- local: chapter5/8
title: Quiz
title: Quiz de final de capítulo
quiz: 5


- title: 6. La librería 🤗 Tokenizers
sections:
- local: chapter6/1
title: Introducción
- local: chapter6/2
title: Entrenar un nuevo tokenizador a partir de uno existente
- local: chapter6/3
title: Los poderes especiales de los Tokenizadores Rápidos (Fast tokenizers)
- local: chapter6/3b
title: Tokenizadores Rápidos en un Pipeline de Question-Answering
- local: chapter6/4
title: Normalización y pre-tokenización
- local: chapter6/5
title: Tokenización por Codificación Byte-Pair
- local: chapter6/6
title: Tokenización WordPiece
- local: chapter6/7
title: Tokenización Unigram
- local: chapter6/8
title: Construir un tokenizador, bloque por bloque
- local: chapter6/9
title: Tokenizadores, listo!
- local: chapter6/10
title: Quiz de final de capítulo
quiz: 1

- title: 8. ¿Cómo solicitar ayuda?
sections:
- local: chapter8/1

0 comments on commit 70d6f6a
