- Architecture
- simple version & Basic idea
- overview
- Encoder, Decoder
- Attention
- Coding: Attention (a minimal sketch follows after this outline)
- comprehensive version
- Encoder-only: BERT
- architecture
- pre-training
- Decoder-only: GPT
- Brief history: what's new, what's different
- pre-training
- tokenizer, embedding, transformer block, lm_head (see the decoder-only sketch after this outline)
- Encoder-Decoder
- Comparison
- BERT vs. GPT
- decoder-only vs. encoder-decoder
- Coding
- Encoder-only: BERT
- simple version & Basic idea
- Training process
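For the "Coding: Attention" item in the outline, a minimal sketch of scaled dot-product attention in PyTorch. The function name, tensor shapes, and toy inputs are assumptions for illustration, not taken from the report's code:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Single-head attention: softmax(QK^T / sqrt(d_k)) V.

    q, k, v: (batch, seq_len, d_k); mask: optional tensor broadcastable to
    (seq_len, seq_len), with 0 at positions that must not be attended to.
    """
    d_k = q.size(-1)
    # Scale by sqrt(d_k) so the softmax does not saturate for large d_k.
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)    # attention distribution over keys
    return torch.matmul(weights, v), weights   # weighted sum of the values

# Toy usage: batch of 2 sequences, length 4, dimension 8.
q = k = v = torch.randn(2, 4, 8)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)  # torch.Size([2, 4, 8]) torch.Size([2, 4, 4])
```

Running this under autograd also gives the backward pass for free, which is one way to sanity-check a hand-derived attention gradient (see the backpropagation note below).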
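For the GPT component list (tokenizer → embedding → transformer block → lm_head), a toy decoder-only composition. It reuses `nn.TransformerEncoderLayer` with a causal mask to stand in for a GPT block, so the block internals do not match GPT-2 exactly, and all sizes are made-up placeholders:

```python
import torch
import torch.nn as nn

class MiniGPT(nn.Module):
    """Toy decoder-only stack: token/position embeddings -> blocks -> lm_head."""

    def __init__(self, vocab_size=1000, d_model=64, n_heads=4, n_layers=2, max_len=128):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)   # token embedding table
        self.pos_emb = nn.Embedding(max_len, d_model)      # learned positions
        block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.blocks = nn.TransformerEncoder(block, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, ids):
        b, t = ids.shape
        x = self.tok_emb(ids) + self.pos_emb(torch.arange(t, device=ids.device))
        # Causal mask: True marks positions a token is NOT allowed to attend to,
        # which is what turns this encoder stack into decoder-only behaviour.
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=ids.device), diagonal=1)
        x = self.blocks(x, mask=causal)
        return self.lm_head(x)   # next-token logits, shape (batch, t, vocab_size)

# Toy usage: the tokenizer step is skipped and replaced by random token ids.
ids = torch.randint(0, 1000, (2, 16))
logits = MiniGPT()(ids)
print(logits.shape)  # torch.Size([2, 16, 1000])
```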
- Are the differing results caused by a bug in the algorithm, or just by different random number generation? (see the seeding sketch at the end of these notes)
- Paste the code here, with comments explaining the approach.
- Would a derivation of attention's backpropagation help with understanding the more detailed analysis later on, and if so, how?
- The transformer's basic framework and its architecture are both structural topics; is there really a difference in how they are explained? (If there will be a presentation, maybe cover them together.)
- Remember to include the pre-training workflow.
- Concepts / MLM: more approaches (see the masking sketch at the end of these notes)
- Next Sentence Prediction
- reconstructing corrupted sentences
- Encoder-Decoder: seems like a simple combination of the two
- Pre-training vs. post-training: are there general methods for post-training, but architecture-specific approaches for pre-training?
- Comparison: are there other reasons?
- Didn't read this part closely.
- Couldn't find max-length anywhere in the report.
- Learn more about preprocessing and infrastructure.
- The following section seems to mention this as well; how is it different from the comprehensive treatment of the transformers' basic architecture?
- Remember to look at the other learning resources linked in flu shot learning, and cross-check them against this one.
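On the question above about differing results: one quick check is to fix every random seed and rerun; if the outputs still differ, the discrepancy is in the algorithm rather than in random number generation. A minimal sketch (assumes CPU-only runs; CUDA adds further nondeterminism settings):

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 0) -> None:
    """Fix the Python, NumPy, and PyTorch RNGs so two runs start identically."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

set_seed(0)
a = torch.randn(3)
set_seed(0)
b = torch.randn(3)
print(torch.equal(a, b))  # True: same seed -> same draws, so any remaining
                          # run-to-run difference points at the code itself
```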
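For the MLM note, a rough sketch of BERT-style token masking. The 15% rate and 80/10/10 split follow the BERT paper; the vocabulary size and the [MASK] token id below are made-up placeholders:

```python
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_prob=0.15):
    """BERT-style MLM corruption.

    Picks ~15% of positions as prediction targets, then replaces 80% of them
    with [MASK], 10% with a random token, and leaves 10% unchanged.
    Returns (corrupted_ids, labels); labels are -100 at non-target positions
    so cross-entropy ignores them.
    """
    labels = input_ids.clone()
    target = torch.bernoulli(torch.full(labels.shape, mlm_prob)).bool()
    labels[~target] = -100

    # 80% of targets -> [MASK]
    masked = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & target
    input_ids[masked] = mask_token_id

    # 10% of targets -> a random token (half of the remaining 20%)
    randomised = torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & target & ~masked
    input_ids[randomised] = torch.randint(vocab_size, labels.shape)[randomised]

    # the final 10% keep their original token, which the model must still predict
    return input_ids, labels

# Toy usage with a hypothetical vocab of 1000 and [MASK] id 103.
ids = torch.randint(5, 1000, (2, 16))
corrupted, labels = mask_tokens(ids.clone(), mask_token_id=103, vocab_size=1000)
```

In the original BERT, Next Sentence Prediction is a second objective trained alongside MLM, implemented as a binary classifier over the [CLS] representation, so it changes the data construction and the head rather than the masking step above.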