
Commit cef0574

committed 2023/02/22
1 parent 87f9bda commit cef0574

File tree

69 files changed (+2009 / -68 lines)


README.md

Lines changed: 68 additions & 68 deletions
Large diffs are not rendered by default.
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+https://arxiv.org/abs/2102.12092
+
+Zero-Shot Text-to-Image Generation (Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever)
+
+Contains OpenAI's training tips: PowerSGD, working around 16-bit precision issues, and so on.
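
The PowerSGD mention is terse, so here is a minimal single-process sketch of the rank-r gradient compression idea with error feedback; the function names, rank, and shapes are illustrative stand-ins, not OpenAI's or the paper's actual code.

```python
# Minimal sketch of PowerSGD-style low-rank gradient compression with error feedback.
# Single-process simulation; in data-parallel training p and q_new would be all-reduced.
import numpy as np

def orthogonalize(p):
    # Orthonormalize columns via reduced QR.
    q, _ = np.linalg.qr(p)
    return q

def powersgd_compress(grad, q, error):
    """One step of power iteration on a 2-D gradient.

    grad:  (n, m) gradient matrix
    q:     (m, r) right factor carried over from the previous step
    error: (n, m) residual from the previous step (error feedback)
    """
    m = grad + error               # re-add what was lost in the previous round
    p = orthogonalize(m @ q)       # (n, r) left factor
    q_new = m.T @ p                # (m, r) right factor
    approx = p @ q_new.T           # low-rank reconstruction of the gradient
    return approx, q_new, m - approx

# Toy usage: rank-4 compression of a 512x1024 gradient.
rng = np.random.default_rng(0)
grad = rng.normal(size=(512, 1024))
q = rng.normal(size=(1024, 4))
error = np.zeros_like(grad)
approx, q, error = powersgd_compress(grad, q, error)
print(np.linalg.norm(grad - approx) / np.linalg.norm(grad))
```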

papers/2021/211028 Colossal-AI.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+https://arxiv.org/abs/2110.14883
+
+Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training (Shenggui Li, Jiarui Fang, Zhengda Bian, Hongxin Liu, Yuliang Liu, Haichen Huang, Boxiang Wang, Yang You)
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+https://arxiv.org/abs/2211.14275
+
+Solving math word problems with process- and outcome-based feedback (Jonathan Uesato, Nate Kushman, Ramana Kumar, Francis Song, Noah Siegel, Lisa Wang, Antonia Creswell, Geoffrey Irving, Irina Higgins)
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+https://arxiv.org/abs/2212.01757
+
+Languages You Know Influence Those You Learn: Impact of Language Characteristics on Multi-Lingual Text-to-Text Transfer (Benjamin Muller, Deepanshu Gupta, Siddharth Patwardhan, Jean-Philippe Fauconnier, David Vandyke, Sachin Agarwal)
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+https://arxiv.org/abs/2212.08073
+
+Constitutional AI: Harmlessness from AI Feedback (Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, Kamile Lukosuite, Liane Lovitt, Michael Sellitto, Nelson Elhage, Nicholas Schiefer, Noemi Mercado, Nova DasSarma, Robert Lasenby, Robin Larson, Sam Ringer, Scott Johnston, Shauna Kravec, Sheer El Showk, Stanislav Fort, Tamera Lanham, Timothy Telleen-Lawton, Tom Conerly, Tom Henighan, Tristan Hume, Samuel R. Bowman, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, Jared Kaplan)
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+https://arxiv.org/abs/2302.03528
+
+Efficiently Upgrading Multilingual Machine Translation Models to Support More Languages (Simeng Sun, Maha Elbayad, Anna Sun, James Cross)
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+https://arxiv.org/abs/2302.04542
+
+Efficient Attention via Control Variates (Lin Zheng, Jianbo Yuan, Chong Wang, Lingpeng Kong)
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+https://arxiv.org/abs/2302.04931
+
+In-Context Learning with Many Demonstration Examples (Mukai Li, Shansan Gong, Jiangtao Feng, Yiheng Xu, Jun Zhang, Zhiyong Wu, Lingpeng Kong)
+
+Transformer LMs don't have much context length to spare, and it gets even tighter once you add prompts and demonstration examples for in-context learning. This is a proposal for that problem. The core is really a long-range efficient attention, plus length extrapolation that exploits the locality of the efficient attention together with circular positional embeddings. The efficient attention used here is EVA (https://arxiv.org/abs/2302.04542). (Surprisingly, the author lists overlap by only one person.)
+
+The EVA paper's exposition is fairly involved, but following this paper's summary it amounts to chunking, with efficient attention and pooling applied to the remote features outside each chunk, combined with plain attention over those remote features and the features inside the chunk. By that description it looks like a combination of window attention plus long-range features.
+
+I'm curious how OpenAI and Google are handling the context length problem these days. (Are they just sticking with GPT-3 as-is?)
+
+#efficient_attention #transformer
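
To make that summary more concrete, here is a rough sketch of the two ingredients as described above: per-chunk attention over the chunk's own tokens plus pooled summaries of the other chunks, and circular position ids that wrap instead of growing past the trained range. This is my own simplification, not the paper's EVA implementation; chunk_size and max_pos are made-up values.

```python
# Sketch: chunked attention with pooled remote features + circular position ids.
import torch
import torch.nn.functional as F

def circular_position_ids(seq_len, max_pos):
    # Positions wrap around so they never exceed the range seen at training time.
    return torch.arange(seq_len) % max_pos

def chunked_attention(q, k, v, chunk_size):
    """q, k, v: (seq_len, d); seq_len assumed divisible by chunk_size."""
    seq_len, d = q.shape
    n_chunks = seq_len // chunk_size
    qc = q.view(n_chunks, chunk_size, d)
    kc = k.view(n_chunks, chunk_size, d)
    vc = v.view(n_chunks, chunk_size, d)

    # One pooled "remote" key/value per chunk (the pooling of out-of-chunk features).
    k_pool = kc.mean(dim=1)   # (n_chunks, d)
    v_pool = vc.mean(dim=1)

    out = torch.empty_like(q)
    for i in range(n_chunks):
        # Keys = this chunk's own tokens + summaries of every other chunk.
        remote_k = torch.cat([k_pool[:i], k_pool[i + 1:]], dim=0)
        remote_v = torch.cat([v_pool[:i], v_pool[i + 1:]], dim=0)
        keys = torch.cat([kc[i], remote_k], dim=0)
        vals = torch.cat([vc[i], remote_v], dim=0)
        attn = F.softmax(qc[i] @ keys.T / d ** 0.5, dim=-1)
        out[i * chunk_size:(i + 1) * chunk_size] = attn @ vals
    return out

# Toy usage.
seq_len, d = 64, 32
q = k = v = torch.randn(seq_len, d)
pos = circular_position_ids(seq_len, max_pos=48)  # would index a learned position embedding table
print(chunked_attention(q, k, v, chunk_size=16).shape)
```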
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+https://arxiv.org/abs/2302.05206
+
+The Wisdom of Hindsight Makes Language Models Better Instruction Followers (Tianjun Zhang, Fangchen Liu, Justin Wong, Pieter Abbeel, Joseph E. Gonzalez)
+
+Instruction tuning without RL. It builds triplets from an instruction prompt, a query, and an answer sampled from the model, then works by editing the instruction prompt so that the answer's score goes up. Scoring is one thing, but editing the instruction prompt is the real issue; here the prompts themselves are cast as "generate a correct answer" / "generate an incorrect answer", and the edit is simply taking the negation. Hmm.
+
+#instruct #reinforcement_learning
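
A toy sketch of the relabeling loop as the note reads it: sample an answer under the "correct" instruction, score it, and if it scores poorly relabel the instruction to its negation so the sample becomes a valid supervised target. sample_answer, score, and the threshold are hypothetical stand-ins, not the paper's API.

```python
# Hindsight-style relabeling: bad samples keep their answer but get the negated instruction.
CORRECT = "Give a correct answer to the following question."
INCORRECT = "Give an incorrect answer to the following question."

def relabel(queries, sample_answer, score, threshold=0.5):
    """Build (instruction, query, answer) triplets, relabeling low-scoring samples."""
    data = []
    for query in queries:
        answer = sample_answer(CORRECT, query)
        if score(query, answer) >= threshold:
            instruction = CORRECT    # answer is good: keep the original instruction
        else:
            instruction = INCORRECT  # hindsight: the model *did* follow this instruction
        data.append((instruction, query, answer))
    return data

# Toy usage with dummy stand-ins; the relabeled triplets would then go into
# ordinary supervised fine-tuning instead of an RL update.
toy = relabel(["2 + 2 = ?"], sample_answer=lambda ins, q: "5", score=lambda q, a: 0.0)
print(toy)
```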
