
Recent papers and projects in multitask Learning, fine-tuning, and their applications


VirtuosoResearch/Multitask-Learning-and-Fine-Tuning


Benchmarks for Reasoning Abilities of Large Language Models

Benchmarks:

| Task | Type | Construction | # Problems | # Problem Types | Prompt style |
|---|---|---|---|---|---|
| GSM8K | Arithmetic reasoning over multi-step math computations described in natural language | Manually written language descriptions | 8,500 | 4 (addition, subtraction, multiplication, division) | Multi-step reasoning (similar to chain-of-thought): problems take between 2 and 8 steps to solve, described in natural language |
| MATH | Math problems from mathematics competitions | | 12,000 | 7 | Multi-step reasoning (similar to chain-of-thought) |
| MMLU | High-school and college-level common knowledge | | 15,000 | 57 | Multiple choice; few-shot examples |
| BBH | Language and symbolic reasoning | | 6,500 | 23 | Few-shot chain-of-thought exemplars |
| HumanEval | Python programming problems with text comments, docstrings, and test cases | Manually written programs | 164 | | |
| TheoremQA | | | | | |
| SummEdits | | | | | |

Examples:
GSM8K Question: Angelo and Melanie want to plan how many hours over the next week they should study together for their test next week. They have 2 chapters of their textbook to study and 4 worksheets to memorize. They figure out that they should dedicate 3 hours to each chapter of their textbook and 1.5 hours for each worksheet. If they plan to study no more than 4 hours each day, how many days should they plan to study total over the next week if they take a 10-minute break every hour, include 3 10-minute snack breaks each day, and 30 minutes for lunch each day?
Let's think step by step
Angelo and Melanie think they should dedicate 3 hours to each of the 2 chapters, 3 hours x 2 chapters = 6 hours total.
For the worksheets they plan to dedicate 1.5 hours for each worksheet, 1.5 hours x 4 worksheets = 6 hours total.
Angelo and Melanie need to start with planning 12 hours to study, at 4 hours a day, 12 / 4 = 3 days.
However, they need to include time for breaks and lunch. Every hour they want to include a 10-minute break, so 12 total hours x 10 minutes = 120 extra minutes for breaks.
They also want to include 3 10-minute snack breaks, 3 x 10 minutes = 30 minutes.
And they want to include 30 minutes for lunch each day, so 120 minutes for breaks + 30 minutes for snack breaks + 30 minutes for lunch = 180 minutes, or 180 / 60 minutes per hour = 3 extra hours.
So Angelo and Melanie want to plan 12 hours to study + 3 hours of breaks = 15 hours total.
They want to study no more than 4 hours each day, 15 hours / 4 hours each day = 3.75
They will need to plan to study 4 days to allow for all the time they need.
The answer is 4
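The arithmetic in the GSM8K reference solution above can be checked directly; a minimal sketch in Python (variable names are illustrative):

```python
import math

# Reproduce the arithmetic of the reference solution step by step.
chapter_hours = 3 * 2            # 3 hours per chapter, 2 chapters
worksheet_hours = 1.5 * 4        # 1.5 hours per worksheet, 4 worksheets
study_hours = chapter_hours + worksheet_hours   # 12 hours of studying

break_minutes = study_hours * 10                # 10-minute break per study hour
snack_minutes = 3 * 10                          # three 10-minute snack breaks
lunch_minutes = 30                              # lunch
extra_hours = (break_minutes + snack_minutes + lunch_minutes) / 60  # 3 hours

total_hours = study_hours + extra_hours         # 15 hours
days = math.ceil(total_hours / 4)               # at most 4 hours per day
```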
MATH Question: The sum of two numbers is 6. The difference of their squares is 12. What is the positive difference of the two numbers?
Let's think step by step
Call the two numbers $x$ and $y$.
We are given that $x+y = 6$ and $x^2 - y^2 = 12$.
Because $x^2 - y^2$ factors into $(x+y)(x-y)$,
we can substitute in for $x+y$,
giving $6(x-y) = 12$,
or $x-y = \boxed{2}$.
The answer is 2
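The algebra in the MATH example above can likewise be verified numerically; a minimal sketch:

```python
# Check the derivation: x + y = 6 and x^2 - y^2 = 12.
sum_xy = 6
diff_of_squares = 12
# Since x^2 - y^2 = (x + y)(x - y), dividing gives x - y.
diff_xy = diff_of_squares / sum_xy   # 2.0
# Recover the numbers themselves and confirm both constraints hold.
x = (sum_xy + diff_xy) / 2
y = (sum_xy - diff_xy) / 2
assert x + y == 6 and x**2 - y**2 == 12
```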
MMLU The following are multiple choice questions (with answers) about abstract algebra.
Find all c in Z_3 such that Z_3[x]/(x^2 + c) is a field.
A. 0
B. 1
C. 2
D. 3
Answer: B
Statement 1: Every function from a finite set onto itself must be one to one. Statement 2: Every subgroup of an abelian group is abelian.
A. True, True
B. False, False
C. True, False
D. False, True
Answer: A
Find the degree for the given field extension Q(sqrt(2), sqrt(3), sqrt(18)) over Q.
A. 0
B. 4
C. 2
D. 6
Answer: B
BBH Input: If you follow these instructions, do you return to the starting point? Always face forward. Take 1 step backward. Take 9 steps left. Take 2 steps backward. Take 6 steps forward. Take 4 steps forward. Take 4 steps backward. Take 3 steps right. Options: Yes or No
Let's think step by step
Target: No
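The BBH navigation example can be checked by tracking the position on a grid; a minimal sketch:

```python
# Track position while following the instructions (always facing forward).
x, y = 0, 0
moves = [("backward", 1), ("left", 9), ("backward", 2),
         ("forward", 6), ("forward", 4), ("backward", 4), ("right", 3)]
deltas = {"forward": (0, 1), "backward": (0, -1),
          "left": (-1, 0), "right": (1, 0)}
for direction, steps in moves:
    dx, dy = deltas[direction]
    x, y = x + dx * steps, y + dy * steps
# Back at the starting point only if we ended at the origin.
answer = "Yes" if (x, y) == (0, 0) else "No"
```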
HumanEval Prompt:

```python
def incr_list(l: list):
    """Return list with elements incremented by 1.
    >>> incr_list([1, 2, 3])
    [2, 3, 4]
    >>> incr_list([5, 3, 5, 2, 3, 3, 9, 0])
    [6, 4, 6, 3, 4, 4, 10, 1]
    """
```

Output:

```python
    return [i+1 for i in l]
```
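HumanEval judges a completion by executing it against unit tests; the docstring examples above can be checked the same way with a minimal, self-contained sketch:

```python
def incr_list(l: list):
    """Return list with elements incremented by 1."""
    return [i + 1 for i in l]

# Run the docstring examples as assertions, as HumanEval's
# functional-correctness check does with its held-out tests.
assert incr_list([1, 2, 3]) == [2, 3, 4]
assert incr_list([5, 3, 5, 2, 3, 3, 9, 0]) == [6, 4, 6, 3, 4, 4, 10, 1]
```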

Concepts:

  • Fine-tuning: Continue training a pretrained language model on task-specific data using the language-modeling objective.
  • Verification: First train a generator on question-solution pairs. Then sample multiple generated solutions, assign each solution a binary score indicating whether it leads to the correct answer, and train a model on these scores. A model trained on the verification scores is called a verifier.
    • At test time, sample solutions for each test problem, rank them with the verifier, and return the one with the highest verifier score.
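The test-time procedure above can be sketched as follows; `generate_solutions` and `verifier_score` are hypothetical stand-ins for the trained generator and verifier, not real APIs:

```python
import random

def generate_solutions(problem, n):
    # Hypothetical generator: stands in for sampling n candidate
    # solutions from a fine-tuned language model.
    return [f"solution {i} to {problem}" for i in range(n)]

def verifier_score(problem, solution):
    # Hypothetical verifier: stands in for a model trained on binary
    # correctness labels; returns a score in [0, 1].
    return random.random()

def answer_with_verifier(problem, n_samples=100):
    # Sample candidates, rank them by verifier score, return the best.
    candidates = generate_solutions(problem, n_samples)
    return max(candidates, key=lambda s: verifier_score(problem, s))
```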

Prompting methods:

| Prompt strategy | Prompts | In-context examples | Prompt generation | GSM8K solve rate (%), Codex |
|---|---|---|---|---|
| Scratchpad (Nye et al., 2021) | Break a code function down and ask the model to output all intermediate steps of the code | (input, intermediate steps, output) | Manually designed based on an algorithm | ? |
| Chain-of-thought prompting (Wei et al., 2022) | Prompt the model with the rationale for solving a multi-step reasoning problem | (input, chain-of-thought, output) | Manually written | 63.1 |
| Algorithmic prompting (Zhou et al., 2022) | Prompt the model with detailed rationales, including descriptions of the steps within an algorithm | (input, algorithmic prompt, output) | Manually written | 82.7 |
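A chain-of-thought prompt is simply the few-shot exemplars, each as an (input, rationale, output) triple, concatenated before the test question; a minimal sketch (the exemplar content is illustrative):

```python
# Few-shot exemplars as (question, rationale, answer) triples.
exemplars = [
    ("Tom has 3 apples and buys 2 more. How many apples does he have?",
     "Tom starts with 3 apples. He buys 2 more, so 3 + 2 = 5.",
     "5"),
]

def build_cot_prompt(question, exemplars):
    # Concatenate exemplars, then append the test question with a
    # rationale-eliciting cue for the model to continue.
    parts = []
    for q, rationale, answer in exemplars:
        parts.append(f"Q: {q}\nA: {rationale} The answer is {answer}.")
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)
```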

Multi-Task Learning

End-to-End Multi-Task Learning with Attention. CVPR 2019. paper

Latent Multi-task Architecture Learning. AAAI 2019. paper

Cross-stitch Networks for Multi-task Learning. CVPR 2016. paper

Learning Multiple Tasks with Multilinear Relationship Networks. NIPS 2017. paper

More multitask learning papers here

Meta Learning

Survey

Meta-Learning in Neural Networks: A Survey. paper

Black-Box Approaches

Recurrent Neural Network

(MANN) Meta-learning with memory-augmented neural networks. ICML 2016. paper

Attention-Based Network

Matching Networks for One-Shot Learning. NIPS 2016. paper

(SNAIL) A Simple Neural Attentive Meta-Learner. ICLR 2018. paper

Optimization-Based Methods

(MAML) Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML 2017. paper

(Reptile; First-order method) On First-Order Meta-Learning Algorithms. arXiv 2018. paper

Other Forms of Prior on MAML

(Implicit MAML) Meta-Learning with Implicit Gradients. NIPS 2019. paper

(Implicit Differentiation; SVM) Meta-Learning with Differentiable Convex Optimization. CVPR 2019. paper

(Bayesian linear regression) Meta-Learning Priors for Efficient Online Bayesian Regression. Workshop on the Algorithmic Foundations of Robotics 2018. paper

(Ridge regression; Logistic regression) Meta-learning with Differentiable Closed-Form Solvers. ICLR 2019. paper

Understanding MAML

(MAML expressive power and universality) Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm. ICLR 2018. paper

(Map MAML to Bayes Framework) Recasting Gradient-Based Meta-Learning as Hierarchical Bayes. ICLR 2018. paper

Tricks to Optimize MAML

Choose architecture that is effective for inner gradient-step

Auto-Meta: Automated Gradient Based Meta Learner Search. NIPS 2018 Workshop on Meta-Learning. paper

Automatically learn inner vector learning rate, tune outer learning rate

Alpha MAML: Adaptive Model-Agnostic Meta-Learning. ICML 2019 Workshop on Automated Machine Learning. paper

Meta-SGD: Learning to Learn Quickly for Few-Shot Learning. arXiv 2017. paper

Optimize only a subset of the parameters in the inner loop

(DEML) Deep Meta-Learning: Learning to Learn in the Concept Space. arXiv 2018. paper

(CAVIA) Fast Context Adaptation via Meta-Learning. ICML 2019. paper

Decouple inner learning rate, BN statistics per-step

(MAML++) How to train your MAML. ICLR 2019. paper

Introduce context variables for increased expressive power

(CAVIA) Fast Context Adaptation via Meta-Learning. ICML 2019. paper

(Bias transformation) Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm. ICLR 2018. paper

Non-Parametric Methods via Metric Learning

Siamese Neural Networks for One-shot Image Recognition. ICML 2015. paper

Matching Networks for One Shot Learning. NIPS 2016. paper

Prototypical Networks for Few-shot Learning. NIPS 2017. paper

Learn non-linear relation module on embeddings

Learning to Compare: Relation Network for Few-Shot Learning. CVPR 2018. paper

Learn infinite mixture of prototypes

Infinite Mixture Prototypes for Few-Shot Learning. ICML 2019. paper

Perform message passing on embeddings

Few-Shot Learning with Graph Neural Networks ICLR 2018. paper

Bayesian Meta-Learning & Generative Models

Amortized Inference

Amortized Bayesian Meta-Learning. ICLR 2019. paper

Ensemble Method

Bayesian Model-Agnostic Meta-Learning. NIPS 2018. paper

Sampling & Hybrid Inference

Probabilistic Model-Agnostic Meta-Learning. NIPS 2018. paper

Meta-Learning Probabilistic Inference for Prediction. ICLR 2019. paper

Hybrid meta-learning approaches

Meta-Learning with Latent Embedding Optimization. ICLR 2019. paper

Fast Context Adaptation via Meta-Learning. ICML 2019. paper

Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples. ICLR 2020. paper

Few-Shot Learning with Graph Neural Networks. ICLR 2018. paper

(CAML) Learning to Learn with Conditional Class Dependencies. ICLR 2019. paper

Meta Reinforcement Learning

Policy Gradient RL

MAML and black-box meta-learning approaches can be directly applied to policy-gradient RL methods.

Value-Based RL

It is not straightforward to apply existing meta-learning approaches to value-based RL, because value-based RL is a dynamic-programming method.

Meta-Q-Learning. ICLR 2020. paper

(Goal-Conditioned RL with hindsight relabeling)/(Multi-Task RL) Hindsight Experience Replay. NIPS 2017. paper

(better learning) Learning Latent Plans from Play. CoRL 2019. paper

(learn a better goal representation)

Universal Planning Networks. ICML 2018. paper

Unsupervised Visuomotor Control through Distributional Planning Networks. RSS 2019. paper

Applications

Meta-Learning for Low-Resource Neural Machine Translation. EMNLP 2018. paper

Few-shot Autoregressive Density Estimation: Towards Learning to Learn Distributions. ICLR 2018. paper

One-Shot Imitation Learning. NIPS 2017. paper

Massively Multitask Networks for Drug Discovery. ICML 2015. paper
