A paper list about large language models and multimodal models (Diffusion, VLM), from foundations to applications, maintained for my personal needs.

Wu-Zongyu/Awesome-LLM-and-Multimodal

LLM-and-VLM-Paper-List

This repo contains papers and relevant resources about large language models and multi-modal models, from foundation papers to downstream topics such as trustworthiness (e.g., robustness, privacy, and fairness) and agents.
Note: it only records papers for my personal needs :). Feel free to open an issue if you think I missed important or exciting work!

Survey

  • Text-to-Image Synthesis: A Decade Survey. arXiv'2024. Paper
  • Adversarial Attacks and Defenses on Text-to-Image Diffusion Models: A Survey. Information Fusion'2025. Paper, GitHub
  • A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends. arXiv'2024. Paper, GitHub
  • Holistic Evaluation of Language Models. TMLR'2023. Paper
  • Holistic Evaluation of Text-to-Image Models. NeurIPS'2023. Paper
  • A Survey on Evaluation of Large Language Models. arXiv'2023. Paper
  • A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics. arXiv'2023. Paper, GitHub
  • A Survey on Multimodal Large Language Model. arXiv'2023. Paper, GitHub
  • Vision Language Models for Vision Tasks: A Survey. arXiv'2023. Paper, GitHub
  • Efficient Large Language Models: A Survey. arXiv'2023. Paper, GitHub
  • Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. arXiv'2021. Paper
  • Safety of Multimodal Large Language Models on Images and Text. arXiv'2024. Paper
  • MM-LLMs: Recent Advances in MultiModal Large Language Models. arXiv'2024. Paper
  • A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications. arXiv'2024. Paper
  • A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly. arXiv'2024. Paper
  • Privacy in Large Language Models: Attacks, Defenses and Future Directions. arXiv'2023. Paper

Language Model

Foundation LM Models

  • Transformer: Attention Is All You Need. NeurIPS'2017. Paper
  • GPT-1: Improving Language Understanding by Generative Pre-Training. 2018. Paper
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL'2019. Paper
  • GPT-2: Language Models are Unsupervised Multitask Learners. 2019. Paper
  • RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv'2019. Paper
  • DistilBERT: a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv'2019. Paper
  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. JMLR'2020. Paper
  • GPT-3: Language Models are Few-Shot Learners. NeurIPS'2020. Paper
  • GLaM: Efficient Scaling of Language Models with Mixture-of-Experts. ICML'2022. Paper
  • PaLM: Scaling Language Modeling with Pathways. arXiv'2022. Paper
  • BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. arXiv'2022. Paper
  • BLOOMZ: Crosslingual Generalization through Multitask Finetuning. arXiv'2023. Paper
  • LLaMA: Open and Efficient Foundation Language Models. arXiv'2023. Paper
  • GPT-4: GPT-4 Technical Report. arXiv'2023. Paper
  • PaLM 2: PaLM 2 Technical Report. 2023. Paper
  • Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv'2023. Paper
  • Mistral: Mistral 7B. arXiv'2023. Paper
  • Phi-1: Project Link
  • Phi-1.5: Project Link
  • Phi-2: Project Link
  • Falcon: Project Link
  • Llama 3: The Llama 3 Herd of Models. arXiv'2024. Paper

RLHF

  • PPO: Proximal Policy Optimization Algorithms. arXiv'2017. Paper
  • DPO: Direct Preference Optimization: Your Language Model is Secretly a Reward Model. NeurIPS'2023. Paper
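
To make the DPO entry above concrete: for a single preference pair, DPO trains the policy directly on the log-probability margin over a frozen reference model, with no separate reward model. A rough sketch in plain Python (hypothetical log-probabilities, not the authors' code):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((logp_w - ref_w) - (logp_l - ref_l)))."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy matches the reference, the margin is 0 and the loss is log 2.
loss = dpo_loss(-5.0, -7.0, -5.0, -7.0)
assert abs(loss - math.log(2.0)) < 1e-12
```

The loss falls below log 2 as soon as the policy favors the chosen response relative to the reference; `beta` controls how sharply deviations from the reference are rewarded.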

Parameter Efficient Fine-tuning

  • LoRA: Low-Rank Adaptation of Large Language Models. arXiv'2021. Paper
  • QLoRA: Efficient Finetuning of Quantized LLMs. NeurIPS'2023. Paper
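
As a minimal illustration of the LoRA idea listed above (illustrative shapes and variable names, not the paper's code): the pretrained weight stays frozen, and only a low-rank pair of matrices is trained, scaled by alpha / r.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r, alpha = 8, 8, 2, 4

W = rng.standard_normal((d, k))          # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-init

def lora_forward(x):
    # y = x W^T + (alpha / r) * x (B A)^T ; only A and B receive gradients.
    return x @ W.T + (alpha / r) * (x @ (B @ A).T)

x = rng.standard_normal((1, k))
# With B zero-initialized, the adapted model starts identical to the base model.
assert np.allclose(lora_forward(x), x @ W.T)
```

The zero-initialized `B` is what makes fine-tuning start from the pretrained behavior; the trainable parameter count is r * (d + k) instead of d * k.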

Healthcare LM

  • Med-PaLM: Large Language Models Encode Clinical Knowledge. arXiv'2022. Paper
  • MedAlpaca: An Open-Source Collection of Medical Conversational AI Models and Training Data. arXiv'2023. Paper
  • Med-PaLM 2: Towards Expert-Level Medical Question Answering with Large Language Models. arXiv'2023. Paper
  • HuatuoGPT: HuatuoGPT, towards Taming Language Model to Be a Doctor. EMNLP'2023 (Findings). Paper
  • GPT-4-Med: Capabilities of GPT-4 on Medical Challenge Problems. arXiv'2023. Paper

Watermarking LLM

Prompt Engineering in LLM

Hard Prompt

  • PET: Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference. EACL'2021. Paper
  • Making Pre-trained Language Models Better Few-shot Learners. ACL'2021. Paper

Soft Prompt

  • Prompt-Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning. EMNLP'2021. Paper
  • Prefix-Tuning: Optimizing Continuous Prompts for Generation. ACL'2021. Paper
  • P-tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks. ACL'2022. Paper
  • P-tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks. arXiv'2022. Paper
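
The common mechanic behind the soft-prompt papers above can be sketched in a few lines (hypothetical dimensions, NumPy for illustration): a small matrix of trainable "virtual token" embeddings is prepended to the frozen input embeddings, and only that matrix is updated during tuning.

```python
import numpy as np

rng = np.random.default_rng(0)
n_prompt, seq_len, d_model = 4, 10, 16

prompt = rng.standard_normal((n_prompt, d_model))  # trainable soft prompt
tokens = rng.standard_normal((seq_len, d_model))   # frozen token embeddings

# The model sees [prompt; tokens]; gradients flow only into `prompt`.
inputs = np.concatenate([prompt, tokens], axis=0)
assert inputs.shape == (n_prompt + seq_len, d_model)
```

Prefix-Tuning and P-tuning v2 differ mainly in where the trainable vectors are injected (every layer's key/value states rather than only the input embeddings), but the frozen-backbone principle is the same.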

Between Soft and Hard

  • Auto-Prompt: AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts. EMNLP'2020. Paper
  • FluentPrompt: Toward Human Readable Prompt Tuning: Kubrick's The Shining is a good movie, and a good prompt too?. EMNLP'2023 (Findings). Paper
  • PEZ: Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery. arXiv'2023. Paper

Multi-modal Models

Foundation Multi-Modal Models

  • CLIP: Learning Transferable Visual Models From Natural Language Supervision. ICML'2021. Paper
  • DeCLIP: Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm. ICLR'2022. Paper
  • FILIP: Fine-grained Interactive Language-Image Pre-Training. ICLR'2022. Paper
  • Stable Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models. CVPR'2022. Paper
  • BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. ICML'2022. Paper
  • BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. ICML'2023. Paper
  • LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention. ICLR'2024. Paper
  • LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model. arXiv'2023. Paper
  • LLaVA: Visual Instruction Tuning. NeurIPS'2023. Paper
  • LLaVA 1.5: Improved Baselines with Visual Instruction Tuning. CVPR'2024. Paper
  • InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning. NeurIPS'2023. Paper
  • InternVL 1.0: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks. CVPR'2024 (Oral). Paper
  • InternVL 1.5: How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites. arXiv'2024. Tech Report

T2I Concept Removal or Safety

  • SLD: Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models. CVPR'2023. Paper
  • ESD: Erasing Concepts from Diffusion Models. ICCV'2023. Paper
  • UCE: Unified Concept Editing in Diffusion Models. arXiv'2023. Paper
  • POSI: Universal Prompt Optimizer for Safe Text-to-Image Generation. NAACL'2024. Paper
  • Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts. arXiv'2024. Paper, GitHub
  • EIUP: A Training-Free Approach to Erase Non-Compliant Concepts Conditioned on Implicit Unsafe Prompts. arXiv'2024. Paper

LVLM Hallucinations

  • POPE: Evaluating Object Hallucination in Large Vision-Language Models. EMNLP'2023. Paper
  • HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models. CVPR'2024. Paper

LVLM Adversarial Attack

  • On the Adversarial Robustness of Multi-Modal Foundation Models. ICCV Workshop'2023. Paper

LVLM Privacy

  • Does CLIP Know My Face? Journal of Artificial Intelligence Research. Paper
  • Membership Inference Attacks against Large Vision-Language Models. Paper

Prompt Engineering in VLM

AI for Science

  • GALLON: LLM and GNN are Complementary: Distilling LLM for Multimodal Graph Learning. arXiv'2024. Paper

Agent

LLM-based Agent

  • Stanford Town: Generative Agents: Interactive Simulacra of Human Behavior. UIST'2023. Paper

VLM-based Agent

  • OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments. arXiv'2024. Paper

Useful Resources
