A paper list about large language models and multimodal models (Diffusion, VLM), from foundations to applications, maintained for my personal needs.

Wu-Zongyu/Awesome-LLM-and-Multimodal

LLM-and-VLM-Paper-List

This repo contains papers and relevant resources about large language models and multi-modal models, from foundation papers to downstream topics such as trustworthiness (e.g., robustness, privacy, and fairness) and agents.
Note: it only records papers for my personal needs :). Feel free to open an issue if you think I missed important or exciting work!

Survey

  • Text-to-Image Synthesis: A Decade Survey. arXiv'2024. Paper
  • Adversarial Attacks and Defenses on Text-to-Image Diffusion Models: A Survey. Information Fusion'2025. Paper, GitHub
  • A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends. arXiv'2024. Paper, GitHub
  • Holistic Evaluation of Language Models. TMLR'2023. Paper
  • Holistic Evaluation of Text-to-Image Models. NeurIPS'2023. Paper
  • A Survey on Evaluation of Large Language Models. arXiv'2023. Paper
  • A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics. arXiv'2023. Paper, GitHub
  • A Survey on Multimodal Large Language Model. arXiv'2023. Paper, GitHub
  • Vision Language Models for Vision Tasks: A Survey. arXiv'2023. Paper, GitHub
  • Efficient Large Language Models: A Survey. arXiv'2023. Paper, GitHub
  • Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. arXiv'2021. Paper
  • Safety of Multimodal Large Language Models on Images and Text. arXiv'2024. Paper
  • MM-LLMs: Recent Advances in MultiModal Large Language Models. arXiv'2024. Paper
  • A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications. arXiv'2024. Paper
  • A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly. arXiv'2024. Paper
  • Privacy in Large Language Models: Attacks, Defenses and Future Directions. arXiv'2023. Paper

Language Model

Foundation LM Models

  • Transformer: Attention Is All You Need. NeurIPS'2017. Paper
  • GPT-1: Improving Language Understanding by Generative Pre-Training. 2018. Paper
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL'2019. Paper
  • GPT-2: Language Models are Unsupervised Multitask Learners. 2019. Paper
  • RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv'2019. Paper
  • DistilBERT: a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv'2019. Paper
  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. JMLR'2020. Paper
  • GPT-3: Language Models are Few-Shot Learners. NeurIPS'2020. Paper
  • GLaM: Efficient Scaling of Language Models with Mixture-of-Experts. ICML'2022. Paper
  • PaLM: Scaling Language Modeling with Pathways. arXiv'2022. Paper
  • BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. arXiv'2022. Paper
  • BLOOMZ: Crosslingual Generalization through Multitask Finetuning. arXiv'2023. Paper
  • LLaMA: Open and Efficient Foundation Language Models. arXiv'2023. Paper
  • GPT-4: GPT-4 Technical Report. arXiv'2023. Paper
  • PaLM 2: PaLM 2 Technical Report. 2023. Paper
  • Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv'2023. Paper
  • Mistral: Mistral 7B. arXiv'2023. Paper
  • Phi-1: Project Link
  • Phi-1.5: Project Link
  • Phi-2: Project Link
  • Falcon: Project Link
  • Llama 3: The Llama 3 Herd of Models. arXiv'2024. Paper

RLHF

  • PPO: Proximal Policy Optimization Algorithms. arXiv'2017. Paper
  • DPO: Direct Preference Optimization: Your Language Model is Secretly a Reward Model. NeurIPS'2023. Paper
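
To make the DPO entry above concrete: for a single preference pair, DPO trains the policy directly on the log-probability margin over a frozen reference model, with no separate reward model. A rough sketch in plain Python (hypothetical log-probabilities, not the authors' code):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((logp_w - ref_w) - (logp_l - ref_l)))."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy matches the reference, the margin is 0 and the loss is log 2.
loss = dpo_loss(-5.0, -7.0, -5.0, -7.0)
assert abs(loss - math.log(2.0)) < 1e-12
```

The loss falls below log 2 as soon as the policy favors the chosen response relative to the reference; `beta` controls how sharply deviations from the reference are rewarded.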

Parameter Efficient Fine-tuning

  • LoRA: Low-Rank Adaptation of Large Language Models. arXiv'2021. Paper
  • QLoRA: Efficient Finetuning of Quantized LLMs. NeurIPS'2023. Paper
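
As a minimal illustration of the LoRA idea listed above (illustrative shapes and variable names, not the paper's code): the pretrained weight stays frozen, and only a low-rank pair of matrices is trained, scaled by alpha / r.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r, alpha = 8, 8, 2, 4

W = rng.standard_normal((d, k))          # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-init

def lora_forward(x):
    # y = x W^T + (alpha / r) * x (B A)^T ; only A and B receive gradients.
    return x @ W.T + (alpha / r) * (x @ (B @ A).T)

x = rng.standard_normal((1, k))
# With B zero-initialized, the adapted model starts identical to the base model.
assert np.allclose(lora_forward(x), x @ W.T)
```

The zero-initialized `B` is what makes fine-tuning start from the pretrained behavior; the trainable parameter count is r * (d + k) instead of d * k.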

Healthcare LM

  • Med-PaLM: Large Language Models Encode Clinical Knowledge. arXiv'2022. Paper
  • MedAlpaca: An Open-Source Collection of Medical Conversational AI Models and Training Data. arXiv'2023. Paper
  • Med-PaLM 2: Towards Expert-Level Medical Question Answering with Large Language Models. arXiv'2023. Paper
  • HuatuoGPT: HuatuoGPT, towards Taming Language Model to Be a Doctor. EMNLP'2023 (Findings). Paper
  • GPT-4-Med: Capabilities of GPT-4 on Medical Challenge Problems. arXiv'2023. Paper

Watermarking LLM

Prompt Engineering in LLM

Hard Prompt

  • PET: Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference. EACL'2021. Paper
  • Making Pre-trained Language Models Better Few-shot Learners. ACL'2021. Paper

Soft Prompt

  • Prompt-Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning. EMNLP'2021. Paper
  • Prefix-Tuning: Optimizing Continuous Prompts for Generation. ACL'2021. Paper
  • P-tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks. ACL'2022. Paper
  • P-tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks. arXiv'2022. Paper
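
The common mechanic behind the soft-prompt papers above can be sketched in a few lines (hypothetical dimensions, NumPy for illustration): a small matrix of trainable "virtual token" embeddings is prepended to the frozen input embeddings, and only that matrix is updated during tuning.

```python
import numpy as np

rng = np.random.default_rng(0)
n_prompt, seq_len, d_model = 4, 10, 16

prompt = rng.standard_normal((n_prompt, d_model))  # trainable soft prompt
tokens = rng.standard_normal((seq_len, d_model))   # frozen token embeddings

# The model sees [prompt; tokens]; gradients flow only into `prompt`.
inputs = np.concatenate([prompt, tokens], axis=0)
assert inputs.shape == (n_prompt + seq_len, d_model)
```

Prefix-Tuning and P-tuning v2 differ mainly in where the trainable vectors are injected (every layer's key/value states rather than only the input embeddings), but the frozen-backbone principle is the same.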

Between Soft and Hard

  • Auto-Prompt: AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts. EMNLP'2020. Paper
  • FluentPrompt: Toward Human Readable Prompt Tuning: Kubrick's The Shining is a good movie, and a good prompt too?. EMNLP'2023 (Findings). Paper
  • PEZ: Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery. arXiv'2023. Paper

Multi-modal Models

Foundation Multi-Modal Models

  • CLIP: Learning Transferable Visual Models From Natural Language Supervision. ICML'2021. Paper
  • DeCLIP: Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm. ICLR'2022. Paper
  • FILIP: Fine-grained Interactive Language-Image Pre-Training. ICLR'2022. Paper
  • Stable Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models. CVPR'2022. Paper
  • BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. ICML'2022. Paper
  • BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. ICML'2023. Paper
  • LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention. ICLR'2024. Paper
  • LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model. arXiv'2023. Paper
  • LLaVA: Visual Instruction Tuning. NeurIPS'2023. Paper
  • LLaVA 1.5: Improved Baselines with Visual Instruction Tuning. CVPR'2024. Paper
  • InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning. NeurIPS'2023. Paper
  • InternVL 1.0: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks. CVPR'2024 (Oral). Paper
  • InternVL 1.5: How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites. arXiv'2024. Tech Report

T2I Concept Removal or Safety

  • SLD: Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models. CVPR'2023. Paper
  • ESD: Erasing Concepts from Diffusion Models. ICCV'2023. Paper
  • UCE: Unified Concept Editing in Diffusion Models. arXiv'2023. Paper
  • POSI: Universal Prompt Optimizer for Safe Text-to-Image Generation. NAACL'2024. Paper
  • Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts. arXiv'2024. Paper, GitHub
  • EIUP: A Training-Free Approach to Erase Non-Compliant Concepts Conditioned on Implicit Unsafe Prompts. arXiv'2024. Paper

LVLM Hallucinations

  • POPE: Evaluating Object Hallucination in Large Vision-Language Models. EMNLP'2023. Paper
  • HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models. CVPR'2024. Paper

LVLM Adversarial Attack

  • On the Adversarial Robustness of Multi-Modal Foundation Models. ICCV Workshop'2023. Paper

LVLM Privacy

  • Does CLIP Know My Face? Journal of Artificial Intelligence Research. Paper
  • Membership Inference Attacks against Large Vision-Language Models. Paper

Prompt Engineering in VLM

AI for Science

  • GALLON: LLM and GNN are Complementary: Distilling LLM for Multimodal Graph Learning. arXiv'2024. Paper

Agent

LLM-based Agent

  • Stanford Town: Generative Agents: Interactive Simulacra of Human Behavior. UIST'2023. Paper

VLM-based Agent

  • OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments. arXiv'2024. Paper

Useful Resources
