This repo collects papers and related resources on large language models and multi-modal models, from foundation papers to downstream topics such as trustworthiness (e.g., robustness, privacy, and fairness) and agents.
Note: it only records papers for my personal needs :). Feel free to open an issue if you think I missed important or exciting work!
- Text-to-Image Synthesis: A Decade Survey. Arxiv'2024. Paper
- Adversarial Attacks and Defenses on Text-to-Image Diffusion Models: A Survey. Information Fusion'2025. Paper, GitHub
- A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends. Arxiv'2024. Paper, GitHub
- Holistic Evaluation of Language Models. TMLR. Paper
- Holistic Evaluation of Text-to-Image Models. NeurIPS'2023. Paper
- A Survey on Evaluation of Large Language Models. Arxiv'2023. Paper
- A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics. Arxiv'2023. Paper, GitHub
- A Survey on Multimodal Large Language Model. Arxiv'2023. Paper, GitHub
- Vision Language Models for Vision Tasks: A Survey. Arxiv'2023. Paper, GitHub
- Efficient Large Language Models: A Survey. Arxiv'2023. Paper, GitHub
- Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. Arxiv'2021. Paper
- Safety of Multimodal Large Language Models on Images and Text. Arxiv'2024. Paper
- MM-LLMs: Recent Advances in MultiModal Large Language Models. Arxiv'2024. Paper
- A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications. Arxiv'2024. Paper
- A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly. Arxiv'2024. Paper
- Privacy in Large Language Models: Attacks, Defenses and Future Directions. Arxiv'2023. Paper
- Transformer: Attention Is All You Need. NIPS'2017. Paper
- GPT-1: Improving Language Understanding by Generative Pre-Training. 2018. Paper
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL'2019. Paper
- GPT-2: Language Models are Unsupervised Multitask Learners. 2019. Paper
- RoBERTa: A Robustly Optimized BERT Pretraining Approach. Arxiv'2019. Paper
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Arxiv'2019. Paper
- T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. JMLR'2020. Paper
- GPT-3: Language Models are Few-Shot Learners. NeurIPS'2020. Paper
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts. ICML'2022. Paper
- PaLM: Scaling Language Modeling with Pathways. Arxiv'2022. Paper
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. Arxiv'2022. Paper
- BLOOMZ: Crosslingual Generalization through Multitask Finetuning. Arxiv'2023. Paper
- LLaMA: Open and Efficient Foundation Language Models. Arxiv'2023. Paper
- GPT-4: GPT-4 Technical Report. Arxiv'2023. Paper
- PaLM 2: PaLM 2 Technical Report. 2023. Paper
- Llama 2: Open Foundation and Fine-Tuned Chat Models. Arxiv'2023. Paper
- Mistral: Mistral 7B. Arxiv'2023. Paper
- Phi1: Project Link
- Phi1.5: Project Link
- Phi2: Project Link
- Falcon: Project Link
- Llama 3: The Llama 3 Herd of Models. Arxiv'2024. Paper
- PPO: Proximal Policy Optimization Algorithms. Arxiv'2017. Paper
- DPO: Direct Preference Optimization: Your Language Model is Secretly a Reward Model. NeurIPS'2023. Paper
- LoRA: LoRA: Low-Rank Adaptation of Large Language Models. Arxiv'2021. Paper
- QLoRA: Efficient Finetuning of Quantized LLMs. NeurIPS'2023. Paper
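As a quick reference for the PEFT entries above, LoRA's core idea is to freeze the pretrained weight and learn only a low-rank additive update. The sketch below is a minimal, hypothetical illustration (shapes, rank, and the `lora_forward` helper are assumptions, not from any specific implementation):

```python
import numpy as np

# Minimal LoRA sketch (hypothetical shapes and names).
# A frozen weight W is augmented with a trainable low-rank delta B @ A,
# where rank r << min(d_in, d_out), so only r * (d_in + d_out) params train.
d_in, d_out, r = 64, 64, 4
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01     # trainable, small random init
B = np.zeros((d_out, r))                  # trainable, zero init => delta starts at 0

def lora_forward(x, scale=1.0):
    # y = W x + scale * (B A) x; during finetuning only A and B get gradients
    return W @ x + scale * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# Because B is zero-initialized, the adapted layer matches the base layer at init.
assert np.allclose(lora_forward(x), W @ x)
```

QLoRA (the entry above) keeps the same low-rank update but stores the frozen base weight in a quantized format to cut memory.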
- Med-PaLM: Large Language Models Encode Clinical Knowledge. Arxiv'2022. Paper
- MedAlpaca: MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data. Arxiv'2023. Paper
- Med-PaLM 2: Towards Expert-Level Medical Question Answering with Large Language Models. Arxiv'2023. Paper
- HuatuoGPT: HuatuoGPT, towards Taming Language Model to Be a Doctor. EMNLP'2023 (findings). Paper
- GPT-4-Med: Capabilities of GPT-4 on Medical Challenge Problems. Arxiv'2023. Paper
- PET: Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference. EACL'2021. Paper
- Making Pre-trained Language Models Better Few-shot Learners. ACL'2021. Paper
- Prompt-Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning. EMNLP'2021. Paper
- Prefix-Tuning: Prefix-Tuning: Optimizing Continuous Prompts for Generation. ACL'2021. Paper
- P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks. ACL'2022. Paper
- P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks. Arxiv'2022. Paper
- AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts. EMNLP'2020. Paper
- FluentPrompt: Toward Human Readable Prompt Tuning: Kubrick's The Shining is a good movie, and a good prompt too?. EMNLP'2023 (findings). Paper
- PEZ: Hard prompts made easy: Gradient-based discrete optimization for prompt tuning and discovery. Arxiv'2023. Paper
- CLIP: Learning Transferable Visual Models From Natural Language Supervision. ICML'2021. Paper
- DeCLIP: Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm. ICLR'2022. Paper
- FILIP: Fine-grained Interactive Language-Image Pre-Training. ICLR'2022. Paper
- Stable Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models. CVPR'2022. Paper
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. ICML'2022. Paper
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. ICML'2023. Paper
- LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention. ICLR'2024. Paper
- LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model. Arxiv'2023. Paper
- LLaVA: Visual Instruction Tuning. NeurIPS'2023. Paper
- LLaVA 1.5: Improved Baselines with Visual Instruction Tuning. CVPR'2024. Paper
- InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning. NeurIPS'2023. Paper
- InternVL 1.0: InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks. CVPR'2024 (Oral). Paper
- InternVL 1.5: How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites. Arxiv'2024. Tech Report
- SLD: Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models. CVPR'2023. Paper
- ESD: Erasing Concepts from Diffusion Models. ICCV'2023. Paper
- UCE: Unified Concept Editing in Diffusion Models. Arxiv'2023. Paper
- POSI: Universal Prompt Optimizer for Safe Text-to-Image Generation. NAACL'2024. Paper
- Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts. Arxiv'2024. Paper, GitHub
- EIUP: A Training-Free Approach to Erase Non-Compliant Concepts Conditioned on Implicit Unsafe Prompts. Arxiv'2024. Paper
- POPE: Evaluating Object Hallucination in Large Vision-Language Models. EMNLP'2023. Paper
- HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models. CVPR'2024. Paper
- On the Adversarial Robustness of Multi-Modal Foundation Models. ICCV Workshop'2023. Paper
- Does CLIP Know My Face? Journal of Artificial Intelligence Research. Paper
- Membership Inference Attacks against Large Vision-Language Models. Paper
- GALLON: LLM and GNN are Complementary: Distilling LLM for Multimodal Graph Learning. Arxiv'2024. Paper
- Stanford Town: Generative Agents: Interactive Simulacra of Human Behavior. UIST'2023. Paper
- OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments. Arxiv'2024. Paper
- Hugging Face course. https://huggingface.co/learn
- LLaMA Factory. https://github.com/hiyouga/LLaMA-Factory
- DeepSpeed. https://github.com/microsoft/DeepSpeed
- trlx. https://github.com/CarperAI/trlx
- PromptPapers. https://github.com/thunlp/PromptPapers