    Repositories list

    • agentdojo

      Public
      A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
      Python
      9335844 Updated Oct 29, 2025
    • Jupyter Notebook
      0200 Updated Oct 15, 2025
    • Python
      0500 Updated Oct 13, 2025
    • Check for probably-hallucinated references in arxiv papers
      Python
      0100 Updated Sep 5, 2025
    • Python
      02200 Updated Aug 7, 2025
    • RealMath

      Public
      Python
      11500 Updated May 23, 2025
    • Python
      33310 Updated May 21, 2025
    • Core code for AgentDojo
      Python
      0000 Updated May 14, 2025
    • llm_lab

      Public
      Python
      0000 Updated Apr 15, 2025
    • Blind-MIA

      Public
      This is the official code for Blind Baselines Beat Membership Inference Attacks for Foundation Models
      Python
      0110 Updated Mar 29, 2025
    • 0000 Updated Feb 6, 2025
    • Repository for the VMI Summer Retreat Workshop on Hacking AI Agents
      Python
      0100 Updated Jan 18, 2025
    • Official code for "Measuring Non-Adversarial Reproduction of Training Data in Large Language Models" (https://arxiv.org/abs/2411.10242)
      Jupyter Notebook
      1800 Updated Nov 18, 2024
    • Python
      42500 Updated Oct 6, 2024
    • .github

      Public
      0000 Updated Jul 5, 2024
    • Python
      24500 Updated Jun 19, 2024
    • Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.
      Python
      911510 Updated Jun 13, 2024
    • Python
      0100 Updated Jun 13, 2024
    • Official code for "Evaluations of Machine Learning Privacy Defenses are Misleading" (https://arxiv.org/abs/2404.17399)
      Jupyter Notebook
      31000 Updated Apr 29, 2024
    • Playing around with the CC3M data
      Python
      0000 Updated Apr 29, 2024
    • Code for paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"
      Python
      96240 Updated Apr 24, 2024
    • Code for the paper "Evading Black-box Classifiers Without Breaking Eggs" [SaTML 2024]
      Python
      12100 Updated Apr 15, 2024
    • Data for "Quantifying Memorization Across Neural Language Models"
      0720 Updated Mar 26, 2024
    • Code used to run the platform for the LLM CTF colocated with SaTML 2024
      Python
      72700 Updated Mar 20, 2024
    • Python
      0100 Updated Nov 14, 2023
    • Python
      23000 Updated Jun 19, 2023
    • privacy

      Public
      Library for training machine learning models with privacy for training data
      Python
      469000 Updated Jun 13, 2023
    • Certified robustness "for free" using off-the-shelf diffusion models and classifiers
      Python
      44430 Updated May 25, 2023
    • Datasets for the SaTML 2023 competition on training data extraction
      0510 Updated Aug 24, 2022