SPY Lab

29 repositories

agentdojo
Public
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
security benchmark large-language-models prompt-injection
Python • MIT License • 93 • 358 • 4 • 4 • Updated Oct 29, 2025
modal-aphasia
Public
Jupyter Notebook • 0 • 2 • 0 • 0 • Updated Oct 15, 2025
infoseclab_25
Public
Python • 0 • 5 • 0 • 0 • Updated Oct 13, 2025
hallucinated-citations
Public
Check for probably-hallucinated references in arxiv papers
Python • MIT License • 0 • 1 • 0 • 0 • Updated Sep 5, 2025
jailbreak-tax
Public
Python • 0 • 22 • 0 • 0 • Updated Aug 7, 2025
RealMath
Public
Python • 1 • 15 • 0 • 0 • Updated May 23, 2025
autoadvexbench
Public
Python • 3 • 33 • 1 • 0 • Updated May 21, 2025
agentdojo-core
Public
Core code for AgentDojo
Python • MIT License • 0 • 0 • 0 • 0 • Updated May 14, 2025
llm_lab
Public
Python • 0 • 0 • 0 • 0 • Updated Apr 15, 2025
Blind-MIA
Public
This is the official code for Blind Baselines Beat Membership Inference Attacks for Foundation Models
Python • 0 • 1 • 1 • 0 • Updated Mar 29, 2025
camel-prompt-injection
Public
0 • 0 • 0 • 0 • Updated Feb 6, 2025
vmi-retreat-workshop-2024
Public
Repository for the VMI Summer Retreat Workshop on Hacking AI Agents
Python • MIT License • 0 • 1 • 0 • 0 • Updated Jan 18, 2025
non-adversarial-reproduction
Public
Official code for "Measuring Non-Adversarial Reproduction of Training Data in Large Language Models" (https://arxiv.org/abs/2411.10242)
Jupyter Notebook • 1 • 8 • 0 • 0 • Updated Nov 18, 2024
unlearning-vs-safety
Public
Python • 4 • 25 • 0 • 0 • Updated Oct 6, 2024
.github
Public
0 • 0 • 0 • 0 • Updated Jul 5, 2024
robust-style-mimicry
Public
Python • MIT License • 2 • 45 • 0 • 0 • Updated Jun 19, 2024
rlhf_trojan_competition
Public
Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.
Python • Apache License 2.0 • 9 • 115 • 1 • 0 • Updated Jun 13, 2024
ctf-satml24-data-analysis
Public
Python • 0 • 1 • 0 • 0 • Updated Jun 13, 2024
misleading-privacy-evals
Public
Official code for "Evaluations of Machine Learning Privacy Defenses are Misleading" (https://arxiv.org/abs/2404.17399)
Jupyter Notebook • 3 • 10 • 0 • 0 • Updated Apr 29, 2024
data-decay
Public
Playing around with the CC3M data
Python • 0 • 0 • 0 • 0 • Updated Apr 29, 2024
rlhf-poisoning
Public
Code for paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"
Python • Apache License 2.0 • 9 • 62 • 4 • 0 • Updated Apr 24, 2024
realistic-adv-examples
Public
Code for the paper "Evading Black-box Classifiers Without Breaking Eggs" [SaTML 2024]
Python • MIT License • 1 • 21 • 0 • 0 • Updated Apr 15, 2024
lm_memorization_data
Public
Data for "Quantifying Memorization Across Neural Language Models"
Apache License 2.0 • 0 • 7 • 2 • 0 • Updated Mar 26, 2024
satml-llm-ctf
Public
Code used to run the platform for the LLM CTF colocated with SaTML 2024
Python • MIT License • 7 • 27 • 0 • 0 • Updated Mar 20, 2024
infoseclab_23
Public
Python • 0 • 1 • 0 • 0 • Updated Nov 14, 2023
superhuman-ai-consistency
Public
Python • MIT License • 2 • 30 • 0 • 0 • Updated Jun 19, 2023
privacy
Public
Library for training machine learning models with privacy for training data
Python • Apache License 2.0 • 469 • 0 • 0 • 0 • Updated Jun 13, 2023
diffusion_denoised_smoothing
Public
Certified robustness "for free" using off-the-shelf diffusion models and classifiers
Python • MIT License • 4 • 44 • 3 • 0 • Updated May 25, 2023
lm-extraction-benchmark-data
Public
Datasets for the SaTML 2023 competition on training data extraction
Apache License 2.0 • 0 • 5 • 1 • 0 • Updated Aug 24, 2022