llm-attacks

llm-attacks is an open-source repository for universal and transferable adversarial attacks on aligned language models. It provides a fast and easy-to-use implementation of the GCG algorithm for jailbreaking language models.

At a glance

Pricing: Open Source
Free tier: Yes
API: not listed
Skill level: Technical

About

What is llm-attacks?

llm-attacks is an open-source repository for researching and implementing universal and transferable adversarial attacks on aligned language models. It features nanogcg, a fast and easy-to-use implementation of the GCG (Greedy Coordinate Gradient) algorithm that can be installed via pip. The repository includes a notebook demo that attacks LLaMA-2 with GCG, offered as a minimal implementation for getting familiar with the method. Researchers can use the provided scripts to reproduce the GCG experiments on AdvBench, including individual, multiple-behavior, and transfer experiments. The experiments cover models such as Vicuna-7B and LLaMA-2-7B-Chat, which makes the repository useful for evaluating and improving the robustness of language models against adversarial prompts.
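To make the idea behind GCG (Greedy Coordinate Gradient) concrete, here is a toy sketch in Python. It is not code from the repository or from nanogcg, and it does not touch any language model: the "loss" is simply the squared distance between a sum of made-up token embeddings and a made-up target vector. All names (`loss`, `one_hot_gradient`, `greedy_coordinate_step`, `run`) are hypothetical. What it does illustrate is the general shape of the search: use a gradient with respect to one-hot token choices to shortlist promising replacements, score the shortlisted single-token swaps exactly, and greedily keep the best one.

```python
import random


def loss(tokens, emb, target):
    """Toy objective: squared distance between summed embeddings and target."""
    dim = len(target)
    total = [sum(emb[t][d] for t in tokens) for d in range(dim)]
    return sum((total[d] - target[d]) ** 2 for d in range(dim))


def one_hot_gradient(tokens, emb, target):
    """Gradient of the toy loss w.r.t. selecting each vocabulary entry.

    Because this toy loss is additive over positions, the gradient is the
    same at every position. In a real model it differs per position.
    """
    dim = len(target)
    total = [sum(emb[t][d] for t in tokens) for d in range(dim)]
    resid = [2.0 * (total[d] - target[d]) for d in range(dim)]
    return [sum(resid[d] * e[d] for d in range(dim)) for e in emb]


def greedy_coordinate_step(tokens, emb, target, top_k):
    """One step: shortlist by gradient, score swaps exactly, keep the best."""
    grad = one_hot_gradient(tokens, emb, target)
    # The most negative gradient entries are the most promising replacements.
    shortlist = sorted(range(len(emb)), key=lambda v: grad[v])[:top_k]
    best, best_loss = tokens, loss(tokens, emb, target)
    # Assumption: the toy is small enough to try every shortlisted swap;
    # larger settings would sample a random batch of candidates instead.
    for pos in range(len(tokens)):
        for v in shortlist:
            cand = tokens[:pos] + [v] + tokens[pos + 1:]
            cand_loss = loss(cand, emb, target)
            if cand_loss < best_loss:
                best, best_loss = cand, cand_loss
    return best, best_loss


def run(steps=20, seed=0, vocab=12, dim=3, length=4, top_k=4):
    """Run the toy search and return the final tokens and the loss history."""
    rng = random.Random(seed)
    emb = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab)]
    target = [rng.uniform(-1, 1) for _ in range(dim)]
    tokens = [rng.randrange(vocab) for _ in range(length)]
    history = [loss(tokens, emb, target)]
    for _ in range(steps):
        tokens, current = greedy_coordinate_step(tokens, emb, target, top_k)
        history.append(current)
    return tokens, history
```

Calling `run()` returns the final token list and a loss history that never increases, because a swap is kept only when it strictly lowers the loss. The gradient is used only to narrow the candidate set; the exact loss decides which swap wins, since a first-order estimate over discrete choices can be misleading.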

Best used for

Ideal for developers and data scientists who research adversarial attacks on aligned language models, test model robustness, or want to reproduce the GCG experiments. It is especially valuable for those working with LLaMA-2 and Vicuna models who want to understand their vulnerabilities.

Common actions

evaluate language models

research adversarial attacks

test model robustness

implement GCG algorithm

Capabilities

Key features

Universal adversarial attacks
Transferable adversarial attacks
GCG algorithm implementation
AdvBench experiment scripts
LLaMA-2, Vicuna support

Target Audience

Developer, data scientist

Integrations

Not yet documented

Pricing & Plans

Open Source

Free

FAQs

What is nanogcg and how is it used?

nanogcg is a fast and easy-to-use implementation of the GCG (Greedy Coordinate Gradient) algorithm. It can be installed via pip and is used within llm-attacks to run adversarial attacks, for example the demo that jailbreaks LLaMA-2 into generating harmful completions.

Which language models are supported by llm-attacks?

The llm-attacks repository primarily supports LLaMA- or Pythia-based models. It has been tested with Vicuna-7B and LLaMA-2-7B-Chat. Users set the paths to their downloaded models and tokenizers in the experiment configuration files.
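As a rough illustration of what "configuring paths" can look like, the sketch below defines a small configuration object in Python. It is an assumption-laden stand-in, not the repository's actual configuration format: the class `ExperimentConfig`, its fields, the helper `make_config`, and the placeholder path are all hypothetical. The one idea it shows is that every model path should have a matching tokenizer path.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExperimentConfig:
    """Hypothetical container for model and tokenizer locations."""

    model_paths: List[str] = field(default_factory=list)
    tokenizer_paths: List[str] = field(default_factory=list)

    def validate(self) -> None:
        # Each model needs a tokenizer, so the two lists must line up.
        if not self.model_paths:
            raise ValueError("at least one model path is required")
        if len(self.model_paths) != len(self.tokenizer_paths):
            raise ValueError("each model needs a matching tokenizer path")


def make_config(model_dir: str) -> ExperimentConfig:
    """Build a single-model config, assuming the tokenizer sits beside the model."""
    cfg = ExperimentConfig(model_paths=[model_dir], tokenizer_paths=[model_dir])
    cfg.validate()
    return cfg
```

For example, `make_config("/path/to/vicuna-7b")` returns a validated config for one locally downloaded model; the path is a placeholder to replace with your own directory.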

What kind of experiments can be run with llm-attacks?

llm-attacks allows users to reproduce GCG experiments on AdvBench. This includes individual experiments with single behaviors or strings, multiple behavior experiments (e.g., 25 behaviors on one model), and transfer experiments across multiple models.
