Heretic

Heretic is an open-source tool that removes censorship from transformer-based language models. It uses directional ablation to decensor models automatically without expensive post-training.

At a glance

Pricing: Open Source

Free tier: Yes

API

Skill level: Technical

About

What is Heretic?

Heretic is an open-source tool for the fully automatic removal of censorship (also called "safety alignment") from transformer-based language models, without expensive post-training. It combines an advanced implementation of directional ablation with a TPE-based parameter optimizer powered by Optuna. The optimizer searches for high-quality ablation parameters by co-minimizing the refusal rate and the KL divergence from the original model, so that the decensored model retains as much of the original model's intelligence as possible. Heretic supports most dense models, including many multimodal models, and several different MoE architectures. It also offers research features for interpretability studies, such as plotting residual vectors and printing residual geometry details.
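To make the idea of directional ablation concrete, here is a minimal sketch in plain Python. It is not Heretic's code: the function names, the toy activations, and the `weight` parameter are all assumptions for illustration. It estimates a "refusal direction" as the normalized difference between mean activations on harmful and harmless prompts, then removes the component of a vector along that direction. Implementations of this technique typically apply the same projection to model weight matrices rather than to single vectors.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of the two mean activations."""
    harmful_mean = mean_vector(harmful_acts)
    harmless_mean = mean_vector(harmless_acts)
    diff = [a - b for a, b in zip(harmful_mean, harmless_mean)]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def ablate(vector, direction, weight=1.0):
    """Subtract `weight` times the component of `vector` along unit `direction`."""
    dot = sum(v * d for v, d in zip(vector, direction))
    return [v - weight * dot * d for v, d in zip(vector, direction)]


# Toy residual activations (hypothetical numbers, 3 dimensions for readability).
harmful = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
harmless = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

direction = refusal_direction(harmful, harmless)
ablated = ablate([5.0, 2.0, 1.0], direction)
print(direction)  # [1.0, 0.0, 0.0]
print(ablated)    # [0.0, 2.0, 1.0]
```

With `weight=1.0` the component along the direction is removed entirely. A smaller weight removes only part of it, which is the kind of parameter an optimizer could tune.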

Best used for

Ideal for AI researchers and developers who need to automatically remove censorship from transformer-based language models, evaluate safety alignment, or conduct interpretability research. Especially valuable for creating decensored models with minimal impact on the original model's intelligence.

Common actions

decensor language models

analyze model internals

optimize model parameters

research AI interpretability

Capabilities

Key features

Automatic censorship removal
Directional ablation implementation
TPE-based parameter optimizer
Model quantization support
Residual vector plotting
Residual geometry analysis
Supports diverse LLM architectures
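The two quantities that the parameter optimizer is described as co-minimizing can be sketched as follows. This is an illustration under stated assumptions, not Heretic's implementation: the refusal markers, the sample responses, the probability values, and the direction of the KL divergence (original relative to modified) are all hypothetical. No optimizer is run here; a multi-objective optimizer would propose ablation parameters and receive a pair of values like `objectives` for each trial.

```python
import math

# Hypothetical refusal markers; a real evaluation would need a more careful detector.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "sorry")


def refusal_rate(responses):
    """Fraction of responses that contain any refusal marker (case-insensitive)."""
    refused = sum(
        any(marker in response.lower() for marker in REFUSAL_MARKERS)
        for response in responses
    )
    return refused / len(responses)


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical next-token distributions from the original and the modified model.
original_probs = [0.7, 0.2, 0.1]
modified_probs = [0.6, 0.3, 0.1]

# Hypothetical responses from the modified model to a set of test prompts.
responses = [
    "Sorry, I can't help with that.",
    "Sure, here is an overview.",
    "I cannot assist.",
    "Here you go.",
]

# Both values should be as small as possible: few refusals, little drift.
objectives = (refusal_rate(responses), kl_divergence(original_probs, modified_probs))
print(objectives)
```

Lowering the refusal rate alone would reward damaging the model, and lowering the KL divergence alone would reward leaving it unchanged, which is why the two are minimized together.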

Target Audience

AI researcher, developer, professor

Integrations

Not yet documented

Pricing & Plans

Open Source: Free

FAQs

What types of language models does Heretic support for decensoring?

Heretic supports most dense models, including many multimodal models, and several different MoE architectures. However, it does not yet support SSMs/hybrid models, models with inhomogeneous layers, or certain novel attention systems.

How long does it take to decensor a model using Heretic?

The time required varies by model size and hardware. For example, decensoring Llama-3.1-8B-Instruct takes about 45 minutes on an RTX 3090 with default settings. Heretic benchmarks the system to optimize batch size for available hardware.

Can Heretic be used for research into language model interpretability?

Yes, Heretic includes optional research features. By installing with the 'research' extra, users can generate plots of residual vectors and print detailed residual geometry metrics, aiding in the analysis of model internals and semantics.
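As a rough illustration of what residual geometry metrics can look like, the sketch below compares the mean residual vectors for harmful and harmless prompts at each layer. The metric names, the data layout, and the numbers are assumptions for illustration and do not reproduce Heretic's actual output.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))


def layer_geometry(harmful, harmless):
    """Compare the mean harmful and mean harmless residuals of one layer."""
    harmful_mean = mean_vector(harmful)
    harmless_mean = mean_vector(harmless)
    diff = [a - b for a, b in zip(harmful_mean, harmless_mean)]
    return {
        "cos_means": cosine_similarity(harmful_mean, harmless_mean),
        "norm_diff": norm(diff),
    }


# Hypothetical residuals: layer index -> (harmful prompts, harmless prompts).
residuals = {
    0: ([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]]),
    1: ([[0.0, 2.0], [0.0, 4.0]], [[3.0, 0.0], [3.0, 0.0]]),
}

report = {
    layer: layer_geometry(harmful, harmless)
    for layer, (harmful, harmless) in residuals.items()
}
for layer, metrics in report.items():
    print(f"layer {layer}: cos={metrics['cos_means']:.3f} |diff|={metrics['norm_diff']:.3f}")
```

In this toy data the two prompt groups are indistinguishable at layer 0 and orthogonal at layer 1, the kind of layer-by-layer separation such metrics are meant to expose.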
