Concept Steerers — SAE Steering Vector Suppression
Overview
Concept Steerers apply sparse autoencoder (SAE) steering at inference time through the text encoder (CLIP layer 9). For each generation, the pipeline first derives a steering direction by running the concept string through the text encoder and encoding the resulting activations with the SAE. During each denoising step, this direction is then added to the conditional text embedding and subtracted from the unconditional one, biasing classifier-free guidance away from the concept.
Because the steering direction is computed on-the-fly from the concept string, any concept is supported — no pre-built steering vectors or concept-specific checkpoints are needed. The SAE checkpoint is bundled with the package and loaded automatically.
A positive multiplier suppresses the concept; a negative value amplifies it.
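The embedding shift described above can be sketched in a few lines. This is an illustration under stated assumptions, not the package's implementation: the function name `steer_embeddings` is hypothetical, plain Python lists stand in for tensors, and `multiplier` is assumed to scale the direction linearly before it is applied. How the package defines the sign of the direction itself is internal and is not modelled here.

```python
from typing import List, Tuple

Vector = List[float]


def steer_embeddings(
    cond: Vector,
    uncond: Vector,
    direction: Vector,
    multiplier: float = 1.0,
) -> Tuple[Vector, Vector]:
    """Shift the conditional and unconditional embeddings in opposite directions.

    Assumption: the multiplier scales the steering direction linearly.
    """
    if not (len(cond) == len(uncond) == len(direction)):
        raise ValueError("cond, uncond and direction must have the same length")
    # Direction is added to the conditional embedding ...
    steered_cond = [c + multiplier * d for c, d in zip(cond, direction)]
    # ... and subtracted from the unconditional one.
    steered_uncond = [u - multiplier * d for u, d in zip(uncond, direction)]
    return steered_cond, steered_uncond


steered_cond, steered_uncond = steer_embeddings(
    cond=[1.0, 2.0],
    uncond=[0.5, 0.5],
    direction=[0.25, -0.5],
    multiplier=2.0,
)
print(steered_cond, steered_uncond)
```

Flipping the sign of `multiplier` reverses both shifts, which matches the suppress/amplify behaviour described above.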
Base model: CompVis/stable-diffusion-v1-4
Supported concepts: any string
Compatible metrics
| Metric | Compatible | Notes |
|---|---|---|
| ASR I2P | Yes | Any I2P concept |
| ERR | Yes | erase_concept="nudity" required (ERR is nudity-specific) |
| FID | Yes | General image quality |
| CLIP Score | Yes | General text-image alignment |
| UA_IRA | Yes | Requires custom prompt CSVs |
| TIFA | Yes | General faithfulness |
| ASR Custom | Yes | Concept-agnostic via CLIP |
| MMA-Diffusion | Yes | |
Configuration reference
| Field | Type | Default | Description |
|---|---|---|---|
| erase_concept | str | "nudity" | The concept to suppress. Any string is valid; the steering direction is derived at runtime by encoding this string through the text encoder and SAE. |
| multiplier | float | 1.0 | Steering strength. Positive values suppress the concept. Increase to strengthen erasure; values that are too high may degrade generation quality. |
| num_inference_steps | int | 50 | DDIM steps for image generation during evaluation. |
| guidance_scale | float | 7.5 | CFG guidance scale for generation. Must be > 1.0. |
| use_fp16 | bool | True | Run in half precision. |
| device | str \| None | None | Device to run on. Auto-detects CUDA, then MPS, then CPU if None. |
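The fields above can be mirrored in a small validation helper. This is a sketch only: the class name `ConceptSteerersConfig` and the `resolve_device` helper are hypothetical and not part of the package. The defaults and the `guidance_scale > 1.0` constraint are copied from the table; the device fallback assumes PyTorch's `torch.cuda.is_available()` and `torch.backends.mps.is_available()` checks and degrades to CPU when PyTorch is not installed.

```python
from dataclasses import dataclass
from typing import Optional


def resolve_device(device: Optional[str] = None) -> str:
    """Return the explicit device, else probe CUDA, then MPS, then fall back to CPU."""
    if device is not None:
        return device
    try:
        import torch  # optional in this sketch; without it we fall back to CPU
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


@dataclass
class ConceptSteerersConfig:
    """Hypothetical mirror of the configuration table; defaults copied from it."""

    erase_concept: str = "nudity"
    multiplier: float = 1.0
    num_inference_steps: int = 50
    guidance_scale: float = 7.5
    use_fp16: bool = True
    device: Optional[str] = None

    def __post_init__(self) -> None:
        # Constraint stated in the table.
        if self.guidance_scale <= 1.0:
            raise ValueError("guidance_scale must be > 1.0")
        # Assumption of this sketch: at least one denoising step is required.
        if self.num_inference_steps < 1:
            raise ValueError("num_inference_steps must be >= 1")
        self.device = resolve_device(self.device)


config = ConceptSteerersConfig(erase_concept="violence", multiplier=5.0, device="cpu")
print(config)
```

With the standard CFG combination `uncond + guidance_scale * (cond - uncond)`, a scale of 1.0 reduces to the conditional prediction alone, so the unconditional branch would have no effect; that is presumably why values at or below 1.0 are rejected.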
Warnings
multiplier tuning
The default multiplier=1.0 is conservative. If ASR remains high, increase it (e.g., to 5.0 or 10.0). Very large values can push the text embeddings far enough from their original distribution to visibly degrade image quality, so monitor FID and CLIP Score alongside ASR when tuning.
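One way to follow this advice is to generate one run configuration per candidate multiplier and compare ASR against FID and CLIP Score across the runs. The sketch below only builds and writes the JSON files, mirroring the structure of the examples in this document. The helper names `build_sweep_configs` and `write_configs`, the run naming scheme, and the chosen multiplier values are illustrative assumptions; how the files are passed to the benchmark runner is left out.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def build_sweep_configs(
    concept: str, multipliers: List[float], device: str = "cuda"
) -> Dict[str, dict]:
    """Build one run configuration per multiplier (hypothetical helper)."""
    configs: Dict[str, dict] = {}
    for m in multipliers:
        run_name = f"concept_steerers_{concept}_m{m:g}"
        configs[run_name] = {
            "output_dir": f"results/{run_name}",
            "technique": {
                "name": "concept_steerers",
                "config": {"erase_concept": concept, "multiplier": m, "device": device},
            },
            # Track erasure strength (ASR) together with quality (FID, CLIP Score).
            "metrics": [
                {"name": "asr_i2p", "config": {"device": device, "limit": 500}},
                {"name": "fid", "config": {"device": device, "limit": 1000}},
                {"name": "clip_score", "config": {"device": device, "limit": 300}},
            ],
        }
    return configs


def write_configs(configs: Dict[str, dict], out_dir: Path) -> List[Path]:
    """Write each configuration to <out_dir>/<run_name>.json and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for run_name, cfg in configs.items():
        path = out_dir / f"{run_name}.json"
        path.write_text(json.dumps(cfg, indent=2))
        paths.append(path)
    return paths


sweep = build_sweep_configs("nudity", [1.0, 5.0, 10.0])
with tempfile.TemporaryDirectory() as tmp:
    written = write_configs(sweep, Path(tmp))
    written_names = [p.name for p in written]
print(written_names)
```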
Examples
Single metric — ASR
{
  "output_dir": "results/concept_steerers_asr",
  "technique": {
    "name": "concept_steerers",
    "config": {
      "erase_concept": "nudity",
      "multiplier": 1.0,
      "device": "cuda"
    }
  },
  "metric": {
    "name": "asr_i2p",
    "config": {
      "device": "cuda",
      "limit": 500
    }
  }
}
Single metric — with stronger steering
{
  "output_dir": "results/concept_steerers_strong",
  "technique": {
    "name": "concept_steerers",
    "config": {
      "erase_concept": "nudity",
      "multiplier": 5.0,
      "device": "cuda"
    }
  },
  "metric": {
    "name": "clip_score",
    "config": {
      "device": "cuda",
      "limit": 300
    }
  }
}
Multiple metrics — nudity full benchmark
{
  "output_dir": "results/concept_steerers_nudity_multi",
  "technique": {
    "name": "concept_steerers",
    "config": {
      "erase_concept": "nudity",
      "multiplier": 1.0,
      "device": "cuda",
      "num_inference_steps": 50,
      "guidance_scale": 7.5
    }
  },
  "metrics": [
    { "name": "asr_i2p", "config": { "device": "cuda", "limit": 500 } },
    { "name": "err", "config": { "device": "cuda", "target_limit": 50, "retain_limit": 20, "adversarial_limit": 50 } },
    { "name": "fid", "config": { "device": "cuda", "limit": 1000 } },
    { "name": "clip_score", "config": { "device": "cuda", "limit": 300 } },
    {
      "name": "ua_ira",
      "config": {
        "target_prompts_path": "data/nudity_target_prompts.csv",
        "retain_prompts_path": "data/nudity_retain_prompts.csv",
        "target_concept": "nudity",
        "retain_concept": "person",
        "device": "cuda"
      }
    },
    { "name": "tifa", "config": { "device": "cuda", "limit": 200 } }
  ]
}
Multiple metrics — violence benchmark
{
  "output_dir": "results/concept_steerers_violence_multi",
  "technique": {
    "name": "concept_steerers",
    "config": {
      "erase_concept": "violence",
      "multiplier": 1.0,
      "device": "cuda"
    }
  },
  "metrics": [
    { "name": "asr_p4d", "config": { "concept_name": "violence", "detector": "q16", "device": "cuda", "limit": 500 } },
    { "name": "fid", "config": { "device": "cuda", "limit": 1000 } },
    { "name": "clip_score", "config": { "device": "cuda", "limit": 300 } }
  ]
}
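The examples use two shapes: a single `metric` object or a `metrics` list. When post-processing run configurations, it can help to normalize both into one list. The helper below is a hypothetical sketch (the name `list_metrics` is not part of the package), and it assumes that only these two keys are used, which is inferred from the examples rather than from a documented schema.

```python
import json
from typing import List


def list_metrics(run_config: dict) -> List[dict]:
    """Return the metric entries of a run configuration as a list.

    Accepts either a single "metric" object or a "metrics" list
    (assumption inferred from the examples in this document).
    """
    has_single = "metric" in run_config
    has_many = "metrics" in run_config
    if has_single and has_many:
        raise ValueError('use either "metric" or "metrics", not both')
    if has_single:
        return [run_config["metric"]]
    if has_many:
        return list(run_config["metrics"])
    raise ValueError('run configuration has neither "metric" nor "metrics"')


single = json.loads("""
{
  "output_dir": "results/concept_steerers_asr",
  "technique": {"name": "concept_steerers", "config": {"erase_concept": "nudity"}},
  "metric": {"name": "asr_i2p", "config": {"device": "cuda", "limit": 500}}
}
""")
multi = {
    "output_dir": "results/concept_steerers_violence_multi",
    "technique": {"name": "concept_steerers", "config": {"erase_concept": "violence"}},
    "metrics": [
        {"name": "fid", "config": {"device": "cuda", "limit": 1000}},
        {"name": "clip_score", "config": {"device": "cuda", "limit": 300}},
    ],
}

single_names = [m["name"] for m in list_metrics(single)]
multi_names = [m["name"] for m in list_metrics(multi)]
print(single_names, multi_names)
```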