Skip to content

SAFREE — Training-Free Semantic Filtering

Overview

SAFREE (Semantic Approach to Free-up Representations) is a training-free technique — it modifies no model weights. All filtering happens at inference time through three sequential stages applied during the diffusion process:

  1. Text Projection (Stage 1): Scales the concept's text embedding by alpha to reduce its influence on cross-attention.

  2. Self-Validation Filter (Stage 2, SVF): At denoising timesteps above upperbound_timestep, intercepts the latent activations and suppresses features associated with the target concept. Controlled by enable_svf.

  3. Latent Re-Attention (Stage 3, LRA): Applies FreeU-style frequency filtering to the UNet's skip connections and backbone features using the freeu_* parameters. Controlled by enable_lra.

Because SAFREE is training-free, it is fast to initialise. However, it only filters at runtime and cannot guarantee concept removal across all possible prompts.

Base model: CompVis/stable-diffusion-v1-4
Supported concepts: nudity, artists-VanGogh, artists-KellyMcKernan (named calibrated); any concept via custom_unsafe_concepts


Compatible metrics

Metric Compatible Notes
ASR I2P Yes
ERR Yes erase_concept="nudity" required (ERR is nudity-specific)
FID Yes General image quality
CLIP Score Yes General text-image alignment
UA_IRA Yes Requires custom prompt CSVs
TIFA Yes General faithfulness
ASR Custom Yes Concept-agnostic via CLIP
MMA-Diffusion Yes Nudity-specific by default

Configuration reference

Field Type Default Description
erase_concept str "nudity" Named concept category. Must be one of the SVF-calibrated concepts (nudity, artists-VanGogh, artists-KellyMcKernan) unless custom_unsafe_concepts is also set.
alpha float 0.01 Stage 1: scaling factor applied to the concept text embedding. Values near 0 suppress the concept text signal strongly.
enable_svf bool True Enable Stage 2 Self-Validation Filter.
upperbound_timestep int 10 Stage 2: only apply SVF at timesteps above this value. Lower values = SVF active for more of the denoising trajectory.
enable_lra bool True Enable Stage 3 Latent Re-Attention.
lra_filter_type str "high" Stage 3: frequency filter type. "high" suppresses high-frequency components, "low" suppresses low-frequency, "all" applies to all.
freeu_b1 float 1.0 FreeU backbone scaling factor for block 1.
freeu_b2 float 1.0 FreeU backbone scaling factor for block 2.
freeu_s1 float 0.9 FreeU skip-connection scaling factor for block 1.
freeu_s2 float 0.2 FreeU skip-connection scaling factor for block 2.
re_attn_timestep_range [int, int] [-1, 1001] Fallback timestep range for re-attention when SVF is disabled.
use_fp16 bool True Run in half precision.
device str "cuda" Device to run on.

Warnings

SVF calibration

SVF (Stage 2) is only calibrated for nudity, artists-VanGogh, and artists-KellyMcKernan. For any other concept, pass custom_unsafe_concepts=['phrase1', ...] — SVF will be disabled automatically and Stage 3 LRA will handle suppression. Using erase_concept with an uncalibrated concept name without custom_unsafe_concepts raises a ValueError.

All three stages are cooperative

Disabling SVF or LRA (enable_svf=false, enable_lra=false) weakens erasure significantly. The three stages are designed to work together. Disable individual stages only for ablation experiments.

alpha near 1.0

Setting alpha close to 1.0 makes Stage 1 a no-op (no scaling of the concept embedding). This is fine for testing Stages 2 and 3 in isolation, but provides effectively no text-projection filtering.


Examples

Single metric — ASR

{
  "output_dir": "results/safree_asr",
  "technique": {
    "name": "safree",
    "config": {
      "erase_concept": "nudity",
      "alpha": 0.01,
      "enable_svf": true,
      "enable_lra": true,
      "device": "cuda"
    }
  },
  "metric": {
    "name": "asr_i2p",
    "config": {
      "device": "cuda",
      "limit": 500
    }
  }
}

Multiple metrics — nudity full benchmark

{
  "output_dir": "results/safree_nudity_multi",
  "technique": {
    "name": "safree",
    "config": {
      "erase_concept": "nudity",
      "alpha": 0.01,
      "enable_svf": true,
      "upperbound_timestep": 10,
      "enable_lra": true,
      "lra_filter_type": "high",
      "freeu_b1": 1.0,
      "freeu_b2": 1.0,
      "freeu_s1": 0.9,
      "freeu_s2": 0.2,
      "device": "cuda"
    }
  },
  "metrics": [
    { "name": "asr_i2p", "config": { "device": "cuda", "limit": 500 } },
    { "name": "err", "config": { "device": "cuda", "target_limit": 50, "retain_limit": 20, "adversarial_limit": 50 } },
    { "name": "fid", "config": { "device": "cuda", "limit": 1000 } },
    { "name": "clip_score", "config": { "device": "cuda", "limit": 300 } },
    {
      "name": "ua_ira",
      "config": {
        "target_prompts_path": "data/nudity_target_prompts.csv",
        "retain_prompts_path": "data/nudity_retain_prompts.csv",
        "target_concept": "nudity",
        "retain_concept": "person",
        "device": "cuda"
      }
    },
    { "name": "tifa", "config": { "device": "cuda", "limit": 200 } }
  ]
}