CoGFD — Concept-combination Graph-based Feature Decoupling

Overview

CoGFD erases concept combinations rather than individual atomic concepts. The key insight is that a harmful theme such as "nudity" is not a single token but a combination of component concepts (person + unclothed). Erasing the atom "person" collaterally damages general generation; erasing the combination preserves each component while making the model unable to produce them together.

Training modifies only cross-attention K/Q/V projections (attn2 layers). All other UNet parameters are frozen, keeping fine-tuning surgical.
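A minimal sketch of how that selection could be expressed, assuming the diffusers naming scheme in which cross-attention modules are called attn2 and their projections to_q, to_k and to_v. The helper names are hypothetical, and the filter works on parameter names only, so it runs without loading a model.

```python
# Hypothetical helpers: decide from a parameter name whether it belongs to a
# cross-attention (attn2) K/Q/V projection. Assumes diffusers-style names such as
# "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight".
TRAINABLE_PROJECTIONS = ("to_q", "to_k", "to_v")


def is_cross_attention_qkv(param_name: str) -> bool:
    parts = param_name.split(".")
    # Match "... attn2 . to_{q,k,v} ..." anywhere in the dotted name.
    for i, part in enumerate(parts[:-1]):
        if part == "attn2" and parts[i + 1] in TRAINABLE_PROJECTIONS:
            return True
    return False


def split_parameters(param_names):
    """Return (trainable, frozen) lists of parameter names."""
    trainable = [n for n in param_names if is_cross_attention_qkv(n)]
    frozen = [n for n in param_names if not is_cross_attention_qkv(n)]
    return trainable, frozen


if __name__ == "__main__":
    example = [
        "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight",
        "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_k.weight",
        "conv_in.weight",
    ]
    trainable, frozen = split_parameters(example)
    print("trainable:", trainable)
    print("frozen:", frozen)
```

With a loaded UNet, the same predicate could be applied while iterating over unet.named_parameters(), setting requires_grad to True only where it matches.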

Loss

L = λ_erase · L_erase + λ_preserve · L_preserve + λ_decouple · L_decouple

| Term | What it does |
|------|-------------|
| L_erase | Pulls every combination-prompt response toward the frozen null output, erasing the harmful combination |
| L_preserve | Keeps each individual/component concept unchanged vs the frozen original model |
| L_decouple | Minimises cosine similarity between combination directions and individual-concept directions in noise-prediction space; this ensures decoupling rather than mere suppression |

λ_preserve defaults to 2.0 (higher than λ_erase = 1.0) to prevent the preservation term from being overwhelmed. This ratio is intentional.
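The weighted sum can be illustrated with a small, dependency-free sketch. Several details here are assumptions rather than the implementation's actual code: the erase and preserve terms are taken to be mean squared errors, a "direction" is taken to be a noise prediction minus the frozen null prediction, and noise predictions are represented as flat lists of floats. All function names are hypothetical.

```python
import math


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cosine(a, b, eps=1e-8):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def cogfd_loss(eps_combo, eps_null_frozen, eps_indiv, eps_indiv_frozen,
               lambda_erase=1.0, lambda_preserve=2.0, lambda_decouple=0.5):
    # L_erase: combination response pulled toward the frozen null output
    # (MSE is an assumption).
    l_erase = mse(eps_combo, eps_null_frozen)
    # L_preserve: individual concept kept close to the frozen original model
    # (MSE is an assumption).
    l_preserve = mse(eps_indiv, eps_indiv_frozen)
    # L_decouple: cosine similarity between the two directions, where a
    # direction is measured relative to the null output (assumption).
    combo_dir = [c - n for c, n in zip(eps_combo, eps_null_frozen)]
    indiv_dir = [i - n for i, n in zip(eps_indiv, eps_null_frozen)]
    l_decouple = cosine(combo_dir, indiv_dir)
    total = (lambda_erase * l_erase
             + lambda_preserve * l_preserve
             + lambda_decouple * l_decouple)
    return total, {"erase": l_erase, "preserve": l_preserve, "decouple": l_decouple}


if __name__ == "__main__":
    total, terms = cogfd_loss([1.0, 0.0], [0.0, 0.0], [1.0, 0.0], [1.0, 0.0])
    print(total, terms)
```

With the default weights, a unit of preservation error costs twice as much as a unit of erasure error, which is the ratio described above.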

Concept logic graph

In the original paper, Stage 1 uses an LLM to enumerate every visual phrasing of the harmful theme (e.g. "naked woman", "person without clothes", ...), making erasure robust to prompt rephrasing. This implementation uses hardcoded defaults for "nudity" and "violence". For any other concept, supply combination_prompts manually or replace the defaults with LLM-generated expansions.

Base model: CompVis/stable-diffusion-v1-4
Paper: Nie et al. "Erasing Concept Combination from Text-to-Image Diffusion Model" (ICLR 2025)


Compatible metrics

| Metric | Compatible | Notes |
|--------|------------|-------|
| ASR I2P | Any I2P concept | NudeNet for nudity; Q16 for all others |
| ASR P4D | Any | |
| ASR MMA-Diffusion | Any | Requires explicit target prompts for non-nudity |
| ASR Ring-a-Bell | Any | |
| ERR | nudity only | Requires erase_concept="nudity" |
| FID | Any | |
| CLIP Score | Any | |
| UA-IRA | Any | Requires custom prompt CSVs |
| TIFA | Any | |

Configuration reference

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| erase_concept | str | "nudity" | High-level concept to erase. Used to look up built-in defaults if combination_prompts is empty. |
| combination_prompts | list[str] | [] | Prompts expressing the harmful concept combination (the concept logic graph). Falls back to built-in defaults for "nudity" and "violence". Required for all other concepts. |
| preserve_concepts | list[str] | [] | Individual component concepts to keep intact (e.g. ["a person", "a woman"]). Falls back to built-in defaults for "nudity" and "violence". Leave empty to skip preservation. |
| lambda_erase | float | 1.0 | Weight for the combination erasure loss. Must be >= 0. |
| lambda_preserve | float | 2.0 | Weight for the individual preservation loss. Must be >= 0. Set higher than lambda_erase to prevent collateral concept loss. |
| lambda_decouple | float | 0.5 | Weight for the feature decoupling loss. Must be >= 0. |
| train_steps | int | 150 | Training iterations. Must be > 0. |
| learning_rate | float | 1e-5 | Optimiser learning rate. Must be > 0. |
| load_path | str \| None | None | Path to a directory saved by a previous CoGFD run (via save_path). Must contain a unet/ subdirectory in HuggingFace save_pretrained format. If set, training is skipped entirely. |
| save_path | str \| None | None | Directory to save the modified UNet using HuggingFace save_pretrained after training. Produces a unet/ subdirectory, not a single .pt file. Only used when training runs (i.e. load_path is not set). Skipped if None. |
| num_inference_steps | int | 50 | DDIM steps for image generation. |
| guidance_scale | float | 7.5 | CFG scale for generation. |
| use_fp16 | bool | True | Run in half precision on CUDA. |
| device | str \| None | None | Device for training and inference. Auto-detects CUDA/MPS if None. |
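The constraints in this reference can be checked before launching a run. The following is a sketch only: validate_cogfd_config is a hypothetical helper, not part of the tool, and the choice to treat some conditions as warnings rather than errors is an assumption.

```python
BUILTIN_CONCEPTS = {"nudity", "violence"}  # concepts with built-in defaults


def validate_cogfd_config(cfg: dict) -> list:
    """Raise ValueError on documented hard constraints; return soft warnings."""
    for key, default in (("lambda_erase", 1.0),
                         ("lambda_preserve", 2.0),
                         ("lambda_decouple", 0.5)):
        if cfg.get(key, default) < 0:
            raise ValueError(f"{key} must be >= 0")
    if cfg.get("train_steps", 150) <= 0:
        raise ValueError("train_steps must be > 0")
    if cfg.get("learning_rate", 1e-5) <= 0:
        raise ValueError("learning_rate must be > 0")

    warnings = []
    concept = cfg.get("erase_concept", "nudity")
    if concept not in BUILTIN_CONCEPTS and not cfg.get("combination_prompts"):
        warnings.append("no built-in combination_prompts for this concept")
    if cfg.get("lambda_preserve", 2.0) <= cfg.get("lambda_erase", 1.0):
        warnings.append("lambda_preserve is not above lambda_erase")
    if cfg.get("load_path") and cfg.get("save_path"):
        warnings.append("save_path is unused when load_path is set")
    return warnings


if __name__ == "__main__":
    print(validate_cogfd_config({"erase_concept": "Van Gogh", "train_steps": 200}))
```

Passing the technique's config object from any of the JSON examples on this page should return an empty list, since each one either uses a built-in concept or supplies combination_prompts.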

Built-in concept defaults

For erase_concept="nudity" and erase_concept="violence", combination_prompts and preserve_concepts are populated automatically if left empty. For all other values of erase_concept, both lists must be supplied explicitly.

| erase_concept | Built-in combination_prompts | Built-in preserve_concepts |
|---------------|------------------------------|----------------------------|
| "nudity" | 5 prompts (nude/naked variants) | ["a person", "a woman", "a man", "a human body"] |
| "violence" | 4 prompts (violent scene variants) | ["a person", "a scene", "an action scene"] |
| anything else | none (must be supplied) | none (must be supplied) |

Warnings

Checkpoint format differs from ESD and MACE

CoGFD uses HuggingFace save_pretrained format rather than a bare .pt state dict. save_path produces a directory containing a unet/ subdirectory (with config.json and weight files). Pass that same directory as load_path on subsequent runs — do not point load_path at the unet/ subdirectory itself.
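A quick pre-flight check can catch the common mistake of pointing load_path one level too deep. This is a sketch using only the standard library; check_load_path is a hypothetical helper, and it relies solely on the layout described above (a unet/ subdirectory containing config.json).

```python
from pathlib import Path


def check_load_path(load_path: str) -> Path:
    """Return the checkpoint root if it has the expected unet/config.json."""
    root = Path(load_path)
    if (root / "unet" / "config.json").is_file():
        return root
    # The user may have passed the unet/ subdirectory itself.
    if root.name == "unet" and (root / "config.json").is_file():
        raise ValueError(
            "load_path points at the unet/ subdirectory; "
            "pass its parent directory instead"
        )
    raise FileNotFoundError(f"{root} does not contain unet/config.json")
```

For example, check_load_path("checkpoints/cogfd_nudity") would succeed after a run that set save_path to that directory, while check_load_path("checkpoints/cogfd_nudity/unet") would raise a ValueError.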

Custom concepts require explicit combination_prompts

If erase_concept is not "nudity" or "violence", provide combination_prompts explicitly. If the list is left empty, the only combination prompt is the raw erase_concept string, which provides poor coverage and weakens erasure robustness.

lambda_preserve should stay above lambda_erase

The preservation loss counteracts collateral erasure of component concepts. If lambda_preserve is set too low relative to lambda_erase, individual concepts (e.g. "a person") will degrade alongside the combination. The defaults (lambda_preserve=2.0, lambda_erase=1.0) reflect the paper's tuning.

train_steps=150 is a moderate default

Unlike AdvUnlearn, where the default is intentionally minimal, 150 steps is a reasonable starting point for CoGFD given its lighter per-step budget (3 UNet passes total regardless of prompt count). Published results use 100–500 steps. Increase train_steps for more thorough erasure.


Examples

Nudity (built-in defaults)

{
  "output_dir": "results/cogfd_nudity",
  "technique": {
    "name": "cogfd",
    "config": {
      "erase_concept": "nudity",
      "train_steps": 150,
      "save_path": "checkpoints/cogfd_nudity",
      "device": "cuda"
    }
  },
  "metric": {
    "name": "asr_i2p",
    "config": {
      "device": "cuda",
      "limit": 500
    }
  }
}

Violence (built-in defaults)

{
  "output_dir": "results/cogfd_violence",
  "technique": {
    "name": "cogfd",
    "config": {
      "erase_concept": "violence",
      "train_steps": 200,
      "device": "cuda"
    }
  },
  "metric": {
    "name": "ua_ira",
    "config": {
      "target_prompts_path": "data/violence_target_prompts.csv",
      "retain_prompts_path": "data/violence_retain_prompts.csv",
      "target_concept": "violence",
      "retain_concept": "action scene",
      "device": "cuda"
    }
  }
}

Custom concept

{
  "output_dir": "results/cogfd_vangogh",
  "technique": {
    "name": "cogfd",
    "config": {
      "erase_concept": "Van Gogh",
      "combination_prompts": [
        "a painting in the style of Van Gogh",
        "a Van Gogh style landscape",
        "swirling brushstrokes in the style of Van Gogh",
        "a starry night Van Gogh painting"
      ],
      "preserve_concepts": [
        "a landscape painting",
        "an oil painting",
        "a painting with brushstrokes"
      ],
      "train_steps": 200,
      "device": "cuda"
    }
  },
  "metric": {
    "name": "ua_ira",
    "config": {
      "target_prompts_path": "data/vangogh_target_prompts.csv",
      "retain_prompts_path": "data/vangogh_retain_prompts.csv",
      "target_concept": "Van Gogh painting",
      "retain_concept": "landscape painting",
      "device": "cuda"
    }
  }
}

Multi-metric run

{
  "output_dir": "results/cogfd_nudity_multi",
  "technique": {
    "name": "cogfd",
    "config": {
      "erase_concept": "nudity",
      "lambda_erase": 1.0,
      "lambda_preserve": 2.0,
      "lambda_decouple": 0.5,
      "train_steps": 150,
      "save_path": "checkpoints/cogfd_nudity",
      "device": "cuda"
    }
  },
  "metrics": [
    { "name": "asr_i2p", "config": { "device": "cuda", "limit": 500 } },
    { "name": "err", "config": { "device": "cuda", "target_limit": 50, "retain_limit": 20, "adversarial_limit": 50 } },
    { "name": "fid", "config": { "device": "cuda", "limit": 1000 } },
    { "name": "clip_score", "config": { "device": "cuda", "limit": 300 } },
    {
      "name": "ua_ira",
      "config": {
        "target_prompts_path": "data/nudity_target_prompts.csv",
        "retain_prompts_path": "data/nudity_retain_prompts.csv",
        "target_concept": "nudity",
        "retain_concept": "person",
        "device": "cuda"
      }
    },
    { "name": "tifa", "config": { "device": "cuda", "limit": 200 } }
  ]
}

Reusing trained weights across runs

Set save_path on the first run to persist the trained weights, then use load_path on all subsequent runs to skip retraining. This is especially useful when benchmarking multiple metrics against the same trained model. See Caching adversarial prompts and technique weights for the full workflow.
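The two-run pattern can be sketched as a small transformation on the run config. The helper below is hypothetical and only manipulates dictionaries shaped like the JSON examples on this page; it does not invoke the tool itself.

```python
import copy
import json


def to_reuse_config(first_run: dict, metric: dict, output_dir: str) -> dict:
    """Derive a follow-up run that loads the weights saved by first_run."""
    cfg = copy.deepcopy(first_run)  # leave the original config untouched
    tech = cfg["technique"]["config"]
    saved = tech.pop("save_path", None)
    if saved is None:
        raise ValueError("first run has no save_path to reuse")
    tech["load_path"] = saved  # training is skipped when load_path is set
    cfg["output_dir"] = output_dir
    cfg.pop("metrics", None)
    cfg["metric"] = metric
    return cfg


if __name__ == "__main__":
    first = {
        "output_dir": "results/cogfd_nudity",
        "technique": {
            "name": "cogfd",
            "config": {"erase_concept": "nudity",
                       "save_path": "checkpoints/cogfd_nudity",
                       "device": "cuda"},
        },
        "metric": {"name": "asr_i2p", "config": {"device": "cuda", "limit": 500}},
    }
    second = to_reuse_config(
        first,
        {"name": "fid", "config": {"device": "cuda", "limit": 1000}},
        "results/cogfd_nudity_fid",
    )
    print(json.dumps(second, indent=2))
```

The derived config keeps erase_concept and device, swaps save_path for load_path, and replaces the metric, so the second run evaluates the same trained model without retraining.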