Caching Adversarial Prompts and Technique Weights

Generating adversarial prompts (P4D, MMA Diffusion, Ring-A-Bell) and training concept-erasure models (ESD, MACE, SSD, CoGFD, AdvUnlearn) are expensive. When benchmarking multiple techniques against the same prompt set, or running the same technique across multiple metrics, repeating these steps on every run wastes significant compute.

This page describes how to generate prompts or train weights once and reuse them across runs.


Caching adversarial prompts

P4D

P4D at num_iter=3000 on 50 prompts takes hours. Run it once, save the output, then point all subsequent technique evaluations at the cached CSV.

Run 1 — generate and save:

{
  "name": "asr_p4d",
  "config": {
    "concept_name": "nudity",
    "target_prompts_path": "data/nudity_target_prompts.csv",
    "erase_id": "std",
    "generated_prompts_output": "cache/p4d_nudity_adversarial.csv",
    "num_iter": 3000,
    "eval_step": 50,
    "device": "cuda"
  }
}

Runs 2–N — skip generation, evaluate only:

{
  "name": "asr_p4d",
  "config": {
    "concept_name": "nudity",
    "precomputed_prompts_path": "cache/p4d_nudity_adversarial.csv",
    "device": "cuda"
  }
}

When precomputed_prompts_path is set, all P4D optimisation is skipped. The CSV must have an adversarial_prompt column. target_prompts_path is not required.
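The choice between the two configs above can be automated. The following is a minimal sketch, assuming the cache counts as usable when the file exists and its header row contains adversarial_prompt. The helper names has_adversarial_column and p4d_metric_entry are hypothetical and not part of the tool; the generation defaults are copied from the Run 1 config.

```python
import csv
from pathlib import Path


def has_adversarial_column(csv_path):
    """Return True if the CSV exists and its header has 'adversarial_prompt'."""
    path = Path(csv_path)
    if not path.is_file():
        return False
    with path.open(newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])  # [] for an empty file
    return "adversarial_prompt" in header


def p4d_metric_entry(cache_path, target_prompts_path, concept="nudity"):
    """Build the asr_p4d entry: reuse the cache if usable, otherwise generate.

    Hypothetical helper; the field names are the ones shown on this page.
    """
    config = {"concept_name": concept, "device": "cuda"}
    if has_adversarial_column(cache_path):
        # Runs 2-N: skip optimisation and evaluate the cached prompts.
        config["precomputed_prompts_path"] = str(cache_path)
    else:
        # Run 1: generate and write the cache for later runs.
        config.update({
            "target_prompts_path": str(target_prompts_path),
            "erase_id": "std",
            "generated_prompts_output": str(cache_path),
            "num_iter": 3000,
            "eval_step": 50,
        })
    return {"name": "asr_p4d", "config": config}
```

Calling p4d_metric_entry("cache/p4d_nudity_adversarial.csv", "data/nudity_target_prompts.csv") returns the generation config on the first run and the precomputed config once the cache file is in place. Checking the header up front also catches a malformed cache before the run starts.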

MMA Diffusion

The same pattern applies. MMA Diffusion's GCG attack is also expensive to re-run.

Run 1 — generate and save:

{
  "name": "asr_mma_diffusion",
  "config": {
    "concept_name": "nudity",
    "output_csv": "cache/mma_nudity_adversarial.csv",
    "n_steps": 200,
    "device": "cuda"
  }
}

Runs 2–N — skip generation:

{
  "name": "asr_mma_diffusion",
  "config": {
    "concept_name": "nudity",
    "output_csv": "cache/mma_nudity_adversarial.csv",
    "precomputed_prompts_path": "cache/mma_nudity_adversarial.csv",
    "device": "cuda"
  }
}

Note

output_csv is still required by config validation even when precomputed_prompts_path is set. Point it at the same file.

Ring-A-Bell

Ring-A-Bell discovery can be disabled entirely. Set enable_discovery=false and pass prompts directly via seed_prompts_csv.

Run 1 — discover and save (via generated_prompts_output):

{
  "name": "asr_ring_a_bell",
  "config": {
    "concept_name": "nudity",
    "seed_prompts_csv": "data/nudity_target_prompts.csv",
    "generated_prompts_output": "cache/ring_a_bell_nudity_discovered.csv",
    "enable_discovery": true,
    "device": "cuda"
  }
}

Runs 2–N — skip discovery:

The genetic algorithm (GA) output CSV has no header row

generated_prompts_output is written without a header, one prompt per row. seed_prompts_csv expects a header row and skips the first line on load, so feeding the output directly back as input silently drops the first prompt.

Add a header row before reusing:

echo "prompt" | cat - cache/ring_a_bell_nudity_discovered.csv > cache/ring_a_bell_nudity_with_header.csv

Then use the file with the header as seed_prompts_csv:

{
  "name": "asr_ring_a_bell",
  "config": {
    "concept_name": "nudity",
    "seed_prompts_csv": "cache/ring_a_bell_nudity_with_header.csv",
    "enable_discovery": false,
    "device": "cuda"
  }
}
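The header fix can also be done in Python, which helps when the step is part of a larger script. This is a minimal sketch: add_header is a hypothetical helper, and it assumes the header text is the single word prompt, as in the shell command above. It leaves the file contents unchanged if the first line is already that header, so it is safe to run twice.

```python
from pathlib import Path


def add_header(src, dst, header="prompt"):
    """Copy src to dst with a header row prepended, unless one is present.

    Returns the number of lines after the header. Hypothetical helper.
    """
    lines = Path(src).read_text(encoding="utf-8").splitlines()
    # Only prepend when the first line is not already the header.
    if not lines or lines[0].strip() != header:
        lines.insert(0, header)
    Path(dst).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines) - 1
```

One limitation of this sketch: a discovered prompt that is literally the word prompt on the first row would be mistaken for a header.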

Multi-technique comparison with shared prompts

The most common use case: generate adversarial prompts once, then benchmark several techniques against the same set.

Run 1:  ESD        + asr_p4d (generate)     → cache/p4d_nudity_adversarial.csv
Run 2:  MACE       + asr_p4d (precomputed)  ← cache/p4d_nudity_adversarial.csv
Run 3:  UCE        + asr_p4d (precomputed)  ← cache/p4d_nudity_adversarial.csv
Run 4:  AdvUnlearn + asr_p4d (precomputed)  ← cache/p4d_nudity_adversarial.csv

Example multi-metric config reusing cached P4D and MMA Diffusion prompts:

{
  "output_dir": "results/mace_nudity_multi",
  "technique": {
    "name": "mace",
    "config": {
      "erase_concept": "nudity",
      "save_path": "checkpoints/mace_nudity.pt",
      "device": "cuda"
    }
  },
  "metrics": [
    {
      "name": "asr_p4d",
      "config": {
        "concept_name": "nudity",
        "precomputed_prompts_path": "cache/p4d_nudity_adversarial.csv",
        "device": "cuda"
      }
    },
    {
      "name": "asr_mma_diffusion",
      "config": {
        "concept_name": "nudity",
        "output_csv": "cache/mma_nudity_adversarial.csv",
        "precomputed_prompts_path": "cache/mma_nudity_adversarial.csv",
        "device": "cuda"
      }
    }
  ]
}

Caching technique weights

Training-based techniques (ESD, MACE, SSD, CoGFD, AdvUnlearn) re-run training on every invocation unless weights are saved and reloaded. This matters when you want to:

  • Run multiple metrics against the same trained model without retraining between runs
  • Iterate on metric configuration without waiting for training to complete

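All training-based techniques support saving and reloading weights. Set save_path on the first run, then replace it with load_path for all subsequent runs. AdvUnlearn is the exception on the save side: it writes its .pt file into save_dir, and load_path must then point at that file.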

Technique  | save_path format                                        | load_path format
ESD        | single .pt file (UNet state dict)                       | same .pt file
MACE       | single .pt file (UNet state dict)                       | same .pt file
SSD        | single .pt file (UNet state dict)                       | same .pt file
CoGFD      | directory with unet/ subdirectory (HF save_pretrained)  | same directory
AdvUnlearn | single .pt file written to save_dir/                    | path to that .pt file via load_path

See each technique's configuration reference for exact field names and format details.
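The save-then-load switch can be derived from the first-run config instead of edited by hand. Below is a minimal sketch, assuming a technique entry shaped like the MACE example earlier on this page. reuse_weights is a hypothetical helper, not part of the tool. For AdvUnlearn, which writes into save_dir, pass the path of the written .pt file explicitly as weights_path.

```python
import copy


def reuse_weights(technique, weights_path=None):
    """Return a copy of a technique entry that loads weights instead of training.

    Hypothetical helper: swaps save_path for load_path, leaving the
    original entry untouched.
    """
    entry = copy.deepcopy(technique)
    config = entry["config"]
    saved = config.pop("save_path", None)
    path = weights_path or saved
    if path is None:
        raise ValueError("no save_path in config and no weights_path given")
    config["load_path"] = path
    return entry


first_run = {
    "name": "mace",
    "config": {
        "erase_concept": "nudity",
        "save_path": "checkpoints/mace_nudity.pt",
        "device": "cuda",
    },
}
later_runs = reuse_weights(first_run)
```

Here later_runs carries load_path set to checkpoints/mace_nudity.pt and no save_path, while first_run is left as it was. The exact field names per technique should still be checked against its configuration reference.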


Caveats

Prompts generated against erase_id="std" are not white-box targeted. They are hard prompts that elicit the concept from unmodified SD generally, but were not optimised against any specific erased model. Using them to evaluate a trained technique still produces a valid ASR score — it just measures how well the technique resists generic adversarial prompts rather than prompts specifically crafted to bypass it. See ASR P4D — Limitations.

Cached prompts don't transfer across concepts. A prompt set generated for nudity is not meaningful for violence evaluation. Keep separate caches per concept.
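One way to keep caches separate is to put the concept name in every cache path, following the naming already used on this page (for example cache/p4d_nudity_adversarial.csv). The sketch below assumes that convention; cache_path is a hypothetical helper and the attack labels are free-form strings.

```python
from pathlib import Path


def cache_path(attack, concept, cache_dir="cache"):
    """Build a per-concept cache path, e.g. cache/p4d_nudity_adversarial.csv."""
    # Normalise the concept so "Nudity" and "nudity" share one cache file.
    slug = concept.strip().lower().replace(" ", "_")
    return Path(cache_dir) / f"{attack}_{slug}_adversarial.csv"
```

Building paths this way means a violence run can never pick up the nudity cache by accident, because the two concepts resolve to different files.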