Free Run — Custom Model Evaluation
Overview
Free Run loads any HuggingFace text-to-image model and generates images without any weight modification or inference-time intervention. Use it to evaluate a model you already have — an external checkpoint, a fine-tuned variant, or any T2I model not covered by the built-in technique wrappers.
Unlike every other technique, Free Run does not fix a base model. You supply the model_id
and Free Run loads it via AutoPipelineForText2Image, which supports SD 1.x, SD 2.x, SDXL,
FLUX, PixArt-α/Σ, Kandinsky, HunyuanDiT, and any other model whose HuggingFace repo
specifies a compatible pipeline class.
No erasure, filtering, or safety mechanism is applied — what the model generates is what gets evaluated.
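The loading path described above can be sketched with the diffusers library. This is a minimal illustration, not Free Run's actual implementation: the function names `load_t2i_pipeline` and `generate_image` are hypothetical, and the sketch assumes the chosen pipeline accepts `num_inference_steps` and `guidance_scale` as call arguments, which the common text-to-image pipelines do.

```python
def load_t2i_pipeline(model_id: str, device: str = "cpu"):
    """Load a text-to-image pipeline by Hub ID (hypothetical helper)."""
    # Imported lazily so this module can be imported without diffusers installed.
    from diffusers import AutoPipelineForText2Image

    # AutoPipelineForText2Image picks the concrete pipeline class
    # from the repository's own configuration.
    pipe = AutoPipelineForText2Image.from_pretrained(model_id)
    return pipe.to(device)


def generate_image(pipe, prompt: str, num_inference_steps: int = 50,
                   guidance_scale: float = 7.5):
    """Generate one image for a prompt with the given sampler settings."""
    result = pipe(
        prompt,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
    )
    # diffusers pipelines return an output object with an `images` list.
    return result.images[0]
```

A call such as `generate_image(load_t2i_pipeline("my-org/my-finetuned-sd", "cuda"), "a photo of a cat")` would then return a single image with no weight modification applied.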
Compatible metrics
All metrics are compatible. Free Run is exempt from nudity-concept validation, so ASR and ERR can be used regardless of concept.
Configuration reference
| Field | Type | Default | Description |
|---|---|---|---|
| model_id | str | — | Required. HuggingFace model ID for any T2I model supported by AutoPipelineForText2Image. |
| device | str \| None | None | Device to run on. Auto-detects CUDA → MPS → CPU if None. |
| use_fp16 | bool | True | Run in half precision on CUDA. Ignored on CPU and MPS (always float32). |
| num_inference_steps | int | 50 | Number of denoising steps. |
| guidance_scale | float | 7.5 | Classifier-free guidance scale. |
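The device and use_fp16 rules can be written out as a small sketch. The helper names below are hypothetical and the code is an assumption about how the documented behaviour could be implemented, not a copy of Free Run's source. The availability flags are passed in explicitly so the decision logic can be checked without a GPU.

```python
from typing import Optional


def resolve_device(device: Optional[str], cuda_available: bool,
                   mps_available: bool) -> str:
    """Return the explicit device, or fall back CUDA -> MPS -> CPU."""
    if device is not None:
        return device
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def use_half_precision(device: str, use_fp16: bool) -> bool:
    """Half precision applies only on CUDA; CPU and MPS stay in float32."""
    return use_fp16 and device.startswith("cuda")


def detect_device(device: Optional[str] = None) -> str:
    """Query PyTorch for the available backends (requires torch)."""
    import torch

    return resolve_device(
        device,
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )
```

Under these assumptions, `use_half_precision("mps", True)` is `False`, which matches the table: use_fp16 has no effect outside CUDA.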
Default value notes
num_inference_steps=50 and guidance_scale=7.5 are calibrated for SD 1.x. Set them
explicitly when evaluating other model families:
| Model | Typical steps | Typical guidance |
|---|---|---|
| SD 1.x / SD 2.x | 50 | 7.5 |
| SDXL | 25–40 | 5.0–7.5 |
| FLUX | 20–28 | 3.5–4.0 |
| PixArt | 20 | 4.5 |
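One way to avoid relying on the SD-centric defaults is to build the technique block from a per-family lookup. The sketch below is illustrative only: `family` is a hypothetical lookup key, not a Free Run config field, and where the table gives a range the value is simply one point picked from it.

```python
import json
from typing import Optional

# Starting points taken from the table above. Where the table gives a range,
# a single value inside that range was chosen for this sketch.
SUGGESTED_SETTINGS = {
    "sd": {"num_inference_steps": 50, "guidance_scale": 7.5},
    "sdxl": {"num_inference_steps": 30, "guidance_scale": 5.0},
    "flux": {"num_inference_steps": 28, "guidance_scale": 3.5},
    "pixart": {"num_inference_steps": 20, "guidance_scale": 4.5},
}


def free_run_technique(model_id: str, family: str,
                       device: Optional[str] = None) -> dict:
    """Build a "technique" block with explicit steps and guidance."""
    config = {"model_id": model_id, **SUGGESTED_SETTINGS[family]}
    if device is not None:
        config["device"] = device
    return {"name": "free_run", "config": config}


if __name__ == "__main__":
    block = free_run_technique(
        "stabilityai/stable-diffusion-xl-base-1.0", "sdxl", device="cuda"
    )
    print(json.dumps(block, indent=2))
```

The output contains only the documented fields, so it can be pasted into the "technique" entry of a run configuration.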
Warnings
model_id is required
Free Run has no fixed base model. Omitting model_id raises a ValueError at
initialisation.
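To catch a missing model_id before a run starts, a configuration can be checked up front. The following is a hypothetical pre-flight helper, assuming the JSON layout used in the examples on this page (a "technique" object with "name" and "config"); it is not part of Free Run itself.

```python
def check_free_run_config(run_config: dict) -> dict:
    """Return the free_run config block, or raise ValueError if invalid."""
    technique = run_config.get("technique", {})
    if technique.get("name") != "free_run":
        raise ValueError("technique.name must be 'free_run'")

    config = technique.get("config", {})
    model_id = config.get("model_id")
    # Free Run has no fixed base model, so an absent or empty ID is an error.
    if not isinstance(model_id, str) or not model_id.strip():
        raise ValueError("free_run requires a non-empty model_id")
    return config
```

Running this on a dictionary loaded with `json.load` surfaces the problem immediately instead of at pipeline initialisation.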
No safety filtering
Free Run disables the safety checker on any model that has one. Images are generated without constraint — high ASR scores are expected when evaluating concepts the model has not been trained to suppress.
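For reference, a common way to switch off the safety checker on a diffusers Stable Diffusion pipeline is to set its `safety_checker` attribute to `None`. The sketch below shows that idiom under the assumption that the pipeline exposes such an attribute; whether Free Run uses exactly this mechanism is not stated here, and the function name is hypothetical.

```python
def disable_safety_checker(pipe) -> bool:
    """Clear the pipeline's safety checker if it has one.

    Returns True when a checker was present and removed, False otherwise
    (for example, on model families that ship without one).
    """
    if getattr(pipe, "safety_checker", None) is not None:
        pipe.safety_checker = None
        return True
    return False
```

Pipelines without a `safety_checker` attribute are left untouched, so the helper is safe to call on any loaded model.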
num_inference_steps and guidance_scale defaults are SD-centric
The defaults produce valid output for other model families, but they may differ from the settings used in those models' published results, so scores may not be comparable. Set them explicitly when evaluating SDXL, FLUX, or PixArt.
Examples
Evaluating a custom model checkpoint
{
  "output_dir": "results/my_model_nudity",
  "technique": {
    "name": "free_run",
    "config": {
      "model_id": "my-org/my-finetuned-sd",
      "device": "cuda"
    }
  },
  "metrics": [
    {
      "name": "asr_i2p",
      "config": { "concept_name": "nudity", "device": "cuda", "limit": 500 }
    },
    {
      "name": "fid",
      "config": { "device": "cuda", "limit": 1000 }
    }
  ]
}
SDXL
{
  "output_dir": "results/sdxl_eval",
  "technique": {
    "name": "free_run",
    "config": {
      "model_id": "stabilityai/stable-diffusion-xl-base-1.0",
      "num_inference_steps": 30,
      "guidance_scale": 5.0,
      "device": "cuda"
    }
  },
  "metrics": [
    { "name": "fid", "config": { "device": "cuda", "limit": 1000 } },
    { "name": "clip_score", "config": { "device": "cuda", "limit": 500 } }
  ]
}