
Confirm JSON config for FFHQ-1024?

tin-sely opened this issue 1 year ago · 2 comments

I'm planning on using the config for FFHQ-1024 and just wanted to double-check that it's correct.

  • Is the "Conditioning Dropout Rate" the same as mapping_dropout_rate, or something else?
  • The attention heads (width / head dim) seem to be configured automatically based on "widths" and "depths"?
  • For levels (local + global attention) 3+2, I assume I add three {"type": "shifted-window", "d_head": 64, "window_size": 7} entries and two {"type": "global", "d_head": 64} entries?
{
  "model": {
    "type": "image_transformer_v2",
    "input_channels": 3, 
    "input_size": [1024, 1024],
    "patch_size": [4, 4],
    "depths": [2, 2, 2, 2, 2], 
    "widths": [128, 256, 384, 768, 1024],
    "self_attns": [
      {"type": "shifted-window", "d_head": 64, "window_size": 7}, 
      {"type": "shifted-window", "d_head": 64, "window_size": 7},
      {"type": "shifted-window", "d_head": 64, "window_size": 7},
      {"type": "global", "d_head": 64},
      {"type": "global", "d_head": 64}
    ],
    "loss_config": "karras",
    "loss_weighting": "soft-min-snr", 
    "dropout_rate": [0.0, 0.0, 0.0, 0.0, 0.1], 
    "mapping_dropout_rate": 0.1,
    "augment_prob": 0.12, 
    "sigma_data": 0.5, 
    "sigma_min": 1e-3,
    "sigma_max": 1e3, 
    "sigma_sample_density": {
      "type": "cosine-interpolated" 
    }
  },
  "dataset": {
    "type": "huggingface", 
    "location": "nelorth/oxford-flowers", 
    "image_key": "image" 
  },
  "optimizer": {
    "type": "adamw",
    "lr": 5e-4, 
    "betas": [0.9, 0.95], 
    "eps": 1e-8, 
    "weight_decay": 1e-2 
  },
  "lr_sched": {
    "type": "constant", 
    "warmup": 0.0 
  },
  "ema_sched": {
    "type": "inverse", 
    "power": 0.75, 
    "max_value": 0.9999 
  }
}

tin-sely · Apr 16 '24 02:04

The type for those self-attention blocks should be neighborhood unless you do want to use Swin, and we used a mapping dropout rate of 0. Apart from that, the config matches what we used.

And to answer your other two questions:

  1. Something else iirc
  2. Yes
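
Putting the corrections from the reply above into the config, the self-attention section and mapping dropout would look roughly like the sketch below. Note that the kernel_size field name for the neighborhood attention spec is an assumption here (by analogy with the window_size field of the shifted-window spec); check the repo's own example configs for the exact field names.

"self_attns": [
  {"type": "neighborhood", "d_head": 64, "kernel_size": 7},
  {"type": "neighborhood", "d_head": 64, "kernel_size": 7},
  {"type": "neighborhood", "d_head": 64, "kernel_size": 7},
  {"type": "global", "d_head": 64},
  {"type": "global", "d_head": 64}
],
"mapping_dropout_rate": 0.0,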

stefan-baumann · Apr 16 '24 07:04


Could you release the pre-trained HDiT models for FFHQ-1024?

Luo-Yihong · Nov 27 '24 13:11