[Feat]: move the noising arguments (offset, perturbation, min, max, weight, bias) to concepts
Describe your use-case.
At the moment, the noising arguments are global and apply to all samples. If we want to teach styles and objects from different concepts, we must train the same LoRA twice with different noising values for each concept to focus the model, which is suboptimal.
What would you like to see as a solution?
It would be very convenient to have these settings applied locally, per concept, so we can train at the same time on one concept about image composition and another concept about fine details.
Have you considered alternatives? List them here.
I tried to make the changes myself, but failed miserably. I was able to edit the UI and such, but not to make the global variables local and inject them into the batches or images. The code is quite complicated for my small brain; I kept getting lost and losing track.
If someone more skilled could look into this, I'd be super grateful. Thanks!
If you've already started with the UI as you mentioned and want to try again, the code you were looking for is here: https://github.com/Nerogar/OneTrainer/blob/8d222ede8b9b56bcb5e8b34b6b7dad4fc927d0f5/modules/modelSetup/mixin/ModelSetupNoiseMixin.py#L51
I probably should have explained in more detail what I've done, but I feared the message would be too long (and I'm not a native English speaker, so it may be confusing to read). In short, what I had edited before giving up:

- `ConceptConfig.py` – add noising/masking settings at the concept level.
- `ModelSetupDiffusionNoiseMixin.py` – adapt noise generation and timestep calculation to use the local settings.
- `BaseStableDiffusionXLSetup.py` – pass the local concept configuration when calling `_create_noise`.
- `TrainingTab.py` – remove the global noising/masking widgets, since these settings are now set per concept.
- `ConceptWindow.py` – add the user interface (a new "advanced" tab) allowing the user to set these settings for each concept.
In more detail, `ConceptConfig.py`:

```python
@staticmethod
def default_values():
    data = []

    # (existing parameters …)
    data.append(("image", ConceptImageConfig.default_values(), ConceptImageConfig, False))
    data.append(("text", ConceptTextConfig.default_values(), ConceptTextConfig, False))
    data.append(("name", "", str, False))
    data.append(("path", "", str, False))
    data.append(("seed", random.randint(-(1 << 30), 1 << 30), int, False))
    data.append(("enabled", True, bool, False))
    data.append(("include_subdirectories", False, bool, False))
    data.append(("image_variations", 1, int, False))
    data.append(("text_variations", 1, int, False))
    data.append(("balancing", 1.0, float, False))
    data.append(("balancing_strategy", BalancingStrategy.REPEATS, BalancingStrategy, False))
    data.append(("loss_weight", 1.0, float, False))

    # ------ NOISING & MASKING (concept-level) ------
    data.append(("offset_noise_weight", 0.0, float, False))
    data.append(("perturbation_noise_weight", 0.0, float, False))
    data.append(("min_noising_strength", 0.0, float, False))
    data.append(("max_noising_strength", 1.0, float, False))
    data.append(("noising_weight", 0.0, float, False))
    data.append(("noising_bias", 0.5, float, False))
    data.append(("masked_training", False, bool, False))
    data.append(("unmasked_probability", 0.1, float, False))
    data.append(("unmasked_weight", 0.1, float, False))
    data.append(("normalize_masked_area_loss", False, bool, False))

    return ConceptConfig(data)
```
`ModelSetupDiffusionNoiseMixin.py`:

```python
def _create_noise(self, source_tensor: Tensor, config: TrainConfig, concept: dict, generator: Generator):
    # Create the base noise:
    noise = torch.randn(
        source_tensor.shape,
        generator=generator,
        device=config.train_device,
        dtype=source_tensor.dtype
    )

    # Fetch the local parameters, falling back to the global config:
    offset_weight = concept.get("offset_noise_weight", config.offset_noise_weight)
    if offset_weight > 0:
        offset_noise = torch.randn(
            (source_tensor.shape[0], source_tensor.shape[1], 1, 1),
            generator=generator,
            device=config.train_device,
            dtype=source_tensor.dtype
        )
        noise = noise + (offset_weight * offset_noise)

    perturb_weight = concept.get("perturbation_noise_weight", config.perturbation_noise_weight)
    if perturb_weight > 0:
        perturbation_noise = torch.randn(
            source_tensor.shape,
            generator=generator,
            device=config.train_device,
            dtype=source_tensor.dtype
        )
        noise = noise + (perturb_weight * perturbation_noise)

    return noise
```
as well as `_get_timestep_discrete`:

```python
def _get_timestep_discrete(
        self,
        noise_scheduler: DDIMScheduler,
        deterministic: bool,
        generator: Generator,
        batch_size: int,
        config: TrainConfig,
        concept: dict,
        global_step: int,
) -> Tensor:
    if not deterministic:
        # Use the local parameters (with fallback to the global config)
        min_strength = concept.get("min_noising_strength", config.min_noising_strength)
        max_strength = concept.get("max_noising_strength", config.max_noising_strength)
        min_timestep = int(noise_scheduler.config['num_train_timesteps'] * min_strength)
        max_timestep = int(noise_scheduler.config['num_train_timesteps'] * max_strength)

        noising_weight = concept.get("noising_weight", config.noising_weight)
        noising_bias = concept.get("noising_bias", config.noising_bias)

        if noising_weight == 0:
            return torch.randint(
                low=min_timestep,
                high=max_timestep,
                size=(batch_size,),
                generator=generator,
                device=generator.device,
            ).long()
        else:
            rng = np.random.default_rng(global_step)
            weights = np.linspace(0, 1, max_timestep - min_timestep)
            weights = 1 / (1 + np.exp(-noising_weight * (weights - noising_bias)))  # sigmoid
            weights /= np.sum(weights)
            samples = rng.choice(np.arange(min_timestep, max_timestep), size=(batch_size,), p=weights)
            return torch.tensor(samples, dtype=torch.long, device=generator.device)
    else:
        # Deterministic: return the median timestep
        return torch.tensor(
            int(noise_scheduler.config['num_train_timesteps'] * 0.5) - 1,
            dtype=torch.long,
            device=generator.device,
        ).unsqueeze(0)
```
And `_get_timestep_continuous`:

```python
def _get_timestep_continuous(
        self,
        deterministic: bool,
        generator: Generator,
        batch_size: int,
        config: TrainConfig,
        concept: dict,
        global_step: int,
) -> Tensor:
    if not deterministic:
        noising_weight = concept.get("noising_weight", config.noising_weight)
        noising_bias = concept.get("noising_bias", config.noising_bias)
        min_strength = concept.get("min_noising_strength", config.min_noising_strength)
        max_strength = concept.get("max_noising_strength", config.max_noising_strength)

        if noising_weight == 0:
            return (1 - torch.rand(
                size=(batch_size,),
                generator=generator,
                device=generator.device,
            )) * (max_strength - min_strength) + min_strength
        else:
            rng = np.random.default_rng(global_step)
            choices = np.linspace(np.finfo(float).eps, 1, 5000)  # discretization
            weights = 1 / (1 + np.exp(-noising_weight * (choices - noising_bias)))  # sigmoid
            weights /= np.sum(weights)
            samples = rng.choice(choices, size=(batch_size,), p=weights)
            samples = samples * (max_strength - min_strength) + min_strength
            return torch.tensor(samples, dtype=torch.float, device=generator.device)
    else:
        return torch.full(
            size=(batch_size,),
            fill_value=0.5,
            device=generator.device,
        )
```
`BaseStableDiffusionXLSetup.py` (added a line):

```python
concept_params = data.get("concept", {})  # data is the input batch
latent_noise = self._create_noise(scaled_latent_image, config, concept_params, generator)
```
`TrainingTab.py` (just commented out the noise frame):

```python
def __setup_stable_diffusion_xl_ui(self, column_0, column_1, column_2):
    self.__create_base_frame(column_0, 0)
    self.__create_text_encoder_1_frame(column_0, 1)
    self.__create_text_encoder_2_frame(column_0, 2)
    self.__create_embedding_frame(column_0, 3)

    self.__create_base2_frame(column_1, 0)
    self.__create_unet_frame(column_1, 1)
    # self.__create_noise_frame(column_1, 2)

    self.__create_align_prop_frame(column_2, 0)
    self.__create_masked_frame(column_2, 1)
    self.__create_loss_frame(column_2, 2, supports_vb_loss=False)
```
`ConceptWindow.py` (added a tab for the parameters removed above):

```python
class ConceptWindow(ctk.CTkToplevel):
    def __init__(
            self,
            parent,
            concept: ConceptConfig,
            ui_state: UIState,
            image_ui_state: UIState,
            text_ui_state: UIState,
            *args, **kwargs,
    ):
        ctk.CTkToplevel.__init__(self, parent, *args, **kwargs)

        self.concept = concept
        self.ui_state = ui_state
        self.image_ui_state = image_ui_state
        self.text_ui_state = text_ui_state

        self.title("Concept")
        self.geometry("800x530")
        self.resizable(False, False)
        self.wait_visibility()
        self.grab_set()
        self.focus_set()

        self.grid_rowconfigure(0, weight=1)
        self.grid_columnconfigure(0, weight=1)

        tabview = ctk.CTkTabview(self)
        tabview.grid(row=0, column=0, sticky="nsew")

        self.__general_tab(tabview.add("general"), concept)
        self.__image_augmentation_tab(tabview.add("image augmentation"))
        self.__text_augmentation_tab(tabview.add("text augmentation"))
        # --- New tab for Advanced (Noising & Masking) ---
        self.__advanced_tab(tabview.add("advanced"))

        components.button(self, 1, 0, "ok", self.__ok)

    # (__general_tab, __image_augmentation_tab and __text_augmentation_tab are unchanged)

    def __advanced_tab(self, master):
        master.grid_columnconfigure(0, weight=1)
        master.grid_columnconfigure(1, weight=1)
        row = 0

        # Noising parameters
        components.label(master, row, 0, "Offset Noise Weight",
                         tooltip="Offset noise weight for this concept")
        components.entry(master, row, 1, self.ui_state, "offset_noise_weight")
        row += 1

        components.label(master, row, 0, "Perturbation Noise Weight",
                         tooltip="Perturbation noise weight for this concept")
        components.entry(master, row, 1, self.ui_state, "perturbation_noise_weight")
        row += 1

        components.label(master, row, 0, "Min Noising Strength",
                         tooltip="Minimum noising strength for this concept (between 0 and 1)")
        components.entry(master, row, 1, self.ui_state, "min_noising_strength")
        row += 1

        components.label(master, row, 0, "Max Noising Strength",
                         tooltip="Maximum noising strength for this concept (between 0 and 1)")
        components.entry(master, row, 1, self.ui_state, "max_noising_strength")
        row += 1

        components.label(master, row, 0, "Noising Weight",
                         tooltip="Controls how noise levels are emphasized (0 = disabled)")
        components.entry(master, row, 1, self.ui_state, "noising_weight")
        row += 1

        components.label(master, row, 0, "Noising Bias",
                         tooltip="Bias of the noise distribution (between 0 and 1)")
        components.entry(master, row, 1, self.ui_state, "noising_bias")
        row += 2  # spacing

        # Masking parameters
        components.label(master, row, 0, "Masked Training",
                         tooltip="Enables or disables masked training for this concept")
        components.switch(master, row, 1, self.ui_state, "masked_training")
        row += 1

        components.label(master, row, 0, "Unmasked Probability",
                         tooltip="Probability of NOT applying the mask to a sample")
        components.entry(master, row, 1, self.ui_state, "unmasked_probability")
        row += 1

        components.label(master, row, 0, "Unmasked Weight",
                         tooltip="Weight of unmasked areas in the loss function")
        components.entry(master, row, 1, self.ui_state, "unmasked_weight")
        row += 1

        components.label(master, row, 0, "Normalize Masked Area Loss",
                         tooltip="Normalizes the loss according to the masked area")
        components.switch(master, row, 1, self.ui_state, "normalize_masked_area_loss")

    def __ok(self):
        self.destroy()
```
That's where I was when I gave up, because it became too confusing for me. I haven't even tried this code, since I couldn't keep track of everything from memory; maybe I should, but I have zero faith in the result, lol.
Thanks for trying to help!
I think this would be a useful feature. Do you want to submit a draft PR? It's easier to track what you have changed and comment on it there. If you don't know how to do that yet, ask your favourite AI.
First comments:
- I think you should not replace the current functionality with the same settings in the concepts, but have them there as an optional override. Similar to other overrides: you set the main value in the training tab, and in the concept only if you want to override it for that concept.
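The optional-override pattern could be resolved with a small helper like the sketch below (illustrative only; `resolve` is a hypothetical function, `concept` a plain dict, and `None` marks "not overridden" — the actual OneTrainer config types work differently):

```python
def resolve(concept: dict, key: str, global_value: float) -> float:
    # The per-concept value wins only when it is explicitly set; otherwise
    # fall back to the global value configured in the training tab.
    local_value = concept.get(key)  # None when the concept does not override
    return global_value if local_value is None else local_value

# Hypothetical usage: this concept overrides offset noise, inherits the rest.
concept = {"offset_noise_weight": 0.1, "min_noising_strength": None}
offset = resolve(concept, "offset_noise_weight", 0.0)    # -> 0.1 (override)
minimum = resolve(concept, "min_noising_strength", 0.0)  # -> 0.0 (global)
```

The important design choice is that an unset override must be distinguishable from a legitimately-zero value, which is why the sentinel is `None` rather than `0.0`.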
- Your code in `_get_timestep...()` is called for an entire batch, and a batch can contain multiple different concepts; I don't think your code considers that yet.
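Handling mixed-concept batches would mean vectorizing the parameters per sample instead of reading them once per batch. A rough sketch of the idea for offset noise (plain PyTorch, not actual OneTrainer code; it assumes the batch carries one concept dict per sample):

```python
import torch

def per_sample_offset_noise(latents: torch.Tensor, concepts: list[dict],
                            global_offset: float, generator=None) -> torch.Tensor:
    # One offset-noise weight per sample, falling back to the global value,
    # broadcast over the channel/height/width dimensions.
    weights = torch.tensor(
        [c.get("offset_noise_weight", global_offset) for c in concepts],
        dtype=latents.dtype,
    ).view(-1, 1, 1, 1)
    noise = torch.randn(latents.shape, generator=generator, dtype=latents.dtype)
    offset = torch.randn((latents.shape[0], latents.shape[1], 1, 1),
                         generator=generator, dtype=latents.dtype)
    return noise + weights * offset
```

The timestep functions would need the same treatment: sample each element of the batch from its own concept's `[min, max]` range and sigmoid weighting.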
I think before doing this, a larger-scope concepts rework would be needed; there are quite a number of different requests wanting more granular per-concept or per-image control:
https://github.com/Nerogar/OneTrainer/issues/238#issuecomment-2041001425
I've seen a couple of feature requests now that involve some sort of per-image parameter. It's getting to the point where we ought to consider having each image carry a configuration file (textproto, jsonnet, raw JSON, whatever) that defines those parameters (desired resolution, mask rate, what have you). Keeping everything in the filename is going to be a path to madness if multiple of these requests get implemented.
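A per-image sidecar config could be as simple as a JSON file next to each image. Purely illustrative (the file layout, `load_image_params` helper, and keys are all made up):

```python
import json
from pathlib import Path

def load_image_params(image_path: str, defaults: dict) -> dict:
    # Look for "photo.jpg.json" next to "photo.jpg"; a missing file or
    # missing keys fall back to the provided defaults.
    sidecar = Path(image_path + ".json")
    params = dict(defaults)
    if sidecar.exists():
        params.update(json.loads(sidecar.read_text()))
    return params
```

A scheme like this would subsume both the per-concept overrides discussed above and the filename-encoded parameters, at the cost of one extra file per image that opts in.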