Add non-architecture Huggingface conversion parameters
🐞 Describe the Bug
When converting models, config options not included in the architecture config are not imported from the Hugging Face model's config.json.
This creates an unexpected and undocumented requirement for manual configuration, which can lead to costly mistakes.
The following critical options are affected:

- `window_size` and `max_windows_layers` for models trained with windowed attention (e.g. Qwen 2), see #163.
- `router_aux_loss_coef` for MoEs such as Mixtral.
Currently, the load-from-HF-model feature suggests seamless integration, but this bug prevents complete and accurate model conversion. Users are likely to assume the conversion will "just work" and may unknowingly train models with incorrect configurations.
🔄 Steps to Reproduce

1. Load a HF model using Fast-LLM: use a HF model that requires non-architecture parameters (e.g., `window_size` for sliding window attention).
2. Observe missing configurations: check the output model configuration. Notice that parameters not included in the architecture config are missing or set to default values, potentially breaking the model.
🎯 Expected Behavior
Fast-LLM should correctly import all relevant configuration options from the Hugging Face config.json, not just those in the architecture configuration. This ensures that models are fully converted and behave as expected, without requiring manual intervention or hidden knowledge about configuration quirks.
I have found that when importing metadata, the configuration is first imported as architecture-only: https://github.com/ServiceNow/Fast-LLM/blob/main/fast_llm/engine/checkpoint/external.py#L206.
However, it is later reassigned as the full configuration class if the child class sets it that way (and the converter handlers do set it this way):
https://github.com/ServiceNow/Fast-LLM/blob/main/fast_llm/engine/checkpoint/external.py#L211
I am not sure why this approach is used. I have tried importing config as a full class, and the checkpoint tests pass, but I am unsure if this could trigger any side effects.
Perhaps we could check the type of cls._model_class and decide whether to import the configuration as architecture-only or not?
@jlamypoirier what do you think?
This is not a bug, Fast-LLM is designed that way. The architecture/non-architecture split is good enough for most situations, but to get the behaviour described here we need to rethink this principle. Forcing a full import of the model config isn't really a good option because it would prevent setting any other model parameter. (Also it would need major changes to the conversion mechanism which only supports architecture parameters.)
I can think of one option that would almost always give us what we want. Instead of having a base model config that may or may not be overridden in a confusing way, we could directly create a full base model config (as @bigximik suggests), then treat the provided base model config as a set of explicit updates to that imported config. This wasn't really possible before, but now we can use the override method introduced in #168.
This should work as expected:
- Pretrained config, no base model config: all architecture parameters are imported, and so are relevant non-architecture parameters (e.g. `window_size`). Other non-architecture parameters take the Fast-LLM default.
- Pretrained config, base model config with non-architecture parameters: parameters explicitly specified in the base model config are taken; others are as above.
- Pretrained config, base model config with architecture parameters: We probably want to enforce matching values, and raise an error for any mismatch. (This would be an improvement because right now wrong values are silently ignored.)
- No pretrained config: Same as before.
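The merge rules above can be sketched as follows (a hypothetical illustration of the proposed behaviour, not Fast-LLM code; `ARCHITECTURE_KEYS` and `apply_base_model_updates` are invented names): import the full pretrained config, then apply the user's base model config as explicit updates, raising on any architecture mismatch instead of silently ignoring it.

```python
# Hypothetical sketch of the proposed merge: full import first, then the
# base model config applied as explicit updates (per the #168 override idea).
ARCHITECTURE_KEYS = {"hidden_size", "num_layers", "num_attention_heads"}

def apply_base_model_updates(imported: dict, base_model: dict) -> dict:
    merged = dict(imported)
    for key, value in base_model.items():
        if key in ARCHITECTURE_KEYS and imported.get(key) != value:
            # Architecture parameters must match the pretrained config;
            # raising here replaces today's silent ignore.
            raise ValueError(
                f"Architecture mismatch for {key!r}: "
                f"{imported.get(key)} != {value}"
            )
        merged[key] = value  # explicit non-architecture updates win
    return merged

imported = {"hidden_size": 1024, "num_layers": 24, "window_size": 4096}
merged = apply_base_model_updates(imported, {"window_size": 2048})
print(merged["window_size"])  # 2048: the explicit update takes precedence
```

With no base model config the imported values pass through untouched, which matches the first case above.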
Hey @jlamypoirier, regardless of whether we call this a bug or a design flaw, the outcome is the same: the current behaviour is incorrect and leads to broken model conversions.
That said, @bigximik already outlined a clear and simple path forward, and I had mentioned on Tuesday that any deviations from this plan need to be brought to the team before investing time in them.
If this takes more than a day or two to fix, it should be postponed. Right now, LoRA is the priority. We need to align on that.
@tscholak For the reasons I stated above, I do not believe the solution proposed by @bigximik could work, even as a stopgap. But I can't just complain that things don't work; I have to propose an alternative to the team, which I did through proposal #170 and the short POC #171.
This issue has been raised repeatedly and is considered a "bug" by many, so I consider it a priority issue. (And I was assuming you think the same since I was assigned and it was put in the "ready" section.)
@jlamypoirier, the "ready" column does not mean top priority.
We discussed in the sync that LoRA is the immediate focus. This issue here was marked ready under the assumption that a quick fix was possible. It's not an urgent blocker. If Denis' approach is not feasible, then that should have been brought to the team first, not worked around in parallel.
Again, this issue is not a priority right now. For now, please shift back to LoRA.