Feature request: tensor layer number parameter, separated from the layer name
- ggml tensors need a layer number parameter. I'd use layer 0 for global and 1+ for model layers (it could also be -1 and 0+, of course).
- When a ggml tensor is created, the most recently configured layer is used; the default is 0:
  `ggml_set_current_layer(il+1);`
  This way it only takes a single line of code in the eval loop to set the layer of each node.
- The model currently stores the name intermixed with the layer, like `"layers.58.attention.wo.weight"`. This should also be changed to `"attention.wo.weight"` with the layer number set to 59.
Benefits:
- debug output will be cleaner and more informative: the layer of each calculation becomes part of the graph print (without hardcoding it into the generic name)
- optimizations can be applied by layer number or weight name in a clean way
Good idea
Would love to take this.
Lmk if looks good @cmp-nct https://github.com/ggerganov/llama.cpp/pull/1692
> Good idea

Apologies - I was wrong.
The better approach to support this type of functionality is through the new `extra` member of `ggml_tensor`:
https://github.com/ggerganov/llama.cpp/blob/98ed16557432d7a5179c57eddcc3a08a7ae6d54d/ggml.h#L390
The reason is to keep the ggml interface general-purpose.
The "layer" concept does not always make sense - i.e. it is specific to particular architectures. Therefore it cannot be part of the interface.
Do you recall my "meta" recommendation a month or two ago? That's basically what `extra` appears to be. Though as a void pointer it would not be supported by IDE autocompletion, right?
Wouldn't a simple struct be a better solution than a void pointer? It's not as "flexible", but unused elements can just stay undefined/default.
Regarding the layer: I originally recommended using layer -1 or 0 for "no layer", so for tensors that are not layer-specific that's a way to specify it.
I personally use something like this. It's not 100% up to date with the latest ggml commits since I shifted my focus to falcon, but basically it should work (just `extra` instead of `meta`):
```c
typedef struct {
    uint8_t layer;                  // 0 = not set, 1+ = layer num
    char    short_name[32];         // parameter weight name without layer
    int8_t  flag_high_performance;  // if this tensor is high performance (relevant for hybrid core systems like Intel 12th/13th gen)
    int8_t  flag_use_blas_cuda;     // dst or src0 tensor, -1 = default, 0 = off, 1 = on (if contiguous)
    int8_t  caching_priority;       // -1 = no caching, 0 = default, 1+ = priority | caching is by layer number and layer name (0 = global layer)
} tensor_meta;

// default tensor meta:
static const tensor_meta GGML_DEFAULT_TENSOR_META = {
    /*.layer                 =*/ 0,
    /*.short_name            =*/ "",
    /*.flag_high_performance =*/ 1,
    /*.flag_use_blas_cuda    =*/ -1, // -1 = use default
    /*.caching_priority      =*/ -1, // no caching
};
```