Improve memory usage of `ModelToEncoder`

Open mikethea1 opened this issue 1 year ago • 0 comments

Thanks for creating and maintaining this great library!

What would you like to be added:

Today, `ModelToEncoder` calls `ModelToEncoding`, which statically initializes a dictionary of 7 encodings, 6 of which are duplicates.

When each encoding is constructed, the constructor eagerly loads a bunch of data from manifest resources. As far as I can tell, this data gets loaded separately for each instance.

I would like to call `ModelToEncoder` and have it lazily load only the encoding I care about. I'd also like it to share `Encoding` instances among models that map to the same encoding.

An example implementation might look like this:

```csharp
public static Encoding? TryFor(string modelName)
{
    switch (modelName)
    {
        case "gpt-4o":
            return O200KCache.Instance;
        case "gpt-4":
        ...
        case "text-embedding-3-large":
            return Cl100KCache.Instance;
        default:
            return null;
    }
}

private static class O200KCache
{
    public static readonly O200KBase Instance = new();
}

private static class Cl100KCache
{
    public static readonly Cl100KBase Instance = new();
}
```
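
A variant of the same idea using `Lazy<T>` is sketched below, in case explicit laziness is preferred over relying on static-initializer timing. This is only a self-contained illustration: the `Encoding`, `O200KBase`, and `Cl100KBase` types here are hypothetical stand-ins for the library's real types, `EncodingCache` and the `Constructed` counter are made up for the demo, and the model list is limited to the names shown above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the library's encoding types. The real
// constructors load data from manifest resources; these only count calls.
public abstract class Encoding
{
    public static int Constructed; // illustration only: number of instances built
    protected Encoding() => Constructed++;
}

public sealed class O200KBase : Encoding { }
public sealed class Cl100KBase : Encoding { }

// Hypothetical cache type, not part of the library.
public static class EncodingCache
{
    // One Lazy<T> per distinct encoding. The factory runs at most once,
    // on first access to .Value, and Lazy<T> is thread-safe by default.
    private static readonly Lazy<Encoding> O200K =
        new Lazy<Encoding>(() => new O200KBase());
    private static readonly Lazy<Encoding> Cl100K =
        new Lazy<Encoding>(() => new Cl100KBase());

    // Model names map to the shared Lazy wrapper, not to an instance,
    // so building this dictionary constructs no encodings.
    private static readonly Dictionary<string, Lazy<Encoding>> ByModel =
        new Dictionary<string, Lazy<Encoding>>
        {
            ["gpt-4o"] = O200K,
            ["gpt-4"] = Cl100K,
            ["text-embedding-3-large"] = Cl100K,
        };

    // Returns the shared encoding for a known model, or null otherwise.
    public static Encoding? TryFor(string modelName) =>
        ByModel.TryGetValue(modelName, out var lazy) ? lazy.Value : null;
}
```

With this shape, `EncodingCache.TryFor("gpt-4")` and `EncodingCache.TryFor("text-embedding-3-large")` return the same object, and `O200KBase` is not constructed until a model that maps to it is requested.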

Why is this needed:

Reduce memory footprint and startup time, especially as more models are added.

Anything else we need to know?

I'd be happy to file a PR for this if you're interested!

Aug 14 '24 16:08 mikethea1