JSON and TypeAdapters produce unwanted values or empty list
Hi @hudson-ai!
Concerning the TypeAdapter constrained generation, here are some examples of the issue mentioned here:
from guidance import models, capture
from guidance import json as jj
from pydantic import BaseModel, TypeAdapter
import json
from Noema.cfg import *
lm = models.LlamaCpp(
    "../Models/Mistral-NeMo-Minitron-8B-Instruct.Q4_K_M.gguf",
    n_gpu_layers=99,
    n_ctx=512*8,
    echo=False
)
lm.reset()
lm += "Generate a list of 3 integers between 1 and 4: " + capture(G.arrayOf(G.num()), name="generated_object")
print(lm["generated_object"])
# Output: ["1", "2", "3"]
lm.reset()
schema = TypeAdapter(list[int])
lm += "Generate a list of 3 integers between 0 and 4: " + jj(name="generated_object", schema=schema)
print(json.loads(lm["generated_object"]))
# Output: []
lm.reset()
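# French prompt, reused in the TypeAdapter example below. In English: "Create a list of the different steps described here: This morning I left early, then I bought apples and finally I went to the restaurant."
lm += "Créé une liste des différentes étapes décrites ici: Ce matin je suis parti tot, puis j'ai acheté des pommes et enfin je suis allé au restaurant." + capture(G.arrayOf(G.sentence()), name="generated_object")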
print(lm["generated_object"])
# Output: ["Ce matin je suis parti tot, puis j'ai acheté des pommes et enfin je suis allé au restaurant."]
lm.reset()
schema = TypeAdapter(list[str])
lm += "Créé une liste des différentes étapes décrites ici: Ce matin je suis parti tot, puis j'ai acheté des pommes et enfin je suis allé au restaurant." + jj(name="generated_object", schema=schema)
print(json.loads(lm["generated_object"]))
# Output: []
The file containing the custom CFG is here.
This is just a workaround, but it helps to produce a non-empty list.
Concerning the JSON:
lm.reset()
class Schema(BaseModel):
    weather: str
lm += "What is the weather today? " + jj(name="generated_object", schema=Schema)
print(json.loads(lm["generated_object"]))
# Output using Minitron 8B : {'weather': ', '}
# Output using llama3 instruct: {'weather': ':sunny:'}
I'm not sure I understand what the expected generation should be, but it seems that characters from the format are leaking into the generated content.
Hi @AlbanPerli, sorry for the late reply here :)
I think that part of what you are encountering here is that lists aren't forced to be non-empty by default (I think your custom grammar definitions enforce a minimum length of one). If you want to enforce this behavior with TypeAdapters, you can use typing.Annotated and annotated_types.MinLen like so:
from typing import Annotated
from annotated_types import MinLen
from pydantic import TypeAdapter
ta = TypeAdapter(Annotated[list[int], MinLen(1)])
ta.json_schema()
# Output: {'items': {'type': 'integer'}, 'minItems': 1, 'type': 'array'}
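To make the difference concrete, here is a small sketch (pydantic only, no model needed) showing why `[]` is a legal result for a plain `list[int]` schema and stops being legal once `MinLen(1)` is added. The variable names are just for illustration.

```python
from typing import Annotated

from annotated_types import MinLen
from pydantic import TypeAdapter, ValidationError

# Plain list[int]: the generated schema carries no "minItems",
# so an empty array satisfies it.
unconstrained = TypeAdapter(list[int])
empty_ok = unconstrained.validate_json("[]")

# With MinLen(1) the schema gains "minItems": 1 and "[]" is rejected.
non_empty = TypeAdapter(Annotated[list[int], MinLen(1)])
try:
    non_empty.validate_json("[]")
    rejected = False
except ValidationError:
    rejected = True

print(unconstrained.json_schema())
print(non_empty.json_schema())
print(empty_ok, rejected)
```

Since constrained generation only guarantees that the output matches the schema, an unconstrained list schema leaves the model free to close the array immediately.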
You can of course get this behavior by just writing the JSON schema directly, or if you're using a pydantic.BaseModel, you can do these annotations a bit more ergonomically with the pydantic.Field descriptor.
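As a rough sketch of those two alternatives: the `Steps` model and its `steps` field below are hypothetical names chosen for illustration, and the raw schema is simply the dictionary shown in the output above written by hand. Either one should be usable as the `schema=` argument in the same way as the TypeAdapter.

```python
from pydantic import BaseModel, Field, ValidationError


class Steps(BaseModel):
    # Hypothetical model: min_length=1 on a list field becomes
    # "minItems": 1 in the generated JSON schema.
    steps: list[str] = Field(min_length=1)


schema_from_model = Steps.model_json_schema()

# The same constraint written directly as a JSON schema.
raw_schema = {"type": "array", "items": {"type": "integer"}, "minItems": 1}

# Pydantic itself also refuses an empty list for this field.
try:
    Steps(steps=[])
    rejected = False
except ValidationError:
    rejected = True

print(schema_from_model["properties"]["steps"])
print(rejected)
```

If an exact length is wanted (for example the "3 integers" in the original prompt), `Field(min_length=3, max_length=3)` expresses that in the same way.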
If this doesn't address the core issue you're seeing, just let me know and we can figure it out :)
Hi @hudson-ai, my turn to apologize for the response time! :)
The issue was indeed the minimum length; I wasn't aware of this parameter for the TypeAdapter.
Thank you!
Ok, good to know that works for you! Let us know if you hit any other unexpected or unintuitive behaviors :)