typebox-codegen icon indicating copy to clipboard operation
typebox-codegen copied to clipboard

N^M algorithm in `ModelToArkType` + Fix

Open JonWolfeDrata opened this issue 10 months ago • 1 comments

Just tried this use this project on my huge swagger file with ~25,000 types (huge thanks btw!) and while it worked, it took 2.5 hours to process my model.

I looked into it and it seems to be a simple case of an N^M algorithm here when checking for invalid references.

https://github.com/sinclairzx81/typebox-codegen/blob/35000c1820740c0f5c21258507149f6c7566a083/src/model/model-to-arktype.ts#L237-L247

https://github.com/sinclairzx81/typebox-codegen/blob/35000c1820740c0f5c21258507149f6c7566a083/src/model/model-to-arktype.ts#L255-L257

You're checking every reference for every type, but references don't need to be re-checked every time since they in the same scope.

Changing those lines to below reduced my runtime to 3 seconds.

  function GenerateType(schema: Types.TSchema) {
    const buffer: string[] = []
    if (!reference_map.has(schema.$id)) return UnsupportedType(schema)
    const type = Collect(schema)
    buffer.push(`${schema.$id || `T`}: ${type}`)
    if (schema.$id) emitted_types.add(schema.$id)
    return buffer.join('\n')
  }
for (const reference of references) {
      reference_map.set(reference.$id, reference)
}
for (const type of model.types.filter((type) => Types.TypeGuard.IsSchema(type))) {
     buffer.push(`${GenerateType(type)},`)
}

This pre-checks all references, then we just do a lookup in the generation to see if it's invalid instead of re-checking everything.

Let me know if you want a PR!

JonWolfeDrata avatar Mar 31 '25 21:03 JonWolfeDrata

@JonWolfeDrata Hi, thanks for reporting


Just tried this use this project on my huge swagger file with ~25,000 types (huge thanks btw!) and while it worked, it took 2.5 hours to process my model.

Hey, this is a good find! (also ~25,000 types is going large!) Yeah, the original generator implementations were written in a rapid "just make it work" fashion and not really built with that kind of scale in mind. The reference / dereference implementations for each library generator were lifted from Zod (as that was the first generator), so I expect if the problem exists in Ark (as a derivative generator) it probably exists elsewhere too. I may do some digging to test that.


You're checking every reference for every type, but references don't need to be re-checked every time since they in the same scope.

It's been a while since I dug into the generator referential type code, but If i recall, the complexity there was written to support global / cross referenced types of the following form (where interface B needs to perform an outer lookup in A).

interface A {
  a: number
}
interface B {
  a: A // <-- non-local reference to A
}

The actual Model structures went through quite a bit of churn figuring out how best represent cross referenced types via TypeBox $ref and $id (and there was quite a bit of work done to try get Ark recursive types implemented (which had since been pulled due to downstream complexity but does need to go back in someday)). It is quite possible there isn't actually a requirement anymore to aggressively dereference remote types in the generator based on how the Model is being constructed upstream, so happy to trial the simplified logic.


This pre-checks all references, then we just do a lookup in the generation to see if it's invalid instead of re-checking everything. Let me know if you want a PR!

Yes please, that would be great! :) When making the update though, it would be good if you could comment the previous algorithm (rather than deleting) and drop a note in there about it being N^M + URL link to this issue. If there are any problems down the road, it just gives me a bit of historical context as to what's changed for that bit of code.

For testing, I actually approach developing and updating generators via the https://github.com/sinclairzx81/typebox-workbench project (as problems show as squiggles in editor). If you check this project out, run npm start, you can interactively test the updates via the workbench editor. Once you're happy with the update, copy it back to here, and submit the PR. I'll push the updates back to the workbench once the PR here is merged in (it's a bit of a arcane workflow, but helps in lieu of having good test automation for generated output)

Ark Generator Here: https://github.com/sinclairzx81/typebox-workbench/blob/main/src/codegen/model/model-to-arktype.ts

Many thanks! S

sinclairzx81 avatar Apr 01 '25 03:04 sinclairzx81