perceptron: Isaac-0.1 implementation
Perceptron released Isaac-0.1 and Isaac-0.1-Base, open-weight 2B dense models for perception.
Nice, lmk if you need any help or a quick review 🤗
@zucchini-nlp - we are closing this out now; feel free to give us a quick review while we finish up testing!
amazing, reviewing on Monday :)
@zucchini-nlp just pushed a round of updates addressing your feedback; give it another look when you get a moment.
@zucchini-nlp just pushed another round of refinements based on the latest feedback. Would love for you to give it another pass when you get a chance
Hi @zucchini-nlp, gentle ping on this PR when you get a chance. Thanks for your time!
Hey @zucchini-nlp - just wanted to follow up again on this PR. It’s been a little while since the last round, and your review would help us move things forward. Appreciate it!
Leaving a comment and question from a quick review: is the dependency on tensorstream absolutely necessary for the core functionality of the model?
TensorStream is a core primitive for us; it underpins how we handle multimodal computation, and future open-source releases will include incremental improvements to its ergonomics and performance aligned with model updates. We see it as the right abstraction boundary for multimodal modelling.
That said, we'll be guarding the imports here to avoid issues on that front. As a heads-up, we'll also be following up with changes that address feedback around pattern-reuse standards in transformers; feedback there is much appreciated.
Thanks @philippguevorguian, then I will see if we can integrate it. I can't say for now whether we'll be able to, as our principle for modeling files is to not abstract too much and to keep the models "hackable", meaning a user should be able to intervene at any given point of a model's forward: adding a module, modifying the processing, etc. But if the rest is more transformers-aligned, and since the model is relevant in general, it'll be easier.
In particular, the attention classes should use the existing integrations for FA (e.g. integrations/flash_attention.py) rather than relying on their own paths. To explain: the policy is to have one "naive" path, eager_attention_forward, a Callable explicitly defined in the modeling code that serves as the baseline for attention computation. The optimized paths (sdpa, fa, flex, etc.) and their associated masks are handled through config keys that swap this Callable for one wrapping an efficient kernel.
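To make the policy concrete, here is a minimal standalone sketch of that dispatch pattern. The names (`ALL_ATTENTION_FUNCTIONS`, `_attn_implementation`) mirror the real transformers conventions, but the code below is an illustration on toy 1-D inputs, not the library implementation; the optimized entries are stand-ins that fall back to the eager path.

```python
import math

def eager_attention_forward(query, key, value):
    """Naive baseline: softmax(q . k) weighted sum over values (toy 1-D case)."""
    scores = [sum(q * k for q, k in zip(query, k_vec)) for k_vec in key]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v for w, v in zip(weights, col)) for col in zip(*value)]

# In the real library these entries wrap efficient kernels (SDPA, FA, flex);
# here they are stand-ins that reuse the eager baseline.
ALL_ATTENTION_FUNCTIONS = {
    "eager": eager_attention_forward,
    "sdpa": eager_attention_forward,              # stand-in for torch SDPA
    "flash_attention_2": eager_attention_forward, # stand-in for an FA kernel
}

class AttentionLayer:
    def __init__(self, config):
        self.config = config

    def forward(self, query, key, value):
        # The config key swaps the Callable; eager stays the explicit baseline.
        attn_fn = ALL_ATTENTION_FUNCTIONS[self.config["_attn_implementation"]]
        return attn_fn(query, key, value)
```

The point of the pattern is that a user can still intervene on the baseline Callable, while optimized backends are a pure config-level swap.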
For cross-document masking, combinations of existing utils should be sufficient, e.g. masking_utils.py, which defines `and`/`or` operators over mask functions, rather than utils specific to this model. If these two "reuse" points are addressed it'll be excellent.
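As a sketch of what combining mask functions looks like, here is a standalone illustration in the spirit of the `and_masks` combinator in transformers' masking_utils: a causal mask combined with a same-document mask over packed sequences. The helper names and the list-based document ids are illustrative, not the library API.

```python
def and_masks(*mask_fns):
    """Combine mask functions with a logical AND (illustrative combinator)."""
    def combined(batch, head, q_idx, kv_idx):
        return all(fn(batch, head, q_idx, kv_idx) for fn in mask_fns)
    return combined

def causal_mask(batch, head, q_idx, kv_idx):
    # A query may only attend to itself and earlier positions.
    return kv_idx <= q_idx

def make_document_mask(document_ids):
    # Tokens may only attend within their own packed document.
    def document_mask(batch, head, q_idx, kv_idx):
        return document_ids[q_idx] == document_ids[kv_idx]
    return document_mask

# Two packed documents of two tokens each: positions 0-1 and 2-3.
doc_ids = [0, 0, 1, 1]
mask_fn = and_masks(causal_mask, make_document_mask(doc_ids))
```

With this combination, position 2 cannot attend to position 1 (different document) even though it is causally earlier, which is exactly the cross-document behavior being discussed.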
@molbap would appreciate another round of reviews here
Hi @philippguevorguian, the main branch is very unstable right now as the next major version (v5) is around the corner, but will review soon!
I am currently reviewing the totality but wanted to come back on tensorstream. One thing that'll help the CI already will be to protect tensorstream imports that are not present in our test images: the majority of failing tests are just because the imports aren't guarded. The best would be to have an equivalent codepath in PyTorch for the utilities it covers, akin to how for instance we can run mamba models without their dedicated tensor contraction kernels.
> protect tensorstream imports that are not present in our test images: the majority of failing tests are just because the imports aren't guarded.
Added the proper guards around the problematic imports, which clears out most of the CI failures.
Ty will review today !
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
[For maintainers] Suggested jobs to run (before merge)
run-slow: auto, isaac