Daniel D. Johnson


There's a check for unrecognized configuration arguments in `llama_from_huggingface_model` because it is otherwise pretty difficult to make sure that the converted model has the same behavior as the original one....
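For illustration, a check like that might look roughly as follows (a minimal sketch with hypothetical key names, not the actual converter code):

```python
# Sketch of the kind of check described above: fail fast on configuration
# fields the converter does not know how to handle, so silent behavior
# differences can't slip through. The key names here are hypothetical.
def check_converter_config(config: dict) -> None:
  recognized = {"hidden_size", "num_attention_heads", "num_hidden_layers"}
  unrecognized = set(config) - recognized
  if unrecognized:
    raise ValueError(
        f"Unrecognized config arguments: {sorted(unrecognized)}. "
        "Refusing to convert, since these may change model behavior.")
```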

Looks like there are a few small type errors right now, actually:

```python
/home/runner/work/penzai/penzai/penzai/nn/linear_and_affine.py:1140:1: error: in _from_config: bad return type [bad-return-type]
  Expected: Union[Conv, ConvInPlace, ConvTranspose, ConvTransposeInPlace]
  Actually returned: AbstractGeneralConv

    return core_layer
    ~~~~~~~~~~~~~~~~~~~~...
```

Hmm, good question. At least in terms of the concrete layers that get built, I think it makes sense to have the Conv/ConvTranspose and AddBias layers separate, rather than a...
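As a rough illustration of that separation, here is a plain-JAX sketch (illustrative functions, not penzai's actual classes) in which the convolution and the bias addition are independent, composable pieces rather than one fused layer:

```python
import jax
import jax.numpy as jnp

def conv(kernel, x):
  # x: [batch, width, in_features]; kernel: [window, in_features, out_features].
  # Pure convolution with no bias of its own.
  return jax.lax.conv_general_dilated(
      x, kernel, window_strides=(1,), padding="SAME",
      dimension_numbers=("NWC", "WIO", "NWC"))

def add_bias(bias, x):
  # A separate layer that only adds a learned bias.
  return x + bias

kernel = jnp.ones((3, 8, 16))
bias = jnp.zeros((16,))
x = jnp.ones((2, 10, 8))
y = add_bias(bias, conv(kernel, x))  # composition of the two layers
```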

Thanks @guxm2021 for pointing out these issues! The `jit_wrapper.Jitted` issues should definitely be fixed before merging this. In general any attribute that affects the shape of the result should be...
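The comment is truncated, but a general JAX principle in this area is that anything determining an output shape must be treated as static under `jit`, since traced values cannot change the shape of a result. A minimal illustration of that principle (my example, not the fix from the PR):

```python
import functools
import jax
import jax.numpy as jnp

# `size` affects the output shape, so it is marked static; passing it as a
# traced argument would raise a ConcretizationTypeError inside jnp.arange.
@functools.partial(jax.jit, static_argnames=("size",))
def make_range(size: int) -> jnp.ndarray:
  return jnp.arange(size)

print(make_range(5))  # [0 1 2 3 4]
```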

Hm, that's odd. I wonder if the initialization issue and the runtime issue have the same cause or different causes. Do you have a traceback for when the allocation fails?...

Thanks! Do you have more of the first traceback? Looks like some of the interesting parts are in the ... (in particular the part that looks like

```
---> 88...
```

Interesting, thanks for digging into it! From the stacktraces you shared in https://github.com/google-deepmind/penzai/issues/122#issuecomment-2957687086, my best guess about what's going on is:

- During initialization, the large array containing all of...

Hm, that's very weird. Usually unnecessary memory copies like this can be optimized away by the compiler, at least under jit. The changing stack trace also seems strange.
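A quick way to see the kind of optimization being described (a minimal sketch, assuming a recent JAX; the specific no-op chosen here is my example):

```python
import jax
import jax.numpy as jnp

@jax.jit
def f(x):
  # jnp.asarray on an existing array is a no-op, and XLA will typically
  # fuse or eliminate redundant copies like this under jit.
  y = jnp.asarray(x)
  return y * 2.0

print(f(jnp.arange(4.0)))  # [0. 2. 4. 6.]
```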

Ah, this could be better documented! The initializer for Penzai's Linear layer doesn't directly use JAX-style initializers like `jax.nn.initializers.xavier_normal()`. You should be able to use `pz.nn.xavier_normal_initializer` instead.
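Something along these lines should work (a hedged sketch; the argument names for `Linear.from_config` are from memory and worth checking against the current docs):

```python
import jax
from penzai import pz

layer = pz.nn.Linear.from_config(
    name="proj",
    init_base_rng=jax.random.key(0),
    input_axes={"features": 8},
    output_axes={"embedding": 16},
    # Passed directly (a penzai-style initializer), not called like the
    # JAX-style factory jax.nn.initializers.xavier_normal().
    initializer=pz.nn.xavier_normal_initializer,
)
```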