Muhammad Anas Raza
I faced the same issue while training.
Raised a PR to port the example "A Vision Transformer without Attention" to `keras-core`: keras-team/keras-core#497
Raised a PR to port the example "Compact Convolutional Transformers" to `keras-core`: keras-team/keras-core#523
Torch and JAX don't have a standard implementation. Can we implement it from scratch using ops?
It would look like the TensorFlow [implementation](https://github.com/tensorflow/tensorflow/blob/v2.13.0/tensorflow/python/ops/image_ops_impl.py#L4281), which uses depthwise convolution.
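
For anyone unfamiliar with the building block, here is a rough sketch of what a depthwise convolution computes: each channel is filtered by its own 2D kernel, and channels never mix. It is not the TensorFlow code linked above and not a proposed `keras-core` implementation. It uses plain NumPy as a stand-in, and the function name `depthwise_conv2d_valid` is made up for illustration. A backend-agnostic version would presumably replace the loop with the depthwise convolution op from the `ops` namespace.

```python
import numpy as np


def depthwise_conv2d_valid(image, kernel):
    """Filter each channel with its own 2D kernel ("valid" padding, stride 1).

    Hypothetical helper for illustration only.
    image:  array of shape (height, width, channels)
    kernel: array of shape (kh, kw, channels), one filter per channel
    """
    h, w, c = image.shape
    kh, kw, kc = kernel.shape
    if kc != c:
        raise ValueError("kernel must have one filter per input channel")

    out_h, out_w = h - kh + 1, w - kw + 1
    out = np.zeros((out_h, out_w, c), dtype=np.result_type(image, kernel))

    # Accumulate one kernel tap at a time. The multiplication broadcasts
    # over the channel axis, so channel k only ever sees kernel[..., k].
    for i in range(kh):
        for j in range(kw):
            out += image[i:i + out_h, j:j + out_w, :] * kernel[i, j, :]
    return out


# Usage: a 3x3 box filter applied independently to both channels.
image = np.arange(4 * 4 * 2, dtype=np.float32).reshape(4, 4, 2)
box = np.full((3, 3, 2), 1.0 / 9.0, dtype=np.float32)
smoothed = depthwise_conv2d_valid(image, box)
print(smoothed.shape)  # (2, 2, 2)
```

Like TensorFlow's convolution ops, this computes a cross-correlation (the kernel is not flipped), which makes no difference for symmetric filters.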