jacobfulano
Results
4 issues of jacobfulano
GLU applies surgery to any model that contains the modules `BertIntermediate` and `BertOutput`. Not all models have these modules; for example, DistilBert from HuggingFace uses different names for them....
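The compatibility condition above can be illustrated with a small class-name check. This is a minimal sketch, not the GLU algorithm's actual implementation. It assumes the surgery is keyed on module class names. `find_target_modules`, `supports_glu_surgery`, and the stand-in classes are hypothetical. The functions take any iterable of `(path, module)` pairs, the same shape that PyTorch's `model.named_modules()` yields. `FFN` is only an illustrative stand-in for a differently named feed-forward block.

```python
from typing import Dict, Iterable, List, Tuple

# Class names the GLU surgery is described as targeting.
REQUIRED_CLASS_NAMES = ("BertIntermediate", "BertOutput")


def find_target_modules(
    named_modules: Iterable[Tuple[str, object]],
    class_names: Tuple[str, ...] = REQUIRED_CLASS_NAMES,
) -> Dict[str, List[str]]:
    """Map each required class name to the paths of modules of that class (hypothetical helper)."""
    found: Dict[str, List[str]] = {name: [] for name in class_names}
    for path, module in named_modules:
        cls_name = type(module).__name__
        if cls_name in found:
            found[cls_name].append(path)
    return found


def supports_glu_surgery(named_modules: Iterable[Tuple[str, object]]) -> bool:
    """Return True only if every required class name appears at least once."""
    found = find_target_modules(named_modules)
    return all(paths for paths in found.values())


# Stand-in classes: only their names matter for this check.
class BertIntermediate:
    pass


class BertOutput:
    pass


class FFN:  # illustrative stand-in for a differently named feed-forward block
    pass


bert_like = [
    ("encoder.layer.0.intermediate", BertIntermediate()),
    ("encoder.layer.0.output", BertOutput()),
]
distil_like = [("transformer.layer.0.ffn", FFN())]

print(supports_glu_surgery(bert_like))    # True
print(supports_glu_surgery(distil_like))  # False
```

Matching on class names keeps the sketch free of any framework dependency. It also shows why a model whose feed-forward blocks are named differently is silently skipped instead of being transformed.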
Add Replit repo to our main README
This is a draft PR for an FSDP implementation in BERT