jacobfulano

Results 4 issues of jacobfulano

GLU applies surgery to any model that contains the modules `BertIntermediate` and `BertOutput`. Not all models have these modules; for example, DistilBERT from Hugging Face uses different names for the corresponding modules....
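
To illustrate the name dependence described in this issue, here is a minimal sketch of a pre-check that reports which of the two required module classes a model lacks. It assumes only that the model exposes a PyTorch-style `modules()` iterator; `missing_glu_targets`, `ToyModel`, and the stand-in classes are hypothetical names used for illustration and are not part of any library.

```python
# Class names that the GLU surgery looks for, as stated in the issue text.
REQUIRED = {"BertIntermediate", "BertOutput"}


def missing_glu_targets(model):
    """Return the required module class names that `model` does not contain.

    Assumes `model.modules()` yields the model and all of its submodules,
    as torch.nn.Module.modules() does.
    """
    present = {type(m).__name__ for m in model.modules()}
    return REQUIRED - present


# Hypothetical stand-ins so the sketch runs without any dependencies.
class BertIntermediate: ...
class BertOutput: ...
class FFN: ...  # stand-in for a differently named feed-forward block


class ToyModel:
    def __init__(self, *children):
        self._children = children

    def modules(self):
        yield self
        yield from self._children


bert_like = ToyModel(BertIntermediate(), BertOutput())
distil_like = ToyModel(FFN())

print(missing_glu_targets(bert_like))
print(sorted(missing_glu_targets(distil_like)))
```

With a real model, the same function could be called on any `torch.nn.Module`; a non-empty result would indicate that a name-based surgery of this kind has nothing to match.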

Add Replit repo to our main README

This is a draft PR for FSDP implementation in BERT