Dan Fu
Actually, I remembered that this is not how this code works: self.L = L creates a kernel of length L that gets implicitly padded up to 2L later on. self.L...
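To illustrate why the kernel is padded up to 2L: an FFT of size 2L turns circular convolution into a linear (causal) one, so no wraparound from the end of the sequence leaks into the output. This is a minimal sketch of that pattern, not the repo's actual implementation; the function name `fft_conv` and the use of NumPy here are assumptions for illustration.

```python
import numpy as np

def fft_conv(u, k):
    """Causal convolution of input u (length L) with a kernel k (length L).

    The kernel is stored at length L; zero-padding both signals to 2L
    (via the `n=` argument of the FFT) makes the circular convolution
    equivalent to a linear one, so the first L outputs are exactly the
    causal convolution.  Hypothetical sketch, not the repo's code.
    """
    L = u.shape[-1]
    fft_size = 2 * L  # implicit zero-padding up to 2L
    u_f = np.fft.rfft(u, n=fft_size)
    k_f = np.fft.rfft(k, n=fft_size)
    y = np.fft.irfft(u_f * k_f, n=fft_size)
    return y[..., :L]  # keep only the first L (causal) outputs

# Sanity check against direct convolution truncated to length L:
u = np.random.randn(8)
k = np.random.randn(8)
ref = np.convolve(u, k)[:8]
assert np.allclose(fft_conv(u, k), ref)
```

With an FFT size of only L, the tail of the convolution would wrap around and corrupt the early outputs; 2L is the smallest power-friendly size that avoids this for two length-L signals.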
Code here: https://github.com/HazyResearch/safari. We don't have a config for fine-tuning, but will look to add it soon!
Thank you for the kind words!

> If you do not have the time to cobble it together, can you provide some hints on how a fine-tuning harness can be...
Releasing the full training script is on our roadmap - will post an update here when we have more details about timing.
These are available here now: https://github.com/HazyResearch/safari
Thanks for your interest! We plan to update the arXiv with the full evaluations soon. For now, we have the perplexity (PPL) of the 2.7B model against GPT-Neo-2.7B on the Pile:...
This is updated in the arXiv now: https://arxiv.org/abs/2212.14052
Yes, we plan to release the synthetics next week. Will post here with an update when we do!
These are available here now: https://github.com/HazyResearch/safari
Pushed now! Let us know if you run into any other problems.