Dan Fu

Results: 103 comments by Dan Fu

Hi, thanks for your interest! Can you provide more details about the differences in output you're seeing? They may be slight numerical errors due to fp32/fp16/bf16. We have a test in...

Yes, that is within a standard numerical error that can come from two slightly different but mathematically equivalent implementations of the same operation, or from converting between fp32/fp16 and bf16.
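To make both sources of error concrete, here is a minimal sketch in plain Python. It is an illustration only and not code from the repo: the helper names (`to_bf16`, `sum_left_to_right`, `sum_right_to_left`) are hypothetical, and `to_bf16` simulates bf16 by truncating the low 16 bits of the float32 encoding, whereas real hardware conversion typically rounds to nearest.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Simulate a bf16 round trip (assumption: truncation, not round-to-nearest).

    bf16 shares float32's sign and exponent bits and keeps only the top
    7 mantissa bits, so we zero the low 16 bits of the float32 encoding.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def sum_left_to_right(xs):
    """Accumulate in the given order."""
    total = 0.0
    for x in xs:
        total += x
    return total


def sum_right_to_left(xs):
    """Same mathematical sum, accumulated in the opposite order."""
    total = 0.0
    for x in reversed(xs):
        total += x
    return total


xs = [0.1, 0.2, 0.3]
a = sum_left_to_right(xs)
b = sum_right_to_left(xs)

print(a == b)                             # False: the rounding differs by order
print(math.isclose(a, b, rel_tol=1e-12))  # True: equal up to rounding error
print(to_bf16(1.001))                     # 1.0: bf16 cannot represent 1.001
```

This is why outputs are usually compared with a tolerance rather than exact equality, and why the tolerance has to be much looser for bf16 than for fp32.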

We are not currently planning to train a multilingual version, but let me know if the finetuning works! It's currently using the BERT tokenizer, so I'm not sure how well...

Hi, great question! The FFTConv in this repo is a fused CUDA kernel for running general convolutions on sequences of length 8K or shorter. Longer sequences do not fit in SRAM, so...

@BlinkDL try with this fix now! For the KeOps issue - can you share details about your environment? PyTorch, CUDA, and KeOps versions would all be helpful.

There are examples showing how to switch between the different models for text generation: https://github.com/HazyResearch/H3/tree/main/examples

We're looking into this, stay tuned!

Here are examples of how to load all the models, along with example outputs: https://github.com/HazyResearch/H3/blob/main/examples/README.md

Thanks for the question! There was a very slight degradation in performance (maybe half a point MLM accuracy), so we kept them in. Our hypothesis is that since it's bidirectional,...

Good questions! This blog post may clear most things up: https://hazyresearch.stanford.edu/blog/2023-12-11-conv-tutorial . It has to do with the details of what the FFT convolution implements, and that explains the differences...