Dan Fu comments

Results: 103 comments by Dan Fu

inconsistent output from fftconv_func and native pytorch fft

Hi, thank you for your interest! Can you provide more details about the differences in output you're seeing? They may be slight numerical errors due to fp32/fp16/bf16. We have a test in...

inconsistent output from fftconv_func and native pytorch fft

Yes, that is within the normal numerical error that can arise between two slightly different but mathematically equivalent implementations of the same operation, or from converting between fp32, fp16, and bf16.
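To make the point about mathematically equivalent implementations concrete, here is a minimal sketch that computes the same causal convolution two ways, once as a direct sum and once through a zero-padded FFT, and reports the largest difference. It is an illustration only: it uses plain Python floats (double precision) and a hand-written radix-2 FFT, not `fftconv_func` or PyTorch's FFT, and the function names `fft`, `direct_causal_conv`, and `fft_causal_conv` are hypothetical helpers made up for this example. With fp16 or bf16 the gap would be expected to be much larger than what double precision shows here.

```python
import cmath
import math
import random


def fft(values, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for i in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * i / n) * odd[i]
        out[i] = even[i] + twiddle
        out[i + n // 2] = even[i] - twiddle
    return out


def direct_causal_conv(u, k):
    """y[t] = sum over s <= t of k[s] * u[t - s], computed term by term."""
    return [sum(k[s] * u[t - s] for s in range(t + 1)) for t in range(len(u))]


def fft_causal_conv(u, k):
    """Same operation via FFT: zero-pad to 2L, multiply spectra, invert, truncate."""
    length = len(u)
    n = 2 * length  # padding to 2L avoids circular wrap-around
    u_f = fft(list(u) + [0.0] * (n - length))
    k_f = fft(list(k) + [0.0] * (n - length))
    y = fft([a * b for a, b in zip(u_f, k_f)], inverse=True)
    # The unnormalized inverse transform needs a 1/n scale factor.
    return [(v / n).real for v in y[:length]]


random.seed(0)
L = 256  # must be a power of two for this simple FFT
u = [random.gauss(0.0, 1.0) for _ in range(L)]
k = [random.gauss(0.0, 1.0) for _ in range(L)]

y_direct = direct_causal_conv(u, k)
y_fft = fft_causal_conv(u, k)
max_abs_diff = max(abs(a - b) for a, b in zip(y_direct, y_fft))
print(f"max abs difference: {max_abs_diff:.3e}")
```

When comparing two such implementations, checking against a tolerance (for example with `torch.allclose` in PyTorch) is usually more meaningful than expecting bit-identical outputs.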

Multilingual?

We are not currently planning to train a multilingual version, but let me know if the finetuning works! It currently uses the BERT tokenizer, so I'm not sure how well...

FFT Conv on Seq > 8192?

Hi, great question! The FFTConv in this repo is a fused CUDA kernel for running general convolutions on sequences of length 8K or shorter. Longer sequences do not fit in SRAM, so...

Error running benchmarks/benchmark_generation.py

@BlinkDL try again with this fix now! For the KeOps issue, can you share details about your environment? PyTorch, CUDA, and KeOps versions would all be helpful.

Error running benchmarks/benchmark_generation.py

There are examples showing how to switch between the different models for text generation: https://github.com/HazyResearch/H3/tree/main/examples

Correct method to load 2.7B?

We're looking into this, stay tuned!

Correct method to load 2.7B?

Here are examples of how to load all the models, along with example outputs: https://github.com/HazyResearch/H3/blob/main/examples/README.md

Using (Absolute) Positional Embeddings with Hyena Operators

Thanks for the question! There was a very slight degradation in performance (maybe half a point of MLM accuracy), so we kept them in. Our hypothesis is that since it's bidirectional,...

Using (Absolute) Positional Embeddings with Hyena Operators

Good questions! This blog post may clear most things up: https://hazyresearch.stanford.edu/blog/2023-12-11-conv-tutorial. It has to do with the details of what the FFT convolution implements, and that explains the differences...