
Results: 6 issues of wonderful

![image](https://user-images.githubusercontent.com/47274616/188069802-24f1f559-286b-44c4-8f25-bdcdeeb3f860.png) The chip I am running on is an NVIDIA board with an ARMv8 Processor rev 0 (v8l). A Zhihu article on measuring peak floating-point throughput says that the number of FMA instructions to interleave = FMA issue width * FMA instruction latency. I could not find a manual for this chip, but the Cortex-A57 optimization guide records the following: ![image](https://user-images.githubusercontent.com/47274616/188070445-e1f460f6-08ae-4913-ad7a-3cf041402565.png) The FMA instruction latency is 10 and the throughput is 2. I am not sure whether this throughput means the core can issue two FMA instructions per cycle (is it the issue width?), but I tested with both 10 FMA instructions (OP_FLOATS = 80) and 20 FMA instructions (OP_FLOATS = 160): with 10 instructions I measured 16.095492 GFLOPS, and with 20 instructions 18.759214 GFLOPS. What could explain this? I have two guesses: 1. Ten FMA instructions is simply not the right count for measuring this chip's floating-point peak. 2. The compiler may have enabled multithreading automatically? This seems more likely, because going from 4 to 10 instructions roughly doubled performance, while going from 10 to 20 only gained a little.
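For reference, a back-of-the-envelope sketch of that counting rule. The latency and throughput come from the A57 table above; the clock frequency and NEON vector width are assumptions for illustration, not values read from this chip:

```python
# FMA numbers from the Cortex-A57 optimization guide shown above.
latency = 10     # cycles before an FMA result can feed a dependent FMA
throughput = 2   # FMA instructions the core can issue per cycle

# To keep both pipes busy, the loop body needs enough *independent*
# FMA chains to cover the latency: issue width * latency.
independent_fmas = throughput * latency
print(f"independent FMA instructions needed: {independent_fmas}")  # 20

# Hypothetical single-core peak, assuming a 2.0 GHz clock and 128-bit
# NEON (4 float32 lanes, 2 FLOPs per lane per FMA) -- both assumptions.
freq_ghz, lanes = 2.0, 4
print(f"theoretical peak: {freq_ghz * throughput * lanes * 2} GFLOPS")  # 32.0
```

By this rule, with only 10 independent FMAs each chain can issue at most once per 10 cycles, i.e. 1 FMA/cycle against a 2/cycle issue rate, so guess 1 is plausible.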

Great project, it helps me understand DL code a lot. I used it like this:

```python
Patch_embed = Float32[torch.Tensor, f"B {PATCH_H} {PATCH_W} {PATCH_EMBED_DIM}"]
Mlp_mid = Float32[torch.Tensor, f"B {PATCH_H} {PATCH_W} {MLP_HIDDEN}"]
...
```
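For context, a minimal self-contained sketch of this pattern, assuming the library is jaxtyping (which matches the `Float32[torch.Tensor, ...]` syntax); the dimension values, the `PatchEmbed` alias, and the beartype runtime checker are assumptions for illustration:

```python
import torch
from beartype import beartype
from jaxtyping import Float32, jaxtyped

# Hypothetical dimensions, just for the example.
PATCH_H, PATCH_W, PATCH_EMBED_DIM = 14, 14, 768

# Alias a shaped tensor type so function signatures stay readable.
PatchEmbed = Float32[torch.Tensor, f"B {PATCH_H} {PATCH_W} {PATCH_EMBED_DIM}"]

@jaxtyped(typechecker=beartype)
def identity(x: PatchEmbed) -> PatchEmbed:
    # Shapes are checked at call time against the annotation.
    return x

identity(torch.zeros(2, PATCH_H, PATCH_W, PATCH_EMBED_DIM))  # passes the check
```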

feature

![image](https://github.com/Ryota-Kawamura/How-Diffusion-Models-Work/assets/47274616/fa148784-7bf8-45f8-9b4f-15b83f44cc97) It should be `(1 - ab_t[t, None, None, None]).sqrt() * noise`, according to the formula in the DDPM paper. ![image](https://github.com/Ryota-Kawamura/How-Diffusion-Models-Work/assets/47274616/4019c17f-3dd2-4928-ad95-1bef05df3ff3)
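A minimal sketch of the corrected perturbation step, following the DDPM forward process `x_t = sqrt(ab_t) * x_0 + sqrt(1 - ab_t) * noise`; the schedule below is a hypothetical stand-in for the course's `ab_t` buffer:

```python
import torch

# Hypothetical noise schedule; ab_t holds the cumulative products alpha-bar_t.
T = 500
beta = torch.linspace(1e-4, 0.02, T)
ab_t = torch.cumprod(1.0 - beta, dim=0)

def perturb_input(x, t, noise):
    # x: [B, C, H, W] clean images, t: [B] timestep indices, noise: like x.
    return (
        ab_t.sqrt()[t, None, None, None] * x
        + (1 - ab_t[t, None, None, None]).sqrt() * noise  # .sqrt() is the fix
    )
```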

Change to prune Gaussians only when there actually are points to remove. The old version calls remove_points even when to_remove is all false, which discards all gradients...
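A minimal sketch of the guard this describes; `gaussians` and the `remove_points`/`prune` names stand in for the repo's actual API, which I am assuming here:

```python
import torch

def prune(gaussians, to_remove: torch.Tensor) -> None:
    # Only rebuild the parameter tensors when at least one Gaussian is flagged;
    # an unconditional remove_points would recreate the tensors even for an
    # all-false mask and drop their accumulated gradients.
    if to_remove.any():
        gaussians.remove_points(to_remove)
```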

Hi, I am encountering an issue while attempting to reproduce the results reported in your paper using the provided checkpoints. I cloned habitat-lab version 0.2.2 and habitat-sim version 0.2.2, and...

### Contact Details

[email protected]

### What happened?

When I use llamafile through its Python API, both models I tried retain the end token in the response string, that...
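Until that is fixed, a minimal workaround sketch is to strip a trailing end-of-sequence marker from the returned text; the token strings below vary by model and are assumptions, not values taken from llamafile:

```python
# End-of-sequence markers differ per model; these are common examples.
EOS_TOKENS = ("</s>", "<|end|>", "<|eot_id|>")

def strip_eos(text: str) -> str:
    # Remove one trailing EOS marker, if present, and tidy whitespace.
    for tok in EOS_TOKENS:
        if text.endswith(tok):
            return text[: -len(tok)].rstrip()
    return text

print(strip_eos("Hello, world!</s>"))  # -> "Hello, world!"
```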

bug
low severity