Zhihua.Liu
Thanks very much for your help; the model now runs fine. However, I ended up with a score of 66.3, which is still lower than the result reported in the paper.
I can also run training end to end, but the current training results are not very good. I'm trying to train further.
You can try moving the memory retrieval step to after `apply_rotary_pos_emb` and compare the training performance. I did not try this further myself, though.
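For what it's worth, here is a self-contained toy sketch of that reordering; `retrieve_memory` and the RoPE helper below are simplified stand-ins rather than the actual LWM attention code, so treat it only as an illustration of where the retrieval call would move:

```python
import torch

def apply_rotary_pos_emb(x, cos, sin):
    # Standard rotary embedding: rotate the two halves of the head dim.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def retrieve_memory(q, mem_k, mem_v):
    # Toy stand-in for memory retrieval: attend over a small memory bank
    # with the (rotated) queries and return the retrieved values.
    scores = torch.softmax(q @ mem_k.transpose(-1, -2) / q.size(-1) ** 0.5, dim=-1)
    return scores @ mem_v

B, T, M, D = 2, 8, 16, 64            # batch, seq len, memory slots, head dim
q = torch.randn(B, T, D)
k = torch.randn(B, T, D)
mem_k, mem_v = torch.randn(B, M, D), torch.randn(B, M, D)

# Precompute RoPE angles for the local sequence positions.
pos = torch.arange(T, dtype=torch.float32)
inv_freq = 1.0 / (10000 ** (torch.arange(0, D // 2, dtype=torch.float32) / (D // 2)))
angles = pos[:, None] * inv_freq[None, :]
cos, sin = angles.cos(), angles.sin()

# Suggested ordering: apply RoPE first, then run memory retrieval with the
# rotated queries (instead of retrieving before apply_rotary_pos_emb).
q_rot = apply_rotary_pos_emb(q, cos, sin)
k_rot = apply_rotary_pos_emb(k, cos, sin)
retrieved = retrieve_memory(q_rot, mem_k, mem_v)
print(retrieved.shape)  # torch.Size([2, 8, 64])
```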
> The results of flash attention are somehow amazing... keep an eye on this.

Thanks, I'll check it.
> And after reading the code, I have found that the ring attention should accept already-chunked qkv instead of the whole qkv. That is, qkv should be split into local...
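As an illustration of the quoted point, here is a minimal sketch of pre-chunking qkv along the sequence dimension before the per-rank call; the `ring_attention` call at the end is a placeholder for whichever implementation is actually used:

```python
import torch

world_size = 4                       # number of devices in the ring
B, T, H, D = 1, 8192, 8, 64          # T is the full (global) sequence length

q = torch.randn(B, T, H, D)
k = torch.randn(B, T, H, D)
v = torch.randn(B, T, H, D)

# Split along the sequence dimension; in a real run, rank r holds only chunk r.
q_chunks = q.chunk(world_size, dim=1)
k_chunks = k.chunk(world_size, dim=1)
v_chunks = v.chunk(world_size, dim=1)

rank = 0  # e.g. torch.distributed.get_rank() on a multi-GPU setup
q_local, k_local, v_local = q_chunks[rank], k_chunks[rank], v_chunks[rank]
print(q_local.shape)  # torch.Size([1, 2048, 8, 64]), the local chunk

# out_local = ring_attention(q_local, k_local, v_local)  # placeholder call
```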
> Lucidrains has a pytorch implementation of RingAttention https://github.com/lucidrains/ring-attention-pytorch

Have you tried this repo? I don't know whether the experimental results are as expected. Seems that the model posted on...
> What do you have in mind? Is this model suitable for tokenized ecosystem and bridging liquidity and creating a smart algorithm for bridging / blending / mending and growth...
I encountered a similar problem. When I use LWM-TEXT-512K (PyTorch), I get the warning "Token indices sequence length is longer than the specified maximum sequence length for this model (42314 > 2048). Running...
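If that warning is coming from the tokenizer's `model_max_length` default (often 2048 in the tokenizer config) rather than a real limit of the model itself, one workaround is to override it when loading; note the checkpoint name below is an assumption, so substitute whatever path you actually use:

```python
from transformers import AutoTokenizer

# Override the tokenizer's default model_max_length so long inputs are not
# flagged. The checkpoint name is an assumption; replace it with the path
# you actually load.
tokenizer = AutoTokenizer.from_pretrained(
    "LargeWorldModel/LWM-Text-512K",
    model_max_length=524288,  # 512K context
)

long_text = "some very long document " * 20000
ids = tokenizer(long_text).input_ids
print(len(ids))  # no max-length warning for long sequences
```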
After setting repeat=True with different sequence_length values, I got the following results. Are these results as expected? (When seq_len=1024, there is an obvious difference in the gradient values; as seq_len increases,...
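To make the comparison concrete, this is roughly the kind of harness I'd use to quantify the output/gradient diff at different sequence lengths; the two callables compared below (naive softmax attention vs. torch's scaled_dot_product_attention) are just placeholders for whichever pair of implementations is actually being tested:

```python
import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    # Plain softmax attention used here as the reference implementation.
    scores = (q @ k.transpose(-1, -2)) / q.size(-1) ** 0.5
    return torch.softmax(scores, dim=-1) @ v

def compare(attn_a, attn_b, seq_len, heads=4, dim=64):
    # Run both implementations on identical inputs and report the max
    # absolute difference of the outputs and of the q/k/v gradients.
    q = torch.randn(1, heads, seq_len, dim, dtype=torch.float64, requires_grad=True)
    k = torch.randn_like(q, requires_grad=True)
    v = torch.randn_like(q, requires_grad=True)

    out_a = attn_a(q, k, v)
    grads_a = torch.autograd.grad(out_a.sum(), (q, k, v))

    out_b = attn_b(q, k, v)
    grads_b = torch.autograd.grad(out_b.sum(), (q, k, v))

    out_diff = (out_a - out_b).abs().max().item()
    g_diff = max((a - b).abs().max().item() for a, b in zip(grads_a, grads_b))
    return out_diff, g_diff

for seq_len in (256, 1024, 2048):
    # Swap in the pair actually being compared (e.g. ring attention vs.
    # flash attention) in place of the two callables below.
    print(seq_len, compare(naive_attention, F.scaled_dot_product_attention, seq_len))
```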