Songlin Yang comments

Results: 46 comments of Songlin Yang

Instructions for Generating Dependency Trees for CTB

Same issue

Differentiating Through Marginals of Dependency CRF

I am pretty sure the cause is the "Chart" class: one should set cache=False to reuse the computation graph.

using ssm_state and conv_state during training

Is there a convenient way to set the initial state for Mamba? I want to use TBPTT to train Mamba on a longer context size, so there is no need to...

Big results difference when using `tl.store`

Mine is normal. NVIDIA A100 80GB PCIe, Triton nightly release ``` Testing BFloat16... tensor([[ -3.0000, 4.3125, -5.2812, ..., -3.1094, 4.4062, 1.4141], [ -4.1875, 8.4375, 7.3750, ..., -4.2188, 0.7227, 4.2188], [...

Code for sentence generation with trained PCFG rules

Sorry, we don't have support for this. I think it would be relatively easy to implement with recursion.
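To illustrate the recursion idea, here is a minimal sketch of sampling a sentence from a PCFG. It does not use the repository's code: the grammar `TOY_RULES`, the function `sample`, and the `max_depth` guard are all hypothetical names made up for this example, and the rule probabilities would in practice come from the trained model.

```python
import random

# Hypothetical toy grammar: nonterminal -> list of (right-hand side, probability).
# Any symbol that is not a key of the dict is treated as a terminal (a word).
TOY_RULES = {
    "S": [(("NP", "VP"), 1.0)],
    "NP": [(("the", "N"), 0.7), (("NP", "PP"), 0.3)],
    "VP": [(("V", "NP"), 0.6), (("V",), 0.4)],
    "PP": [(("P", "NP"), 1.0)],
    "N": [(("dog",), 0.5), (("cat",), 0.5)],
    "V": [(("sees",), 0.5), (("sleeps",), 0.5)],
    "P": [(("near",), 1.0)],
}


def sample(symbol, rules, rng, depth=0, max_depth=50):
    """Recursively expand `symbol` and return the list of generated words."""
    if symbol not in rules:  # terminal: emit the word itself
        return [symbol]
    if depth > max_depth:  # guard against runaway recursive rules
        raise RecursionError("expansion exceeded max_depth")
    expansions = [rhs for rhs, _ in rules[symbol]]
    weights = [p for _, p in rules[symbol]]
    # Pick one rule for this nonterminal according to its probability.
    rhs = rng.choices(expansions, weights=weights, k=1)[0]
    words = []
    for child in rhs:
        words.extend(sample(child, rules, rng, depth + 1, max_depth))
    return words


sentence = sample("S", TOY_RULES, random.Random(0))
print(" ".join(sentence))
```

Each nonterminal picks one of its rules at random and expands the children left to right, so the call stack mirrors the parse tree of the generated sentence.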

[feature request] support log-bmm to context-free grammars

I am afraid that it does not work for CFGs. logbmm accepts only two tensors, and each tensor has 3 dimensions. For example, in compound PCFG, we have...
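For readers unfamiliar with the operation, the sketch below shows what a log-space batched matrix multiplication computes on two 3-dimensional inputs. It is a plain-Python illustration on nested lists, not the library's kernel or its API; the names `logsumexp` and `log_bmm` are chosen for this example only.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):  # all terms are log(0)
        return float("-inf")
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_bmm(a, b):
    """Log-space batched matmul on nested lists.

    a has shape (batch, n, k) and b has shape (batch, k, m). The result has
    shape (batch, n, m) with out[x][i][j] = logsumexp_t(a[x][i][t] + b[x][t][j]).
    """
    out = []
    for a_mat, b_mat in zip(a, b):
        k, m = len(b_mat), len(b_mat[0])
        out.append([
            [logsumexp([row[t] + b_mat[t][j] for t in range(k)]) for j in range(m)]
            for row in a_mat
        ])
    return out


# One batch element: [0.5, 0.5] times the column [0.2, 0.4], all in log space.
a = [[[math.log(0.5), math.log(0.5)]]]
b = [[[math.log(0.2)], [math.log(0.4)]]]
result = log_bmm(a, b)
print(math.exp(result[0][0][0]))
```

Only one index (`t`) is summed out between two operands, which is why a rule score that couples three symbols at once does not fit this shape directly.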

[feature request] support log-bmm to context-free grammars

I found a slightly better way to reduce O(batch, n-w, w, A, B, C) to O(batch, n-w, A, B, C). Instead of combining B and C first, we can combine...

[feature request] support log-bmm to context-free grammars

No, it is not an issue for dependency parsing, since dependency parsing does not have "non-terminals". Dependency parsing can be regarded as lexicalized CFGs in which the non-terminals are Null for dependency...

[feature request] support log-bmm to context-free grammars

Thank you, I'll give it a try.

[feature request] support log-bmm to context-free grammars

By the way, I found that PyTorch's autograd uses a large amount of GPU memory to compute gradients. If I use linear-scan to explicitly implement the outside algorithm and use inside-outside algorithm to...