Junxian He
Hi, "Oracle entity" in Table 2 uses only the entity words in the ground-truth target, while "oracle keywords" contains non-entity words as well, as described in the paper.
1. Yes, `example_dataset/test.oraclewordns` corresponds to the "oracle keywords". 2. The keywords used for training the automatic keyword extractor are the "oracle keywords", though strictly speaking the "oracle keywords" are not exactly "longest sub-sequences" -- as...
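To make the "longest sub-sequences" idea concrete, here is a minimal sketch in pure Python. It is an illustrative approximation, not the repo's actual extraction code (which, as noted above, deviates from strict longest sub-sequences): it greedily collects maximal contiguous runs of source tokens that also appear, in order, in the target.

```python
# Illustrative sketch only -- not the repo's exact oracle-keyword algorithm.
def oracle_keywords(source_tokens, target_tokens):
    """Greedily return maximal contiguous source sub-sequences found in the target."""
    target = " ".join(target_tokens)
    keywords, i = [], 0
    while i < len(source_tokens):
        # Extend the run while the joined span is still a substring of the target.
        # (Naive substring matching: a real implementation would respect word
        # boundaries so e.g. "cat" does not match inside "category".)
        j = i
        while j < len(source_tokens) and " ".join(source_tokens[i:j + 1]) in target:
            j += 1
        if j > i:
            keywords.append(source_tokens[i:j])
            i = j
        else:
            i += 1
    return keywords

src = "the cat sat on the mat today".split()
tgt = "a cat sat on a mat".split()
print(oracle_keywords(src, tgt))  # [['cat', 'sat', 'on'], ['mat']]
```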
Hi, we use [stanza](https://stanfordnlp.github.io/stanza/) for NER; you may refer to some examples here: https://github.com/salesforce/ctrl-sum/blob/6468beaaceebf463b492992fffef0e4f693a3281/scripts/preprocess.py#L890
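A rough sketch of that entity-extraction step (the real code lives in the `scripts/preprocess.py` link above; the helper names here are hypothetical). The stanza calls follow the library's documented `Pipeline` API and require a one-time model download, so they are kept out of the top level:

```python
# Hypothetical helper: flatten entity mentions into a deduplicated,
# order-preserving word list to use as "oracle entity" keywords.
def entity_word_list(entity_texts):
    seen, words = set(), []
    for mention in entity_texts:
        for w in mention.split():
            if w not in seen:
                seen.add(w)
                words.append(w)
    return words

def extract_entities(text):
    # Requires `pip install stanza` and stanza.download("en") beforehand.
    import stanza
    nlp = stanza.Pipeline("en", processors="tokenize,ner", verbose=False)
    return [ent.text for ent in nlp(text).ents]

# Example usage (needs the stanza English models downloaded):
# entity_word_list(extract_entities("Barack Obama met Angela Merkel in Berlin."))
```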
Hi, you need to first train the keyword tagger and generate unconditional summaries. You can follow the README sections on "train the keyword tagger" and "evaluate CTRLsum" to replicate the results...
The source file is the actual input to CTRLsum; for example, it would be `test.predwordsource` for unconditional summarization, while `test.oraclenssource` would produce the oracle performance.
Hi, can you check [this thread](https://github.com/salesforce/ctrl-sum/issues/2#issuecomment-764466078) to see if there is any helpful information there? One important point is that we used the tagger checkpoint with the best validation loss...
Hi, we used 8 16GB V100 GPUs for training, which takes 1-2 days on the CNNDM dataset.
You can adjust the `max_tokens` and `update_freq` variables in the training script to match our effective batch size: https://github.com/salesforce/ctrl-sum/blob/b9afc42be504f55795b0c3b3606163d77a7a852c/scripts/train_bart.sh#L17 If you want to train this on one GPU, then...
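As a back-of-envelope check (the numbers below are illustrative, not the script's actual defaults): fairseq's effective batch size in tokens is roughly `max_tokens * update_freq * num_gpus`, so to keep it fixed when going from 8 GPUs to 1, scale `update_freq` up by 8x.

```python
# Illustrative arithmetic for matching effective batch size across GPU counts.
def effective_batch_tokens(max_tokens, update_freq, num_gpus):
    return max_tokens * update_freq * num_gpus

# Hypothetical settings: 8 GPUs vs. 1 GPU with update_freq scaled by 8.
eight_gpu = effective_batch_tokens(max_tokens=1024, update_freq=4, num_gpus=8)
one_gpu = effective_batch_tokens(max_tokens=1024, update_freq=32, num_gpus=1)
assert eight_gpu == one_gpu  # 32768 tokens either way
```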
Hi, I am not sure why this happens. Have you turned on `--truncate-source` when training the model? Could you share your training log? That would make it easier to debug.
Hi, this seems to be an issue with the files2rouge package rather than our code. Can you share a sample of the ground-truth/predicted summaries so that I can reproduce the error?