
How to generate a summary for my own data?

Open cherukuravi opened this issue 7 years ago • 7 comments

Hi Shashi,

I am trying to generate a summary of my own text article using the pretrained embeddings provided in the link. I created a doc file with the article text, saved it as cnn.test.doc, and updated the corresponding title file. But when I run the code it fails with the error shown below:

```
File "/Users/ravi/Desktop/Sidenet-1/data_utils.py", line 263, in populate_data
    thissent = [int(item) for item in line.strip().split()]
```

I provided a text document, but the code expects integers. I guess we need to provide the word ids of the corresponding words in each sentence.
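For context, `populate_data` parses each sentence line as a list of integer word ids, so raw text must first be mapped through the vocabulary that accompanies the pretrained embeddings. A minimal sketch of that conversion (the one-word-per-line vocab format, the function names, and `unk_id=0` are assumptions, not the repo's exact convention):

```python
# Sketch: map tokenized text to the integer word-id lines that
# data_utils.populate_data parses with int(item).
# Vocab file format (one word per line) and unk_id=0 are assumptions.

def load_vocab(path):
    """Map each word (one per line) to its line index."""
    with open(path) as f:
        return {word.strip(): idx for idx, word in enumerate(f)}

def sentence_to_ids(sentence, vocab, unk_id=0):
    """Replace each whitespace-separated token with its vocab id."""
    return [vocab.get(tok, unk_id) for tok in sentence.split()]
```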

Can you please guide me on how to generate the summary for new text articles using this code.

cherukuravi avatar Nov 12 '18 17:11 cherukuravi

Could you send me an email? I will send you the code behind our demo.

shashiongithub avatar Nov 13 '18 15:11 shashiongithub

```python
import subprocess

def stanford_processing(log, story, highlights):
    story_corenlp = None
    highlights_corenlp = None
    try:
        log += timestamp() + " Start Stanford Processing (SSegmentation,Tokenization,NERTagging) ...\n"
        story_corenlp = subprocess.check_output(['./corenlp.sh', story])
        highlights_corenlp = subprocess.check_output(['./corenlp.sh', highlights])
        log += timestamp() + " Stanford Processing finished.\n"
    except Exception as e:
        log += timestamp() + " Stanford Processing failed.\n" + str(e) + "\n"
    return log, story_corenlp, highlights_corenlp
```

Is the corenlp.sh file the same as the one provided on the Stanford GitHub page, or is it a custom one? When I use the Stanford one, it gives a "permission denied" error.

adisri2694 avatar Feb 15 '19 13:02 adisri2694

I think there should be one with the demo code. It is not the one on the Stanford GitHub page.

shashiongithub avatar Feb 16 '19 15:02 shashiongithub
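On the "permission denied" error mentioned above: scripts copied or downloaded into a repo often lack the executable bit, so `subprocess.check_output(['./corenlp.sh', ...])` cannot run them. A quick demonstration with a stand-in script name (the real file in the demo code is corenlp.sh):

```shell
# Stand-in script to illustrate the fix; the real file is corenlp.sh.
printf '#!/bin/bash\necho "$1"\n' > corenlp_demo.sh
chmod +x corenlp_demo.sh   # without this, ./corenlp_demo.sh fails with "permission denied"
./corenlp_demo.sh hello    # prints "hello"
```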

Hey Shashi, it's not there. Would you mind sending me that? Thanks for your help.

On Sat 16 Feb, 2019, 9:07 PM Shashi Narayan <[email protected]> wrote:

> I think there should be one with the demo code. It is not the one on the Stanford GitHub page.


adisri2694 avatar Feb 16 '19 15:02 adisri2694

corenlp.sh has this:

```bash
#!/bin/bash
wget --post-data "$1" 'localhost:9000/?properties={"annotators": "tokenize,ssplit,pos,lemma,ner", "ssplit.newlineIsSentenceBreak": "always", "outputFormat": "text"}' -O -
```

shashiongithub avatar Feb 16 '19 15:02 shashiongithub
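For readers who prefer Python over the wget wrapper, the same request can be issued with the standard library. This is a sketch: it assumes a Stanford CoreNLP server is already running on localhost:9000, and the function names are my own, not part of the repo:

```python
# Rough Python equivalent of corenlp.sh; assumes a Stanford CoreNLP
# server is listening on localhost:9000. Function names are illustrative.
import json
import urllib.parse
import urllib.request

def corenlp_url(host="localhost", port=9000):
    """Build the annotation URL with the same properties corenlp.sh sends."""
    props = {
        "annotators": "tokenize,ssplit,pos,lemma,ner",
        "ssplit.newlineIsSentenceBreak": "always",
        "outputFormat": "text",
    }
    return "http://%s:%d/?properties=%s" % (
        host, port, urllib.parse.quote(json.dumps(props)))

def annotate(text, host="localhost", port=9000):
    """POST raw text to the server and return the plain-text annotation."""
    req = urllib.request.Request(corenlp_url(host, port),
                                 data=text.encode("utf-8"))
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```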

> Could you send me an email? I will send you the code behind our demo.

Can you please send it to me as well? How can I send you my email address? Thank you very much!

OmerET8 avatar Feb 20 '19 09:02 OmerET8

Hi Shashi. I also want to test with my own dataset. Could you send me the demo code showing how the data is preprocessed to generate the files in your "preprocessed-input-directory"? Thanks!

luyunan0404 avatar Jan 09 '21 11:01 luyunan0404