
Data source: TED-talk transcripts

Open • Ric71 opened this issue 2 years ago • 8 comments

Because the content of TED Talks is highly inspiring, I suggest using the transcripts TED offers as a data source for Open Assistant.

https://www.ted.com/search?cat=pages&q=TED+talks+with+transcript%28transcript%29

The transcripts are covered by this license: https://creativecommons.org/licenses/by-nc-nd/3.0/. I don't know whether it permits using them to train a language model.

Ric71, Feb 17 '23 13:02

TED Talks are monologues, so I imagine it would be hard to convert them into the question-answer format required to train a chat model.

CoolSpot, Feb 17 '23 22:02

For example: "Hey Open Assistant, please write a funny lecture/presentation about procrastination."

Ric71, Feb 17 '23 22:02

Do you want to try converting them into dialogue? Maybe you could come up with creative ways to parse the text into a Q&A style?

huu4ontocord, Feb 20 '23 13:02

I'm sorry to say that my coding abilities aren't sufficient to do that. I could try to come up with ideas for the parsing, but not implement them. I could also try converting at least some presentations manually. But may I ask: wouldn't it be sufficient to simply use each presentation as the answer to a prompt asking for that presentation? Additionally, an expert might first need to check whether the license allows using the transcripts as training data for an LM.

Ric71, Feb 20 '23 15:02

Here is the license: https://www.ted.com/about/our-organization/our-policies-terms/ted-talks-usage-policy.

And here is the relevant section about transcripts:

"Transcripts and subtitles may be used under the same Creative Commons license in conjunction with the TED Talk video. Copyright on the transcripts is owned by TED and any edits, alternate usage rights or changes to these documents are not permitted without permission. Therefore, if you wanted to publish a TED Talk in a book, test, play, or any other publication, permission is required."

It seems quite restrictive.

juanjfndz, Feb 20 '23 15:02

I'll submit a request for permission.

Ric71, Feb 20 '23 15:02

Hi all, what is the status on this? Are we continuing with it? If not, I can close the issue. Thank you for looking into this :)

huu4ontocord, Feb 24 '23 06:02

@ontocord I submitted a request through the TED inquiry form, but I am still waiting for an answer.

Ric71, Feb 24 '23 12:02

Thank you!

huu4ontocord, Apr 09 '23 21:04

Unfortunately, they want to charge for use of the transcripts. Sorry I didn't mention this earlier.

Ric71, Apr 09 '23 21:04