
Usage of dataset schedule/conference.json

Open Daniel-at-github opened this issue 7 years ago • 3 comments

As @jonemo pointed out in https://github.com/pyvideo/data/issues/494#issuecomment-390267316, conference websites built on Symposion publish their schedule as a dataset. Examples:

  • https://2017.pycon-au.org/schedule/conference.json
  • https://us.pycon.org/2018/schedule/conference.json

Does anyone have an idea on how to join this with the conferences in the https://github.com/pyvideo/data dataset?

Daniel-at-github avatar Aug 05 '18 18:08 Daniel-at-github

The same in #522 with https://www.pyohio.org/2018/schedule/conference.json

Daniel-at-github avatar Aug 05 '18 18:08 Daniel-at-github

Relevant code showing where conference.json is produced: https://github.com/pinax/symposion/search?q=conference.json&unscoped_q=conference.json

Daniel-at-github avatar Aug 05 '18 20:08 Daniel-at-github

My approach for the one conference where I used conference.json as a data source was to slugify the first n characters of the titles using python-slugify (where n was something on the order of 30). This gave me a talk identifier that was quite robust against false positives (mismatching YouTube titles to conference.json titles) while still matching almost all talks to their YouTube videos.
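A rough sketch of that idea, in case it helps. This is not the code I actually ran: it uses a small standard-library stand-in instead of python-slugify so that it runs without extra packages, it truncates the slug rather than the raw title (an assumption, chosen so that punctuation differences do not shift the cut-off), and the function names and talk titles are made up for illustration.

```python
import re
import unicodedata


def title_key(title: str, n: int = 30) -> str:
    """Build a short matching key from a talk title.

    Stand-in for python-slugify: strip accents, lowercase, collapse every
    run of non-alphanumeric characters into one hyphen, keep the first n
    characters of the result.
    """
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    return slug[:n].rstrip("-")


def match_talks(schedule_titles, video_titles, n=30):
    """Map each schedule title to the video title with the same key, or None."""
    videos_by_key = {title_key(v, n): v for v in video_titles}
    return {s: videos_by_key.get(title_key(s, n)) for s in schedule_titles}


# Hypothetical titles, only to show the shape of the result.
schedule = ["Python Packaging: State of the Art in 2018", "Unmatched talk"]
videos = ["Python Packaging - State of the Art in 2018 (PyCon 2018)"]
matches = match_talks(schedule, videos)
```

Here the first schedule entry maps to the video title and the second maps to `None`. A smaller n tolerates more trailing noise in the video titles but raises the chance that two talks collide on the same key.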

An alternative might be to use a string similarity metric on the conference title; see this Stack Overflow question for a few ideas on how to quickly create a simple string similarity function.
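For the similarity route, one option that needs nothing beyond the standard library is `difflib.SequenceMatcher`. The sketch below is only one possible metric, not necessarily what the linked question recommends; the helper names, the 0.6 threshold, and the example titles are assumptions and would need tuning against real data.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(title, candidates, threshold=0.6):
    """Return the most similar candidate, or None if none reaches the threshold."""
    best_score, best_candidate = 0.0, None
    for candidate in candidates:
        score = similarity(title, candidate)
        if score > best_score:
            best_score, best_candidate = score, candidate
    return best_candidate if best_score >= threshold else None


# Hypothetical titles, only for illustration.
video_titles = ["Intro to asyncio (PyCon 2018)", "Packaging in 2018"]
found = best_match("Intro to asyncio", video_titles)
```

Compared with the slug prefix key, this tolerates differences anywhere in the title, at the cost of comparing every schedule title against every video title and having to pick a threshold.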

jonemo avatar Aug 05 '18 22:08 jonemo