Usage of dataset schedule/conference.json
As @jonemo pointed out in https://github.com/pyvideo/data/issues/494#issuecomment-390267316, there are schedule datasets on Symposion-based conference websites. Examples:
- https://2017.pycon-au.org/schedule/conference.json
- https://us.pycon.org/2018/schedule/conference.json
Does anyone have an idea how to join this with the conferences in the https://github.com/pyvideo/data dataset?
The same in #522 with https://www.pyohio.org/2018/schedule/conference.json
Relevant code where conference.json is produced:
https://github.com/pinax/symposion/search?q=conference.json&unscoped_q=conference.json
My approach for the one conference where I used conference.json as a data source was to slugify the first n characters of the titles using python-slugify (where n was on the order of 30). This gave me a talk identifier that was quite robust against false positives (mismatching YouTube titles to conference.json titles) while still matching almost all talks to their YouTube videos.
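A minimal sketch of that idea, not the original script. To keep it dependency-free it uses a small stdlib stand-in (`simple_slugify`, a hypothetical helper) instead of python-slugify's `slugify`; the function names, the default `n=30`, and the sample titles are all assumptions for illustration.

```python
import re
import unicodedata


def simple_slugify(text):
    """Rough stdlib stand-in for python-slugify (assumption, not its API)."""
    # Fold accented characters to ASCII, lowercase, and collapse every run
    # of non-alphanumeric characters into a single hyphen.
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def talk_key(title, n=30):
    """Build a talk identifier from the first n characters of a title."""
    return simple_slugify(title[:n])


def match_talks(schedule_titles, video_titles, n=30):
    """Map each schedule title to the video title sharing its key, or None."""
    videos_by_key = {talk_key(title, n): title for title in video_titles}
    return {
        title: videos_by_key.get(talk_key(title, n))
        for title in schedule_titles
    }


# Hypothetical titles, only to show the shape of the result.
schedule = ["Dataclasses: The code generator to end all code generators", "Keynote"]
videos = [
    "Dataclasses: The code generator to end all code generators - PyCon 2018",
    "Lightning Talks",
]
matches = match_talks(schedule, videos)
```

Truncating before slugifying means suffixes that a video title adds after the talk title (conference name, speaker) do not affect the key, as long as the first n characters agree. Two talks whose titles share the same first n characters would collide, so n is a trade-off.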
An alternative might be to use a string similarity metric on the conference title; see this Stack Overflow question for a few ideas on how to quickly create a simple string similarity function.
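One possible version of such a similarity function, using `difflib.SequenceMatcher` from the standard library. This is a sketch and not necessarily what the linked question suggests; the helper names, the `threshold=0.6` cutoff, and the sample titles are assumptions that would need tuning against real data.

```python
from difflib import SequenceMatcher


def title_similarity(a, b):
    """Return a case-insensitive similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(title, candidates, threshold=0.6):
    """Return the most similar candidate, or None if none reaches threshold."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = title_similarity(title, candidate)
        if score > best_score:
            best, best_score = candidate, score
    # threshold is an assumed cutoff; too low invites false positives.
    return best if best_score >= threshold else None


# Hypothetical titles, only to show usage.
video_titles = ["Intro to asyncio (PyCon 2018)", "Packaging in 2018"]
found = best_match("Intro to asyncio", video_titles)
```

Compared with the slug approach, this tolerates small differences inside the title itself, at the cost of comparing every schedule title against every video title and having to choose a threshold.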