Datasource that interfaces with a TCAT instance
It works, and arguably fixes #117, but:
- The form looks hideous with the million query fields. Do we need them all for 4CAT? Is there a way to make it look better?
- The list of bins displayed in the 'create dataset' form simply lists bins from all instances. This can get really long really fast when supporting multiple instances. A custom form control may be necessary to make this user-friendly.
- The list of bins is loaded synchronously whenever `get_options()` is run. The result should probably be cached or updated in the background (with a separate worker...?)
- The data format now follows that of `twitterv2`'s `map_item()`, but there is quite a bit more data in the TCAT export that we could include.
To enable this, in config.py:
    DATASOURCES = {
        "dmi-tcat": {
            "instances": ["http://tcat7.digitalmethods.net"]
        }
    }
(for example)
* The form looks hideous with the million query fields. Do we need them all for 4CAT? Is there a way to make it look better?
We could hide some under an "Advanced Options" section, since many are unlikely to be used frequently. I have ordered the fields with that in mind, but it would be better to collapse the "Advanced Options" section behind a button or something similar.
* The list of bins displayed in the 'create dataset' form simply lists bins from all instances. This can get really long really fast when supporting multiple instances. A custom form control may be necessary to make this user-friendly.
I made some changes to what it displays (bin_name: num tweets from date to date), but I am not sure of the best way to organize it. We could break it out by TCAT instance, though that doesn't seem relevant to users. Right now the bins are ordered by instance, then by bin name; I could at least order them by bin name alone so it is easier to find the one you want.
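To make the ordering idea concrete, here is a rough sketch of building the option labels sorted by bin name rather than by instance. The dictionary keys (`name`, `instance`, `tweets`, `first`, `last`) and the `name@instance` option key are hypothetical placeholders, not the fields the datasource actually uses.

```python
def bin_options(bins):
    """Return (key, label) pairs sorted by bin name, then by instance.

    `bins` is a list of dicts; the key names used here are assumptions
    made for this sketch, not the real TCAT metadata fields.
    """
    options = []
    # Case-insensitive sort on bin name; instance only breaks ties.
    for b in sorted(bins, key=lambda b: (b["name"].lower(), b["instance"])):
        label = f"{b['name']}: {b['tweets']:,} tweets from {b['first']} to {b['last']}"
        # Hypothetical option key that keeps bins from different instances distinct.
        options.append((f"{b['name']}@{b['instance']}", label))
    return options
```

Sorting on the lowercased name keeps bins that differ only in capitalisation next to each other, which is probably what someone scanning the list expects.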
* The list of bins is loaded synchronously whenever `get_options()` is run. The result should probably be cached or updated in the background (with a separate worker...?)
From what I can tell, `get_options()` only runs when you select DMI-TCAT as the datasource type. Not sure if that is "too much" (with many TCAT instances it probably would be), but it is dynamic information (more so now that I have added datetimes to the bins). I think you are proposing a worker that runs periodically and, say, caches this data in a database or some sort of background dataset? That would follow easily if we set up the database to store options/settings.
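Short of a dedicated worker, a simple time-based cache might already take most of the sting out of the synchronous load. The sketch below is only an illustration: `CachedBinList`, its `fetch` callable and the `ttl` value are made-up names, and nothing here reflects how 4CAT actually stores options.

```python
import time


class CachedBinList:
    """Cache the result of an expensive fetch for `ttl` seconds (sketch only)."""

    def __init__(self, fetch, ttl=300, clock=time.monotonic):
        self.fetch = fetch      # callable that queries the TCAT instances
        self.ttl = ttl          # seconds before the cached list is considered stale
        self.clock = clock      # injectable so the expiry logic can be tested
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self.clock()
        # Refresh on first use or once the cached value is older than ttl.
        if self._fetched_at is None or now - self._fetched_at >= self.ttl:
            self._value = self.fetch()
            self._fetched_at = now
        return self._value
```

With something like this, `get_options()` would call `cache.get()` instead of querying every instance each time. A background worker would still be the better fit if the fetch is slow enough that even the occasional refresh is noticeable in the form.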
* The data format now follows that of `twitterv2`'s `map_item()`, but there is quite a bit more data in the TCAT export that we could include.
Mapped the rest of the TCAT data to the output. There was one oddity: thread_id. Technically a tweet can have both a reply_id and a quote_id (since you can retweet a reply or reply to a retweet). I wasn't sure how to prioritize them, but ultimately either will lead you to the correct "thread". Ideally, we'd find the original tweet and use that as the thread_id.
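For illustration, one possible way to settle the prioritization is sketched below. It prefers the reply over the quote and falls back to the tweet's own ID; that order is an arbitrary choice for this sketch, and the `id` key is an assumption (`reply_id` and `quote_id` are the names used above).

```python
def pick_thread_id(tweet):
    """Pick a thread ID for a mapped tweet dict (sketch; priority is arbitrary)."""
    # Prefer the tweet being replied to, then the quoted tweet.
    for field in ("reply_id", "quote_id"):
        value = tweet.get(field)
        if value:
            return str(value)
    # A tweet that is neither a reply nor a quote starts its own thread.
    return str(tweet["id"])
```

This does not walk up the chain to the original tweet, so two tweets in the same conversation can still end up with different thread IDs.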
Fixed dates as well as the AND/OR query in the dmi-tcatv2 datasource
The dmi-tcatv2 datasource has been tested and I am happy with the results. The basic query should return the expected results in the same format as twitterv2 (and has some robustness to return any additional TCAT data). It could possibly be improved to make better use of some of TCAT's other tables, but I am not sure there is much additional value for most users.
Additionally there is the advanced query option. This allows a user to directly query any tables in the specific TCAT instance/database. It requires knowledge of the TCAT database structure which may not be readily available, but you could actually query for it if you like (e.g. SHOW TABLES).
OH, one oddity that I was not super sure how to resolve. In the collect_tcat_metadata class method, I needed a logger for the MySQLDatabase class we built. I could not figure out how to access the existing logger instance and ended up creating a new one. This currently has the unintended consequence of adding logger instances and making multiple log entries. Definitely needs a fix!
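One possible direction for the duplicate entries, sketched with the standard library's `logging` module rather than 4CAT's own logger class (which may behave differently): `logging.getLogger(name)` returns the same object for the same name, so checking for existing handlers before adding one avoids stacking them up. The function name and logger name are hypothetical.

```python
import logging


def get_tcat_logger(name="tcat-metadata"):
    """Return a named logger, attaching a handler only the first time (sketch)."""
    logger = logging.getLogger(name)  # same name -> same logger object
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        # Stop records from also reaching root handlers, which is another
        # common source of doubled log lines.
        logger.propagate = False
    return logger
```

Calling this from a class method repeatedly returns the same logger with a single handler. Passing the existing logger instance into the method would still be cleaner if that turns out to be possible.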
Updated the TCAT datasources to work with newer 4CAT changes (e.g. the config database).