Missing Tasks
This is a list of tasks not yet in ParlAI that would be great to have. Feel free to add more to the list also! We will remove individual items when they are done.
Chit Chat
- [x] DailyDialog https://arxiv.org/abs/1710.03957
- [x] Datasets in decaNLP that are missing: https://github.com/salesforce/decaNLP
- [ ] CoLA https://nyu-mll.github.io/CoLA/
- [ ] Movie Discussions with Knowledge: https://arxiv.org/pdf/1809.08205.pdf
- [x] MultiWOZ: https://arxiv.org/abs/1810.00278
- [ ] Video stories?: https://research.fb.com/wp-content/uploads/2018/10/A-Dataset-for-Telling-the-Stories-of-Social-Media-Videos.pdf?
- [x] AirDialogue http://www.aclweb.org/anthology/D18-1419
- [ ] Movie chat with background knowledge: http://aclweb.org/anthology/D18-1255
- [x] Movie chat with Wikipedia grounding: http://aclweb.org/anthology/D18-1076, https://github.com/festvox/datasets-CMU_DoG
- [ ] Craigslist bargain http://aclweb.org/anthology/D18-1256
- [ ] Datasets from DSTC7 http://alborz-geramifard.com/workshops/nips18-Conversational-AI/Papers/18convai-DSTC7.pdf
- [ ] Movie recommendation: https://www.microsoft.com/en-us/research/uploads/prod/2018/11/deep_conversational_recommendations__1_1.pdf
- [x] Redial dataset: https://redialdata.github.io/website/
- [ ] OTTers https://arxiv.org/pdf/2105.13710.pdf
Knowledge-grounded datasets:
- [ ] Conversational reading (https://arxiv.org/pdf/1906.02738.pdf)
- [ ] Knowledge Dataset from DSTC7 https://github.com/DSTC-MSR-NLP/DSTC7-End-to-End-Conversation-Modeling/tree/master/data_extraction
- [x] Holl-E (https://github.com/nikitacs16/Holl-E, https://arxiv.org/abs/1809.08205)
- [ ] OpenDialKG (https://github.com/facebookresearch/opendialkg)
Visual Dialogue / QA Tasks / Captioning:
- [ ] KVQA http://dosa.cds.iisc.ac.in/kvqa-2/01/mishra_CR.pdf (see paper for links to other VQA too)
- [ ] GQA (VQA-type) dataset https://cs.stanford.edu/people/dorarad/gqa/
- [ ] Visual Storytelling https://arxiv.org/pdf/1604.03968.pdf
- [ ] Multimodal shopping dialogue (with images) https://arxiv.org/pdf/1704.00200.pdf
- [ ] Visual Commonsense reasoning https://visualcommonsense.com/
- [ ] Netizen-Style Commenting on Fashion Photos, https://mashyu.github.io/NSC/
- [ ] Conceptual Captions https://github.com/google-research-datasets/conceptual-captions
QA Tasks:
- [x] Natural Questions: https://ai.google/research/pubs/pub47761
- [x] HotpotQA: https://hotpotqa.github.io/
- [ ] SearchQA https://github.com/nyu-dl/SearchQA
- [ ] Who Did What (cloze qa): https://arxiv.org/abs/1608.05457
- [ ] NewsQA https://datasets.maluuba.com/NewsQA
- [x] QuAC: https://arxiv.org/pdf/1808.07036.pdf
- [x] CoQA (#1674): https://arxiv.org/abs/1808.07042
- [x] DREAM Dialogue QA https://arxiv.org/pdf/1902.00164.pdf
- [ ] DROP https://arxiv.org/abs/1903.00161
- [ ] Common Sense from ConceptNet https://arxiv.org/pdf/1811.00937.pdf
- [x] AmazonQA: http://jmcauley.ucsd.edu/data/amazon/qa/
@jaseweston How about the AQuA dataset by DeepMind for algebraic questions? https://github.com/deepmind/AQuA I can add support for it if that is acceptable.
I can possibly work on others too.
yes, sure -- it would be great if you can add it (and any others you want)!
@dfcf93 sure it would be great if you can add it!
I would add HotpotQA https://hotpotqa.github.io/, which my group are currently working on getting into ParlAI
awesome!
I am commenting here to let interested people know that I have a working DailyDialog implementation running on this fork: https://github.com/Mrpatekful/ParlAI/tree/dialogwae.
I will submit a PR as soon as I get around to cleaning up the code and running some tests. Note, though, that my implementation only uses the chat data without any of the annotations, so it is not a full implementation of DailyDialog; I would be happy to collaborate if anyone is up for it. I did implement two types of tasks: a single-turn task, and a task where the full dialog history is used, for agents that can handle multiple utterances.
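For readers unfamiliar with the distinction, here is a minimal sketch of the two task types described above, in plain Python. It is not the code from the fork and does not use any ParlAI API; the function names `single_turn_pairs` and `full_history_pairs`, the newline separator, and the sample dialog are all hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def single_turn_pairs(dialog: List[str]) -> List[Tuple[str, str]]:
    """Pair each utterance with the utterance that directly follows it."""
    return [(dialog[i], dialog[i + 1]) for i in range(len(dialog) - 1)]


def full_history_pairs(dialog: List[str], sep: str = "\n") -> List[Tuple[str, str]]:
    """Pair the whole dialog history so far with the next utterance.

    The separator is an assumption; a real task may delimit turns differently.
    """
    return [
        (sep.join(dialog[: i + 1]), dialog[i + 1])
        for i in range(len(dialog) - 1)
    ]


# Hypothetical three-turn dialog, not taken from DailyDialog.
dialog = ["Hi, how are you?", "Fine, thanks. And you?", "Great. Any plans today?"]

for context, response in single_turn_pairs(dialog):
    print(repr(context), "->", repr(response))

for context, response in full_history_pairs(dialog):
    print(repr(context), "->", repr(response))
```

In the single-turn variant the model only ever sees the previous utterance, while in the full-history variant the context grows by one utterance per example, which suits agents that can consume multiple utterances.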
There's a large new grounded dialogue dataset that came out last month: Conversing by Reading: Contentful Neural Conversation with On-demand Machine Reading https://arxiv.org/pdf/1906.02738.pdf Might be useful?
Yes, it was already in our list.
This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.
Will work on
- [ ] Conversational reading (https://arxiv.org/pdf/1906.02738.pdf)
Any help or suggestions for a better model are welcome, @abisee.