Ken Tsui comments

Results: 15 comments by Ken Tsui

Make backend seed data more realistic

I am interested in contributing to this project. Is the goal to make the dummy data more flexible? How about just reading it from a JSON file, whose path is configurable in...
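A minimal sketch of the JSON-file idea, using only the Python standard library. The function name `load_seed_data`, the record fields, and the file name `seed.json` are hypothetical and not taken from the project. The comment is cut off before it says where the path would be configured, so the sketch simply takes the path as an argument.

```python
import json
import tempfile
from pathlib import Path


def load_seed_data(path):
    """Read seed records from a JSON file and return them as a list.

    Hypothetical helper: the project's real loader may look different.
    """
    with Path(path).open(encoding="utf-8") as f:
        records = json.load(f)
    # Fail early if the file does not hold a JSON array of records.
    if not isinstance(records, list):
        raise ValueError(f"expected a JSON array of records in {path}")
    return records


# Usage: write a tiny made-up seed file, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    seed_path = Path(tmp) / "seed.json"
    seed_path.write_text(
        json.dumps([{"text": "Hello", "role": "prompter"}]),
        encoding="utf-8",
    )
    seed_records = load_seed_data(seed_path)
```

Keeping the records in a file rather than in code means the seed data can be edited or swapped without touching the backend itself.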

Make backend seed data more realistic

@bitplane Yeah, that's what I am thinking as well, so that it's easier to manage and can be decoupled from main.py. I will propose this data structure: backend/test_data/ -...

Constructing Wikihow for QA with Metadata and Different Response Format

> I've only been able to find the full wikihowAll.csv in one location that seems to require manually downloading, I'm not sure if there's some reason for it not being...

Constructing Wikihow for QA with Metadata and Different Response Format

That's what my notebook can generate now. I am still fine-tuning the formatting/cleaning and the prompt types, and have yet to add more templates. Prompt types so far (feel free to suggest more): -...

Data quality filter for augmented data

To extend this further, the pipeline can be applied to each dataset individually and to all datasets as one aggregate. Functionalities: Score: - leverage multiple reward models (our own, others in...

Data quality filter for augmented data

I have thought more about it, and started writing an interface and a quick implementation. Please let me know if you have any comments. I am going to propose a...

Data quality filter for augmented data

@pruksmhc Thanks for your question! Yes, the FilterPipeline can filter everything that we score in the ScorerPipeline based on absolute statistics and relative statistics. So it's flexible enough to include...
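To illustrate the difference between absolute and relative statistics, here is a rough sketch, not the actual FilterPipeline. The function `filter_scored` and its parameters `min_score` and `top_fraction` are hypothetical names made up for this example. It assumes each sample already carries one score from a scoring step.

```python
import math


def filter_scored(samples, min_score=None, top_fraction=None):
    """Filter (text, score) pairs by an absolute and/or a relative cutoff.

    min_score:    absolute cutoff, keep scores >= this value.
    top_fraction: relative cutoff, keep roughly the best-scoring fraction
                  of the input (tied scores at the cutoff are all kept).
    Both names are hypothetical, for illustration only.
    """
    cutoff = -math.inf
    if min_score is not None:
        cutoff = max(cutoff, min_score)
    if top_fraction is not None and samples:
        # The relative cutoff depends on the score distribution of the data.
        ranked = sorted((score for _, score in samples), reverse=True)
        n_keep = max(1, math.ceil(top_fraction * len(ranked)))
        cutoff = max(cutoff, ranked[n_keep - 1])
    return [(text, score) for text, score in samples if score >= cutoff]


scored = [("a", 0.9), ("b", 0.4), ("c", 0.7), ("d", 0.1)]
absolute_only = filter_scored(scored, min_score=0.5)
relative_only = filter_scored(scored, top_fraction=0.25)
both = filter_scored(scored, min_score=0.5, top_fraction=0.75)
```

An absolute cutoff means the same thing on every dataset, while a relative cutoff adapts to each dataset's own score distribution, which matters when the pipeline runs per dataset and on the aggregate.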

ML Overview [temporary coordination issue, will be split up]

For `3. Evaluation`: this [repo](https://github.com/HLTCHKUST/chatgpt-evaluation) could serve as a benchmark. It has 23 datasets, and they have tested it against ChatGPT. Looks like a good framework and baseline for us to start with,...

OA Retrieval System Proposal

I also added a POC I did: [REALM encoded wikipedia data](https://github.com/kenhktsui/open-information-retrieval)

OA Retrieval System Proposal

Adding one more consideration here: there are (at least) three ways of incorporating retrieval into an LLM, with different degrees of coupling. 1. The embedding used for retrieval is trained jointly with...