Rick Battle

9 comments by Rick Battle

Can someone please update the website to reflect that Clarity is not (yet) compatible with v13? I ran face-first into this issue and the website makes it sound like v13...

I'm wondering the same thing, but for a slightly different use case: how to add, update, or remove documents over time without pretraining from scratch each time.

https://github.com/google-research/language/blob/master/language/realm/generate_retrieval_corpus.py

Trained Natural Questions and WebQuestions models are available at gs://orqa-data/orqa_nq_model and gs://orqa-data/orqa_wq_model, respectively.

It's not the easiest thing to use, but ColBERT does support pre-filtering. Here's the chunk I use:

```
if len(query.conditions) > 0:
    results = searcher.search(query.query, k=query.k, filter_fn=lambda pids: torch.tensor(
        [index...
```

You don't need to index metadata that won't help the search. For example, `lastmod` dates from HTML pages are useful metadata, but no one is searching for a `lastmod` date....
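To make that concrete, here is a minimal sketch of splitting a document into the fields you index and the fields you only store alongside the result. The field names, the `SEARCHABLE_FIELDS` set, and the `split_document` helper are all hypothetical; which fields are worth indexing depends on your own data.

```python
# Hypothetical choice of searchable fields; adjust for your own documents.
SEARCHABLE_FIELDS = {"title", "body"}


def split_document(doc, searchable_fields=SEARCHABLE_FIELDS):
    """Split a document into (indexed, stored) dictionaries.

    indexed: fields passed to the search index.
    stored:  fields kept only so they can be returned with a hit.
    """
    indexed = {k: v for k, v in doc.items() if k in searchable_fields}
    stored = {k: v for k, v in doc.items() if k not in searchable_fields}
    return indexed, stored


# Example page with made-up values.
page = {
    "title": "Installing the agent",
    "body": "Download the installer and run it.",
    "lastmod": "2023-04-01",
    "url": "https://example.com/docs/install",
}

indexed, stored = split_document(page)
print(sorted(indexed))  # fields that go into the index
print(sorted(stored))   # fields kept as plain metadata
```

The stored fields can still be shown with each result or used for display-time sorting; they just don't take up space in the index.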

```
qa_pairs = [
    {
        'question': 'what is a hypervisor?',
        'answer': 'A hypervisor is software that creates and runs virtual machines (VMs).'
    },
    {
        'question': 'what is evc?',
        'answer': 'EVC...
```

I haven't profiled it, but that's what I assume is happening. Each process opens its own copy of the dataset, thus there's one copy of the dataset in RAM per...

DSPy has a small default for max tokens. Override it to get a longer response: `lm = dspy.OpenAI( [...] max_tokens=4096, )`