Rick Battle
Can someone please update the website to reflect that Clarity is not (yet) compatible with v13? I ran face-first into this issue and the website makes it sound like v13...
I'm wondering the same thing, but for a slightly different use case: how to add, update, or remove documents over time without pretraining from scratch each time.
https://github.com/google-research/language/blob/master/language/realm/generate_retrieval_corpus.py
Trained WebQuestions and Natural Questions models are available at gs://orqa-data/orqa_nq_model and gs://orqa-data/orqa_wq_model respectively.
It's not the easiest thing to use, but ColBERT does support pre-filtering: Here's the chunk I use: ``` if len(query.conditions) > 0: results = searcher.search(query.query, k=query.k, filter_fn=lambda pids: torch.tensor( [index...
You don't need to index metadata that won't help the search. For example, `lastmod` dates from HTML pages are useful metadata, but no one is searching for a `lastmod` date....
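To make that concrete, here's a rough sketch of splitting a document's fields into "indexed" and "stored only" before sending it to a search backend. The field names in `SEARCHABLE_FIELDS`, the `split_metadata` helper, and the sample document are all made up for illustration; they aren't tied to any particular search engine's API.

```python
from typing import Any

# Hypothetical set of fields that users actually search on.
SEARCHABLE_FIELDS = {"title", "body"}


def split_metadata(doc: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Separate fields worth indexing from fields that are only stored."""
    indexed = {k: v for k, v in doc.items() if k in SEARCHABLE_FIELDS}
    stored_only = {k: v for k, v in doc.items() if k not in SEARCHABLE_FIELDS}
    return indexed, stored_only


# Illustrative document; `lastmod` is kept as metadata but not indexed.
doc = {
    "title": "What is a hypervisor?",
    "body": "A hypervisor is software that creates and runs virtual machines.",
    "lastmod": "2023-01-15",
}

indexed, stored_only = split_metadata(doc)
print("index these:", sorted(indexed))
print("store only:", sorted(stored_only))
```

The stored-only fields can still be returned with each hit; they just don't take up space in the search index.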
``` qa_pairs = [ { 'question': 'what is a hypervisor?', 'answer': 'A hypervisor is software that creates and runs virtual machines (VMs).' }, { 'question': 'what is evc?', 'answer': 'EVC...
I haven't profiled it, but that's what I assume is happening. Each process opens its own copy of the dataset, thus there's one copy of the dataset in RAM per...
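If that assumption holds (again, not profiled), a back-of-the-envelope estimate is just dataset size times process count. The helper name and the example numbers below are made up for illustration.

```python
def estimated_ram_gb(dataset_gb: float, num_processes: int) -> float:
    """Rough upper bound, assuming every process holds a full, unshared copy."""
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    return dataset_gb * num_processes


# Illustrative numbers only: an 8 GB dataset loaded by 4 processes.
print(estimated_ram_gb(8.0, 4), "GB")
```

This ignores any copy-on-write sharing or memory mapping, which could make real usage lower.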
DSPy has a small default for max tokens. Override it to get a longer response: `lm = dspy.OpenAI([...], max_tokens=4096)`