Suggestions: Use Readability (JSDOM) and Query de-structuring

Open oliviermills opened this issue 1 year ago • 1 comments

Suggestions:

  1. Use Mozilla's Readability (requires JSDOM). It does a great job of reducing the size of the webpage content used for context injection. https://github.com/mozilla/readability

  2. Use query decomposition rather than rephrasing, and run the searches in parallel. This helps with more complex queries (but it does multiply the hits to Brave/the search engine by 3). Example prompt: "Your role is to generate a few short and specific search queries to answer the QUERY below. List 3 options, 1 per line." Then parse the response with a split on \n and remove the numbers, or ask for a JSON response. Unchecked code sample:

// uniqBy comes from lodash
import _ from "lodash"

// get variations of the input and keep the original query as well
const queryVariations = [...(await getVariations(input)), input]
// run the search for each variation in parallel
const braveTasks = queryVariations.map(q => searchWithBrave(q))
// remove duplicates across all variations to avoid fetching the HTML of the same page more than once
// you could use this to rerank instead of removing, i.e. if a result shows up more than once you could rank it higher for RAG
// (assumes searchWithBrave resolves to an array of results, hence the flat())
const uniqueResults = _.uniqBy((await Promise.all(braveTasks)).flat(), "url")
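
The parsing step ("split on \n and remove numbers") and the rerank idea from the comments could look roughly like the sketch below. Both `parseVariations` and `rankByFrequency` are hypothetical helper names, not part of this repo, and the sketch assumes each search result is an object with a `url` field. Also unchecked against the actual codebase:

```javascript
// Hypothetical helper: turn the raw LLM reply into clean query strings.
// Splits on newlines, strips leading list markers ("1.", "2)", "-", "*"),
// drops empty lines, and keeps at most `max` queries.
function parseVariations(text, max = 3) {
  return text
    .split("\n")
    .map((line) => line.replace(/^\s*(?:\d+[.)]|[-*])\s*/, "").trim())
    .filter((line) => line.length > 0)
    .slice(0, max);
}

// Hypothetical helper: merge the result lists from all variations and put
// URLs that were returned by more than one variation first.
// Assumes every result object has a `url` field.
function rankByFrequency(resultLists) {
  const byUrl = new Map();
  for (const results of resultLists) {
    const seenInList = new Set(); // count a URL once per variation
    for (const result of results) {
      if (seenInList.has(result.url)) continue;
      seenInList.add(result.url);
      const entry = byUrl.get(result.url) ?? { result, hits: 0 };
      entry.hits += 1;
      byUrl.set(result.url, entry);
    }
  }
  return [...byUrl.values()]
    .sort((a, b) => b.hits - a.hits) // ties keep first-seen order
    .map((entry) => ({ ...entry.result, hits: entry.hits }));
}
```

With these, `getVariations` could return `parseVariations(llmReply)`, and `rankByFrequency(await Promise.all(braveTasks))` could stand in for the `uniqBy` call when you want to keep the overlap signal instead of discarding it.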

oliviermills avatar Mar 16 '24 17:03 oliviermills

Thank you for this! Let me dive in and try this :)

developersdigest avatar Mar 24 '24 03:03 developersdigest