
Option to keep previous records when a JS page times out

ArthurFlag opened this issue 3 years ago · 4 comments

Describe the problem

Currently, the crawler regularly fails on one of my docs pages because it doesn't load fast enough. When this happens, the crawler reports a failure and seems to delete the records extracted during the last successful crawl, meaning my page is not indexed at all.

Describe the solution

I'd like to be able to specify what should happen in case of a timeout, for example with an option like deleteRecordsOnFailure: false.
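
To make the request concrete, here's a rough sketch of how such an option could look in an Algolia Crawler configuration. Note that `deleteRecordsOnFailure` is purely hypothetical (it's the option being requested here), and the app ID, API key, start URL, and actions are placeholders:

```js
new Crawler({
  appId: 'YOUR_APP_ID',            // placeholder
  apiKey: 'YOUR_CRAWLER_API_KEY',  // placeholder
  startUrls: ['https://docs.example.com'],
  renderJavaScript: true,

  // Hypothetical option (this feature request): when a page times out,
  // keep the records from the last successful crawl instead of
  // deleting them.
  deleteRecordsOnFailure: false,

  actions: [
    // ... existing DocSearch extraction actions ...
  ],
});
```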

ArthurFlag avatar Feb 06 '23 12:02 ArthurFlag

Any update on this possibility? I have to restart my crawl manually for this page daily 😬

ArthurFlag avatar Feb 24 '23 12:02 ArthurFlag

Hey @ArthurFlageul, let me check to see if there's an option that can help here.

shaneafsar avatar Feb 24 '23 21:02 shaneafsar

Hiya @shaneafsar, any update on this? 😊

ArthurFlag avatar Mar 01 '23 11:03 ArthurFlag

Hey @ArthurFlageul, thanks for the reminder. A couple of comments:

  • Setting maxLostRecordsPercentage to 0 would block the crawl if any page fails. However, you'd then have to unblock the crawler manually, and the same check would also trip when you remove content on purpose.
  • To improve load times, could you disable JS rendering on the crawler via renderJavaScript, if you haven't tried that already? (Both settings are sketched in the config example below.)
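
For reference, here's a minimal sketch of where those two settings live in an Algolia Crawler configuration. This is only an illustration, not a drop-in config: the app ID, API key, start URL, and actions are placeholders, and the exact shape of the safety-check setting should be double-checked against the Crawler configuration docs.

```js
new Crawler({
  appId: 'YOUR_APP_ID',            // placeholder
  apiKey: 'YOUR_CRAWLER_API_KEY',  // placeholder
  startUrls: ['https://docs.example.com'],

  // Fetch pages as plain HTML instead of rendering them in a headless
  // browser, which avoids the JS-rendering timeout on slow pages.
  renderJavaScript: false,

  actions: [
    // ... existing DocSearch extraction actions ...
  ],

  // Refuse to publish the new index if any records would be lost
  // (0 = no tolerated loss). The crawler then stays blocked until it
  // is unblocked manually, even when content was removed on purpose.
  safetyChecks: {
    beforeIndexPublishing: {
      maxLostRecordsPercentage: 0,
    },
  },
});
```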

shaneafsar avatar Mar 01 '23 16:03 shaneafsar

Closing this issue.

randombeeper avatar Jul 10 '24 22:07 randombeeper