Option to keep previous record when JS page times out
Describe the problem
Currently, the crawler regularly fails on one of my docs pages because it doesn't load fast enough. When this happens, the crawler reports a failure and seems to delete the records extracted during the last successful crawl, meaning my page is not indexed at all.
Describe the solution
I'd like to be able to specify what happens on a timeout, for example with an option like `deleteRecordsOnFailure: false`.
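A rough sketch of what this could look like, assuming the standard `new Crawler({ ... })` configuration format. The `deleteRecordsOnFailure` option and its placement are only part of this proposal, not something that exists today:

```js
new Crawler({
  appId: 'YOUR_APP_ID',
  apiKey: 'YOUR_API_KEY',
  startUrls: ['https://docs.example.com/'],
  renderJavaScript: true,
  // Proposed option (not an existing setting): when a page times out or the
  // crawl fails, keep the records from the last successful crawl instead of
  // deleting them.
  deleteRecordsOnFailure: false,
  actions: [
    // ... existing extraction actions ...
  ],
});
```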
Any update on this possibility? I have to manually restart the crawl for this page daily 😬
Hey @ArthurFlageul, let me check to see if there's an option that can help here.
Hiya @shaneafsar, any update on this? 😊
Hey @ArthurFlageul, thanks for the reminder. A couple of comments:
- Setting `maxLostRecordsPercentage` to 0 would block the crawl if any page fails. However, you'd have to unblock the crawler manually, and it would also block the crawl when you remove content on purpose.
- To improve load times, could you disable JS rendering on the crawler via `renderJavaScript` (if you haven't tried that already)? Both settings are sketched in the config example below.
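For reference, a minimal sketch of how those two settings could sit in the Crawler config. The `safetyChecks.beforeIndexPublishing` nesting for `maxLostRecordsPercentage` is my assumption of where that option lives, so please double-check it against the current Crawler documentation:

```js
new Crawler({
  appId: 'YOUR_APP_ID',
  apiKey: 'YOUR_API_KEY',
  startUrls: ['https://docs.example.com/'],
  // Fetch pages as static HTML instead of rendering them in a headless
  // browser, which avoids the JS rendering timeout entirely.
  renderJavaScript: false,
  // Assumed location of the safety check: refuse to publish the new index if
  // any records would be lost compared to the previous crawl (0% tolerance).
  safetyChecks: {
    beforeIndexPublishing: {
      maxLostRecordsPercentage: 0,
    },
  },
  actions: [
    // ... existing extraction actions ...
  ],
});
```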
Closing this issue.