fix(jobs): changed copy host job to use scroll API for hosts with large content volume

Open dsolistorres opened this issue 1 month ago • 0 comments

Closes #33661

This PR addresses performance issues and pagination errors in the site copy job by using the Elasticsearch Scroll API for large result sets.

Proposed Changes

  • When copying sites with large numbers of contentlets, the copy host job hit deep pagination errors once the offset exceeded Elasticsearch's max_result_window (100,000). Offset-based pagination also degraded performance on large result sets.
  • The indexSearchScroll method in the ESContentFactoryImpl class was refactored to expose the ES scroll API through a new wrapper interface, ESContentletScroll. The PaginatedContentlets class now uses this interface to iterate over results with the ES scroll API.
  • SQL queries in HostFactoryImpl were optimized to filter hosts by the structure_inode field of the contentlet table, and to use the ILIKE operator in SQL conditions for case-insensitive matching.
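
For reviewers unfamiliar with scroll-style iteration, here is a minimal sketch of the pattern, not the code in this PR. The names BatchScroll, InMemoryScroll and ScrollIterator are hypothetical and only illustrate how a wrapper such as ESContentletScroll could be consumed batch by batch. An in-memory list stands in for Elasticsearch so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a scroll wrapper; NOT the real ESContentletScroll API.
interface BatchScroll<T> {
    List<T> nextBatch(); // returns an empty list once the cursor is exhausted
    void close();        // releases the (simulated) server-side scroll context
}

// In-memory fake that mimics a server-side cursor, so the sketch runs without Elasticsearch.
final class InMemoryScroll<T> implements BatchScroll<T> {
    private final List<T> source;
    private final int batchSize;
    private int cursor = 0;
    private boolean closed = false;

    InMemoryScroll(List<T> source, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.source = source;
        this.batchSize = batchSize;
    }

    @Override
    public List<T> nextBatch() {
        if (closed) throw new IllegalStateException("scroll already closed");
        int end = Math.min(cursor + batchSize, source.size());
        List<T> batch = new ArrayList<>(source.subList(cursor, end));
        cursor = end; // the cursor only moves forward; no offset is ever recomputed
        return batch;
    }

    @Override
    public void close() { closed = true; }

    boolean isClosed() { return closed; }
}

// Presents the batches as one continuous iterator and closes the scroll when it is drained.
final class ScrollIterator<T> implements Iterator<T> {
    private final BatchScroll<T> scroll;
    private Iterator<T> current = Collections.emptyIterator();
    private boolean exhausted = false;

    ScrollIterator(BatchScroll<T> scroll) { this.scroll = scroll; }

    @Override
    public boolean hasNext() {
        // Fetch the next batch lazily, only when the current one is used up.
        while (!current.hasNext() && !exhausted) {
            List<T> batch = scroll.nextBatch();
            if (batch.isEmpty()) {
                exhausted = true;
                scroll.close();
            } else {
                current = batch.iterator();
            }
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return current.next();
    }
}

class ScrollDemo {
    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 25; i++) ids.add(i);

        InMemoryScroll<Integer> scroll = new InMemoryScroll<>(ids, 10);
        ScrollIterator<Integer> it = new ScrollIterator<>(scroll);

        int count = 0;
        while (it.hasNext()) { it.next(); count++; }
        System.out.println(count + " items, closed=" + scroll.isClosed()); // prints "25 items, closed=true"
    }
}
```

The point of the design is that the consumer never passes an offset: each call continues from where the previous batch ended, so the cost per batch stays flat and the max_result_window limit on from + size does not apply.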

Checklist

  • [x] Tests

dsolistorres, Dec 08 '25 19:12