fix(jobs): changed copy host job to use scroll API for host with large content volume
Closes #33661
This PR addresses performance issues and pagination errors in the site copy job by using the ElasticSearch Scroll API for large result sets.
Proposed Changes
- When copying sites with a large number of contentlets, the copy host job hit deep pagination errors once the offset exceeded ElasticSearch's `max_result_window` (100,000). Offset-based pagination also degraded performance on large result sets.
- A refactoring was done on the `indexSearchScroll` method from the `ESContentFactoryImp` class to expose the ES scroll API in a new wrapper interface, `ESContentletScroll`. The `PaginatedContentlets` class uses this new interface to iterate over results using the ES scroll API.
- SQL queries in `HostFactoryImpl` were optimized to use the `structure_inode` field from the `contentlet` table to filter hosts, and also to use the ILIKE clause in SQL conditions to match case-insensitive values.
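
For reviewers unfamiliar with the pattern, the difference between offset-based paging and scroll-style iteration can be sketched as below. This is a minimal illustration only, not the code in this PR: `BatchScroll`, `ScrollingIterator`, and `InMemoryScroll` are hypothetical names, and the real `ESContentletScroll` and `PaginatedContentlets` signatures may differ. The point it shows is that the caller never passes a numeric offset; the cursor state lives behind the scroll interface and is released when iteration ends.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical scroll-style cursor; NOT the actual ESContentletScroll interface. */
interface BatchScroll<T> extends AutoCloseable {
    /** Returns the next batch, or an empty list once the cursor is exhausted. */
    List<T> nextBatch();

    @Override
    void close();
}

/** Walks all results batch by batch without ever computing an offset. */
final class ScrollingIterator<T> implements Iterator<T>, AutoCloseable {
    private final BatchScroll<T> scroll;
    private Iterator<T> current = Collections.emptyIterator();
    private boolean exhausted;

    ScrollingIterator(BatchScroll<T> scroll) {
        this.scroll = scroll;
    }

    @Override
    public boolean hasNext() {
        // Fetch a new batch only when the current one has been consumed.
        while (!current.hasNext() && !exhausted) {
            List<T> batch = scroll.nextBatch();
            if (batch.isEmpty()) {
                close(); // release the cursor as soon as it is drained
            } else {
                current = batch.iterator();
            }
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    @Override
    public void close() {
        if (!exhausted) {
            exhausted = true;
            current = Collections.emptyIterator();
            scroll.close();
        }
    }
}

/** In-memory stand-in for a server-side scroll context, used only for the demo. */
final class InMemoryScroll implements BatchScroll<Integer> {
    private final List<Integer> data;
    private final int batchSize;
    private int position; // cursor state is kept here, not by the caller
    boolean closed;

    InMemoryScroll(List<Integer> data, int batchSize) {
        this.data = data;
        this.batchSize = batchSize;
    }

    @Override
    public List<Integer> nextBatch() {
        int end = Math.min(position + batchSize, data.size());
        List<Integer> batch = new ArrayList<>(data.subList(position, end));
        position = end;
        return batch;
    }

    @Override
    public void close() {
        closed = true;
    }
}

final class ScrollDemo {
    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            ids.add(i);
        }
        // try-with-resources guarantees the cursor is released even on early exit.
        try (ScrollingIterator<Integer> it = new ScrollingIterator<>(new InMemoryScroll(ids, 3))) {
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        }
    }
}
```

Wrapping the iterator in try-with-resources matters in practice because a scroll context holds server-side resources until it is cleared or times out, so a job that aborts midway should still release it.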
Checklist
- [x] Tests