[Bug]: remove_overlay_elements is not working
crawl4ai version
2025-feb-alpha-1
Expected Behavior
Adding remove_overlay_elements=True to your config should remove overlays from the scraped pages.
Current Behavior
It does not remove any overlays.
Is this reproducible?
Yes
Inputs Causing the Bug
Any page has this problem. The current code injects the JS from js_snippet/remove_overlay_elements.js, but that code is never executed: https://github.com/unclecode/crawl4ai/blob/15fd96db17fe748a2ac1cbde3b11a7f4d8805b30/crawl4ai/async_crawler_strategy.py#L1772 wraps the snippet in an anonymous function but never invokes that function.
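To illustrate the difference between wrapping and invoking, here is a minimal sketch. It is not the actual crawl4ai wrapper; `removeOverlays` and the `runs` counter are hypothetical stand-ins for the injected snippet.

```javascript
// Hypothetical stand-in for the injected overlay-removal snippet.
let runs = 0;
const removeOverlays = () => { runs += 1; };

// Wrapped only: this evaluates to a function value, but its body never runs.
(async () => { removeOverlays(); });
const afterDefinition = runs; // still 0

// Wrapped and invoked: the trailing () calls the function immediately (an IIFE).
(async () => { removeOverlays(); })();
const afterInvocation = runs; // now 1
```

If the wrapper at the linked line has the first shape, appending `()` to it would presumably be enough for the snippet to run.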
Steps to Reproduce
Code snippets
OS
any
Python version
any
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
No response
An easy workaround is to add your own js_code to the CrawlerRunConfig, for example:
js_code="""
document.body.scrollIntoView(false);
const elements = document.querySelectorAll("*");
elements.forEach((elem) => {
    const style = window.getComputedStyle(elem);
    if (style.position === "fixed" || style.position === "sticky") {
        elem.remove();
    }
});
""",
Confirming it's also not working on my side.
RCA
The overlay removal script in crawl4ai/crawl4ai/js_snippet/remove_overlay_elements.js failed to detect and remove scroll-dependent overlays because it did not trigger a scroll before attempting removal. Many modern websites show certain overlays, popups, and banners only after the user has scrolled to a specific position on the page.
Solution
Added document.body.scrollIntoView(false) to scroll to the bottom of the page before running the removal logic, triggering any scroll-dependent overlays to appear. Also increased the timeout from 100ms to 250ms to allow these elements to fully render.
With this fix, the script now successfully removes all the overlay elements that the workaround seems to be targeting.
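For reference, the scroll-then-wait-then-remove order described above can be sketched roughly as follows. This is modelled on the workaround, not on the real remove_overlay_elements.js (which does more). The function name and the `doc`, `win`, and `settleMs` parameters are assumptions for illustration, and where exactly the 250ms wait sits in the real script is also an assumption.

```javascript
// Sketch only: doc and win are passed in so this can run against a real page
// (document, window) or against stubs. All names here are hypothetical.
async function removeOverlaysAfterScroll(doc, win, settleMs = 250) {
  // Scroll to the bottom so scroll-triggered overlays get a chance to appear.
  doc.body.scrollIntoView(false);

  // Give those overlays time to render before looking for them.
  await new Promise((resolve) => setTimeout(resolve, settleMs));

  // Remove every element that is pinned to the viewport.
  let removed = 0;
  doc.querySelectorAll("*").forEach((elem) => {
    const position = win.getComputedStyle(elem).position;
    if (position === "fixed" || position === "sticky") {
      elem.remove();
      removed += 1;
    }
  });
  return removed;
}
```

In a page context this would be called as `removeOverlaysAfterScroll(document, window)`. Note that, like the workaround, it also removes legitimate fixed or sticky elements such as navigation bars.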
I tested this with the following site: without the fix, the cookie disclaimer text is also scraped and ends up in the final markdown, but with the fix it is eliminated.
@ederuiter @ziudeso Can you try this with any URLs you had issues with and let me know if anything else needs fixing?
The updated code is in the bug-fix branch for March.