crawl4ai Incorrect scraped content (another page's content is scraped)

I noticed some strange behaviour while doing retrieval: the content returned does not belong to the URL I provided. I have reproduced this a few times, and so far it looks like it's triggered by setting magic=True. My guess is that the simulated user behaviour inadvertently clicks a link on the page.

Turning magic off and enabling the other protection methods, leaving out simulate_user=True, seems to make it behave as intended, at least as far as I can see. For reference, this was happening on Weaviate's documentation page, which has links in the nav bar, the side bar and the main content area, basically links everywhere.
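For anyone who wants to try the same workaround, here is a rough sketch of the settings described above. It assumes that arun() accepts magic, simulate_user and override_navigator as keyword arguments, as releases from around the time of this report did (newer releases take the same options on CrawlerRunConfig), and that override_navigator is one of the remaining protection methods. The helper names workaround_kwargs and crawl are made up for illustration.

```python
def workaround_kwargs() -> dict:
    """Crawl options for the workaround: magic off, user simulation off."""
    return {
        "magic": False,              # magic mode is what seemed to trigger the wrong page
        "simulate_user": False,      # suspected cause: simulated input landing on a link
        "override_navigator": True,  # assumption: one of the remaining protection options
    }


async def crawl(url: str):
    """Crawl a single URL with the workaround options applied."""
    # Imported lazily so workaround_kwargs() can be used without crawl4ai installed.
    from crawl4ai import AsyncWebCrawler

    async with AsyncWebCrawler() as crawler:
        # Assumption: arun() accepts these options as plain keyword arguments.
        return await crawler.arun(url=url, **workaround_kwargs())
```

Running asyncio.run(crawl(...)) on the affected page once with these options and once with magic=True, then comparing the two results, should show whether the wrong content only appears in magic mode.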

Nov 15 '24 11:11 jtha

@jtha Thanks for using our library. Let me look into this and see what is going on.

Nov 20 '24 07:11 unclecode

@jtha I just tested this with magic mode (i.e. magic=True) and was unable to reproduce the issue. Could you try our latest version, 0.6.0? If the problem still exists, please reopen this issue with a code snippet so it's easier for us to reproduce and root-cause it.

Thanks again for taking the effort to report this.

May 08 '25 05:05 aravindkarnam