Incorrect scraped content (another page's content is scraped)
I noticed some strange behaviour when doing retrieval: the content returned is from a different page than the URL I provided. I have replicated this a few times, and so far it looks like it's triggered by setting magic=True. My guess is that simulating user behaviour inadvertently clicks a link on the page.
Turning magic off and enabling the other protection methods, but leaving out simulate_user=True, seems to make it behave as intended, at least as far as I can see. For reference, this was happening on Weaviate's documentation pages, which have many links in the nav bar, the side bar and the main content area, basically links everywhere.
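A rough way to check for this kind of accidental navigation is to compare the URL that was requested with the URL the browser actually ended up on. The sketch below uses only the Python standard library. The helper name same_page is hypothetical, and how you obtain the final URL from the crawl result is an assumption that is not confirmed anywhere in this thread.

```python
from urllib.parse import urlsplit


def same_page(requested: str, final: str) -> bool:
    """Return True if both URLs point at the same page.

    Hypothetical helper for illustration only. The scheme, the
    fragment, a trailing slash and host casing are ignored, so an
    http -> https redirect or an in-page anchor is not flagged.
    """
    def normalize(url: str) -> tuple[str, str, str]:
        parts = urlsplit(url)
        # An empty path and "/" are treated as the same page.
        path = parts.path.rstrip("/") or "/"
        return (parts.netloc.lower(), path, parts.query)

    return normalize(requested) == normalize(final)


if __name__ == "__main__":
    # Placeholder URLs, not the pages from the original report.
    print(same_page("https://example.com/docs/", "https://example.com/docs#intro"))
    print(same_page("https://example.com/docs", "https://example.com/blog"))
```

If the check fails only when simulate_user=True is set, that would support the theory that a simulated interaction followed a link.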
@jtha Thanks for using our library. Let me work on this and see what is going on.
@jtha I just tried this with magic mode (i.e. magic=True) and was unable to reproduce the issue. Could you try our latest version, 0.6.0? If the problem still exists, please reopen this issue with a code snippet so it's easier for us to reproduce it and find the root cause.
Thanks again for taking the effort to report this.