Respect robots.txt and identify your system

Open respatialized opened this issue 1 year ago • 0 comments

Recently, some AI companies have given website administrators the option of opting out of AI training by using configuration options in robots.txt.

While this project is for prompting and RAG rather than training, I still think you should provide a way for website owners to prevent their sites from becoming ad-hoc databases for, or components of, AI systems. Your software appears to default to evading detection by using Puppeteer's stealth plugin, and the user-agent configuration that would allow website owners to identify your project's bots is commented out.

I think this default is deceptive and irresponsible. You should make sure users of your project respect these preferences by incorporating them into the software's defaults. Web administrators may not be inclined to support the additional traffic generated by people using their websites as a component of AI systems.
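
As a rough sketch of what such a default could look like (not this project's actual code), the check below uses Python's standard `urllib.robotparser` to decide whether a page may be fetched before scraping it. The bot name `ExampleRagBot`, the helper `is_allowed`, and the sample robots.txt are all hypothetical and chosen only for illustration; a real integration would also need to download each site's robots.txt and send the same identifying user-agent with its requests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical bot name: the project's real user-agent string is not given here.
BOT_USER_AGENT = "ExampleRagBot"


def is_allowed(robots_txt: str, url: str, user_agent: str = BOT_USER_AGENT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access happens here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt for a site that opts this bot out
# but leaves other crawlers alone.
SAMPLE_ROBOTS = """\
User-agent: ExampleRagBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    # Check the bot's own user-agent, then an unrelated one.
    print(is_allowed(SAMPLE_ROBOTS, "https://example.com/docs"))
    print(is_allowed(SAMPLE_ROBOTS, "https://example.com/docs", "OtherBot"))
```

A check like this only works if the scraper announces itself under the same name it looks up in robots.txt, which is why the commented-out user-agent setting matters as much as the robots.txt check itself.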

respatialized, Apr 14 '24 14:04