Scan through HTTP/HTTPS possible?
As said in my first patch #128, we now use Mirrorbits for GIMP. Mirrorbits has some additional requirements compared to the simpler (ugly) round-robin we used to have: it needs either read-only rsync or FTP access back to the mirrors in order to scan them (as I understand it, this is used for health and security checks).
A good part of our mirrors already provided this, so that was easy, but a few didn't. We emailed the admins, and one of them has already replied that they disabled rsync/FTP access completely because of too many hack attempts and massive flooding.
Can't the scan be done through the same protocol as the mirror, i.e. HTTPS? I assume that rsync/ftp must provide facilities to make it more efficient, is that it? Still, it should be possible to scan through HTTP(S).
I assume that rsync/ftp must provide facilities to make it more efficient, is that it?
Yes, see: rsync -r --no-motd rsync://mirror2.sandyriver.net/pub/software/gimp
HTTP doesn't have any way to get directory listings, recursive file listings, or bulk file metadata, so I doubt this is possible.
MirrorBrain supports scans over HTTP. The mirror has to provide HTML directory indices, which can be either statically or dynamically generated. Almost all web servers support them. There is also an ongoing effort to add JSON output to mod_autoindex in Apache HTTP Server. Perhaps other web servers support this format.
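To illustrate the idea of scanning HTML directory indices, here is a minimal Python sketch that extracts subdirectory and file URLs from one index page. It is not MirrorBrain's or Mirrorbits' code: the names `IndexLinkParser` and `parse_index` and the host `mirror.example.org` are made up for the example, and it assumes an Apache-style index where entries are plain `<a href>` links and the base URL ends with a slash.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IndexLinkParser(HTMLParser):
    """Collect href values from <a> tags in a directory index page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def parse_index(base_url, html):
    """Split the entries of one index page into subdirectory and file URLs.

    Assumption: base_url ends with "/" and is the URL the page was served from.
    """
    parser = IndexLinkParser()
    parser.feed(html)
    dirs, files = [], []
    for href in parser.hrefs:
        # Skip column-sort links ("?C=N;O=D") and fragment links.
        if href.startswith(("?", "#")):
            continue
        url = urljoin(base_url, href)
        # Keep only entries strictly below the current directory; this drops
        # the "Parent Directory" link and links to other sites.
        if url == base_url or not url.startswith(base_url):
            continue
        (dirs if url.endswith("/") else files).append(url)
    return dirs, files
```

A full scan would fetch each returned directory URL in turn and repeat the parsing. File sizes and modification times would still have to come from the index text or from separate HEAD requests, which is where this approach costs more than rsync.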
Another option would be to add a manifest/file metadata/directory listing file and validate the file from the primary server against the mirrors. I think this would be the most efficient method as the requests can be parallelized.
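A rough Python sketch of the manifest idea, under stated assumptions: the manifest format (one `<size> <path>` pair per line) is invented for this example, and `get_size` is a hypothetical callback supplied by the caller that returns the size a mirror reports for a path (for instance from the Content-Length of an HTTP HEAD response) or `None` if the file is missing. The lookups run in a thread pool to show how the requests could be parallelized.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_manifest(text):
    """Parse lines of the form '<size> <path>' into {path: size}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Ignore blank lines and comments.
        if not line or line.startswith("#"):
            continue
        size, path = line.split(maxsplit=1)
        entries[path] = int(size)
    return entries


def check_mirror(manifest, get_size, workers=8):
    """Return the sorted paths whose size on the mirror differs from the manifest.

    get_size(path) is called once per path, concurrently, and should return
    the size reported by the mirror or None if the file is missing.
    """
    paths = sorted(manifest)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sizes = list(pool.map(get_size, paths))
    return [path for path, size in zip(paths, sizes) if size != manifest[path]]
```

Comparing sizes only detects missing or truncated files; a manifest that also carried checksums would be needed to detect tampered content, at the cost of downloading the files to verify them.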