Improve support for secure websites
This is not a bug report, but a suggestion for a new feature.
When a user accesses a website using https, images and other files are usually still fetched with http instead of https because the METS/MODS files typically contain URLs using http.
Example: https://digital.slub-dresden.de/werkansicht/dlf/98053/1/0/
For that page, web browsers report something like "This connection is insecure. Parts of this page are not secure.", which is correct.
There are several possible solutions to get a 100 % secure connection.
1. Use URLs with https in the METS/MODS files. This requires touching all existing METS/MODS files. In addition, those files will then always be fetched using https, even if the user accesses the website with http.
2. Use HTTP Strict Transport Security. Then http is no longer possible.
3. Modify the URLs read from METS/MODS before using them.
4. Use relative URLs in the METS/MODS files. The URLs for all files referred to in a METS/MODS file would be relative to the URL of the METS/MODS file itself.
Solution 3 would need code in Kitodo.Presentation which fixes the URLs, for example by doing a simple string substitution. For the example, replacing http://digital.slub-dresden.de/ by a simple / would be sufficient.
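The string substitution described above could look roughly like the following. This is only an illustrative Python sketch, not Kitodo.Presentation code; the function name `strip_own_host` and the image path in the usage note are made up for the example.

```python
# Hypothetical helper: turn absolute http URLs on our own host into
# host-relative URLs, so the browser reuses the scheme of the page.
OWN_PREFIX = "http://digital.slub-dresden.de/"


def strip_own_host(url: str, own_prefix: str = OWN_PREFIX) -> str:
    """Replace the own-host prefix by a single "/"; leave other URLs alone."""
    if url.startswith(own_prefix):
        return "/" + url[len(own_prefix):]
    return url
```

With this, a URL such as `http://digital.slub-dresden.de/some/image.jpg` (an invented path) would become `/some/image.jpg`, while URLs pointing to any other host stay unchanged.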
Solution 4 might violate standards and break existing applications like the DFG viewer.
I added a 4th theoretical solution to the list. My personal preference is the 3rd solution.
I'd prefer the 3rd solution, too. But since the images are not always located at the same host as the Kitodo.Presentation installation itself, simply replacing http://digital.slub-dresden.de/ by / wouldn't be sufficient.
Instead, there would have to be a check whether the images are also available via HTTPS before replacing only the scheme part of the URI. This would be applicable to the DFG Viewer, too.
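A minimal sketch of that idea, in Python for illustration only (this is not code from Kitodo.Presentation or the DFG Viewer). The function names, the HEAD request and the 2 second timeout are assumptions; the check is passed in as a parameter so that it can be swapped out or tested without network access.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def to_https(url: str) -> str:
    """Replace only the scheme part of an http URL; leave anything else alone."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return url
    return urlunsplit(parts._replace(scheme="https"))


def https_available(url: str, timeout: float = 2.0) -> bool:
    """Assumed check: a HEAD request that succeeds within the timeout."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP errors, TLS failures, timeouts and malformed URLs.
        return False


def upgrade_if_available(url: str, probe=https_available) -> str:
    """Return the https variant only if the probe confirms it is reachable."""
    candidate = to_https(url)
    if candidate != url and probe(candidate):
        return candidate
    return url
```

Because only the scheme is replaced, this also works for images hosted somewhere other than the Kitodo.Presentation installation.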
Yes, a check has to be done and this may be tricky if the HTTPS source doesn't respond quickly. We had such issues already with the DFG-Viewer and METS-files not accessible via HTTPS.
If we could trust that HTTPS access is possible, we could remove the protocol from the URL. This is done in many places to avoid the warning described above. It is a workaround that works in all current browsers. In IE 6, 7 and 8 this led to problems, but that is history ;-) http://stackoverflow.com/questions/4831741/can-i-change-all-my-http-links-to-just
http://digital.slub-dresden.de would become //digital.slub-dresden.de, and the browser decides which protocol to use depending on the protocol of the page.
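As a rough illustration (Python, with the hypothetical function name `to_protocol_relative`), dropping the protocol could be done like this:

```python
from urllib.parse import urlsplit, urlunsplit


def to_protocol_relative(url: str) -> str:
    """Drop the scheme of an http(s) URL, keeping host, path and query."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        # Relative URLs and other schemes (e.g. urn:, ftp:) are left alone.
        return url
    return urlunsplit(parts._replace(scheme=""))


print(to_protocol_relative("http://digital.slub-dresden.de"))
```

Note that this variant performs no availability check at all, so it is only safe if every referenced host really does support HTTPS.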
Some thoughts on performance:
- It should be possible to disable the HTTPS test in the configuration (either globally or by specifying a URL regexp).
- When an HTTPS access fails, the corresponding site could be stored in a cache, and further failures can then be avoided by looking in that cache before trying the same website without HTTPS support again.
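Both points could be combined roughly as follows. This is a Python sketch under several assumptions: the class name `HttpsUpgrader`, the options `enabled` and `skip_pattern`, and the in-memory set used as the cache are all invented for illustration, and the actual HTTPS test is passed in as `probe` rather than implemented here.

```python
import re
from urllib.parse import urlsplit, urlunsplit


class HttpsUpgrader:
    """Upgrade http URLs to https, with an off switch and a failure cache."""

    def __init__(self, probe, enabled=True, skip_pattern=None):
        self.probe = probe            # callable(url) -> bool, does the HTTPS test
        self.enabled = enabled        # global switch from the configuration
        self.skip = re.compile(skip_pattern) if skip_pattern else None
        self.failed_hosts = set()     # hosts already known to lack HTTPS

    def upgrade(self, url):
        parts = urlsplit(url)
        if not self.enabled or parts.scheme != "http":
            return url
        if self.skip and self.skip.search(url):
            return url                # test disabled for this URL by regexp
        if parts.netloc in self.failed_hosts:
            return url                # cached failure, do not test again
        candidate = urlunsplit(parts._replace(scheme="https"))
        if self.probe(candidate):
            return candidate
        self.failed_hosts.add(parts.netloc)
        return url
```

A real installation would probably want the cache to expire entries after some time, so that a host which gains HTTPS support later is eventually picked up.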