pyvo DalResults does not appropriately handle 404 (or any HTTP error)

Any calls to DalResult.from_result_url resulting in an error condition (404, 5xx etc) will proceed to trying to parse the response as a votable.

For a user calling a higher-level function like DataLinkResults.from_result_url the resulting error is a parse error deep inside votable and distracts from the actual problem (the service doesn't have what they're looking for).

I've filed an example of this behavior occurring at an LSST data facility here. That bug has an example notebook which is a cut-down version of an LSST tutorial showcasing the bug in action.

The pyvo issue is here. Either from_result_url or _from_result_url should check .status on the HTTPResponse object returned from session.get and raise an appropriate exception.

I'm happy to submit a PR if there is support for the fix suggested above.

Nov 26 '24 18:11 mtauraso

On Tue, Nov 26, 2024 at 10:34:16AM -0800, Michael Tauraso wrote:

Any calls to DalResult.from_result_url resulting in an error condition (404, 5xx etc) will proceed to trying to parse the response as a votable.

I'd say it is reasonable to try and do this for the appropriate media types in order to figure out a DALI error message (in a QUERY_STATUS INFO as per DALI sect. 4.29; regrettably, SCS still has another signalling mechanism, but that should already be implemented somewhere in pyVO).

While early DAL standards said to return 200 even with these error messages, we certainly want to move towards actual HTTP status codes, and thus VOTable payloads for non-200 status codes should still be interpreted and formatted to nice exceptions.

For a user calling a higher-level function like DataLinkResults.from_result_url the resulting error is a parse error deep inside votable and distracts from the actual problem

That is certainly very unwelcome behaviour.

I'm happy to submit a PR if there is support for this fix.

That would be great. I'd say the behaviour should be:

For content-types application/x-votable+xml (I think we don't need to parse the media type, as it's unlikely we'll have parameters here) and text/xml, try to parse as a VOTable. If there is a DALI-type (or SCS-type) error message in there, take its text and make a DAL error out of it.
If VOTable parsing fails or for other content types, fall back to whatever exception requests produces.

Rationale: The publisher-produced message in a DALI-compliant VOTable is almost certain to be more informative than anything we could reasonably come up with ourselves. But if that fails, it's probably better to rely on the large user base of requests that will hopefully come up with useful diagnostics.

Nov 27 '24 07:11 msdemlei

That would be great. I'd say the behaviour should be: * For content-types application/x-votable+xml (I think we don't need to parse the media type, as it's unlikely we'll have parameters here) and text/xml, try to parse as a VOTable. If there is a DALI-type (or SCS-type) error message in there, take its text and make a DAL error out of it. * If VOTable parsing fails or for other content types, fall back to whatever exception requests produces. Rationale: The publisher-produced message in a DALI-compliant VOTable is almost certain to be more informative than anything we could reasonably come up with ourselves. But if that fails, it's probably better to rely on the large user base of requests that will hopefully come up with useful diagnostics.

This sounds excellent. I'm happy to start on this.

I clicked around quickly to see if there were any contribution instructions I should be sure I'm following. I didn't find anything beyond the Astropy CoC. Is there anything in particular I should be aware of when submitting a PR besides keeping CI clean?

Dec 02 '24 18:12 mtauraso

Yes, pyvo specific dev docs are coming. Until then the jist is: Keeping the commit history clean would be appreciated, and rebased when/if you need to refresh the base branch rather than github resolve and add a merge commits. Also, please refrain from running auto reformatting anything or add changes unrelated to the particular fix, if you run into any other bugs or stuff it's preferred to fix them in a separate PR.

And we have two types of tests, locally mocked ones, but also some functionality has tests that reaches out to the remote servers. You can run the latter lot by using the -R pytest option.

Dec 02 '24 18:12 bsipocz