ENH: SIA should raise exception for non-existing collection
[Edit:] This started as an astroquery issue, but on closer inspection the exception should already be raised at the pyvo level, so I transferred the issue.
>>> import pyvo as vo
>>> sia2 = vo.dal.sia2.SIA2Service('https://irsa.ipac.caltech.edu/SIA')
>>> sia2.search((0,0,10), collection='foobar')
<DALResultsTable length=0>
s_ra s_dec facility_name instrument_name dataproduct_subtype ... pol_xel cloud_access o_ucd upload_row_id
deg deg ...
float64 float64 object object object ... int64 object object int64
------- ------- ------------- --------------- ------------------- ... ------- ------------ ------ -------------
============
Currently it's possible to query a collection that doesn't exist on the server and receive an empty result table. I would think a better UX would be to receive an exception in these cases.
E.g. currently this is what I get:
In [49]: Irsa.query_sia(pos=(coord, 10), collection='foobar')
Out[49]:
<DALResultsTable length=0>
s_ra s_dec facility_name instrument_name dataproduct_subtype ... pol_xel cloud_access o_ucd upload_row_id
deg deg ...
float64 float64 object object object ... int64 object object int64
------- ------- ------------- --------------- ------------------- ... ------- ------------ ------ -------------
This is actually an upstream issue, as we should get the exception on the pyvo level already.
On Thu, Feb 13, 2025 at 11:10:18AM -0800, Brigitta Sipőcz wrote:
bsipocz created an issue (astropy/pyvo#651)
Currently it's possible to query a collection that doesn't exist on the server and receive an empty result table. I would think a better UX would be to receive an exception in these cases.
Frankly, I don't think this is possible with reasonable effort, because outside of special cases you wouldn't know if the empty result set is because the collection does not exist or because there is just nothing matching the other constraints within that collection.
Somewhat relatedly, I have been arguing for having advanced column metadata in the tableset http://ivoa.net/documents/Notes/colstatnote/index.html, which eventually would include the possible values of enumerated columns (like obs_collection); DaCHS actually already implements that. Unfortunately, these efforts have not gained much traction yet.
If in this way we had a way to discover the legal values of obs_collection, my objection "not possible with reasonable effort" would not stand. But even then I suspect that by outright rejecting queries that, judging by the column metadata, would return nothing, we'd cause more harm than benefit. We might have a "preflight" mode, though, where the query methods first check against the ranges and don't bother running the query if parameters are out of range. This might be clever in, for instance, global dataset discovery. But we're years away from having enough buy-in for better column metadata, so I don't think we need to plan for this now.
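To make the "preflight" idea a bit more concrete, here is a minimal sketch. It assumes column statistics were available to the client in some form; the `ColumnStats` class, the `preflight` function, and the example values are all hypothetical and correspond to nothing that exists in pyVO or in a current standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnStats:
    """Hypothetical per-column metadata; not an existing pyVO structure."""
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    values: frozenset = frozenset()  # legal values of an enumerated column


def preflight(constraints, stats):
    """Return False if the stats show that a constraint cannot match.

    Columns without published stats never rule a query out.
    """
    for column, wanted in constraints.items():
        col = stats.get(column)
        if col is None:
            continue
        if col.values and wanted not in col.values:
            return False
        if col.minimum is not None and wanted < col.minimum:
            return False
        if col.maximum is not None and wanted > col.maximum:
            return False
    return True


# Made-up statistics, for illustration only.
example_stats = {
    "obs_collection": ColumnStats(values=frozenset({"survey_a", "survey_b"})),
    "t_min": ColumnStats(minimum=50000.0, maximum=60000.0),
}
```

A client doing global discovery could skip a service when `preflight(...)` returns False, while an ordinary query would still be sent and simply come back empty.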
Thank you Markus. Yes, I came to the same conclusion: this would be very hard, if possible at all, at the pyvo level. I know that e.g. at IRSA we have a metadata query response for the special case of maxrec=0, but that's not universal behaviour from the services, and we cannot even handle the IRSA case (https://github.com/astropy/pyvo/issues/519).
So I may just reopen this issue in astroquery where I could do the sanity check for the collection parameter using a cached list of possible collection names.
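A rough sketch of what such a sanity check could look like, assuming the list of valid collection names has already been fetched and cached somewhere. The function name `check_collection` and the collection names in the example are placeholders, not astroquery API.

```python
import difflib


def check_collection(name, known_collections):
    """Return ``name`` if it is a known collection, else raise ValueError."""
    known = sorted(known_collections)
    if name in known:
        return name
    # Offer near matches so that typos are easy to spot.
    hint = difflib.get_close_matches(name, known, n=3)
    message = f"Unknown collection {name!r}."
    if hint:
        message += f" Did you mean: {', '.join(hint)}?"
    raise ValueError(message)


# Placeholder for a cached list of collection names.
cached_collections = ["survey_a", "survey_b"]
```

Calling `check_collection('foobar', cached_collections)` before sending the query would then raise instead of returning an empty table.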
On Mon, Feb 17, 2025 at 07:50:02PM -0800, Brigitta Sipőcz wrote:
at IRSA we have a metadata query response for the special case of
maxrec=0, but that's not universal behaviour from the services, and we cannot even handle the IRSA case
Metadata on MAXREC=0 is standard (DALI) behaviour, but that does not include column statistics, so that won't help you here.
So I may just reopen this issue in astroquery where I could do the sanity check for the collection parameter using a cached list of possible collection names.
Frankly, if we think this is worthwhile behaviour (and while I'm obviously a big fan of column statistics, I'm not sure about that particular use case), let's push for an interoperable solution. Having more column statistics would be a great thing for so many other purposes, too. And they're not horribly complex to do either.
An interoperable solution would be indeed best!
I have a related issue and found this one searching before I submit a new one. Starting from the beginning here:
>>> sia2 = vo.dal.sia2.SIA2Service('https://irsa.ipac.caltech.edu/SIA')
>>> sia2.search((0,0,10), collection='foobar')
What is the workflow to discover how to do this? The base URL would be in the Registry. But the value the client should specify for collection?
This relates to the RegTAP doc section 10.12 and the endorsed note about data collections. Section 2.1.4 of that note states:
the COLLECTION constraint can be hardcoded into the access URLs
and shows a metadata example with
<accessURL>
http://example.com/sia.xml?COLLECTION=southerntel&
</accessURL>
Which brings me to my issue: we do exactly that and it does not work with pyvo.dal.SIA2Service.
sia = vo.dal.SIA2Service("https://heasarcdev.gsfc.nasa.gov/xamin/vo/sia2?table=swiftbalog&",session=session)
result = sia.search(pos=(0.0,0.0,.001))
gets you an error. The reason is that the query PyVO sends to the server has stripped off everything after the question mark in the base URL before it adds the user-specified constraints. (At least, that's the end result; the URL sent ends "sia2".) It then adds the positional constraint and passes it to our server which naturally doesn't know which data is being asked for.
I think this is a PyVO bug that needs another issue. @msdemlei ?
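For illustration, this is roughly the behaviour I would expect from a client: parameters hardcoded in the access URL are kept and the user's constraints are appended. The sketch uses only the standard library; `merge_query` is a made-up helper, not pyVO code, and it uses the example URL from the note.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def merge_query(access_url, constraints):
    """Append ``constraints`` to the query string already in ``access_url``."""
    parts = urlsplit(access_url)
    # Parameters hardcoded in the access URL, e.g. COLLECTION=southerntel.
    fixed = parse_qsl(parts.query, keep_blank_values=True)
    query = urlencode(fixed + list(constraints.items()))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )


url = merge_query(
    "http://example.com/sia.xml?COLLECTION=southerntel&",
    {"POS": "CIRCLE 0 0 0.001"},
)
```

Here `url` keeps `COLLECTION=southerntel` and gains the `POS` constraint, instead of ending at `sia.xml`.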
On Thu, May 08, 2025 at 06:56:58AM -0700, trjaffe wrote:
I have a related issue and found this one searching before I submit a new one. Starting from the beginning here:
>>> sia2 = vo.dal.sia2.SIA2Service('https://irsa.ipac.caltech.edu/SIA')
>>> sia2.search((0,0,10), collection='foobar')
What is the workflow to discover how to do this? The base URL would be in the Registry. But the value the client should specify for collection?
For the access URL: The scenario for URL literals in pyVO programs is that people found the URL somewhere that's not the Registry and paste it into some code they've found somewhere or that they've been cobbling together for some other service. When people discover services in the Registry, they'd be calling get_service() on the registry result and things magically work.
The collections... sigh. Discovering column metadata (in this case: "which collections are there?") has been on my mind for a long time, and I welcome everyone that'll join me in my struggle to improve the situation. Obs_collection would be what https://ivoa.net/documents/Notes/colstatnote/20210429/NOTE-colstatnote-1.0-20210429.html#tth_sEc3.2 calls "discrete variables". I have a proposal for that, but that's all waiting for other data centres to join in.
The sad truth is: just with SIA2, there is at the moment no machine-readable way to figure out collection names. One may hope that people write information like this into their service descriptions, but I don't think many do.
Realistically, you'd probably go to an associated obscore service and try
select distinct obs_collection from ivoa.obscore
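From Python, that query could be wrapped along these lines. This is only a sketch: `list_collections` is a made-up helper, and it assumes the object passed in has a pyVO-style `search(adql)` method whose result iterates over rows that can be indexed by column name.

```python
def list_collections(tap_service):
    """Return the sorted distinct obs_collection values of an ObsCore table.

    ``tap_service`` is assumed to offer a pyVO-style ``search(adql)`` method
    whose result yields rows indexable by column name.
    """
    query = "SELECT DISTINCT obs_collection FROM ivoa.obscore"
    rows = tap_service.search(query)
    return sorted(str(row["obs_collection"]) for row in rows)
```

With pyVO this might be called as `list_collections(pyvo.dal.TAPService(tap_url))`, where `tap_url` stands for the TAP endpoint of the associated obscore service; that URL is not given in this thread.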
Which brings me to my issue: we do exactly that and it does not work with pyvo.dal.SIA2Service.
sia = vo.dal.SIA2Service("https://heasarcdev.gsfc.nasa.gov/xamin/vo/sia2?table=swiftbalog&",session=session)
result = sia.search(pos=(0.0,0.0,.001))
gets you an error. The reason is that the query PyVO sends to the server has stripped off everything after the question mark in the base URL before it adds the user-specified constraints. (At least, that's the end result; the URL sent ends "sia2".) It then adds the positional constraint and passes it to our server which naturally doesn't know which data is being asked for.
I think this is a PyVO bug that needs another issue. @msdemlei ?
heasarcdev isn't reachable from here (at the moment), but I think what's happening is that SIA2Service finds the capabilities (which is close to a miracle) and then uses the URL that's given there, which presumably does not include the extra parameter.
Now, it's not my code, and I certainly wish we'd not look at the capabilities in such a situation by default and just accept the access URL as-is (the capabilities magic here is for auth). But that's beside the deeper point.
While it's fine to file a bug against pyVO here, the underlying problem is a missing piece in the definition of auxiliary capabilities. What I think is the right way to deal with this situation:
swiftbalog should have an auxiliary sia2 capability in its resource record as per http://ivoa.net/documents/discovercollections/20190520/index.html.
And that auxiliary capability needs to give the collection name (that's different from the column metadata case above: there, you want all values; here, you want the specific value applying to that particular data collection).
This, regrettably, requires a Registry extension, presumably one that allows one to attach key-value-pairs to the capability. This one (proposed key: obs-collection) would be useful, too, for TAP auxiliary capabilities for obscore-published collections.
I'm happy to collaborate on this; I don't think it's terribly much work.