
Block harvest_source_list API endpoint on catalog

Open FuhuXia opened this issue 1 year ago • 1 comments

The harvest_source_list endpoint from ckanext-harvest includes deleted harvest sources in its result, but anonymous users are not supposed to see deleted packages. The API also does not support pagination. To show all of the catalog's harvest sources, we have to set a very high limit (2000?) so that all current (active) and deleted (inactive) sources come back in one API call, which is very slow.

I think we should block this API endpoint and guide users to these alternative APIs:

  1. Call this API to get all harvest sources in paginated results: https://catalog.data.gov/api/action/package_search?fq=(dataset_type:harvest)&fl=id,name,url,organization&rows=1000

  2. Get details on a specific source with this API. You can use either id or name: https://catalog.data.gov/api/action/harvest_source_show?id=energy-json
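For anyone switching over, here is a rough sketch of how the first alternative could be walked page by page. It assumes the standard CKAN package_search parameters start and rows, and that the response carries result.count and result.results. The function names (build_search_url, fetch_page, iter_harvest_sources) are made up for illustration and are not part of any existing client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the issue text above.
BASE = "https://catalog.data.gov/api/action/package_search"


def build_search_url(start, rows=1000):
    """Build a package_search URL restricted to harvest sources."""
    params = {
        "fq": "(dataset_type:harvest)",
        "fl": "id,name,url,organization",
        "rows": rows,
        "start": start,  # assumption: standard CKAN offset parameter
    }
    return BASE + "?" + urlencode(params)


def fetch_page(url):
    """Fetch one page and decode the JSON body."""
    with urlopen(url, timeout=60) as resp:
        return json.load(resp)


def iter_harvest_sources(fetch=fetch_page, rows=1000):
    """Yield harvest sources one page at a time until the count is reached."""
    start = 0
    while True:
        result = fetch(build_search_url(start, rows))["result"]
        results = result["results"]
        if not results:
            break
        yield from results
        start += len(results)
        if start >= result["count"]:
            break
```

Iterating over iter_harvest_sources() would then request one page at a time instead of one very large response. The fetch argument is only there so the paging logic can be tried without hitting the live catalog.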

How to reproduce

https://catalog.data.gov/api/action/harvest_source_list

Search for active: false in the result.
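The check can also be done on a saved response rather than by eye. This is a minimal sketch that assumes the response is the usual CKAN envelope with a result list and that each source carries a boolean active field; inactive_sources is a hypothetical helper name.

```python
def inactive_sources(payload):
    """Return the harvest sources whose 'active' flag is False.

    Assumes `payload` is the decoded JSON from harvest_source_list,
    shaped like {"result": [{"id": ..., "active": true/false, ...}, ...]}.
    """
    return [s for s in payload.get("result", []) if s.get("active") is False]
```

A non-empty return value for an anonymous request would confirm that deleted sources are being exposed.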

Sketch

We already have a list of blocked API endpoints in the nginx config:

https://github.com/GSA/catalog.data.gov/blob/8dda50797980f40d6921aa3e299087ddfe31d8c9/proxy/nginx-common.conf#L27-L44

FuhuXia, May 01 '24 17:05

Redirect the call to the package_search API instead.

gujral-rei, May 02 '24 20:05