stac-server

Search limit greater than arbitrary value returns status code 502

Open klsmith-usgs opened this issue 4 years ago • 6 comments

Exceeding the limit for a query on a collection returns an unhelpful server error, leaving the user guessing what is wrong. The failure appears to be related to overall response size, since different limits succeed for different STAC collections.

Examples using pystac-client and https://earth-search.aws.element84.com/v0

Succeeds:

import pystac_client

# setup implied by the examples below
sentinel2 = pystac_client.Client.open('https://earth-search.aws.element84.com/v0')

search = sentinel2.search(collections=['sentinel-s2-l2a-cogs'],
                          bbox=(-120.23822859135915, 35.63894025515473, -118.19087145985415, 37.262086717429455),
                          datetime='2013-01-01/2020-12-31',
                          limit=500)

records = search.get_all_items_as_dict()

Fails:

search = sentinel2.search(collections=['sentinel-s2-l2a-cogs'],
                          bbox=(-120.23822859135915, 35.63894025515473, -118.19087145985415, 37.262086717429455),
                          datetime='2013-01-01/2020-12-31',
                          limit=650)

records = search.get_all_items_as_dict()
APIError: {"message": "Internal server error"}

Different collection

Succeeds:

search = sentinel2.search(collections=['sentinel-s2-l2a'],
                          bbox=(-120.23822859135915, 35.63894025515473, -118.19087145985415, 37.262086717429455),
                          datetime='2013-01-01/2020-12-31',
                          limit=750)

records = search.get_all_items_as_dict()

Fails:

search = sentinel2.search(collections=['sentinel-s2-l2a'],
                          bbox=(-120.23822859135915, 35.63894025515473, -118.19087145985415, 37.262086717429455),
                          datetime='2013-01-01/2020-12-31',
                          limit=800)

records = search.get_all_items_as_dict()
APIError: {"message": "Internal server error"}

klsmith-usgs avatar Nov 18 '21 20:11 klsmith-usgs

Status update on this ticket:

  • I looked in the CloudWatch logs for the Lambda, and there are no error logs that indicate where this is happening. TODO: look closer at other logs.
  • One important detail is that in the first example above, the server failure does not occur until the 3rd page with a limit of 650. It could be that the first two pages are just under some unknown size limit that's triggering this, or there could be an item on the 3rd page that is inordinately larger than the others, so retrieving any page containing it blows things up.
  • Next step is to query the API directly with requests to allow control over the exact page, etc.
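That next step could look roughly like the sketch below, which POSTs to the `/search` endpoint directly and prints the status and byte size of each page. It uses only the standard library (rather than the `requests` package mentioned above) so it is self-contained; the helper names `post_search`, `next_body`, and `probe` are illustrative, and the `"next"`-link paging shape follows the STAC API spec:

```python
import json
import urllib.request

SEARCH_URL = "https://earth-search.aws.element84.com/v0/search"

def post_search(body):
    """POST a search body and return (status code, raw response bytes)."""
    req = urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read()

def next_body(page_json, current_body):
    """Return the POST body for the next page (from the 'next' link), or None."""
    link = next((l for l in page_json.get("links", []) if l.get("rel") == "next"), None)
    if link is None:
        return None
    return link.get("body", current_body)

def probe(body):
    """Fetch pages one at a time, printing the status and byte size of each."""
    page = 1
    while body is not None:
        status, raw = post_search(body)
        print(f"page {page}: status {status}, {len(raw)} bytes")
        if status != 200:
            break
        body = next_body(json.loads(raw), body)
        page += 1
```

Calling `probe({"collections": ["sentinel-s2-l2a-cogs"], "limit": 325, ...})` with the bbox and datetime from the examples above would then show exactly which page tips the response over.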

philvarner avatar Feb 23 '22 16:02 philvarner

More updates:

  • The error is coming from API Gateway and (confusingly) is a 502 (Bad Gateway), but the body is {"message": "Internal server error"} 🙄

Running with a page size of 325, this is the size of each page:

| page | status | size (b) | sum of last 2 pages | page for limit 650 |
|------|--------|----------|---------------------|--------------------|
| 1    | 200    | 3176323  |                     |                    |
| 2    | 200    | 2397581  | 5573904             | 1                  |
| 3    | 200    | 2412505  |                     |                    |
| 4    | 200    | 2417225  | 4829730             | 2                  |
| 5    | 200    | 2816015  |                     |                    |
| 6    | 200    | 3297559  | 6113574             | 3                  |
| 7    | 200    | 3203616  |                     |                    |
| 8    | 200    | 3201786  | 6405402             | 4                  |
| 9    | 200    | 3219906  |                     |                    |
| 10   | 200    | 2102250  | 5322156             | 5                  |

Apparently, AWS Lambda has a hard 6MB limit on the response payload it can return, which applies here because the Lambda runs behind API Gateway.
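The "sum of last 2 pages" column can be reproduced from the per-page sizes in the table; each limit=650 page covers two consecutive limit=325 pages:

```python
# Per-page response sizes in bytes, measured with limit=325 (from the table above)
sizes = [3176323, 2397581, 2412505, 2417225, 2816015,
         3297559, 3203616, 3201786, 3219906, 2102250]

# Size each limit=650 page would have: the sum of each consecutive pair
pair_sums = [sizes[i] + sizes[i + 1] for i in range(0, len(sizes), 2)]
print(pair_sums)
```

The third and fourth sums are the ones brushing up against the 6MB ceiling, which matches the failure first appearing on the 3rd page with a limit of 650.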

philvarner avatar Feb 23 '22 20:02 philvarner

I believe the right approach here is that if the response body is going to be > 6MB, we return a 400 with

{
  "code": "0001",
  "description": "The response body that resulted from this query was too large to be returned by API Gateway. Try a smaller limit."
}
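A minimal sketch of that guard (stac-server itself is Node.js, so this Python is language-agnostic illustration only; `guard_response` and the exact byte threshold are assumptions):

```python
import json

# AWS Lambda's documented cap on the response payload for synchronous invocations
LAMBDA_RESPONSE_LIMIT = 6 * 1024 * 1024  # 6 MB

def guard_response(body_dict):
    """Return (status, body): the proposed 400 error payload when the
    serialized body would exceed the Lambda response limit, otherwise
    the body unchanged with a 200."""
    serialized = json.dumps(body_dict).encode("utf-8")
    if len(serialized) > LAMBDA_RESPONSE_LIMIT:
        return 400, {
            "code": "0001",
            "description": (
                "The response body that resulted from this query was too "
                "large to be returned by API Gateway. Try a smaller limit."
            ),
        }
    return 200, body_dict
```

The check has to happen on the serialized body, since the limit is on the bytes Lambda returns, not on the number of items.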

philvarner avatar Feb 23 '22 20:02 philvarner

#193 should hopefully fix this

marchuffnagle avatar Feb 23 '22 21:02 marchuffnagle

I think it's going to make it better, but it will still fail with a limit of 10000 (and a query that has at least that many results) -- 10k is the upper limit from the OGC API - Features Part 1 spec.

philvarner avatar Feb 23 '22 22:02 philvarner

Bumping this out to 0.5.0. Supporting gzipped responses will help increase the effective limit, but the 6MB response limit is in Lambda itself, so there's no good or easy way to get around it. We should probably document that this can happen and that the workaround is to decrease the limit.
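On the client side, that documented workaround can be automated: catch the failure and retry with a smaller page size. A sketch, assuming `client` is a pystac_client.Client and with the hypothetical helper name `search_with_backoff` (pystac-client raises its `APIError` during item retrieval, which the broad `except` here covers):

```python
def search_with_backoff(client, min_limit=50, **search_kwargs):
    """Retry a STAC search with the page-size limit halved each time the
    server fails (e.g. the 502 described above), until it succeeds or the
    limit drops to min_limit."""
    limit = search_kwargs.pop("limit", 500)
    while True:
        try:
            search = client.search(limit=limit, **search_kwargs)
            # The failure surfaces while pages are actually fetched
            return search.get_all_items_as_dict()
        except Exception:
            if limit <= min_limit:
                raise  # give up: even the smallest page size failed
            limit //= 2
```

This trades extra round trips for robustness; a smarter variant could remember the largest limit that worked for a given collection.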

philvarner avatar Feb 24 '22 20:02 philvarner