usaspending-api icon indicating copy to clipboard operation
usaspending-api copied to clipboard

Download awards endpoint fails when specifying columns

Open nhinze opened this issue 6 years ago • 15 comments

The query fails, when I specify columns to download with the:

/api/v2/download/awards

endpoint. It works fine when I set:

"columns": []

Full JSON options:

{"columns":["obligated_amount","product_or_service_code","product_or_service_code_description","naics_code","naics_description"],"filters":{"keywords":["transportation"," aviation"],"time_period":[{"start_date":"2018-06-30","end_date":"2019-06-30"}]}}

I do not get any error message. Just 500 Internal Server Error.

nhinze avatar Jun 30 '19 16:06 nhinze

Can I ask which tool you're using to run this query or are you using the default GUI that comes up when you visit the URL?

kbard avatar Jul 01 '19 13:07 kbard

The online GUI and my code give me the same error. I use httpparty gem on RubyOnRails to access the API.

nhinze avatar Jul 01 '19 13:07 nhinze

We are currently experiencing an issue with the online GUI. If you're using an external tool, be sure to set the Content-Type header to application/json.

kbard avatar Jul 01 '19 13:07 kbard

I did set it to JSON. The following works just fine:

{"columns":[],"filters":{"keywords":["transportation"," aviation"],"time_period":[{"start_date":"2018-06-30","end_date":"2019-06-30"}]}}

I'm trying to reduce the size of the download.

nhinze avatar Jul 01 '19 13:07 nhinze

Have you tried your original query recently? It seems to be working for me this morning.

kbard avatar Jul 01 '19 13:07 kbard

Still not working. The status is:

running running running running running running running failed

However, I don't get anything in the error message telling me what's wrong. I'm going to try the same query in postman.

nhinze avatar Jul 01 '19 13:07 nhinze

Please (re)post the exact query and URL you are using. I will try to reproduce again.

kbard avatar Jul 01 '19 13:07 kbard

The file should be:

https://files.usaspending.gov/generated_downloads/all_prime_awards_subawards_20190701135515927430.zip

nhinze avatar Jul 01 '19 13:07 nhinze

The options in JSON:

{"columns":["obligated_amount","product_or_service_code","product_or_service_code_description","naics_code","naics_description"],"filters":{"keywords":["transportation","aviation"],"time_period":[{"start_date":"2018-07-01","end_date":"2019-07-01"}]}}

The response of the call to: 'https://api.usaspending.gov/api/v2/download/awards'

{"status":"ready","total_columns":null,"total_rows":null,"seconds_elapsed":null,"url":"https://files.usaspending.gov/generated_downloads/all_prime_awards_subawards_20190701135642182405.zip","total_size":null,"file_name":"all_prime_awards_subawards_20190701135642182405.zip","message":null} https://files.usaspending.gov/generated_downloads/all_prime_awards_subawards_20190701135642182405.zip

The call to the API:

self.class.post('/download/awards', :body => options.to_json, :headers => { 'Content-Type' => 'application/json' })

nhinze avatar Jul 01 '19 13:07 nhinze

When I query https://api.usaspending.gov/api/v2/download/awards with

{
    "columns": [
        "obligated_amount",
        "product_or_service_code",
        "product_or_service_code_description",
        "naics_code",
        "naics_description"
    ],
    "filters": {
        "keywords": [
            "transportation",
            "aviation"
        ],
        "time_period": [
            {
                "start_date": "2018-07-01",
                "end_date": "2019-07-01"
            }
        ]
    }
}

in Postman, I get

{
    "url": "https://files.usaspending.gov/generated_downloads/all_prime_awards_subawards_20190701140105877504.zip",
    "file_name": "all_prime_awards_subawards_20190701140105877504.zip",
    "total_columns": null,
    "message": null,
    "status": "ready",
    "total_rows": null,
    "seconds_elapsed": null,
    "total_size": null
}

Have you had a chance to try it in Postman yet so we can eliminate other issues?

kbard avatar Jul 01 '19 14:07 kbard

It get the same, what does the status give you? If I try to download the file of your query, it is not working:

https://files.usaspending.gov/generated_downloads/all_prime_awards_subawards_20190701140105877504.zip

nhinze avatar Jul 01 '19 14:07 nhinze

Trying the status of you query, I get the more detailed error message:

https://api.usaspending.gov/api/v2/download/status/?file_name=all_prime_awards_subawards_20190701140105877504.zip

{
    "url": "https://files.usaspending.gov/generated_downloads/all_prime_awards_subawards_20190701140105877504.zip",
    "file_name": "all_prime_awards_subawards_20190701140105877504.zip",
    "total_columns": 5,
    "message": "An exception was raised while attempting to process the DownloadJob:\nTraceback (most recent call last):\n  File \"/data-act/backend/usaspending_api/download/filestreaming/csv_generation.py\", line 58, in generate_csvs\n    parse_source(source, columns, download_job, working_dir, piid, zip_file_path, limit)\n  File \"/data-act/backend/usaspending_api/download/filestreaming/csv_generation.py\", line 165, in parse_source\n    temp_file, temp_file_path = generate_temp_query_file(source_query, limit, source, download_job, columns)\n  File \"/data-act/backend/usaspending_api/download/filestreaming/csv_generation.py\", line 290, in generate_temp_query_file\n    csv_query_annotated = apply_annotations_to_sql(generate_raw_quoted_query(source_query), source.columns(columns))\n  File \"/data-act/backend/usaspending_api/download/filestreaming/csv_generation.py\", line 337, in apply_annotations_to_sql\n    raise Exception(\"Length of alises doesn't match the columns in selects\")\nException: Length of alises doesn't match the columns in selects\n",
    "status": "failed",
    "total_rows": 296369,
    "seconds_elapsed": "11.680957",
    "total_size": null
}

nhinze avatar Jul 01 '19 14:07 nhinze

Ok. You are correct. There does, indeed, seem to be an issue with downloads. With the information you provided I was able to nail down exactly where things are going awry. Thank you very much for your help.

I will discuss this issue with the project lead and post back here when I know something.

kbard avatar Jul 01 '19 14:07 kbard

I have created two bug tickets that have been added to the prioritized bug list. I do not have an ETA.

https://federal-spending-transparency.atlassian.net/browse/DEV-2998 https://federal-spending-transparency.atlassian.net/browse/DEV-2999

Thank you again!

kbard avatar Jul 01 '19 19:07 kbard

This should now be resolved

ross-williford avatar Oct 24 '19 23:10 ross-williford