
Slow uploads of data with single records

esurface opened this issue 4 years ago · 1 comment

As discussed in the RTI/Synapse call:

We are using Synapse tables to store and curate data for a multi-site study. Our data lives in a document data store as JSON files. We process each document and flatten it into a tabular structure for upload to Synapse. Most documents have many fields, producing more than 152 columns, so we wrote a Python module that splits the data into sections of 152 columns and uploads each section to Synapse as columns of type STRING with a maximum length of 50 characters.
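The column-splitting step described above could be sketched roughly as follows. This is a minimal illustration, not the actual synapse-span-table code; the function name and the flat-dict record shape are assumptions, and only the 152-column limit comes from the description:

```python
def split_columns(record, max_cols=152):
    """Split a flat record (dict of column -> value) into sections
    of at most max_cols columns each, preserving key order.

    Hypothetical helper illustrating the splitting strategy described
    in the issue; not the actual synapse-span-table implementation.
    """
    items = list(record.items())
    return [dict(items[i:i + max_cols])
            for i in range(0, len(items), max_cols)]
```

Each section would then be uploaded as its own 152-column slice of the table.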

We process documents one at a time as they arrive in the document store database. Even when uploading a single row, we see long delays in the API call (multiple seconds in most cases). With more than 120,000 documents to process, this upload strategy became untenable: the total processing time approached a month.

To reproduce the issue, run `python3 test.py` in our synapse-span-table module.

Can you suggest any improvements to our use of the API that would speed up the process?

We understand that Synapse is mainly used and optimized for uploading batched records, but we have run into issues with that strategy as well (see Issue 867).

esurface avatar Jun 15 '21 00:06 esurface

What is the average/typical size of a data upload?

> We understand that Synapse is mainly used and optimized for uploading batched records, but we have run into issues with that strategy as well (see Issue 867).

This is a bug in your code.

brucehoff avatar Jun 16 '21 04:06 brucehoff

Closing, as this is an older issue; please re-open if it is still relevant.

thomasyu888 avatar Apr 11 '23 07:04 thomasyu888