
BUG: Error uploading big files

Open rs-develop opened this issue 1 year ago • 4 comments

Describe the bug I deployed a production instance of aleph. Specs: Ubuntu Server 22.04, 2 TB disk space, 128 GB RAM, and an 8-core CPU. I installed the latest release: 4.0.2. I am able to upload files of up to 15 GB via the UI and alephclient. Files bigger than 15 GB never finish the upload phase. Files bigger than 50 GB lead to a never-ending upload loop. No error at all.

To Reproduce Steps to reproduce the behavior:

  1. Go to Investigation
  2. Click on Upload Document
  3. Upload a file that is 25 GB in size
  4. See error

Expected behavior Show a reason why the upload failed. Show a hint before uploading, e.g. "file is too big" or something like that. Show a hint about what an admin needs to do to enable the instance to handle such files.

Aleph version 4.0.2

Additional context I checked the documentation but found nothing helpful. Is there any chance to change an environment variable or a setting to enable my aleph instance to handle such big files?

rs-develop avatar Dec 20 '24 09:12 rs-develop

What does your deployment look like? Is there a reverse proxy, e.g. with a timeout, in front of it? This looks more like a "bug" in the specific deployment setup, not necessarily in the app itself.

Another question would be: what kind of file is this? If it is a compressed archive, I'd suggest extracting it first before uploading it anyway.
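
For reference, if there were e.g. nginx in front of the API, these are the kinds of limits that typically cut off large uploads (just a generic illustration, not Aleph-specific configuration):

    # Generic nginx reverse-proxy snippet: not part of Aleph itself, only the
    # kind of limits that commonly break large uploads when a proxy sits in front.
    client_max_body_size     0;      # 0 disables the request-body size check (default is 1m)
    proxy_read_timeout       3600s;  # default is 60s; long transfers time out otherwise
    proxy_send_timeout       3600s;
    proxy_request_buffering  off;    # stream the upload to the backend instead of buffering it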

simonwoerpel avatar Dec 20 '24 14:12 simonwoerpel

No, it is a local-network, offline deployment. There is no reverse proxy, firewall, or other system in between. The files are uncompressed CSVs. Even when the upload succeeds and aleph processes it, no files/data show up once it is finished. If I upload e.g. 500 lines of one of the big CSV files, the data does show up in the UI.

Can I somehow debug this?
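
As a stopgap I am thinking about splitting the big CSVs into smaller parts before pushing them with alephclient. A rough sketch of what I mean (the 500,000-row chunk size is just a guess, not a known Aleph limit):

    # split_csv.py: rough sketch, split a huge CSV into smaller parts so each
    # part stays well below the size where uploads start to fail.
    import csv
    from pathlib import Path

    def write_part(out_dir: Path, stem: str, part: int, header, rows) -> None:
        dest = out_dir / f"{stem}_part{part:04d}.csv"
        with open(dest, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)   # repeat the header in every part
            writer.writerows(rows)

    def split_csv(src: str, out_dir: str, rows_per_chunk: int = 500_000) -> None:
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        stem = Path(src).stem
        with open(src, newline="", encoding="utf-8") as fh:
            reader = csv.reader(fh)
            header = next(reader)
            part, rows = 0, []
            for row in reader:
                rows.append(row)
                if len(rows) >= rows_per_chunk:
                    write_part(out, stem, part, header, rows)
                    part, rows = part + 1, []
            if rows:                  # flush the final, smaller part
                write_part(out, stem, part, header, rows)

    if __name__ == "__main__":
        split_csv("big_export.csv", "big_export_parts")

The parts could then be pushed with something like alephclient crawldir -f <foreign-id> big_export_parts/ (the file names and foreign id here are just placeholders).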

rs-develop avatar Dec 20 '24 14:12 rs-develop

I am having a similar issue and had to flush the queue to get anything to process; it was like things were stuck. I'm wondering if there is a limit somewhere. I also cannot generate entities from large CSV files. I'm experimenting to see if I can get a smaller file of the same format to actually show up and allow entity generation. We know large files can be supported if you look at the OCCRP instance, as they have some pretty large CSV files that work just fine. What we don't know or have is any information on how to tune these instances to process larger files more efficiently. I have upped the instance size many times with no change, and resources are not pegging, so there have to be limits somewhere else in the platform or its components that could be tweaked for performance.

Tired of seeing this screen where it never populates, especially since that's where the value of this platform lies: the entity extraction. The columns never load, and it's not really all that big of a file.

[Screenshot: the entity generation screen with columns that never load]

jigsawsecurity avatar Jan 19 '25 02:01 jigsawsecurity

Greetings. While uploading a big document (2 GB CSV), the ingest stage actually crashes because of the following error:

 {"logger": "pika.channel", "timestamp": "2025-03-16 23:10:54.338531", "message": "Received remote Channel.Close (406): 'PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more' on <Channel number=1 OPEN conn=<SelectConnection OPEN transport=<pika.adapters.utils.io_services_utils._AsyncPlaintextTransport object at 0x7751321c2280> params=<ConnectionParameters host=rabbitmq port=5672 virtual_host=/ ssl=False>>>", "severity": "WARNING"}
ingest-file-1    | Traceback (most recent call last):
ingest-file-1    |   File "/usr/local/bin/ingestors", line 8, in <module>
ingest-file-1    |     sys.exit(cli())
ingest-file-1    |   File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1157, in __call__
ingest-file-1    |     return self.main(*args, **kwargs)
ingest-file-1    |   File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1078, in main
ingest-file-1    |     rv = self.invoke(ctx)
ingest-file-1    |   File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1688, in invoke
ingest-file-1    |     return _process_result(sub_ctx.command.invoke(sub_ctx))
ingest-file-1    |   File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1434, in invoke
ingest-file-1    |     return ctx.invoke(self.callback, **ctx.params)
ingest-file-1    |   File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 783, in invoke
ingest-file-1    |     return __callback(*args, **kwargs)
ingest-file-1    |   File "/ingestors/ingestors/cli.py", line 37, in process
ingest-file-1    |     code = worker.run()
ingest-file-1    |   File "/usr/local/lib/python3.8/dist-packages/servicelayer/taskqueue.py", line 701, in run
ingest-file-1    |     channel.start_consuming()
ingest-file-1    |   File "/usr/local/lib/python3.8/dist-packages/pika/adapters/blocking_connection.py", line 1883, in start_consuming
ingest-file-1    |     self._process_data_events(time_limit=None)
ingest-file-1    |   File "/usr/local/lib/python3.8/dist-packages/pika/adapters/blocking_connection.py", line 2049, in _process_data_events
ingest-file-1    |     raise self._closing_reason  # pylint: disable=E0702
ingest-file-1    | pika.exceptions.ChannelClosedByBroker: (406, 'PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more')

In fact, when I upload the document and check the RabbitMQ queue in the management UI, I can confirm that there is an unacked message in the queue.

With that being said, the error I am getting might be related to the error mentioned in this issue. So there are actually two possible solutions:

  1. Handle this error in the code
  2. Increase the delivery acknowledgement timeout to 60 minutes? (see the config sketch below)
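
For option 2, RabbitMQ's consumer_timeout can be raised in the broker configuration (the 1800000 ms in the log above is the 30-minute default). A minimal sketch, assuming the stock docker-compose deployment and that a custom rabbitmq.conf gets mounted into the rabbitmq container at /etc/rabbitmq/rabbitmq.conf:

    # rabbitmq.conf: raise the delivery acknowledgement timeout from the
    # default 30 minutes to 60 minutes (value is in milliseconds).
    consumer_timeout = 3600000

That said, this only moves the limit; a sufficiently large file could still exceed any fixed timeout, so handling long-running ingest tasks in the code (option 1) looks like the more robust fix.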

filipedfr avatar Mar 16 '25 23:03 filipedfr