Duplicate deletion error
Bug description
I found an issue with duplicate deletion. When I enabled this feature, I started to see errors like this from time to time during report uploads:

```
{"message":"Internal server error, check logs for details"}
```

In the Django logs I found this error:

```
ERROR [dojo.api_v2.exception_handler:32] insert or update on table "dojo_test_import_finding_action" violates foreign key constraint "dojo_test_import_fin_finding_id_28fe8e2d_fk_dojo_find"
DETAIL: Key (finding_id)=(742929) is not present in table "dojo_finding".
django.db.utils.IntegrityError: insert or update on table "dojo_test_import_finding_action" violates foreign key constraint "dojo_test_import_fin_finding_id_28fe8e2d_fk_dojo_find"
```
As I understand it, DD tried to reference a finding that had already been deleted as a duplicate.
@valentijnscholten commented on Slack: "The job to delete duplicates runs every minute, so if you have an import that runs longer than that, there may be a race condition that newly imported findings which are duplicates of existing findings are deleted before the import is complete. I can't remember exactly what the transactional model is in django, but probably not everything runs in a transaction so duplicates become visible to the job that deletes duplicates before the import is completed."
Steps to reproduce
Steps to reproduce the behavior:
- Create a project.
- Enable Deduplicate findings and Delete duplicates features in settings.
- Import a report with a lot of findings (more than 100).
- Import a few reports with duplicates to the project.
- After a few imports you will see this error in the logs:

```
{"message":"Internal server error, check logs for details"}
```
Expected behavior
The report should be imported successfully.
Deployment method (select with an X)
- [ ] Docker Compose
- [X] Kubernetes
- [ ] GoDojo
Environment information
- DefectDojo version 2.9.1
Logs
```
ERROR [dojo.api_v2.exception_handler:32] insert or update on table "dojo_test_import_finding_action" violates foreign key constraint "dojo_test_import_fin_finding_id_28fe8e2d_fk_dojo_find"
DETAIL: Key (finding_id)=(742929) is not present in table "dojo_finding".
django.db.utils.IntegrityError: insert or update on table "dojo_test_import_finding_action" violates foreign key constraint "dojo_test_import_fin_finding_id_28fe8e2d_fk_dojo_find"
```
I would really like to activate duplicate deletion, but I guess it would be safer to fix this issue first. Is there any ongoing investigation into this issue?
We also have the same issue. Any plan to fix it soon? :-(
@Maffooch
Same for us, but on version 2.19.2.
There is a possible solution in the linked draft pull request; it's currently having trouble with the JIRA integration's unit tests, which is holding it back from leaving its draft state. Feel free to take a look - maybe you have an idea for that situation.
Is there any update on this, or a plan for when it will be fixed? I think I have the same issue with the newest version :/
This also affects uploading large trivy.json results via a pipeline if the import takes over 1 minute.
A hotfix of changing the following settings allowed me to upload the 5 MB Trivy scan import:

```
DD_ASYNC_FINDING_IMPORT=True
DD_ASYNC_FINDING_IMPORT_CHUNK_SIZE=100
```
I faced the same problem. The main reason is a race condition. It has already been discussed, and @fhoeborn offered a pull request to fix it, but it still hasn't been merged: #7444
any updates regarding this?
The PR is quite a big change in transactional behaviour, and it might be hard to predict what the impact will be on all the different ways people are using Defect Dojo imports. A quick fix could be to make the 1-minute setting configurable. Currently the duplicate-delete job runs every minute; if you change that to 5 minutes, it will drastically reduce the chance of this race condition occurring for small imports. We could change the default to 5 minutes and/or make it configurable via a System Setting or environment variable.
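A minimal sketch of what that could look like, assuming the duplicate-delete job is scheduled via Celery beat; the environment variable name and the task path below are illustrative, not existing DefectDojo settings:

```python
import os
from datetime import timedelta

# Hypothetical setting: default raised from 1 to 5 minutes, overridable
# via the environment (the variable name is illustrative).
DELETE_DUPLICATES_INTERVAL_MINUTES = int(
    os.getenv("DD_DELETE_DUPLICATES_INTERVAL_MINUTES", "5")
)

CELERY_BEAT_SCHEDULE = {
    "delete-duplicates": {
        "task": "dojo.tasks.async_dupe_delete",  # illustrative task path
        "schedule": timedelta(minutes=DELETE_DUPLICATES_INTERVAL_MINUTES),
    },
}
```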
It would also be an option to create the Test_Import_Finding_Action objects instantly during import instead of after the import process has completed.
@Maffooch Do you think some suppressions like #11549 could help here as well?
The history is created using bulk_create:
https://github.com/DefectDojo/django-DefectDojo/blob/d3b3c261336057f8334c85ce4133540b4f9d4616/dojo/importers/base_importer.py#L360-L391
The `with suppress(...)` construct will not be helpful here, as the error happens inside Postgres while executing the query, so the query as a whole will have failed. There's no way to tell Postgres to ignore the error and continue inserting records for findings that still exist. The `ignore_conflicts` parameter in Django sends an `ON CONFLICT DO NOTHING` clause to Postgres, but this only works for duplicate key violations and similar; it has no effect on foreign key violations like the one we are running into here.
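To illustrate, a sketch using the same `Test_Import_Finding_Action` bulk insert as the linked code (variable names are illustrative):

```python
# Django translates ignore_conflicts=True into ON CONFLICT DO NOTHING on
# Postgres, which only suppresses unique-constraint conflicts.
Test_Import_Finding_Action.objects.bulk_create(
    finding_actions,
    ignore_conflicts=True,
)
# If any row references a finding that the duplicate-delete job has already
# removed, the foreign key check still fails and the whole INSERT raises
# django.db.utils.IntegrityError.
```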
The only "correct" solution is indeed to go with transactions, but as stated before, that impact is (or might be) quite big, so it needs further consideration and testing.
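For context, a minimal sketch of that direction (the helper names are illustrative, not actual importer functions):

```python
from django.db import transaction

# Inside a single transaction, newly created findings are not visible to the
# periodic duplicate-delete job until the import commits, removing the race.
with transaction.atomic():
    new_findings = import_parsed_findings(scan)       # illustrative helper
    create_import_history(test_import, new_findings)  # illustrative helper
```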
For now the safest way would be to create the import history records one by one instead of using bulk_create. That way we can catch errors for each record, but continue with the other records. Alternatively, we create the import history record for each finding immediately after creating (or closing/reopening) the finding itself.
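A sketch of that one-by-one fallback, assuming the `Test_Import_Finding_Action` fields used in the linked code (`test_import`, `finding`, `action`); the error handling details are illustrative:

```python
from django.db import IntegrityError, transaction

for finding, action in finding_actions:
    try:
        # Per-record savepoint so one failed INSERT doesn't poison the rest
        # if this ever runs inside an outer transaction.
        with transaction.atomic():
            Test_Import_Finding_Action.objects.create(
                test_import=test_import,
                finding=finding,
                action=action,
            )
    except IntegrityError:
        # The finding was deleted by the duplicate-delete job; skip it.
        continue
```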
What do you think @Maffooch ?
Great analysis on this one. I think your proposal here would be the safe bet for a baby-step approach. If all goes well there, we could further explore the idea of creating finding actions right after the action on the finding is made.
> For now the safest way would be to create the import history records one by one instead of using bulk_create. That way we can catch errors for each record, but continue with the other records. Alternatively, we create the import history record for each finding immediately after creating (or closing/reopening) the finding itself.
I have created multiple PRs.
Option A: Use try-catch to catch the IntegrityError -> #11739

I tried reproducing the error in a unit test, or even by running a local instance with some extra code to delete a finding after importing it, but Django has some safeguards in place during bulk_create and even during create/save. So the PR just catches the IntegrityError and hopes for the best.

During the PR creation I realized that the code that runs after the import history could also be affected, where tags are added to findings and to endpoints of findings. In theory the race condition could happen there as well, so I added the same try-catch there too. I am not sure this is a sustainable solution, as we would have to keep it in mind for any further processing added at this point in the future. The root cause is that the deduplication for findings is triggered before the history is created, before tags are added, etc. And this PR makes it look like we don't know what we are doing, with all this try-catch happening.
Option B: Inline the creation of history and tag objects -> #11740
I tried to see what the code would look like if we moved the import history creation before the dedupe is triggered via finding.save(). Although it feels like less clean code, because I had to move things around a bit, it at least looks like we know what we're doing. I guess this just removes the premature optimization we had in place, with the history being created in bulk instead of on the fly.
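Roughly, the shape of that change (a conceptual sketch, not the PR's actual diff; it assumes the `dedupe_option` flag on `Finding.save()` and the import-action constants from dojo.models):

```python
# Persist the finding without triggering dedupe, write its history record,
# then save again with dedupe enabled, so the duplicate-delete job can never
# remove the foreign key target before the history row exists.
finding.save(dedupe_option=False)
Test_Import_Finding_Action.objects.create(
    test_import=test_import,
    finding=finding,
    action=IMPORT_CREATED_FINDING,  # action constant for newly created findings
)
finding.save(dedupe_option=True)
```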
Option C: Postpone dedupe and JIRA pushing -> #11741

I wanted to see what the code would look like if we truly did all importing and processing first, before doing deduplication and before pushing to JIRA. This PR has those changes. To be honest, I didn't test this, and it might not even pass the unit tests with the JIRA recording and all. This PR would be the most "fun" option to pursue from a personal/developer perspective, but it might not be the best way to spend my/our time. If we spend time on a bigger refactoring like this, I think it should be introducing transactions and making Dojo work correctly with those.
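Conceptually, Option C looks something like this (an untested sketch matching the description above; the helper names are illustrative, and the `dedupe_option`/`push_to_jira` flags on `Finding.save()` are assumptions about the current save signature):

```python
# Phase 1: create everything first, with dedupe and JIRA pushing disabled.
saved = []
for finding in parsed_findings:
    finding.save(dedupe_option=False, push_to_jira=False)
    saved.append(finding)

create_import_history(test_import, saved)  # illustrative helper
add_tags_and_endpoints(saved)              # illustrative helper

# Phase 2: only now trigger dedupe (and JIRA pushing) per finding.
for finding in saved:
    finding.save(dedupe_option=True, push_to_jira=push_to_jira)
```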
The PRs are all in draft to get some feedback from other Defect Dojo developers, including @Maffooch.
I'm a bit torn between Option A and Option B.
Option A was merged into bugfix and should be out in 2.43.3.
@kokhanevych-macpaw @bgoareguer @sameeraksc @MathRig @fhoeborn @quirinziessler @MRagulin
Thank you for addressing this.