rustic_core icon indicating copy to clipboard operation
rustic_core copied to clipboard

fix(commands): return error if check fails

Open aawsome opened this issue 1 year ago β€’ 11 comments

closes #222

aawsome avatar May 06 '24 19:05 aawsome

Codecov Report

Attention: Patch coverage is 54.49735% with 86 lines in your changes missing coverage. Please review.

Project coverage is 43.4%. Comparing base (1e7b4a8) to head (5b98b3b). Report is 82 commits behind head on main.

Files with missing lines Patch % Lines
crates/core/src/commands/check.rs 54.9% 82 Missing :warning:
crates/core/src/repository.rs 42.8% 4 Missing :warning:
Additional details and impacted files
Files with missing lines Coverage Ξ”
crates/core/src/error.rs 14.2% <ΓΈ> (ΓΈ)
crates/core/src/repository.rs 34.7% <42.8%> (+0.2%) :arrow_up:
crates/core/src/commands/check.rs 55.8% <54.9%> (-7.8%) :arrow_down:

... and 21 files with indirect coverage changes

codecov[bot] avatar May 06 '24 19:05 codecov[bot]

I'll look through it tomorrow. πŸ‘πŸ½

simonsan avatar May 28 '24 22:05 simonsan

After looking through it, I understand better now, what I found confusing about it, back in May. It's really because it introduces a special way to handle errors only for check (via self), although we want to have a unified way of handling errors. And apply that throughout rustic_core, which makes it easier to maintain, than having a special way to handle check errors.

The CheckError in errors.rs is really nice though, and this is probably the approach, we should take all over the code base. Then we might construct them in-place with all the information we need and create a well-formed error message towards the user from it.

For collecting the errors in a method/function we could use lazy_errors or change the signature to something like either:

  • Result<T, Vec<RusticError>>
  • struct ResultWithErrors<T> {
        value: Option<T>,
        soft_errors: Vec<RusticError>,
        hard_errors: Vec<RusticError>,
    }
    
  • or return a tuple of (T, Vec<RusticError>) from a function, in case we want to handle multiple errors
  • we could also go for making RusticError a trait and implement common behaviour on structs for each possible error struct

But this topic is definitely something we should talk about to find a good approach.

simonsan avatar Oct 05 '24 20:10 simonsan

In principle I like the ides of lazy_errors - I however think that we want to collect not only errors, but also warnings - or, more general, a kind of ErrorLevel together with the error. Also, we have two cases:

  • situations where we want to collect errors, but continue, i.e. we want to return a result and a error list
  • situation where we want to abort on errors, i.e. we want to return only the error list and indicate that things went wrong...

aawsome avatar Oct 06 '24 06:10 aawsome

I sketched out an idea, to fix the pain points you mentioned in a more generic way, than introducing special error handling/collection for check only. Could you take a look at https://github.com/rustic-rs/rustic_core/pull/318 ?

simonsan avatar Oct 06 '24 17:10 simonsan

I think the main question here is: Should we merge this PR with maybe just small modifications (quick&dirty) to fix the check issues and then overwork the error handling once we have a general error handling framework? Or should we wait for the framework? I have a tendency towards a quick&dirty merge here, actually...

aawsome avatar Oct 06 '24 21:10 aawsome

I think the main question here is: Should we merge this PR with maybe just small modifications (quick&dirty) to fix the check issues and then overwork the error handling once we have a general error handling framework? Or should we wait for the framework? I have a tendency towards a quick&dirty merge here, actually...

I would wait, I don't think (no offence really, we are doing a lot here) this should be merged as it is. We are introducing new errors here (CheckError), etc. We should just focus on this error handling topic, after the next rustic release and have it fixed long-term. This would just move it further away, while we feel we 'fixed' something, but we didn't really. We just moved a well-formed fix further away.

Also, the design implies, we implement checking on some ResultCollector, which I find counterintuitive due to the refactor in #317. That should probably still be freestanding functions, so they are decoupled and easier to test.

simonsan avatar Oct 06 '24 21:10 simonsan

Let's rather do it right one time with a bit more effort now, than doing it more than once, just to be quick and dirty. πŸ˜…

simonsan avatar Oct 06 '24 21:10 simonsan

  • situations where we want to collect errors, but continue, i.e. we want to return a result and a error list

For this situation, can you give an example, because this is worth to explore why it is like that. There could be many reasons for it, e.g. architectural problems, prolonged error handling, instead of doing it locally, etc.

When it comes to collecting errors in a thread, we can go for a channel and handle it in the main thread.

simonsan avatar Oct 07 '24 01:10 simonsan

  • situations where we want to collect errors, but continue, i.e. we want to return a result and a error list

For this situation, can you give an example, because this is worth to explore why it is like that. There could be many reasons for it, e.g. architectural problems, prolonged error handling, instead of doing it locally, etc.

An example is Repository::get_snapshots. This loads all Snapshotfiles from the repository and decrypts them into a `Vec<Snapshotfile>'.

There are errors like not being able to list snapshots which will cause this command to fail. But each individual snapshot may fail either to load from backend or to decrypt (e.g. data corruption). Currently, this functions fails if any snapshot fails to load+decrypt. This means if we have a corrupt snapshot file, the repository can be basically no longer used until this is fixed.

We however may want to introduce resiliency: It is e.g. very nasty if the whole repository is not accessible just because that old and now unimportant snapshotfile has an issue. If we introduce this, then the command would return the successful loaded snapshots and a list of failed snapshots with the error(s) - and we would at least need to indicate that something went wrong by not sending a 0 return code in some cases...

aawsome avatar Oct 11 '24 14:10 aawsome

I would imagine it like this somehow (draft):

2024-10-14 22_00_49-snapshotfile rs (Working Tree) (snapshotfile rs) - rustic (Workspace) - Rust Dev

So we have a Vec<RusticError> that we return from SnapshotFile::fill_missing:

2024-10-14 22_04_30-snapshotfile rs (Working Tree) (snapshotfile rs) - rustic (Workspace) - Rust Dev

, and we reserve the Err(E) of RusticResult for hard errors. But I think this is not nice. 😬

But in the end, it's the question of why we want to return a list of the non-working snapshots and not just log an error there. What value has the information for the caller, that there are broken snapshots it cannot be operated on? We just don't want it to fail, right? If we return that Vec<RusticError> to the caller, what else will it do, than just logging it/printing to the user? If we would want to handle the broken snapshots as well, we might rather return something like: RusticResult<Vec<RusticResult<SnapshotFile>>> with outer Result being the operation itself that is fallible and indicates something went horribly wrong. And the inner Results being the Snapshots itself, that the caller wants to handle the Ok(T) values of and do something with the errors.

simonsan avatar Oct 14 '24 20:10 simonsan