Optimize look-write-rename for writing blocks

Open • sourcefrog opened this issue 3 years ago • 1 comment

Yep, it does currently:

  1. See if the block is already present, in which case we don't need to do the work to compress it.
  2. Write to a temporary file, so that if the process is interrupted we won't be left with an incomplete file under the final name. (This is not 100% guaranteed by the filesystem, but it's the usual pattern.)
  3. Rename into place.
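
The three steps above can be sketched roughly as follows. This is a minimal illustration using only the Rust standard library, not the actual Conserve code: the function name `write_block_if_absent`, the flat directory layout, and the temporary-file naming are all assumptions, and compression is left out.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper (not Conserve's API): store `data` as `dir/<hash>`
/// unless a file with that name already exists.
/// Returns Ok(true) if the block was written, Ok(false) if it was already present.
fn write_block_if_absent(dir: &Path, hash: &str, data: &[u8]) -> io::Result<bool> {
    let final_path = dir.join(hash);

    // Step 1: look. If the block is already present there is nothing to do.
    if final_path.exists() {
        return Ok(false);
    }

    // Step 2: write to a temporary name in the same directory, so an
    // interrupted process does not leave a partial file under the final name.
    // (Assumed naming scheme; a real implementation would compress `data` first.)
    let tmp_path = dir.join(format!("tmp-{}-{}", hash, std::process::id()));
    let mut tmp = fs::File::create(&tmp_path)?;
    tmp.write_all(data)?;
    drop(tmp); // close the file before renaming it

    // Step 3: rename into place.
    fs::rename(&tmp_path, &final_path)?;
    Ok(true)
}
```

Each step here is a separate filesystem operation performed in sequence, which is where the latency cost discussed in this issue comes from.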

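This is pretty reasonable (although perhaps not optimal) locally, but it performs poorly on a very high-latency filesystem, because every block pays for several sequential round trips: the existence check, the write, and the rename.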

A few options:

  1. Just issue more parallel IO.
  2. Remember which blocks are referenced by the basis index: we can already assume they're present and should not need to check. (The most common case, of an unchanged file, does not check, but there might be other edge cases. These should be pretty rare.)
  3. Similarly, remember which blocks we've already seen to be present. (#106)
  4. If we have a Transport API for the remote filesystem, then in some cases that may already support a reliable atomic write that cannot leave the file half-written. For example this should be possible on S3. Then we don't need the rename.
  5. Even on Unix or Windows maybe a faster atomic write is possible?
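
Options 2 and 3 amount to keeping an in-memory set of block hashes known to be present, and only asking the filesystem when the set has no answer. A rough sketch, assuming blocks are identified by hash strings; the type `PresentBlocks` and its methods are hypothetical names for illustration, not Conserve's API:

```rust
use std::collections::HashSet;

/// Hypothetical cache of block hashes known to exist in the archive.
struct PresentBlocks {
    known: HashSet<String>,
}

impl PresentBlocks {
    /// Seed the cache with hashes referenced by the basis index (option 2).
    fn from_basis<I: IntoIterator<Item = String>>(basis_hashes: I) -> Self {
        PresentBlocks {
            known: basis_hashes.into_iter().collect(),
        }
    }

    /// Return whether the block is present. `probe` stands in for the
    /// (possibly slow) filesystem check and is only called on a cache miss.
    fn is_present<F: FnMut(&str) -> bool>(&mut self, hash: &str, mut probe: F) -> bool {
        if self.known.contains(hash) {
            return true;
        }
        let found = probe(hash);
        if found {
            // Remember blocks we have seen to be present (option 3).
            self.known.insert(hash.to_string());
        }
        found
    }

    /// Record a block that this process has just written.
    fn mark_written(&mut self, hash: &str) {
        self.known.insert(hash.to_string());
    }
}
```

The cache only remembers positive results, so a block that another writer adds later is still found by the probe. It does assume that blocks are not deleted while the backup runs.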

Originally posted by @sourcefrog in https://github.com/sourcefrog/conserve/issues/177#issuecomment-1214227442

sourcefrog avatar Aug 13 '22 23:08 sourcefrog

The rename happens inside the Transport, in https://github.com/sourcefrog/conserve/blob/57df38bc94e9c47102dbf8a3e0880e6257f72a3d/src/transport/local.rs#L89-L105

sourcefrog avatar Aug 15 '22 16:08 sourcefrog