kopia icon indicating copy to clipboard operation
kopia copied to clipboard

[bug] sftp sync-to resync everything

Open celogeek opened this issue 6 months ago • 1 comments

Hi, I'm trying to sync a repo from my Mac to an sftp remote repository. I'm facing a resync issue, specifically with --times or a previous rsync command.

Here my command:

go run . repo sync-to sftp --host=nas --external --username celogeek --path /media/Backup/test --times

I've decided to see the problem with this patch:

diff --git a/cli/command_repository_sync.go b/cli/command_repository_sync.go
index f0c2690f..b23fd7fd 100644
--- a/cli/command_repository_sync.go
+++ b/cli/command_repository_sync.go
@@ -124,6 +124,8 @@ func (c *commandRepositorySyncTo) runSyncWithStorage(ctx context.Context, src bl
                dstmd, exists := dstMetadata[srcmd.BlobID]
                delete(dstMetadata, srcmd.BlobID)

+               fmt.Printf("Comparing %90s: %s vs %s\n", srcmd.BlobID, srcmd.Timestamp, dstmd.Timestamp)
+
                switch {
                case !exists:
                        blobsToCopy = append(blobsToCopy, srcmd)

Here is what I can see on the output: (it resyncs everything)

Comparing                                                                           kopia.repository: 2025-10-27 16:59:05.295416849 +0100 CET vs 2025-10-27 16:59:05 +0100 CET
Comparing                               xn0_95e843fcdcfd094cb7f07469997d708f-s3b5d7cab784d461313a-c1: 2025-10-27 16:59:51.623798237 +0100 CET vs 2025-10-27 16:59:51 +0100 CET
Comparing                               xn0_b5373a98680cefec67e967a6a3211c51-se2536e3f980029fb13a-c1: 2025-10-27 17:11:24.146733598 +0100 CET vs 2025-10-27 17:11:24 +0100 CET
Comparing                               xn0_c79c1f02e660bad5cddf313b5c2739ef-sf040dae1a990872d13a-c1: 2025-10-27 16:59:05.631298919 +0100 CET vs 2025-10-27 16:59:05 +0100 CET
Comparing                                     q9a94f00d0ed5c7e99fc57dd9daa83f94-sf040dae1a990872d13a: 2025-10-27 16:59:05.628454356 +0100 CET vs 2025-10-27 16:59:05 +0100 CET
Comparing                                     q85a918c6721dc89b2a2d8c1a4052b1ae-se2536e3f980029fb13a: 2025-10-27 17:11:24.14611181 +0100 CET vs 2025-10-27 17:11:24 +0100 CET
Comparing                                     q6683e8f8e6569d16e5ef9abcd220332b-s3b5d7cab784d461313a: 2025-10-27 16:59:51.620867467 +0100 CET vs 2025-10-27 16:59:51 +0100 CET
Comparing                                     p668181f56005204a54c362df5cfbe378-s3b5d7cab784d461313a: 2025-10-27 16:59:51.623056576 +0100 CET vs 2025-10-27 16:59:51 +0100 CET
Comparing          _log_20251027165905_6979_1761580745_1761580745_1_42c2d08295518ed0896acc43de0884b8: 2025-10-27 16:59:05.638993405 +0100 CET vs 2025-10-27 16:59:05 +0100 CET
Comparing          _log_20251027165951_828a_1761580791_1761580791_1_378cfda2c28b4c65fdf334d5856e806d: 2025-10-27 16:59:51.641975939 +0100 CET vs 2025-10-27 16:59:51 +0100 CET
Comparing          _log_20251027171124_292d_1761581484_1761581484_1_3384150ab12268db2d087ff5c070ab6a: 2025-10-27 17:11:24.150307863 +0100 CET vs 2025-10-27 17:11:24 +0100 CET

So the main issue, is that the sftp module only support uint32 modtime, meaning that we don't have access to the sub seconds like on the filesystem. So in order to fix the issue, or we need to find a way to support high resolution timestamp on the sftp module, or we need to compare the same thing, so adding something like:

diff --git a/cli/command_repository_sync.go b/cli/command_repository_sync.go
index f0c2690f..9e695aad 100644
--- a/cli/command_repository_sync.go
+++ b/cli/command_repository_sync.go
@@ -128,7 +128,7 @@ func (c *commandRepositorySyncTo) runSyncWithStorage(ctx context.Context, src bl
                case !exists:
                        blobsToCopy = append(blobsToCopy, srcmd)
                        totalCopyBytes += srcmd.Length
-               case srcmd.Timestamp.After(dstmd.Timestamp) && c.repositorySyncUpdate:
+               case srcmd.Timestamp.Truncate(time.Second).After(dstmd.Timestamp) && c.repositorySyncUpdate:
                        blobsToCopy = append(blobsToCopy, srcmd)
                        totalCopyBytes += srcmd.Length

I'll try to see if I can fix the issue using a sftp mount instead.

Wraping up

So the "--times" lead to blog having the same exact timestamp, but the sftp module only return the modtime truncate to the second. It leads into a resync all in that case. The rsync command have the same issue.

celogeek avatar Oct 27 '25 16:10 celogeek

I ran into the same issue but it doesn't involve SFTP in particular. In my case, I am syncing my repository to a NAS that has a ecryptfs filesystem. Thanks to your troubleshooting, I found the issue was in the file timestamps. I found out (after a lot of testing) that due to bug #1890486 in ecryptfs, timestamps are truncated to one second precision when set.

How to test

SSH into the NAS and...

# Create one test file, wait a moment and create the another one.
my-user@my-nas:/testdir $ touch testfile1.txt
my-user@my-nas:/testdir $ touch testfile2.txt

# Observe what the timestamps are.
my-user@my-nas:/testdir $ stat *
File: testfile1.txt
...
Access: 2025-12-12 21:23:19.278881733 +0200
Modify: 2025-12-12 21:23:19.278881733 +0200
Change: 2025-12-12 21:23:19.278881733 +0200
File: testfile2.txt
...
Access: 2025-12-12 21:23:25.068882228 +0200
Modify: 2025-12-12 21:23:25.068882228 +0200
Change: 2025-12-12 21:23:25.068882228 +0200

# Set the timestamps of file 2 to the timestamps of file 1.
my-user@my-nas:/testdir $ touch -r testfile1.txt testfile2.txt

# Observe that precision beyond seconds was lost.
my-user@my-nas:/testdir $ stat *
File: testfile1.txt
...
Access: 2025-12-12 21:23:19.278881733 +0200
Modify: 2025-12-12 21:23:19.278881733 +0200
Change: 2025-12-12 21:23:19.278881733 +0200
File: testfile2.txt
...
Access: 2025-12-12 21:23:19.000000000 +0200
Modify: 2025-12-12 21:23:19.000000000 +0200
Change: 2025-12-12 21:23:32.974882903 +0200
my-user@my-nas:/testdir $ 

This makes it impossible for me to use Kopia's sync-to command at the moment without always re-syncing everything, regardless of what file transfer protocol I use, since the timestamp is always some milliseconds behind what it's supposed to be.

Not every filesystem even has the same timestamp precision, so syncing repositories between such filesystems will result in this same issue. I think it's necessary to add a threshold to the timestamp check, like you suggested. However, wouldn't it make sense to also truncate the destination timestamp in case the destination filesystem has a more precise timestamp than the source? I might be nitpicking, but I'll mention that truncating to the precision of one second isn't enough to fix this issue for some filesystems like FAT (which uses 2 second precision)...

As an additional resource, this rclone forum post talks about the issue I was facing. It seems rsync has a flag called --modify-window that allows the user to set the timestamp tolerance. Perhaps something similar to that would be a good idea for Kopia too.

Anyway, I'll just build your patch now so I can stop needing to re-sync everything all the time. Thanks!

UserTeemu avatar Dec 12 '25 19:12 UserTeemu