B2_Command_Line_Tool

Cannot seem to figure out --excludeRegex

Open PeterNerlich opened this issue 8 years ago • 20 comments

I cannot seem to figure out how --excludeRegex works. I read all the examples I could find, but they weren't very helpful.

Variants tried (with b2 sync --excludeRegex <regex> /home/USER/Documents b2://...):

'^/home/USER/Documents/programming/Asterisk/test/'
'^/home/USER/Documents/programming/Asterisk/test/.*'
'(^/home/USER/Documents/programming/Asterisk/test/.*)'
'(^\/home\/USER\/Documents\/programming\/Asterisk\/test\/.*)'
'(\/home\/USER\/Documents\/programming\/Asterisk\/test\/.*)'
'(\/programming\/Asterisk\/test\/.*)'
'(programming\/Asterisk\/test\/.*)'

I might have tried more, but I can't remember anymore. I also noticed that --excludeRegex has to come before the sync source and target for the command to do anything at all, which seems very arbitrary and frustrating. After fixing that, I still see the sync messages, though:

upload programming/Asterisk/test/asterisk/addons/ooh323c/src/ootrace.h     
upload programming/Asterisk/test/asterisk/addons/ooh323c/src/perutil.c
upload programming/Asterisk/test/asterisk/addons/ooh323c/src/rtctype.h
upload programming/Asterisk/test/asterisk/addons/ooh323cDriver.h    
upload programming/Asterisk/test/asterisk/addons/ooh323c/src/rtctype.c
...

What am I overlooking?

PeterNerlich, Dec 27 '17 00:12

OS:

$ uname -a
Linux peter-elementaryOS-tux 4.9.18-040918-generic #201703260832 SMP Sun Mar 26 12:34:37 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
Distributor ID:	elementary
Description:	elementary OS 0.4.1 Loki
Release:	0.4.1
Codename:	loki

B2 version: (installed via pip install b2)

$ b2 version
b2 command line tool, version 1.1.0

PeterNerlich, Dec 27 '17 00:12

The regex needs to match the relative path names from the source of the sync, which are the path names it prints while syncing.

You are syncing from /home/USER/Documents. So instead of '^/home/USER/Documents/programming/Asterisk/test/.*', try the regex '^programming/Asterisk/test/.*'.
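A minimal Python sketch of that rule, assuming the sync compares each pattern against the path relative to the sync source (the same names printed in the "upload ..." lines) using Python's re.match, which only matches at the start of the string. The helper is_excluded is hypothetical and not part of the b2 code.

```python
import posixpath
import re

def is_excluded(relative_name, exclude_patterns):
    """Hypothetical helper: True if any pattern matches at the start of relative_name."""
    return any(re.match(pattern, relative_name) for pattern in exclude_patterns)

source_root = "/home/USER/Documents"
local_file = "/home/USER/Documents/programming/Asterisk/test/asterisk/addons/ooh323c/src/ootrace.h"

# The name the sync compares against is relative to the source folder,
# i.e. the same name that appears in the "upload ..." lines.
relative_name = posixpath.relpath(local_file, source_root)
print(relative_name)  # programming/Asterisk/test/asterisk/addons/ooh323c/src/ootrace.h

# An absolute-path pattern can never match the relative name...
print(is_excluded(relative_name, [r"^/home/USER/Documents/programming/Asterisk/test/"]))  # False
# ...whereas a pattern written relative to the source folder does.
print(is_excluded(relative_name, [r"^programming/Asterisk/test/.*"]))  # True
```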

bwbeach, Dec 27 '17 13:12

Thanks for the suggestion, but it doesn't work either.

$ b2 sync --excludeRegex '^programming/Asterisk/test/.*' /home/USER/Documents b2://...
...
WARNING: /home/USER/Documents/programming/Asterisk/test/root could not be accessed (no permissions to read?)
WARNING: /home/USER/Documents/programming/Asterisk/test/dev/ram16 could not be accessed (no permissions to read?)
WARNING: /home/USER/Documents/programming/Asterisk/test/dev/ram6 could not be accessed (no permissions to read?)
...

...and the rest of the endless annoyance I'm trying to avoid (that directory was a chroot, or something like it, for building Asterisk).


Also, I notice some weird

....
WARNING: /home/peter/Documents/programming/Asterisk/test/usr/bin/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/x86_64-linux-gnu-gcc-ar could not be accessed (broken symlink?)
WARNING: /home/peter/Documents/programming/Asterisk/test/usr/bin/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/editor could not be accessed (broken symlink?)
WARNING: /home/peter/Documents/programming/Asterisk/test/usr/bin/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/X11/python2 could not be accessed (broken symlink?)
...

logs, as if it wouldn't be able to handle symlinks (especially recursive ones) properly. Is that expected?

PeterNerlich, Dec 27 '17 15:12

Are you sure that it does indeed upload the excluded files? Those warnings can still occur even if you excluded the directory, because all files are indexed first and only then are the files matching the regex excluded. This is a known issue (see #364).
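To illustrate why the warnings can appear for excluded paths, here is a minimal Python sketch of a list-then-filter pipeline: every file under the source is walked and stat'ed first, and the exclusion regexes are applied only to names the walk has already produced. The function names are hypothetical and this is not the actual b2 implementation.

```python
import os
import re
import tempfile

def walk_all_files(root):
    """Yield the relative name of every file under root, stat-ing each one as it goes."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            os.stat(full_path)  # a file that vanished after listing makes this raise
            yield os.path.relpath(full_path, root).replace(os.sep, "/")

def filter_names(names, exclude_patterns):
    """Drop names matching an exclude regex; only sees names the walk already produced."""
    compiled = [re.compile(p) for p in exclude_patterns]
    for name in names:
        if not any(c.match(name) for c in compiled):
            yield name

def recording(names, log):
    """Pass names through unchanged while logging every name the walk produced."""
    for name in names:
        log.append(name)
        yield name

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "programming", "Asterisk", "test"))
    for rel in ("upload.txt", "programming/Asterisk/test/skip.txt"):
        open(os.path.join(root, *rel.split("/")), "w").close()

    visited = []
    kept = list(filter_names(recording(walk_all_files(root), visited),
                             [r"^programming/Asterisk/test/.*"]))

print("visited:", visited)  # both files, including the excluded one
print("kept:", kept)        # only upload.txt
```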

As for the symlinks, the b2 CLI currently does follow symlinks which is problematic for recursive symlinks. In issue #390 an option to change symlink behavior was requested.
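A minimal Python sketch of how a walker that follows directory symlinks can avoid looping on something like usr/bin/X11 pointing back at usr/bin: remember each directory's (device, inode) pair and refuse to descend into one already seen. This is an illustration under that assumption, not what b2 1.1.0 does; the function name is hypothetical.

```python
import os
import tempfile

def walk_following_links(root):
    """Yield relative file names, following directory symlinks but skipping loops."""
    seen = set()

    def visit(directory, prefix):
        st = os.stat(directory)  # os.stat follows symlinks
        key = (st.st_dev, st.st_ino)
        if key in seen:
            return  # already visited this directory through another path
        seen.add(key)
        for entry in sorted(os.listdir(directory)):
            full_path = os.path.join(directory, entry)
            rel = prefix + entry
            if os.path.isdir(full_path):  # also True for symlinks to directories
                yield from visit(full_path, rel + "/")
            else:
                yield rel

    yield from visit(root, "")

with tempfile.TemporaryDirectory() as root:
    bin_dir = os.path.join(root, "usr", "bin")
    os.makedirs(bin_dir)
    open(os.path.join(bin_dir, "editor"), "w").close()
    # Recreate the loop from the log: usr/bin/X11 -> . (i.e. back to usr/bin)
    os.symlink(".", os.path.join(bin_dir, "X11"))
    names = list(walk_following_links(root))

print(names)  # ['usr/bin/editor'], instead of X11/X11/X11/... paths
```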

svonohr, Dec 27 '17 16:12

True: unless the web interface is lagging behind, it seems to have worked this time. So I guess the \/ was taken literally and didn't match? Otherwise my variants were pretty much the same, right?

PeterNerlich, Dec 27 '17 16:12
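On the \/ question, assuming the patterns go through Python's re module (b2 is a Python tool; the tracebacks later in this thread show it running on Python 2.7): escaping a punctuation character such as / just yields that character, so \/ and / behave identically. A quick check with one of the variants from the first post:

```python
import re

name = "programming/Asterisk/test/asterisk/addons/ooh323cDriver.h"

escaped = re.compile(r"(programming\/Asterisk\/test\/.*)")
plain = re.compile(r"(programming/Asterisk/test/.*)")

# "\/" is a redundant escape: it matches a literal "/", exactly like "/".
print(bool(escaped.match(name)), bool(plain.match(name)))          # True True
print(escaped.match(name).group(1) == plain.match(name).group(1))  # True
```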

It's doing it again, and I have no idea why. My script prints the command before executing it, so I shouldn't have misread anything. (My theory for why it seemed to work before: a delay in the server-side file cache, so the uploaded files either didn't show up in the web interface yet or still existed in the API cache and therefore didn't need to be uploaded.)

b2 sync --excludeRegex '^programming/Asterisk/test/.*' --skipNewer /home/<USER>/Documents b2://<...>

[Screenshot from 2017-12-28 14-15-38]

PeterNerlich, Dec 28 '17 13:12

I'm puzzled. This test worked for me, with the excludeRegex copy/pasted from your post:

$ mkdir -p programming/Asterisk/test
$ touch upload.txt
$ touch programming/Asterisk/test/skip.txt
$ pwd
/Users/brianb/sandbox/B2_Command_Line_Tool/tmp
$ b2 sync --excludeRegex '^programming/Asterisk/test/.*' /Users/brianb/sandbox/B2_Command_Line_Tool/tmp b2://bwb-ca001/sync-alfa
upload upload.txt                                                  
$ find . -type f                    
./programming/Asterisk/test/skip.txt
./upload.txt
$ b2 ls bwb-ca001 sync-alfa
sync-alfa/upload.txt

Is there any chance that any of the characters in the path are different Unicode characters than what they look like?
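One way to check for look-alike characters is to list the Unicode name of every non-ASCII character in the pattern or path. A minimal Python sketch using the standard unicodedata module; the sample string containing a Cyrillic letter is made up for illustration.

```python
import unicodedata

def non_ascii_chars(text):
    """Return (index, char, Unicode name) for every non-ASCII character in text."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]

# Hypothetical example: the "A" in "Asterisk" is CYRILLIC CAPITAL LETTER A (U+0410).
suspicious = "^programming/\u0410sterisk/test/.*"
for index, char, name in non_ascii_chars(suspicious):
    print(index, repr(char), name)  # 13 'А' CYRILLIC CAPITAL LETTER A
```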

bwbeach, Dec 28 '17 20:12

Having a similar issue:

b2 sync --threads 3 --excludeRegex '^/var/www/app/storage/.*' /var/www/ b2://something/
Using https://api.backblazeb2.com
WARNING: /var/www/app/storage/sessions/610b13c8eb318f756a3abd3958fec347d6137b92 could not be accessed (broken symlink?)
WARNING: /var/www/app/storage/sessions/811134bfebd8a6e6ea9b128101021797a82adc69 could not be accessed (broken symlink?)
WARNING: /var/www/app/storage/sessions/e59fc3738c54829b6340442ee36fe1f270d4c834 could not be accessed (broken symlink?)
WARNING: /var/www/app/storage/sessions/16c05d9f9889fd52ce92b3ea7e91847f4bd3f056 could not be accessed (broken symlink?)
WARNING: /var/www/app/storage/sessions/d9db04a2a690fb50a204e2514b587b54c34e9cb6 could not be accessed (broken symlink?)
WARNING: /var/www/app/storage/sessions/80b4fa63350d7bc6d26771250f1b5cf2e41a2a8c could not be accessed (broken symlink?)
WARNING: /var/www/app/storage/sessions/d738b8185c0f8a5cb6c976b4a4b272566df2a8ef could not be accessed (broken symlink?)
WARNING: /var/www/app/storage/sessions/bfbdd58993437e23a678181d900dbbc583eafca8 could not be accessed (broken symlink?)
WARNING: /var/www/app/storage/sessions/f140184124657bc7c70bf27f0bc117bad07a6e5a could not be accessed (broken symlink?)
WARNING: /var/www/app/storage/sessions/9b880cc3e0cc1e05863dc203d062bedc6a4b0e60 could not be accessed (broken symlink?

The documentation says the regex is tested against the full path, but I also tried:

  • '^app/storage/.*'
  • 'app/storage/.*'
  • '[a-zA-Z0-9]{40}'

Nothing seems to work.
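If b2 applies each pattern with Python's re.match (an assumption about its internals, not confirmed in this thread), a pattern only matches at the start of the name relative to /var/www/. That would explain why '[a-zA-Z0-9]{40}' on its own never matches: the name starts with app/storage/sessions/, not with the hash. A sketch contrasting re.match and re.search on a session file name taken from the log above:

```python
import re

# Relative name of one session file, as seen from the sync source /var/www/
name = "app/storage/sessions/610b13c8eb318f756a3abd3958fec347d6137b92"

hash_only = r"[a-zA-Z0-9]{40}"
print(re.match(hash_only, name))               # None: match() only tries position 0
print(re.search(hash_only, name) is not None)  # True: search() scans the whole string

# A pattern written from the start of the relative name works with match():
print(re.match(r"app/storage/sessions/", name) is not None)     # True
# A leading ".*" lets an unanchored idea work with match() too:
print(re.match(r".*[a-zA-Z0-9]{40}$", name) is not None)       # True
```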

This is particularly bothersome because it stops the sync: the sessions folder is filled with thousands of files that keep being erased and recreated, so sometimes it reports that a file doesn't exist and just breaks.

shishanyu, Feb 18 '18 10:02

[...] could not be accessed (broken symlink?)

I think that's because it first builds the whole file tree and only then filters the results, so it tries to follow the symlinks before it even considers excluding any path or file.

In other words, it's probably harmless™

PeterNerlich, Feb 18 '18 10:02

I think removing the single quotes did it. Just ran:

app/storage/sessions/.*

shishanyu, Feb 18 '18 10:02

Ran:

b2 sync --threads 3 --excludeRegex app/storage/sessions/.* /var/www/ b2://something/

And it's running without issue and omitting the folder I wanted it to omit. This should probably be added to the help page to avoid confusion: an example of how to format the regex would be nice, or a pointer to where to check the documentation (e.g. Python's).

Edit: maybe the code should also strip unescaped single or double quotes? It doesn't seem to have handled them well; it gave me:

ERROR:b2.console_tool:ConsoleTool unexpected exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/b2/console_tool.py", line 1074, in run_command
    return command.run(args)
  File "/usr/lib/python2.7/site-packages/b2/console_tool.py", line 857, in run
    allow_empty_source=allow_empty_source
  File "/usr/lib/python2.7/site-packages/logfury/v0_1/trace_call.py", line 84, in wrapper
    return function(*wrapee_args, **wrapee_kwargs)
  File "/usr/lib/python2.7/site-packages/b2/sync/sync.py", line 261, in sync_folders
    source_folder, dest_folder, args, now_millis, reporter
  File "/usr/lib/python2.7/site-packages/b2/sync/sync.py", line 135, in make_folder_sync_actions
    dest_file) in zip_folders(source_folder, dest_folder, reporter, exclusions, inclusions):
  File "/usr/lib/python2.7/site-packages/b2/sync/sync.py", line 92, in zip_folders
    current_a = next_or_none(iter_a)
  File "/usr/lib/python2.7/site-packages/b2/sync/sync.py", line 36, in next_or_none
    return six.advance_iterator(iterator)
  File "/usr/lib/python2.7/site-packages/b2/sync/sync.py", line 48, in _filter_folder
    for f in folder.all_files(reporter):
  File "/usr/lib/python2.7/site-packages/b2/sync/folder.py", line 89, in all_files
    for file_object in self._walk_relative_paths(self.root, '', reporter):
  File "/usr/lib/python2.7/site-packages/b2/sync/folder.py", line 175, in _walk_relative_paths
    for subdir_file in self._walk_relative_paths(local_path, b2_path, reporter):
  File "/usr/lib/python2.7/site-packages/b2/sync/folder.py", line 175, in _walk_relative_paths
    for subdir_file in self._walk_relative_paths(local_path, b2_path, reporter):
  File "/usr/lib/python2.7/site-packages/b2/sync/folder.py", line 175, in _walk_relative_paths
    for subdir_file in self._walk_relative_paths(local_path, b2_path, reporter):
  File "/usr/lib/python2.7/site-packages/b2/sync/folder.py", line 178, in _walk_relative_paths
    file_mod_time = int(round(os.path.getmtime(local_path) * 1000))
  File "/usr/lib64/python2.7/genericpath.py", line 54, in getmtime
    return os.stat(filename).st_mtime
OSError: [Errno 2] No such file or directory: '/var/www/app/storage/sessions/165d7f488e047948b37f72c3b5eae3b398b47eb3'

shishanyu, Feb 18 '18 10:02
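The quotes never reach b2: a POSIX shell removes them while splitting the command line into arguments, so a quoted and an unquoted regex arrive as the same string (unless the shell glob-expands the unquoted one). Python's shlex.split follows POSIX-shell-like quoting rules and can be used to check this; it performs no glob expansion, and the command lines are just the ones from this thread.

```python
import shlex

quoted = "b2 sync --threads 3 --excludeRegex 'app/storage/sessions/.*' /var/www/ b2://something/"
unquoted = "b2 sync --threads 3 --excludeRegex app/storage/sessions/.* /var/www/ b2://something/"

quoted_args = shlex.split(quoted)
unquoted_args = shlex.split(unquoted)

# The argument after --excludeRegex carries no quote characters in either case.
print(quoted_args[5])                # app/storage/sessions/.*
print(quoted_args == unquoted_args)  # True
```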

Never mind, it's still failing, and I don't know why it keeps breaking at random times. It had been running fine for about 20 minutes and then it broke.

WARNING: /var/www/app/storage/sessions/341751ce7f1c8daab6b328694f7ceb3a96f3a9fc could not be accessed (broken symlink?)
WARNING: /var/www/app/storage/sessions/a1204920aaf00f924a6fe911b613adb8e0ed473c could not be accessed (broken symlink?)
WARNING: /var/www/app/storage/sessions/300b7bcb4fae7c9fe9b1d766e683f72be35ea8f1 could not be accessed (broken symlink?)
WARNING: /var/www/app/storage/sessions/f6c3c2fe3e78d37e79ab78de134bdbeb1020011b could not be accessed (broken symlink?)
WARNING: /var/www/app/storage/sessions/b789ca5eaea267d9b74a477bebd6f2a325d1fd70 could not be accessed (broken symlink?)
WARNING: /var/www/app/storage/sessions/3cb805159e4d294008ceb3ae9eb386884668a693 could not be accessed (broken symlink?)
WARNING: /var/www/app/storage/sessions/ae51d04ce969f7c0accce4427b481c3be5b41273 could not be accessed (broken symlink?)
WARNING: /var/www/app/storage/sessions/1c950cdb64cd9d6caf04297f3c6172834ef07211 could not be accessed (broken symlink?)
WARNING: /var/www/app/storage/sessions/4ae8ea6442088e5a5bf595368e64c8bc9fb7f685 could not be accessed (broken symlink?)
WARNING: /var/www/app/storage/sessions/81f707bb7f31e4055871ddd3436d952be703222f could not be accessed (broken symlink?)
WARNING: /var/www/app/storage/sessions/26dec730bd4d7ec00179a7ff9300210c7a38461d could not be accessed (broken symlink?)
ERROR:b2.console_tool:ConsoleTool unexpected exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/b2/console_tool.py", line 1074, in run_command
    return command.run(args)
  File "/usr/lib/python2.7/site-packages/b2/console_tool.py", line 857, in run
    allow_empty_source=allow_empty_source
  File "/usr/lib/python2.7/site-packages/logfury/v0_1/trace_call.py", line 84, in wrapper
    return function(*wrapee_args, **wrapee_kwargs)
  File "/usr/lib/python2.7/site-packages/b2/sync/sync.py", line 261, in sync_folders
    source_folder, dest_folder, args, now_millis, reporter
  File "/usr/lib/python2.7/site-packages/b2/sync/sync.py", line 135, in make_folder_sync_actions
    dest_file) in zip_folders(source_folder, dest_folder, reporter, exclusions, inclusions):
  File "/usr/lib/python2.7/site-packages/b2/sync/sync.py", line 92, in zip_folders
    current_a = next_or_none(iter_a)
  File "/usr/lib/python2.7/site-packages/b2/sync/sync.py", line 36, in next_or_none
    return six.advance_iterator(iterator)
  File "/usr/lib/python2.7/site-packages/b2/sync/sync.py", line 48, in _filter_folder
    for f in folder.all_files(reporter):
  File "/usr/lib/python2.7/site-packages/b2/sync/folder.py", line 89, in all_files
    for file_object in self._walk_relative_paths(self.root, '', reporter):
  File "/usr/lib/python2.7/site-packages/b2/sync/folder.py", line 175, in _walk_relative_paths
    for subdir_file in self._walk_relative_paths(local_path, b2_path, reporter):
  File "/usr/lib/python2.7/site-packages/b2/sync/folder.py", line 175, in _walk_relative_paths
    for subdir_file in self._walk_relative_paths(local_path, b2_path, reporter):
  File "/usr/lib/python2.7/site-packages/b2/sync/folder.py", line 175, in _walk_relative_paths
    for subdir_file in self._walk_relative_paths(local_path, b2_path, reporter):
  File "/usr/lib/python2.7/site-packages/b2/sync/folder.py", line 178, in _walk_relative_paths
    file_mod_time = int(round(os.path.getmtime(local_path) * 1000))
  File "/usr/lib64/python2.7/genericpath.py", line 54, in getmtime
    return os.stat(filename).st_mtime
OSError: [Errno 2] No such file or directory: '/var/www/app/storage/sessions/1595ee1c09f17194b28344f17580c037e1fe1f33'
Traceback (most recent call last):
  File "/bin/b2", line 9, in <module>
    load_entry_point('b2==1.1.0', 'console_scripts', 'b2')()
  File "/usr/lib/python2.7/site-packages/b2/console_tool.py", line 1193, in main
    exit_status = ct.run_command(decoded_argv)
  File "/usr/lib/python2.7/site-packages/b2/console_tool.py", line 1074, in run_command
    return command.run(args)
  File "/usr/lib/python2.7/site-packages/b2/console_tool.py", line 857, in run
    allow_empty_source=allow_empty_source
  File "/usr/lib/python2.7/site-packages/logfury/v0_1/trace_call.py", line 84, in wrapper
    return function(*wrapee_args, **wrapee_kwargs)
  File "/usr/lib/python2.7/site-packages/b2/sync/sync.py", line 261, in sync_folders
    source_folder, dest_folder, args, now_millis, reporter
  File "/usr/lib/python2.7/site-packages/b2/sync/sync.py", line 135, in make_folder_sync_actions
    dest_file) in zip_folders(source_folder, dest_folder, reporter, exclusions, inclusions):
  File "/usr/lib/python2.7/site-packages/b2/sync/sync.py", line 92, in zip_folders
    current_a = next_or_none(iter_a)
  File "/usr/lib/python2.7/site-packages/b2/sync/sync.py", line 36, in next_or_none
    return six.advance_iterator(iterator)
  File "/usr/lib/python2.7/site-packages/b2/sync/sync.py", line 48, in _filter_folder
    for f in folder.all_files(reporter):
  File "/usr/lib/python2.7/site-packages/b2/sync/folder.py", line 89, in all_files
    for file_object in self._walk_relative_paths(self.root, '', reporter):
  File "/usr/lib/python2.7/site-packages/b2/sync/folder.py", line 175, in _walk_relative_paths
    for subdir_file in self._walk_relative_paths(local_path, b2_path, reporter):
  File "/usr/lib/python2.7/site-packages/b2/sync/folder.py", line 175, in _walk_relative_paths
    for subdir_file in self._walk_relative_paths(local_path, b2_path, reporter):
  File "/usr/lib/python2.7/site-packages/b2/sync/folder.py", line 175, in _walk_relative_paths
    for subdir_file in self._walk_relative_paths(local_path, b2_path, reporter):
  File "/usr/lib/python2.7/site-packages/b2/sync/folder.py", line 178, in _walk_relative_paths
    file_mod_time = int(round(os.path.getmtime(local_path) * 1000))
  File "/usr/lib64/python2.7/genericpath.py", line 54, in getmtime
    return os.stat(filename).st_mtime
OSError: [Errno 2] No such file or directory: '/var/www/app/storage/sessions/1595ee1c09f17194b28344f17580c037e1fe1f33'

shishanyu, Feb 18 '18 11:02

Now when I run the exact same command, it prints the help for sync and does nothing.

shishanyu, Feb 18 '18 12:02

That means there is a syntax error in the command. It doesn't report that, though.

PeterNerlich, Feb 18 '18 14:02

In b2 sync --threads 3 --excludeRegex app/storage/sessions/.* /var/www/ b2://something/, the unquoted app/storage/sessions/.* can sometimes be expanded by the shell. You should quote it or escape the * to prevent that.
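To see that expansion, Python's glob module applies wildcard rules similar to a POSIX shell's (an approximation; shell options such as nullglob change the details, and older bash versions also let .* match the . and .. entries, adding even more words). In a working directory where app/storage/sessions/ contains a hidden file, the unquoted pattern turns into file names, which b2 would then receive as extra positional arguments. The directory layout below is hypothetical.

```python
import glob
import os
import tempfile

pattern = "app/storage/sessions/.*"

with tempfile.TemporaryDirectory() as cwd:
    sessions = os.path.join(cwd, "app", "storage", "sessions")
    os.makedirs(sessions)
    # Hypothetical layout: a hidden file inside the sessions directory
    open(os.path.join(sessions, ".gitignore"), "w").close()

    # Roughly what an unquoted argument can become when the shell's cwd is `cwd`
    matches = sorted(
        os.path.relpath(path, cwd).replace(os.sep, "/")
        for path in glob.glob(os.path.join(glob.escape(cwd), pattern))
    )

print(matches)  # ['app/storage/sessions/.gitignore']
```

Quoting the regex ('app/storage/sessions/.*') keeps the shell from touching it.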

ppolewicz, Feb 18 '18 15:02

Neither works. I think it really has a lot of trouble dealing with symlinks and the --excludeRegex option.

shishanyu, Feb 19 '18 00:02

I think that this error is happening because the file is gone:

OSError: [Errno 2] No such file or directory: '/var/www/app/storage/sessions/1595ee1c09f17194b28344f17580c037e1fe1f33'

It probably found the file while listing the directory, and then it was gone by the time it tried to upload it.

What's the right behaviour here? Should it ignore files that disappear during the sync? That would work here, but might not meet other people's expectations.
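One possible policy, sketched in Python to illustrate the trade-off: stat each file when it is reached and, if it raises FileNotFoundError because it vanished after the directory listing, skip it with a warning instead of aborting. The millisecond conversion mirrors the getmtime call at folder.py line 178 in the traceback above; the function name and warning text are hypothetical, and this is not how b2 behaves.

```python
import os
import tempfile

def stat_files_skipping_vanished(directory, names, warn=print):
    """Return {name: mtime_millis} for files that still exist; warn about vanished ones."""
    result = {}
    for name in names:
        path = os.path.join(directory, name)
        try:
            result[name] = int(round(os.path.getmtime(path) * 1000))
        except FileNotFoundError:
            # Deleted after the listing was taken, e.g. an expired session file
            warn("WARNING: %s disappeared during the sync, skipping" % path)
    return result

with tempfile.TemporaryDirectory() as sessions:
    for name in ("a", "b"):
        open(os.path.join(sessions, name), "w").close()
    listing = sorted(os.listdir(sessions))
    os.remove(os.path.join(sessions, "a"))  # simulate the race: "a" vanishes after listing
    warnings = []
    mtimes = stat_files_skipping_vanished(sessions, listing, warn=warnings.append)

print(sorted(mtimes), len(warnings))  # ['b'] 1
```

The downside bwbeach points out still applies: a file that disappears for some unexpected reason would be skipped silently apart from the warning.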

bwbeach, Feb 19 '18 01:02

In the current version, this behavior is appropriate.

#407 will soon add the ability to prevent the contents of such a directory from being scanned.

ppolewicz, Feb 19 '18 01:02

Yes, it's a session file, and it's temporary. I'm trying to back up a whole Laravel app and omit that directory entirely. I can't really move it somewhere else, since it's my client's setup and I can't modify it, so it would be really great if we could just skip some directories altogether.

shishanyu, Feb 19 '18 01:02

If you can check out the latest version and run it, there is now an --excludeDirRegex option that prevents it from looking inside a directory when syncing.
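The key difference from --excludeRegex is pruning: a matching directory is dropped before the walk descends into it, so its files are never listed or stat'ed. A minimal Python sketch of that idea with os.walk; the helper name and the matching rule (re.match against the directory path relative to the source) are assumptions, not the option's documented semantics.

```python
import os
import re
import tempfile

def walk_pruned(root, exclude_dir_patterns):
    """Yield relative file names, never descending into directories whose relative path matches."""
    compiled = [re.compile(p) for p in exclude_dir_patterns]
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root).replace(os.sep, "/")
        prefix = "" if rel_dir == "." else rel_dir + "/"
        # Editing dirnames in place tells os.walk not to descend into removed entries.
        dirnames[:] = sorted(
            d for d in dirnames
            if not any(c.match(prefix + d) for c in compiled)
        )
        for name in sorted(filenames):
            yield prefix + name

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "app", "storage", "sessions"))
    open(os.path.join(root, "app", "index.php"), "w").close()
    open(os.path.join(root, "app", "storage", "sessions", "0123abcd"), "w").close()
    names = list(walk_pruned(root, [r"app/storage/sessions$"]))

print(names)  # ['app/index.php']; nothing inside sessions/ was ever listed
```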

bwbeach, Mar 16 '18 00:03

Please reopen if the issue with --excludeRegex persists in the current version.

mjurbanski-reef, Nov 10 '23 13:11