rez-suite broken in 2.104.7: cannot add second context

Open jlgerber opened this issue 3 years ago • 6 comments

The following fails for me:

Generate two contexts from two different packages:

rez env sgq_schema -o sgqschema.rxt
rez env am_query -o amquery.rxt

Create the suite:

rez suite --create ./misc_suite

Add the first context:

rez suite --add ./sgqschema.rxt --context sqgschema

Attempt to add the second context, per the documentation:

rez suite --add ./amquery.rxt --context amquery

Traceback (most recent call last):
  File "/laika/depts/prod_tech/rez/lib/python/linux/rez/bin/rez/rez-suite", line 8, in <module>
    sys.exit(run_rez_suite())
  File "/net/ent-prod.nfs.laika.com/ifs/laika/depts/prod_tech/rez/lib/python/linux/rez-2.104.7/lib/python2.7/site-packages/rez/cli/_entry_points.py", line 239, in run_rez_suite
    return run("suite")
  File "/net/ent-prod.nfs.laika.com/ifs/laika/depts/prod_tech/rez/lib/python/linux/rez-2.104.7/lib/python2.7/site-packages/rez/cli/_main.py", line 191, in run
    returncode = run_cmd()
  File "/net/ent-prod.nfs.laika.com/ifs/laika/depts/prod_tech/rez/lib/python/linux/rez-2.104.7/lib/python2.7/site-packages/rez/cli/_main.py", line 183, in run_cmd
    return func(opts, opts.parser, extra_arg_groups)
  File "/net/ent-prod.nfs.laika.com/ifs/laika/depts/prod_tech/rez/lib/python/linux/rez-2.104.7/lib/python2.7/site-packages/rez/cli/suite.py", line 200, in command
    suite.save(opts.DIR)
  File "/net/ent-prod.nfs.laika.com/ifs/laika/depts/prod_tech/rez/lib/python/linux/rez-2.104.7/lib/python2.7/site-packages/rez/suite.py", line 442, in save
    shutil.rmtree(path)
  File "/net/ent-prod.nfs.laika.com/ifs/laika/home/j/jgerber/packages/python/2.7.18/platform-linux/arch-x86_64/os-CentOS-7.7.1908/lib/python2.7/shutil.py", line 270, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/net/ent-prod.nfs.laika.com/ifs/laika/home/j/jgerber/packages/python/2.7.18/platform-linux/arch-x86_64/os-CentOS-7.7.1908/lib/python2.7/shutil.py", line 279, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/net/ent-prod.nfs.laika.com/ifs/laika/home/j/jgerber/packages/python/2.7.18/platform-linux/arch-x86_64/os-CentOS-7.7.1908/lib/python2.7/shutil.py", line 277, in rmtree
    os.rmdir(path)
OSError: [Errno 39] Directory not empty: '/net/ent-prod.nfs.laika.com/ifs/laika/dist/rel/packages/rez-suites/linux/cli_utils/contexts'

At the point of failure, both the bin directory and suite.yaml have already been removed. (Why?)

I know that this used to work with previous versions.

jlgerber, Feb 18 '22 19:02

That's odd, I can't repro this. What platform/OS are you on? Are you writing to local disk?

A

nerdvegas, Feb 18 '22 20:02

This is on CentOS 7, and the location in question is mounted over NFS.

Interestingly enough, when I inspect the directory in question, it appears empty (ls -a yields nothing).

I will give this a try on local disk to rule out issues with shared storage.

jlgerber, Feb 23 '22 22:02

This works locally, so I suppose it is related to NFS. Interestingly, this only just started happening for us. It appears to be a known outcome when using shutil.rmtree on NFS-mounted directories. I wonder if it would be reasonable to add ignore_errors=True to the shutil.rmtree call in save.

One would also have to handle the subsequent call to os.makedirs, for the case where the contexts directory was not deleted.
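A minimal sketch of the change proposed above, assuming the save path first clears an existing suite directory with shutil.rmtree and then recreates the contexts subdirectory with os.makedirs. The traceback confirms the rmtree call in Suite.save; the rest of rez's save logic is not reproduced here, and this is not the patch that was actually applied. reset_suite_dir is a hypothetical helper name, not rez API. The try/except around os.makedirs is used instead of exist_ok=True so it also runs on the Python 2.7 interpreter shown in the traceback.

```python
import errno
import os
import shutil


def reset_suite_dir(path):
    """Clear an existing suite directory and recreate its contexts/ subdir.

    Hypothetical helper illustrating the proposed fix; not rez's API.
    """
    if os.path.exists(path):
        # ignore_errors=True: anything that cannot be removed (e.g. a
        # directory still holding an NFS '.nfs*' placeholder) is left
        # in place instead of aborting the save.
        shutil.rmtree(path, ignore_errors=True)

    contexts_path = os.path.join(path, "contexts")
    try:
        os.makedirs(contexts_path)
    except OSError as e:
        # The contexts directory may have survived the rmtree above;
        # only "already exists" is tolerated, anything else propagates.
        if e.errno != errno.EEXIST:
            raise
    return contexts_path
```

The trade-off is that ignore_errors=True also silences genuine failures, such as a permissions problem, so leftover files from a previous save (for example old tool wrappers under bin/) could persist without any warning.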

jlgerber, Feb 23 '22 23:02

I patched our code to see if this approach fixed our issue and it did.

jlgerber, Feb 24 '22 00:02

Ah righto, could you add some more info so I can follow this up and potentially fix it in rez as well? So you're saying there's a known issue with shutil.rmtree over NFS? Do you know how that specifically manifests as the second 'context add' failing?

Cheers,
A

nerdvegas, Feb 24 '22 00:02

Sure thing. I based the comment on a cursory Google search for "shutil.rmtree nfs". I will instrument the call to see if I can determine what is triggering the failure. By the time the exception is thrown, the directory is in fact empty.
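One way to instrument the call, sketched here as an illustration: pass shutil.rmtree an onerror callback that records, at the moment of failure, which call failed and what the directory still contains. That captures transient entries that are already gone by the time the traceback is printed. make_logging_onerror and rmtree_logged are hypothetical names. Note that onerror is deprecated in favour of onexc from Python 3.12 on, but it is still accepted, and it is the only option on Python 2.7.

```python
import os
import shutil


def make_logging_onerror(log):
    """Build an rmtree onerror callback that records what was left behind."""
    def onerror(func, path, exc_info):
        # Snapshot the directory contents at the moment of failure; by the
        # time the traceback is printed, transient entries may be gone.
        try:
            leftovers = sorted(os.listdir(path)) if os.path.isdir(path) else []
        except OSError:
            leftovers = None  # the path itself could not be listed
        name = getattr(func, "__name__", repr(func))
        log.append((name, path, exc_info[1], leftovers))
        # Re-raise so the overall behaviour matches a plain rmtree call.
        raise exc_info[1]
    return onerror


def rmtree_logged(path, log):
    """Hypothetical wrapper: shutil.rmtree plus failure logging into `log`."""
    shutil.rmtree(path, onerror=make_logging_onerror(log))
```

On a failing NFS delete, the last logged entry would be expected to show the rmdir call on the contexts directory together with the name of any '.nfs' file present at that instant.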

As I suspected, a file whose name begins with '.nfs' exists at the moment shutil.rmtree is doing its thing. NFS uses these files for bookkeeping purposes; they are managed by the NFS client. So this appears to be a race condition of sorts.
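If the '.nfs' entry really is transient, an alternative to ignore_errors=True is to retry the removal when it fails with ENOTEMPTY (errno 39 on Linux, as in the traceback above). This sketch assumes the placeholder is an NFS "silly rename" file, which the client creates when a file that is still open is deleted and removes once the last handle on it is closed. rmtree_with_retry and its attempts/delay defaults are hypothetical choices, not anything rez provides.

```python
import errno
import shutil
import time


def rmtree_with_retry(path, attempts=5, delay=0.5):
    """Remove a tree, retrying while a directory is transiently non-empty.

    Hypothetical helper: assumes the ENOTEMPTY comes from an NFS '.nfs*'
    placeholder that disappears shortly. Any other error is raised at once.
    """
    attempts = max(1, attempts)
    for attempt in range(attempts):
        try:
            shutil.rmtree(path)
            return
        except OSError as e:
            last_try = attempt == attempts - 1
            if e.errno != errno.ENOTEMPTY or last_try:
                raise
            # Give the NFS client time to drop the placeholder, then retry
            # on whatever remains of the partially deleted tree.
            time.sleep(delay)
```

If the open handle belongs to the saving process itself, waiting does not help: every retry fails the same way, and the function re-raises the original ENOTEMPTY after the last attempt.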

jlgerber, Feb 24 '22 01:02