Getting failure to stat directory for stdout

Open toomanycats opened this issue 1 year ago • 3 comments

I'm tracking down a very obscure error where about 30% of submitted jobs go into the Eqw state. The error is always the same:

error reason          1:      08/19/2024 14:10:31 [1730373583:9092]: can't stat() "/grid_test" as stdout_path: Permission denied KRB5CCNAME=none uid=xxx gid=xxx 101 600  xxx  xxx xxx

We thought this was due to using a brand new storage appliance. However, when permissions are set wide open, there's no change in the behavior. I've captured NFS traffic and have been analyzing it in Wireshark. I don't see any FSSTAT failing.

I'm wondering if the SGE daemon creates the stdout and stderr files in the SGE root directory and the client then copies them out?

Any ideas are appreciated.

toomanycats Aug 19 '24 22:08

At first glance, this could be an error caused by a MAC (mandatory access control) solution. Can you check whether AppArmor or SELinux is the culprit, i.e. by disabling whichever one is in use and seeing if the error disappears?

grisu48 Aug 20 '24 06:08

That's a good idea, but it didn't help. I set SELinux to permissive mode, rebooted, and received the same error. This new storage is a cluster, so I was hoping that might work.

What do you think about this function: sge_filecmp in source/libs/uti/sge_io.c, line 166?

/****** uti/io/sge_filecmp() **************************************************
*  NAME
*     sge_filecmp() -- Compare two files
*
*  SYNOPSIS
*     int sge_filecmp(const char *name0, const char *name1)
*
*  FUNCTION
*     Compare two files. They are equal if:
*        - both of them have the same name
*        - if a stat() succeeds for both files and
*          i-node/device-id are equal

toomanycats Aug 20 '24 15:08

Not sure, but given that the error message explicitly says "Permission denied", I would assume the error is somewhere in the file system permissions.

grisu48 Aug 21 '24 06:08

Closing the issue due to inactivity.

grisu48 Dec 19 '24 12:12