mdsplus icon indicating copy to clipboard operation
mdsplus copied to clipboard

Accessing tree info after tree has been externally recreated results in SIGBUS

Open merlea opened this issue 1 year ago • 5 comments

Affiliation SPC-EPFL

Version(s) Affected Stable 7.142.81

Platform(s) Ubuntu 24.04

Installation Method(s) Official DEB repository

Describe the bug If a tree is open in process 1 and process 2 replaces this tree with a new copy, then reopening the tree in process 1 results in a SIGBUS error.

To Reproduce Steps to reproduce the behavior:

  1. Open a terminal set default_tree_path=/tmp and execute mdstcl and the following commands
TCL> edit test/shot=1 /new
TCL> write
TCL> close
  1. Open another terminal set default_tree_path=/tmp and execute tdic
TDI> TreeOpen("test",1)

Leave the tdic session open. 3. Go back to the original terminal of step 1 and repeat step 1 4. Go back to the terminal of step 2 and try to open the tree again

TDI> TreeOpen("test",1)
  1. tdic crashes with SIGBUS error

Expected behavior No crash, maybe an error message explaining that the db entry seems to not be valid anymore and should be closed, or close the tree silently and continue with the user input.

merlea avatar Nov 25 '24 11:11 merlea

This is 'known behavior'. You can not edit a tree that is open. We should (to the extent we can) notice that this is going on, and either prevent the tree from being edited since it is open, or as noted in the issue, warn the user.

I am not sure either of these is doable

joshStillerman avatar Nov 25 '24 18:11 joshStillerman

The TreeEstablishRundownEvent which was probably for this purpose, is not implemented. It is in treeshr/dummy.c

joshStillerman avatar Nov 25 '24 19:11 joshStillerman

code to tree open could be added to:

  • take out a shared read lock on the tree file on open
  • attempt to take out an exclusive lock on the tree file on edit - fail if it can not
  • release the shared read lock on suspension
  • Attempt to re acquire the read lock on resumption - fail if it an not
  • this is to prevent editing of open trees

It does appear it would work from tests

I would only do it for local IO (which would translate to doing it for remote IO)

Of course it is a bit more complicated than that.

  • remote IO reads the tree file on open (It may (could be closing it)?)
  • the lock must be kept until the tree is closed regardless
  • This could be dealt with by keeping the tree file open for read or using bsd style locks on a duplicate file descriptor.
  • When processes holding the read lock are suspended, they could release the lock and attempt to re-aquire the lock on resumption.
  • The re acquire idea is NG since if the tree has been edited it must be re opened, and re read - way to complicated. It would need to somehow check that the file had not been replaced on re acquire. Probably then throwing an error

All of this is pretty complicated, so should be thought about before implimentation.

joshStillerman avatar Jan 03 '25 15:01 joshStillerman

Sample code that does some of this:

shared_lock.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

int main() {
    const char *filename = "testfile.txt";
    int fd = open(filename, O_RDONLY);
    if (fd == -1) {
        perror("Error opening file");
        exit(EXIT_FAILURE);
    }

    struct flock lock;
    lock.l_type = F_RDLCK;    // Shared read lock
    lock.l_whence = SEEK_SET;
    lock.l_start = 0;         // Start of the file
    lock.l_len = 0;           // Lock the whole file

    if (fcntl(fd, F_SETLK, &lock) == -1) {
        perror("Error acquiring shared lock");
        close(fd);
        exit(EXIT_FAILURE);
    }

    printf("Shared lock acquired. Holding for 30 seconds...\n");
    sleep(30); // Hold the lock for 30 seconds

    // Release the lock
    lock.l_type = F_UNLCK;
    if (fcntl(fd, F_SETLK, &lock) == -1) {
        perror("Error releasing lock");
    }

    close(fd);
    return 0;
}

exclusive_lock.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

int main() {
    const char *filename = "testfile.txt";
    int fd = open(filename, O_WRONLY);
    if (fd == -1) {
        perror("Error opening file");
        exit(EXIT_FAILURE);
    }

    struct flock lock;
    lock.l_type = F_WRLCK;    // Exclusive write lock
    lock.l_whence = SEEK_SET;
    lock.l_start = 0;         // Start of the file
    lock.l_len = 0;           // Lock the whole file

    if (fcntl(fd, F_SETLK, &lock) == -1) {
        perror("Error acquiring exclusive lock");
        close(fd);
        exit(EXIT_FAILURE);
    }

    printf("Exclusive lock acquired.\n");

    // Release the lock
    lock.l_type = F_UNLCK;
    if (fcntl(fd, F_SETLK, &lock) == -1) {
        perror("Error releasing lock");
    }

    close(fd);
    return 0;
}

joshStillerman avatar Jan 03 '25 15:01 joshStillerman

To deal with resuming and telling if you need to now throw an error because the tree was edited:

struct stat initial_stat;
if (fstat(fd, &initial_stat) == -1) {
    perror("fstat error");
    // Handle error
}

and

struct stat current_stat;
if (stat(filepath, &current_stat) == -1) {
    perror("stat error");
    // Handle error
}

and

if (initial_stat.st_ino != current_stat.st_ino || initial_stat.st_dev != current_stat.st_dev) {
    printf("The file has been replaced by another process.\n");
    // Handle the file replacement scenario
} else {
    printf("The file has not been replaced.\n");
}

joshStillerman avatar Jan 03 '25 16:01 joshStillerman