Accessing tree info after tree has been externally recreated results in SIGBUS
Affiliation SPC-EPFL
Version(s) Affected Stable 7.142.81
Platform(s) Ubuntu 24.04
Installation Method(s) Official DEB repository
Describe the bug If a tree is open in process 1 and process 2 replaces this tree with a new copy, then reopening the tree in process 1 results in a SIGBUS error.
To Reproduce Steps to reproduce the behavior:
- Open a terminal set
default_tree_path=/tmpand executemdstcland the following commands
TCL> edit test/shot=1 /new
TCL> write
TCL> close
- Open another terminal set
default_tree_path=/tmpand executetdic
TDI> TreeOpen("test",1)
Leave the tdic session open.
3. Go back to the original terminal of step 1 and repeat step 1
4. Go back to the terminal of step 2 and try to open the tree again
TDI> TreeOpen("test",1)
-
tdiccrashes with SIGBUS error
Expected behavior No crash, maybe an error message explaining that the db entry seems to not be valid anymore and should be closed, or close the tree silently and continue with the user input.
This is 'known behavior'. You can not edit a tree that is open. We should (to the extent we can) notice that this is going on, and either prevent the tree from being edited since it is open, or as noted in the issue, warn the user.
I am not sure either of these is doable
The TreeEstablishRundownEvent which was probably for this purpose, is not implemented. It is in treeshr/dummy.c
code to tree open could be added to:
- take out a shared read lock on the tree file on open
- attempt to take out an exclusive lock on the tree file on edit - fail if it can not
- release the shared read lock on suspension
- Attempt to re acquire the read lock on resumption - fail if it an not
- this is to prevent editing of open trees
It does appear it would work from tests
I would only do it for local IO (which would translate to doing it for remote IO)
Of course it is a bit more complicated than that.
- remote IO reads the tree file on open (It may (could be closing it)?)
- the lock must be kept until the tree is closed regardless
- This could be dealt with by keeping the tree file open for read or using bsd style locks on a duplicate file descriptor.
- When processes holding the read lock are suspended, they could release the lock and attempt to re-aquire the lock on resumption.
- The re acquire idea is NG since if the tree has been edited it must be re opened, and re read - way to complicated. It would need to somehow check that the file had not been replaced on re acquire. Probably then throwing an error
All of this is pretty complicated, so should be thought about before implimentation.
Sample code that does some of this:
shared_lock.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
int main() {
const char *filename = "testfile.txt";
int fd = open(filename, O_RDONLY);
if (fd == -1) {
perror("Error opening file");
exit(EXIT_FAILURE);
}
struct flock lock;
lock.l_type = F_RDLCK; // Shared read lock
lock.l_whence = SEEK_SET;
lock.l_start = 0; // Start of the file
lock.l_len = 0; // Lock the whole file
if (fcntl(fd, F_SETLK, &lock) == -1) {
perror("Error acquiring shared lock");
close(fd);
exit(EXIT_FAILURE);
}
printf("Shared lock acquired. Holding for 30 seconds...\n");
sleep(30); // Hold the lock for 30 seconds
// Release the lock
lock.l_type = F_UNLCK;
if (fcntl(fd, F_SETLK, &lock) == -1) {
perror("Error releasing lock");
}
close(fd);
return 0;
}
exclusive_lock.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
int main() {
const char *filename = "testfile.txt";
int fd = open(filename, O_WRONLY);
if (fd == -1) {
perror("Error opening file");
exit(EXIT_FAILURE);
}
struct flock lock;
lock.l_type = F_WRLCK; // Exclusive write lock
lock.l_whence = SEEK_SET;
lock.l_start = 0; // Start of the file
lock.l_len = 0; // Lock the whole file
if (fcntl(fd, F_SETLK, &lock) == -1) {
perror("Error acquiring exclusive lock");
close(fd);
exit(EXIT_FAILURE);
}
printf("Exclusive lock acquired.\n");
// Release the lock
lock.l_type = F_UNLCK;
if (fcntl(fd, F_SETLK, &lock) == -1) {
perror("Error releasing lock");
}
close(fd);
return 0;
}
To deal with resuming and telling if you need to now throw an error because the tree was edited:
struct stat initial_stat;
if (fstat(fd, &initial_stat) == -1) {
perror("fstat error");
// Handle error
}
and
struct stat current_stat;
if (stat(filepath, ¤t_stat) == -1) {
perror("stat error");
// Handle error
}
and
if (initial_stat.st_ino != current_stat.st_ino || initial_stat.st_dev != current_stat.st_dev) {
printf("The file has been replaced by another process.\n");
// Handle the file replacement scenario
} else {
printf("The file has not been replaced.\n");
}