
Using predict.gstat function with nsim>0 causes segfault error

Open lea-c opened this issue 1 year ago • 9 comments

Hi,

I've been encountering a problem I don't understand with the predict.gstat function.

I would like to use it to interpolate residuals with a cokriging model. Running the function with nsim=0 works fine, but increasing it to any larger value (even a small one like 10, although I eventually need nsim=300) reliably triggers the following error, even on machines with ample CPU and memory: *** caught segfault *** address 0x38, cause 'memory not mapped'

Could you help me identify the issue? I have checked for duplicate coordinates with zerodist and for mismatched CRSs, and I have tried many values of maxdist and nmax as well as fewer points, so I'm at a loss here.
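For reference, the duplicate-coordinate part of those checks can be reproduced in base R without sp. A minimal sketch on synthetic coordinates (the matrix `coords` is made-up stand-in data, not the actual dataset):

```r
# Base-R sketch of what sp::zerodist() reports: rows that share
# exactly the same coordinates. `coords` is synthetic example data.
coords <- rbind(c(0, 0), c(1, 1), c(1, 1), c(2, 2))
dup <- which(duplicated(coords) | duplicated(coords, fromLast = TRUE))
dup  # rows 2 and 3 coincide
```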

Below is a minimal example; the data can be downloaded here: https://filesender.renater.fr/?s=download&token=911d7d4c-6bf5-4bec-a316-8adc5417b9d8 Thank you very much in advance. I remain available for any further information.

library(gstat)

# Load fitted variogram model
load("./var_awc.RData")
awc_sem <- cv.fit

# Load data to predict
val_sp <- readRDS("./point_data.Rdata")

# Works fine
gstat:::predict.gstat(awc_sem, newdata = val_sp)

# Crashes
gstat:::predict.gstat(awc_sem, newdata = val_sp, nsim = 10, maxdist = 500)

lea-c avatar May 07 '24 08:05 lea-c

Hi, I've analysed the problem in more depth and it seems to stem from a projection error. The points in val_sp end up projected far away from the points used to fit the variogram, which might cause some kind of memory overrun; that is my hypothesis. Unfortunately, I have still not been able to solve the issue.

Fyi, here is a dataset for which the 300 simulations run to completion: https://filesender.renater.fr/?s=download&token=c23e2359-6334-42ae-b425-2127bd302b63 It contains many more points than val_sp, which is consistent with the number of points in val_sp not being the cause.

This dataset has the same CRS as val_sp and coordinates of similar magnitude, but their extents do not seem to overlap. Oddly, when I plot grd first and then val_sp, they appear to share the exact same projection:

grd <- readRDS("./fine_data.Rdata")
plot(grd)
plot(val_sp, add = TRUE)

But when I plot val_sp before grd, an inconsistency appears and the plots don't overlap:

plot(val_sp)
plot(grd, add = TRUE)
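This asymmetry is expected behaviour in base R graphics: the first plot() call fixes the axis limits, and anything drawn with add = TRUE afterwards is clipped to that window, so a layer whose extent lies entirely outside it simply does not appear. A minimal sketch with synthetic coordinates (not the actual grd/val_sp data):

```r
# Two point sets with disjoint extents; `a` and `b` are synthetic.
a <- cbind(x = c(0.2, 0.8), y = c(0.3, 0.7))
b <- cbind(x = c(5.1, 5.9), y = c(5.2, 5.8))
# Plotting `a` first would clip `b` away entirely; passing explicit
# limits that cover both extents shows both layers.
xr <- range(a[, "x"], b[, "x"])
yr <- range(a[, "y"], b[, "y"])
plot(a, xlim = xr, ylim = yr)
points(b, col = "red")
```

So two layers drawing in "the same place" under one plotting order says nothing about whether their extents actually overlap.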

I've dealt with many CRS issues before but I cannot figure this one out. I've reprojected the datasets repeatedly to make sure the CRSs match, without success. Could you help me confirm that val_sp and grd align? That would most likely solve the issue. Thank you very much for your help! Again, I remain available if you have any questions.

lea-c avatar May 13 '24 14:05 lea-c

Thanks for the update; I've looked at the issue and can reproduce the error you're getting. I can see where it happens when run in a debugger, but haven't been able to identify its cause.

edzer avatar May 13 '24 14:05 edzer

Thank you very much for looking into this problem! Here is another update: I no longer think this is a CRS issue. I found a dataset very similar to mine, with the same CRS, that does not cause any problem. You can download the new data here... https://filesender.renater.fr/?s=download&token=7bc3178c-2c7a-4fd7-89b4-b15e2afcb419 ...and run the following code:

library(sp)
library(raster)  # for extent() and intersect()
library(gstat)

# Load variogram
variogram <- readRDS("./variogram.Rdata")

# Load the two datasets
spatial_points_problem <- readRDS("./spatial_points_problem.Rdata")
spatial_points_working <- readRDS("./spatial_points_working.Rdata")

# Compare CRS
proj4string(spatial_points_problem)
proj4string(spatial_points_working)

# Compare extents
plot(extent(spatial_points_problem@bbox), col="red")
plot(extent(spatial_points_working@bbox), col="blue",add=T)

# Check for duplicates (zero = distance threshold in coordinate units)
zerodist(spatial_points_problem, zero = 0.9)
zerodist(spatial_points_working, zero = 0.9)

# Compare numbers of points - the only difference here, but it should not matter
print(nrow(spatial_points_problem))
print(nrow(spatial_points_working))

# Plot the two datasets
plot(spatial_points_problem, col="red")
plot(spatial_points_working, col="blue", add=T)

# Works 
gstat:::predict.gstat(variogram, newdata = spatial_points_working, nsim=10) # a warning appears here because I had to modify the dataset slightly, but I don't think it matters

# Crashes
gstat:::predict.gstat(variogram, newdata = spatial_points_problem, nsim=10)  # crashes also with various maxdist and nmax values

This is very weird...

lea-c avatar May 15 '24 07:05 lea-c

Hi again, I tried to go further in diagnosing the problem and identified the points that may be responsible for the error. Below is a figure where:

  • the points of spatial_points_working are in blue
  • the points of raster::intersect(spatial_points_problem, spatial_points_working) are in green
  • the remaining points of spatial_points_problem are in red

It seems that the extent of the blue points is the boundary between "good" and "bad" points here. But why would that extent play a role, since the bounding box is the extent of the points used to fit the variogram, and it contains all the points of spatial_points_problem?

The predict.gstat function runs fine with nsim>0 on the green points; however, it crashes as soon as a single red point is added.
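When a crash appears only once a particular point enters the subset, bisecting over row indices narrows it down in O(log n) runs instead of adding points one at a time. A generic sketch, with a pure predicate standing in for the crashing call (the index 519 below is only an illustrative stand-in):

```r
# Bisection: find the single offending row index, given a predicate that
# reports whether a given subset of rows triggers the problem.
find_bad_row <- function(n, triggers) {
  lo <- 1; hi <- n
  while (lo < hi) {
    mid <- (lo + hi) %/% 2
    if (triggers(lo:mid)) hi <- mid else lo <- mid + 1
  }
  lo
}

# Stand-in predicate: pretend row 519 is the one that causes the crash.
idx <- find_bad_row(1000, function(rows) 519 %in% rows)
idx  # 519
```

In practice the predicate would have to run predict.gstat on newdata[rows, ] in a fresh subprocess (e.g. via Rscript or the callr package), since a segfault kills the calling R session.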

Also, I have tried to debug the function myself, but I have not been able to run it outside the package because of the C calls.

I hope this piece of additional information is helpful

Best regards

lea-c avatar May 29 '24 07:05 lea-c

This might be a clue:

> zerodist2(spatial_points_problem, variogram$data$awc_0_30_m$data)
     [,1] [,2]
[1,]  519 1711
> zerodist2(spatial_points_working, variogram$data$awc_0_30_m$data)
     [,1] [,2]

edzer avatar Jun 19 '24 11:06 edzer

g = gstat:::predict.gstat(variogram, newdata = spatial_points_problem[-519,], nsim=10)

works!
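If several newdata locations coincide with observation locations, the same workaround generalizes: drop every row flagged by zerodist2 (its first column indexes the first argument) before simulating. A base-R sketch of the coincidence check itself, on synthetic coordinates:

```r
# Synthetic stand-ins for the observation and prediction coordinates.
obs <- rbind(c(0, 0), c(1, 1), c(2, 2))
new <- rbind(c(5, 5), c(1, 1), c(6, 6))  # row 2 coincides with obs row 2

# Rows of `new` that exactly match some row of `obs` -- what
# sp::zerodist2() reports with a zero distance threshold.
key <- function(m) paste(m[, 1], m[, 2])
hit <- which(key(new) %in% key(obs))
hit  # 2

# Drop the coinciding rows before passing newdata to predict.gstat.
new_clean <- new[-hit, , drop = FALSE]
```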

edzer avatar Jun 19 '24 11:06 edzer

Sorry for my late reply. It works; thank you so much for your time and effort!

lea-c avatar Jun 26 '24 14:06 lea-c

Thanks; note that this is a work-around, the bug is still present.

edzer avatar Jul 09 '24 08:07 edzer

Yes, I've noticed: for now I have simply removed a few dozen points from my dataset (which is larger than the one I sent you). I would be curious to understand what is happening.

lea-c avatar Jul 11 '24 08:07 lea-c