SSX (grid scan) spot finding not parallelized and using a lot of memory
We are using DIALS v3.14.2 (downloaded from the website) on RHEL 8 computers. The datasets are collected as oscillations, though they should be considered stills. To have the data treated as stills, we import each dataset as a grid scan of 400 images (20x20). When we run the default spot finding algorithm, it does a couple of things that do not seem correct:
- running with only a single core (despite setting njobs or chunksize)
- steadily increasing its RAM usage until the process crashes (when trying to run spot finding on 4x 400-image full Eiger 16M datasets; 256 GB RAM). This actually works fine with a single dataset, but we wanted to test whether it can handle 4 full datasets. Note that the output below is for 2x 400-image datasets, which seems to work OK.
dials.import as_grid_scan=True grid_size=20,20 directory=.
DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
DIALS 3.14.2-gf955030ae-release
The following parameters have been modified:
input {
  directory = "."
  as_grid_scan = True
  grid_size = 20 20
}
format: <class 'dxtbx.format.FormatNXmxEigerFilewriter.FormatNXmxEigerFilewriter'>
num images: 800
sequences:
  still: 0
  sweep: 0
num stills: 800
Writing experiments to imported.expt
dials.find_spots imported.expt
DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
DIALS 3.14.2-gf955030ae-release
The following parameters have been modified:
input {
  experiments = imported.expt
}
Setting spotfinder.filter.min_spot_size=3
Configuring spot finder from input parameters
Finding strong spots in imageset 0
Finding spots in image 1 to 1...
Setting nproc=128
Setting chunksize=1
Extracting strong pixels from images
Using multiprocessing with 1 parallel job(s)
Found 1827 strong pixels on image 1
Hi @JunAishima; as an initial guess, does the setting dials.import convert_stills_to_sequences=true help at all? It looks like the NXmx file is being picked up as the original "Stills" processing (which dials.stills_process uses), instead of the "Sequence of Stills" form (which is better for running the normal dials tooling on still images).
I am also not certain "as_grid_scan" does anything useful and it could even be actively problematic in this situation.
If you import without that, does it "work right"?
In the case of a set of still images whose metadata is shown as an oscillation dataset, we needed to use the as_grid_scan option. Otherwise, we get nothing useful by the time we try indexing.
Is there a good description for exactly what as_grid_scan does? It seemed to tell DIALS that our dataset is a bunch of stills, which was useful for the way these datasets were collected.
Thanks for this tip - it looks like each dataset is now processed as a group, rather than each image being processed individually! Spot finding was previously taking tens of minutes on a fast computer (and using a lot of RAM), but it now takes just a few minutes!
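For anyone landing here later, the fix discussed above can be sketched roughly as follows. This is only an illustration built from the commands quoted in this thread: it assumes the data files are in the current directory and that the import writes the default imported.expt, as in the original report. The thread does not say whether as_grid_scan=True grid_size=20,20 was kept alongside the new option, so it is left out here. The script only prints the command if DIALS is not on the PATH.

```shell
#!/bin/sh
# Sketch: import stills as a "sequence of stills", then run spot finding.
# Assumption: data live in the current directory (directory=.), as in the report.
IMPORT_CMD="dials.import convert_stills_to_sequences=True directory=."

if command -v dials.import >/dev/null 2>&1; then
    # Run the import, then spot finding on the default output file.
    $IMPORT_CMD && dials.find_spots imported.expt
else
    # DIALS is not installed here, so just show what would be run.
    echo "DIALS not on PATH; would run: $IMPORT_CMD"
fi
```

With this import, the spot finder should report more than one parallel job in its log, rather than the "1 parallel job(s)" seen in the original output.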
I appreciate that the problem is fixed; however, for the record, I think you can also override the rotation on import with geometry.scan.oscillation=0,0.
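As a rough sketch of that alternative, the override would be passed at import time along these lines. The parameter is the one named in the comment above and has not been verified in this thread; directory=. is carried over from the original report as an assumption. The script only prints the command if DIALS is not on the PATH.

```shell
#!/bin/sh
# Sketch: set the oscillation to zero at import so images carry no rotation.
# Assumption: data live in the current directory (directory=.), as in the report.
OSC_IMPORT_CMD="dials.import geometry.scan.oscillation=0,0 directory=."

if command -v dials.import >/dev/null 2>&1; then
    $OSC_IMPORT_CMD
else
    # DIALS is not installed here, so just show what would be run.
    echo "DIALS not on PATH; would run: $OSC_IMPORT_CMD"
fi
```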
Thanks for this as well! Will give it a try next time. Just in time for #2452 it sounds like...
I came across this memory issue on an NXmx file I created from the SACLA CITIUS 20.2M detector. As suggested by @ndevenish, convert_stills_to_sequences=True solved the problem. This problem occurs on non-NXmx still data as well, at least on MPCCD (see SACLA-MPCCD-Phase3-21528-5images.h5 in dials-data).
@ndevenish @graeme-winter
Perhaps we should make convert_stills_to_sequences=True the default. Or mention this in the SSX processing guide at least?