STIR icon indicating copy to clipboard operation
STIR copied to clipboard

Modifications for psmr2024

Open NikEfth opened this issue 1 year ago • 28 comments

Changes in this pull request

Testing performed

Related issues

Checklist before requesting a review

  • [] I have performed a self-review of my code
  • [] I have added docstrings/doxygen in line with the guidance in the developer guide
  • [] I have implemented unit tests that cover any new or modified functionality (if applicable)
  • [] The code builds and runs on my machine
  • [] documentation/release_XXX.md has been updated with any functionality change (if applicable)

NikEfth avatar May 15 '24 19:05 NikEfth

If this commit breaks many tests, I would consider it expected.

NikEfth avatar May 15 '24 19:05 NikEfth

tests fail with

ERROR: Non-TOF data with inconsistent Time-of-Flight bin number - Aborted operation.

KrisThielemans avatar May 16 '24 08:05 KrisThielemans

We can turn this to a warning, for now. Or, drop the old nonTOF altogether. But it will break people reading old headers.

NikEfth avatar May 16 '24 14:05 NikEfth

A few more comments. Please merge master as CI is currently not running due to a conflict.

I hadn't noticed that ...

NikEfth avatar May 16 '24 22:05 NikEfth

ah well. ctests are still failing. I cannot look at this now though.

KrisThielemans avatar May 16 '24 22:05 KrisThielemans

ah well. ctests are still failing. I cannot look at this now though.

I bet it has do to with the new TOF behavior.

NikEfth avatar May 16 '24 22:05 NikEfth

The new utility still needs about a 2 hour to finish the conversion for the demo scanner. It is an improvement.

NikEfth avatar May 17 '24 04:05 NikEfth

I need to get some sleep. Now, we can load the fanProjdata from the disk. I also added some preliminary support for the LM cache loading.

NikEfth avatar May 17 '24 05:05 NikEfth

My test_time_of_flights pass successfully. I made some small changes hopefully the CI is happy now.

NikEfth avatar May 17 '24 15:05 NikEfth

Sadly ./run_test_listmode_recon.sh is failing on the TOF data. Not sure yet why.

KrisThielemans avatar May 17 '24 16:05 KrisThielemans

error is


INFO: Computing sensitivity

INFO: Calculating sensitivity for subset 0

ERROR: TOF code for sensitivity needs work
terminate called after throwing an instance of 'std::runtime_error'
  what():  TOF code for sensitivity needs work

KrisThielemans avatar May 17 '24 17:05 KrisThielemans

Now, I think old nonTOF Scanner templates will not work anymore, as the coincidence window defaults to something non-sensical.

NikEfth avatar May 17 '24 17:05 NikEfth

This is a TOF test using the supplied TOF .hroot.

KrisThielemans avatar May 17 '24 17:05 KrisThielemans

I just ran it and worked fine.


=== Generate attenuation image

WARNING: image duration not set, so time frame definitions will not be initialised

INFO: Processing next shape...
=== Calculate ACFs
=== Creating my_test_lm_frame.fdef (time frame definitions)
=============== Using frame_nonTOF definition =============
=== Unlist listmode data (for comparison)
---- Executable ran ok
=== Create additive sino my_additive_sinogram_frame_nonTOF.hs
---- Executable ran ok
=== Reconstruct listmode data without cache
---- Executable ran ok
=== Reconstruct listmode data with cache and store it on disk
---- Executable ran ok
=== Compare reconstructed images with and without caching LM file
---- This test seems to be ok !
=== Reconstruct listmode data with cache loaded from the disk
---- Executable ran ok
=== Compare reconstructed images without caching LM file and with loading cache from disk
---- This test seems to be ok !
=== Reconstruct projection data for comparison
---- Executable ran ok
=== Compare sensitivity images
---- This test seems to be ok !
=== Compare reconstructed images
---- This test seems to be ok !
=============== Using counts_nonTOF definition =============
=== Unlist listmode data (for comparison)
---- Executable ran ok
=== Create additive sino my_additive_sinogram_counts_nonTOF.hs
---- Executable ran ok
=== Reconstruct listmode data without cache
---- Executable ran ok
=== Reconstruct listmode data with cache and store it on disk
---- Executable ran ok
=== Compare reconstructed images with and without caching LM file
---- This test seems to be ok !
=== Reconstruct listmode data with cache loaded from the disk
---- Executable ran ok
=== Compare reconstructed images without caching LM file and with loading cache from disk
---- This test seems to be ok !
=== Reconstruct projection data for comparison
---- Executable ran ok
=== Compare sensitivity images
---- This test seems to be ok !
=== Compare reconstructed images
---- This test seems to be ok !

============= TOF tests using root_header.hroot ==========

=== Generate attenuation image

WARNING: FactoryRegistry:: overwriting previous value of key in registry.
     key: None

WARNING: image duration not set, so time frame definitions will not be initialised

INFO: Processing next shape...
=== Calculate ACFs
=== Creating my_test_lm_frame.fdef (time frame definitions)
=============== Using frame_TOF definition =============
=== Unlist listmode data (for comparison)
---- Executable ran ok
=== Create additive sino my_additive_sinogram_frame_TOF.hs
---- Executable ran ok
=== Reconstruct listmode data without cache
---- Executable ran ok
=== Reconstruct listmode data with cache and store it on disk
---- Executable ran ok
=== Compare reconstructed images with and without caching LM file
---- This test seems to be ok !
=== Reconstruct listmode data with cache loaded from the disk
---- Executable ran ok
=== Compare reconstructed images without caching LM file and with loading cache from disk
---- This test seems to be ok !
=== Reconstruct projection data for comparison
---- Executable ran ok
=== Compare sensitivity images
---- This test seems to be ok !
=== Compare reconstructed images
---- This test seems to be ok !
=============== Using counts_TOF definition =============
=== Unlist listmode data (for comparison)
---- Executable ran ok
=== Create additive sino my_additive_sinogram_counts_TOF.hs
---- Executable ran ok
=== Reconstruct listmode data without cache
---- Executable ran ok
=== Reconstruct listmode data with cache and store it on disk
---- Executable ran ok
=== Compare reconstructed images with and without caching LM file
---- This test seems to be ok !
=== Reconstruct listmode data with cache loaded from the disk
---- Executable ran ok
=== Compare reconstructed images without caching LM file and with loading cache from disk
---- This test seems to be ok !
=== Reconstruct projection data for comparison
---- Executable ran ok
=== Compare sensitivity images
---- This test seems to be ok !
=== Compare reconstructed images
---- This test seems to be ok !

--------------- End of tests -------------

Everything seems to be fine !
You could remove all generated files using "rm -f my_* *.log"

NikEfth avatar May 17 '24 17:05 NikEfth

I just ran it and worked fine.

not on GitHub Actions, and not for me. Debugging now.

KrisThielemans avatar May 17 '24 19:05 KrisThielemans

Can you try adding TOF mashing factor:= 1 in the root_header. Maybe it will help

NikEfth avatar May 17 '24 20:05 NikEfth

I was a bit behind. It works now! Great.

GHA now has a segfault in find_ML_normfactors3D test.

KrisThielemans avatar May 17 '24 20:05 KrisThielemans

@NikEfth I've pushed 2 parallelisations using reduction, and also some fixes with do_block. I think for reading with apply, we need to write all 1s for the blocks, so I do that now (but don't calculate anything I hope).

Note that the Array::sum() parallelisatoins are straightforward, but not necessarily great. It's going to parallelise at every "dimensions" of the array, and therefore the number of threads could quickly explode.

KrisThielemans avatar May 17 '24 22:05 KrisThielemans

@NikEfth CI seems to be going well. I've pushed a few more parallelisations. Hopefully they help. Somewhat surprisingly, with g++-11 I seem to be able to do far more omp atomic than before. This could be a lot faster than a critical section (hardware lock normally), and can be used in many more places that are otherwise hard. Question is if you need it of course, so timing will be crucial to check.

Unfortunately, the apply currently uses make_fan_data and set_fan_data. I guess you're saying that these are very slow. (I don't quite understand why though). Possibly here atomic could help.

KrisThielemans avatar May 17 '24 22:05 KrisThielemans

Ok, the sum is no longer a problem. One trick fits all .... :D

NikEfth avatar May 18 '24 01:05 NikEfth

One of our actions earlier, to make the test pass, broke the new conversion. Now TOF single bin is never activated.

NikEfth avatar May 18 '24 08:05 NikEfth

One of our actions earlier, to make the test pass, broke the new conversion. Now TOF single bin is never activated.

This seems to be working fine for me. I've done the following experiment:

  • edit https://github.com/UCL/STIR/blob/6b882344b4d5ebf5da8c78dfc44f47244cc02b2c/recon_test_pack/root_header.hroot#L15-L16 with 3 settings for size of TOF bin/timing res
    • 820/400 (original)
    • 120/400
    • 120/40
  • run OSMAPOSL with LM objfun to output sensitivity files sens_TOF

I'll have to think very hard if this is what I expected, but it certainly shows that it gets into the relevant code path, so that's great.

KrisThielemans avatar May 18 '24 11:05 KrisThielemans

good catch. Another case for getting rid of this special handling relying on the generic one...

KrisThielemans avatar May 18 '24 13:05 KrisThielemans

@markus-jehl you're probably not able to, but in case you can, this PR should speed up the ML_norm code a fair amount, so you could potentially try that. It'd be good to have some reassurance on real data, as opposed to our simple test.

KrisThielemans avatar May 18 '24 19:05 KrisThielemans

Ubuntu clang job fails with

usr/bin/ld: CMakeFiles/test_BSplinesRegularGrid.dir/test_BSplinesRegularGrid.cxx.o: in function `.omp_outlined.':
test_BSplinesRegularGrid.cxx:(.text+0x39bf): undefined reference to `__atomic_load'
/usr/bin/ld: test_BSplinesRegularGrid.cxx:(.text+0x39f3): undefined reference to `__atomic_compare_exchange'
clang: error: linker command failed with exit code 1 (use -v to see invocation)usr/bin/ld: CMakeFiles/test_BSplinesRegularGrid.dir/test_BSplinesRegularGrid.cxx.o: in function `.omp_outlined.':
test_BSplinesRegularGrid.cxx:(.text+0x39bf): undefined reference to `__atomic_load'
/usr/bin/ld: test_BSplinesRegularGrid.cxx:(.text+0x39f3): undefined reference to `__atomic_compare_exchange'
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Potentially related to https://gitlab.kitware.com/cmake/cmake/-/issues/23021 but it's weird tst_BSplines links ok.

KrisThielemans avatar Jun 01 '24 19:06 KrisThielemans

test_ML_norm segfaults in the MacOS Debug job

KrisThielemans avatar Jun 01 '24 21:06 KrisThielemans

Potentially related to https://gitlab.kitware.com/cmake/cmake/-/issues/23021 but it's weird tst_BSplines links ok.

Now https://gitlab.kitware.com/cmake/cmake/-/issues/26037

KrisThielemans avatar Jun 09 '24 03:06 KrisThielemans

Linking problem was fixed in #1449 which has taken the sum/find_min/max parallelisation commits, and is now merged to master, so I've merged that back on here.

I suggest we split this PR in 2: one for the sensitivity calculation, and one for the parallelisation of the norm code. We can do that again by cherry-picking relevant commits. Maybe we can resolve things here first though, as you prefer.

KrisThielemans avatar Jun 12 '24 21:06 KrisThielemans