Implement per-axis periodicity and box size
This PR allows the user to specify a periodic box length in each dimension, rather than assuming a cube. Internally, Corrfunc already supports this, so the only necessary change was to add support for passing a Python tuple as the box size instead of a scalar.
This change is API and ABI backward-compatible. Currently, I've only implemented it for theory.DD, but all the other modules continue to work without modification.
@johannesulf, would you be willing to test this PR? I've tested that boxsize=L and boxsize=(L,L,L) get the same answer, but I haven't actually verified the code gets the right answer for Lx!=Ly!=Lz!
Will close #247.
Hello @lgarrison! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:
Comment last updated at 2022-07-19 15:10:00 UTC
Fantastic! I'll check this PR against halotools in the next few days and report back.
I currently don't find agreement with halotools. Take the following code that counts pairs in spheres with radius of 10 in a uniformly sampled, non-cubic box.
import Corrfunc
from Corrfunc.theory.DD import DD
from halotools.mock_observables import tpcf
N = 100000
boxsize = (100.0, 100.0, 50.0)
nthreads = 4
autocorr = 1
seed = 42
np.random.seed(seed)
X = np.random.uniform(0, boxsize[0], N)
Y = np.random.uniform(0, boxsize[1], N)
Z = np.random.uniform(0, boxsize[2], N)
r_bins = np.array([1e-10, 10])
weights = np.ones_like(X)
results = DD(autocorr, nthreads, r_bins, X, Y, Z, weights1=weights,
weight_type='pair_product', boxsize=boxsize, periodic=True)
pos = np.vstack([X, Y, Z]).T
xi = tpcf(pos, r_bins, period=boxsize)
print('Number of pairs...')
print('halotools: {:.0f}'.format(
(xi + 1) * 4 * np.pi / 3 * r_bins[-1]**3 / np.prod(boxsize) * N**2))
print('Corrfunc: {:.0f}'.format(results['npairs'][0]))
print('Expectation: {:.0f}'.format(
N**2 * 4 * np.pi / 3 * r_bins[-1]**3 / np.prod(boxsize)))
I get agreement with halotools and the expectation value (within error) for cubic boundary conditions. For the above code with a non-cubic box, Corrfunc currently does not agree with halotools. However, if I only consider points with Z<40, there is agreement with halotools. So it seems that the periodicity along the z-axis might not be taken into account by Corrfunc in the above code.
Awesome! Just tested the updated version and everything looks good.
The current PR doesn't implement non-cubic boundary conditions for Corrfunc.theory functions other than DD, right? Is it possible to also add it for the other pair counters, especially DDsmu?
Great! And yes, I haven't ported this feature to the other modules yet. It's not too bad, just involves echoing the _countpairs.c change to each module in that file (and updating the docstrings), and then updating each theory/*/*impl.c.src with the snippet from theory/DD/countpairs_impl.c.src. The last bit is updating the Corrfunc/theory/*.py interfaces (and docstrings).
@manodeep, can you review the change to DD before we proceed to the other modules?
I took one quick look and noted some comments, but I need to look more closely. Different boxsizes per dimension might impact the min-separation + periodic features, particularly in the calculation of the cell-pairs.
I agree, if you could check those areas of the code, that would be great. Fortunately, there were few places in the code that used boxsize—most functions take xdiff, ydiff, and zdiff separately. And Corrfunc has always supported non-cubic periodic boxes whenever boxsize=0 was used, although that code path was probably less tested. It's also reassuring that non-cubic DD agrees with halotools exactly!
I've implemented the semantics from https://github.com/manodeep/Corrfunc/pull/276#discussion_r882985310. Specifically, a boxsize of -1 along any dimension now means that dimension is not periodic. The surface area of the change was pretty minimal; it just affected the generation of the cell pairs.
I've tested that turning off periodicity per-dimension works by comparing to the periodic answer where the boxsize along that dimension has been extended beyond the particle distribution, such that it is effectively non-periodic. The answers agree exactly.
@manodeep, can you take a look?
Thanks for the review; I think we've uncovered an unintended behavior in the binsize calculation predating this PR. Can you take a look at https://github.com/manodeep/Corrfunc/pull/276#discussion_r888974272 and see if you agree?
Hey @manodeep, can you take a look at this when you get the chance?
I had looked at the comment and it seems fine. (Realised just now that I never "submitted" the comment - so it was pending ...)
Great! The code now uses the spatial extent instead of the boxsize, and all the tests are passing, so I think we're ready to port to the other modules. Will work on that next week.
@manodeep: xi and wp take a double boxsize argument in their C API, so they actually ignore options->boxsize. Do you think we should leave xi and wp alone in this PR, or should they also support non-cubic (periodic) boxes? If so, do you think we should add double boxsize_y, double boxsize_z to the C function, or remove double boxsize and only use options->boxsize?
We could leave the C command line as is but change the shared library to use options->boxsize. This does result in a divergence of supported options from the command-line vs the shared library.
Either way, if we support non-cubic boxes, then the normalization calculation within wp and xi will have to be updated.
The other route would be to remove the boxsize argument from the command-line. In which case, perhaps it would call for a major release rather than a minor.
(We have accumulated quite a few upgrades - so a major release could be in order)
Not sure if this answers your question - happy to discuss further to refine
I think I'd be okay with declaring that xi and wp only support cubic boxes. They are already restricted to periodic boxes, and we can point people to DD and DDrppi if they need the fully flexible, non-cubic and/or non-periodic options. It seems like overkill to change multiple levels of the API and add and test this new feature when there are good alternatives available!
I am fine with that - cubic periodic boxes should be the most common and the intended target for wp and xi anyway.
Just so I understand properly - we will keep wp and xi as is and only add the non-cubic feature to the other pair counters
Yes, exactly!
@johannesulf This PR should now support per-axis boxsize and periodicity for the other modules, including DDsmu. Would you like to check that DDsmu works as expected before we merge?
@johannesulf This PR should now support per-axis boxsize and periodicity for the other modules, including
DDsmu. Would you like to check thatDDsmuworks as expected before we merge?
Yes, wonderful! Let me do a few checks and report back to you.
I performed some initial tests of DDsmu against halotools using per-axis periodicity and the results look good.
Excellent, thanks. @manodeep, good to merge?
I did some more in-depth tests and now I get some errors but only sometimes. Let me see whether I can find out what exactly triggers it.
@lgarrison I get errors when the number of tracers is very small. For example, if I take my code above and set N=2, it'll crash for me. I'm not sure whether this is supposed to happen. In principle, with a maximum search radius of 10 and box dimensions of 100, 100 and 50, no tracer should be counted twice.
../../utils/gridlink_utils_double.c> ERROR: nlattice = 2 is so small that with periodic wrapping the same cells will be counted twice ....exiting
../../utils/gridlink_utils_double.c> Please reduce Rmax = 10.000000 to be a smaller fraction of the particle distribution region = 13.333546
../../utils/gridlink_utils_double.c> ERROR: nlattice = 1 is so small that with periodic wrapping the same cells will be counted twice ....exiting
../../utils/gridlink_utils_double.c> Please reduce Rmax = 10.000000 to be a smaller fraction of the particle distribution region = 0.001206
Received xstatus = 0 ystatus = 1 zstatus = 1. Error
@johannesulf Corrfunc requires a minimum of 3 cells per dimension and so will crash based on the calculated number of cells -- check is here. It looks like the actual particle extent is significantly smaller than 100 (or 50) - e.g., the two tracers are separated by ~13 Mpc in Y (-> only 2 cells) and ~1e-3 in Z (-> only 1 cell).
You can avoid this crash by explicitly passing the boxsize.
I just checked, the crash still happens if the boxsize is passed (as Johannes' above example does). But this makes sense, because we changed the gridding to only span the particle extent, whereas before it was using the boxsize. So now, regardless of the boxsize, we may have few cells if the particle extent is narrow.
I think the fix is actually pretty simple: only apply the 3 cell minimum if any pairs can be counted across the wrap; that is, if xdiff + Rmax >= xwrap. This is actually a little conservative, but I think will cover almost all practical cases.
Works much better. But I'm getting an out-of-bounds error when calculating an auto-correlation function with e.g. Corrfunc.theory.DD using just 1 particle, even if the particle is within the bounds.
Wonderful! Everything looks good now.
Great! @manodeep do you want to check my last two commits?
Thanks @manodeep, I've added the warning and the parentheses as you suggested.
I'm not sure I follow the suggestion about the inverse xdiff. Wouldn't using the boxsize when xdiff is zero just create a lot of empty cells to process? And if the user requested automatic detection of the particle extent, then we don't even have a sensible non-zero value of the boxsize to use.
I think the code handles the planar case fine: once all the particles are gridded, it no longer matters that they were in a plane, or even what the size of their containing cell is. The binsize calculation will still raise an error if the user tries to apply periodicity in that dimension that could cause repeat counts.