Abundance matching is very confusing.
I want to generate a selection column for a given abundance, ranked by another column. The current ad-hoc way of doing so has confused Elena, and I think it will confuse others down the road as well.
This can be accomplished in two steps:
- Generate an Abundance column from another column, using the ranking (a collective argsort), and maybe divide by the volume to convert to number density.
- Generate a selection column from an Abundance column.
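A minimal single-process sketch of the two steps, using plain NumPy rather than nbodykit or a collective argsort. The function names `abundance_from_rank` and `selection_from_abundance` are hypothetical, and ranking in descending order (largest value gets the smallest abundance) is an assumption about the intended convention:

```python
import numpy as np

def abundance_from_rank(values, volume=None):
    """Cumulative abundance N(>= value), ranked in descending order.

    Hypothetical helper: serial stand-in for a collective argsort.
    """
    values = np.asarray(values, dtype="f8")
    order = np.argsort(-values, kind="stable")   # largest value first
    rank = np.empty(len(values), dtype="i8")
    rank[order] = np.arange(1, len(values) + 1)  # rank 1 = largest value
    abundance = rank.astype("f8")
    if volume is not None:
        abundance /= volume                      # convert to number density
    return abundance

def selection_from_abundance(abundance, threshold):
    """Boolean selection column: keep objects at or below the threshold."""
    return np.asarray(abundance) <= threshold

mass = np.array([3e12, 1e14, 5e11, 2e13, 8e12])
volume = 1000.0
abundance = abundance_from_rank(mass, volume=volume)
# select the three most massive objects
selection = selection_from_abundance(abundance, 3 / volume)
```

In a parallel setting the ranking step would have to be collective; only the second step is a purely local comparison.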
How does this proposal interfere with halotools and the current Halo source? @nickhand
Yes, I think that is fine. The to_halotools function of the HaloCatalog source accepts a selection keyword that specifies which halos go into the catalog. So you just need to add the Selection column based on the Abundance to the HaloCatalog, and then only those halos will be populated.
@rainwoodman where are we on this? Do we want to add a sort() / argsort() to Catalog objects? Feels like that would be nice now that selecting subsets of catalogs is a bit easier.
We should also think through whether implementing actual slices or integer lists is useful and whether that should be collective or non-collective. I can imagine sorting by mass and then saying give me the top X objects, but that could be difficult in parallel....
Computing a sorting rank and then filtering is easier than sorting the actual data.
The sorting rank can be computed with two mpsort calls. The problem is that mpsort only takes integer keys (it is a radix sort, and the dynamic range of a double is too big).
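As a rough serial illustration of the "two sorts" idea, the sketch below uses `np.argsort` in place of mpsort (it does not call mpsort and makes no claim about its API): the first sort orders by key while carrying the original index, and the second sort orders by original index while carrying the rank.

```python
import numpy as np

keys = np.array([40, 10, 30, 20], dtype="u8")  # integer sort keys
index = np.arange(len(keys))                    # original positions

# First sort: order by key, carrying the original index along.
first = np.argsort(keys, kind="stable")
sorted_index = index[first]
rank_in_sorted_order = np.arange(len(keys))     # rank 0 = smallest key

# Second sort: order by original index, carrying the rank along.
second = np.argsort(sorted_index, kind="stable")
rank = rank_in_sorted_order[second]

# Filtering on the rank selects the top-2 objects without moving the data.
top2 = rank >= len(keys) - 2
```

The data columns themselves are never reordered; only the small (key, index, rank) records are sorted.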
@rainwoodman so if we want to sort by something like mass, how would we do this exactly with mpsort? We would need to make a 'u8' sort rank column first?
Yes. I'd first suppress the dynamic range with a log, then scale it up to integers. The result won't always be exactly sorted, because several floating-point numbers may map to the same integer (hence I did not think it was a good idea to let mpsort do this). It should be good enough for abundance matching.
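A hedged sketch of that mapping, assuming strictly positive input and a hypothetical helper name `integer_sort_keys`; the `nbits` parameter is also an illustration, not something from the thread. With few bits, nearby values collapse onto the same key, which is the inexactness described above:

```python
import numpy as np

def integer_sort_keys(values, nbits=32):
    """Map positive floats to unsigned integer keys via log10 + linear scaling.

    Hypothetical helper. In a parallel run, lo and hi would need to be the
    global minimum and maximum across all ranks, not the local ones.
    """
    logv = np.log10(np.asarray(values, dtype="f8"))
    lo, hi = logv.min(), logv.max()
    span = hi - lo if hi > lo else 1.0          # avoid division by zero
    scale = (2**nbits - 1) / span
    return np.round((logv - lo) * scale).astype("u8")

mass = np.array([1e12, 1.0001e12, 1e13, 1e15])  # already in ascending order
keys_coarse = integer_sort_keys(mass, nbits=8)   # first two values collide
keys_fine = integer_sort_keys(mass, nbits=32)    # first two values separate
```

The keys are non-decreasing in the input value, so ties are the only way the resulting order can differ from an exact sort.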
Okay, I think I understand what's going on here. Some simple tests seem to indicate that we can do something like:
import numpy as np
precision = '4'
sorting_keys = np.frombuffer(data.astype('f'+precision).tobytes(), dtype='u'+precision)
This should reinterpret the floating-point binary representation as integers, which preserves the rank ordering for positive input. We can also take the log of the data first if we think that is necessary.
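A small check of the ordering claim only, using `ndarray.view` as an equivalent way to reinterpret the bits. The sample values are made up for illustration, and the check says nothing about whether such keys are suitable input for mpsort:

```python
import numpy as np

# Positive input: the unsigned-integer view sorts in the same order.
data = np.array([0.5, 1.0, 2.5, 1e10, 3e-5], dtype="f4")
keys = data.view("u4")                  # same bytes, read as uint32
same_order = np.array_equal(np.argsort(keys), np.argsort(data))

# Mixed-sign input: the sign bit makes negative values look largest.
mixed = np.array([-1.0, 1.0], dtype="f4")
mixed_keys = mixed.view("u4")
negative_sorts_last = bool(mixed_keys[0] > mixed_keys[1])
```

The second part shows why the statement is restricted to positive input: with the sign bit set, a negative float maps to a larger unsigned integer than any positive one.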
No, this will not work. mpsort does not only need the rank order. It needs the radix to be numerical. It does a binary search for histogramming. This will mess up the exponents.
(Replying by email to Nick Hand's comment of Thu, May 4, 2017 at 2:06 PM.)