posterior icon indicating copy to clipboard operation
posterior copied to clipboard

Feature Request: Support for `units` objects in `rvar` data type

Open henningte opened this issue 1 year ago • 6 comments

I often use the units package to keep track of measurement units and I think it may be interesting to support units objects with the rvar class. I wanted to know whether there is interest to include this feature into the posterior package.

Currently, it is possible to create rvar objects where the draws attribute is a units object:

suppressPackageStartupMessages(library(posterior))
#> Warning: package 'posterior' was built under R version 4.3.2
suppressPackageStartupMessages(library(units))

a <- rvar(units::set_units(1:5, "g"), dim = 1)
draws_of(a)
#> Units: [g]
#>   [,1]
#> 1    1
#> 2    2
#> 3    3
#> 4    4
#> 5    5
class(draws_of(a))
#> [1] "units"

Created on 2024-12-10 with reprex v2.0.2

And simple arithmetic operations work:

class(draws_of(a + a))
#> [1] "units"

But some operations drop the units class. For example:

class(mean(a)) # should return a units object, but returns a numeric vector
#> [1] "numeric"
class(draws_of(rvar_mean(a))) # should keep the units class, but drops it
#> [1] "matrix" "array"

I only did a couple of quick tests, so I have no complete overview, which operations are supported and which are not supported. I can also imagine that adapting some functions may be more difficult.

henningte avatar Dec 10 '24 14:12 henningte

Thank you for opening this issue! @mjskay since you are the developer of rvar, what are your thoughts on this?

paul-buerkner avatar Dec 17 '24 17:12 paul-buerkner

Hmm, in principle it seems like it should be possible to support other backing datatypes so long as they implement the necessary array operations. If we're going down that road I'd want to think about how to do it in a general enough way that we don't have to explicitly support every array-like datatype under the sun, but instead base it on some consistent extension points.

Part of the problem is that several rvar methods currently use specialized functions under the hood for performance reasons but which only (I think?) work on numeric vectors (e.g. matrixStats::col***, tensorA::mul.tensor()). One solution could be a parameterized type, say rvar[T], where T is a type (like units) implementing the necessary interface. Then we could create a default implementation of rvar methods that do not use specialized functions, and move specialized implementations to (say) rvar[numeric]. A starting point could be to factor out the code that was needed to get factor-like rvars to work, creating an rvar[numeric], rvar[factor], etc.

However, I would not want to do this using multiple dispatch as implemented in {vctrs}, because it is (frankly) a giant hack that will require a lot of boilerplate code (and may not even be doable...). Instead, I think the right solution would be to build it on top of S7. I'm not sure if S7 is quite mature enough yet for some of the things we want to do, but it's close. I have been playing with building parameterized types on top of S7 in another project and it can work pretty well (and I think they may also add some kind of official support for it). The main thing will be making sure we can retain compatibility with {vctrs}, though I assume they will make vctrs S7-compatible at some point since several of the same folks are involved in each.

mjskay avatar Dec 17 '24 19:12 mjskay

Thank you for your thoughts! In this light, I would say we may want to revisit this issue later on once S7 is more stable. Even then, it may not necessarily be a good idea to go this route, since (I assume) it would make the rvar code even more involved. Currently, we already largely rely on you, @mjskay, to maintain this (really cool) feature. And every increase in complexity of rvar may make it harder to maintain for others in case that would be needed at some point.

paul-buerkner avatar Dec 18 '24 08:12 paul-buerkner

@paul-buerkner an excellent point. Perhaps we should make maintainability a criterion of any large overhaul?

I think we could do a shift to S7 where the goal is to make rvar more maintainable, keeping feature parity. Having played with S7 quite a bit now, I think it will be much easier to read and write than vctrs. Then, if the shift to S7 works out, we could consider new features, like supporting other backing data types.

mjskay avatar Dec 18 '24 17:12 mjskay

I think that sounds like a great plan!

Matthew Kay @.***> schrieb am Mi., 18. Dez. 2024, 18:32:

@paul-buerkner https://github.com/paul-buerkner an excellent point. Perhaps we should make maintainability a criterion of any large overhaul?

I think we could do a shift to S7 where the goal is to make rvar more maintainable, keeping feature parity. Having played with S7 quite a bit now, I think it will be much easier to read and write than vctrs. Then, if the shift to S7 works out, we could consider new features, like supporting other backing data types.

— Reply to this email directly, view it on GitHub https://github.com/stan-dev/posterior/issues/385#issuecomment-2551902320, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADCW2ACGDKKZONTWEMJN6UT2GGWRDAVCNFSM6AAAAABTLKN55OVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKNJRHEYDEMZSGA . You are receiving this because you were mentioned.Message ID: @.***>

paul-buerkner avatar Dec 18 '24 19:12 paul-buerkner

I also think this is a reasonable plan. Thank you all for discussing my issue so quickly!

henningte avatar Dec 20 '24 15:12 henningte