make `invert-categorical-map` more strict on unknown reverse mapping values
To make the categorical-mapping code less brittle, I think we should check and fail in more situations. Here is one of them:
(require '[tech.v3.dataset.categorical :as ds-cat]
         '[tech.v3.dataset.modelling :as ds-mod]
         '[tech.v3.dataset :as ds])

(def cat-map
  (-> (ds/->dataset {:a [:x :y]})
      (ds-cat/fit-categorical-map :a)))

(ds-cat/invert-categorical-map (ds/->dataset {:a [0.342 1.6657]})
                               {:src-column :a
                                :lookup-table (:lookup-table cat-map)})
The initial mapping was derived as x -> 1 and y -> 0, but the current code happily maps 0.342 back to a category. In my view this should fail in the same way that other numbers such as 3 and 4 fail: "Unable to find src value for numeric value 0.342"
I'm not really sure what to do here. If you had chosen values that do not truncate to 0 or 1 under the long cast, you would have gotten an exception. Perhaps we should use Math/round as opposed to a pure long cast.
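To make the difference between the two conversions concrete, here is a minimal sketch in plain Clojure. It uses no tech.v3.dataset calls; `example-values` is a hypothetical name holding the two numbers from the report, and the sketch assumes the lookup table contains only the indices 0 and 1.

```clojure
;; The two values from the report above.
(def example-values [0.342 1.6657])

;; A plain `long` cast truncates toward zero, so both values
;; land on an index that exists in the lookup table.
(def truncated (mapv long example-values))
;; => [0 1]

;; Math/round rounds to the nearest long, so 1.6657 becomes 2,
;; which is not in a lookup table that only holds 0 and 1.
(def rounded (mapv #(Math/round (double %)) example-values))
;; => [0 2]
```

Under these assumptions, rounding would make 1.6657 fail, but 0.342 would still be accepted as 0, so rounding alone does not reject every non-integral input.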
This looks error-prone to me, but I am not sure what to fix either. The mapping back below works because of the long cast:
(-> (ds/->dataset {:x [:a :b]})
    (ds/categorical->number [:x])
    :x
    meta
    :categorical-map
    :lookup-table)
;; => {:a 0, :b 1}
| :x |
|----:|
| 0.0 |
| 1.0 |
I would expect the above to produce the lookup map {:a 0.0, :b 1.0}, and all values except 0.0 and 1.0 to fail when mapping back.
The issue there is floating-point comparison.
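One way to sidestep comparing floating-point keys would be to keep the integer lookup table and only accept a double when it is exactly integral. The following is a sketch of that idea, not the library's implementation: `strict-invert` is a hypothetical helper, and the error message simply mirrors the one quoted earlier in this thread.

```clojure
(require '[clojure.set :as set])

(defn strict-invert
  "Hypothetical helper, not part of tech.v3.dataset.
  Returns the category for numeric value x, or throws when x is
  not exactly an integer index present in lookup-table."
  [lookup-table x]
  (let [d       (double x)
        idx     (long d)                        ; truncated candidate index
        inverse (set/map-invert lookup-table)]  ; {0 :a, 1 :b}
    ;; (== d idx) is true only when truncation lost nothing,
    ;; i.e. the double was already integral.
    (if (and (== d idx) (contains? inverse idx))
      (get inverse idx)
      (throw (ex-info (str "Unable to find src value for numeric value " x)
                      {:value x})))))

(strict-invert {:a 0, :b 1} 1.0)
;; => :b
```

With this check, 0.0 and 1.0 map back, while 0.342, 1.6657, and 3.0 would all throw. Whether exact integrality is the right acceptance rule for the library is an open design question.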