NumericTransformer
We're running into a common issue: nodes that should be able to support either DenseVector[Double] or DenseVector[Float] generically can't, because breeze supports these two datatypes inconsistently and converting to/from some input type generically is hard.
@shivaram suggested we create a new "NumericTransformer" API which handles this in a type-safe way and hides some of the complexity under the covers. Things this API might have in it include static checking for input/output types, efficient implicit conversions for the case where conversion is a noop, etc.
More discussion on #74
cc @tomerk
@etrain and @shivaram How would you guys feel about this:
def reducePoint[T : FloatOrDouble](x: DenseVector[T], mat: DenseMatrix[T]): DenseVector[T] = {
  val ops = implicitly[FloatOrDouble[T]]
  import ops._
  (x.t * mat).t
}
also writable as:
def reducePoint[T](x: DenseVector[T], mat: DenseMatrix[T])(implicit ops: FloatOrDouble[T]): DenseVector[T] = {
  import ops._
  (x.t * mat).t
}
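For reference, here's a self-contained sketch of the context-bound typeclass pattern the snippets above rely on. It avoids breeze entirely (plain Arrays stand in for DenseVector), and the names FloatOrDouble and dot are hypothetical, not breeze's actual API:

```scala
// A typeclass capturing just the operations we need, with instances for
// both Double and Float, so one generic method serves both element types.
trait FloatOrDouble[T] {
  def times(a: T, b: T): T
  def plus(a: T, b: T): T
  def zero: T
}

object FloatOrDouble {
  implicit val doubleOps: FloatOrDouble[Double] = new FloatOrDouble[Double] {
    def times(a: Double, b: Double) = a * b
    def plus(a: Double, b: Double) = a + b
    def zero = 0.0
  }
  implicit val floatOps: FloatOrDouble[Float] = new FloatOrDouble[Float] {
    def times(a: Float, b: Float) = a * b
    def plus(a: Float, b: Float) = a + b
    def zero = 0.0f
  }
}

// A dot product generic over Float and Double, analogous to reducePoint:
// the context bound [T : FloatOrDouble] pulls in the right instance.
def dot[T : FloatOrDouble](x: Array[T], y: Array[T]): T = {
  val ops = implicitly[FloatOrDouble[T]]
  var acc = ops.zero
  var i = 0
  while (i < x.length) {
    acc = ops.plus(acc, ops.times(x(i), y(i)))
    i += 1
  }
  acc
}
```

The breeze version is the same idea, except the implicit evidence also has to carry all of breeze's operator implicits, which is where the import ops._ trouble below comes from.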
https://github.com/scalanlp/breeze/issues/404#issuecomment-100368432
Just to make sure - if we had a "NumericTransformer" trait I imagine that both the implicit ops argument and the "import ops._" could happen inside the trait definition? in which case we could write reducePoint[T] pretty much as we had it before, right?
I'm not sure. It's a little unclear to me how scala implicit imports work, and it looks like we have to do the import within the method.
I'm able to get this to work:
case class LinearMapper[T: BreezeNumeric](x: DenseMatrix[T])
    extends NumericTransformer[T] {
  import ops._

  /**
   * Apply a linear model to an input.
   * @param in Input.
   * @return Output.
   */
  def apply(in: DenseVector[T]): DenseVector[T] = {
    x.t * in
  }
}
But I can't seem to get around the import ops._ even by putting it in the parent
This is not bad! I can live with import ops._ once per class.
Hmm with an extra ~100 lines of code in NumericTransformer individually referencing every implicit def in MutableOptimizationSpace I can make it so the subclasses don't need import ops._.
Thoughts?
+1 for a scary unreadable NumericTransformer if it means none of us ever have to think about this again ;)
Until breeze changes the interface of MutableOptimizationSpace. (I have no idea how likely that is though)
Agree with @etrain
How hard would it be to create our own "Space" patterned after MOS? that way we control the interface... i don't know how much that buys us but maybe we can cut out some stuff that isn't necessary, etc.
(This seemed to be what David Hall was suggesting yesterday, anyway)
The problem is actually somewhat unrelated and comes down to something along the lines of the following:
- to capture multiple implicits via a single context bound (or a single parameter of implicit evidence), you need to define the other implicits you're trying to capture inside that one implicit val
- to get the compiler to find the implicits defined within the initial implicit val, you have to import its scope (import ops._)
- scopes imported inside a class body aren't inherited by subclasses (I think this has to do w/ scala treating the class definition like a constructor)
If you google "scala inherit imports" you'll see examples of people being told it isn't possible. The solution is to explicitly capture the implicits we're interested in as implicit defs of the superclass (NumericTransformer) via e.g.
implicit def implicitDef = ops.implicitDef
but we have to do this for every single implicit defined inside MutableOptimizationSpace that we care about (and I have no idea which ones we do vs. don't)
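To make the re-export workaround concrete, here's a minimal self-contained illustration (hypothetical names Show, Ops, Parent, Child; a single implicit stands in for the many implicits in MutableOptimizationSpace). The superclass explicitly re-exposes the implicit it pulled out of the evidence object, so subclasses resolve it without import ops._:

```scala
// A simple typeclass standing in for breeze's operator implicits.
trait Show[A] { def show(a: A): String }

// Evidence object bundling the implicits we want to capture,
// analogous to MutableOptimizationSpace.
trait Ops[T] {
  def showT: Show[T]
}

abstract class Parent[T](val ops: Ops[T]) {
  // Re-export the implicit from ops: because this is a member of the
  // superclass (not an import), subclasses inherit it into implicit scope.
  implicit def showT: Show[T] = ops.showT
}

class Child extends Parent[Int](new Ops[Int] {
  def showT: Show[Int] = new Show[Int] { def show(a: Int) = "int:" + a }
}) {
  // Resolves via the inherited implicit def, with no import ops._ needed.
  def render(x: Int): String = implicitly[Show[Int]].show(x)
}
```

The pain point in the thread is exactly that this `implicit def x = ops.x` line has to be repeated once per implicit in the evidence object.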
Okay, I've run into some issues w/ broadcasting, but hopefully will get a response soon to my Breeze issue.
Side note: It concerns me how much I've learned about scala implicits -_-
Should we remove this from release 0.1 while we wait on https://github.com/scalanlp/breeze/issues/404#issuecomment-100368432?
Removing this from the Release 0.1 milestone - but let's keep our eye on this breeze issue.
@etrain how would you guys feel about an interface kind of like this?
abstract class NumericTransformer[T : BreezeNumeric] extends Transformer[DenseVector[T], DenseVector[T]] {
  def apply(in: DenseVector[T]): DenseVector[T] = {
    apply(in.asDenseMatrix).flatten()
  }

  def apply(in: DenseMatrix[T]): DenseMatrix[T]

  def onMatrix: NumericMatrixTransformer[T] = NumericMatrixTransformer(this)
}

abstract class NumericEstimator[T : BreezeNumeric] extends Estimator[DenseVector[T], DenseVector[T]] {
  def fit(data: RDD[DenseVector[T]]): NumericTransformer[T]

  def onMatrix: NumericMatrixEstimator[T] = NumericMatrixEstimator(this)
}

case class NumericMatrixTransformer[T : BreezeNumeric](x: NumericTransformer[T]) extends Transformer[DenseMatrix[T], DenseMatrix[T]] {
  def apply(in: DenseMatrix[T]): DenseMatrix[T] = x.apply(in)
}

case class NumericMatrixEstimator[T : BreezeNumeric](x: NumericEstimator[T]) extends Estimator[DenseMatrix[T], DenseMatrix[T]] {
  def fit(data: RDD[DenseMatrix[T]]): NumericMatrixTransformer[T] = {
    x.fit(data.flatMap(MatrixUtils.matrixToRowArray)).onMatrix
  }
}
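To show the shape of this proposal without breeze or Spark on the classpath, here's a minimal runnable sketch (hypothetical names RowTransformer and scaleBy2; Array[Double] and Array[Array[Double]] stand in for DenseVector and DenseMatrix) of deriving the per-row apply from a required batch apply:

```scala
// The batch (matrix) apply is abstract; the per-row (vector) apply has a
// default implementation that wraps the row as a one-row batch.
abstract class RowTransformer {
  def apply(in: Array[Double]): Array[Double] = apply(Array(in)).head
  def apply(in: Array[Array[Double]]): Array[Array[Double]]
}

// Implementors only write the batch version; the row version comes for free.
val scaleBy2 = new RowTransformer {
  def apply(in: Array[Array[Double]]): Array[Array[Double]] =
    in.map(_.map(_ * 2.0))
}
```

The overloads work here because the two array types erase to distinct JVM types, just as DenseVector and DenseMatrix are distinct classes in the real interface.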
I'm also starting to wonder if maybe using DenseVector as the canonical features representation instead of a DenseMatrix with only one row is more trouble than it's worth.
Especially because it's easier to translate Matlab code to work on DenseMatrix than DenseVector
I think it's a good idea to keep a separation between Matrix and Vector - while a matrix generalizes a vector, it's good for interfaces to reflect the logical intent of the operators, and probably more natural for Transformer developers to be operating on DenseVector than DenseMatrix.
Also - we should shy away from choosing interfaces because it makes translating from Matlab easier. Instead, let's try and focus on having the best possible interfaces so that we don't accidentally assume some of Matlab's deficiencies.
Fair enough. With that in mind how do you feel about the above interfaces?
I guess you would prefer requiring the single vector impl to be implemented and having a default matrix impl instead of the above?
Yeah, that could work. In general I thought this issue was about centralized handling of the Double vs. Float problem - is this addressed via BreezeNumeric and if so, how does that work?
It is, but there are enough cases where we want to do an "on matrix" / batch operation on what's otherwise a normal NumericTransformer that I think this is a relatively reasonable place to include that functionality (and allow overriding for optimized implementations).
Well, keep in mind that most of the time we're talking about batching matrix operations together, it's probably going to mean converting an entire partition into a row matrix and allowing an override of the default behavior to operate on that matrix.
but yes - i agree this is a fine place for that functionality!
There's that case and there's all the cases in the image pipelines, or e.g. LDA on the frames in the TIMIT pipeline
Right, right. I think whatever we figure out for the DenseVector case is also going to work in the DenseMatrix case (w.r.t. the Float/Double thing) and I'm unsure if we want to put those all under the "NumericTransformer" umbrella or not. This actually might be a reasonable place to think about using mixins.
So, I actually looked at mixins for this previously. To mix something in you have to construct it with the "new" keyword, so you can't go through a case class's apply (or any other factory in a companion object).
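A small illustration of that constraint (hypothetical names Logging and Base):

```scala
// A trait we'd like to mix in at the use site.
trait Logging { def log(msg: String): String = "[log] " + msg }

class Base(val name: String)

// Works: anonymous mixin composition requires explicit `new`.
val withLogging = new Base("x") with Logging

// There is no analogous `Base("x") with Logging` spelling for a
// case class's apply: the factory returns an already-constructed
// object, so there's no point at which to add the mixin.
```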