mathjs icon indicating copy to clipboard operation
mathjs copied to clipboard

Added type inferencing for Vector & Matrix operations in multiply.js (+performance boost!)

Open RandomGamingDev opened this issue 2 years ago • 11 comments

The Issue

mathjs has had an "issue" with performance (some of the first search results for "mathjs is slow":

  • https://stackoverflow.com/questions/54328275/math-js-is-slow-to-multiply-2-big-matrices
  • https://www.reddit.com/r/javascript/comments/275fi9/comment/iyj6t2n/?utm_source=share&utm_medium=web2x&context=3

Except it doesn't.

Why the slow performance and all the complaints then?


mathjs is quite powerful, but the idea that you should explicitly set the numeric type of your tensors is something that people simply don't come across when reading the documentation (here are some pages from the documentation):

  • https://mathjs.org/docs/reference/functions/multiply.html
  • https://mathjs.org/docs/datatypes/matrices.html

In fact, mathjs's documentation makes it look like explicitly defining the numeric type is completely unnecessary because of the default type assignment in the config of mathjs:

  • https://mathjs.org/docs/datatypes/numbers.html "Most functions can determine the type of output from the type of input: a number as input will return a number as output, a BigNumber as input returns a BigNumber as output. Functions which cannot determine the type of output from the input (for example math.evaluate) use the default number type, which can be configured when instantiating math.js"

This leads to extremely poor performance when people try to use mathjs:

  • Rendering 1000 hyper-cubes rotated around 2 axis using vanilla mathjs, no explicit numeric types (slow & avgs ~40ms/frame on my machine): https://editor.p5js.org/PotatoBoy/sketches/uPZZwvhgC

Which wouldn't exist had these explicit numeric types been used:

  • Rendering 1000 hyper-cubes rotated around 2 axis using vanilla mathjs with explicit numeric types (fast & avgs ~1-2ms/frame on my machine): https://editor.p5js.org/PotatoBoy/sketches/DsDHNojgz

Why this wasn't done before now and how this PR fixes the issue that caused it

It seems like adding type inferencing was previously considered, but then ignored because of an issue with changing element types:

  • Issue: https://github.com/josdejong/mathjs/issues/1709
  • Issue's issue: https://github.com/josdejong/mathjs/issues/1709#issuecomment-590008781

This PR fixes this by getting the type during the tensor operation itself

  • <tensor>.getDataType() is called to infer the datatype if it wasn't explicitly defined
  • While getting inferring the type every operation may seem slow, compared to the alternative of using the generic function, it's much much faster as seen by the previous examples and the solution's example and is well worth it for the performance.
  • The datatype of the generated tensor using the inferred datatype is changed right before returning it, but not for the instantiation parameter in order to future-proof it in case future optimized methods of loading tensor data are used so that they can be used for the operation.

The Solution this PR provides

Type inferencing boosts the performance to speeds nearly indistinguishable from the speeds of explicit types

  • Rendering 1000 hyper-cubes rotated around 2 axis using type inferencing mathjs, no explicit numeric types (fast & avgs ~1-2ms/frame on my machine): https://editor.p5js.org/PotatoBoy/sketches/01IsuOlTR

Lastly, I'd still recommend that the documentation more readily states and promotes the existence of these explicit types, but I'm not sure whether or not that would fit in with what mathjs is trying to go for in its documentation, and it mostly likely won't be necessary with type inferencing added so I won't be adding documentation doing that in this PR.

RandomGamingDev avatar Feb 07 '24 02:02 RandomGamingDev

Thanks for working out this PR @RandomGamingDev . You're right, matrix operations are way faster when there is a datatype configured in the matrix. Currently the default is to leave it undefined and assume the matrix may have mixed content, which is slow.

So, good point to see how we can address this and make the "default" performance better.

Some thoughts:

  • Your solution to improve multiply will work, but only for multiply, not for other matrix operations.
  • So, I think we should look for a more generic solution.
  • The most logic one is to ensure call getArrayDataType(data) when constructing the matrix, ensuring it always has _datatype defined.
  • This will make creating a matrix slower, and I would love to have an idea on the performance impact: how much slower matrix creation becomes compared to how much we win with typical matrix operations (multiplication, addition, etc).

@gwhitney do you have any thoughts in this regard?

josdejong avatar Feb 07 '24 08:02 josdejong

The most logic one is to ensure call getArrayDataType(data) when constructing the matrix, ensuring it always has _datatype defined. This will make creating a matrix slower, and I would love to have an idea on the performance impact: how much slower matrix creation becomes compared to how much we win with typical matrix operations (multiplication, addition, etc).

The issue with that is the reason why this wasn't done before when talked about in the previous issue which is mathjs users might change the numeric type of part of the tensor and then be confused why it isn't working:

It seems like adding type inferencing was previously considered, but then ignored because of an issue with changing element types:

  • Issue: https://github.com/josdejong/mathjs/issues/1709
  • Issue's issue: https://github.com/josdejong/mathjs/issues/1709#issuecomment-590008781

Also, the performance issue of getting the numeric type every operation is offset by the performance gained.


Your solution to improve multiply will work, but only for multiply, not for other matrix operations. So, I think we should look for a more generic solution.

Yes, it's true that this PR will only improve multiply, but I don't think a more generic solution would work well. This solution is already generic enough and basically the only other operations that I see that benefit from type inferencing are here:

  • https://github.com/josdejong/mathjs/tree/develop/src/type/matrix/utils
  • https://github.com/josdejong/mathjs/blob/9fe4ffee5f85d52fd7f3e49abd204edffab64f94/src/function/matrix/dot.js

The type inferencing solution works for these as well. I'll add type inferencing for them as well.

RandomGamingDev avatar Feb 07 '24 14:02 RandomGamingDev

I could believe either of the following could be a reasonable path forward

(a) datatype be set on creation, and then any operation that would make it mixed fails unless the type is explicitly set to mixed, because we all know that 99.9% of the use is not actually mixed type; or

(b) @RandomGamingDev is right and it only matters for a handful of operations and so it's OK just to check for uniform type on every call to such operations.

Conceptually, I agree with Jos that (a) is "cleaner" but it may not be worth the trouble. One question: if we go (b), is there a minimum size for the matrix below which it's not worth checking (because we don't get back enough performance payoff to make it worth the check? Just wondering. Maybe multiplying a mixed type matrix is just so bad that even for 2x2 matrices, it's worth checking for uniform type.

gwhitney avatar Feb 07 '24 17:02 gwhitney

Conceptually, I agree with Jos that (a) is "cleaner" but it may not be worth the trouble.

The main issue I have with (a) is with backwards compatibility and debugging and because the _data parameter (despite looking like it should be private due to the prepended underscore) is modified and used directly in programs using mathjs. Changing it so that changing the types causes an error would break back-compatibility for these programs and would mean having to detect variable modifications as soon as they happen (since it sounds like you want the wrong modification of a tensor to instantly throw an error), which was the issue that stopped type inferencing from getting added in the first place and although it appears possible using certain features of JS, it looks like an overly bulky and unnecessary solution to me. Because of these reasons I don't think that (a)'s choice of AOT (ahead of time) type inferencing on tensor creation would work as well as (b)'s choice of JIT (just in time) type inferencing on tensor operation.

One question: if we go (b), is there a minimum size for the matrix below which it's not worth checking (because we don't get back enough performance payoff to make it worth the check? Just wondering. Maybe multiplying a mixed type matrix is just so bad that even for 2x2 matrices, it's worth checking for uniform type.

It might not be that bad, but even for a 2x2 matrix multiplication that's still using the general function and inferring the type for every single operation, so realistically the JIT method of just checking at the start using the decently efficient <tensor>.getDataType() method is going to be the more efficient option unless the type is mixed, in which case the performance difference is tiny compared to the cost of using the general function to infer the type for every number of the tensor which is a change that I'd say is well worth it if it means that all other tensors without an explicit numeric type get multitudes faster and so that newcomers to the library won't get confused by the "terrible" performance.

RandomGamingDev avatar Feb 07 '24 18:02 RandomGamingDev

the _data parameter (despite looking like it should be private due to the prepended underscore) is modified and used directly in programs using mathjs.

That's surely not a "supported" use of the mathjs api, or at least it very much should not be. The existence of such usage shouldn't be a factor in the choice between (a) and (b). That's not saying I am pushing hard for (a) over (b); I am leaving that up to Jos, and I'd support either direction. I don't see that it's too costly for the official api to check any matrix-modifying calls to see if they respect its current datatype.

gwhitney avatar Feb 07 '24 18:02 gwhitney

That's surely not a "supported" use of the mathjs api, or at least it very much should not be.

Yeah I agree that it shouldn't be since it should be private as shown by the prepended underscore.

The existence of such usage shouldn't be a factor in the choice between (a) and (b).

Might be a bit of a controversial opinion for some people, but I respect not respecting bad and illegal usage of a library lol

However, despite the proper setter being documented, it can be hard for many to find since it doesn't have its own section, nor an emphasis on its importance, and I think that it's important to be aware of it since the fact is that such usage is not rare, but common (here are a few examples from just a quick google search):

  • https://stackoverflow.com/a/68129698
  • https://stackoverflow.com/a/48405237
  • https://stackoverflow.com/a/65725773
  • https://gist.github.com/richwandell/208ddcb4b7c77b3489caae7a4cf54fc9
  • https://gist.github.com/proclamo-zz/97ae6e6a64c48b8f4e87
  • https://gist.github.com/sdjacobs/cff1db0bd1ac17fbd20f
  • https://snyk.io/advisor/npm-package/mathjs/functions/mathjs.range
  • https://www.geeksforgeeks.org/create-a-graph-plotter-using-html-css-and-javascript/
  • https://lab.avl.la/fourier/

I am leaving that up to Jos, and I'd support either direction.

I also support both as long as something happens to make operations with untyped tensors faster since it's causing a lot frustration with math.js, especially from newcomers, although I do still believe that the JIT approach from this PR is the best route to go and comes with little if any issues.

I'll add type inferencing to the other operations in this PR if we get the go-ahead from @josdejong.

RandomGamingDev avatar Feb 07 '24 22:02 RandomGamingDev

it can be hard for many to find since it doesn't have its own section,

Well, I mean these days there is https://mathjs.org/docs/datatypes/matrices.html#getting-and-setting-a-value-in-a-matrix And many of your links explicitly say "well, you could look at _data, but you really shouldn't" -- and they are right! So I maintain my position that these direct-usages-of-internals should not be a factor in Jos's decision, but that nevertheless, this could go either way.

gwhitney avatar Feb 07 '24 22:02 gwhitney

Well, I mean these days there is https://mathjs.org/docs/datatypes/matrices.html#getting-and-setting-a-value-in-a-matrix

I meant giving it its own page or some extra emphasis in the matrices section so that they see it before they decide to just print out the tensor and use _data

And many of your links explicitly say "well, you could look at _data, but you really shouldn't" -- and they are right!

Yes, and again:

I respect not respecting bad and illegal usage of a library lol

I just wanted to give more information to be aware of regarding how common this usage is whether or not it plays a role in the end decision.

RandomGamingDev avatar Feb 07 '24 23:02 RandomGamingDev

Thanks for your inputs guys. I'll give this some more thought next week.

I would like to get a more clear picture on the tradeoffs and how this will work out for typical use cases. If we save more in matrix operations than it costs to run getMatixType it can be worth it, but at this point it is a bit of a wild guess to me. A bit of benchmarking to get a feel for this would be helpful. And alongside that, I want to get a clear picture on whether this is only interesting for multiply or for many operations (I expect the latter).

josdejong avatar Feb 09 '24 13:02 josdejong

And alongside that, I want to get a clear picture on whether this is only interesting for multiply or for many operations (I expect the latter).

https://github.com/josdejong/mathjs/pull/3149#issuecomment-1932170349 The other operations that I see that benefit from type inferencing are here (if you choose to go down the JIT path I'll implement type inferencing for them right away :D):

  • https://github.com/josdejong/mathjs/tree/develop/src/type/matrix/utils
  • https://github.com/josdejong/mathjs/blob/9fe4ffee5f85d52fd7f3e49abd204edffab64f94/src/function/matrix/dot.js

RandomGamingDev avatar Feb 09 '24 13:02 RandomGamingDev

@RandomGamingDev I've been doing some performance testing and had a better look at your PR. This looks really promising, I can't wait to get this PR finished and merged 😎 .

Some thoughts/questions: 1. I didn't find any performance issues with most functions or operators. I noticed though that the .map() method of DenseMatrix is relatively slow. This is a off topic I think, we should address that in a separate PR (if there is room for improvement). In any case, I would like to proceed with this PR. 2. The functions multiply, divide, and pow are quite slow. Your PR improves the performance of all three of these functions dramatically, this awesome! 3. Do you see low hanging fruit to improve divide too? 😁 4. I made some inline comments in the code, can you have a look at those?

  1. Yeah, that sounds like something that could use improvement, especially if a lot of other functions use it for iteration.
  2. :D
  3. As you said previously, the performance has already been improved. As for improving it further, I don't see any specific way to improve the performance of matrix division with just the type inferencing in this PR.
  4. I've taken a look at them and added them

I'd like to note that I only added type inferencing to some of the matrix algorithms in the matrix utils since some of the callbacks used alongside the algorithms that didn't have type inferencing added to them aren't typed and thus fail when type inferencing tries to assume a type. Maybe we should change those in the future, but for now I've left those without type inferencing.

RandomGamingDev avatar Feb 18 '24 04:02 RandomGamingDev

Published now in v12.4.0, thanks again!

josdejong avatar Feb 22 '24 15:02 josdejong