Make it easier to retrieve multiple results from an Acc run.
I have written some Accelerate programs in which I end up using multiple "run"s, often just for debugging or diagnostic purposes, for example, to print intermediate results in the computation.
The frustrating thing is that Accelerate already compiles arbitrary graphs of array operations, because it includes array-level tuples and lets. So of course it's already possible to extract many arrays of varying types from a single Accelerate run.
However, putting all the results into a big tuple of arrays can be inconvenient, especially with all the lifting and unlifting business.
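For concreteness, here is a minimal sketch of the tuple approach described above. The function name `totalAndCount` and the input data are made up for illustration, and it assumes a reasonably recent accelerate release with the reference interpreter backend. It returns two arrays of different element types from a single `run`; note the `lift` needed to turn a Haskell pair of `Acc` values into one `Acc` of a pair.

```haskell
{-# LANGUAGE TypeOperators #-}
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Acc, Vector, Scalar, Z(..), (:.)(..))
import Data.Array.Accelerate.Interpreter (run)

-- Hypothetical example: one graph, two outputs of different types.
-- 'A.lift' packs the pair of Acc terms into a single Acc of a pair.
totalAndCount :: Acc (Vector Float) -> Acc (Scalar Float, Scalar Int)
totalAndCount xs = A.lift (A.sum xs, A.unit (A.size xs))

main :: IO ()
main = do
  let input = A.fromList (Z :. 5) [1 .. 5] :: Vector Float
      -- A single 'run' yields both results; the pair is unpacked in Haskell.
      (total, count) = run (totalAndCount (A.use input))
  print (A.toList total, A.toList count)
```

With only two outputs this is manageable; the inconvenience grows with the number of intermediate results, since each one has to be threaded through the tuple type and lifted by hand.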
Can anyone think of a way to make it easier to construct entire Accelerate computation graphs with multiple outputs? Perhaps the analogy is to the State monad vs. ST. It would be nice to be able to virtually "run" Accelerate multiple times within a monad, but really only call a single run.
Have you got a small example? (I always find it easier to discuss design with concrete code.)
Yep, I'll paste it when I can boil it down. I was computing ANOVA but was accepting Acc (Array...) as input rather than (Array).
Thus if I wanted to ask any questions of the input arrays, I had to do a "run". In particular, I wanted to check the dimensions of the inputs to determine downstream actions and error conditions at the Haskell level. In my ideal world I guess I would be able to do a single run, but receive that dimension information back from the GPU as soon as it's ready. That, I guess, could make the rest of the Acc computation speculative -- if there's an error or the results otherwise become unneeded.
I also meant to look at the mandel example Simon came up with. He felt a need to write the initial version with multiple runs. But I think in that case it was to accomplish iteration, right?
I think my library function above represented a best practice -- Acc libraries should probably not call "run" internally, rather they should be combinators with Acc-typed inputs and outputs.
I agree that Acc libraries should probably not call run internally, but rather should be combinators with Acc-typed inputs and outputs.
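As a rough illustration of that convention (not code from this thread), here is a sketch of two library-style combinators that take and return `Acc` values and never call `run` themselves. The names `mean` and `centre` are hypothetical, and the interpreter backend is assumed only to keep the example self-contained.

```haskell
{-# LANGUAGE TypeOperators #-}
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Acc, Vector, Scalar, Z(..), (:.)(..))
import Data.Array.Accelerate.Interpreter (run)

-- Hypothetical library combinators: Acc in, Acc out, no 'run' inside.
mean :: Acc (Vector Float) -> Acc (Scalar Float)
mean xs = A.unit (A.the (A.sum xs) / A.fromIntegral (A.size xs))

-- Builds on 'mean' without leaving the Acc world.
centre :: Acc (Vector Float) -> Acc (Vector Float)
centre xs = A.map (subtract (A.the (mean xs))) xs

main :: IO ()
main = do
  let input = A.fromList (Z :. 4) [1, 2, 3, 4] :: Vector Float
  -- The caller decides when to run, and runs the composed graph once.
  print (A.toList (run (centre (A.use input))))
```

Because both functions stay in `Acc`, a caller can compose them with other combinators and still pay for only one `run` at the top level.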
And yes, Simon's example was about iteration.