compressed output from CmdStan and input to bin/print
Sebastian Weber suggested this for RStan, but it would be good to have in Stan, too. See RStan issue 38.
Boost has compression wrappers (like Java's) for streams. For instance, the bzip2 compression Sebastian suggested is documented here:
http://www.boost.org/doc/libs/1_55_0/libs/iostreams/doc/classes/bzip2.html
I'm currently using Stan on a remote server, and it seems like various parts of my workflow would be much quicker if everything used compressed files. I'm in the process of patching rstan to accommodate compressed files, and it would be very helpful if Stan could pre-compress everything.
Would the bzip2 streaming compression mentioned above be something that could be added to the next version (i.e. v2.12.0++)? I don't speak C++ well enough to add it myself, but it seems like it might be pretty straightforward. Thanks for all of your hard work on Stan!
Boost provides a gzip (not bzip2) wrapper:
http://www.boost.org/doc/libs/1_60_0/libs/iostreams/doc/classes/gzip.html
for zlib. zlib is distributed under its own permissive license (the zlib license). The more third-party libraries we add, the more complicated our licensing and build process becomes, so I'm not sure we want to do this. It might be better for us to pipe the uncompressed output through system utilities rather than trying to build compression into our system.
It provides both: http://www.boost.org/doc/libs/1_60_0/libs/iostreams/doc/classes/bzip2.html
It looks like we could just copy the lib into the tree and use it (at least I don't see any problem with that): http://www.bzip.org/1.0.5/bzip2-manual-1.0.5.html
I think this could make sense after we refactor and decide more on the output/binary format.
If it's just a copy-paste, then my preference would be to make it an option sooner rather than later. But if it would make the refactor more complicated, then delaying would definitely make sense.
@davharris: We take pull requests! This one (if the license is really compatible) would need 1) a writer class like the current stream_writer but with the compressing streams from Boost; and 2) tests (probably just one for each of the available functor calls to the writer).
It's a pretty straightforward project, and if you dig up the license details (they're mostly at the bzip.org site), you could get an answer from the dev list on whether it's reasonable to include. We try to keep discussions like that out of the issues.
It's never just a simple cut-and-paste with software with this many moving parts.
Boost only provides the wrappers. If the underlying library isn't header-only, it has to build cross-platform in a way that's compatible with the rest of Stan (no gross top-level usings, for instance), and checking that will take some code review of the package we'd include. We also need a full set of tests to make sure it works. And of course, we'll only take it into the core (stan-dev/stan) if the license is no more restrictive than BSD.
(The above was an email reply to Krzysztof Sakrejda's comment of Sep 21, 2016, 9:45 PM.)
Okay, thanks. It looks like Boost's gzip wrapper would use zlib under the hood, and zlib is distributed under its own license. That license looks BSD-compatible to my non-lawyer eyes (the FSF says it's GPL-compatible, but I don't see anything BSD-specific, so I'm interpreting the text myself).
I can't find a direct link to the bzip2 license, but when I download the source and read the LICENSE file, it looks like a custom mix of several licenses (I see text that looks copied from zlib, MIT, and BSD). It also looks BSD-compatible, but as far as I can tell it's not endorsed by the OSI or the FSF.
For the record, zlib is considered a free use license:
https://en.wikipedia.org/wiki/Zlib_License
https://www.gnu.org/licenses/license-list.html#ZLib
http://directory.fsf.org/wiki/Zlib
https://opensource.org/licenses/Zlib
Closing in favor of the newer https://github.com/stan-dev/cmdstan/issues/1035