
compressed output from CmdStan and input to bin/print

Open bob-carpenter opened this issue 11 years ago • 8 comments

Sebastian Weber suggested this for RStan, but it would be good to have in Stan, too. See RStan issue 38.

Boost has compression wrappers (like Java's) for streams. For instance, the bzip2 that Sebastian suggested is documented here:

http://www.boost.org/doc/libs/1_55_0/libs/iostreams/doc/classes/bzip2.html

bob-carpenter, Jan 28 '14

I'm currently using Stan on a remote server, and it seems like various parts of my workflow would be much quicker if everything used compressed files. I'm in the process of patching rstan to accommodate compressed files, and it would be very helpful if Stan could pre-compress everything.

Would the bzip2 streaming compression mentioned above be something that could be added to the next version (i.e. v2.12.0++)? I don't speak C++ well enough to add it myself, but it seems like it might be pretty straightforward. Thanks for all of your hard work on Stan!

davharris, Sep 21 '16

Boost provides a gzip (not bzip2) wrapper:

http://www.boost.org/doc/libs/1_60_0/libs/iostreams/doc/classes/gzip.html

for zlib. zlib appears to be MIT licensed. The more third-party libraries we add, the more complicated our licensing and build process becomes, so I'm not sure we want to do this. It might be better for us to pipe the uncompressed output through system utilities rather than trying to build compression into our system.

bob-carpenter, Sep 21 '16
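A minimal sketch of the "pipe it through system utilities" alternative, assuming a POSIX system with `gzip` on the `PATH`. `popen`/`pclose` are POSIX rather than standard C++, and the helpers `write_csv_gzipped` and `read_gzipped` are hypothetical, not part of Stan or CmdStan. The program writes plain CSV text and the external `gzip` process does the compressing, so no compression library is linked into the build.

```cpp
#include <cstdio>
#include <stdexcept>
#include <stdio.h>  // popen, pclose (POSIX, not standard C++)
#include <string>
#include <vector>

// Hypothetical helper: send CSV lines through the system `gzip` utility.
// No shell escaping is done; assumes `path` contains no single quotes.
void write_csv_gzipped(const std::string& path,
                       const std::vector<std::string>& lines) {
  const std::string cmd = "gzip -c > '" + path + "'";
  FILE* pipe = popen(cmd.c_str(), "w");
  if (pipe == nullptr) throw std::runtime_error("popen failed");
  for (const auto& line : lines) {
    std::fputs(line.c_str(), pipe);  // uncompressed text goes into the pipe
    std::fputc('\n', pipe);
  }
  // pclose waits for gzip to finish and returns its exit status.
  if (pclose(pipe) != 0) throw std::runtime_error("gzip reported an error");
}

// Hypothetical helper: decompress through `gzip -dc` to check the round trip.
std::string read_gzipped(const std::string& path) {
  const std::string cmd = "gzip -dc '" + path + "'";
  FILE* pipe = popen(cmd.c_str(), "r");
  if (pipe == nullptr) throw std::runtime_error("popen failed");
  std::string out;
  char buf[4096];
  std::size_t n;
  while ((n = std::fread(buf, 1, sizeof buf, pipe)) > 0) out.append(buf, n);
  pclose(pipe);
  return out;
}
```

The trade-off is a runtime dependency on an external program and a shell; on Windows, for instance, MSVC exposes this call as `_popen` rather than `popen`.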

It provides both: http://www.boost.org/doc/libs/1_60_0/libs/iostreams/doc/classes/bzip2.html

It looks like we could just copy the lib into the tree and use it (at least I don't see any problem with that): http://www.bzip.org/1.0.5/bzip2-manual-1.0.5.html

I think this could make sense after we refactor and decide more on the output/binary format.

sakrejda, Sep 21 '16

If it's just a copy-paste, then my preference would be to make it an option sooner rather than later. But if it would make the refactor more complicated, then delaying would definitely make sense.

davharris, Sep 22 '16

@davharris: We take pull requests! This one (if the license is really compatible) would need 1) a writer class like the current stream_writer but with the compressing streams from Boost; and 2) tests (probably just one for each of the available functor calls to the writer).

It's a pretty straightforward project, and if you dig up the license details (they're mostly at the bzip.org site), you could get an answer out of the dev list on whether it's reasonable to include or not. We try to keep discussion like that out of the issues.

sakrejda, Sep 22 '16

It's never just a simple cut-and-paste with software with this many moving parts.

Boost only provides the wrappers. If the underlying software isn't header-only, it needs to be built cross-platform in a way that's compatible with the rest of Stan (it can't do gross top-level usings, for instance), and confirming that will take some code review of the to-be-included package. We also need a full set of tests to make sure it'll work. And of course, we'll only take it in the core (stan-dev/stan) if the license is no more restrictive than BSD.


bob-carpenter, Sep 22 '16

Okay, thanks. Looks like Boost would use zlib under the hood, and it's distributed under its own license. It looks BSD-compatible to my non-lawyer eyes (FSF says it's GPL compatible, but I don't see anything BSD specific and so I'm interpreting the text myself).

I can't find a direct link to the bz2 license, but when I download the source and read the LICENSE file, it looks like a custom mix of several licenses (I see text that looks copied from zlib, MIT, and BSD). It also looks BSD-compatible, but it's not endorsed by OSI or FSF as far as I can tell.

davharris, Sep 23 '16

For the record, zlib is considered a free use license:

https://en.wikipedia.org/wiki/Zlib_License
https://www.gnu.org/licenses/license-list.html#ZLib
http://directory.fsf.org/wiki/Zlib
https://opensource.org/licenses/Zlib

aadler, Sep 23 '16

Closing in favor of the newer https://github.com/stan-dev/cmdstan/issues/1035

WardBrian, Feb 17 '24