dinit dinit-iostream: Custom new special-purpose library

This is replacement for C++ standard library iostream/fstream.

See dinit-iostream.h for more information and the usage.

Closes #240

Signed-off-by: Mobin "Hojjat" Aydinfar <[email protected]>

Dec 16 '23 17:12 mobin-2008

buffer handling seems to be suboptimal, for example the buffer isn't necessarily completely full before it is flushed. Eg. suppose I write 16380 bytes followed by 10 bytes followed by 512 bytes and then flush. That should need only two writes (buffer size is 16384) but it will do three.

Sorry, this example might not be right. Consider instead: 16380 bytes followed by 8 bytes followed by 16380 bytes again.

Dec 18 '23 22:12 davmac314

@mobin-2008 you made a comment here but it seems to have been deleted. Let me know when the PR is ready for review.

Dec 27 '23 12:12 davmac314

Ready for review.

Changelog from your last review:

MAX_TYPE_NUMDIGITS() algorithm has been changed to recursive integer logarithm like function.
buf->reset() has been removed (because make no difference)
Documention has been reworded in terms of vocabulary and grammar.
A check has been added to fstream_base::open() functions to disallow null pointer/empty paths.
A check has been added to put() function to avoid undefined behavior in strlen() of null string and Segmentation fault on appending null string into buffer.
Exception that get thrown by iostream library has been specialized.

Dec 28 '23 19:12 mobin-2008

I will try to fully explain each issue, so hopefully there's no confusion:

Constructors setting buffer_failbit

The constructors should be properly documented (i.e. document that buffer allocation is done when the constructor is called, but may fail, leaving buffer_failbit set). I see now that you did improve the behaviour, but I didn't see that before because you didn't list it as one of the changes and you included many changes so there was a lot of noise.

(That is one reason why you should not add additional changes when responding to a review. It complicates things, and makes it harder to check that the issues bought up in the review last time were resolved properly).

streambuf set_buf/set_buf_with_owner function

streambuf function set_buf (now called "set_buf_with_owner") is dangerous because it removes the previous buffer even though it may have data in it, without flushing it first. I don't understand the use case for it (i.e. why it's needed) so I don't know whether the best answer is just to have set_buf (... with_owner) flush the buffer, or even just document that you must flush the (current) buffer before calling set_buf; if there is no use case then it should really just be removed (we don't want extra code that isn't used anyway).

i.e. the solution should be one of the following:

remove it
make it protected instead of public (if it only needs to be called by subclasses)
make it flush the old buffer
document that the old buffer should be flushed before calling

If the solution is any option other than "remove it", you should be able to explain why.

exceptions function

The exceptions function should behave consistently, it doesn't make sense to have a magic value (0) mean one thing and any other value mean something different. If the function does two different things it should be two different functions.

But: I also really don't see why toggling exception bits is useful, why can't I just set the bits I want set?

Example: if I write a function which takes an istream parameter (by reference) how can I ensure the behaviour (exceptions vs non)? Eg how could I ensure throw_on_buffer_fail is set for example?

void read_some_data(istream &instream) {
    // how can I turn on buffer_failbit in exceptions?

    // If I do:
    instream.exceptions(buffer_failbit);
    // Now I don't know whether buffer failure throws exceptions or not!
}

Dec 29 '23 23:12 davmac314

I will try to fully explain each issue, so hopefully there's no confusion:

Constructors setting buffer_failbit

The constructors should be properly documented (i.e. document that buffer allocation is done when the constructor is called, but may fail, leaving buffer_failbit set). I see now that you did improve the behaviour, but I didn't see that before because you didn't list it as one of the changes and you included many changes so there was a lot of noise.

(That is one reason why you should not add additional changes when responding to a review. It complicates things, and makes it harder to check that the issues bought up in the review last time were resolved properly).

construction examples now include a check to catching buffer allocation failures.

Also I wrote that message to notice there is some other changes with fixing all previous reviews but looks like it made into confusion.

streambuf set_buf/set_buf_with_owner function

streambuf function set_buf (now called "set_buf_with_owner") is dangerous because it removes the previous buffer even ...

i.e. the solution should be one of the following:

remove it ...

getting raw pointer through get_buf() is enough and I removed them.

exceptions function

The exceptions function should behave consistently, it doesn't make sense to have a magic value (0) mean one thing and any other value mean something different. If the function does two different things it should be two different functions.

But: I also really don't see why toggling exception bits is useful, why can't I just set the bits I want set?

Example: if I write a function which takes an istream parameter (by reference) how can I ensure the behaviour (exceptions vs non)? Eg how could I ensure throw_on_buffer_fail is set for example?
void read_some_data(istream &instream) {
    // how can I turn on buffer_failbit in exceptions?

    // If I do:
    instream.exceptions(buffer_failbit);
    // Now I don't know whether buffer failure throws exceptions or not!
}

exceptions() is renamed to throw_on_states() for better meaning. Also it accepts a boolean to throw or not throw on requested bits instead of toggle behavior:

...
// Caller can be sure about that this call will mark eofbit for throwing `dio::iostream_eof` exception for example
instream.throw_on_states(diostates::eofbit, true);
...

Also I added a new function called clear_throws() which disables all situation can throw exceptions instead of magic 0 parameter. I think it's enough.

Dec 30 '23 07:12 mobin-2008

I think it's enough.

Ok, but I've seen more pushes come through since. So I will wait until you request review.

Dec 30 '23 09:12 davmac314

I added a bunch of tests but looks like there is something werid with it. when it tries to close fd 0, it can't and sanitizer in my system aborts the program with (use-after-free). I can't understand what's going on (I moved usedfds to public space of test-bpsys.cc for debugging):

...
Process 23085 resuming
PASSED
Process 23085 stopped
* thread #1, name = 'iostreamtests', stop reason = breakpoint 1.1
    frame #0: 0x000055555569547a iostreamtests`dio::filein::~filein(this=0x0000555556097080) at dinit-iostream.cc:609:5
   606 	
   607 	filein::~filein() noexcept
   608 	{
-> 609 	    close();
   610 	}
   611 	
   612 	// Standard input "cin" on STDIN_FILENO file descriptor
(lldb) print fd
(int) 0
(lldb) print usedfds
(std::vector<bool>) size=4 {
  [0] = true
  [1] = false
  [2] = false
  [3] = false
}
(lldb) continue
Process 23085 resuming
Process 23085 stopped
* thread #1, name = 'iostreamtests', stop reason = breakpoint 2.1
    frame #0: 0x0000555555699d5e iostreamtests`bp_sys::close(fd=0) at test-bpsys.cc:163
   160 	
   161 	int close(int fd)
   162 	{
-> 163 	    if (size_t(fd) >= usedfds.size()) abort();
   164 	    if (! usedfds[fd]) abort();
   165 	
   166 	    usedfds[fd] = false;
(lldb) print usedfds
(std::vector<bool>) size=4 {
  [0] = true
  [1] = false
  [2] = false
  [3] = false
}
(lldb) continue
Process 23085 resuming
Process 23085 stopped
* thread #1, name = 'iostreamtests', stop reason = breakpoint 3.1
    frame #0: 0x0000555555699d75 iostreamtests`bp_sys::close(fd=0) at test-bpsys.cc:164:19
   161 	int close(int fd)
   162 	{
   163 	    if (size_t(fd) >= usedfds.size()) abort();
-> 164 	    if (! usedfds[fd]) abort();
   165 	
   166 	    usedfds[fd] = false;
   167 	    write_hndlr_map.erase(fd);
(lldb) continue
Process 23085 resuming
=================================================================
==23085==ERROR: AddressSanitizer: heap-use-after-free on address 0x502000000010 at pc 0x555555699f45 bp 0x7fffffffe5d0 sp 0x7fffffffe5c8
READ of size 8 at 0x502000000010 thread T0
    #0 0x555555699f44 in std::__1::__bit_reference<std::__1::vector<bool, std::__1::allocator<bool>>, true>::operator bool[abi:he170006]() const /usr/bin/../include/c++/v1/__bit_reference:65:35
    #1 0x555555699f44 in bp_sys::close(int) /home/mobin/Documents/git/dinit/src/tests/test-bpsys.cc:164:11
    #2 0x555555690afc in dio::fileinout_base::close() /home/mobin/Documents/git/dinit/src/tests/../dinit-iostream.cc:43:10
    #3 0x55555569547e in dio::filein::~filein() /home/mobin/Documents/git/dinit/src/tests/../dinit-iostream.cc:609:5

0x502000000010 is located 0 bytes inside of 8-byte region [0x502000000010,0x502000000018)
freed by thread T0 here:
    #0 0x555555689e0d in operator delete(void*) (/home/mobin/Documents/git/dinit/src/tests/iostreamtests+0x135e0d) (BuildId: 703a3958e660cf15)
    #1 0x55555569e044 in std::__1::vector<bool, std::__1::allocator<bool>>::~vector[abi:he170006]() /usr/bin/../include/c++/v1/vector:2137:67
    #2 0x7fffefa5833d  (/lib/ld-musl-x86_64.so.1+0x5833d) (BuildId: cdb214233e75e92f)

previously allocated by thread T0 here:
    #0 0x5555556895ad in operator new(unsigned long) (/home/mobin/Documents/git/dinit/src/tests/iostreamtests+0x1355ad) (BuildId: 703a3958e660cf15)
    #1 0x5555556a169a in std::__1::__allocation_result<std::__1::allocator_traits<std::__1::allocator<unsigned long>>::pointer> std::__1::__allocate_at_least[abi:he170006]<std::__1::allocator<unsigned long>>(std::__1::allocator<unsigned long>&, unsigned long) /usr/bin/../include/c++/v1/__memory/allocate_at_least.h:55:19
    #2 0x5555556a169a in std::__1::vector<bool, std::__1::allocator<bool>>::__vallocate[abi:he170006](unsigned long) /usr/bin/../include/c++/v1/vector:2455:29
    #3 0x7fffefabbd77 in _dlstart (/lib/ld-musl-x86_64.so.1+0xbbd77) (BuildId: cdb214233e75e92f)

SUMMARY: AddressSanitizer: heap-use-after-free /usr/bin/../include/c++/v1/__bit_reference:65:35 in std::__1::__bit_reference<std::__1::vector<bool, std::__1::allocator<bool>>, true>::operator bool[abi:he170006]() const
Shadow bytes around the buggy address:
  0x501ffffffd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x501ffffffe00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x501ffffffe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x501fffffff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x501fffffff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x502000000000: fa fa[fd]fa fa fa fd fd fa fa fd fd fa fa fa fa
  0x502000000080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000000100: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000000180: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000000200: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000000280: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==23085==ABORTING

Hmm, @davmac314 there is something in test includes that close fds before my class destructer? I don't think so.

Feb 09 '24 16:02 mobin-2008

Hmm, @davmac314 there is something in test includes that close fds before my class destructer? I don't think so.

In several of your tests:

    dio::fileout test_1(1);

It's using a fixed file descriptor that will be closed at the end of the test. It should be allocating a descriptor to use for the test instead.

Feb 10 '24 00:02 davmac314

Hmm, @davmac314 there is something in test includes that close fds before my class destructer? I don't think so.

In several of your tests:
    dio::fileout test_1(1);
It's using a fixed file descriptor that will be closed at the end of the test. It should be allocating a descriptor to use for the test instead.

Hmm, None of them use fd 1, They are using allocfd() for fd allocation.

Feb 10 '24 03:02 mobin-2008

Hmm, None of them use fd 1, They are using allocfd() for fd allocation.

Looks like you changed it, it was using fd 1:

https://github.com/mobin-2008/dinit/commit/db900b2f0ee71c618a303279a4a2f57a7e39412c

The current version doesn't have any issue, it runs just fine, the error you posted doesn't happen. It only happens on the version that uses fd 1.

Edit: hmm, meson builds are failing in CI though.

Feb 10 '24 03:02 davmac314

I commented out default cout/cerr/cin objects in dinit-iostream.cc and looks like everything got resolved. I guess there is a problem at closing default fds in simulated fd management (but I couldn't find it).

Feb 10 '24 05:02 mobin-2008

Probably the test infrastructure is deconstructed before the stream objects (cout/cerr/cin). I don't think there's any way to control the order, but that explains why some builds are ok and some fail.

Commenting them out isn't a solution. Either remove them altogether, or fix it so they no longer rely on the test infrastructure on destruction (for example, close the streams from inside the test harness rather than leaving it to the destructors).

Feb 11 '24 03:02 davmac314

I changed test to close cout/cerr/cin in main() and I moved dinit-iostream.o from parent_objs to a target because linking it with other tests will cause same problems.

Feb 11 '24 12:02 mobin-2008

I am thinking about this PR and I realised I'm just not happy with some aspects of the design. It's mostly my own fault, I really should've been clearer about what I wanted from the start. I know you've put lots of work in to this and are probably getting tired of it so I will just say clearly what I think needs to be done, rather than having you to try to figure it out. I guess I was hoping you'd come to these conclusions yourself but that wasn't really fair of me.

So: can we please get rid of the throw_on_states() - I have been trying to convince myself that this is ok but it's just another aspect of the standard library that really should be gone. Functions should either throw an exception on an error condition or they shouldn't. It shouldn't be something you select dynamically, as that just leads to confusion and subtle errors. It was a mistake in the standard library, and it's not a mistake we should copy - I'm quite sure of that.

write should always throw an exception if it can't write or buffer the full message, it should never return an error (-1)
write_nothrow should behave as it does now (writes or buffers some, returns how much it wrote/buffered or -1 if it couldn't buffer any), but can we call it write_nt to make it less unwieldy (or write_nx, for "no eXceptions")
get_line should always throw an exception if it can't read a line (error, or at end-of-file)
get_line_nothrow (maybe call it get_line_nt/get_line_nx?) can behave as it does now
maybe introduce a new function, get_line_until_eof which throws an exception on error but not on end-of-file, since that's a fairly common need

The rest of the current design is basically fine (though I've yet to properly review some of it). There's still a bit of cleaning up to do.

If you think the current design is actually better than what I've outlined above or there is another option then let me know and we can discuss it further. If you are tired of working on this and want me to take over that is also fine, but I'm certainly happy if you want to continue; I just want to be clear on the right design going forward.

Feb 14 '24 12:02 davmac314

I am thinking about this PR and I realised I'm just not happy with some aspects of the design. It's mostly my own fault, I really should've been clearer about what I wanted from the start. I know you've put lots of work in to this and are probably getting tired of it so I will just say clearly what I think needs to be done, rather than having you to try to figure it out. I guess I was hoping you'd come to these conclusions yourself but that wasn't really fair of me.

So: can we please get rid of the throw_on_states() - I have been trying to convince myself that this is ok but it's just another aspect of the standard library that really should be gone. Functions should either throw an exception on an error condition or they shouldn't. It shouldn't be something you select dynamically, as that just leads to confusion and subtle errors. It was a mistake in the standard library, and it's not a mistake we should copy - I'm quite sure of that.

I started to working on it because it looked interesting and I wanted to have some real challenge in C++ so I'm so happy with what I learned :)

I think the point of a special-purpose library is to remove/change anything that doesn't fit our needs and it's normal to not having a complete plan about what it should looks like and I tried my best to figure out whats needed, whats not and so on, but you're more professional about this.

As I know C++ in fact didn't have exceptions in the beginning and basic libraries such as iostream designed in that era, when exceptions added to C++ those libraries didn't change a lot except some functions. Got it, Just drop it like other parts of C++.

write should always throw an exception if it can't write or buffer the full message, it... ...

I wil fix them all. The final design is a function will throw exceptions on failure or just return -1. Also sets of exceptions is not adjustable.

Also I'm happy with until_eof() function, looks useful.

The rest of the current design is basically fine (though I've yet to properly review some of it). There's still a bit of cleaning up to do.

If you think the current design is actually better than what I've outlined above or there is another option then let me know and we can discuss it further. If you are tired of working on this and want me to take over that is also fine, but I'm certainly happy if you want to continue; I just want to be clear on the right design going forward.

Good to hear :) As I said above, I started to working on it because it was and it is fun and continuing on something that I started to working on, is key to success.

Feb 16 '24 01:02 mobin-2008

Ok, the API looks a bit more like what I was thinking. Thanks! I still think it needs a bit of work though. Consider the following code that uses std:: streams:

    std::fstream f;
    // f.exceptions(std::ios::failbit | std::ios::badbit);
    f.open("somefile.txt", f.binary | f.out);
    f << "a message\n";
    f << "another message\n";
    f.close();

    if (f) {
        std::cout << "all succeeded!\n";
    }
    else {
        std::cout << "failure!\n";
    }

See how easy it is to check for success/failure - it's just a single check after a series of operations. If any of them fail, including the open or close, it will show the "failure!" message. Also if the first write fails ("a message") the 2nd one ("another message") isn't attempted, so there's no worry that some output in the middle will be missing due to an error.

If I uncomment the f.exceptions(...) call, then it's also even easier: I can put the whole thing in a try block or even let it propagate out of the function (if it makes sense to do that).

There's a few things that would make that a lot more difficult with this API. Eg on ostream::open:

    // Note: open() doesn't set any state bits in stream. errors should be checked through
    // its return and POSIX errno.

That seems awkward to use. Same with close. This forces individual error checks for these functions. Also, they don't throw exceptions, so I can't just wrap the whole thing in a try block. Eg:

    dio::ostream f;
    f.open("somefile");
    f.write_nx("a message\n");
    f.write_nx("another message\n");
    f.close();

    // how can I check for error condition now?

or with exceptions:

try {
    dio::ostream f;
    f.open("somefile"); // won't throw on failure
    f.write("a message\n");
    f.write("another message\n");
    f.close();  // this won't throw if flush or close fails :(
}
catch (...) {
    // handle all errors? no, because failure in open()/close() won't throw
}

Both of these should ideally be just as easy as when using std::fstream. There are a few things I can see that are making it hard to write simple code that safely does output:

open and close don't set any error state on the stream
open and close don't throw exceptions
write/write_nx don't check error state of stream before writing, early write could fail but later one could go through
write/write_nx of empty string returns failure instead of success (why?)

One other thing I noticed is this:

    if (msg == nullptr) return 0; // Avoid strlen() UB on nullptr string

As a general rule, in Dinit we don't try to detect invalid input (and ignore it). It is better to invoke UB especially if it will lead to a crash as in this case. That's what we want - to see when there are bugs, not to paper over them - nullptr is not a valid buffer, it is not something you can write, it is a bug if it appears here.

Mar 19 '24 12:03 davmac314

write/write_nx of empty string returns failure instead of success (why?)

Other things are fixed (plus this one) but my idea was that you don't probably want to write nothing! So it should return false for this.

Mar 19 '24 17:03 mobin-2008

my idea was that you don't probably want to write nothing! So it should return false for this.

I think there are plenty of times when a program might output a string that may be empty, without knowing if it is empty or not.

If that generates an error it forces the caller to first check the length, if they want to handle error on the individual write call:

ostream f;
std::string s;
...
if (!s.empty()) {
    if (!f.write(s)) {
        // error handling
    }
}

Mar 19 '24 21:03 davmac314

Will review when I get a chance

Mar 19 '24 23:03 davmac314

Sorry about push but the only change in library is removing this check:

    ostream(const int tfd, std::unique_ptr<streambuf> passed_buf) noexcept
    {
        fd = tfd;
        buf = std::move(passed_buf);
        if (!buf) buffer_failbit = true; // <-- this
    }

If you passed a nullptr buffer, you did something wrong and it should be fixed by you. This is for "don't mask bugs" policy. This is final thing, Sorry about push after review request :(

Mar 20 '24 00:03 mobin-2008

So it's allowed? (you didn't answer!)

If it's allowed, what happens to the buffer (does it get flushed or cleared?) and the error states?

If something is allowed, the behaviour should be sensible (and documented). If it is not allowed, that should be documented ("Must not be called if the stream is already open.")

I think this note is good enough:

    // Note: Opening an already opened stream is not allowed and caller must close the stream
    // before open it again.

Apr 12 '24 06:04 mobin-2008

CI failure is unrelated.

May 16 '24 19:05 mobin-2008

Also if anything is unclear, please ask for clarification.

Jun 15 '24 00:06 davmac314