Streaming `get field`, `to text`, and other utilities
Related problem
Right now many of nushell's utilities do not support streaming, meaning the user has to wait without seeing any progress. If the user wants to use a stream interactively, for example piping a directory listing to fzf to pick a file, they have to wait for the whole stream to finish loading.
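For instance, a pipeline like this (assuming fzf is installed) shows nothing in fzf until the entire recursive listing has been collected:
# fzf cannot display any entries until `ls **/*` has been fully collected
ls **/* | get name | to text | fzf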
Describe the solution you'd like
I would love it if most of nushell's pipeline utilities were streaming by default, as they are in traditional shells. Currently, accessing a column from a table requires the whole table to load before the column can be extracted. For example, someone who wants to stream the names of files in a directory would be inclined to write:
ls **/* | get name
However, the `get` command must wait for all of `ls **/*` to finish loading, which is not ideal. One might then also want to serialize that column:
ls **/* | get name | to text
The serialization should also be streaming.
Describe alternatives you've considered
The alternative is to always fall back on text-based streaming commands, or on custom commands that are not built into nushell. However, this goes against nushell's philosophy of structured-data pipelines and hurts the user experience.
It is also a downgrade from text-based shells.
Additional context and details
Issue inspired by https://github.com/nushell/nushell/issues/6174
I believe if you use `each`, things will stream. Try this:
ls **/* | each {|i| $i.name | to text}
It does stream, but the output of `each` is a list of strings, which gets rendered as a table again, defeating the purpose of `to text`.
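The closest I've gotten is printing inside the closure, which at least displays output as it streams (a sketch that sidesteps rather than fixes the problem):
# prints each name as plain text as soon as it arrives;
# `ignore` discards the closure results so no table is rendered
ls **/* | each {|i| print $i.name } | ignore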
I had asked a related question here (https://github.com/nushell/nushell/issues/6570) and on Superuser: https://superuser.com/questions/1742738/skipping-lines-in-a-large-file-then-pipe-to-external-command-in-nushell
This isn't just a time performance issue; it's a massive memory performance issue. I have a 60+ GB file, and I just want to skip some lines and have the rest stream directly. I'm opening with `--raw`, converting to lines via `| lines`, and skipping the header via `| skip`:
open --raw pagelinks.sql | lines | skip 35 | ???
The answer suggested something like
open --raw pagelinks.sql | lines | skip 35 | each { mysql dbname }
but that has a lot of problems. For one thing, it won't work on many SQL dumps: each line would have to be a valid SQL statement by itself. Fortunately, that happens to be true for me, but it isn't in most cases. And there's an even bigger problem: it spawns a separate mysql process per line, which makes it impossible to do this in one transaction, and that makes the operation far too slow.
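For contrast, the only single-transaction option I can see right now buffers everything (a sketch; `str join` may be `str collect` on older versions):
# one mysql process and one transaction, but the entire 60+ GB file
# is collected into memory before any of it reaches mysql
open --raw pagelinks.sql | lines | skip 35 | str join "\n" | mysql dbname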
I found a workaround that does exactly what I want, except it's ugly 😄
# create a named pipe via mkfifo or something
mkfifo /path/to/named-pipe
# Then in a separate shell or background task of some kind:
mysql < /path/to/named-pipe
# then:
open --raw pagelinks.sql | lines | skip 35 | each { save --raw /path/to/named-pipe }
The list<string> is never collected into memory in this case; each iterates as I would expect.
I'm also not sure why this needs to buffer everything into memory, but it does:
open --raw pagelinks.sql | lines | skip 35 | save --raw /path/to/file
I understand why | save /path/to/file would buffer; it would have to in order to format the table. But I would expect that --raw wouldn't need to, and would save each string as a line or something. But clearly I misunderstood what --raw does.
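In the meantime, a per-line append at least avoids buffering, assuming `save` supports combining `--raw` with `--append` (slow, since the file is reopened per line, but memory-safe):
# append each line as it arrives instead of collecting the whole stream
open --raw pagelinks.sql | lines | skip 35 | each {|line|
    $line + "\n" | save --raw --append /path/to/file
} | ignore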
So here's my proposal: a command that runs an external command (since this is really only applicable to external commands) with its stdin connected to a pipe from nushell:
open --raw pagelinks.sql | lines | skip 35 | stream { mysql dbname }
# maybe a better syntax:
open --raw pagelinks.sql | lines | skip 35 | stream [program-name] [args...]
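Alternatively, if piping structured data straight into an external command simply streamed, no new command would be needed; this is the behavior I'd ultimately want (hypothetical, as far as I know):
# desired: each line is written to mysql's stdin as soon as it is produced
open --raw pagelinks.sql | lines | skip 35 | mysql dbname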
Does this make sense?
I spent a couple of hours on this tonight trying to make `get` stream. I didn't quite get it working, but I think I know how to accomplish it.
I agree that Nu should stream wherever possible. Issues like this are helpful for identifying places where streaming needs some work, thanks @wmstack and @hut8.
I understand why `| save /path/to/file` would buffer; it would have to in order to format the table. But I would expect that `--raw` wouldn't need to, and would save each string as a line or something. But clearly I misunderstood what `--raw` does.
@hut8 I don't think you misunderstood; I took a quick look at the save code and I think it just needs a bit of extra code to handle streaming when writing raw files. In general, if you see a feature that should stream but doesn't it's probably just something we missed or didn't have time for.
get and select are proving to be a little difficult. Still working on making those stream; long story short it's turned into a yak shave and some cell path error handling improvements need to happen first: https://github.com/nushell/nushell/pull/7540
to text is partially done (https://github.com/nushell/nushell/pull/7577). It now handles incoming ListStreams correctly, but it's not as smart as it could be for other data types (for which it buffers all output text; ideally it would stream the data out in an ExternalStream).
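A quick smoke test for the ListStream half (assuming a build with that PR): pipe an unbounded range through `to text`. If it streams, numbers appear immediately; if it buffers, the pipeline hangs while collecting:
# should print numbers as they are generated (Ctrl-C to stop)
1.. | each {|n| $n } | to text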