
Streaming `get field` and `to text` and other utilities

Open wmstack opened this issue 3 years ago • 3 comments

Related problem

Right now many of nushell's utilities do not support streaming, meaning the user has to wait before seeing any output. If the user wants to use a stream interactively, for example piping a directory listing to fzf to pick a file, they must wait for the whole stream to finish loading.

Describe the solution you'd like

I would love it if most of nushell's pipeline utilities streamed by default, as they do in traditional shells. Currently, accessing a column from a table requires the whole table to load before the column can be extracted. For example, someone who wants to stream the names of files in a directory would be inclined to write:

ls **/* | get name

However, the get command must wait for all of ls **/* to finish loading, which is not ideal. One might then want to serialize that column:

ls **/* | get name | to text

The serialization should also be streaming.
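To illustrate the idea, here is a conceptual Rust sketch of what a streaming pipeline looks like: `get name` becomes a lazy map over rows, and `to text` emits each item as it arrives. The row type and function names are illustrative only, not nushell's internals.

```rust
/// Conceptual sketch (not nushell's actual internals): a streaming
/// `get name` is just a lazy map over rows, so each name can be
/// serialized and printed as soon as its row is produced.
fn get_name(rows: impl Iterator<Item = (String, u64)>) -> impl Iterator<Item = String> {
    rows.map(|(name, _size)| name)
}

fn main() {
    // Stand-in for `ls **/*` output: lazily produced (name, size) rows.
    let rows = (0..3u64).map(|i| (format!("file{i}.txt"), i * 100));

    // Streaming `to text`: each name is emitted without waiting for
    // the rest of the "table" to load.
    for name in get_name(rows) {
        println!("{name}");
    }
}
```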

Describe alternatives you've considered

The alternative is to always fall back on text-based streaming commands, or on custom commands that are not built into nushell. However, this goes against nushell's philosophy of structured-data pipelines and hurts the user experience.

It is also a downgrade from text-based shells.

Additional context and details

Issue inspired by https://github.com/nushell/nushell/issues/6174

wmstack avatar Jul 29 '22 07:07 wmstack

I believe if you use each things will stream. Try this:

ls **/* | each {|i| $i.name | to text}

fdncred avatar Jul 29 '22 11:07 fdncred

It does stream, but each naturally wraps each piece of text back into structured data again, defeating the purpose of to text.

wmstack avatar Jul 29 '22 21:07 wmstack

I asked a related question here (https://github.com/nushell/nushell/issues/6570) and on Superuser: https://superuser.com/questions/1742738/skipping-lines-in-a-large-file-then-pipe-to-external-command-in-nushell

This isn't just a time performance issue, it's a massive memory performance issue. I have a 60+ GB file, and I just want to skip some lines and stream the rest directly. I'm opening with --raw, converting to lines via | lines, and skipping the header via | skip:

open --raw pagelinks.sql  | lines | skip 35 |  ???

The answer suggested something like

open --raw pagelinks.sql | lines | skip 35 | each { mysql dbname }

but that has a lot of problems. For one thing, it won't work on many SQL dumps, since each line would have to be a valid SQL statement by itself. Fortunately, that's true for me, but it isn't in most cases. There's an even bigger problem: running the command once per line makes it impossible to do this in one transaction, which makes the operation far too slow.

I found a workaround that actually does what I want, except that it's ugly 😄

# create a named pipe via mkfifo or something
mkfifo /path/to/named-pipe
# Then in a separate shell or background task of some kind:
mysql < /path/to/named-pipe
# then:
open --raw pagelinks.sql  | lines | skip 35 | each { save --raw /path/to/named-pipe }

The list<string> is never collected into memory in this case; each iterates as I would expect.

I'm also not sure why this needs to buffer everything into memory, but it does:

open --raw pagelinks.sql  | lines | skip 35 | save --raw /path/to/file

I understand why | save /path/to/file would buffer; it would have to in order to format the table. But I would expect that --raw wouldn't need to, and would save each string as a line or something. But clearly I misunderstood what --raw does.

So here's my proposal: a command that launches an external command (since this is really only applicable to external commands) with its stdin connected to a pipe from nushell:

open --raw pagelinks.sql  | lines | skip 35 | stream { mysql dbname }
# maybe a better syntax:
open --raw pagelinks.sql  | lines | skip 35 | stream [program-name] [args...]

Does this make sense?
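For what it's worth, here is a rough Rust sketch of what such a stream command could do internally: spawn the external program with a piped stdin and forward items one at a time. stream_into is a made-up helper name, and this is only a sketch of the idea, not nushell code.

```rust
use std::io::Write;
use std::process::{Command, ExitStatus, Stdio};

/// Sketch of the proposed `stream` command's core: spawn an external
/// program with a piped stdin and forward each item as it is produced,
/// so nothing is collected into memory.
fn stream_into(
    program: &str,
    args: &[&str],
    items: impl Iterator<Item = String>,
) -> std::io::Result<ExitStatus> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .spawn()?;
    let mut stdin = child.stdin.take().expect("stdin was piped");
    for item in items {
        // Written as soon as the upstream pipeline produces it.
        writeln!(stdin, "{item}")?;
    }
    drop(stdin); // close the pipe so the child sees EOF
    child.wait()
}
```

Unlike spawning the program once per line, this keeps a single child process (and therefore a single connection and transaction) alive for the whole stream.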

hut8 avatar Oct 24 '22 03:10 hut8

I spent a couple hours on this tonight, trying to make get stream. I didn't quite get it working, but I think I know how to accomplish it.

I agree that Nu should stream wherever possible. Issues like this are helpful for identifying places where streaming needs some work, thanks @wmstack and @hut8.

I understand why | save /path/to/file would buffer; it would have to in order to format the table. But I would expect that --raw wouldn't need to, and would save each string as a line or something. But clearly I misunderstood what --raw does.

@hut8 I don't think you misunderstood; I took a quick look at the save code and I think it just needs a bit of extra code to handle streaming when writing raw files. In general, if you see a feature that should stream but doesn't, it's probably just something we missed or didn't have time for.
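A minimal sketch of what streaming raw output could look like (an assumption about the fix, not the actual save code): write each string as it arrives through a buffered writer, so memory stays bounded regardless of input size.

```rust
use std::io::{BufWriter, Write};

/// Sketch of a streaming raw save (hypothetical, not nushell's `save`
/// implementation): each string from the stream is written out as it
/// arrives, so a 60 GB input never needs to fit in memory.
fn save_raw_streaming<W: Write>(out: W, lines: impl Iterator<Item = String>) -> std::io::Result<()> {
    let mut w = BufWriter::new(out);
    for line in lines {
        w.write_all(line.as_bytes())?;
        w.write_all(b"\n")?;
    }
    w.flush()
}
```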

rgwood avatar Dec 09 '22 05:12 rgwood

get and select are proving to be a little difficult. I'm still working on making those stream; long story short, it's turned into a yak shave, and some cell path error handling improvements need to happen first: https://github.com/nushell/nushell/pull/7540

to text is partially done (https://github.com/nushell/nushell/pull/7577). It now handles incoming ListStreams correctly, but it's not as smart as it could be for other data types (for which it buffers all output text; ideally it would stream the data out in an ExternalStream).
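As a rough illustration of that last point (assumed types, not nushell's real ListStream/ExternalStream), a fully streaming to text would map a stream of values to a stream of byte chunks rather than building one buffered string:

```rust
/// Sketch with assumed types: a streaming `to text` turns a stream of
/// values into a stream of byte chunks, emitting each chunk as its
/// value arrives instead of buffering the entire output text.
fn to_text_stream(values: impl Iterator<Item = String>) -> impl Iterator<Item = Vec<u8>> {
    values.map(|v| {
        let mut chunk = v.into_bytes();
        chunk.push(b'\n');
        chunk
    })
}
```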

rgwood avatar Dec 23 '22 00:12 rgwood