[RFC]: add support for pretty printing tabular data in the REPL

Open kgryte opened this issue 1 year ago • 10 comments

Description

This RFC proposes adding support for displaying tabular data in the REPL. Possible signature:

table( data[, n] )

It could work in a manner similar to head/tail: if n > 0, show the first n rows, and, if n < 0, show the last |n| rows.

The function could also support options, such as a maximum cell width (e.g., to abbreviate cell contents by truncating long strings).

If data is an array of objects, could use keys as headers. Otherwise, if array of arrays, could support providing a list of headers.
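The head/tail behavior described above could be sketched as follows. This is only an illustration: `selectRows` is a hypothetical helper, not an existing stdlib function, and treating an omitted or zero `n` as "show all rows" is an assumption the RFC does not settle.

```javascript
// Hypothetical helper (not part of stdlib): choose which rows to display.
function selectRows( data, n ) {
	// Assumption: an omitted or zero `n` shows every row.
	if ( n === void 0 || n === 0 ) {
		return data.slice();
	}
	if ( n > 0 ) {
		// Positive `n`: first `n` rows (like head).
		return data.slice( 0, n );
	}
	// Negative `n`: last `|n|` rows (like tail).
	return data.slice( n );
}

var rows = [ [ 1, 'a' ], [ 2, 'b' ], [ 3, 'c' ], [ 4, 'd' ] ];
var head = selectRows( rows, 2 );
var tail = selectRows( rows, -1 );
```

Relying on `Array.prototype.slice` means a magnitude of `n` larger than the number of rows simply returns all rows rather than throwing.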

Related Issues

No.

Questions

  • Generating ASCII tables is likely to be applicable beyond just the REPL. So one question is whether the implementation for ASCII tables should live outside the REPL (e.g., in @stdlib/plot/ascii/table or similar) and then the REPL would just use that package for displaying tabular data? This probably makes sense, and we'd just need to define the API surface of the ASCII table package.
  • We should also support ndarray data.

Other

ASCII tables are fairly common, so prior art should be relatively easy to find. For example,

  • https://github.com/tecfu/tty-table
  • https://github.com/sorensen/ascii-table

Checklist

  • [X] I have read and understood the Code of Conduct.
  • [X] Searched for existing issues and pull requests.
  • [X] The issue name begins with RFC:.

kgryte avatar Mar 27 '24 06:03 kgryte

One option would be to have the package live in @stdlib/plot/table with a base implementation and then have several dedicated implementations for different output formats, e.g. ASCII printout for the REPL or GitHub Flavored Markdown.

Planeshifter avatar Jun 13 '24 14:06 Planeshifter

What exactly would a "base" implementation do? Meaning, how feasible is it to actually make a "base" implementation? My sense is that we'd first need to create concrete implementations and then back out common logic, rather than the other way around. Atm, it's hard for me to envision what a base implementation would look like and whether it's even possible.

kgryte avatar Jun 13 '24 15:06 kgryte

There may be common logic around defining column widths (number of visible characters) and subsequent content truncation, or precision, but I'd be interested in seeing a concrete implementation first.
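To make the column-width and truncation logic concrete, here is a minimal sketch. The names `truncate` and `columnWidths` are hypothetical, the rows are assumed to be rectangular (as many cells as headers), and `String.prototype.length` is used as a stand-in for the number of visible characters, which is not accurate for wide or combining characters.

```javascript
// Hypothetical helper: shorten a string to at most `max` characters,
// marking the cut with a trailing ellipsis.
function truncate( str, max ) {
	if ( str.length <= max ) {
		return str;
	}
	if ( max <= 1 ) {
		return str.slice( 0, max );
	}
	return str.slice( 0, max-1 ) + '…';
}

// Hypothetical helper: width of each column, capped at `maxWidth`.
// Assumes every row has the same number of cells as `headers`.
function columnWidths( headers, rows, maxWidth ) {
	var widths = headers.map( function onHeader( h ) {
		return Math.min( String( h ).length, maxWidth );
	});
	rows.forEach( function onRow( row ) {
		row.forEach( function onCell( cell, j ) {
			var len = Math.min( String( cell ).length, maxWidth );
			if ( len > widths[ j ] ) {
				widths[ j ] = len;
			}
		});
	});
	return widths;
}

var widths = columnWidths( [ 'id', 'name' ], [ [ 1, 'Alexander' ], [ 22, 'Bo' ] ], 6 );
var cell = truncate( 'Alexander', widths[ 1 ] );
```

Logic of this kind is independent of the output format, which is why it seems a plausible candidate for later extraction into shared code.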

Not wholly opposed to a @stdlib/plot/table namespace, but I think this needs to be fleshed out a bit more.

kgryte avatar Jun 13 '24 15:06 kgryte

I suppose we have some precedent for this with @stdlib/plot/sparklines. The plot namespace is generally a bit messy. Nevertheless, a @stdlib/plot/table namespace seems fine.

kgryte avatar Jun 13 '24 15:06 kgryte

Yes, @stdlib/plot/sparklines was the precedent I was thinking of. Not sure how much can be abstracted into a base implementation. But having multiple output formats (ASCII, GFM, HTML, LaTeX down the road) is something I do believe we should support.

@Snehil-Shah will probably start with one implementation for the ASCII case in @stdlib/plot/table and propose an API with a set of supported options and we can refine things from there.

Planeshifter avatar Jun 13 '24 15:06 Planeshifter

@kgryte @Planeshifter Found this package (python-tabulate). It selects the output type through the tablefmt argument, which covers different ASCII and Unicode outputs as well as HTML and LaTeX. There are some ASCII designs (with names like plain and simple, just like we do with Unicode sparklines), some Unicode designs, and some markup languages.

I think we can have sub packages inside the @stdlib/plot/table namespace, like ASCII, unicode, and markup.

Also, I have a question: should I include box-drawing characters (like '┐', '─') along with the ASCII designs, or should I stick to only ASCII? Or should I include them in Unicode? From what I read, they are included in the extended ASCII sets, so they should be compatible with all terminal environments.

Snehil-Shah avatar Jun 14 '24 20:06 Snehil-Shah

I am not sure it makes sense to distinguish ascii from unicode. In which case, I'd probably lean toward @stdlib/plot/table/unicode as the package name. And, while python-tabulate is a good reference, I don't think it makes sense to exactly follow their API. Rather than the tablefmt kwarg, I'd be tempted to generalize to allow specifying arbitrary border/separator characters. We have a similar idea in the REPL presentation framework.

In which case, re: box drawing characters, if we supported, e.g., top-left-corner, top-right-corner, bottom-left-corner, and bottom-right-corner options, users could specify whatever character(s) they want. Could also borrow some inspiration from CSS in terms of how options could be specified (e.g., shortcut options: corners: '+ + + +'). The above is not edict, but fodder for potential API design brainstorming.
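As brainstorming fodder, the shorthand idea might look something like the sketch below. Everything here is hypothetical: the function names, the option names (`corners`, `horizontal`, `vertical`), the rule that a single value applies to all four corners, and the value order (top-left, top-right, bottom-left, bottom-right, following the order listed above rather than CSS's clockwise order).

```javascript
// Hypothetical parser: expand a shorthand such as '+ + + +' into four corners.
// Assumed order: top-left, top-right, bottom-left, bottom-right.
function parseCorners( str ) {
	var parts = str.trim().split( /\s+/ );
	if ( parts.length === 1 ) {
		// Assumption: a single value applies to all four corners.
		parts = [ parts[ 0 ], parts[ 0 ], parts[ 0 ], parts[ 0 ] ];
	}
	if ( parts.length !== 4 ) {
		throw new RangeError( 'invalid corners shorthand: expected 1 or 4 values' );
	}
	return {
		'topLeft': parts[ 0 ],
		'topRight': parts[ 1 ],
		'bottomLeft': parts[ 2 ],
		'bottomRight': parts[ 3 ]
	};
}

// Hypothetical renderer: draw a one-cell box using the configured characters.
function drawBox( text, opts ) {
	var c = parseCorners( opts.corners );
	var h = opts.horizontal || '-';
	var v = opts.vertical || '|';
	var bar = h.repeat( text.length + 2 );
	return [
		c.topLeft + bar + c.topRight,
		v + ' ' + text + ' ' + v,
		c.bottomLeft + bar + c.bottomRight
	].join( '\n' );
}

var ascii = drawBox( 'hi', { 'corners': '+' } );
var unicode = drawBox( 'hi', { 'corners': '┌ ┐ └ ┘', 'horizontal': '─', 'vertical': '│' } );
```

With character-level options like these, the ASCII versus Unicode distinction becomes a matter of which characters the caller passes in rather than a separate package.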

kgryte avatar Jun 14 '24 22:06 kgryte

@Planeshifter @kgryte Understood. I'll also follow the @stdlib/plot/sparklines namespace as a reference implementation, if that's fine, with the base implementation mostly parsing the input data into headers (Array) and rows (Array<Array>). Also, should I break it down into multiple PRs, e.g., starting with a base implementation and then Unicode?
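A rough sketch of that parsing step is given below. `normalize` is a hypothetical name, and the sketch assumes that the keys of the first object define the columns when no headers are provided and that an array of arrays without headers yields an empty header list.

```javascript
// Hypothetical helper: convert supported inputs into `headers` and `rows`.
function normalize( data, headers ) {
	var first = data[ 0 ];
	var keys;
	if ( typeof first === 'object' && first !== null && !Array.isArray( first ) ) {
		// Array of objects: assume the first object's keys define the columns.
		keys = headers || Object.keys( first );
		return {
			'headers': keys,
			'rows': data.map( function onObject( obj ) {
				return keys.map( function onKey( k ) {
					return obj[ k ];
				});
			})
		};
	}
	// Array of arrays: headers must be supplied by the caller, if at all.
	return {
		'headers': headers || [],
		'rows': data.map( function onRow( row ) {
			return row.slice();
		})
	};
}

var fromObjects = normalize( [ { 'a': 1, 'b': 2 }, { 'a': 3, 'b': 4 } ] );
var fromArrays = normalize( [ [ 1, 2 ], [ 3, 4 ] ], [ 'x', 'y' ] );
```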

Snehil-Shah avatar Jun 15 '24 11:06 Snehil-Shah

@Snehil-Shah As a first step, I would not create a "base" implementation. First, create the Unicode/ASCII implementation with an emphasis on clear separation of concerns. You really won't know what the contours of a "base" package should be until you've done a couple implementations (e.g., Unicode/ASCII, Markdown, LaTeX, HTML, etc).

kgryte avatar Jun 15 '24 19:06 kgryte

For reference, when I wrote the sparkline packages, I started with concrete sparkline implementations and then extracted out the common components to allow for code reuse. I suggest following a similar protocol here.

kgryte avatar Jun 15 '24 19:06 kgryte