mirror of GNU Datamash. Send questions/comments/bugs to [email protected]

GNU Datamash

GNU Datamash is a command-line program that performs basic numeric, textual, and statistical operations on textual input data files.

It is designed to be portable and reliable, and to help researchers automate analysis pipelines easily, without writing code or even short scripts.

Home page: https://www.gnu.org/software/datamash

Usage

See datamash --help for basic usage information.

See man datamash for examples and operation details.

For the instruction manual, see info datamash or visit https://www.gnu.org/software/datamash/manual/

Examples

What are the sum and mean of the values in field 1?

$ seq 10 | datamash sum 1 mean 1
55 5.5
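
For comparison, here is a rough sketch of the same computation written by hand in awk, the kind of short script datamash is meant to replace. It is not part of datamash; the function name sum_and_mean is made up for illustration, and it separates the two results with a space, whereas datamash separates output fields with a tab.

```shell
# Hypothetical hand-written equivalent of `datamash sum 1 mean 1`.
# Reads one number per line on stdin and prints "<sum> <mean>".
sum_and_mean() {
  awk '{ total += $1 } END { if (NR > 0) print total, total / NR }'
}

# Same input as the datamash example above.
seq 10 | sum_and_mean
```

Every additional statistic (median, standard deviation, per-group values) would need more hand-written logic here, while with datamash it is one more operation name on the command line.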

Given a file with three columns (Name, College Major, Score), what is the average score, grouped by college major?

$ cat scores.txt
John       Life-Sciences    91
Dilan      Health-Medicine  84
Nathaniel  Arts             88
Antonio    Engineering      56
Kerris     Business         82
...


# Sort input and group by column 2, calculate average on column 3:

$ datamash --sort --group 2  mean 3 < scores.txt
Arts             68.9474
Business         87.3636
Health-Medicine  90.6154
Social-Sciences  60.2667
Life-Sciences    55.3333
Engineering      66.5385
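
To try the grouping command without scores.txt, the sketch below generates a tiny tab-separated sample on the fly. The names, majors, and scores are invented for illustration and are not taken from the real file; sample_scores is a hypothetical helper. It also adds a second operation (count 3) to show that several operations can be applied to each group in one call.

```shell
# Hypothetical sample data: Name, College Major, Score (tab-separated).
# printf reuses the format string for each group of three arguments.
sample_scores() {
  printf '%s\t%s\t%s\n' \
    Ana Arts     70 \
    Ben Business 90 \
    Cy  Arts     90 \
    Dee Business 80
}

# Sort input, group by column 2, then print the mean and the
# number of values of column 3 for each group.
sample_scores | datamash --sort --group 2 mean 3 count 3
```

With this sample, the Arts group has a mean of 80 and the Business group a mean of 85, each over 2 rows.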

See more examples at https://www.gnu.org/software/datamash/examples/

Download and Installation

Download the latest source code from https://www.gnu.org/software/datamash

General installation commands:

$ tar -xzf datamash-[VERSION].tar.gz
$ cd datamash-[VERSION]
$ ./configure
$ make
$ make check
$ sudo make install

See platform/OS-specific download instructions at https://www.gnu.org/software/datamash/download/

To build from the latest git sources, see the HACKING.md file. This file is available when cloning from git, but is not distributed in the tar archive. To clone the git repository, run:

$ git clone git://git.savannah.gnu.org/datamash.git

HACKING.md is also available online at https://git.savannah.gnu.org/cgit/datamash.git/tree/HACKING.md

BASH Auto-completion

The datamash package includes a bash auto-completion script. The installation location can be controlled using:

./configure --with-bash-completion-dir=[no|local|global|PATH]

The options are:

  • local - install under the package's $PREFIX path, typically /usr/local/share/datamash/bash-completion.d/, which can be changed with ./configure --prefix. This is the default.

  • no - do not install the bash completion script.

  • [PATH] - install into the PATH specified on the command line, e.g. ./configure --with-bash-completion-dir=/foo/bar/bash-completion.d/

  • global - install into the system's global bash-completion directory, as reported by pkg-config. This is the output of pkg-config --variable=completionsdir bash-completion, which is commonly /usr/share/bash-completion/completions or /etc/bash.d. If pkg-config is not found, or if it does not have the config (.pc) file for the bash-completion package, the setting defaults to 'local'.

local is the default and should be used in particular when installing under a non-default --prefix without root permissions. global should be used if you are installing to the default location (/usr/local) and have root permissions (e.g. sudo make install). A custom PATH or global should be used when packaging datamash for further distribution.
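
Before choosing global, it can help to check what pkg-config would report on your system. The sketch below mirrors the fallback described above; it is an approximation for illustration, not the actual logic in datamash's configure script, and the function name resolve_completion_dir is made up.

```shell
# Hypothetical helper: print the directory that 'global' would resolve to,
# or the word "local" if pkg-config or the bash-completion .pc file is missing.
resolve_completion_dir() {
  if command -v pkg-config >/dev/null 2>&1 &&
     pkg-config --exists bash-completion; then
    pkg-config --variable=completionsdir bash-completion
  else
    echo "local"
  fi
}

resolve_completion_dir
```

If this prints a directory you cannot write to, either run the install step with root permissions or stay with local.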

Questions and Bug Reports

  • Please send questions and bug reports to [email protected].
  • Searchable archive: https://lists.gnu.org/archive/html/bug-datamash
  • Subscribe: https://lists.gnu.org/mailman/listinfo/bug-datamash

Copyright and License

Copyright (C) 2013-2021 Assaf Gordon [email protected]

License: GPL Version 3 (or later).

For any copyright year range specified as YYYY-ZZZZ in this package, note that the range specifies every single year in that closed interval.