

* Noteworthy changes in release 1.9 (2025-04-05) [stable]

** Changes in Behavior

  datamash(1), decorate(1): Add short options -h and -V for --help and --version
  respectively.

  datamash(1): the rand operation now uses getrandom(2) for generating a random
  seed, instead of relying on date/time/pid mixing.

** New Features

  datamash(1): add operation dotprod for calculating the scalar product of two
  columns.

  datamash(1): Add option -S/--seed to set a specific seed for pseudo-random
  number generation.

  datamash(1): Add option --vnlog to enable experimental support for the vnlog
  format. More about vnlog is at https://github.com/dkogan/vnlog.

  datamash(1): -g/groupby takes ranges of columns (e.g. 1-4)

** Bug Fixes

  datamash(1) now correctly calculates the "antimode" for a sequence
  of numbers.  Problem reported by Kingsley G. Morse Jr. in
  <https://lists.gnu.org/archive/html/bug-datamash/2023-12/msg00003.html>.

  When using the locale's decimal separator as field separator, numeric
  datamash(1) operations now work correctly.  Problem reported by Jérémie
  Roquet in
  <https://lists.gnu.org/archive/html/bug-datamash/2018-09/msg00000.html>
  and by Jeroen Hoek in
  <https://lists.gnu.org/archive/html/bug-datamash/2023-11/msg00000.html>.

  datamash(1): The "getnum" operation now stays inside the specified field.


* Noteworthy changes in release 1.8 (2022-07-23) [stable]

** Changes in Behavior

  Schedule -f/--full combined with non-linewise operations for deprecation.
  In a future release, -f/--full will only be usable with operations where
  it makes sense. For now, we print a warning to stderr when -f/--full is
  used with non-linewise operations, and such usage will no longer be
  supported.

  The bin operation now uses more intuitive bins. Previously, a command
  such as `datamash bin 1 <<< -0` would output -100; and -100 did not fall
  in its own bin. We now require all bins to take the form `[nx,(n+1)x)`
  with integer n and bin width x. We discard the sign on -0 and gate such
  inputs into the [0,x) bin.

  Operations taking more than one argument now provide more complete output
  with --header-out. Previously, an operation such as `pcov x:y` would
  produce an output header like `pcov(y)`, discarding the `x`. The new
  behavior will output header `pcov(x,y)`.

  datamash(1) no longer ignores --output-delimiter with the rmdup operation.

** New Features

  New datamash option --sort-cmd argument to specify the program used
  by the -s option to sort input, plus enhancements to the security and
  portability of building sort command lines.

  New datamash option -c/--collapse-delimiter=X argument uses character
  X instead of comma between values in collapse and unique lists.

  New datamash operations: mean square (ms) and root mean square (rms).

  Decorate now supports sorting IP addresses of both versions 4 and 6
  together. IPv4 addresses are logically converted to IPv6 addresses,
  either as IPv4-Mapped (ipv6v4map) or IPv4-Compatible (ipv6v4comp)
  addresses.

  Add two command aliases:
    'echo' may now be used instead of 'cut'.
    'uniq' may now be used instead of 'unique'.

** Improvements

  Updated the bash completion script to reflect recent additions.

** Bug Fixes

  Datamash now passes the -z/--zero-terminated flag to the sort(1) child
  process when used with "--sort --zero-terminated". Additionally,
  if the system's sort(1) does not support -z, datamash reports the error
  and exits. Previously it would omit the "-z" when running sort(1),
  resulting in incorrect results.

  Documentation fixes and spelling corrections.

  Incorrect format in a decorate(1) error breaking compilation on some
  systems.

  datamash(1), decorate(1): Fix some minor memory leaks.

  datamash(1) no longer crashes when the unique or countunique operations
  are used with input data containing NUL bytes.  The problem was reported
  in https://lists.gnu.org/archive/html/bug-datamash/2020-11/msg00001.html
  by Catalin Patulea.

  datamash(1) no longer crashes when crosstab with --header-in is called
  by field name instead of index. I.e. `datamash --header-in ct x,y` now
  works as expected.


* Noteworthy changes in release 1.7 (2020-04-23) [testing]

** New Features

  decorate(1): new program - sorts input in non-standard ordering, e.g.
  IPv4, IPv6, roman numerals.

  New operations: sha224/sha384.

  New operations: geomean (Geometric mean) and harmmean (Harmonic mean).


* Noteworthy changes in release 1.6 (2020-02-24) [stable]

** Bug Fixes

  The 'gutnum' operation (introduced in vresion 1.5) now correctly
  prints detected numbers without truncating them.


* Noteworthy changes in release 1.5 (2019-09-17) [stable]

** New Features

  Datamash now accepts backslash-escaped characters in field names.
  This allows working with named fields containing dash/mins,colons,commas
  or field names starting with digits (Note the interplay between
  backslash and shell quoting). The following are equivalent,
  and sum an input field named 'FOO-BAR':
      datamash -H sum FOO\\-BAR < input.txt
      datamash -H sum 'FOO\-BAR' < input.txt
      datamash -H sum "FOO\\-BAR" < input.txt

  New operations: dirname, basename
  These behave just like dirname(1) and basename(1):
     $ echo /home/foo/bar.txt | datamash dirname 1 basename 1
     /home/foo    bar.txt

  New operations: extname, barename
  'extname' extract the extension of the file name.
  'barename' (not to be confused with 'basename') extract the basename
  without the extension.
  Example:
     $ echo /home/foo/bar.tar.gz | datamash barename 1 extame 1
     bar         tar.gz

  New operation: getnum
  This operation extracts a number from a string.
  'getnum' accepts an optional single letter option:
     getnum:n - natural numbers (positive integers, including zero)
     getnum:i - integers
     getnum:d - decimal point numbers
     getnum:p - positive decimal point numbers (this is the default)
     getnum:h - hex numbers
     getnum:o - octal numbers
   Examples:
     $ echo foo-42.0-bar | datamash getnum 1
     42.0
     $ echo foo-42.0-bar | datamash getnum:n 1
     42
     $ echo foo-42.0-bar | datamash getnum:i 1
     -42
     $ echo foo-42.0-bar | datamash getnum:d 1
     -42.0

  New operation: cut
  Similar to cut(1), it copies the input field to the output as-is.
  The advantage over cut(1) is that combined with datamash's other features,
  input fields can be specified by name instead of column number, and
  output fields can be re-ordered and duplicated.
  Example:
    $ printf "a b c\n1 X 6\n" | datamash -W -H cut c,a,c
    cut(c)  cut(a)  cut(c)
    6       1       6

** Bug fixes

  Datamash now correctly calculates mode/antimode for negative values.
  In version 1.4 and earlier, the following produced incorrect results:
    $ echo -1 | datamash-1.4 mode 1
    1.844674407371e+19



* Noteworthy changes in release 1.4 (2018-12-22) [stable]

** New Features

  New option: -C/--skip-comments to skip comment lines (lines starting
  with '#' or ';' and optional whitespace).


* Noteworthy changes in release 1.3 (2018-03-16) [stable]

** New Features

  New option: --format=FMT sets printf style floating-point format.
  Example:
     $ echo '50.5' | datamash --format "%07.3f" sum 1
     050.500
     $ echo '50.5' | datamash --format "%07.3e" sum 1
     5.050e+01

  New option: -R/--round=N rounds numeric values to N decimal places.

  New option: --output-delimiter=X overrides -t/-W.

  New operation: trimmean (trimmed mean value).
  To calculate 20% trimmed mean:
     $ printf "%s\n" 13 3 7 33 3 9 | datamash  trimmean:0.2  1
     8


** Bug fixes

  Datamash now builds correctly with external OpenSSL libraries
  (./configure --with-openssl=yes). The 'configure' script now reports
  whether internal or external libraries are used:

     $ ./configure [OPTIONS]
     [...]
     Configuration summary for datamash
         md5/sha*: internal (gnulib)
  OR
         md5/sha*: external (-lcrypto)


* Noteworthy changes in release 1.2 (2017-08-22) [stable]

** New Features

  New operations:
    perc (percentile),
    range (max-min of values in group/column)

  Improved 'check' operation:
    Expected number of lines/fields can be specified as parameter.

** Improvements

  Improved bash-completion script installation path (see README for details).


* Noteworthy changes in release 1.1.1 (2017-01-19) [stable]

** Bug fixes

  'check' command correctly counts a trailing delimiter at end of lines.

  'transpose' command correctly handles missing fields on the last line.


* Noteworthy changes in release 1.1.0 (2016-01-16) [stable]

** New Features

  Bumped version to 1.1.0 to better comply to semver.

  New operations:
   crosstab (cross-tabulation / pivot-tables),
   check (verify tabular structure),
   bin (bin numeric values)
   strbin (bin strings values)
   pearson correlation,
   covariance,
   rounding functions: round,floor,ceil,trunc,frac

** Improvements

  Speed, Portability, Tests, Coverage improvements.


* Noteworthy changes in release 1.0.7 (2015-06-29) [stable]

** New Features

  New operations: md5, sha1/256/512, base64, rmdup.

  New option --narm to ignore NaN/NA values.

  New feature: ability to specify field by names instead of numbers
  (require using --header-in or -H).

  New translations added.

** Improvements

  Speed, Portability, Coverage improvements.


* Noteworthy changes in release 1.0.6 (2014-07-29) [stable]

** New Features

  New operations: transpose, reverse.

** Improvements

  Tests: improve portability, add I/O error tests, add few edge-case tests.

  Build: improve man-page generation, cross-compiling, auxiliary build scripts.

  Documentation: expand and fix man-page (and shorten --help screen).


* Noteworthy changes in release 1.0.5 (2014-07-15) [stable]

First release as GNU Datamash.
