#!/usr/pkg/bin/perl

#
# dbcolpercentile.pm
# Copyright (C) 1997-2018 by John Heidemann <johnh@isi.edu>
#
# This program is distributed under terms of the GNU general
# public license, version 2.  See the file COPYING
# in $dblibdir for details.
#


=head1 NAME

dbcolpercentile - compute percentiles or ranks for an existing column

=head1 SYNOPSIS

    dbcolpercentile [-rplhS] column

=head1 DESCRIPTION

Compute a percentile of a column of numbers.
The new column will be called I<percentile> or I<rank>.
Non-numeric records are ignored by default, as in other Fsdb programs
(see B<-a> to include them).

If the data is pre-sorted and only a rank is requested,
no extra storage is required.
In all other cases, a full copy of data is buffered on disk.

=head1 OPTIONS

=over 4

=item B<-p> or B<--percentile> or B<--mode percentile>

Show percentile (default).
The percentile of a record is the cumulative count of values at or below
its value, relative to the total count.

=item B<-P> or B<--rank> or B<--nopercentile> or B<--mode rank>

Compute ranks instead of percentiles.

=item B<--fraction>

Show the fraction: like the percentage, but expressed between 0 and 1
(this is not a cumulative fraction).

=item B<-a> or B<--include-non-numeric>

Compute stats over all records (treat non-numeric records
as zero rather than just ignoring them).

=item B<-S> or B<--pre-sorted>

Assume data is already sorted.
With one -S, we check and confirm this precondition.
When repeated, we skip the check.

=item B<-f FORMAT> or B<--format FORMAT>

Specify a L<printf(3)>-style format for output statistics.
Defaults to C<%.5g>.

=item B<-T TmpDir>

Where to put temporary files.
If B<-T> is not specified, the environment variable TMPDIR is used.
Default is F</tmp>.

=back

Sort specification options (can be interspersed with column names):

=over 4

=item B<-r> or B<--descending>

Sort in reverse order (high to low).

=item B<-R> or B<--ascending>

Sort in normal order (low to high).

=item B<-n> or B<--numeric>

Sort numerically (default).

=item B<-N> or B<--lexical>

Sort lexicographically.

=back

=for comment
begin_standard_fsdb_options

This module also supports the standard fsdb options:

=over 4

=item B<-d>

Enable debugging output.

=item B<-i> or B<--input> InputSource

Read from InputSource, typically a file name, or C<-> for standard input,
or (if in Perl) an IO::Handle, Fsdb::IO or Fsdb::BoundedQueue object.

=item B<-o> or B<--output> OutputDestination

Write to OutputDestination, typically a file name, or C<-> for standard output,
or (if in Perl) an IO::Handle, Fsdb::IO or Fsdb::BoundedQueue object.

=item B<--autorun> or B<--noautorun>

By default, programs process automatically,
but Fsdb::Filter objects in Perl do not run until you invoke
the run() method.
The C<--(no)autorun> option controls that behavior within Perl.

=item B<--help>

Show help.

=item B<--man>

Show full manual.

=back

=for comment
end_standard_fsdb_options


=head1 SAMPLE USAGE

=head2 Input:

    #fsdb name id test1
    a 1 80
    b 2 70
    c 3 65
    d 4 90
    e 5 70
    f 6 90

=head2 Command:

    cat DATA/grades.fsdb | dbcolpercentile test1

=head2 Output:

	#fsdb name id test1 percentile
	d	4	90	1
	f	6	90	1
	a	1	80	0.66667
	b	2	70	0.5
	e	5	70	0.5
	c	3	65	0.16667
	#  | dbsort -n test1
	#   | dbcolpercentile test1

=head2 Command 2:

    cat DATA/grades.fsdb | dbcolpercentile --rank test1

=head2 Output 2:

	#fsdb name id test1 rank
	d	4	90	1
	f	6	90	1
	a	1	80	3
	b	2	70	4
	e	5	70	4
	c	3	65	6
	#  | dbsort -n test1
	#   | dbcolpercentile --rank test1

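The arithmetic behind both sample outputs can be checked by hand.
The following is a minimal sketch, not the Fsdb implementation:
C<annotate> is a hypothetical helper, and it assumes that the percentile
column is the number of records at or below a value divided by the total
count, and that the rank is one plus the number of records strictly above
the value. Both assumptions are consistent with the outputs shown above.
Values are formatted with the default C<%.5g>.

    ```perl
    use strict;
    use warnings;

    # Sample records from the Input section: [name, id, test1].
    my @rows = (
        ['a', 1, 80], ['b', 2, 70], ['c', 3, 65],
        ['d', 4, 90], ['e', 5, 70], ['f', 6, 90],
    );

    # Hypothetical helper (not part of Fsdb): append an assumed
    # percentile and rank to each row, ordered high to low.
    sub annotate {
        my @in = @_;
        my $n  = scalar @in;
        my @out;
        # Ties on test1 are broken by name only to keep the order stable.
        for my $r (sort { $b->[2] <=> $a->[2] or $a->[0] cmp $b->[0] } @in) {
            # grep in scalar context returns the number of matches.
            my $at_or_below = grep { $_->[2] <= $r->[2] } @in;
            my $above       = grep { $_->[2] >  $r->[2] } @in;
            push @out, [@$r, sprintf('%.5g', $at_or_below / $n), $above + 1];
        }
        return @out;
    }

    # Print name, id, test1, percentile, rank as tab-separated columns.
    print join("\t", @$_), "\n" for annotate(@rows);
    ```

For example, the value 80 has four of six records at or below it
(4/6 formats as 0.66667) and two records above it (rank 3).
The sketch rescans the data for every record, so it is quadratic in the
number of records; it is meant to illustrate the definitions only.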

=head1 SEE ALSO

L<Fsdb>,
L<dbcolhisto>.


=cut


# WARNING: This code is derived from dbcolpercentile.pm; that is the master copy.

use Fsdb::Filter::dbcolpercentile;
my $f = Fsdb::Filter::dbcolpercentile->new(@ARGV);
$f->setup_run_finish;  # or could just --autorun
exit 0;


=head1 AUTHOR and COPYRIGHT

Copyright (C) 1991-2018 by John Heidemann <johnh@isi.edu>

This program is distributed under terms of the GNU general
public license, version 2.  See the file COPYING
with the distribution for details.

=cut

1;
