#!/usr/pkg/bin/perl -w

#
# dbrowuniq.pm
# Copyright (C) 1997-2018 by John Heidemann <johnh@isi.edu>
#
# This program is distributed under terms of the GNU general
# public license, version 2.  See the file COPYING
# in $dblibdir for details.
#


=head1 NAME

dbrowuniq - eliminate adjacent rows with duplicate fields, maybe counting

=head1 SYNOPSIS

dbrowuniq [-cFLBI] [-N name] [uniquifying fields...]

=head1 DESCRIPTION

Eliminate adjacent rows with duplicate fields, perhaps counting them.
Roughly equivalent to the Unix L<uniq(1)> command,
but optionally only operating on the specified fields.

By default, rows are duplicates only if I<all> columns match.
If column names are specified, only those columns are compared,
and the first row of each run of matching rows is returned.

Dbrowuniq eliminates only identical rows that are I<adjacent>.
If you want to eliminate identical rows across the entire file,
you must make them adjacent, perhaps by using dbsort on your
uniquifying field.
(That is, input with three lines a/b/a will produce
three lines of output with both a's, but if you dbsort it,
it will become a/a/b and dbrowuniq will output a/b.)
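
The adjacency rule above can be sketched in a few lines of plain Perl.
This is only an illustration of the semantics, not the Fsdb implementation;
C<uniq_adjacent> is a hypothetical helper, and each row is reduced to a single string.

```perl

    use strict;
    use warnings;

    # Hypothetical helper (not part of Fsdb): drop any row equal to the row before it.
    sub uniq_adjacent {
        my @rows = @_;
        my @out;
        for my $row (@rows) {
            push @out, $row unless @out && $out[-1] eq $row;
        }
        return @out;
    }

    my @input = qw(a b a);
    print join('/', uniq_adjacent(@input)), "\n";        # prints a/b/a
    print join('/', uniq_adjacent(sort @input)), "\n";   # prints a/b

```

Sorting first plays the role of dbsort here: it brings the two a rows together so that they collapse.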

By default, L<dbrowuniq> outputs the I<first> unique row.
Optionally, with C<-L>, it will output the I<last> unique row,
or with C<-B> it outputs both first and last.
(This choice only matters when uniqueness is determined by specific fields.)
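
As a rough illustration of the first-versus-last choice, the sketch below
compares only the first field of each row, as if one uniquifying field had been given.
C<pick_rows> and the sample rows are hypothetical; the sketch does not model C<-B>.

```perl

    use strict;
    use warnings;

    # Hypothetical helper (not part of Fsdb): rows are array refs [key, value].
    # Keeps the first row of each run of equal keys, or the last if $keep_last is true.
    sub pick_rows {
        my ($keep_last, @rows) = @_;
        my @out;
        for my $row (@rows) {
            if (@out && $out[-1][0] eq $row->[0]) {
                $out[-1] = $row if $keep_last;   # a later row in the same run replaces the earlier one
            } else {
                push @out, $row;                 # a new key starts a new run
            }
        }
        return @out;
    }

    my @rows = (['p', 10], ['p', 11], ['q', 16], ['q', 17]);
    print join(' ', map { "$_->[0]:$_->[1]" } pick_rows(0, @rows)), "\n";   # prints p:10 q:16
    print join(' ', map { "$_->[0]:$_->[1]" } pick_rows(1, @rows)), "\n";   # prints p:11 q:17

```

When every column is compared, the first and last rows of a run are identical, which is why the choice has no visible effect in that case.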

L<dbrowuniq> can also count how many identical, adjacent rows it finds
with C<-c>, with the count going to a new column (defaulting to C<count>).
Incremental counting, when the C<count> column already exists,
is possible with C<-I>.
With incremental counting, the existing count column is summed.
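
The difference between plain and incremental counting can be sketched as follows.
This is a minimal illustration, not the Fsdb code: C<count_adjacent> is a hypothetical helper,
rows are hash refs, and treating a missing count as 1 in incremental mode is an assumption of the sketch.

```perl

    use strict;
    use warnings;

    # Hypothetical helper (not part of Fsdb): collapse adjacent rows whose $key
    # field matches, keeping the first row of each run and a running count.
    sub count_adjacent {
        my ($key, $incremental, @rows) = @_;
        my @out;
        for my $row (@rows) {
            # Incremental mode adds the row's existing count (assumed 1 if absent);
            # plain mode counts every input row once.
            my $n = $incremental ? ($row->{count} // 1) : 1;
            if (@out && $out[-1]{$key} eq $row->{$key}) {
                $out[-1]{count} += $n;
            } else {
                push @out, { %$row, count => $n };
            }
        }
        return @out;
    }

    my @rows = map { +{ event => $_, count => 6 } } qw(x x y y);
    printf "%s\t%d\n", $_->{event}, $_->{count} for count_adjacent('event', 1, @rows);

```

With the four sample rows above, each carrying a partial count of 6, incremental mode reports 12 for each event, while plain mode would report 2.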

=head1 OPTIONS

=over 4

=item B<-c> or B<--count>

Create a new column (count) which counts the number of times
each line occurred.

The new column is named by the C<-N> argument, defaulting to C<count>.

=item B<-N> or B<--new-name>

Specify the name of the count column, if any.
(Default is C<count>.)

=item B<-I> or B<--incremental>

Incremental counting.
If the count column exists, it is assumed to have a partial count
and the count accumulates.
If the count column doesn't exist, it is created.

=item B<-L> or B<--last>

Output the last unique row only.
By default, it outputs the first unique row.

=item B<-F> or B<--first>

Output the first unique row only. 
(This output is the default.)

=item B<-B> or B<--both>

Output both the first and last unique rows. 

=back

=for comment
begin_standard_fsdb_options

This module also supports the standard fsdb options:

=over 4

=item B<-d>

Enable debugging output.

=item B<-i> or B<--input> InputSource

Read from InputSource, typically a file name, or C<-> for standard input,
or (if in Perl) an IO::Handle, Fsdb::IO, or Fsdb::BoundedQueue object.

=item B<-o> or B<--output> OutputDestination

Write to OutputDestination, typically a file name, or C<-> for standard output,
or (if in Perl) an IO::Handle, Fsdb::IO, or Fsdb::BoundedQueue object.

=item B<--autorun> or B<--noautorun>

By default, programs process automatically,
but Fsdb::Filter objects in Perl do not run until you invoke
the run() method.
The C<--(no)autorun> option controls that behavior within Perl.

=item B<--header> H

Use H as the full Fsdb header, rather than reading a header from
the input.

=item B<--help>

Show help.

=item B<--man>

Show full manual.

=back

=for comment
end_standard_fsdb_options


=head1 SAMPLE USAGE

=head2 Input:

    #fsdb      event
    _null_getpage+128
    _null_getpage+128
    _null_getpage+128
    _null_getpage+128
    _null_getpage+128
    _null_getpage+128
    _null_getpage+4
    _null_getpage+4
    _null_getpage+4
    _null_getpage+4
    _null_getpage+4
    _null_getpage+4
    #  | /home/johnh/BIN/DB/dbcol event
    #  | /home/johnh/BIN/DB/dbsort event

=head2 Command:

    cat data.fsdb | dbrowuniq -c

=head2 Output:

    #fsdb	event	count
    _null_getpage+128	6
    _null_getpage+4	6
    #	2	/home/johnh/BIN/DB/dbcol	event
    #  | /home/johnh/BIN/DB/dbrowuniq -c

=head1 SAMPLE USAGE 2

Retaining the last unique row as an example.

=head2 Input:

	#fsdb event i
	_null_getpage+128 10
	_null_getpage+128 11
	_null_getpage+128 12
	_null_getpage+128 13
	_null_getpage+128 14
	_null_getpage+128 15
	_null_getpage+4 16
	_null_getpage+4 17
	_null_getpage+4 18
	_null_getpage+4 19
	_null_getpage+4 20
	_null_getpage+4 21
	#  | /home/johnh/BIN/DB/dbcol event
	#  | /home/johnh/BIN/DB/dbsort event

=head2 Command:

    cat data.fsdb | dbrowuniq -c -L event

=head2 Output:

	#fsdb event i count
	_null_getpage+128	15	6
	#  | /home/johnh/BIN/DB/dbcol event
	#  | /home/johnh/BIN/DB/dbsort event
	_null_getpage+4	21	6
	#   | dbrowuniq -c 

=head1 SAMPLE USAGE 3

Incremental counting.

=head2 Input:

    #fsdb	event	count
    _null_getpage+128	6
    _null_getpage+128	6
    _null_getpage+4	6
    _null_getpage+4	6
    #  /home/johnh/BIN/DB/dbcol	event
    #  | /home/johnh/BIN/DB/dbrowuniq -c

=head2 Command:

    cat data.fsdb | dbrowuniq -I -c event

=head2 Output:

	#fsdb event count
	_null_getpage+128   12
	_null_getpage+4     12
	#  /home/johnh/BIN/DB/dbcol	event
	#  | /home/johnh/BIN/DB/dbrowuniq -c
	#   | dbrowuniq -I -c event

=head1 SEE ALSO

L<Fsdb>.


=cut


# WARNING: This code is derived from dbrowuniq.pm; that is the master copy.

use Fsdb::Filter::dbrowuniq;
my $f = Fsdb::Filter::dbrowuniq->new(@ARGV);
$f->setup_run_finish;  # or could just --autorun
exit 0;


=head1 AUTHOR and COPYRIGHT

Copyright (C) 1997-2018 by John Heidemann <johnh@isi.edu>

This program is distributed under terms of the GNU general
public license, version 2.  See the file COPYING
with the distribution for details.

=cut

1;
