INTRODUCTION

This area contains a snapshot of the ongoing development of a rudimentary
reputation system based on OpenDKIM's statistics package.

The statistics package captures mail traffic as it passes through OpenDKIM.
All messages, even unsigned ones, are recorded.  For signed messages, every
signature is recorded along with its pass/fail result, signing domain, and
certain signature properties.  The statistics system
can also upload the locally-collected data to an arbitrary central repository
where it can be aggregated and then made available to participants to query.
See the stats/README for details of this component.

This central service implements a queryable system based on the REPUTE
protocols under development within the Internet Engineering Task Force.
This directory describes how to build such a service and also (primarily)
how to build a local store that can be used as an intelligent filter based
on DKIM verification results.

REQUIREMENTS

The requirements for this reputation system are as follows:

1) Reputation must be based on the domain(s) that have some kind of
   responsibility for the message.

2) Reputation should be expressed in such a way that it can be easily
   converted to a message flow limit.

3) There must be some reasonable, even if tiny, allowance made for
   domains about which no data have been accumulated.

4) The system may output a few values for a given domain, such as
   "strict", "medium" and "light", and these would be applied at the
   discretion of the implementation.

5) Data expiration must be configurable at the discretion of the site using
   the system.  That is, the system should not impose a specific lifetime
   on the accumulated data, though it should discuss the impact of different
   choices of data lifetime.


OVERVIEW

The system computes a recommended flow based on the history of the
behaviour of the signing domain(s) on a message.  Unsigned mail is
treated as though it were signed by the NULL domain, meaning all unsigned
mail is presumed to come from a common source ("the unknown", as it were).
What remains, then, is to determine the perceived value of mail from each
of these sources.  This is done via data collection and statistical methods.
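
As a minimal sketch of the NULL-domain rule described above, the following
Python fragment maps a message's signing domains to the reputation subjects
whose history would apply to it.  The function name, the sentinel string used
for the NULL domain, and the choice to lower-case and de-duplicate domains
are assumptions made for illustration; they are not taken from the OpenDKIM
source.

```python
# Hypothetical sentinel; how the NULL domain is actually represented
# is not specified in this document.
NULL_DOMAIN = "(null)"


def reputation_subjects(signing_domains):
    """Return the domains whose history applies to a message.

    An unsigned message (no signing domains) is attributed to the
    single shared NULL domain.
    """
    seen = []
    for domain in signing_domains:
        d = domain.strip().lower()
        if d and d not in seen:
            seen.append(d)
    return seen or [NULL_DOMAIN]
```

With this reading, every unsigned message accumulates history under the same
subject, so unsigned mail as a whole earns one shared reputation.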

Full details of this system can be found in the white paper entitled
"Enforcing Rate Limits Based on Authenticated Sender Reputation",
which will be included in this directory upon publication.


SETUP INSTRUCTIONS

1) If you intend only to query for reputations, skip to step 11.

2) Follow the steps in stats/README to configure opendkim to create
   statistics reports.  In addition, install contrib/stats/stats.lua
   somewhere and reference it in a StatisticsPolicyScript line in your
   configuration file.  Note that only the "SpamAssassin" portion is required.

3) If you intend to submit your data, follow the steps to create
   and register a submission key (GPG required), and then begin sending
   your reports regularly via cron(8).  Skip to step 11.

4) If you intend to do local reputation, create a new SQL database with
   tables as described in stats/mkdb.mysql and in reputation/mkdb-rep.mysql.
   (You may need to edit these if you are using a backend other than MySQL.)

5) Rather than (or in addition to) sending your stats reports off to
   a central site, import them periodically to your own database using
   cron(8) and opendkim-importstats(8).

6) Arrange to run opendkim-genrates(8) daily, shortly after midnight.
   This takes your statistical reports from SQL and rolls them up into
   spam and flow predictions, which opendkim can then query.

7) Install a web server (e.g., Apache), and ensure it includes support for
   inline PHP. 

8) Select a URI that will be the root of your reputation query.  Configure
   your web server such that this URI points to a specific directory.
   This documentation refers to that directory as $REPROOT.

9) Create a REPUTE query template that will be returned from
   http://<your-URI>/.well-known/repute-template.  The file belongs at
   $REPROOT/.well-known/repute-template.  The template tells a REPUTE
   client how to form the HTTP query that your server expects.  The
   template used for this experimental work is as follows:

   http://{service}/repute.php{?subject,application,assertion,service,reporter}

10) Install reputation/repute.php and reputation/repute-config.php into
    $REPROOT.  Edit repute-config.php as needed to match your database
    parameters.

11) Configure opendkim to query the reputation server by adding
    these configuration items:

	ReputationRatios        repute:<repute-server>
	ReputationLimits        repute:<repute-server>:<reporter-id>
	ReputationTimeFactor    24
	ReputationMinimum       2
	ReputationSpamCheck     /^X-Spam: ?[Yy][Ee][Ss]/

    This tells opendkim the following things:

	- domain reputations should be obtained via the REPUTE protocol from
	  <repute-server>
	- message limits based on those reputations should also be
	  retrieved from the same service; you may include your
	  own <reporter-id> if you have one to get limits tailored
	  to your site based on the data you have reported
	- the daily limit should be enforced in 24 chunks (i.e., applied
	  hourly rather than daily)
	- any domain is allowed to get at least two messages in regardless
	  of its reputation or the "spam" verdict on a message
	- a message is spam if any header field matches the regular expression
	  shown

    If you have configured a local reputation service, you can replace
    the ReputationLimits and ReputationRatios settings with queries against
    the data set that contains those numbers, i.e., the "daily_limit_low"
    and "rate_high" columns in the "predictions" table.
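
The query template in step 9 uses URI Template syntax (RFC 6570): {service}
is a simple expansion and {?subject,...} is a form-style query expansion in
which undefined variables are left out.  The Python sketch below expands only
those two forms and is not a general RFC 6570 implementation.  The host name
and the variable values are placeholders chosen for illustration.

```python
import re
from urllib.parse import quote

TEMPLATE = "http://{service}/repute.php{?subject,application,assertion,service,reporter}"


def expand(template, values):
    """Expand the two URI Template forms used above: {var} and {?a,b,...}."""

    def substitute(match):
        expr = match.group(1)
        if expr.startswith("?"):
            # Form-style query expansion: undefined variables are omitted.
            pairs = [
                f"{name}={quote(str(values[name]), safe='')}"
                for name in expr[1:].split(",")
                if values.get(name) is not None
            ]
            return "?" + "&".join(pairs) if pairs else ""
        # Simple expansion: an undefined variable expands to nothing.
        value = values.get(expr)
        return "" if value is None else quote(str(value), safe="")

    return re.sub(r"\{([^}]*)\}", substitute, template)


# Placeholder values; no reporter ID is supplied, so it is omitted.
query = expand(TEMPLATE, {
    "service": "repute.example.com",
    "subject": "example.net",
    "application": "email-id",
    "assertion": "spam",
})
print(query)
# http://repute.example.com/repute.php?subject=example.net&application=email-id&assertion=spam&service=repute.example.com
```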
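
The effect of the ReputationTimeFactor, ReputationMinimum and
ReputationSpamCheck values shown in step 11 can be sketched as follows.
This is one reading of the settings as described above, not opendkim's
internal arithmetic: the use of integer division and the assumption that the
minimum applies to each interval (rather than to the whole day) are
illustrative choices, and the function names are hypothetical.

```python
import re

TIME_FACTOR = 24   # ReputationTimeFactor: split the daily limit into 24 chunks
MINIMUM = 2        # ReputationMinimum: always let at least two messages in
SPAM_CHECK = re.compile(r"^X-Spam: ?[Yy][Ee][Ss]")  # ReputationSpamCheck


def interval_limit(daily_limit, time_factor=TIME_FACTOR, minimum=MINIMUM):
    """Messages allowed per interval (assumed: floor division, per-interval minimum)."""
    return max(minimum, daily_limit // time_factor)


def is_spam(header_fields):
    """True if any header field matches the configured expression."""
    return any(SPAM_CHECK.search(field) for field in header_fields)
```

Under these assumptions a domain with a daily limit of 240 would be allowed
10 messages per hour, while a domain with a daily limit of 12 would still be
allowed 2.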
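
For a local reputation service, the lookup against the "predictions" table
might resemble the sketch below.  It uses Python's sqlite3 module with an
in-memory database as a stand-in for your SQL backend.  Only the table name
and the "rate_high" and "daily_limit_low" columns come from this document;
the "domain" key column, the column types, and the sample row are assumptions
made so the example is self-contained.  Consult reputation/mkdb-rep.mysql for
the real schema.

```python
import sqlite3


def lookup(conn, domain):
    """Return (rate_high, daily_limit_low) for a domain, or None if unknown."""
    cur = conn.execute(
        "SELECT rate_high, daily_limit_low FROM predictions WHERE domain = ?",
        (domain,),
    )
    return cur.fetchone()


# Stand-in schema and data (hypothetical; see the lead-in above).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predictions ("
    "domain TEXT PRIMARY KEY, rate_high REAL, daily_limit_low INTEGER)"
)
conn.execute("INSERT INTO predictions VALUES ('example.com', 0.05, 480)")
```

A result of None means no prediction has been generated for that domain, in
which case the allowance for unknown domains (requirement 3) would apply.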


TROUBLESHOOTING

[some troubleshooting tips will be added here as experience is gained]


SUPPORT/QUESTIONS/COMMENTS

This work is still evolving.  For support, subscribe to the "opendkim-users"
list at http://lists.opendkim.org and post your question there.

To report bugs or assist in development, subscribe to the "opendkim-dev"
list at http://lists.opendkim.org and post there.

Please do not use the trackers at SourceForge for reporting bugs or filing
feature requests until this work is no longer considered experimental.
