
Q-ODBC - ODBC interface for the Q programming language
====== = ==== ========= === === = =========== ========

This module provides a simple ODBC interface for the Q programming language,
which lets you access a large variety of open source and commercial database
systems from Q. ODBC ("Open Database Connectivity") was originally developed
by Microsoft for Windows, but is now available on many different platforms,
and (at least) two open source implementations exist for Unix-like systems
(iODBC [http://www.iodbc.org/] and unixODBC [http://www.unixodbc.org/]). ODBC
has become the industry standard for portable and vendor-independent database
access. Most modern relational databases provide an ODBC interface so that
they can be used with this module. This includes the popular open source DBMSs
MySQL [http://www.mysql.com/] and PostgreSQL [http://www.postgresql.org/]. The
module provides the necessary operations to connect to an ODBC data source and
retrieve or modify data using SQL statements.

To make this module work, you must have an ODBC installation on your system,
as well as the driver backend for the DBMS you want to use (and, of course,
the DBMS itself). You also have to configure the DBMS as a data source for the
ODBC system. On Windows this is done with the ODBC applet in the system
control panel. For iODBC and unixODBC you can either edit the corresponding
configuration files (/etc/odbc.ini and/or ~/.odbc.ini) by hand, or use one of
the available graphical setup tools. More information about the setup process
can be found on the iODBC and unixODBC websites.


OPENING AND CLOSING A DATA SOURCE
======= === ======= = ==== ======

To open an ODBC connection with the `odbc_connect' function, you have to
specify a "connect string" which names the data source to be used. A list of
available data sources can be obtained with the `odbc_sources' function. For
instance, on my Linux system running MySQL and PostgreSQL it shows the
following:

==> odbc_sources
[("myodbc","MySQL ODBC 2.50"),("psqlodbc","PostgreSQL ODBC")]

The first component in each entry of the list is the name of the data source,
which can be used as the value of the `DSN' option in the connect string; the
second component provides a short description of the data source.

Likewise, the list of ODBC drivers available on your system can be obtained
with the `odbc_drivers' function, which returns a list of pairs of driver names
and attributes. At the time of this writing, this function appears to be
properly supported only on Windows, where it can be used to determine a legal
value for the DRIVER option in the connect string (see below).

The `odbc_connect' function is invoked with a single parameter, the connect
string, which is used to describe the data source and various other parameters
such as user id and password. For instance, on my system I can connect to the
local "myodbc" data source from above as follows:

==> def DB = odbc_connect "DSN=myodbc"

Here's how to specify a username and password; note that the different options
are separated with a semicolon:

==> def DB = odbc_connect "DSN=myodbc;UID=root;PWD=guess"
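A connect string is nothing more than KEY=VALUE options joined by semicolons, so it is easy to assemble one from separate settings. Here is a minimal sketch in Python. It is not part of Q-ODBC, and the helper name `make_connect_string' is hypothetical. It rejects values containing a semicolon rather than trying to quote them:

```python
def make_connect_string(**options: str) -> str:
    """Join KEY=VALUE options with semicolons, e.g. DSN=myodbc;UID=root."""
    parts = []
    for key, value in options.items():
        # A raw ';' inside a value would be read as an option separator,
        # so this sketch refuses such values instead of escaping them.
        if ";" in value:
            raise ValueError(f"value for {key} contains ';': {value!r}")
        parts.append(f"{key}={value}")
    return ";".join(parts)


# Keyword arguments keep their order (Python 3.7+), giving the
# connect string from the example above.
print(make_connect_string(DSN="myodbc", UID="root", PWD="guess"))
```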

The precise set of options in the connect string depends on your ODBC
interface, but at least the following options should be available on most
systems:

- DSN=<data source name>
- HOST=<server host name>
- DATABASE=<database path>
- UID=<user name>
- PWD=<password>

The following options appear to be Windows-specific:

- FILEDSN=<DSN file name>
- DRIVER=<driver name>
- DBQ=<database file name>

Using the FILEDSN option you can establish a connection to a data source
described in a .dsn file on Windows, as follows:

==> odbc_connect "FILEDSN=test.dsn"

On Windows it is also possible to directly connect to a driver and name a
database file as the data source. For instance, using the MS Access ODBC
driver you can connect to a database file test.mdb as follows:

==> odbc_connect "DRIVER=Microsoft Access Driver (*.mdb);DBQ=test.mdb"

The `odbc_connect' function returns an `ODBCHandle' object which is used to
refer to the database connection in the other routines provided by this
module. An ODBCHandle object is closed automatically when it is no longer
accessible.  You can also close it explicitly with a call to the
`odbc_disconnect' function:

==> odbc_disconnect DB

After `odbc_disconnect' has been invoked on a handle, any further operations
on it will fail.


GETTING INFORMATION ABOUT A DATA SOURCE
======= =========== ===== = ==== ======

You can get general information about an open database connection with the
`odbc_info' function. This function returns a tuple of strings with the
following items (see the description of the SQLGetInfo() function in the ODBC
API reference for more information):

- DATA_SOURCE_NAME: the data source name
- DATABASE_NAME: the default database
- DBMS_NAME: the host DBMS name
- DBMS_VER: the host DBMS version
- DRIVER_NAME: the name of the ODBC driver
- DRIVER_VER: the version of the ODBC driver
- DRIVER_ODBC_VER: the ODBC version supported by the driver
- ODBC_VER: the ODBC version of the driver manager

E.g., here is what the connection to MySQL shows on my system:

==> odbc_info DB
("myodbc","test","MySQL","5.0.18","myodbc3.dll","03.51.12","03.51","03.52")
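`odbc_info' returns a plain positional tuple, so it can help to attach the field names listed above when working with the result. The following Python sketch does this for the sample output. It is illustrative only, and the class name `OdbcInfo' is hypothetical:

```python
from typing import NamedTuple


class OdbcInfo(NamedTuple):
    """Field names in the order of the odbc_info tuple."""
    data_source_name: str
    database_name: str
    dbms_name: str
    dbms_ver: str
    driver_name: str
    driver_ver: str
    driver_odbc_ver: str
    odbc_ver: str


# The tuple from the example session above.
raw = ("myodbc", "test", "MySQL", "5.0.18",
       "myodbc3.dll", "03.51.12", "03.51", "03.52")
info = OdbcInfo(*raw)
print(info.dbms_name, info.dbms_ver)  # MySQL 5.0.18
```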

As of Q 7.11, the odbc module provides a number of additional operations to
retrieve further meta information about a database connection. In particular,
the `odbc_getinfo' function provides a direct interface to the SQLGetInfo()
routine. The result of `odbc_getinfo' is a byte string which can be converted
to an integer or string value, depending on the type of information requested.
For instance:

==> bint $ odbc_getinfo DB SQL_MAX_TABLES_IN_SELECT
31

==> bstr $ odbc_getinfo DB SQL_IDENTIFIER_QUOTE_CHAR
"`"
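How the byte string is decoded depends on what SQLGetInfo() wrote into it. Numeric items are C integers (e.g., SQLUSMALLINT) in the host's byte order, while string items are character data. The following Python sketch mimics the two conversions on simulated buffers. The helper names are hypothetical, and the handling of a trailing NUL byte and the UTF-8 decoding are assumptions, not documented behaviour of `bint' or `bstr':

```python
import sys


def info_as_int(raw: bytes) -> int:
    # Integer results are C integers stored in native byte order.
    return int.from_bytes(raw, byteorder=sys.byteorder, signed=False)


def info_as_str(raw: bytes) -> str:
    # Assumed: string results may carry a terminating NUL; cut there.
    return raw.split(b"\0", 1)[0].decode("utf-8")


# Simulated buffers: a 2-byte SQLUSMALLINT holding 31, and a quote char.
print(info_as_int((31).to_bytes(2, sys.byteorder)))  # 31
print(info_as_str(b"`\0"))                            # `
```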

Information about supported SQL data types is available with the
`odbc_typeinfo' function (this returns a lot of data; see odbc.q for an
explanation):

==> odbc_typeinfo DB SQL_ALL_TYPES

Moreover, information about the tables in the current database, as well as the
structure of the tables and their primary and foreign keys, can be retrieved
with the `odbc_tables', `odbc_columns', `odbc_primary_keys' and
`odbc_foreign_keys' functions:

==> odbc_tables DB
[("event","TABLE"),("pet","TABLE")]

==> odbc_columns DB "pet"
[("name","varchar","NO","''"),("owner","varchar","YES",()),
("species","varchar","YES",()),("sex","char","YES",()),
("birth","date","YES",()),("death","date","YES",())]

==> odbc_primary_keys DB "pet"
["name"]

==> odbc_foreign_keys DB "event"
[("name","pet","name")]

This often provides a convenient and portable means to retrieve basic
information about table structures, at least on RDBMSs whose drivers properly
implement the corresponding ODBC calls (which unfortunately isn't the case for
all ODBC drivers yet). Note that while most databases also make this
information available through special system catalogs, the details of
accessing these vary a lot among implementations.
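As an illustration of what can be done with this information, here is a Python sketch that summarizes a table from data shaped like the `odbc_columns' and `odbc_primary_keys' results above. The function name `describe_table' is hypothetical. Going by the sample output, it assumes that each column tuple is (name, type, nullable, default), with Q's () for "no default" mapped to Python's None:

```python
def describe_table(table, columns, primary_keys):
    """Summarize (name, type, nullable, default) column tuples as text."""
    lines = [f"{table}:"]
    for name, sql_type, nullable, default in columns:
        notes = []
        if name in primary_keys:
            notes.append("primary key")
        if nullable == "NO":          # assumed "YES"/"NO" nullability flag
            notes.append("not null")
        if default is not None:       # None stands in for Q's ()
            notes.append(f"default {default}")
        suffix = f" ({', '.join(notes)})" if notes else ""
        lines.append(f"  {name} {sql_type}{suffix}")
    return "\n".join(lines)


# The pet table as reported in the example session.
pet_columns = [
    ("name", "varchar", "NO", "''"), ("owner", "varchar", "YES", None),
    ("species", "varchar", "YES", None), ("sex", "char", "YES", None),
    ("birth", "date", "YES", None), ("death", "date", "YES", None),
]
print(describe_table("pet", pet_columns, ["name"]))
```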


EXECUTING SQL QUERIES
========= === =======

As soon as a database connection has been opened, you can execute SQL queries
on it using the `sql' function, which executes a query and collects the results
in a list. Note that SQL queries generally come in two different flavours:
queries returning data (so-called result sets), and statements modifying the
data (which have as their result the number of affected rows). For the former,
`sql' returns a nonempty list of tuples, where the first tuple holds the column
titles and each subsequent tuple corresponds to a single row of the result
set. For the latter, it returns the row count.

For instance, here is how you can select some entries from a table. (The
following examples assume the sample "menagerie" tables from the MySQL
documentation. The `init' function in the odbc_examp.q script can be used to
create these tables in your default database.)

==> sql DB "select name,species from pet where owner='Harold'" ()
[("name","species"),("Fluffy","cat"),("Buffy","dog")]
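Since the first tuple of a result set holds the column titles, the remaining rows are easily turned into records keyed by column name. Here is a small Python model of that step, applied to the result shown above. The function name `rows_as_dicts' is hypothetical, and row counts are simply passed through:

```python
def rows_as_dicts(result):
    """Map a sql-style result to a list of dicts; pass row counts through."""
    if isinstance(result, int):
        # Data-modifying statements yield just the affected row count.
        return result
    header, *rows = result
    return [dict(zip(header, row)) for row in rows]


print(rows_as_dicts([("name", "species"), ("Fluffy", "cat"), ("Buffy", "dog")]))
```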

Often the third parameter of `sql', as above, is just the empty tuple,
indicating a parameterless query. Queries involving marked input parameters
can be executed by specifying the parameter values in the third argument of
the `sql' call. For instance:

==> sql DB "select name,species from pet where owner=?" "Harold"
[("name","species"),("Fluffy","cat"),("Buffy","dog")]

Multiple parameters are specified as a tuple:

==> sql DB "select name,species from pet where owner=? and species=?" \
("Harold","cat")
[("name","species"),("Fluffy","cat")]

Parameterized queries are particularly useful for the purpose of inserting
data into a table:

==> sql DB "insert into pet values (?,?,?,?,?,?)" \
("Puffball","Diane","hamster","f","1999-03-30",())
1

In this case we could also have hard-coded the data to be inserted right into
the SQL statement, but a parameterized query like the one above can easily be
applied to a whole collection of data rows. E.g., if DATA is a list of such
tuples, all of them can be inserted as follows:

==> do (sql DB "insert into pet values (?,?,?,?,?,?)") DATA

Parameterized queries also let you insert data which cannot be specified
easily inside an SQL query, such as long strings or binary data.
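Python's DB-API uses the same `?' placeholder style. As a rough analogue only, here is the "one parameterized statement applied to a whole collection of rows" pattern using Python's built-in sqlite3 module. This is not ODBC or Q. The pet table layout follows the `odbc_columns' output shown earlier, None plays the role of Q's (), and the second data row is made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table pet (name varchar(20), owner varchar(20), "
    "species varchar(20), sex char(1), birth date, death date)"
)

# A collection of rows; None stands for SQL NULL, like () in Q.
data = [
    ("Puffball", "Diane", "hamster", "f", "1999-03-30", None),
    ("Fluffy", "Harold", "cat", None, None, None),
]

# One parameterized statement executed once per row, comparable to
# `do (sql DB "insert into pet values (?,?,?,?,?,?)") DATA' in Q.
conn.executemany("insert into pet values (?,?,?,?,?,?)", data)

# Parameters work the same way in queries.
rows = conn.execute(
    "select name, species from pet where owner=?", ("Harold",)
).fetchall()
print(rows)  # [('Fluffy', 'cat')]
```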

The following types of result and parameter values are recognized and
converted to/from the corresponding Q types:

SQL value/type						Q value/type
---------------------------------------------		------------
NULL (no value)						()
integer types (INTEGER and friends)			Int
floating point types (REAL, FLOAT and friends)		Float
binary data (BINARY, BLOB, etc.)			ByteStr
character strings (CHAR, VARCHAR, TEXT, etc.)		String

All other SQL data (including, e.g., TIME, DATE and TIMESTAMP) is represented
in Q using its character representation, encoded as a Q string.
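The mapping can be restated as a lookup on type names. The following Python sketch is illustrative only. The concrete type names per group are an assumed, partial list based on the table above and on the later note that NUMERIC and DECIMAL values are also delivered as Float. The function name `q_type_for' is hypothetical, and None stands for a NULL value:

```python
from typing import Optional

# Assumed, partial grouping of SQL type names by the Q type they map to.
_Q_TYPE = {
    **{t: "Int" for t in ("TINYINT", "SMALLINT", "INTEGER", "BIGINT")},
    **{t: "Float" for t in ("REAL", "FLOAT", "DOUBLE", "NUMERIC", "DECIMAL")},
    **{t: "ByteStr" for t in ("BINARY", "VARBINARY", "BLOB")},
    **{t: "String" for t in ("CHAR", "VARCHAR", "TEXT")},
}


def q_type_for(sql_type: Optional[str]) -> str:
    """Name of the Q type used for a value of the given SQL type."""
    if sql_type is None:  # a NULL value
        return "()"
    # Everything else (DATE, TIME, TIMESTAMP, ...) arrives as a string.
    return _Q_TYPE.get(sql_type.upper(), "String")


print(q_type_for("integer"), q_type_for("DATE"))  # Int String
```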

Some databases also allow special types of queries (e.g., "batch" queries
consisting of multiple SQL statements) which may return multiple result sets
and/or row counts. The `sql' function only returns the first result set, which
is appropriate in most cases. If you need to determine all result sets
returned by a query, the `msql' function must be used. This function is
invoked in exactly the same way as the `sql' function, but returns a list with
all the result sets and/or row counts of the query.

Example:

==> msql DB "select * from pet; select * from event" ()

This will return a list with two result sets, one for each query.


LOW-LEVEL OPERATIONS
========= ==========

The `sql' and `msql' operations are in fact just ordinary Q functions which
are implemented in terms of the low-level operations `sql_exec', `sql_fetch',
`sql_more' and `sql_close'. You can also invoke these functions directly if
necessary. The `sql_exec' function starts executing a query and returns either
a row count or the column names of the first result set as a tuple of
strings. After that you can use `sql_fetch' to obtain the results in the set
one by one. When all rows have been delivered, `sql_fetch' fails. The
`sql_more' function can then be used to check for additional result sets. If
there are further results, `sql_more' returns either the next row count, or a
tuple of column names, after which you can invoke `sql_fetch' again to obtain
the data rows in the second set, etc. When the last result set has been
processed, `sql_more' fails.

Example:

==> sql_exec DB "select name,species from pet where owner='Harold'" ()
("name","species")

==> sql_fetch DB // get the 1st row
("Fluffy","cat")

==> sql_fetch DB // get the 2nd row
("Buffy","dog")

==> sql_fetch DB // no more results
sql_fetch <<ODBCHandle>>

==> sql_more DB // no more result sets
sql_more <<ODBCHandle>>

Moreover, the `sql_close' function can be called at any time to terminate an
SQL query, after which subsequent calls to `sql_fetch' and `sql_more' will
fail:

==> sql_close DB // terminate query
()

This is not strictly necessary (it will be done automatically as soon as the
next SQL query is invoked), but it is useful in order to release all resources
associated with the query, such as parameter values which have to be cached so
that they remain accessible to the SQL server. Since these parameters may in some
cases use a lot of memory, it is better to call `sql_close' as soon as you
are finished with a query. This is also done automatically by the `sql' and
`msql' functions.
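To make the protocol concrete, here is a Python model of how an `msql'-like function could drive it. This is a sketch of the control flow described above, not the actual definition in odbc.q. The class `FakeHandle' and its `exec', `fetch', `more' and `close' methods are hypothetical stand-ins for `sql_exec', `sql_fetch', `sql_more' and `sql_close'. A failing Q call, which leaves an unevaluated term, is modeled as returning None:

```python
class FakeHandle:
    """Hypothetical stand-in for an ODBCHandle with canned results.

    Each queued item is either a row count (int) or a list whose first
    tuple holds the column names and whose remaining tuples are rows.
    """

    def __init__(self, results):
        self._pending = list(results)
        self._rows = []

    def _next_result(self):
        if not self._pending:
            return None                  # models the call failing
        item = self._pending.pop(0)
        if isinstance(item, int):
            self._rows = []
            return item                  # a row count
        header, *rows = item
        self._rows = list(rows)
        return header                    # column names of a result set

    def exec(self, query, params=()):    # ~ sql_exec
        return self._next_result()

    def fetch(self):                     # ~ sql_fetch
        return self._rows.pop(0) if self._rows else None

    def more(self):                      # ~ sql_more
        return self._next_result()

    def close(self):                     # ~ sql_close
        self._pending, self._rows = [], []


def collect_all(handle, query, params=()):
    """msql-like loop: gather every result set and row count."""
    results = []
    try:
        current = handle.exec(query, params)
        while current is not None:
            if isinstance(current, int):
                results.append(current)
            else:
                rows = [current]         # column names come first
                row = handle.fetch()
                while row is not None:
                    rows.append(row)
                    row = handle.fetch()
                results.append(rows)
            current = handle.more()      # next result, or None at the end
    finally:
        handle.close()                   # release resources, like sql_close
    return results


pet = [("name", "species"), ("Fluffy", "cat"), ("Buffy", "dog")]
print(collect_all(FakeHandle([pet, 1]), "select ...; update ..."))
```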

The low-level operations are useful when you have to deal with large result
sets where you want to avoid building the complete list of results in main
memory. Instead, these functions allow you to process the individual elements
immediately as they are delivered by the `sql_fetch' function. Using the
low-level operations you can also build your own specialized query engines;
take the definition of `sql' or `msql' as a starting point and change it
according to your needs.
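The streaming approach can be sketched in Python as a generator that hands out rows one at a time, so no complete list is ever built. This is a model only. `ListHandle' is a hypothetical handle serving a single result set, whose `exec', `fetch' and `close' methods play the roles of `sql_exec', `sql_fetch' and `sql_close', with None meaning "failed". As with `sql', only the first result set is consumed:

```python
class ListHandle:
    """Hypothetical handle serving one result set: header, then rows."""

    def __init__(self, header, rows):
        self.header, self._rows, self.closed = header, list(rows), False

    def exec(self, query, params=()):
        return self.header

    def fetch(self):
        return self._rows.pop(0) if self._rows else None

    def close(self):
        self.closed = True


def stream_rows(handle, query, params=()):
    """Yield the rows of the first result set one by one."""
    try:
        header = handle.exec(query, params)
        if header is None or isinstance(header, int):
            return  # failure, or a statement that only has a row count
        row = handle.fetch()
        while row is not None:
            yield row
            row = handle.fetch()
    finally:
        handle.close()  # runs once the generator is exhausted or closed


h = ListHandle(("name", "species"), [("Fluffy", "cat"), ("Buffy", "dog")])
for name, species in stream_rows(h, "select name,species from pet"):
    print(name, species)  # each row is processed as soon as it arrives
print(h.closed)  # True
```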


ERROR HANDLING
===== ========

When one of the above operations fails because the SQL server reports an
error, an error term of the form `odbc_error MSG STATE' will be returned,
which specifies an error message and the corresponding SQL state (i.e., error
code). A detailed explanation of the state codes can be found in the ODBC
documentation. For instance, a reference to a non-existent table will cause a
report like the following:

==> sql DB "select * from pets" ()
odbc_error "[TCX][MyODBC]Table 'test.pets' doesn't exist" "S1000"

You can check for such return values and take some appropriate action. By
redefining `odbc_error' accordingly, you can also have it generate exceptions
or print an error message. For instance:

odbc_error MSG STATE	= fprintf ERROR "%s (%s)\n" (MSG,STATE);

NOTE: When redefining `odbc_error' in this manner, you should be aware that
the return value of `odbc_error' is what will be returned by the other
operations of this module in case of an error condition. These return values
are checked by other functions such as `sql'. Thus the return value should
still indicate that an error has happened, and not be something that might be
interpreted as a legal return value, such as an integer or a nonempty
tuple. It is usually safe to have `odbc_error' return an empty tuple or throw
an exception, but other types of return values should be avoided.
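For comparison, here is the "throw an exception" policy expressed in Python. The exception class `OdbcError' and the function `odbc_error' below are hypothetical analogues, not part of the module. They carry the message and SQL state and use the same "%s (%s)" message format as the Q definition above. The first two characters of an SQLSTATE denote its class (e.g., "S1" for the general errors of ODBC 2.x):

```python
class OdbcError(Exception):
    """Carries the driver message and the five-character SQL state."""

    def __init__(self, msg: str, state: str):
        super().__init__(f"{msg} ({state})")
        self.msg = msg
        self.state = state

    @property
    def state_class(self) -> str:
        # The first two characters of an SQLSTATE name its class.
        return self.state[:2]


def odbc_error(msg: str, state: str):
    # Never returns a value that could pass for a row count or result.
    raise OdbcError(msg, state)


try:
    odbc_error("[TCX][MyODBC]Table 'test.pets' doesn't exist", "S1000")
except OdbcError as err:
    print(err)              # [TCX][MyODBC]Table 'test.pets' doesn't exist (S1000)
    print(err.state_class)  # S1
```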


CAVEATS AND BUGS
======= === ====

Be warned that multiple result sets are not supported by all databases. I also
found that some ODBC drivers do not properly implement this feature, even
though the database supports it. So you had better avoid this feature for now
if you want your application to be portable. In any case, a batch query can
easily be replaced by a sequence of single queries.

Note that since the exact numeric SQL data types (NUMERIC, DECIMAL) are mapped
to Q Float values (which are double precision floating point numbers), there
might be a loss of precision in extreme cases. If this is a problem, you should
explicitly convert these values to strings in your query (which can be done by
concatenating them with the empty string, as in "select 1234.56||''"; note
that MySQL treats `||' as logical OR unless the PIPES_AS_CONCAT SQL mode is
enabled, so there you would use "select concat(1234.56,'')" instead).
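The effect is easy to demonstrate outside of Q. This Python snippet is an illustration only, with an arbitrarily chosen value. It compares an exact decimal, kept in string form as the workaround above suggests, with the nearest double precision value:

```python
from decimal import Decimal

text = "12345678901234567.89"   # a DECIMAL value in string form
exact = Decimal(text)           # keeps every digit
as_double = float(text)         # what a double precision Float can hold

print(as_double)                # 1.2345678901234568e+16
print(Decimal(as_double))       # 12345678901234568 (fraction lost)
print(Decimal(as_double) == exact)  # False
```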


FURTHER INFORMATION AND EXAMPLES
======= =========== === ========

For further details about the operations provided by this module please see
the odbc.q file. A sample script illustrating the usage of the module can be
found in the odbc_examp.q file.


Enjoy! :)

Feb 16 2008
Albert Graef
ag@muwiinfa.geschichte.uni-mainz.de, Dr.Graef@t-online.de
http://www.musikwissenschaft.uni-mainz.de/~ag
