Since v0.16 the relpipe-tr-sql
module
uses the ODBC abstraction layer and thus we can access data in any DBMS (database management system).
Our program depends only on the generic API and the driver for particular DBMS is loaded dynamically depending on the configuration.
ODBC (Open Database Connectivity) is an industry standard and provides API for accessing a DBMS. In late 80s several vendors (mostly from the Unix and database communities) established the SQL Access Group (SAG) and then specified the Call Level Interface (CLI). ODBC, which is based on CLI, was published in early 90s. ODBC is available on many operating systems and there are at least two free software implementations: unixODBC and iODBC.
For more information see the v0.16 release notes.
ODBC:
the API consisting of C functions; see the files sql.h
and sqlext.h
e.g. in unixODBC.
Database driver:
a shared library (an .so
file)
that implements the API and connects to particular DBMS (SQLite, PostgreSQL, MySQL, MariaDB, Firebird etc.);
is usually provided by the authors of given DBMS, sometimes writen by a third-party
Client:
a program that calls the API in order to access a database; our relpipe-tr-sql
is a client
Data Source Name (DSN): the name of a preconfigured data source – when connecting, we need to know only the DSN – all parameters (like server name, user name, password etc.) can be then looked-up in the configuration
Connection string: a text string consisting of serialized parameters needed for connecting – we can specify all parameters ad-hoc in the connection string without creating any permanent configuration; a connection string can also refer to a DSN and add or override some parameters
There is some global configuration in the /etc
directory.
In /etc/odbcinst.ini
we can a find list of ODBC drivers.
Thanks to it, we can refer to a driver by its name (e.g. SQLite3
)
instead of the path to the shared library (e.g. /usr/lib/x86_64-linux-gnu/odbc/libsqlite3odbc.so
).
In /etc/odbc.ini
we can find a list of global (for given computer) data sources.
It is uncommon to put complete configurations in this file, because anyone would be able to read the passwords,
but we can provide here just a template with public parameters like server name, port etc.
and user will supply his own user name and password in the connection string or in his personal configuration file.
The ~/.odbc.ini
contains personal configuration of given user.
There are usually data sources including the passwords.
Thus this file must be readable only by given user (chmod 600 ~/.odbc.ini
).
Providing passwords in connection strings passed as CLI arguments is not a good practice due to security reasons:
by default it is stored in the shell history and it is also visible to other users of the same machine in the list of running processes.
The section name – in the []
brackets – is the DSN.
Then there are parameters in form of key=value
on each line.
The relpipe-tr-sql
and relpipe-in-sql
support these relevant CLI options:
--list-data-sources
:
whether to
lists available (configured) data sources in relational format (so we pipe the output to some output filter e.g. to relpipe-out-tabular
)
--data-source-name
:
specifies the DSN of a configured data source
--data-source-string
:
specifies the connections string for ad-hoc connection without need of any configuration
$ relpipe-tr-sql --list-data-sources true | relpipe-out-tabular data_source: ╭───────────────┬──────────────────────╮ │ name (string) │ description (string) │ ├───────────────┼──────────────────────┤ │ sqlite-memory │ SQLite3 │ │ relpipe │ PostgreSQL Unicode │ ╰───────────────┴──────────────────────╯ Record count: 2
Because output of this command is relational, we can further process it in our relational pipelines. This output is also used for the Bash-completion for suggesting the DSN.
If neither --data-source-name
nor --data-source-string
option is provided,
a temporary in-memory SQLite database is used as default.
In Debian GNU/Linux and similar distributions we can install SQLite ODBC driver by this command:
apt install libsqliteodbc
Which also installs the SQLite library that is all we need (because SQLite is a serverless and self-contained database).
Then we can use the default in-memory temporary database or specify the connection string ad-hoc, access existing SQLite databases or create new ones – e.g. this command:
… | relpipe-tr-sql --data-source-string 'Driver=SQLite3;Database=file:MyDatabase.sqlite'
will create the MyDatabase.sqlite
file and fill it with relations that came from STDIN.
For frequently used databases it is convenient to configure a data source in ~/.odbc.ini
:
[MyDatabase]
Driver=SQLite3
Database=file:/home/hacker/MyDatabase.sqlite
and then connect to it simply using --data-source-name MyDatabase
(both the option and the name will be suggested by Bash-completion).
The SQLite ODBC driver supports several parameters that are described in its documentation.
One of them is LoadExt
that loads SQLite extensions:
LoadExt=/home/hacker/libdemo.so
So we can write our own SQLite extension with custom functions or other features (example) or chose some existing one and load it into the SQLite connected through ODBC.
In Debian GNU/Linux and similar distributions we can install PostgreSQL ODBC driver by this command:
apt install odbc-postgresql
PostgreSQL is very powerful DBMS (probably most advanced free software relational database system)
and utilizes the client-server architecture.
This means that we also needs a server (can be also installed through apt
like the driver).
Once we have a server – remote or local – we need to create a user (role).
For SQL transformations we configure a dedicated role that has no persistent schema and uses the temporary one as default,
which means that all relations we create are lost at the end of the session (when the relpipe-tr-sql
command finishes),
thus it behaves very similar to the SQLite in-memory database.
CREATE USER relpipe WITH PASSWORD 'someSecretPassword';
ALTER ROLE relpipe SET search_path TO 'pg_temp';
And then we configure the ODBC data source:
[postgresql-temp]
Driver=PostgreSQL Unicode
Database=postgres
Servername=localhost
Port=5432
Username=relpipe
Password=someSecretPassword
Now we can use advanced PostgreSQL features for transforming data in our pipelines. We can also configure a DSN for another database that contains some useful data and other database objects, call existing business functions installed in such database, load data to or from this DB etc.
If the libmyodbc
package is missing in our distribution,
the ODBC driver for MySQL can be downloaded from their website.
We can get a binary package (.deb
, .rpm
etc.) or source code.
If we are compiling from sources, we do something like this:
cd mysql-connector-odbc-*-src/
mkdir build
cd build
cmake ../ -DWITH_UNIXODBC=1
make
We should use the driver in the same or similar version as the MySQL client library installed on our system.
For example 8.x driver will not work with 5.x library.
Successful compilation results in libmyodbc*.so
files.
Like PostgreSQL, also MySQL is a client-server,
so we need a server where we create a database and some user account.
As root through the mysql mysql
command we execute:
CREATE DATABASE relpipe CHARACTER SET = utf8;
CREATE USER 'relpipe'@'localhost' IDENTIFIED BY 'someSecretPassword';
GRANT ALL PRIVILEGES ON relpipe.* TO 'relpipe'@'localhost';
FLUSH PRIVILEGES;
As a normal user we add new data source to our ~/.odbc.ini
file:
[mysql-relpipe-localhost]
Driver=/home/hacker/src/mysql/build/lib/libmyodbc5a.so
Server=localhost
Port=3306
Socket=/var/run/mysqld/mysqld.sock
User=relpipe
Password=someSecretPassword
Database=relpipe
InitStmt=SET SQL_MODE=ANSI_QUOTES;
Charset=utf8
See that we have compiled the ODBC driver in our home directory
and even without installing it anywhere and registering it in the /etc/odbcinst.ini
file,
we can simply refer to the .so
file from our ~/.odbc.ini
.
If we set Server=localhost
, the client-server communication does not go through TCP/IP
but rather through the unix domain socket specified in the Socket
field.
If we set Server=127.0.0.1
or some remote IP address or domain name, the communication goes through TCP/IP on given port.
The SET SQL_MODE=ANSI_QUOTES;
init statement is important,
because it tells MySQL server that it should support standard SQL "quoted" identifiers
instead of that `weird` MySQL style.
We use the standard SQL while creating the tables.
There are many other parameters, quite well documented.
Now we can use MySQL as the SQL engine for transformations in our pipelines and we can also access existing MySQL databases, load data to and from them or call functions and procedures installed on the server.
Relational pipes, open standard and free software © 2018-2022 GlobalCode