The relpipe-tr-scheme
can not only filter records,
but can also modify them and even modify the structure of the relation – add or remove attributes.
We have some CSV file:
1,first,1,2,3
2,second,2,10,1024
3,third,4,4,16
and we convert it to a relation called n
:
cat guile-1.csv \
| relpipe-in-csv n id integer name string a integer b integer c integer \
| relpipe-out-tabular
which printed as a table looks like this:
n:
╭──────────────┬───────────────┬─────────────┬─────────────┬─────────────╮
│ id (integer) │ name (string) │ a (integer) │ b (integer) │ c (integer) │
├──────────────┼───────────────┼─────────────┼─────────────┼─────────────┤
│ 1 │ first │ 1 │ 2 │ 3 │
│ 2 │ second │ 2 │ 10 │ 1024 │
│ 3 │ third │ 4 │ 4 │ 16 │
╰──────────────┴───────────────┴─────────────┴─────────────┴─────────────╯
Record count: 3
Because it is annoying to write some code again and again, we will create a shell function and (re)use it later:
sample-data() {
cat guile-1.csv \
| relpipe-in-csv n id integer name string a integer b integer c integer;
}
Another option is storing the relational data in a file and then reading this file. Files are better option, if the transformation is costly and we do not need live/fresh data.
Then, we can modify such relation using Scheme – e.g. we can make the name
uppercase and increase id
by 1000:
sample-data \
| relpipe-tr-scheme \
--relation n \
--for-each '(set! $name (string-upcase $name) ) (set! $id (+ $id 1000) )' \
| relpipe-out-tabular
So we have:
n:
╭──────────────┬───────────────┬─────────────┬─────────────┬─────────────╮
│ id (integer) │ name (string) │ a (integer) │ b (integer) │ c (integer) │
├──────────────┼───────────────┼─────────────┼─────────────┼─────────────┤
│ 1001 │ FIRST │ 1 │ 2 │ 3 │
│ 1002 │ SECOND │ 2 │ 10 │ 1024 │
│ 1003 │ THIRD │ 4 │ 4 │ 16 │
╰──────────────┴───────────────┴─────────────┴─────────────┴─────────────╯
Record count: 3
The relation on the output might have different structure that the relation on the input. We can keep only some of the original attributes:
sample-data \
| relpipe-tr-scheme \
--relation n \
--for-each '(set! $name (string-upcase $name) ) (set! $id (+ $id 1000) )' \
--output-attribute 'id' integer \
--output-attribute 'name' string \
| relpipe-out-tabular
and have:
n:
╭──────────────┬───────────────╮
│ id (integer) │ name (string) │
├──────────────┼───────────────┤
│ 1001 │ FIRST │
│ 1002 │ SECOND │
│ 1003 │ THIRD │
╰──────────────┴───────────────╯
Record count: 3
If we do not want to completely redefine the structure of the relation, we can keep all original attributes and just add definitions of some others:
sample-data \
| relpipe-tr-scheme \
--relation n \
--for-each '(define $sum (+ $a $b $c) )' \
--output-attribute 'sum' integer \
--input-attributes-prepend \
| relpipe-out-tabular
so we have a completely new attribute containing the sum of a
, b
and c
:
n:
╭──────────────┬───────────────┬─────────────┬─────────────┬─────────────┬───────────────╮
│ id (integer) │ name (string) │ a (integer) │ b (integer) │ c (integer) │ sum (integer) │
├──────────────┼───────────────┼─────────────┼─────────────┼─────────────┼───────────────┤
│ 1 │ first │ 1 │ 2 │ 3 │ 6 │
│ 2 │ second │ 2 │ 10 │ 1024 │ 1036 │
│ 3 │ third │ 4 │ 4 │ 16 │ 24 │
╰──────────────┴───────────────┴─────────────┴─────────────┴─────────────┴───────────────╯
Record count: 3
We can change the attribute order by using --input-attributes-append
instead of --input-attributes-prepend
.
Each attribute has a data type (integer, string…). And we can change the type. Of course we have to modify the data, because we can not put e.g. string value into an integer attribute.
sample-data \
| relpipe-tr-scheme \
--relation n \
--for-each '(define $id (string-upcase $name) )' \
--output-attribute 'id' string \
--output-attribute 'a' integer \
--output-attribute 'b' integer \
--output-attribute 'c' integer \
| relpipe-out-tabular
The code above changed the type of id
attribute from integer to string
and put uppercase name
into it:
n:
╭─────────────┬─────────────┬─────────────┬─────────────╮
│ id (string) │ a (integer) │ b (integer) │ c (integer) │
├─────────────┼─────────────┼─────────────┼─────────────┤
│ FIRST │ 1 │ 2 │ 3 │
│ SECOND │ 2 │ 10 │ 1024 │
│ THIRD │ 4 │ 4 │ 16 │
╰─────────────┴─────────────┴─────────────┴─────────────╯
Record count: 3
We can do projection and restriction at the same time, during the same transformation:
sample-data \
| relpipe-tr-scheme \
--relation n \
--for-each '(set! $name (string-upcase $name) ) (set! $id (+ $id 1000) )' \
--output-attribute 'id' integer \
--output-attribute 'name' string \
--where '(= $c (* $a $b) )' \
| relpipe-out-tabular
and have:
n:
╭──────────────┬───────────────╮
│ id (integer) │ name (string) │
├──────────────┼───────────────┤
│ 1003 │ THIRD │
╰──────────────┴───────────────╯
Record count: 1
And if we use expt
instead of *
, we will get SECOND instead of THIRD.
The example above has its SQL equivalent:
SELECT
id + 1000 AS id,
upper(name) AS name
FROM n
WHERE c = (a * b);
The difference is that Relational pipes do not require data to be stored anywhere, because we (by default) process streams on the fly. Thus one process can generate data, second one can transform them and the third one can convert them to some output format. All processes are running at the same time and without need to cache all data at once.
Relational pipes, open standard and free software © 2018-2022 GlobalCode