While reading classic pipelines involving grep
and cut
commands
we must notice that there is some similarity with simple SQL queries looking like:
SELECT "some", "cut", "fields" FROM stdin WHERE grep_matches(whole_line);
And that is true: grep
does restriction
1
and cut
does projection
2
.
Now we can do these relational operations using our relational tools called relpipe-tr-grep
and relpipe-tr-cut
.
Assume that we need only mount_point
fields from our fstab
where type
is btrfs
or xfs
and we want to do something (a shell script block) with these directory paths.
relpipe-in-fstab \
| relpipe-tr-grep --relation 'fstab' --attribute 'type' --value '^btrfs|xfs$' \
| relpipe-tr-cut --relation 'fstab' --attribute 'mount_point' \
| relpipe-out-nullbyte \
| while read -r -d '' m; do
echo "$m";
done
The relpipe-tr-cut
tool has similar syntax to its grep and sed siblings and also uses the power of regular expressions.
In this case it modifies on-the-fly the fstab
relation and drops all its attributes except the mount_point
one.
Then we pass the data to the Bash while
cycle.
In such simple scenario (just echo
), we could use xargs
as in examples above,
but in this syntax, we can write whole block of shell commands for each record/value and do more complex actions with them.
Assume that we have a simple relation containing numbers:
seq 0 8 \
| tr \\n \\0 \
| relpipe-in-cli generate-from-stdin numbers 3 a integer b integer c integer \
> numbers.rp
and second one containing letters:
relpipe-in-cli generate letters 2 a string b string A B C D > letters.rp
We saved them into two files and then combined them into a single file. We will work with them as they are a single stream of relations:
cat numbers.rp letters.rp > both.rp;
cat both.rp | relpipe-out-tabular
Will print:
numbers: ╭─────────────┬─────────────┬─────────────╮ │ a (integer) │ b (integer) │ c (integer) │ ├─────────────┼─────────────┼─────────────┤ │ 0 │ 1 │ 2 │ │ 3 │ 4 │ 5 │ │ 6 │ 7 │ 8 │ ╰─────────────┴─────────────┴─────────────╯ Record count: 3 letters: ╭─────────────┬─────────────╮ │ a (string) │ b (string) │ ├─────────────┼─────────────┤ │ A │ B │ │ C │ D │ ╰─────────────┴─────────────╯ Record count: 2
We can put away the a
attribute from the numbers
relation:
cat both.rp | relpipe-tr-cut --relation 'numbers' --attribute 'b|c' | relpipe-out-tabular
and leave the letters
relation unaffected:
numbers: ╭─────────────┬─────────────╮ │ b (integer) │ c (integer) │ ├─────────────┼─────────────┤ │ 1 │ 2 │ │ 4 │ 5 │ │ 7 │ 8 │ ╰─────────────┴─────────────╯ Record count: 3 letters: ╭─────────────┬─────────────╮ │ a (string) │ b (string) │ ├─────────────┼─────────────┤ │ A │ B │ │ C │ D │ ╰─────────────┴─────────────╯ Record count: 2
Or we can remove a
from both relations resp. keep there only attributes whose names match 'b|c'
regex:
cat both.rp | relpipe-tr-cut --relation '.*' --attribute 'b|c' | relpipe-out-tabular
Instead of '.*'
we could use 'numbers|letters'
and in this case it will give the same result:
numbers: ╭─────────────┬─────────────╮ │ b (integer) │ c (integer) │ ├─────────────┼─────────────┤ │ 1 │ 2 │ │ 4 │ 5 │ │ 7 │ 8 │ ╰─────────────┴─────────────╯ Record count: 3 letters: ╭─────────────╮ │ b (string) │ ├─────────────┤ │ B │ │ D │ ╰─────────────╯ Record count: 2
All the time, we are reducing the attributes. But we can also multiply them or change their order:
cat both.rp \
| relpipe-tr-cut --relation 'numbers' --attribute 'b|a|c' --attribute 'b' --attribute 'a' --attribute 'a' \
| relpipe-out-tabular
n.b. the order in 'b|a|c'
does not matter and if such regex matches, it preserves the original order of the attributes;
but if we use multiple regexes to specify attributes, their order and count matters:
numbers: ╭─────────────┬─────────────┬─────────────┬─────────────┬─────────────┬─────────────╮ │ a (integer) │ b (integer) │ c (integer) │ b (integer) │ a (integer) │ a (integer) │ ├─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┤ │ 0 │ 1 │ 2 │ 1 │ 0 │ 0 │ │ 3 │ 4 │ 5 │ 4 │ 3 │ 3 │ │ 6 │ 7 │ 8 │ 7 │ 6 │ 6 │ ╰─────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────╯ Record count: 3 letters: ╭─────────────┬─────────────╮ │ a (string) │ b (string) │ ├─────────────┼─────────────┤ │ A │ B │ │ C │ D │ ╰─────────────┴─────────────╯ Record count: 2
The letters
relation stays rock steady and relpipe-tr-cut --relation 'numbers'
does not affect it in any way.
There are various input filters (relpipe-in-*
), one of them is relpipe-in-csv
which converts CSV files to relational format.
Thus we can process standard CSV files in our relational pipelines
and e.g. filter records that have certain value in certain column (relpipe-tr-grep
)
or keep only certain columns (relpipe-tr-cut
).
We may have a tasks.csv
file containing TODOs and FIXMEs:
"file","line","type","description" ".hg/shelve-backup/posix_mq.patch","97","TODO","support also other encodings." ".hg/shelve-backup/posix_mq.patch","163","TODO","support also other encodings." "src/FileAttributeFinder.h","79","TODO","optional whitespace trimming or substring" "src/FileAttributeFinder.h","80","TODO","custom encoding + read encoding from xattr" "src/FileAttributeFinder.h","83","TODO","allow custom error value or fallback to HEX/Base64" "streamlet-examples/streamlet-common.h","286","FIXME","correct error codes" …
And we can process it using this pipeline:
cat tasks.csv \
| relpipe-in-csv \
| relpipe-tr-grep --relation 'csv' --attribute 'type' --value 'FIXME' \
| relpipe-tr-cut --relation 'csv' --attribute 'file|description' \
| relpipe-out-tabular
and get result like this:
csv: ╭───────────────────────────────────────┬──────────────────────╮ │ file (string) │ description (string) │ ├───────────────────────────────────────┼──────────────────────┤ │ streamlet-examples/streamlet-common.h │ correct error codes │ │ streamlet-examples/streamlet-common.h │ correct error codes │ │ streamlet-examples/Streamlet.java │ correct error codes │ ╰───────────────────────────────────────┴──────────────────────╯ Record count: 3
We work with attribute (column) names, so there is no need to remember column numbers. And thanks to regular expressions we can write elegant and powerful filters.
1. selecting only certain records from the original relation according to their match with given conditions
2.limited subset of what projection means
Relational pipes, open standard and free software © 2018-2022 GlobalCode