Thanks to relpipe-in-filesystem, we can collect metadata (or even the file contents) and store them in an index file for later use.
Such an index is useful for faster access and for offline work (we can index, e.g., an optical disc or an external or network HDD).
We can simply pipe the relational data into a file and use this file as the index. Or we can use some other format. In this example, we will use an SQLite file as the index.
The first step is to collect the file metadata. We will index just a subset of our filesystem, the /bin/ and /usr/bin/ directories:
find /bin/ /usr/bin/ -print0 \
    | relpipe-in-filesystem --relation "program" \
    | relpipe-tr-sql --data-source-string 'Driver=SQLite3;Database=file:bin.sqlite'
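The -print0 option makes find terminate each path with a NUL byte, which keeps unusual file names (spaces, newlines) intact on their way into the index. Before indexing a large tree, it may help to check how many entries will be fed in. The following is a minimal sketch using only find, tr and wc; the count_entries helper is a hypothetical name and not part of Relational pipes.

```shell
# Count the paths that find would hand over to relpipe-in-filesystem.
# Every path is terminated by a NUL byte, so we count the NUL bytes;
# this stays correct even for file names that contain newlines.
count_entries() {
    find "$@" -print0 | tr -cd '\000' | wc -c
}

count_entries /bin/ /usr/bin/
```

The number should match the record count of the program relation once the index is built, assuming the filesystem has not changed in the meantime.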
This index allows us to do fast searches and various analyses. For example, we can find the 20 largest binaries:
relpipe-in-sql \
    --data-source-string 'Driver=SQLite3;Database=file:bin.sqlite' \
    --relation "largest" \
    "SELECT path, size FROM program WHERE type = 'f' ORDER BY size DESC LIMIT 20" \
    | relpipe-out-tabular
The result:
largest:
╭──────────────────────────────┬───────────────╮
│ path (string)                │ size (string) │
├──────────────────────────────┼───────────────┤
│ /usr/bin/blender             │ 76975440      │
│ /usr/bin/blenderplayer       │ 32199344      │
│ /usr/bin/mscore              │ 24252992      │
│ /usr/bin/mysql_embedded      │ 23004600      │
│ /usr/bin/node                │ 18369616      │
│ /usr/bin/galax-parse         │ 18365264      │
│ /usr/bin/galax-run           │ 18360496      │
│ /usr/bin/clementine          │ 16818328      │
│ /usr/bin/emacs25-nox         │ 15055112      │
│ /usr/bin/doxygen             │ 14924104      │
│ /usr/bin/rosegarden          │ 14416952      │
│ /usr/bin/snap                │ 13472520      │
│ /usr/bin/audacity            │ 13257064      │
│ /usr/bin/pgadmin3            │ 13098800      │
│ /usr/bin/qemu-system-aarch64 │ 12564688      │
│ /usr/bin/qemu-system-arm     │ 12370192      │
│ /usr/bin/qemu-system-ppc64   │ 12280864      │
│ /usr/bin/qemu-system-ppc     │ 11738208      │
│ /usr/bin/qemu-system-x86_64  │ 11658464      │
│ /usr/bin/qemu-system-i386    │ 11623776      │
╰──────────────────────────────┴───────────────╯
Record count: 20
We can also collect additional metadata and append it to our index file.
In this example, we use the ldd tool to get the list of dynamically linked libraries for each binary and store these lists in our index:
relpipe-in-sql \
    --data-source-string 'Driver=SQLite3;Database=file:bin.sqlite' \
    --relation bin "SELECT path FROM program WHERE type = 'f'" \
    | relpipe-out-nullbyte \
    | while read_nullbyte f; do
        ldd "$f" | perl -ne 'if (/ => (.*) \(/) { print "$ENV{f},$1\n"; }';
    done \
    | relpipe-in-csv \
        "dependency" \
        "program" string \
        "library" string \
    | relpipe-tr-sql --data-source-string 'Driver=SQLite3;Database=file:bin.sqlite'
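The loop above relies on a read_nullbyte helper that is not defined in this section. Because the Perl one-liner reads the path from $ENV{f}, the helper must both read a NUL-terminated value and export the variable. A minimal sketch of such a helper, assuming Bash (read -d is not available in plain POSIX sh), could look like the following. The sample ldd line and its address are made up for illustration.

```shell
# Read NUL-terminated values from standard input into the named variables
# and export them, so that child processes (such as perl) can see them.
read_nullbyte() {
    local _rn_name
    for _rn_name in "$@"; do
        export "$_rn_name"
        IFS= read -r -d '' "$_rn_name" || return 1
    done
}

# Demonstration: one path on input, one made-up ldd line to parse.
printf '/bin/ls\0' | while read_nullbyte f; do
    printf '\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000000000)\n' \
        | perl -ne 'if (/ => (.*) \(/) { print "$ENV{f},$1\n"; }'
done
```

The regular expression keeps only lines that contain " => ", so ldd entries without a resolved path are skipped. Note that the output is plain CSV: a path containing a comma or a quote would need escaping before it reaches relpipe-in-csv.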
Then we can hold a “popularity contest” and find the 20 most frequently used libraries:
relpipe-in-sql \
    --data-source-string 'Driver=SQLite3;Database=file:bin.sqlite' \
    --relation "popular_libraries" "
        SELECT
            d.library,
            count(*) AS count
        FROM dependency AS d
            JOIN program AS p ON (d.program = p.path)
        GROUP BY library
        ORDER BY count DESC
        LIMIT 20" \
    | relpipe-out-tabular
Well, well… here we are:
popular_libraries:
╭────────────────────────────────────────────┬────────────────╮
│ library (string)                           │ count (string) │
├────────────────────────────────────────────┼────────────────┤
│ /lib/x86_64-linux-gnu/libc.so.6            │ 2508           │
│ /lib/x86_64-linux-gnu/libpthread.so.0      │ 1487           │
│ /lib/x86_64-linux-gnu/libdl.so.2           │ 1364           │
│ /lib/x86_64-linux-gnu/libm.so.6            │ 1271           │
│ /lib/x86_64-linux-gnu/librt.so.1           │ 1057           │
│ /lib/x86_64-linux-gnu/libz.so.1            │ 1019           │
│ /lib/x86_64-linux-gnu/libgcc_s.so.1        │ 811            │
│ /lib/x86_64-linux-gnu/libpcre.so.3         │ 788            │
│ /lib/x86_64-linux-gnu/liblzma.so.5         │ 749            │
│ /usr/lib/x86_64-linux-gnu/libstdc++.so.6   │ 742            │
│ /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0 │ 681            │
│ /lib/x86_64-linux-gnu/libbsd.so.0          │ 658            │
│ /usr/lib/x86_64-linux-gnu/libXau.so.6      │ 648            │
│ /usr/lib/x86_64-linux-gnu/libXdmcp.so.6    │ 648            │
│ /usr/lib/x86_64-linux-gnu/libxcb.so.1      │ 648            │
│ /usr/lib/x86_64-linux-gnu/libX11.so.6      │ 638            │
│ /usr/lib/x86_64-linux-gnu/libpng16.so.16   │ 622            │
│ /lib/x86_64-linux-gnu/libgpg-error.so.0    │ 616            │
│ /lib/x86_64-linux-gnu/libgcrypt.so.20      │ 613            │
│ /usr/lib/x86_64-linux-gnu/liblz4.so.1      │ 575            │
╰────────────────────────────────────────────┴────────────────╯
Record count: 20
In future versions there might be an option to gather more file metadata like hashes, Exif etc.
But even in the current version, it is possible to gather literally any metadata using a custom script (as we have shown with ldd above).
Extended attributes are already supported (the --xattr option).
N.B. If we use a database frequently, it is convenient to configure it as a data source in the ~/.odbc.ini file and then connect to it using the --data-source-name option and the name of that data source.
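As a rough illustration of the note above, the following sketch appends a data source entry to an odbc.ini-style file. The helper add_sqlite_dsn and the data source name bin-index are hypothetical. The Driver and Database keys simply mirror the connection string used throughout this example. A relative Database path presumably depends on the current working directory, so an absolute path may be more practical.

```shell
# Append a data source entry to an odbc.ini-style file.
# Arguments: <ini file> <data source name> <Database value>
add_sqlite_dsn() {
    printf '[%s]\nDriver=SQLite3\nDatabase=%s\n\n' "$2" "$3" >> "$1"
}

# Possible usage (hypothetical data source name "bin-index"):
#   add_sqlite_dsn ~/.odbc.ini bin-index 'file:bin.sqlite'
#   relpipe-in-sql --data-source-name bin-index \
#       --relation "largest" "SELECT path, size FROM program LIMIT 3" \
#       | relpipe-out-tabular
```

Keeping the connection details in one place means the queries no longer need to repeat the full --data-source-string.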
Relational pipes, open standard and free software © 2018-2022 GlobalCode