Atom Syndication Format is a standard for publishing web feeds, a.k.a. web syndication. These feeds are usually consumed by a feed reader that aggregates news from many websites and displays them in a uniform format. An Atom feed is an XML document with a list of recent entries, each with a title, a URL and a short annotation. It also contains some metadata (website author, title etc.).
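To make the structure concrete, here is a minimal sketch of such a feed, wrapped in a shell function so that it can be piped or saved to a file. The function name sample_atom and all titles, dates and example.org URLs are made up for illustration; only the Atom element names and the namespace come from the standard.

```shell
#!/bin/bash

# Prints a minimal, made-up Atom feed to STDOUT.
# All titles, dates and URLs are placeholders, not real data.
sample_atom() {
    cat <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <id>https://example.org/</id>
  <updated>2018-12-24T13:37:24Z</updated>
  <author><name>Example author</name></author>
  <entry>
    <title>Second post</title>
    <id>https://example.org/posts/2</id>
    <link href="https://example.org/posts/2"/>
    <published>2018-12-24T13:37:24Z</published>
    <updated>2018-12-24T13:37:24Z</updated>
    <summary>Short annotation of the second post.</summary>
  </entry>
  <entry>
    <title>First post</title>
    <id>https://example.org/posts/1</id>
    <link href="https://example.org/posts/1"/>
    <published>2018-06-30T13:37:08Z</published>
    <updated>2018-06-30T13:37:08Z</updated>
    <summary>Short annotation of the first post.</summary>
  </entry>
</feed>
EOF
}
```

Saved with sample_atom > atom.xml, such a file can serve as offline test input instead of a downloaded feed.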
Using this simple XQuery 1.0 FLWOR expression, we convert the Atom feed into the XML serialization of relational data:
xquery version "1.0";
declare namespace relpipe="tag:globalcode.info,2018:relpipe";
declare namespace atom="http://www.w3.org/2005/Atom";

<relpipe xmlns="tag:globalcode.info,2018:relpipe">
    <relation>
        <name>atom</name>
        <attributes-metadata>
            <attribute-metadata name="published" type="string"/>
            <attribute-metadata name="title" type="string"/>
            <attribute-metadata name="url" type="string"/>
        </attributes-metadata>
        {
            for $e in /atom:feed/atom:entry
            order by $e/atom:published descending
            return
                <record>
                    <attribute>{$e/atom:published/text()}</attribute>
                    <attribute>{$e/atom:title/text()}</attribute>
                    <attribute>{string($e/atom:link/@href)}</attribute>
                </record>
        }
    </relation>
</relpipe>
Download: examples/atom.xq
This is a similar operation to xmltable used in SQL databases.
It converts an XML tree structure to the relational form.
In our case, the output is still XML, but in a format that can be read by relpipe-in-xml.
All put together in a single shell script:
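#!/bin/bash

# Prints the Atom feed XML to STDOUT
get_atom() {
    wget --quiet --output-document - https://blog.frantovo.cz/agregace/c/
    # Alternative sources (another feed or a local file):
    # wget --quiet --output-document - https://blog.frantovo.cz/agregace/k/
    # cat atom.xml
}

# Atom XML → relational XML → relational data → table
get_atom | galax-run -context-item /dev/stdin atom.xq | relpipe-in-xml | relpipe-out-tabular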
This generates a table with web news:
atom:
╭──────────────────────┬───────────────────────────────────────────────────────────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ published (string) │ title (string) │ url (string) │
├──────────────────────┼───────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 2018-12-24T13:37:24Z │ GNU Bash: Vánoční tipy │ https://blog.frantovo.cz/c/370/GNU%20Bash%3A%20V%C3%A1no%C4%8Dn%C3%AD%20tipy │
│ 2018-08-04T23:23:00Z │ HiFive1 – deska s otevřeným čipem RISC-V │ https://blog.frantovo.cz/c/368/HiFive1%20%E2%80%93%20deska%20s%C2%A0otev%C5%99en%C3%BDm%20%C4%8Dipem%20RISC-V │
│ 2018-06-30T13:37:08Z │ The Things Network – LoRaWAN – IoT │ https://blog.frantovo.cz/c/366/The%20Things%20Network%20%E2%80%93%20LoRaWAN%20%E2%80%93%C2%A0IoT │
│ 2018-03-31T19:48:00Z │ Roland Rubix44 – externí zvuková karta │ https://blog.frantovo.cz/c/365/Roland%20Rubix44%20%E2%80%93%20extern%C3%AD%20zvukov%C3%A1%20karta │
│ 2017-11-25T20:26:49Z │ Přepisování parametrů příkazové řádky │ https://blog.frantovo.cz/c/362/P%C5%99episov%C3%A1n%C3%AD%20parametr%C5%AF%20p%C5%99%C3%ADkazov%C3%A9%20%C5%99%C3%A1dky │
│ 2017-07-01T22:16:00Z │ Java a záludnost ternárního operátoru │ https://blog.frantovo.cz/c/359/Java%20a%C2%A0z%C3%A1ludnost%20tern%C3%A1rn%C3%ADho%20oper%C3%A1toru │
│ 2017-06-11T19:05:13Z │ Paralelní port jako generátor signálu │ https://blog.frantovo.cz/c/358/Paraleln%C3%AD%20port%20jako%20gener%C3%A1tor%20sign%C3%A1lu │
│ 2016-12-28T22:50:00Z │ Herní ovladače počátku 90. let │ https://blog.frantovo.cz/c/356/Hern%C3%AD%20ovlada%C4%8De%20po%C4%8D%C3%A1tku%2090.%20let │
│ 2016-11-12T20:16:00Z │ GPIO v Raspberry Pi jako soubory │ https://blog.frantovo.cz/c/355/GPIO%20v%C2%A0Raspberry%20Pi%20jako%20soubory │
│ 2016-02-29T23:45:00Z │ Nakupujeme v zahraničí po Internetu │ https://blog.frantovo.cz/c/353/Nakupujeme%20v%C2%A0zahrani%C4%8D%C3%AD%20po%20Internetu │
│ 2015-12-24T17:25:41Z │ Malajsie: Kuala Lumpur a hackerspacy │ https://blog.frantovo.cz/c/354/Malajsie%3A%20Kuala%20Lumpur%20a%C2%A0hackerspacy │
│ 2015-10-04T12:25:07Z │ Opravujeme chyby v softwaru: inotify-tools │ https://blog.frantovo.cz/c/352/Opravujeme%20chyby%20v%C2%A0softwaru%3A%20inotify-tools │
│ 2015-09-30T23:10:01Z │ CLOC: počítáme řádky kódu │ https://blog.frantovo.cz/c/351/CLOC%3A%20po%C4%8D%C3%ADt%C3%A1me%20%C5%99%C3%A1dky%20k%C3%B3du │
│ 2015-06-20T20:03:31Z │ binfmt_misc: spouštíme javovské programy podobně jako nativní binárky │ https://blog.frantovo.cz/c/349/binfmt_misc%3A%20spou%C5%A1t%C3%ADme%20javovsk%C3%A9%20programy%20podobn%C4%9B%20jako%20nativn%C3%AD%20bin%C3%A1rky │
│ 2015-06-13T22:56:58Z │ Přepisujeme soukromé proměnné v Javě pomocí reflexe │ https://blog.frantovo.cz/c/348/P%C5%99episujeme%20soukrom%C3%A9%20prom%C4%9Bnn%C3%A9%20v%C2%A0Jav%C4%9B%20pomoc%C3%AD%20reflexe │
│ 2015-04-04T16:47:44Z │ Jak jsem si (ne)koupil notebook │ https://blog.frantovo.cz/c/341/Jak%20jsem%20si%20%28ne%29koupil%20notebook │
│ 2015-02-15T18:55:35Z │ Těžíme akumulátory 18650 │ https://blog.frantovo.cz/c/340/T%C4%9B%C5%BE%C3%ADme%20akumul%C3%A1tory%2018650 │
│ 2015-01-17T23:23:00Z │ Java 8: Stream API │ https://blog.frantovo.cz/c/339/Java%208%3A%20Stream%20API │
│ 2015-01-04T01:47:49Z │ JXD S7800B: kapesní herní konsole │ https://blog.frantovo.cz/c/338/JXD%20S7800B%3A%20kapesn%C3%AD%20hern%C3%AD%20konsole │
│ 2014-12-28T14:31:07Z │ Vánoční hvězda – 3D │ https://blog.frantovo.cz/c/337/V%C3%A1no%C4%8Dn%C3%AD%20hv%C4%9Bzda%20%E2%80%93%203D │
╰──────────────────────┴───────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Record count: 20
Or with relpipe-out-recfile, we will get output in the recfile format (GNU Recutils), like this:
%rec: atom
published: 2018-12-24T13:37:24Z
title: GNU Bash: Vánoční tipy
url: https://blog.frantovo.cz/c/370/GNU%20Bash%3A%20V%C3%A1no%C4%8Dn%C3%AD%20tipy
published: 2018-08-04T23:23:00Z
title: HiFive1 – deska s otevřeným čipem RISC-V
url: https://blog.frantovo.cz/c/368/HiFive1%20%E2%80%93%20deska%20s%C2%A0otev%C5%99en%C3%BDm%20%C4%8Dipem%20RISC-V
published: 2018-06-30T13:37:08Z
title: The Things Network – LoRaWAN – IoT
url: https://blog.frantovo.cz/c/366/The%20Things%20Network%20%E2%80%93%20LoRaWAN%20%E2%80%93%C2%A0IoT
…
For frequent use, we can create a script or function called relpipe-in-atom that reads Atom XML on STDIN and generates relational data on STDOUT.
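Such a function might look like the following minimal sketch. It assumes Galax as the XQuery processor and the atom.xq file from above; the ATOM_XQ variable is a hypothetical convenience for pointing at the query file when it is not in the current directory.

```shell
#!/bin/bash

# Sketch of relpipe-in-atom: Atom XML on STDIN, relational data on STDOUT.
# ATOM_XQ is a hypothetical variable with the path to the query file;
# it defaults to atom.xq in the current directory.
relpipe-in-atom() {
    galax-run -context-item /dev/stdin "${ATOM_XQ:-atom.xq}" | relpipe-in-xml
}
```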
And then do any of these:
wget … | relpipe-in-atom | relpipe-out-tabular
wget … | relpipe-in-atom | relpipe-out-csv
wget … | relpipe-in-atom | relpipe-out-gui
wget … | relpipe-in-atom | relpipe-out-nullbyte | while read_nullbyte published title url; do echo "$title"; done
wget … | relpipe-in-atom | relpipe-out-recfile
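The read_nullbyte helper used in the loop above is not defined in this text. One possible definition is sketched below, assuming that relpipe-out-nullbyte terminates every attribute value with a NUL byte; the demonstration feeds it hand-made placeholder values through printf instead of real relational data.

```shell
#!/bin/bash

# Reads one NUL-terminated value from STDIN into each of the named variables.
# Returns non-zero when the input ends, which terminates a while loop.
read_nullbyte() {
    local IFS=   # empty IFS keeps leading and trailing whitespace in values
    local name
    for name in "$@"; do
        # -r: no backslash processing, -d '': read up to the next NUL byte
        read -r -d '' "$name" || return 1
    done
}

# Demonstration with made-up values (three attributes per record):
printf '%s\0' 2018-12-24 'First title'  https://example.org/1 \
              2018-08-04 'Second title' https://example.org/2 |
    while read_nullbyte published title url; do
        echo "$title"
    done
```

The loop body runs once per record and prints the two titles. Values may contain spaces or line breaks, because only the NUL byte acts as a separator.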
There are several implementations of XQuery.
Galax is one of them; XQilla and BaseX are others (and support newer versions of the standard).
There are also XSLT processors like xsltproc.
BaseX can be used instead of Galax: we just replace galax-run -context-item /dev/stdin with basex -i /dev/stdin.
Or for XQilla we use: xqilla -i /dev/stdin.
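If we switch between processors often, the three invocations can be hidden behind one function. This is only a sketch: the function name run_xquery and the XQUERY_PROCESSOR variable are hypothetical, while the command lines are the ones listed above.

```shell
#!/bin/bash

# Runs the given XQuery file against the XML document on STDIN.
# XQUERY_PROCESSOR is a hypothetical variable: galax (default), basex or xqilla.
run_xquery() {
    local query="$1"
    case "${XQUERY_PROCESSOR:-galax}" in
        galax)  galax-run -context-item /dev/stdin "$query" ;;
        basex)  basex -i /dev/stdin "$query" ;;
        xqilla) xqilla -i /dev/stdin "$query" ;;
        *)
            echo "Unsupported XQuery processor: $XQUERY_PROCESSOR" >&2
            return 1
            ;;
    esac
}
```

With get_atom from the script above, the pipeline then becomes, for example: get_atom | XQUERY_PROCESSOR=basex run_xquery atom.xq | relpipe-in-xml | relpipe-out-tabular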
Reading Atom feeds in a terminal might not be the best way to get news from a website, but this simple example shows us how to convert arbitrary XML to relational data. And of course, we can generate multiple relations from a single XML document using a single XQuery script. XQuery can also be used for operations like JOIN or UNION and for filtering and other transformations, as will be shown in further examples.
Relational pipes, open standard and free software © 2018-2022 GlobalCode