When will the stable version be released?
We don't know – there is no exact date.
Relational pipes are something that should have been released about twenty years ago, but real work started only in 2018.
So it makes little difference whether the release happens this month or the next.
We understand the "release early, release often" rule, but it fits application software better than standards and APIs.
Of course, we expect some evolution after the v1.0.0 release, but we need to stabilize and verify many things before the release in order to be able to maintain backward compatibility in the future.
When did the project start?
The first commit was in July 2018. Before that, there was a non-public prototype, which was started in April 2018.
But the original ideas are much older.
Predecessors of Relational pipes were SQL-API (September 2014) and alt2xml (January 2012).
The SQL-API was a prototype of an API for operating systems – it allowed SELECTing users, processes, fstab etc.
This prototype was based on PostgreSQL and Perl and is being replaced by particular modules of Relational pipes.
The alt2xml uses a different data model (a tree instead of relations), but it is nevertheless based on the same ideas as Relational pipes: converting data from various formats to a uniform model, then streaming and processing them through reusable transformations (filters).
This tool might be developed further in the future and can be used together with Relational pipes (e.g. read INI, JSON or Java properties files using alt2xml, pass them to relpipe-in-xmltable and continue with the processing in the relational way).
How can I help you?
\0
).
This "API" will certainly be supported. The data are simply the attribute values. There are no record separators: the number of attributes is known, so they are not needed.
The disadvantage of this approach is that the stream can contain only a single relation, and that the metadata are not embedded in the stream and must be passed separately.
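To make the "no record separators" point concrete, here is a minimal sketch in Python of how a reader could cut such a stream into records. It is an illustration only, not our implementation: the function name read_records, the assumption that every value is terminated by a \0 byte, the UTF-8 encoding and the sample fstab-like values are all made up for this example.

```python
from typing import Dict, Iterator, Sequence


def read_records(stream: bytes, attribute_names: Sequence[str]) -> Iterator[Dict[str, str]]:
    """Group a flat run of NUL-terminated values into records.

    The attribute names are the metadata: they are not in the stream
    and must be supplied separately by the caller.
    """
    width = len(attribute_names)
    if width == 0:
        raise ValueError("at least one attribute is required")
    # Assumption: every value ends with a NUL byte, so the last split part is empty.
    parts = stream.split(b"\0")
    if parts[-1] != b"":
        raise ValueError("stream does not end with a NUL terminator")
    values = [part.decode("utf-8") for part in parts[:-1]]
    if len(values) % width != 0:
        raise ValueError("value count is not a multiple of the attribute count")
    # No record separators: every `width` consecutive values form one record.
    for start in range(0, len(values), width):
        yield dict(zip(attribute_names, values[start:start + width]))


# Hypothetical sample: two records with three attributes each.
sample = b"/dev/sda1\0/\0ext4\0/dev/sda2\0/home\0xfs\0"
records = list(read_records(sample, ("device", "mount_point", "type")))
```

The attribute names are passed as a separate argument, which mirrors the disadvantage mentioned above: the stream itself says nothing about its structure.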
Why do you speak about relations instead of tables?
This terminology might be unfamiliar to some, but relations and attributes symbolize that we focus on the substance of the data. Pure data are conveyed through the pipelines and the presentation of such data is only the last step.
The data might be presented/visualized in many different forms.
Tables (consisting of rows and columns) are only one of many possible options.
| Relational | SQL    | alternative terms |
| relation   | table  |                   |
| attribute  | column | field             |
| record     | row    | tuple             |
What about duplicate records?
In the relational model, the records must be unique.
In Relational pipes there is no central authority that would prevent you from appending duplicate records to the relational stream.
This means that at some points in the relational pipeline, data may occur that do not fit the rules of the relational model.
Deduplication is generally not done on the output side of particular steps; it is postponed and done on the input side of those steps where uniqueness matters (e.g. JOIN or UNION).
You should not put duplicate records in the relational stream, but you can.
Duplicates can also occur after some transformations like relpipe-tr-cut (e.g. if you choose only the dump or type attributes from your fstab and omit the primary/unique key field).
Such data are not considered invalid, but should be processed as if there were no duplicates (if uniqueness is important for the particular step) or passed through unchanged if that does not conflict with the goal of the given step (e.g. calling the uppercase() function on some field or doing UNION ALL).
Each tool must document how it handles duplicate records.
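The rule above can be sketched as follows. This is only an illustration in Python, not code from any of our tools: the function names unique, union, union_all and uppercase and the sample records are hypothetical, and records are modelled as plain tuples.

```python
from itertools import chain
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, ...]


def unique(records: Iterable[Record]) -> Iterator[Record]:
    """Drop repeated records, keeping the first occurrence in stream order."""
    seen = set()
    for record in records:
        if record not in seen:
            seen.add(record)
            yield record


def union(left: Iterable[Record], right: Iterable[Record]) -> Iterator[Record]:
    """UNION: uniqueness matters here, so this step deduplicates its own input."""
    return unique(chain(left, right))


def union_all(left: Iterable[Record], right: Iterable[Record]) -> Iterator[Record]:
    """UNION ALL: duplicates do not conflict with the goal, so they pass through."""
    return chain(left, right)


def uppercase(records: Iterable[Record], index: int) -> Iterator[Record]:
    """Uppercase one attribute; duplicates are passed through untouched."""
    for record in records:
        yield record[:index] + (record[index].upper(),) + record[index + 1:]


# Hypothetical input with a duplicate left over from an earlier step.
left = [("ext4",), ("xfs",), ("ext4",)]
right = [("xfs",), ("btrfs",)]
```

Only union pays the memory cost of remembering the records it has already seen; union_all and uppercase stay purely streaming.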
There are two reasons for this transient tolerance of duplicate records.
1) Performance: guaranteeing uniqueness at every moment would negate streaming and would require holding the whole relation in memory and always sorting the records.
2) Modularity: many tasks would have to be done by a single bulky tool that does everything, e.g. if you want to cut only the type field from your fstab and then count how many times particular filesystems are used.
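The fstab example from the modularity point might look like this when written as two small, independent steps. Again, this is a Python sketch under stated assumptions, not the real tools: cut only stands in for what relpipe-tr-cut does, count_values is a hypothetical counting step, and the fstab records are invented sample data.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator

Record = Dict[str, str]


def cut(records: Iterable[Record], attribute: str) -> Iterator[Record]:
    """Keep a single attribute of each record; duplicates may appear in the output."""
    for record in records:
        yield {attribute: record[attribute]}


def count_values(records: Iterable[Record], attribute: str) -> Counter:
    """Count how many times each value of the attribute occurs."""
    return Counter(record[attribute] for record in records)


# Invented fstab-like records; the device attribute is the unique key.
fstab = [
    {"device": "/dev/sda1", "mount_point": "/", "type": "ext4"},
    {"device": "/dev/sda2", "mount_point": "/home", "type": "ext4"},
    {"device": "/dev/sdb1", "mount_point": "/data", "type": "xfs"},
]

# After the cut, the ext4 record occurs twice; the counting step relies on that.
usage = count_values(cut(fstab, "type"), "type")
```

If the cut step had to guarantee uniqueness, the two ext4 records would collapse into one and the statistics could no longer be computed by a separate step.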
Why C++?
Firstly, Relational pipes are a specification of a data format and as such are not bound to any programming language.
This specification is entirely language- and platform-independent.
The ideal/perfect language does not exist and our implementations will be written in various languages. We started our prototype and first real implementations in C++ for several reasons:
Implementations in other languages will follow. Java is next, then probably Perl, Python, Rust, Go, PHP etc. (depending on community involvement).
Are Relational pipes compatible with cloud, IoT, SPA/PWA, AI, blockchain and mobile-first? Should our DevOps use it in our serverless hipster fintech app with strong focus on SEO, UX and machine learning?
Go @#$%& yourself. We are pretty old school hackers and we enjoy our green screen terminals!
Of course, you can use Relational pipes anywhere if it makes sense for you.
Relational pipes are designed to be generic enough, i.e. not specific to any industry (banking, telecommunications, embedded etc.) or platform.
Data in this format are very concise, so they can be used even on very small devices.
The native data structure is a relation (table), but the format can also handle tree-structured data (i.e. any data).
It is designed for streaming rather than for storage (although under some circumstances it also makes sense to use it for storage).
What about your hobbies?
It is a rather personal question, but I can reveal that I collect signed photos of Ally Sheedy, Winona Ryder and Richard Stallman.
Relational pipes, open standard and free software © 2018-2022 GlobalCode