WHAT: Natural-language communication offers a model for agile data architecture.
SO WHAT: The cost of new development is the shadow of inadequate data architectures.
NOW WHAT: Keep watching for progress from FluidInfo.
===
WHAT?
In my last post, "Is Metadata?", I speculated that Terry Jones proposes a new grammar and a "new literacy" with his fluid information architecture. I suggested that the artificial, limiting constraints imposed by short-term business architectures block adaptation. George Orwell hinted at the same thing in "Politics and the English Language," and then again, more famously, in 1984: Control the information system, and you control thought. Control thought, and you control the information system. A nice, total(itarian), closed loop. However, control language and you choke off clarity. Adaptability dies, markets starve, nobody extends credit, and there's no more funding for new databases. Sound familiar?
(I'll bet "Home_equity_value" had to be a positive integer in some of those financial data models....)
In this post I speculate about how adaptive, less controlled data-sharing architectures can work for us.
Roman Jakobson argued famously that poetry shows us, as if in vitro, how natural language works in vivo: it leaks, drips, evaporates, condenses, gets more and less viscous...in other words, it flows. New, never-uttered words and sentences are born and die every day, and the language systems that survive are those that have allowed this adaptive re-combination. They allow this because through poetic functions and devices, they keep re-adjusting the message. And that's why poetry exists in every language we know: it takes very seriously the "playing with language" that lets language evolve, adapt, and survive. If only a fluid database with a sufficiently easy-to-adopt user interface could enable people to "play with data." Then enterprises large and small could "speak with data" and respond with agility to the world.
If only.
Jakobson, a Russian-born linguist, looked at poetry to see what it could teach us about natural-language communication. He decomposed natural-language communication into fluid zones: a Sender-Message-Receiver zone (start thinking of a horizontal line) and a “Code” zone (ok, now hold that horizontal and start to think vertically). Here's a nice schematic.
Suppose I (Sender) scream something (Message) about the world (Context) at you (Receiver). Because you have learned how to process language, you devote some mental bandwidth to deciding how much of this event is just me screaming expressively, what the words I am screaming refer to, and how much, if anything, I expect you actually to do about it. Jakobson called all of these points of emphasis "functions" in the communication chain.
Culture taught you how to process language using these functions. By trial and error, you learned to pick out the "dominant" function of the communication.
Ultimately, culture teaches us what to look for, so we can tell a poem from a paper on fluid dynamics, or--which is harder--so we can distinguish screaming about a fire on the stove from screaming about a goal in the championship.
Poetry, Jakobson pointed out, happens when the main function of the communication is the "Message." This "poetic function" works with the "Code" or "Metalingual" function, to sharpen our awareness of the Code, because we look at the message over and over again. As we do this, it almost becomes opaque. We hover there, in the vertical zone, so we can confirm that we are using language right: “Whenever the addresser and/or the addressee need to check up whether they use the same code, speech is focused on the Code: it performs a metalingual function.” This vertical zone is also the zone where metadata lives.
Why do we do this Metalingual thing?
- We encode (produce) reality in information-shells ("representations") so we can extend the reach of our senses.
- We decode (consume) these representations to survive in diverse landscapes.
- We change our codes when landscapes change (sometimes a chicken-and/or-egg proposition...)
- ERGO…We "metalingually" check up on each other’s codes--we keep messing with the message--because we want this subroutine of an algorithm to keep working: sense, intend, encode, send, decode, verify, succeed or fail, re-invent.
- We repeat this overall linguistic algorithm...repeatedly.
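One way to picture the loop in the list above is as a small simulation. This is only a sketch under assumptions: the `Speaker` class, the `communicate` function, and the idea of a code as a plain meaning-to-word dictionary are hypothetical illustrations, not anything Jakobson formalized. The point it tries to show is that a failed "verify" step triggers a conversation about the code itself, after which the same message gets through.

```python
from dataclasses import dataclass, field


@dataclass
class Speaker:
    """A participant holding a private code: a mapping from meanings to words."""
    name: str
    code: dict = field(default_factory=dict)

    def encode(self, meaning):
        return self.code[meaning]

    def decode(self, word):
        # Reverse lookup; returns None when the word is not in this code.
        for meaning, known_word in self.code.items():
            if known_word == word:
                return meaning
        return None


def communicate(sender, receiver, meaning, max_rounds=3):
    """Run encode -> send -> decode -> verify, repairing the code on failure.

    Returns the number of rounds needed, or None if it never succeeds.
    """
    for round_number in range(1, max_rounds + 1):
        word = sender.encode(meaning)        # encode and send
        understood = receiver.decode(word)   # decode
        if understood == meaning:            # verify
            return round_number
        # Metalingual step: talk about the code itself, then try again.
        receiver.code[meaning] = word
    return None


alice = Speaker("alice", {"fire": "blaze"})
bob = Speaker("bob", {"fire": "fire"})
print(communicate(alice, bob, "fire"))  # 2: one failure, one repair, then success
print(communicate(alice, bob, "fire"))  # 1: the codes now agree
```

The repair here is deliberately crude (the receiver simply adopts the sender's word). A real exchange would negotiate, but the shape of the loop is the same.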
So, to repeat: Where the message itself dominates, the "poetic function" is at work. Poetry and metadata both perform this "checking up on codes." Poetic function is at work whenever somebody repeats a phrase they like, or invents a new word, or laughs with joy at the way something is said. But when we are pausing from pragmatic language to "check up," we contemplate the data AS WELL AS the reality that's encoded there in the message. Meanwhile, the poetry/metadata is always saying, "This is not reality. It's just words. It's just the world according to data." A couple of poets have said this more quotably:
W. H. Auden - "Poetry makes nothing happen."
Marianne Moore - "Imaginary gardens with real toads in them."
So, ahem, back to business. Enter data. Data is a representation of facts about things. This much we have learned. Data is NOT facts about things.
In Jakobson's study of language use, he saw speakers grow dull, robotic, automated, the more they took the representation/data for the reality. This habit, he felt, of never questioning the code, creates the risk that users will fail to adapt to reality. Poetry mitigates this risk, because poetry wakes us up to the reality that language, like data, is just a representation, and needs to be continually improved. And that improvement is what poetry does.
SO WHAT?
Metadata needs to be part of the data, like poetry. If we want to survive in a changing landscape, we can't afford to treat metadata separately, or secondarily. Wherever we do that, the database is destined to die an untimely death, and we can expect the cost of operations, maintenance, and process breakdowns to rise.
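To make "metadata as part of the data" concrete, here is a minimal sketch, assuming a hypothetical `Datum` class in which every value carries its own description instead of relying on a schema stored somewhere else. The class, its method names, and the sample figures are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Datum:
    """A value that carries its own metadata wherever it goes."""
    value: object
    meta: dict = field(default_factory=dict)

    def annotate(self, **notes):
        # Anyone handling the datum can add to its description in place.
        self.meta.update(notes)
        return self

    def describe(self):
        notes = ", ".join(f"{key}={self.meta[key]}" for key in sorted(self.meta))
        return f"{self.value!r} [{notes}]"


equity = Datum(-25000, {"unit": "USD"})
equity.annotate(meaning="market value minus loan balance")
print(equity.describe())
# -25000 [meaning=market value minus loan balance, unit=USD]
```

Nothing in this sketch forbids a negative value. The description travels with the number, so a downstream reader can question the code rather than inherit a constraint silently.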
If we can allow data to flow to new processes, we will have less need for specialized application architectures. This is why I say the cost of new development is the shadow of inadequate data architecture.
Database design is waiting for natural-language fluidity. When that is possible, databases can self-organize as users collectively represent local realities. When this happens, operational efficiency will tolerate the disorder that's common in natural language. That is, unlike planned and controlled (and by definition out-of-date) data structures, a fluid design allows for a certain amount of exploratory disorder so that new data sets can continue to evolve fitter representations of reality. This is true, long-range stability, but it calls for a significant paradigm shift.
Why do we need a paradigm shift?
To answer, let's look at a traditional attempt to solve this problem, created by one of the fathers of data warehousing, the famous and continuously employed data consultant Bill Inmon. Inmon offers a “Corporate Information Factory” architecture.
Inmon's architecture acknowledges the reality of short budget-cycles, and promises to prevent the creation of expensive data stovepipes by creating "a long-term enterprise-wide information architecture that is substantively different from the stovepipe environment." He lists these features:
• Responsive to changing conditions,
• A low cost of information,
• Fast response time,
• Inclusion of Web-based processing,
• Non-overlapping processes and data,
• The ability to handle very large volumes of data,
• The foundation for sharing data among different agencies,
• Holistic security
It looks like Inmon is after the same kind of clarity that Orwell sought. And from this I conclude that he wants from his framework the same kind of flexibility we get from natural language. In fact, we can line up Inmon's CIF benefits with Orwell's advice, which closes "Politics and the English Language":
Inmon -> Orwell
• Responsive to changing conditions -> 1. Never use a metaphor, simile, or other figure of speech which you are used to seeing in print.
• A low cost of information -> 2. Never use a long word where a short one will do.
• Fast response time -> 3. If it is possible to cut a word out, always cut it out.
• Non-overlapping processes and data -> 4. Never use the passive where you can use the active.
• The foundation for sharing data among different agencies -> 5. Never use a foreign phrase, a scientific word, or a jargon word if you can think of an everyday English equivalent.
Inmon asserts that managers build applications that they can afford, according to requirements that they deliberately intend, but that, “relative to the long-term horizon of information systems, the [budget and] career cycle is very short.”
The Grand Assumption here, though, is that the "information systems" are the hardware and software. After years of trying to build data warehouses, we now see a surge in the direction of "data governance." The aim is admirable, but in a decentered world, submission to governance is struggling. People just don't like it. Why? Because although the system is universal, the data is always local.
The inter-business horizon of "information systems" needed to conduct transactions is more important than the life-cycle of a physical information system within a business. The recent decomposition and recombination of organizations, especially data-driven organizations, is illustrating this truth. And similarly, the human enterprise that consumes trillions of dollars’ worth of data daily is NOT a centrally managed enterprise, but instead a swarm of communications illustrating all of Jakobson's functions all the time. Sure, among this swarm there are clusters we call organizations. But these are changing too fast for their own good, and in fact massive data assets are preventing them from growing rather than enabling them to grow (as you know if you have ever relied upon a third-party partner). (And see my post on the cell-phone personal supply chain.)
NOW WHAT?
To conclude: the budget-cycle view of information systems is the equivalent of an oral culture (budget is here today, gone tomorrow) clashing with a written culture (data persists and persists and persists). (See my post on oral vs. literate data cultures.)
Look to the fluid database for the next-generation evolution of data and application architecture.
When there is a structure for allowing data to flow and be rejoined, without a controlling API, "data-speakers" in a changing world, like natural language speakers in a changing world, can adapt their codes to the world in which they actually find themselves, able to write and create new representations…not just to parrot the illiterate, pre-programmed, corporate, or political ones.
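As a rough picture of data that can "flow and be rejoined," here is a toy sketch. It assumes a hypothetical in-memory `TagStore` in which any user may attach a namespaced tag to any object, and anyone may ask which objects carry a given combination of tags. It is not FluidInfo's actual API, and the object ids, user names, and values are invented for illustration.

```python
from collections import defaultdict


class TagStore:
    """Toy store: no one owns an object; anyone may tag it."""

    def __init__(self):
        # object id -> {"user/tag": value}
        self._tags = defaultdict(dict)

    def tag(self, object_id, user, name, value):
        # Tags are namespaced by user, so two users never overwrite each other.
        self._tags[object_id][f"{user}/{name}"] = value

    def having(self, *tag_names):
        """Return the ids of objects that carry every one of the given tags."""
        return sorted(
            object_id
            for object_id, tags in self._tags.items()
            if all(name in tags for name in tag_names)
        )

    def get(self, object_id):
        return dict(self._tags.get(object_id, {}))


store = TagStore()
store.tag("house-17", "lender", "equity", -25000)
store.tag("house-17", "assessor", "condition", "good")
store.tag("house-42", "lender", "equity", 80000)

# Two independent vocabularies are rejoined on the object they both describe.
print(store.having("lender/equity", "assessor/condition"))  # ['house-17']
```

The lender and the assessor never agreed on a schema. Each wrote in its own vocabulary, and the join happens afterward on the shared object.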