The Sancho Panza Test for Data Quality
I just had a conversation with Dylan Jones, editor and publisher of DataQualityPro, in Stratford-upon-Avon, England. During the conversation we agreed that data-management professionals who "speak data," and the people who benefit from data quality, often do not speak the same dialects.
This made me realize a delicious disambiguator, and I'm sure I'm not the first to taste it: SELECT value FROM Experience WHERE DQ <> DQ;
To resolve ambiguities, it's useful to listen to the Sancho Panzas of the organization: the humble workers who stand by, deal with, and adjust to the company's data, of whatever quality, on a daily basis. Find out what they say and how they say it, and you may find the fast route to the data that needs correcting.
I call this the Sancho Panza test: if you can explain your DQ initiative's objective in proverbial terms, terms that make intuitive sense to the people in the organization, it will probably succeed.
Influences and Influencers
It's not just IT vs. the business. It's hard for a person with specific data management expertise, who "speaks data," to reach the people in the organization who are the influencers--people who "speak WITH data" and can give a DQ effort the boost it needs to succeed. Similarly, these influencers are effective because of hidden forces in the organization, tribal undercurrents, that are the influences on how information gets produced and consumed.
As a result, more often than not, the promises of data quality initiatives look or sound quixotic--hard to understand, nice to dream about, impossible to achieve.
People who understand data, who "speak data" at the concrete level of data management, may speak very differently from people who use information. They bring wizard-like wisdom to how data systems actually work. For working this magic, they are both trusted and mistrusted by their business managers. These two tribes need each other, and mining that dialectic is critical to a successful data-quality initiative.
This mining includes understanding the social-status dimension of the organization. Both tribes--and both types of DQ specialist--claim that their contributions are the ones that really matter. The "data speakers" insist that if you can't model/collect/store/move/publish data correctly, you won't have a business at all. The "context-speakers" are too busy to be confused with these details--the business pays for this data, so it needs to serve the business.
Miguel de Cervantes to the Rescue
Consider this analogy: After he's read lots of exciting romantic novels about knights and giants, Don Quixote (let's call him DQ) really believes that a windmill is a giant. Off he goes to attack it, like an organization trying to do business with weak data and a whole lot of enthusiasm for its business logic. And Sancho Panza represents the faithful DQ consultant, who patiently and persistently picks up his bruised master while pointing out that, uh, that's a windmill.
But this is why they call it a great book: the analogy reverses without losing value. And when we reverse the analogy, we uncover the Sancho Panza test for DQ consultants.
This time, think of Don Quixote as the data-quality specialist or even the data management specialist or software vendor, bringing to the world his specialist's perspective and vocabulary and enthusiasm, influenced by the books he's read, envisioning everyday business practices, with his value added, as goldmines for the organization. Meanwhile Sancho Panza represents the person who does a practical job every day, who knows what works around here and what doesn't.
I urge Data Quality (let's call it DQ) consultants to listen to this Sancho Panza, and to think of themselves as Don Quixote. Sancho doesn't know much about data, but he knows what he likes. He likes information that he can process in his proverbial way. He's open to listening, but slow to change, and he'll tell you what he thinks.
Cervantes created the wise fool Sancho Panza to accompany DQ on his picaresque journey. Thanks to this foiling arrangement, readers of the novel discover that while they find Sancho Panza a relief from the knight's wacky imagination, in the end they enjoy Don Quixote just as much, for his willingness to see an elevated world. The two together help to clarify each one's value.
In contrast to DQ's high-flown language about common things, Sancho spouts sayings like, "Hunger is the best sauce."
(The poor father of a poor family, he knows what he's talking about. If all politics is local, and all information is local, it's possible that this little proverb speaks to the politics of data quality. Power issues about data happen where the business context (the hunger) realizes the data's value (the sauce). And DQ professionals must admit that the pain and shame of getting the data wrong, or the wrong data, are also realized in the live business context, not when the data's sitting there in the warehouse.)
Here's an example from the novel: Sancho Panza has just stopped Don Quixote from attacking a group of actors who were putting on a play that DQ didn't like.
"The sceptres and crowns of those play-actor emperors," said Sancho, "were never yet pure gold, but only brass foil or tin."
"That is true," said Don Quixote, "[...] in the comedy and life of this world, where some play emperors, others popes, and, in short, all the characters that can be brought into a play; but when it is over, that is to say when life ends, death strips them all of the garments that distinguish one from the other, and all are equal in the grave."
"A fine comparison!" said Sancho; "though not so new but that I have heard it many and many a time, as well as that other one of the game of chess; how, so long as the game lasts, each piece has its own particular office, and when the game is finished they are all mixed, jumbled up and shaken together, and stowed away in the bag, which is much like ending life in the grave."
"Thou art growing less doltish and more shrewd every day, Sancho," said Don Quixote.
"[A]lways, or mostly, when Sancho tried to talk fine and attempted polite language, he wound up by toppling over from the summit of his simplicity into the abyss of his ignorance; and where he showed his culture and his memory to the greatest advantage was in dragging in proverbs." (Chapter XII)
Applying the Sancho Panza Test
To understand the "culture and memory" of a company, keep a notebook of the habits and sayings of information workers who would be affected by the data quality initiative. Mining these sayings can help to assess the culture to which you are bringing the DQ message, and can help to identify where DQ might attack successfully, and where DQ might go on fruitless and painful quests.
I worked with a company of 25,000 employees where a singular speech habit prevailed throughout the company: Everybody who wanted to be anybody used the word "So" to start a sentence. They did it in place of "Um"--as if the speaker were announcing a sophisticated conclusion (like "therefore," but more casual) from data, but instead of using data, they were invoking the spirit of data, like Don Quixote's magic books.
This happened when a conversation was started, when a question was answered, even in the middle of an otherwise normal conversation. It was a "language gesture," a way of establishing authority because it borrowed from the logic of inductive conclusion that was a meme in the company's culture. It also meant that managers made their mark not by performing to specifications, but by coming up with insightful ideas. In this culture, more data is better, and quality is second-rate, because at the mid-manager level, the behavior of LOOKING and SOUNDING analytic got jumbled up (like the kings with the pawns) with the true value-added analysis that led to dollars and cents.
Moreover, in this culture, actual data quality, if it were delivered, risked exposing these off-the-cuff analysts. Their analysis might easily be flawed, which carried with it political risk: in a culture that valued analysis, the ability to convince others that you are analytic carried as much political weight as the analysis itself. To be exposed would mean to lose face.
Solution?
The best step we made in this environment was to ask senior executives to declare themselves owners of key reports. From this position we could identify the metrics on those reports that mattered most, and then drill down from there to the data elements whose quality would have the greatest effect. When we asked the report generators to identify the data that mattered, we got quixotically unmanageable numbers of attributes, because they regarded everything they did as important--and they claimed to "speak data" by naming granular systems and sources, rather than principles of correction that would add lasting value.
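For readers who prefer to see the drill-down spelled out, here is a minimal sketch of the idea, not the tooling we actually used. It assumes you can write down which executive owns which report, which metrics appear on each report, and which data elements feed each metric. Every report name, owner, metric, and data element below is hypothetical, as is the rank_elements function; the point is only that a short, owned list of reports yields a short, ranked list of data elements.

```python
from collections import defaultdict

# Hypothetical inventory: each key report, its executive owner, and its metrics.
REPORTS = {
    "Weekly Sales Summary": {"owner": "VP Sales",
                             "metrics": ["net_revenue", "units_sold"]},
    "Customer Retention": {"owner": "VP Marketing",
                           "metrics": ["churn_rate", "net_revenue"]},
    "Ad Hoc Extract": {"owner": None,  # nobody has claimed this one
                       "metrics": ["page_views"]},
}

# Hypothetical lineage: the data elements each metric is computed from.
METRIC_ELEMENTS = {
    "net_revenue": ["order_amount", "discount_code", "customer_id"],
    "units_sold": ["order_quantity", "product_id"],
    "churn_rate": ["customer_id", "cancellation_date"],
    "page_views": ["session_id"],
}


def rank_elements(reports, metric_elements):
    """Rank data elements by how many metrics on owned reports depend on them."""
    counts = defaultdict(int)
    for report in reports.values():
        if not report.get("owner"):
            continue  # only reports with a declared executive owner count
        for metric in report["metrics"]:
            for element in metric_elements.get(metric, []):
                counts[element] += 1
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for element, uses in rank_elements(REPORTS, METRIC_ELEMENTS):
        print(f"{element}: feeds {uses} owned metric(s)")
```

In this made-up inventory, customer_id comes out on top because it feeds three metrics on owned reports, while session_id never appears because no executive has claimed the report it feeds. Counting dependencies is only one crude way to rank; in practice the owners' own judgment about which metrics matter most would weight the list.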