The Sancho Panza Test for Data Quality
I just had a conversation with Dylan Jones, editor and publisher of DataQualityPro, in Stratford-upon-Avon, England. During the conversation we agreed that data-management professionals who "speak data," and the people who benefit from data quality, often do not speak the same dialects.
This led me to a delicious disambiguator, and I'm sure I'm not the first to taste it: SELECT value FROM Experience WHERE DQ <> DQ;
To resolve ambiguities, it's useful to listen to the Sancho Panzas of the organization: the humble workers who stand by, deal with, and adjust to the company's data, of whatever quality, on a daily basis. Find out what they say and how they say it, and you may find the fast route to the data that needs correcting.
I call this the Sancho Panza test: if you can explain your DQ initiative's objective in proverbial terms--terms that make intuitive sense to the people in the organization--it will probably succeed.
Influences and Influencers
It's not just IT vs. the business. It's hard for a person with specific data management expertise, who "speaks data," to reach the people in the organization who are the influencers--people who "speak WITH data" and can give a DQ effort the boost it needs to succeed. Similarly, these influencers are effective because of hidden forces in the organization, tribal undercurrents, that are the influences on how information gets produced and consumed.
As a result, more often than not, the promised claims of data quality initiatives look or sound quixotic--hard to understand, nice to dream about, impossible to achieve.
People who understand data, who "speak data" at the concrete level of data management, may speak very differently from people who use information. They bring wizard-like wisdom to how data systems actually work. For working this magic, they are both trusted and mistrusted by their business managers. These two tribes need each other, and mining that dialectic is critical to a successful data-quality initiative.
This mining includes understanding the social-status dimension of the organization. Both tribes--and both types of DQ specialist--claim that their contributions are the ones that really matter. The "data speakers" insist that if you can't model/collect/store/move/publish data correctly, you won't have a business at all. The "context-speakers" are too busy to be confused with these details--the business pays for this data, so it needs to serve the business.
Miguel de Cervantes to the Rescue
Consider this analogy: After he's read lots of exciting romantic novels about knights and giants, Don Quixote (let's call him DQ) really believes that a windmill is a giant. Off he goes to attack it, like an organization trying to do business with weak data and a whole lot of enthusiasm for its business logic. And Sancho Panza represents the faithful DQ consultant, who patiently and persistently picks up his bruised master while pointing out that, uh, that's a windmill.
But this is why they call it a great book: the analogy reverses without losing value. And when we reverse the analogy, we uncover the Sancho Panza test for DQ consultants.
This time, think of Don Quixote as the data-quality specialist or even the data management specialist or software vendor, bringing to the world his specialist's perspective and vocabulary and enthusiasm, influenced by the books he's read, envisioning everyday business practices, with his value added, as goldmines for the organization. Meanwhile Sancho Panza represents the person who does a practical job every day, who knows what works around here and what doesn't.
I advise Data Quality (let's call it DQ) consultants to listen to this Sancho Panza, and to see themselves as Don Quixote. Sancho doesn't know much about data, but he knows what he likes. He likes information that he can process in his proverbial way. He's open to listening, but slow to change, and he'll tell you what he thinks.
Cervantes created the wise fool Sancho Panza to accompany DQ on his picaresque journey. Thanks to this foiling arrangement, readers of the novel discover that while they find Sancho Panza a relief from the knight's wacky imagination, in the end they enjoy Don Quixote just as much, for his willingness to see an elevated world. The two together help to clarify each one's value.
In contrast to DQ's high-flown language about common things, Sancho spouts sayings like, "Hunger is the best sauce."
(The poor father of a poor family, he knows what he's talking about. If all politics is local, and all information is local, it's possible that this little proverb speaks to the politics of data quality. Power issues about data happen where the business context (the hunger) realizes the data's value (the sauce). And DQ professionals must admit that the pain and shame of getting the data wrong, or the wrong data, are also realized in the live business context, not when the data's sitting there in the warehouse.)
Here's an example from the novel: Sancho Panza has just stopped Don Quixote from attacking a group of actors who were putting on a play that DQ didn't like.
"The sceptres and crowns of those play-actor emperors," said Sancho, "were never yet pure gold, but only brass foil or tin."
"That is true," said Don Quixote, "[...] in the comedy and life of this world, where some play emperors, others popes, and, in short, all the characters that can be brought into a play; but when it is over, that is to say when life ends, death strips them all of the garments that distinguish one from the other, and all are equal in the grave."
"A fine comparison!" said Sancho; "though not so new but that I have heard it many and many a time, as well as that other one of the game of chess; how, so long as the game lasts, each piece has its own particular office, and when the game is finished they are all mixed, jumbled up and shaken together, and stowed away in the bag, which is much like ending life in the grave."
"Thou art growing less doltish and more shrewd every day, Sancho," said Don Quixote.
"[A]lways, or mostly, when Sancho tried to talk fine and attempted polite language, he wound up by toppling over from the summit of his simplicity into the abyss of his ignorance; and where he showed his culture and his memory to the greatest advantage was in dragging in proverbs." (Chapter XII)
Applying the Sancho Panza Test
To understand the "culture and memory" of a company, keep a notebook of the habits and sayings of information workers who would be affected by the data quality initiative. Mining these sayings can help to assess the culture to which you are bringing the DQ message, and can help to identify where DQ might attack successfully, and where DQ might go on fruitless and painful quests.
I worked with a company of 25,000 employees where a singular speech habit prevailed throughout the company: Everybody who wanted to be anybody used the word "So" to start a sentence. They did it in place of "Um"--as if the speaker were announcing a sophisticated conclusion (like "therefore," but more casual) from data, but instead of using data, they were invoking the spirit of data, like Don Quixote's magic books.
This happened when a conversation was started, when a question was answered, even in the middle of an otherwise normal conversation. It was a "language gesture," a way of establishing authority because it borrowed from the logic of inductive conclusion that was a meme in the company's culture. It also meant that managers made their mark not by performing to specifications, but by coming up with insightful ideas. In this culture, more data is better, and quality is second-rate, because at the mid-manager level, the behavior of LOOKING and SOUNDING analytic got jumbled up (like the kings with the pawns) with the true value-added analysis that led to dollars and cents.
Moreover, in this culture, actual data quality, if it were delivered, risked exposing these off-the-cuff analysts. Their analysis might easily be flawed, which carried with it political risk: in a culture that valued analysis, the ability to convince others that you are analytic carried as much political weight as the analysis itself. To be exposed would mean to lose face.
Solution?
The best step we made in this environment was to ask senior executives to declare themselves owners of key reports. From this position we could identify the metrics on those reports that mattered most, and then drill down from there to the data elements whose quality would have the greatest effect. When we asked the report generators to identify the data that mattered, we got quixotically unmanageable numbers of attributes, because they regarded everything they did as important--and they claimed to "speak data" by naming granular systems and sources, rather than principles of correction that would add lasting value.
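For readers who "speak data," the drill-down described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method we actually used: the report names, owners, metric weights, and data-element lineage are all hypothetical, and the scoring rule (sum the weights of every metric an element feeds) is just one plausible way to rank elements.

```python
from collections import defaultdict

# Hypothetical inventory: executive-owned reports and the metrics that matter
# most on each. Weights (1-3) are illustrative stand-ins for how much the
# report's owner cares about each metric.
REPORTS = {
    "Weekly Sales Summary": {
        "owner": "VP Sales",
        "metrics": {"net_revenue": 3, "units_sold": 2},
    },
    "Customer Retention": {
        "owner": "CMO",
        "metrics": {"churn_rate": 3, "net_revenue": 1},
    },
}

# Hypothetical lineage: which data elements feed each metric.
METRIC_SOURCES = {
    "net_revenue": ["order_amount", "discount_code", "customer_id"],
    "units_sold": ["order_quantity", "product_id"],
    "churn_rate": ["customer_id", "last_order_date"],
}


def rank_data_elements(reports, metric_sources):
    """Score each data element by the summed weight of the metrics it feeds."""
    scores = defaultdict(int)
    for report in reports.values():
        for metric, weight in report["metrics"].items():
            for element in metric_sources.get(metric, []):
                scores[element] += weight
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for element, score in rank_data_elements(REPORTS, METRIC_SOURCES):
        print(f"{element}: {score}")
```

The point of starting from owned reports rather than from systems and sources is visible in the result: a short, ranked list of elements instead of an unmanageable catalogue of attributes.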
The obvious lesson to be learned from this analogy is that DQ (be they Don Quixote or Data Quality consultants) sometimes tilt their lances at Windmills; they put their efforts into fighting false enemies, often to the detriment of addressing the real challenges. If only they listened to the Sancho Panzas of the world they might understand the context of the data quality issues and focus on fighting the battles that matter - the ones that deliver real value to the business.
Posted by: Steve Tuck, Datanomic | March 23, 2009 at 06:01 PM
I thought that this article was fantastic for two reasons.
First, although I started and ended my academic pursuits with the study of computer science, in between I was a literature major and Don Quixote was one of my favorites.
Second, I must admit that earlier in my career I occasionally failed the Sancho Panza test. As an “expert data quality consultant” enamored with my own knowledge and experience, I ignored the client’s Sancho Panzas and went tilting at windmills and my beautifully architected, wonderfully coded, elegantly implemented technical solution resulted in...complete and utter failure.
As my career advanced, I developed a useful Quixote/Panza split-personality that not only improved my perspective but also focused my efforts on understanding and collaborating with my client.
Posted by: Jim Harris | March 23, 2009 at 09:08 PM
An excellent post. It ties in well with a series of articles I wrote back in 2006 for the IAIDQ, and the theme of a lot of my conference presentations since then, about putting the Information Quality agenda back in the language of the business (i.e., in terms they'll understand) and also linking it to real objectives so that you can make the case for change.
Back in the day, unfortunately, all our articles were pdf'd, but the archives are being brought into HTML at the moment, so I'll post a link to them when they are available.
As Steve says, we need to ensure that we are putting data quality issues in context and only fighting the battles that really matter to the business - particularly in the current economic climate.
Posted by: Daragh O Brien | April 03, 2009 at 04:48 AM
Great post Paul. It seems to me the reason this is a problem for data more than process, software, and infrastructure is that data isn't as far along in the general scheme of things. While influential constituencies have built up around the other three, we're just now beginning to understand what it means to manage data as a key resource. Traditionally data is seen as integral to the software, and not considered on its own merits.
Data consumers typically haven't seen the sausage factory from which their data emerges, and until the field is further developed SP and DQ will continue their quest. Part of the answer will be embedding in the culture data concepts, just as the culture overall has begun to understand concepts like software modularity.
The Sancho Panza/Don Quixote analogy is something that we can apply in so many aspects of our lives - thanks for your post!
Posted by: Bob Lambert | May 03, 2009 at 09:38 PM