Mentality of an Antifragile Data Quality Team

Dan Dutrow
Published in Qualytics
5 min read · Dec 16, 2021

“Some things benefit from shocks; they thrive and grow when exposed to volatility, randomness, disorder, and stressors and love adventure, risk, and uncertainty. Yet, in spite of the ubiquity of the phenomenon, there is no word for the exact opposite of fragile. Let us call it antifragile. Antifragility is beyond resilience or robustness. The resilient resists shocks and stays the same; the antifragile gets better.” Nassim Taleb, Antifragile

What is the mentality of your data quality team? Are they passive, reactive, or proactive? Are they building a fragile data quality pipeline, or are they building it to be antifragile?

A passive team follows their standard operating procedures. When they encounter a data quality incident, they follow protocol, alerting the data owner of the issue and asking them to correct the data at their earliest convenience. If that issue doesn’t dramatically impact operations, that moment of convenience may never come. With some ingenuity, they will automate the notifications so that the support team isn’t overwhelmed with writing tickets or emails all day. The data owner, when the question of data quality arises, acknowledges that they have some issues, and yes, overall data quality seems to be getting worse and it’s hard to trust the results. They loosen their data quality expectations and bemoan their AI/ML systems that never seem to get beyond their false positives and quiet failures. They accept the risk and figure that data quality mishaps are just part of being a data-driven business.

A reactive team follows their protocols, but their standards are higher and they do real incident management. They will analyze the impact of the data quality issue, search for how widely the data has dispersed through downstream systems, plan for how to correct the (known) issues, and then execute a recovery plan. They contain, eradicate, and recover from the incident. They get it mostly right, at the cost of significant time, possibly outages, and reduced confidence. They build tools and muscle memory to respond more quickly the next time, and there certainly will be a next time. At least they’ve reduced the consequence of that risk by building more robust response plans.

A proactive team starts off right and puts forth an appropriate level of initial effort. They anticipate many of the likely failures, and the engineers craft their workflows to detect and respond to those failures. They build tests and instrument checks using their tooling. Not all types of incidents can be anticipated, but if you catch 80%, you’re doing pretty well. If something slips through, and in the eyes of the engineers it may happen again, they will add another check or two for that particular pipeline. At that point, the team has done its due diligence, and any critical data problems were probably due to some external incompetence (or lack of data quality) that couldn’t have been anticipated or prevented.

Then there is the antifragile team. Whether they start off proactive or reactive doesn’t particularly matter. What matters is that every time there is a surprise, they build up a layer of defense around not just that problem, but every place where that problem may occur. (When one levee breaks, all levees are reinforced.) The team goes beyond the specific problem, anticipates similar related problems, and layers solutions to them across the enterprise. (Because of Hurricane Katrina in New Orleans, New York was more prepared for Superstorm Sandy.) The antifragile team is able to execute that strategy by putting a comprehensive system in place that allows the solution to a local problem to be applied globally. The volatility of data as a whole allows more and more of the system to be stressed, building muscle wherever there is a tear.
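One way to picture "the solution to a local problem applied globally" is a shared registry of checks that runs against every dataset, not just the pipeline where the incident happened. The sketch below is only an illustration under stated assumptions: `CheckRegistry`, `no_negative_numbers`, and the sample datasets are hypothetical names invented for this example, not part of any particular product or library.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Check = Callable[[Row], bool]  # a check returns True when the row passes


@dataclass
class CheckRegistry:
    """Hypothetical shared registry: a check is registered once and run everywhere."""
    checks: Dict[str, Check] = field(default_factory=dict)

    def register(self, name: str, check: Check) -> None:
        self.checks[name] = check

    def scan(self, datasets: Dict[str, List[Row]]) -> Dict[str, List[str]]:
        """Return, per dataset, the names of checks that at least one row fails."""
        failures: Dict[str, List[str]] = {}
        for dataset_name, rows in datasets.items():
            failed = [
                name
                for name, check in self.checks.items()
                if any(not check(row) for row in rows)
            ]
            if failed:
                failures[dataset_name] = failed
        return failures


def no_negative_numbers(row: Row) -> bool:
    """Check learned from one incident: numeric fields must not be negative."""
    return all(
        value >= 0
        for value in row.values()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    )


# The incident was found in "orders", but the check is applied to every dataset.
registry = CheckRegistry()
registry.register("no_negative_numbers", no_negative_numbers)

datasets = {
    "orders": [{"id": 1, "amount": 25.0}, {"id": 2, "amount": -3.0}],
    "inventory": [{"sku": "A1", "quantity": -4}],
    "customers": [{"id": 7, "name": "Ada"}],
}
report = registry.scan(datasets)
print(report)
```

In this toy run, the check written in response to the "orders" incident also surfaces the same class of problem in "inventory", which is the levee-reinforcing behavior described above. A per-pipeline check, by contrast, would have caught only the original case.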

“The antifragile loves randomness and uncertainty, which also means — crucially — a love of errors, a certain class of errors. Antifragility has a singular property of allowing us to deal with the unknown, to do things without understanding them — and do them well.” Nassim Taleb, Antifragile

Antifragile teams use systems which may be built in-house or in partnership with a vendor. They build increasingly deep levels of sophistication to proactively mitigate data quality threats. In a community, such as a large enterprise or a significant customer base, any member can identify a threat. They can then work with their team or vendor to not just solve the problem once but for everyone and for all time. They benefit not just from the issues they’ve seen, but from all the issues that others in the community have seen. Layers of capability have been built into the entire system, and the triumph of one member is a win for all.

Come join our antifragile community at Qualytics!

Qualytics is the complete solution to instill trust and confidence in your enterprise data ecosystem. It seamlessly connects to your databases, warehouses, and source systems, proactively improving data quality through anomaly detection, signaling, workflow and enrichment. Check out our other blogs to learn more about how you can start trusting your data. Contact us at hello@qualytics.co.

Dan Dutrow
Qualytics

Experienced engineering leader who manages people, process, and technology on cross-functional, multidisciplinary teams.