Marci Augustine, Director, Business Intelligence & Analytics, Xo Group Inc.
Businesses were leveraging data collected through business intelligence long before the advances in computing power, data storage, and Internet connectivity became things we take for granted. As technology advanced, the volume and variety of available data grew exponentially. This embarrassment of riches has come with more than a few challenges for analysts, who are charged with trying to make sense of it all and tell meaningful stories that allow stakeholders to make informed decisions. The best intelligence comes from large volumes of high-quality data. Often, solutions that address one of these aspects introduce challenges in the other, and the key is implementing the solution that provides the best of both worlds.
Digital data collection used to analyze user behavior provides a seemingly endless opportunity to know every imaginable detail. In the early days the possibilities were overwhelming and often challenging to capture, so more care was taken to ensure the most critical data points were captured properly. The hallmark of a good analysis is that it not only answers the questions being asked of it, but also inspires new ones. So naturally, as analytical capabilities and output increase, the next questions arise: “How can we get this information for more business processes?” “When can we have this for all processes?” The natural next step is to find a better way to get more data faster.
Set free from the constraints of costly data storage and engineering investment, and embracing the mantra “democratize the data,” individual departments can own their data and take on the responsibility for capturing anything and everything. This free-for-all ushered in a new onslaught of questions: “Why are there five different answers to the same question?” “What is the source of truth?” “Why is this single data point being captured in three different ways?” Suddenly analysts were finding they had too much of a good thing, and quite a bit of some not-so-good things as well.
Sorting through all of these questions and reconciling data is time-consuming and frustrating. Product and business owners do not have the time to think through their data needs in detail; they need to be focused on the resulting decisions. At the same time, data teams are not typically staffed to handle detailed curation for each individual stakeholder. These groups have to work together to develop a system that facilitates quality data collection at the speed of business. This is best accomplished with a two-pronged approach: implementing technical safeguards to back up a well-managed process for identifying data requirements.
The key to getting support for this kind of approach is to first make it easy for people to start using, and then demonstrate immediate value by showing results. Successful rollouts start with a handful of basic templates that handle most use cases in the organization, with close collaboration between business and technical teams to ensure everyone is on board. By templatizing the data requirements where possible, less time is required up front to identify what needs to be captured. There will be cases that don’t fit into the predefined templates, but those will be the exceptions, not the rule. Leveraging these best practices and evangelizing them across the organization allows data teams to create more meaningful, clean data without introducing huge bottlenecks in the process. At the end of the day, humans make mistakes, even with the best processes outlined and executed. That’s where a technical solution that provides validation serves as a last line of defense to ensure the cleanest data possible for our most important business processes. Such a tool, whether custom built or off the shelf, allows for the configuration of validation rules for each type of data collected, as well as an opportunity to correct bad data before it goes too far. The robustness required in this tool will vary depending on the organization and the volume of data collected, but it is a critical piece of the data quality puzzle.
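To make the idea concrete, a minimal sketch of such a validation layer might look like the following. The template and field names here (`page_view`, `user_id`, `page_url`) are hypothetical illustrations, not from the original; a real tool would add persistence, reporting, and a workflow for correcting flagged records.

```python
# Sketch of configurable validation rules applied per data-collection template.
# Hypothetical names throughout; shown only to illustrate the pattern.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ValidationRule:
    field_name: str                  # which field of the record to check
    check: Callable[[Any], bool]     # predicate that returns True for clean values
    message: str                     # human-readable reason used when the check fails


@dataclass
class EventTemplate:
    name: str
    rules: List[ValidationRule] = field(default_factory=list)

    def validate(self, record: Dict[str, Any]) -> List[str]:
        """Return a list of error messages; an empty list means the record is clean."""
        errors = []
        for rule in self.rules:
            value = record.get(rule.field_name)
            if not rule.check(value):
                errors.append(f"{rule.field_name}: {rule.message}")
        return errors


# One shared template covering a common use case, e.g. a page-view event.
page_view = EventTemplate(
    name="page_view",
    rules=[
        ValidationRule("user_id",
                       lambda v: isinstance(v, str) and v != "",
                       "must be a non-empty string"),
        ValidationRule("page_url",
                       lambda v: isinstance(v, str) and v.startswith("http"),
                       "must be an http(s) URL"),
    ],
)

print(page_view.validate({"user_id": "u123", "page_url": "https://example.com"}))  # []
print(page_view.validate({"user_id": "", "page_url": "not-a-url"}))  # two errors
```

Because each template carries its own rule list, departments can own their templates while the data team owns the shared validation machinery, which is the division of labor the process above calls for.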
Demonstrating success with a program like this among a pilot group will facilitate broader adoption in the organization. As working groups see analysts leveraging this process spending fewer cycles reconciling data or tracking down the meaning of specific data points, and more cycles providing insights, they are incentivized to make the investment themselves. This can have a snowball effect across the organization, with groups making the time to prioritize adopting these tools and processes; over time, readily available clean, quality data will be the rule, not the exception.