Is small data holding back your ambitions to be more data driven?
Author: Simon Trewin, founder of DataOps Thinktank, founder of DataOps Academy, and author of The DataOps Revolution.
What is small data?
Small data is all of the information that you attach to your operational data or big data to make sense of it or to transform it into business insight. Typically, it is owned very close to where the data is leveraged for operations or insight. It helps with cleansing, grouping, aggregating, filtering, and tagging, or it helps to drive a business process.
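As an illustration of the grouping role described above, here is a minimal sketch in Python. The transaction records, product codes, and group names are all hypothetical; the point is only that a small, business-owned mapping is what turns raw operational records into a figure the business recognises.

```python
# Operational data: raw records as they might arrive from a source system.
# All values here are made up for illustration.
transactions = [
    {"id": 1, "product_code": "A100", "amount": 250.0},
    {"id": 2, "product_code": "B200", "amount": 100.0},
    {"id": 3, "product_code": "A101", "amount": 50.0},
    {"id": 4, "product_code": "Z999", "amount": 75.0},
]

# Small data: a mapping maintained close to the business, often in a spreadsheet.
product_groups = {
    "A100": "Hardware",
    "A101": "Hardware",
    "B200": "Services",
}


def totals_by_group(records, mapping):
    """Aggregate amounts by business group, flagging codes with no mapping."""
    totals = {}
    for record in records:
        # Unmapped codes are kept visible rather than silently dropped.
        group = mapping.get(record["product_code"], "Unmapped")
        totals[group] = totals.get(group, 0.0) + record["amount"]
    return totals


totals = totals_by_group(transactions, product_groups)
print(totals)
```

If two teams hold slightly different copies of `product_groups`, the same transactions produce two different reports, which is the kind of inconsistency this article is concerned with.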
Why is it needed?
Small data is needed because the use cases for operations and data insight change rapidly. This cadence is generally too fast for enterprise IT to keep up with, and I would argue that it is something they should not try to keep up with.
What are the challenges for organisations wishing to be data driven and to incorporate ML and AI?
The challenge is that the truth, in terms of data, often exists only once data pipelines have passed through the small data filters and checkers. The accuracy of machine learning and AI models is therefore hindered because they do not have access to that truth. CDAO strategies are held back by a limited ability to leverage the right information.
Small data is also very easy to copy and reuse, which makes it hard to maintain and master. It exists within reporting systems and end-user applications like Excel, and it is emailed around, linked, and reused. It is a source of reporting errors that can lead to regulatory fines, missed opportunities, and bad decisions. It can also lead to many versions of the truth, preventing organisations from knowing the true state of things and from making decisions.
It often becomes complex, which makes it hard to unwind and builds up organisational inertia that makes it hard to move forward with a digital strategy. It needs to be incorporated into the overall data strategy for the organisation, but it is often considered too hard and complicated to tackle.
What you need to do
The key to small data is to democratise it, incorporating data quality controls, and to master it in order to empower your employees. This needs to be done incrementally in a system that provides secure transparency through lineage, usage statistics, and links to business terminology. This system should also provide an easy migration of assets to enterprise systems for the purpose of enabling the digital enterprise.
To deal with the complexity, you need to automate the analysis of your existing estate so that you can group your small data by complexity, importance, risk, and dependencies. You can then prioritise the actions to take to make improvements. At this stage you should also be able to track improvements through time to see the changes made and the changes still required.
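The prioritisation step above could be sketched as follows. This is only an illustration under stated assumptions: the asset names, the 1 to 5 rating scale, and the scoring weights are hypothetical and would need to be agreed within your own organisation. The sketch counts how many other assets depend on each one and folds that into a simple score.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SmallDataAsset:
    name: str
    complexity: int  # 1 (simple) to 5 (very complex), hypothetical scale
    importance: int  # 1 (low) to 5 (business critical)
    risk: int        # 1 (low) to 5 (for example, feeds regulatory reporting)
    depends_on: List[str] = field(default_factory=list)


def count_dependents(assets: List[SmallDataAsset]) -> Dict[str, int]:
    """Count how many other assets rely on each asset."""
    dependents = {asset.name: 0 for asset in assets}
    for asset in assets:
        for upstream in asset.depends_on:
            if upstream in dependents:
                dependents[upstream] += 1
    return dependents


def priority_score(asset: SmallDataAsset, dependents: int) -> int:
    # Illustrative weights only: risk counts most, then importance.
    return 3 * asset.risk + 2 * asset.importance + asset.complexity + dependents


def prioritise(assets: List[SmallDataAsset]) -> List[SmallDataAsset]:
    """Return assets ordered from highest to lowest priority score."""
    dependents = count_dependents(assets)
    return sorted(
        assets,
        key=lambda a: priority_score(a, dependents[a.name]),
        reverse=True,
    )


assets = [
    SmallDataAsset("region_mapping.xlsx", complexity=2, importance=5, risk=4),
    SmallDataAsset("fx_overrides.csv", complexity=3, importance=4, risk=5,
                   depends_on=["region_mapping.xlsx"]),
    SmallDataAsset("team_rota.xlsx", complexity=1, importance=1, risk=1),
]

ranked = [asset.name for asset in prioritise(assets)]
print(ranked)
```

Re-running the same scoring on a schedule and storing the results would give a simple way to track improvements through time.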
For some small data it is OK for it to remain as small data; however, for completeness it should be logged, monitored, and kept up to date through time.
Small data is essential in any organisation to bridge the gap from operational data to business processes and knowledge. It moves quickly due to the nature of changing business requirements; it can quickly become complex and will introduce poor data quality through duplication and inconsistencies. As a CDO you need to incorporate it into your overall strategy if you truly want to deliver the data-driven enterprise.