In recent client meetings we have repeatedly come across the need to improve the art of capturing requirements and building the 'Solution Contract'. Typically these are large data analytics and reporting projects where the data is spread across the organisation and needs to be pulled together and analysed in particular ways: a typical data science and data engineering task in today's data-centric world.
The problem people describe is that they are often asked to solve problems that the data does not support, or that the requirements process fails to extract the true definition of what the solution should be.
We are asked: “How can you improve the process of requirements negotiation using the DataOps methodology?”
Kinaesis breaks down its DataOps methodology into the six IMPACT pillars. These include:
• Instrumentation
• Meta Data
• (Extensible) Platforms
• (Collaborative) Analytics
• Target
The requirements process sits predominantly within the Target pillar, leveraging the Instrumentation and Meta Data pillars. The Target pillar starts with establishing the correct questions to ask within the requirements process. These questions recognise the need to establish not only the output, but also the people, process and data. You should ask a series of questions to capture this for the immediate requirement, but within the context of an overall vision.
The second step is then instrumenting the data and Meta Data. It is important to capture these efficiently and effectively using tools and techniques, but also to profile the data to match it to the model and check feasibility. With the results of this process you can then work collaboratively with the sponsor and stakeholders to solve the data and process requirements. Using data prototyping methods to illustrate the solution further assists in communicating the agreed output and the identified wrinkles, which helps to build collaboration through a shared vision.
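As a minimal sketch of what profiling the data against the target model might look like in practice — assuming pandas, and with entirely hypothetical column names and thresholds — the feasibility check can be as simple as comparing a source extract against the columns, types and completeness the requirement demands:

```python
import pandas as pd

# Hypothetical target model: required columns, expected types and
# minimum completeness (share of non-null values) for feasibility.
REQUIRED_MODEL = {
    "trade_id":   {"dtype": "int64",           "min_complete": 1.00},
    "notional":   {"dtype": "float64",         "min_complete": 0.99},
    "trade_date": {"dtype": "datetime64[ns]",  "min_complete": 0.95},
}

def profile_against_model(df: pd.DataFrame, model: dict) -> list:
    """Return a list of feasibility issues found in the data."""
    issues = []
    for col, rules in model.items():
        if col not in df.columns:
            issues.append(f"missing column: {col}")
            continue
        if str(df[col].dtype) != rules["dtype"]:
            issues.append(f"{col}: dtype {df[col].dtype}, expected {rules['dtype']}")
        completeness = df[col].notna().mean()
        if completeness < rules["min_complete"]:
            issues.append(f"{col}: only {completeness:.0%} populated")
    return issues

# Example: a source extract that is missing trade_date entirely
# and has gaps in notional.
extract = pd.DataFrame({
    "trade_id": [1, 2, 3],
    "notional": [1_000_000.0, None, 250_000.0],
})
for issue in profile_against_model(extract, REQUIRED_MODEL):
    print(issue)
```

Surfacing gaps like these early is exactly what makes the subsequent conversation with the sponsor collaborative rather than confrontational: the issues are facts about the data, not opinions.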
We find in our projects that following a structured approach to this part of the project builds consensus, establishes gaps early and builds trust.
In one client engagement this improved delivery velocity to a level that made the difference between success and failure. In another, we were able to deliver solutions to very large and complex problems within incredibly tight timescales.
The key point is that requirements definitions are there to build a shared contract that defines an achievable solution; you therefore need to include DataOps analysis in the process to achieve the results that you want.
A former colleague and I were talking a few days ago and he mentioned that people don't want to buy DataOps. Having given this some thought, I can only conclude that I agree, in the same way that I don't want to buy running shoes; however, I do want to be fit and healthy. It is interesting to find so many people who are so comfortable with their problems that a solution seems less attractive.
The way to look at DataOps methodology and training is as an investment in yourself or your organisation that enables you to tackle the needs you struggle to make progress on. The needs that might resonate more with you are machine learning and AI, digitisation, legacy migration, optimisation and cost reduction, improving your customer experience, and improving the speed to delivery and compliance of your reporting and analytics.
The DataOps methodology provides you with a recipe book of tools and techniques that, if followed, enables you to deliver a data-driven organisation.
One client organisation found that working with the DataOps tools and techniques enabled it to deliver an important piece of regulation in record time, rebuild collaboration between stakeholders and form a template for future projects. For more information, please feel free to reach out to me.
What is the extensible platforms pillar within the DataOps methodology? The purpose of the platform within DataOps is to enable the agility within the methodology and to recognise that data science is evolving rapidly. Due to constant innovation around tools, hardware and solutions, what is cutting edge today could well be out of date tomorrow. What you need to know from your data today may only be the tip of the iceberg once you have productionised the solution, and the next requirement could completely change the solution you have proposed. To address this, DataOps requires an evolving and extensible platform.
Extensibility of data platforms is delivered in a number of ways, through:
• Infrastructure patterns
• A DataOps development approach
• Architecture patterns
• Data patterns
In most large organisations, data centres and infrastructure teams have many competing priorities, and delivery times can be as much as 6-9 months for new hardware. For data projects this can be the difference between running agile iterations and falling back to waterfall, where you collect requirements upfront in order to size the hardware. To manage the risks, project teams either over-order hardware, creating massive redundancy, or, to keep costs down, under-order and then suffer large project delays. An example of this is big data solutions that require large number-crunching capability to process metrics: the work stresses the system for a few hours each day, after which the infrastructure sits idle until the next batch of data arrives. The cost to organisations of redundant hardware is significant.

The emerging answer is the cloud, where servers can be set up with data processes to generate results and then brought down again, reducing the redundancy significantly. Grids and internal clouds offer an on-premises option. To migrate and leverage this flexibility, organisations need to consider their strategy and approach for data migration: a straight lift and shift would duplicate data, so incremental re-engineering makes more sense.
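The economics here are straightforward to illustrate. The sketch below compares a fixed, always-on estate against ephemeral cloud capacity for a batch workload that is only busy a few hours per day; all of the node counts and hourly prices are illustrative assumptions, not real quotes:

```python
# Illustrative comparison: always-on estate vs ephemeral cloud capacity
# for a batch workload that only runs a few hours per day.
# All figures below are assumptions for illustration only.

HOURS_PER_DAY_BUSY = 3              # batch metrics run stresses the system
DAYS_PER_YEAR = 365
NODES = 20
ON_PREM_NODE_COST_PER_HOUR = 1.00   # amortised hardware + power + support
CLOUD_NODE_COST_PER_HOUR = 1.50     # on-demand premium per node-hour

# Fixed estate is paid for 24 hours a day whether busy or idle.
always_on = NODES * ON_PREM_NODE_COST_PER_HOUR * 24 * DAYS_PER_YEAR

# Ephemeral capacity is only paid for while the batch actually runs.
ephemeral = NODES * CLOUD_NODE_COST_PER_HOUR * HOURS_PER_DAY_BUSY * DAYS_PER_YEAR

utilisation = HOURS_PER_DAY_BUSY / 24
print(f"always-on cost/year: {always_on:,.0f}")
print(f"ephemeral cost/year: {ephemeral:,.0f}")
print(f"utilisation of the fixed estate: {utilisation:.0%}")
```

Even with a 50% on-demand price premium per node-hour, paying only for the busy hours comes out far ahead of an estate that sits idle seven-eighths of the day — which is the redundancy cost described above.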
DataOps Development Approach
A DataOps development approach enables the integration of data science with engineering, so innovation reaches production quality more rapidly and at lower risk. Results on data projects are best when you can use tools and techniques directly on the data to prototype, profile, cleanse and build analytics on the fly. This agile approach requires you to build a bridge to the data engineers, who can take the data science and turn it into a repeatable, production-quality process. The key to this is a DataOps development approach that builds operating models and patterns to promote analytics into production quickly and efficiently.
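One hypothetical shape such a promotion pattern could take — a minimal sketch assuming pandas, with invented function names — is for engineering to wrap the data scientist's pure prototype function with the validation and logging it needs to run unattended, rather than rewriting it:

```python
import logging
from typing import Callable

import pandas as pd

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

# Prototype logic as a data scientist might write it: one pure function.
def average_notional(df: pd.DataFrame) -> float:
    return df["notional"].mean()

# A hypothetical promotion pattern: engineering wraps the prototype with
# input validation and logging so it can run unattended in production,
# without touching the analytical logic itself.
def productionise(step: Callable[[pd.DataFrame], float],
                  required_cols: list) -> Callable[[pd.DataFrame], float]:
    def wrapped(df: pd.DataFrame) -> float:
        missing = [c for c in required_cols if c not in df.columns]
        if missing:
            raise ValueError(f"input missing columns: {missing}")
        result = step(df)
        log.info("step %s on %d rows -> %s", step.__name__, len(df), result)
        return result
    return wrapped

prod_step = productionise(average_notional, required_cols=["notional"])
df = pd.DataFrame({"notional": [100.0, 300.0]})
print(prod_step(df))  # same result as the prototype, now guarded and logged
```

The design choice is the point: the prototype stays the single source of the analytic, and the "bridge" to engineering is a reusable wrapper, so promotion is fast and the production number can be traced back to the original data science.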
One of the challenges in driving innovation and agility in data platforms is architecting production-quality data with traceability and reusable components. Too small, and these components become a nightmare to join and use; too large, and too much is hardcoded, hampering reuse. Often data in production will need to be shared with the data scientists. This is difficult because a poorly formed process can break production, and poor documentation can lead to numbers being used out of context. Complexity arises where outputs from processes become inputs to other processes, and sometimes in reverse, creating a tangle of dependencies. The key to solving this is building out architecture patterns that enable reuse of common data in a governed way, but with the ability to enrich the data with business-specific content within the architecture. Quality processes need to be embedded along the data path.
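To make the component-sizing and traceability point concrete, here is a deliberately tiny sketch — hypothetical class and dataset names, pandas assumed — of mid-sized transform components that record lineage as they run, so that when one step's output feeds another, the dependency chain stays visible instead of becoming a tangle:

```python
import pandas as pd

# Hypothetical sketch: a reusable transform component that records
# lineage as it runs, so shared data stays traceable and numbers
# are less likely to be used out of context.
class Transform:
    def __init__(self, name, fn, inputs):
        self.name, self.fn, self.inputs = name, fn, inputs

    def run(self, datasets: dict, lineage: list) -> pd.DataFrame:
        out = self.fn(*(datasets[i] for i in self.inputs))
        lineage.append({"step": self.name, "inputs": list(self.inputs)})
        datasets[self.name] = out  # outputs can feed later steps
        return out

lineage = []
datasets = {"trades": pd.DataFrame({"notional": [100.0, 200.0]})}

# Governed common data ("trades") enriched with business-specific content.
enrich = Transform("enriched", lambda t: t.assign(fee=t.notional * 0.01),
                   ["trades"])
total = Transform("totals", lambda e: e[["notional", "fee"]].sum().to_frame().T,
                  ["enriched"])

enrich.run(datasets, lineage)
total.run(datasets, lineage)
print(lineage)  # ordered record of which outputs came from which inputs
```

Each component is large enough to mean something on its own yet small enough to recombine, and the lineage list is the governed record that lets a data scientist see exactly where a shared number came from.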
The final challenge is to organise data within the system in logical patterns that allow it to be extended rapidly for individual use cases, while forming a structure from which to maintain governance and control. Historically, and with modern tools, analytical schemas enable slice and dice on known dimensions, which is great for known workloads. To deliver extensibility, DataOps requires a more flexible data pattern that can generate one-off analytics or tailor analytics to individual use cases. The data pattern and organisation need to allow for trial and error, but with this comes a need for discipline. Meta Data should be kept up to date and in line with the data itself. External or enrichment data needs to be integrated almost instantly and removed again, or promoted into a production-ready state. To do this you need patterns which allow for the federation of the data schemas.
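The integrate-then-promote-or-remove lifecycle above can be sketched as a lightweight catalogue that keeps the Meta Data in line with the data itself; the class and dataset names here are illustrative assumptions, not a real tool's API:

```python
from datetime import date

# Hypothetical sketch: a lightweight catalogue in which enrichment
# datasets register as "sandbox" and are then either promoted to
# production or removed again, keeping Meta Data in line with the data.
class Catalogue:
    def __init__(self):
        self.entries = {}

    def register(self, name, schema, state="sandbox"):
        self.entries[name] = {
            "schema": schema,
            "state": state,
            "registered": date.today().isoformat(),
        }

    def promote(self, name):
        self.entries[name]["state"] = "production"

    def remove(self, name):
        del self.entries[name]

cat = Catalogue()

# An enrichment set that proved useful gets promoted...
cat.register("country_risk", {"country": "str", "score": "float"})
cat.promote("country_risk")

# ...while a one-off feed is integrated for an analysis and removed again.
cat.register("weather_feed", {"city": "str", "temp_c": "float"})
cat.remove("weather_feed")

print(sorted(cat.entries))  # only the promoted set remains governed
```

Because every dataset enters through the same register/promote/remove gate, trial and error stays cheap while the federated schemas remain under governance and control.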
The capabilities above combine to enable you to create an extensible platform as part of an overall DataOps approach. Marry this up with the other five pillars of DataOps, and each new requirement becomes an extension to your data organisation rather than a brand new system or capability.