We are delighted to announce that Kinaesis has been recognised by Microsoft as a Power BI Partner! You can now see our profile on the Power BI directory here. We have provided a Power BI cloud-based solution (Kinaesis® Clarity KYR) which has been servicing our clients in the Investment Management sector for over two years. It’s fantastic to see our hard work with Power BI technology being acknowledged by Microsoft.
We would like to thank everyone who made this recognition possible and especially our clients whose kind recommendations were instrumental in getting us recognised.
Governance, in the sense of control, repeatability and transparency is currently much stronger in software development than data delivery. Software delivery has recently been through the DevOps revolution - but even before DevOps became a buzzword, the best teams had already adopted strong version control, continuous integration, release management and other powerful techniques. During the last two decades, software delivery has moved from a cottage industry driven by individuals to a relatively well-automated process supported by standard tooling.
Data delivery brings many additional challenges. Software delivery, for instance, is done once and then shared between many users of the software; data must be delivered differently for each unique organisation. Data delivery involves the handling of a much greater bulk of data and the co-ordination of parts that are rarely under control of a single team; it’s a superset of software delivery, too, as it involves tracking the delivery of the software components that handle the data and relating their history to the data itself!
Given these challenges, it’s unsurprising that the tools and methodologies which have been in place for years in the software world are still relatively rare and underdeveloped in the data world.
Take version control, for example. Since time immemorial, version control has been ubiquitous in software development. Good version control permits tighter governance, fewer quality issues, greater transparency and thus greater collaboration and re-use.
You expect to be able to re-create the state of your business logic as it was at any point in the past.
You expect to be able to list and attribute all the changes that were made to it since then.
You expect to be able to branch and merge, leaving teams free to change their own branches until the powers that be unify the logic into a release.
On the data side, that's still the exception rather than the rule - but with the growing profile of DataOps, the demand is now there and increasingly the tooling is too. The will to change and reform the way data is delivered also seems to be there - perhaps driven by auditors and regulators who are increasingly interested in what's possible, rather than in what vendors have traditionally got away with in the past. We stand on the brink of great changes in the way data is delivered and it's going to get a fair bit more technical.
What isn't quite visible yet is a well-defined methodology, so as we start to incorporate proper governance and collaboration into our data pipeline, we face a choice of approaches. Here are a few of the considerations around version control, which of course is only a part of the Governance pillar:
» Some vendors are already strong in this area and we have the option of leveraging their offerings - for example, Informatica Powercenter has had version control for some time and many Powercenter deployments are already run in a DevOps-like way.
» Some vendors offer a choice between visual and non-visual approaches - for example with some vendors you can stick to non-visual development and use most of the same techniques you might use in DevOps. If you want to take advantage of visual design features, however, you'll need to solve the problem of version and release control yourself.
» Some enterprises govern each software system in their pipeline separately, using whatever tools are provided by vendors and don't attempt a unified paper trail across the entire pipeline.
» Some enterprises that have a diverse vendor environment take a 'snapshot' approach to version and release control - freezing virtual environments in time to provide a snapshot of the pipeline that can be brought back to life to reproduce key regulatory outputs. This helps ensure compliance, but does little to streamline development and delivery.
It's no small matter to pick an approach when there are multiple vendors, each with varying levels of support for varying forms of governance, in your data estate. Yet the implications of the approach you choose are profound.
DevOps has helped to define a target and to demonstrate the benefits of good tooling and governance. To achieve that target, those who own or manage data pipelines need to consider widely different functions, from ETL to ML, with widely different vendor tooling. Navigating that complex set of functions and forming a policy that takes advantage of your investment in vendors while still covering your pipeline will require skill and experience.
Knowledge of DataOps and related methodologies, knowledge of data storage, distribution, profiling, quality and analytics functions, knowledge of regulatory and business needs and, above all, knowledge of the data itself, will be critical in making DataOps deliver.
By Benjamin Peterson
DataOps comes from a background of Agile, Lean and above all DevOps - so it's no surprise that it embodies a strong focus on governance, automation and collaborative delivery. The formulation of the Kinaesis® DataOps pillars isn't too different from others, although our interpretation reflects a background in financial sector data, rather than just software. However, I believe there's an extra pillar in DataOps that’s missing from the usual set.
Most versions of DataOps focus primarily on the idea of a supply chain, a data pipeline that leads to delivery via functions such as quality, analytics and lifecycle management. That's a good thing. However supply chains exist for a purpose - to support a business vision. The creation and management of that vision and the connecting of that vision to actual data is just as important as the delivery of data itself.
The importance of Design and UX
On the software development side, it's been accepted for a long time that User Experience (UX) is an important and somewhat separate branch of software delivery. Even before the 'digital channels' trend, whole companies focused on designing and building a user friendly experience.
Delivering an experience is different from working to a spec because a close interaction with actual users is required. Whole new approaches to testing and ways of measuring success are needed. UX development includes important methodologies such as iterative refinement - a flavour of Agile which involves delivering a whole solution at a certain level of detail and then drilling down to refine certain aspects of the experience, as necessary. Eventually, UX has become a mature recognised branch of software development.
Delivery of data has much to learn from UX approaches. If users are expected to make decisions based on the data - via dashboards, analytics, ML or free-form discovery - then essentially you are providing a user experience, a visual workflow in which users will interact with the data to explore and to achieve a result. That's far closer to UX development than to a traditional functional requirements document.
Design and DataOps: a match made in heaven
To achieve results that are truly transformative for the business, those principles can be applied to data delivery. 'User journeys' can provide a way to record and express the actual workflow of users across time as they exploit data. Rapid prototyping can be used to evaluate and refine dashboard ideas. Requirements can, and should, be driven from the user's desktop experience, not allowed to flow down from IT. All these artefacts are developed in a way that not only contributes to the vision, but allows pragmatic assessment of the required effort.
Most of all, work should start with a vision expressing how the business should ideally be using information. That vision can be elicited through a design exercise, whose aim is not to specify data flows and metadata (that comes later) but to show how the information in question should be used, how it should be adding value if legacy features and old habits were not in the way. Some would even say this vision does not have to be, strictly speaking, feasible; I'm not sure I'd go that far, but certainly the aim is to show the art of the possible, an aspirational target state against which subsequent changes to data delivery can be measured. Without that vision, DataOps only really optimises what’s already there - it can improve quality but it can't turn data into better information and deeper knowledge.
Sometimes, the remit of DataOps is just to improve quality, to reduce defects or to satisfy auditors and this in itself is often an interesting and substantial challenge. But when the aim is to transform and empower business, to improve decisions, to discover opportunity, we need a Design pillar along with an early investment in developing a vision. That way, our data delivery can truly become an agent of transformation.