Machine data operational intelligence platform specialist Splunk has hosted its .conf 2017 conference and exhibition in Washington DC.
Many of the firm’s partners (Splunkners, perhaps?) also attended.
The Splunk global partner ecosystem is now said to be some 950 partners strong – it is a union of system integrators, distributors, value-added resellers, technology alliance partners, OEMs and managed service providers.
What the partners said
Speaking in relation to this event, CTO of xMatters Abbas Haider Ali told the Computer Weekly Developer Network that his firm works to integrate activity across the tools and people that build and run enterprise applications.
“Gleaning real time intelligence across all of these activities is a critical component of performance and secure applications and Splunk is the tool of choice for our customers to make that happen. The Splunkbase platform makes it easy to build and distribute integrations for the Splunk community and connect them to the full ecosystem of xMatters integrations. As a partner, we benefit from new use cases and applications of Splunkiness to hard operations problems and alerting people of notable events with xMatters,” said Haider Ali.
Alex Seton is VP for business & corporate development at Digital Shadows. A Splunk partner for some time now, Digital Shadows monitors and manages so-called digital risk across a range of data sources to protect a business.
“The app we have developed for Splunk Enterprise customers means they can now use Digital Shadows’ solution to help manage and mitigate their digital risks across the open, deep and dark web alongside Splunk’s real time operational intelligence. This will enable customers to manage their digital risk from cyber threats, data loss, brand exposure, VIP exposure, infrastructure exposure, physical threats and third-party risk, and create an up-to-the minute view of their organization’s digital risk with tailored threat intelligence.”
“Splunk users know that operational intelligence makes outsized demands on file storage infrastructure. These workloads have new requirements for scale that legacy storage appliances are unable to meet. It’s no longer just the storage capacity that matters; it’s also the number of files that can be stored and managed and here’s where legacy storage runs out of gas,” said Ben Gitenstein, senior director, product management at Qumulo.
At Qumulo, Gitenstein says they have taken a different approach for a completely different level of scale. They know that users of large-scale storage need control over (and insight into) file system usage and performance in real time. Gitenstein says he also knows that software developers and engineers need complete programmability of infrastructure.
Operational intelligence isn’t just storage as usual, he said.
V for VictorOps victory
VictorOps develops a full-stack DevOps incident management platform that ingests real-time operational intelligence from Splunk (and other monitoring tools) into a timeline of activity for people watching the systems.
“Whereas Splunk delivers intelligent insights throughout the delivery chain, VictorOps connects those insights to the people who have the expertise to take the right action. As more modern development organizations invest heavily in continuous deployment, microservices and agile practices, they use these rapid cycles to deliver value to customers faster. Splunk is an important partner in collecting data across that delivery chain. VictorOps takes that information from Splunk, disseminates it, and facilitates continuous learning so that teams retro on what went wrong and don’t make the same mistakes twice,” said Joni Klippert, VP of product, VictorOps.
Other comments will follow…
Amido is a technical consultancy specialising in customer identity, search and cloud services.
Gray writes as follows…
Companies are attracted to cloud native applications because they move them further away from the ‘bare metal’ maintenance associated with traditional infrastructures, such as installing software packages and managing updates.
When companies decide to go cloud native, they can essentially move into a serverless architecture and choose a managed service such as Azure App Service or AWS Elastic Beanstalk/ECS. By doing so, they remove the overhead of having multiple teams – one in-house team writing code to create the application or solution and another making sure it operates well – and instead have a number of skilled workers who can develop and operate simply by using platform tools.
Additionally, you don’t need to hire as many specialists to implement more complicated elements of data processing, such as data pipelines or machine learning algorithms; these are increasingly drag-and-drop interfaces, so developers can focus on the insights the data brings rather than the infrastructure and algorithm build.
What changes in cloud native?
So what elements of programming architecture change when you go cloud native?
It can be difficult to remain vendor agnostic when you embrace some cloud native platforms because they work in very different ways and do not offer feature parity. It is, therefore, worth consulting with a technology specialist firm that is not tied into specific architectures, as it is more likely to have DevOps teams that can help design a system that can be deployed over multiple clouds while also removing the layer associated with legacy architecture designed for monolithic applications.
When you find a trusted partner/consultant, the process to move into the cloud becomes exciting. You realise that you can introduce microservices and containers to help build services and minimise disruption when upgrades are needed, or if there is an issue.
Design for failure
An important change you need to consider when going cloud native is designing for failure.
Cloud native PaaS components usually have lower individual SLAs (typically 99-99.9%, as opposed to 99.99%), which represents a large increase in potential downtime. Designing for failure means being able to recover elegantly if an individual component is unresponsive or returns an error.
Alternatively, it can also mean being more creative when designing a system that needs high uptime. For example, we recently built an application designed for 99.99% uptime on top of component parts that individually offer 99.9%. The trick is to understand the failure scenarios and design for them, to ensure that the system remains responsive when a sub-component suffers an outage.
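The arithmetic behind those SLA figures is worth making concrete. A minimal sketch, assuming independent component failures (real systems need far deeper failure analysis), shows why chaining 99.9% components lowers overall availability while running them redundantly can exceed 99.99%:

```python
# Hypothetical sketch: composing system availability from component SLAs.
# Assumes independent failures - an idealisation, not a production model.

def serial_availability(*slas):
    """All components must be up: availabilities multiply."""
    result = 1.0
    for a in slas:
        result *= a
    return result

def redundant_availability(sla, replicas):
    """Up if at least one of `replicas` identical components is up."""
    return 1.0 - (1.0 - sla) ** replicas

# Two 99.9% components chained in series fall below 99.9% overall:
print(serial_availability(0.999, 0.999))     # 0.998001

# A redundant pair of 99.9% components exceeds 99.99%:
print(redundant_availability(0.999, 2))      # 0.999999
```

The design lesson is the one the author draws: identify which components sit in series on the critical path, and add redundancy there first.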
Microservice architecture allows businesses to manage parts of larger projects individually, avoiding blanketed updates or uploads, which can result in system delays or down-time. Their purpose is to increase flexibility throughout the business offering, allowing an enterprise to be more competitive and to become increasingly agile, as they can adapt parts of the business in isolation. This way of operating is increasingly important for today’s High Street retailers, for instance, as they look to compete with, and adapt to, omnichannel operating models.
The path to adopting a microservices architecture does not require wholesale digital transformation, as it doesn’t have to be an all or nothing proposition. It is entirely possible for businesses to simply dip their toes in the microservices world without starting from scratch. This is likely to be music to businesses’ ears as they look to keep up with, and exceed, their customers’ growing expectations and demands when it comes to online capabilities.
Containers give you more control over the infrastructure you are deploying. Because you are not creating a virtual machine for every instance of an application, deployments are rapid and the overhead of the operating system is significantly lower. This combination is powerful: it means an upgrade or change can take effect almost immediately, without disruption to the general use of the portal.
However, despite the advantages of containers, they are not a magic fix for all. Not every organisation can benefit from their use, as some applications are not suitable for containerised deployment, and so the decision to containerise software must be considered carefully.
Our monolithic past
Monolithic applications, favoured by traditional enterprises, are not as well suited to containerisation owing to the considerably different tooling required by microservices. Containers are far better suited to a microservices environment, where large projects can be broken down into a set of manageable, independent and loosely-coupled services.
For some legacy or monolithic solutions, the decision to containerise software needs to be considered carefully. Containers are valuable when monolithic applications can be split into smaller components which can be distributed across a containerised infrastructure; but this is not to say that any application will work, just that care needs to be taken to see if it is suitable for containerised deployment.
Amido is an independent technical consultancy that specialises in implementing cloud-first solutions. We help our clients build resilience at scale, flexibility for the future and differentiation of customer experience. And, we do this while minimising business-risk and build-cost.
Machine data operational intelligence platform company Splunk has staged its .conf 2017 conference and exhibition in Washington DC.
As part of its market-facing moves for this show, the company has put forward more flexible pricing programmes for application development professionals and data engineers looking to tailor machine data analysis techniques into the software they create.
New componentised IT models
Reflecting what might be seen as the new (perhaps even funky?) model of IT consumption in the as-a-Service cloud-based world of componentised composable applications, Splunk is aiming to provide a more tailored approach to pricing its services so that its core platform can be engineered into modern application structures in a more natively flexible way.
“Splunk is committed to offering value-based pricing tailored to the unique needs of our customers, whether they are just getting started or an existing customer expanding their use of Splunk,” said Doug Merritt, President and CEO, Splunk.
Different data ‘lenses’
Merritt further explains that the Splunk platform enables what he calls ‘different lenses’ for viewing the same data – a suggestion that perhaps explains why different organisations have different needs for different use cases.
The Splunk Machine Learning Toolkit is used by firms such as Recursion Pharmaceuticals to equip its operations team with tools to comb through metrics for a view into operations and for wrangling large quantities of data to understand real-time correlations as they are happening.
Other related products here include Splunk Insights for AWS Cloud Monitoring – unsurprisingly, a piece of software from the Splunk platform for monitoring cloud computing instances on AWS.
Dev/Test, for developers
[Splunk provides] free personalized Dev/Test licenses for customers interested in testing new use cases without consuming their existing production license capacity.
The company confirms that individual users at any organisation with a paid Splunk Enterprise license or Splunk Cloud subscription can request a personalised Dev/Test license to experiment with Splunk at no additional cost.
New users can also receive a one-year license for indexing volumes of up to 20 GB of data per day — Splunk Enterprise Free (for Docker) unifies insights across container environments and the entire technology stack.
This is a guest post for the Computer Weekly Developer Network written by Patrick McFadin in his role as VP of developer relations at DataStax.
DataStax Enterprise is an always-on data platform powered by a distribution of Apache Cassandra, an open source distributed database management system designed to handle large amounts of data across many commodity servers.
McFadin writes as follows:
If you are building an application today, you’re probably thinking about scale. Over time, web-based applications have gone from replicating traditional software designs to building their own best practices.
From Salesforce designing Software-as-a-Service as a delivery model in 1999, through the launch of services like Amazon Web Services and the increase of mobile-first applications on iPhone and Android phones, the role for cloud in application design has grown over time.
The data deal is a big deal
But what makes a cloud app (a native one) really different to those traditional applications is how they deal with data.
Where data was once centralised, stored in one place and served out to everyone who asked for it, cloud applications now demand that data gets spread across multiple locations.
When you have users around the world, hosting all your application data in one place is problematic.
A request can take an inordinate amount of time when everything is being held on the other side of the world. Even the speed of light isn’t fast enough. Shifting data closer to users can help reduce this, as well as helping to prevent data loss or outages.
What is cloud scale?
Working at cloud scale means dealing with hundreds of thousands or millions of users, all creating data all the time. Storing data in one place on a relational database can be difficult when it involves sharding data into multiple locations, all of which are filling up rapidly.
So a new approach around data is needed to help applications running in the cloud work. What elements should we be looking at? From a cloud application perspective, how data can be distributed consistently should be the first point of interest. Can data be stored in multiple locations, with copies of each record across those locations? Without this, it will be difficult to scale out successfully.
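The questions above – can each record be stored in multiple locations, with copies across them? – can be sketched with a toy quorum model, loosely in the spirit of the replication that systems such as Apache Cassandra implement. All class and location names here are invented for illustration; this is not DataStax’s API:

```python
# Toy model of quorum-based replication across locations (illustrative only;
# real distributed databases handle placement and consistency far more subtly).

class ReplicatedStore:
    def __init__(self, locations, replication_factor=3):
        self.replicas = {loc: {} for loc in locations}
        self.rf = min(replication_factor, len(locations))
        self.quorum = self.rf // 2 + 1  # a majority of replicas

    def write(self, key, value):
        # Write the record to `rf` locations chosen deterministically by key.
        targets = sorted(self.replicas, key=lambda loc: hash((key, loc)))[:self.rf]
        for loc in targets:
            self.replicas[loc][key] = value
        return targets

    def read(self, key):
        # A quorum read succeeds if a majority of the replicas hold the key.
        hits = [store[key] for store in self.replicas.values() if key in store]
        if len(hits) < self.quorum:
            raise KeyError(f"quorum not reached for {key!r}")
        return hits[0]

store = ReplicatedStore(["eu-west", "us-east", "ap-south", "us-west"])
store.write("user:42", {"name": "Ada"})
print(store.read("user:42"))  # {'name': 'Ada'}
```

Even in this toy form, the trade-off the article describes is visible: with three copies, one location can be lost entirely and a majority read still succeeds.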
Linked into this distribution of data, applications should be ‘always on’ – that is, they should run all the time and avoid downtime.
Spreading data across multiple sites should provide protection against failure, but it should also mean that services can run during updates or patching. For teams looking at how cloud can make applications run more efficiently, this ability to keep running through updates should be a huge bonus.
Alongside running the data storage side more efficiently, cloud applications should provide better service to users. Helping apps to run in real time – so that decisions are made while customers are using the app, rather than after they have carried out some interactions – is a critical area to look at. When customers can make decisions using better data, they get a better experience.
However, this is not just about making a recommendation for a product after the fact; you have to think about how that interaction can take place while someone is making their choice. Using data in real time like this affects a lot of other business decisions, not just software development ones.
Similarly, providing a better user experience through contextual use of data should also be considered. For example, a retailer can provide special offers based on a shopping list comparison to previous purchase behaviour over time while someone is in the store using their app. Providing a user experience that changes based on customer preferences improves utilisation.
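The retailer scenario above can be sketched as a simple comparison of a live shopping list against purchase history – a purely illustrative toy, with invented product names, not a description of any real recommendation engine:

```python
# Toy sketch of the in-store offer scenario: surface an offer for items the
# customer buys habitually but has not yet added to today's list.

def suggest_offers(shopping_list, purchase_history, min_purchases=3):
    counts = {}
    for basket in purchase_history:
        for item in basket:
            counts[item] = counts.get(item, 0) + 1
    # "Habitual" items appear in at least `min_purchases` past baskets.
    habitual = {item for item, n in counts.items() if n >= min_purchases}
    return sorted(habitual - set(shopping_list))

history = [
    ["milk", "bread", "coffee"],
    ["milk", "coffee", "apples"],
    ["coffee", "milk", "cereal"],
]
print(suggest_offers(["bread", "apples"], history))  # ['coffee', 'milk']
```

The point of the article stands independently of the mechanics: the comparison has to happen while the customer is in the store, which is a data-freshness problem as much as an algorithmic one.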
Lastly, scalability should be a given for cloud applications. The mention of a new application on social media can lead to huge increases in downloads compared to what was expected.
In cases like these, growing the back-end infrastructure can be difficult without the right choices in advance. Scaling up should not incur huge capital costs, so looking at the right software infrastructure is essential.
For cloud applications, managing data is still one of the biggest problems to solve. Without some thought, cloud applications can provide poor user experience when services have to scale. Distributed data architectures can help, but they have their own nuances to discover and bear in mind too.
Follow Patrick McFadin @PatrickMcFadin on Twitter.
Machine data operational intelligence platform specialist Splunk has gone all Alan Partridge.
The company known for its down-at-the-data-layer technologies and its tongue-in-cheek t-shirts now says its Partner+ Program initiative is delivering “aha” moments from machine data.
The Splunk global partner ecosystem is now said to be some 950 partners strong – it is a union of system integrators, distributors, value-added resellers, technology alliance partners, OEMs and managed service providers.
“Splunk is heavily investing in our Partner+ Program with more global talent, new programs and resources to support and enable our expanding global partner ecosystem,” said Susan St. Ledger, chief revenue officer, Splunk.
Splunk for developers
Part of the work of the partner division, although admittedly stemming from a channel level perspective, is dedicated to tools and training that could benefit software application developers using machine data streams inside their data and application workflows.
This is supposed to be a framework for partners that integrates, implements and configures Splunk products to obtain training and certifications – it also includes a set of best practices.
The Partner+ Technology Alliance Program (TAP) is intended to provide engagement models for partners building complementary technical solutions on top of the Splunk platform for joint customers across IT and security use cases.
What the partners said
DNA Connect, an Australian distributor of infrastructure, security and visibility software, partnered with Splunk in August 2008.
“Our customers needed a platform that would analyse and visualise machine data from all levels of the IT stack, and legacy solutions fell flat,” said Munsoor Khan, director, DNA Connect.
ECS provides a full range of IT services for enterprise clients with headquarters in Edinburgh and offices across the UK. The firm’s MD David Calder offers the final comment here…
“Our relationship with Splunk is set for continued growth as we deliver analytics-driven security to more customers. The Splunk platform enables world-class security for our clients with solutions ranging from us running their security operations centers to handling their Splunk enterprise security deployments.”
Splunk .conf 2017: knowing me machine data, knowing you – aha!
Well, someone had to use that headline, so now it’s done…. move along now please.
Microsoft is working to partner with everybody and be open to everything… as we know, Microsoft loves Linux, well, now it does.
In this regard then, this month we see cloud data management company Veritas Technologies announce new 360 data management capabilities for Veritas and Microsoft Azure customers.
The two firms are seeking to bring the functions Veritas does best to Azure users and customers – Veritas is known for data management capabilities designed to help with compliance, increase data visibility and simplify workload migration.
Disparate data views
This announcement includes new integrations for business continuity and disaster-recovery readiness. The functions also work to enable hybrid cloud scale-out storage optimization and the ability to visualise data across disparate sources.
“The desire to incorporate Infrastructure as a Service (IaaS) into the enterprise IT strategy is abundantly clear,” said Mike Palmer, executive vice president and chief product officer, Veritas. “However, data management is critical to helping customers ensure compliance and protection of their data while extracting maximum value from it. That’s why this partnership has been so well-received by customers.”
The Veritas Resiliency Platform (VRP) works to monitor and help failover/failback multi-tiered applications to and from Azure. The concept here is the chance to use the cloud as a recovery target.
What is storage optimisation?
What does software-defined storage optimisation mean in real terms? It is the chance to automatically migrate old data to the cloud for long-term storage, or mix on-premises and Azure cloud storage to create a single scale-out file system for storage management.
Veritas says that customers can also take advantage of a host of other previously announced Veritas data management offerings to enhance their experience on Microsoft Azure.
“With Veritas NetBackup, the company’s flagship unified data protection solution for the enterprise, customers can migrate and protect data stored on the Azure platform. As part of Veritas NetBackup 8.1, customers can use deduplication technology to help reduce backup times and lower long-term data storage costs,” said the company, in a press statement.
The notion of a ‘solid cloud’
If we can bring storage optimisation functions to bear and use the cloud as a backup target, then would it be fair to suggest that the Veritas + Microsoft union, in the vein that it exists, might represent a more solid, heavy (metallic or galvanised, even) type of cloud?
It could be a term that might stick as the cloud model now finesses and develops in so many ways.
Cloud-centric software-defined information management company Veritas Technologies reminds us that 2.5 quintillion bytes of data are being produced every day in 2017.
The company is now strongly focused on digital compliance and data visibility in relation to the tools it is currently working to refine and develop.
“Developers don’t want to think about storage,” asserts Mike Palmer, executive VP and chief product officer at Veritas. The task, he suggests, is to provide the coding community with a means of knowing that the storage power is there, and a means of indexing the information held within.
Storage not snorage
So this is data storage yes, but Veritas goes rather further than crusty old tapes and disks… this is cloud-controlled software-defined data management that embraces a notion of not just information technology (IT), but also information (on its own, as an entity) and technology (as platforms, tools and functions that look after our information).
Veritas CEO Bill Coleman spoke at the firm’s 2017 ‘Vision’ conference and exhibition to explain how his firm has grown as a ‘de-merged’ distinct entity outside of Symantec.
NOTE: Veritas spent 10 years between 2004 and 2014 as a part of Symantec.
Coleman spoke of cloud native intelligent analytics and how his firm is providing a Software Development Kit (SDK) to be able to code to what is an information management platform serving cloud centric applications.
Putting secondary data first
Veritas also welcomed product VP Palmer to the stage for a keynote session. Speaking of why backup technologies are so important, Palmer noted that Uber, Lyft and others have created the so-called sharing economy – and that this has led to the creation of the ‘backup estate’, essentially the bulk of data that could be used for competitive advantage (inside the contemporary cloud-centric applications that developers create today) but often isn’t, as it simply sits wasting time and money as so-called ‘secondary data’.
A new notion of secondary data is needed here…
What is secondary data?
Secondary data is often defined as research data that has previously been gathered (and can be accessed by researchers), whereas primary data is collected directly from its source. In the information technology sense, we extend the definition: secondary data does come from its source, but it is unused user data, extraneous data lake data, unstructured data, and peripheral IoT and edge computing data. Essentially, it is every form of additional data that is not being driven into live production systems for competitive advantage inside the business.
“Secondary data is the most under-utilised asset in your business,” asserts Veritas’ Palmer.
As this secondary data sits in legacy databases, proprietary data storage systems and data stores from the previous (less cloud centric) world of IT, Veritas starts to build its argument for its own product set.
“Backup used to be a consolidated platform back in the day, but architectures have changed and data workloads have diversified. While specialisation became a goal as different users wanted to run different workloads, what we actually got was diversification, and this was not good, as it leads to non-compliance and fragmentation,” said Palmer.
Has storage become sexy yet? Well, it may never quite get there… but the way this new approach to storage pushes all data (including unused data) further up the order of importance in contemporary IT systems and the modern software application development stack is real.
Veritas used this year’s show to announce new developments to the Veritas 360 data management portfolio spanning Veritas NetBackup—the company’s flagship offering, Veritas Information Map and Veritas Appliances.
“We are living in the age of multi-cloud, where organisations require a policy-driven data management strategy to protect valuable data assets across multiple datacentres, public and hybrid clouds,” said Palmer. “Today, with more than 20 new connectors in Information Map and advancements to NetBackup 8.1, organisations can now protect more cloud-based workloads, reduce storage costs in multi-cloud environments and gain increased visibility of data that historically has been hard to identify—all critical components of a successful multi-cloud strategy.”
With 2.5 quintillion bytes a day today and that figure set to rise, we had better worry about more intelligent storage and look to ways to manage this challenge with analytics and Machine Learning (ML)… and this is precisely where Veritas seeks to now develop its technology stack and so validate its customer facing proposition.
Information management company Veritas Technologies is now styling itself as a multi-cloud data management and control specialist.
Pretty soon the firm’s marketing department might be trying to re-position the corporate tagline as the ‘smart technologies multi cloud information management’ business.
Now in the throes of staging its Vision 2017 conference in Las Vegas, the firm has announced Veritas Cloud Storage, a new software-defined storage function designed for massive amounts of unstructured data – and the secret is in the smartness inside.
What are smart tool layers?
So what do we mean by smartness in this instance? It is of course data analytics, machine learning and (especially important in the field of data management) the use of classification technologies.
Essentially these are all software-defined storage technologies designed to apply machine learning intelligence to extract more value from data by making it proactive, predictive and actionable.
To complement this launch, Veritas also announced the Veritas Access Appliance, which is a software-defined storage appliance.
The birth of massive data
According to Veritas 2017 Data Genomics Survey – which analyzed more than 31 billion anonymised files globally – enterprise data repositories have grown by nearly 50 percent (48.7 percent) annually, largely driven by the proliferation of new apps and emerging technologies such as Artificial Intelligence (AI) and the Internet of Things (IoT) that use massive data sets.
“With the unprecedented growth of data driving a new wave of storage demands, it is imperative that enterprises deploy a software-defined storage strategy that is optimized for cost, performance and agility,” said David Noy, vice president of product management, Veritas. “Customers also need to deploy software-defined storage solutions that turn dormant data into intelligent insights, helping businesses offer better customer experiences while delivering strong business outcomes.”
Storage got smart, almost sexy
Noy talks of a new world of intelligent data management and says this is a space where enterprises can make storage ‘smart’ by applying analytics, machine learning and classification technologies, offering a new level of intelligence and management to large quantities of unstructured stored data.
Building on the Veritas 360 Data Management platform, Veritas Cloud Storage claims to be able to scale to petabytes, storing and managing billions of files with the ability to handle up to a quintillion objects.
According to a product statement, “Veritas is focused on helping get people (i.e. customers) out of legacy and proprietary storage hardware.”
The appliance enables enterprises to embrace cloud adoption with the ability to build their own private cloud or provision cloud storage platforms as a low-cost storage tier to meet performance requirements, across a range of leading cloud service providers.
Storage may still be one step off of sexy… but it is definitely getting smarter.
It’s 2017 – and that means we’re now starting to build software application development architectures, algorithms and applications for the cloud computing model of service-based application and data storage/analytics… and we’re starting to do it natively.
In what will now comprise a series of pieces for the Computer Weekly Developer Network, we zone in on commentary relating to the cloud native world and feature a number of guest pieces focused on analysing key issues in this space.
Bow to your sensei
First up we turn to Sumo Logic, a firm known for its cloud-native machine data analytics platform designed for what has been called ‘continuous intelligence’ in the continuous always-on world of cloud.
The company’s ‘State of Modern Applications in the Cloud’ report is based on anonymised data from more than 1,500 customers using Sumo Logic’s own machine data analytics platform.
“Today’s enterprises are striving to [build] services built on ‘modern architectures’ i.e. an application stack with new tiers, technologies and microservices — typically running on cloud platforms like AWS, Azure and Google Cloud Platform,” said Kalyan Ramanathan, vice president of product marketing for Sumo Logic. “Sumo Logic is known for our work building and operating massive multi-tenant, highly distributed cloud systems, [we are] the industry’s first machine data analytics platform to natively ingest, index and analyse structured and unstructured data together in real-time.”
Key findings of the report
Linux OS is a legitimate option across all cloud platforms.
- Linux is the dominant operating system running on AWS.
- Linux is also growing dramatically in Azure from four percent (2016) to 12 percent (2017).
Containers and functions growth is unprecedented.
- AWS Docker adoption has grown from 18 percent (2016) to 24 percent (2017).
- AWS Lambda adoption has almost doubled from 12 percent (2016) to 23 percent (2017).
Legacy vendors are struggling to find relevance in the modern app world.
- MySQL is the number one database running in AWS and, along with Redis and MongoDB, accounts for 40 percent of database adoption.
- Microsoft SQL and Oracle DB significantly lag in terms of usage in AWS and are only adopted by a combined six percent of customers.
- Nginx and Apache are the leading web servers in AWS.
Cloud security paradox
Organisations are uncovering a cloud security paradox.
- Security remains a top concern for enterprises moving to the cloud as their legacy on-premises security/SIEM tools are insufficient.
- Unfortunately only 50 percent of enterprises are leveraging CloudTrail, the primary security audit for AWS.
- Enterprises of all sizes must leverage security, networking and audit services from their native cloud providers.
This is a guest post for the Computer Weekly Developer Network by Barry Devlin in his capacity as founder and principal of 9sight Consulting.
Dr Barry is a regular blogger, writer and commentator on information and its use – he is based in Cape Town, South Africa and operates worldwide.
Devlin writes as follows:
If you’re planning to create a data warehouse, make sure you create one that is cross-functional and provides a long-life foundation for data provision and decision support. This means covering the entire enterprise and satisfying the needs of multiple projects and groups for several years. The foundation must provide consistent, reconciled, legally binding data to your business clients.
Easier said than done. Right?
Think of your project in these four steps: Design, Build, Implement and Maintain.
Designing your data warehouse
Let’s start at the design phase. When planning your design, the vision for your new data warehouse is best laid out over an enterprise data model (EDM), which consists of high-level entities including customers, products and orders.
A traditional design approach involves mapping entities to “loosely normalized” tables based on third normal form (3NF), or to a dimensional (star-schema) model.
Another approach uses the Data Vault Model (DVM), which is a hybrid of the 3NF and star-schema forms. First introduced by Dan Linstedt, the Data Vault is a detail-oriented, history-tracking, linked set of normalized tables designed to support multiple functional business areas.
The DVM consists of three specialized types of entities/tables: hubs based on rarely changing business keys, links that describe associations or transactions between business keys, and satellites that hold all temporal and descriptive attributes of business keys and their associations. A new version, Data Vault 2.0, introduced in 2013, combines a data model, a methodology and a systems architecture, providing a design basis for data warehouses that emphasizes core data quality, consistency and agility in support of enterprise-wide data provision requirements.
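The three table types can be sketched in SQL DDL. The following Python snippet, using the standard library’s sqlite3 module, creates a minimal, hypothetical hub/link/satellite set; the table and column names are purely illustrative, not taken from any real Data Vault schema.

```python
import sqlite3

# Illustrative Data Vault tables: hubs hold business keys, a link records
# associations between keys, and a satellite holds time-stamped descriptive
# attributes. All names here are hypothetical.
ddl = """
CREATE TABLE hub_customer (
    customer_hk   TEXT PRIMARY KEY,        -- hash of the business key
    customer_id   TEXT NOT NULL UNIQUE,    -- the business key itself
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_product (
    product_hk    TEXT PRIMARY KEY,
    product_id    TEXT NOT NULL UNIQUE,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE link_order (
    order_hk      TEXT PRIMARY KEY,        -- hash of the related keys
    customer_hk   TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    product_hk    TEXT NOT NULL REFERENCES hub_product(product_hk),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE sat_customer (
    customer_hk   TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    load_date     TEXT NOT NULL,           -- satellites track history
    name          TEXT,
    city          TEXT,
    PRIMARY KEY (customer_hk, load_date)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(ddl)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
```

Note how descriptive attributes live only in the satellite, keyed by business key plus load date, which is what gives the model its history-tracking character.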
In May 2017, data warehouse automation specialist WhereScape announced automation software to enable rapid and agile Data Vault 2.0 development, cutting delivery time of Data Vault-based analytics solutions by two-thirds.
Get busy building
Once your design is set, the hard work of building your data warehouse begins. But before you start, accept the fact that no matter how nicely you’ve designed your model, you will face the reality of imperfect data source systems. Data warehouse builders struggle with missing data in source systems, poorly defined data structures, incorrect content and missing relationships. Implementation is a delicate balancing act between the vision of the model and the constraints of the sources.
The building process comes down to five steps:
- Understand the data sources. Keep in mind that legacy systems might be “bent to fit” emerging and urgent requirements. And modern big data sources might lack documentation.
- Compare the data available to the data warehouse model and define appropriate transformations to convert the former to the latter.
- Where transformations are too difficult, modify the data warehouse model to accommodate the reality of the data sources. Changing the data sources is usually impossible for reasons of cost and politics.
- Test performance of load/update processes and check the ability of the modified model to deliver the data the business requires.
- If successful, declare victory. Otherwise, rinse and repeat.
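The second and third steps above amount to writing transformations that absorb the imperfections of the sources. A minimal sketch in Python, with invented field names, might look like this:

```python
# Hypothetical transformation mapping an imperfect source record onto the
# warehouse model, supplying defaults where the source is missing data.
def transform(source_row: dict) -> dict:
    return {
        # the legacy system stores the key under a different name
        "customer_id": source_row.get("cust_no", "UNKNOWN"),
        # normalise inconsistent content from the source
        "country": source_row.get("country", "").strip().upper() or "N/A",
        # a missing relationship is flagged rather than silently dropped
        "account_manager_id": source_row.get("acct_mgr") or None,
    }

rows = [
    {"cust_no": "C001", "country": " za ", "acct_mgr": "M7"},
    {"cust_no": "C002"},  # a sparse legacy record
]
warehouse_rows = [transform(r) for r in rows]
print(warehouse_rows)
```

In practice each such rule is a negotiation: where the transformation gets too convoluted, the model is adjusted instead, as the third step says.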
Improved approaches to automating the process have emerged in stages over the history of data warehousing: extract, transform, load (ETL) tools, data integration systems and, finally, data warehouse automation (DWA). In essence, each stage of this journey represents an increasing level of automation, with DWA addressing the entire process of designing, building, operating and maintaining a data warehouse.
Companies such as WhereScape offer useful tools to automate the data source discovery, design and prototyping phases of projects. Additionally, advanced automation solutions with an integrated development environment (IDE) targeted to your data platform can eliminate the majority of traditional hand-coding required and dramatically streamline and accelerate the development, deployment, and operation of data infrastructure projects.
A DWA tool automates the transformation of the data structures of the various sources to the optimized model of the Data Vault and populates the target tables with the appropriate data. This approach creates necessary indexes and cleanses and combines source data to create the basis for the analysis to address the business need.
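As a rough illustration of this kind of generation (a sketch of the general idea, not any vendor’s actual output), a DWA-style tool can be thought of as turning a small metadata description into the DDL, index and load statements that would otherwise be hand-coded:

```python
# Toy template-driven generator: given hypothetical metadata for a target
# table, emit the CREATE TABLE, index and INSERT-from-staging statements.
def generate_load_sql(table: str, key: str, columns: list[str]) -> list[str]:
    cols = ", ".join([key] + columns)
    return [
        f"CREATE TABLE {table} ({key} TEXT PRIMARY KEY, "
        + ", ".join(f"{c} TEXT" for c in columns) + ");",
        # index a frequently filtered column, not the primary key
        f"CREATE INDEX idx_{table}_{columns[0]} ON {table} ({columns[0]});",
        f"INSERT INTO {table} ({cols}) SELECT {cols} FROM staging_{table};",
    ]

statements = generate_load_sql(
    "hub_customer", "customer_hk", ["load_date", "record_source"])
for s in statements:
    print(s)
```

Real DWA products work from much richer metadata (source profiles, data types, cleansing rules), but the principle is the same: the code is generated from templates rather than written by hand.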
Shifting to operations
The clear aim here is to automate and speed deployment in an agile environment to reduce human error across the full lifecycle.
Once the system is deployed to production, the next and ongoing task is to schedule, execute and monitor the continuing process of loading and transforming data into the data warehouse. In this phase, jobs consist of a sequence of interdependent tasks. To ensure that data consistency is maintained, if a task fails during execution, then all downstream dependent tasks are halted. When the problem has been resolved, the job is restarted, picking up from where it left off and continuing through to completion. From an operational point of view, given the potential interdependencies of data across these systems, it makes sense to manage this ensemble as a single, logical environment.
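The halt-and-resume behaviour described above can be sketched in a few lines of Python; the task names and the simulated failure are invented for illustration.

```python
# Minimal dependency-aware job runner: a failure halts everything downstream
# of it, and a restart skips tasks already recorded as done.
def run_job(tasks, deps, actions, done, failed=None):
    """tasks: ordered task names; deps: task -> prerequisite tasks;
    actions: task -> callable; done: set of completed tasks (persisted)."""
    failed = set() if failed is None else failed
    for task in tasks:
        if task in done:
            continue  # resume: skip work already completed
        if any(d in failed or d not in done for d in deps.get(task, [])):
            failed.add(task)  # halt downstream of any failed prerequisite
            continue
        try:
            actions[task]()
            done.add(task)
        except Exception:
            failed.add(task)
    return done, failed

# First run: 'transform' fails, so 'load' (downstream of it) is halted.
state = {"runs": 0}
def flaky_transform():
    state["runs"] += 1
    if state["runs"] == 1:
        raise RuntimeError("bad source file")

tasks = ["extract", "transform", "load"]
deps = {"transform": ["extract"], "load": ["transform"]}
actions = {"extract": lambda: None,
           "transform": flaky_transform,
           "load": lambda: None}

done, failed = run_job(tasks, deps, actions, done=set())
# After the problem is resolved, the restart picks up where it left off.
done, failed = run_job(tasks, deps, actions, done=done)
print(sorted(done))
```

A production scheduler would persist the `done` set between runs, which is what lets a restarted job continue through to completion without redoing finished work.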
The smooth, ongoing daily operation of the entire data warehouse environment is a fundamental prerequisite to its acceptance by users and its overall value to the business.
Maintaining with agility
In more traditional IT projects, when a successful system is tested, deployed and running daily, its developers can sit back and take a well-deserved rest. Developers of today’s data warehouses have no such luxury. To make life easier, leverage and apply agility whenever possible.
Now, ongoing digitalization of business is driving ever-higher demands for new and fresh data. Some people think a data lake filled with every conceivable sort of raw, loosely managed data will address these needs. That approach may work for non-critical, externally sourced social media and Internet of Things data. However, it doesn’t help with historical and real-time data.
Fortunately, the agile and automated characteristics of the Data Vault / DWA approach also apply to the maintenance phase. In fact, it may be argued that these characteristics are even more important here.
Automation = agility
At this point, widespread automation is essential for agility because it increases developer productivity, reduces cycle times, and eliminates many types of coding errors. Another key factor in ensuring agility in the maintenance phase is the ongoing and committed involvement of business people. An automated, template approach to the entire design, build and deployment process allows business users to be involved continuously and intimately during every stage of development and maintenance of the data warehouse and marts.
With maintenance, we come to the end of our journey through the land of automating warehouses, marts, lakes, and vaults of data. At each step of the way, combining the use of the Data Vault approach with DWA tools simplifies technical procedures and eases the business path to data-driven decision-making.