Data Matters

December 20, 2019  2:00 PM

Using data for good, not evil

Brian McKenna

This is a guest blog post by Cindi Howson, Chief Data Strategy Officer, ThoughtSpot.

Nothing illustrates the double-edged nature of data use more vividly than the recent general election in the UK. On the ‘data for good’ edge, pro-democracy organisations successfully reached youth, disabled people, and other disenfranchised groups to raise voter turnout. The Electoral Reform Society logged 3.1 million new voter registrations since the last general election in 2017, 67% of which were by people aged 34 and under.

But there’s way too much for my liking on the ‘data for bad’ edge. Here we witness organisations abusing data to blast misinformation through unregulated social media channels in order to swing public opinion. This reckless, cynical activity is eroding democracy, dividing societies and undermining ‘data for good’ initiatives.

I can’t remember the last time I’ve felt so equally inspired and terrified in my 25-year career in data and analytics. Though I get discouraged, ultimately I believe the forces of good will prevail. As more scams are exposed, people become more data-savvy and data-literate. Legal and regulatory frameworks will adapt, but more importantly, existing laws will become more effectively enforced. This ‘Wild West’ period can’t go on forever.

In the meantime, I challenge all CEOs, CDOs, and analytics leaders to work on ‘Data for Good’ initiatives and use data responsibly for commercial gain. While this may seem easy in theory, I often get questions on how to put it into practice. Here are five steps to get started:

  1. Identify a cause that matters to you personally. Then make it professional. For example, ThoughtSpot’s co-founder and chairman Ajeet Singh, for family reasons, felt a personal connection to the cause of improving cancer outcomes. I personally care about using data to improve education and eliminate homelessness, seeds of empathy sown during a difficult upbringing. Whatever you do, don’t embark on data for good for marketing purposes. That’s the first misstep on a slippery slope to data for evil.
  2. Investigate how data and analytics can drive the cause forward. Research projects and organisations in your interest area and how they can be served by data. In the UK, groups like Data Orchard and DataKind have local chapters to promote the use of data for social good, and the Bloomberg Data for Good Exchange just held its first summit in London.
  3. Establish what’s needed most: it could be data, money, software or expertise. Many organisations urgently need access to data held in the private sector. That’s why some banks and telecommunications companies, for example, are now sharing anonymised data to support smart city projects.
  4. Team up with colleagues and other like-minded companies in your area of interest. Being part of a community can be motivational. For example, technology providers, food producers and retailers have joined forces to support FareShare, which fights hunger and food waste in the UK.
  5. Set a vision and plan for sustainable action. One-off actions provide some benefit. However, the longer you invest in a cause, the more you learn about its unique challenges, so you can make a lasting difference.

Data for good programmes can have a profound impact on society and on the culture of the companies that run them. It can be hard not to get thrown off course by bad news stories.

Fortunately, there are at least as many good, albeit less prominent, examples out there. I was heartened, for example, when investment firm BlackRock’s CEO urged fellow CEOs to consider their contributions to society, not just short-term profit gains. I encourage anyone working in data to do the same and make a plan to help shape society for the better.

December 20, 2019  12:05 PM

The next big thing in analytics: understanding cause and effect in user behaviour

Brian McKenna

This is a guest blogpost by Adam Kinney, Head of Machine Learning and Automated Insights at Mixpanel.

When it comes to data, machine learning (ML) is one of the hottest industry trends. But while ML is typically associated with process automation and IoT, businesses can do a lot more with it. In fact, a growing number of organisations are using ML to understand how new marketing campaigns and product features impact user behaviour.

There is no doubt that machine learning has become a more common way to break through confusing, or even conflicting, observational data and deliver insights that can drive meaningful business impact. However, the problem is that user behaviour is very complex and does not necessarily follow pre-agreed rules. A machine can quickly evaluate the data, but to interpret it successfully and draw the right conclusions, businesses need advanced algorithms. This is why a lot of brands are starting to look at causal inference as a way to better understand user behaviour. Causal inference is a new trend within machine learning used to help marketers and business decision-makers better understand the relationship between causes and effects so they can make better decisions.

For instance, people who frequently write product reviews typically buy more online than people who do not. If there is a causal relationship here, it would make sense to encourage more reviews to increase revenue. But people who leave online reviews could also simply be users who are more engaged with the brand than others. That would explain a correlation between buying and writing reviews without any causal relationship between them. If that’s the case, a marketing strategy that encourages customer loyalty will be more effective in driving engagement and sales than encouraging more users to leave reviews. Causal inference is designed to make exactly this kind of distinction: it assesses your current processes and allows you to zero in on the most important areas, so you focus your efforts in the right place.
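To make the reviews example concrete, here is a minimal sketch in Python using entirely made-up numbers (the 30% engagement rate, the review probabilities and the spend figures are assumptions for illustration, not Mixpanel data). Engagement drives both reviewing and spending, while writing a review has no effect on spend by construction. A naive comparison of reviewers against non-reviewers shows a large gap; stratifying on engagement, one simple textbook form of confounder adjustment, brings the estimate back to roughly zero. This is not necessarily the method any particular product uses, and it only works here because the confounder (engagement) is observed.

```python
import random
from statistics import mean


def simulate_users(n=100_000, seed=42):
    """Hypothetical data: engagement drives both reviewing and spending.
    Writing a review has NO causal effect on spend in this simulation."""
    rng = random.Random(seed)
    users = []
    for _ in range(n):
        engaged = rng.random() < 0.3
        writes_review = rng.random() < (0.6 if engaged else 0.1)
        spend = 50 + (100 if engaged else 0) + rng.gauss(0, 10)
        users.append((engaged, writes_review, spend))
    return users


def naive_difference(users):
    """Mean spend of reviewers minus non-reviewers, ignoring engagement."""
    reviewers = [s for _, r, s in users if r]
    others = [s for _, r, s in users if not r]
    return mean(reviewers) - mean(others)


def adjusted_difference(users):
    """Compare reviewers and non-reviewers within each engagement stratum,
    then average the differences weighted by stratum size."""
    total = len(users)
    estimate = 0.0
    for level in (True, False):
        stratum = [(r, s) for e, r, s in users if e == level]
        treated = [s for r, s in stratum if r]
        control = [s for r, s in stratum if not r]
        estimate += (len(stratum) / total) * (mean(treated) - mean(control))
    return estimate


users = simulate_users()
print(f"naive: {naive_difference(users):.1f}")
print(f"adjusted: {adjusted_difference(users):.1f}")
```

Under these parameters the naive gap is about 56 in expectation (72% of reviewers are engaged versus 16% of non-reviewers), while the adjusted estimate sits close to zero. In real data the hard part is that engagement is rarely measured this cleanly.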

It can also be used to help understand whether new product features or services are impacting user behaviour in the desired way. This could be done by controlling certain variables and analysing how they impact user behaviour. Of course, you can get answers like this from A/B tests as well, but A/B tests themselves take time and engineering work to run.

Causal inference allows businesses to simplify this process. The idea is to use statistical methods to create a model that predicts the most likely explanations for particular user behaviour. This simplifies decision-making processes and also allows companies to better allocate their data analysis resources.

Causal inference is only just beginning to move out of academia into the business world, but I believe it will be the next big thing in machine learning and data analytics. Not only can it improve understanding of how new product features and marketing initiatives impact user behaviour but, most importantly, it can enable businesses to innovate faster and stay ahead of the competition.

November 27, 2019  11:35 AM

Less data, slower? Not with the right integration

Brian McKenna

This is a guest blogpost by Derek Thompson, VP EMEA, Boomi, which is a Dell Technologies business unit.

If there’s one thing all organisations today can agree on, it’s that no one is asking for less data slower, and the need to satisfy rising consumer demand is a constant source of pressure. What can businesses do to maintain customer satisfaction in such an environment and still find the space to grow and meet these demands – without a blank chequebook?

In particular, the problem for large, established enterprises is that their inevitable complexity limits business strategy options. Brittle architecture in IT landscapes can lead to high maintenance costs, greater risk of failure, and limited future use. Other organisations should not be complacent, however. Businesses of all shapes and sizes are wrestling with this burden, especially in the age of hybrid multi-cloud capabilities.

Data Management

Data management continues to be at the heart of overcoming this “pace of change” challenge. With great data and actionable insights, businesses are able to innovate and stay ahead of the competition. Without them, they are stymied.

The startup community is at an advantage in this regard because startup infrastructures are relatively new, and they are not burdened with the cost of maintaining legacy architecture. This allows them to innovate and deploy the latest as-a-Service cloud technologies and analytics technologies with very few constraints.

One restriction they do have, however, is the lack of access to the kind of capital available to established enterprises. Bigger organisations can unlock and repurpose their funds and apply these to innovation. As a result, they can afford to maintain their leadership position in the market with relatively little risk.

Hybrid Cloud

For businesses that want to acquire new customers, streamline their supply chain and enhance their existing portfolio of products and services, innovation is key. Entering new markets happens either organically or via M&A, and the speed at which change is delivered ultimately comes down to having the right technology at the right time.

It does not matter whether a business’s architecture is on-premise, private cloud or public cloud, as long as the technology is up to date and the infrastructure is as streamlined as possible. In fact, modern approaches to business efficiency seem to indicate that a combination of all three is preferred, and this is the core tenet of hybrid cloud.

Certainly for larger enterprises, committing their entire infrastructure to just cloud or on-prem is impossible. Hybrid cloud environments are becoming more and more important, and if you deploy the right technologies to integrate everything correctly it doesn’t have to be complicated.


In the area of integration, the key to success is ultimately to move businesses away from complex, developer-heavy, high-cost, on-premise solutions, whilst recognising that this process cannot happen overnight. Moving to low-code, agile, easily extensible and low-maintenance SaaS platforms, in particular, eases this transition and does not require a radical, costly overhaul that dispenses with ingrained legacy architecture all at once.

The second step of the journey, of course, is making sure that whichever integration platform is used has intelligence embedded in it, so that the journey to full automation is as painless as possible. Machine learning and insights play a key role in this. If you have the right SaaS integration platform, you know where all of your data is going, and the insights you can gain from this can be extremely powerful.

Innovation and Regulation

Put simply, regulation comes down to data. Regulators will not be concerned about the infrastructure you have in place or which applications you are using as long as your data can be located and is not vulnerable to attacks and breaches.

Whether merging disparate IT environments, looking at what is possible for your future business or going through a modernisation program, digital transformation can only succeed and be viable if organisations make sure that the right technologies are selected for the right applications.

“Where is the data, is it secure, can I trust it?” are the key questions at the heart of each process. When selecting new hybrid technologies, decision-makers need to ask themselves: is this going to satisfy the needs of regulation, not just for today but for the future?

In conclusion

When we talk in the industry about technical debt, it doesn’t necessarily equate to old or legacy technology. For larger, established enterprises, in particular, what it refers to is having to decide between delivering tactical, medium-term business objectives or building a strategic, agile business platform which will perform well in the future. Now, it is possible to have both and unlock the technical debt in your organisation, which in turn means diverting previously ring-fenced funding into innovation and growth. Ultimately, this is key for businesses to gain a competitive advantage.

November 11, 2019  10:44 AM

How to get top-level buy-in for “zero-click” intelligence

Brian McKenna

In this guest blog post, Rob Davis, vice president of product management at MicroStrategy, discusses how CIOs can spread the use of data analytics to reach 100% of the enterprise.

Enterprise mobility continues to be a key focus for CIOs. According to research from Oxford Economics, 80% of IT execs, CEOs and other senior managers say workers cannot do their jobs effectively without a mobile device. Mobile working means so much more than simply email or network access; the field of data analytics on-the-go is also on the rise.

Staff across the entire enterprise could potentially access data analytics and make a valuable contribution to the business, but often they’re not even given the option. CIOs have a real opportunity to ‘democratise data’ throughout the organisation with mobile technology, and while data analytics skills have often been a barrier, new developments such as our zero-click intelligence could make the difference.

What is zero-click intelligence?

Typically, data analysis will occur over a dashboard and involve a sequence of clicks to access valuable information, but ‘zero-click’ changes all that, presenting instant insights to users wherever they are.

Zero-click intelligence, in our view, fits seamlessly into users’ existing workflow, automatically scanning every webpage they visit and underlining relevant keywords. Business users can simply hover over highlighted keywords to surface content-rich, frictionless insights on websites and in other applications, like email, Salesforce or Office 365.
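The post doesn't say how the product decides which words to underline. Purely as an illustration of the general idea, a sketch that scans page text for a list of known business terms might look like this. The function name find_keywords and the term list are hypothetical, and the sketch assumes terms start and end with letters or digits so that whole-word matching applies.

```python
import re


def find_keywords(page_text, known_terms):
    """Return (start, end, matched_text) spans for known terms found in page_text.

    Hypothetical helper: matching is case-insensitive and whole-word, and
    longer terms take priority when terms overlap (e.g. "Acme Corp" over "Acme").
    """
    if not known_terms:
        return []  # an empty alternation would match empty strings everywhere
    # Try longer terms first so the regex alternation prefers the longest match
    ordered = sorted(known_terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in ordered) + r")\b",
        re.IGNORECASE,
    )
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(page_text)]


print(find_keywords("Call Acme Corp about the Q3 renewal", {"Acme", "Acme Corp", "renewal"}))
```

Here the call returns [(5, 14, 'Acme Corp'), (28, 35, 'renewal')], preferring the longer "Acme Corp" over "Acme". A real product would also need to look up and render the insight behind each match, which this sketch does not attempt.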

This means that, with minimal training, employees across the board – even those who are not data-literate – can access valuable data on a mobile device to help them make an informed decision. That could be on the shop floor, in the stockroom, in the contact centre or in essentially any work environment.

How to get senior buy-in

Senior support – cultural, structural and financial – is critical to the success of rolling out data analytics throughout the enterprise. Here are the steps CIOs can take to secure the support of senior decision makers, many of whom will not have a data background themselves.

Build the business case: Real-time analytics with zero-click intelligence, in our view, empowers mobile workforces to access critical data on the go, so they can make decisions there and then. This completely transforms the way some people work: for example, construction professionals can use Augmented Reality (AR) on-site to superimpose data on imagery of their site and make decisions on the spot. Likewise, retailers can check stock levels from a mobile device to answer customer queries instantly on the shop floor, and healthcare professionals can access critical patient data at the bedside.

Develop advocates: For a new way of working to take off, it needs its champions; people within the business who can be exponents for the technology, demonstrate the value to their colleagues and potentially train them.

Test and learn: Only by using zero-click intelligence and opening it up across the organisation will you understand the full potential of where it could be deployed across the business.

Pivot and improve: As with any IT project, you will gain lessons from zero-click analytics that you can take away and improve on, and the possibilities are endless.

People are naturally resistant to change but given the practical, financial and cultural benefits of democratising data within the enterprise, zero-click intelligence could be the easiest way to open conversations about – and access to – data analytics across the business.

October 29, 2019  4:34 PM

Will context fuel the next AI revolution?

Brian McKenna

This is a guest blogpost by Neo4j’s Amy Hodler. Imagine the possibilities when AI can handle ambiguity, she says.

Businesses and governments are turning to Artificial Intelligence (AI) to automate and improve their decision-making and uncover multiple opportunities. The problem is that AI has been effective in powerful, but narrow, contexts, on applications where it can do one thing extremely well. But AI systems don’t readily flex to new situations at the moment, and certainly don’t offer a nuanced understanding of complexity.

An increasingly promising approach for teaching AI systems to be more intelligent is to extend their power with graph technology. Why? Because graphs help us better understand and work with complexity, as graph technology is uniquely suited to managing connections.

Lack of context equals poorer understanding

Context is the information that frames something to give it meaning. We humans deal with ambiguity by using context to figure out what’s important in a situation, then extend that learning to understanding new situations. Consider autonomous cars – teaching them how to drive in rainy conditions is difficult because there is so much variability. If the autonomous vehicle’s AI needs to see every possible combination of light and weather conditions, it’s a huge challenge to train it for all possible situations. If the AI is supplied with connected, contextual information (rain and night plus night and temperature plus temperature and bridge, etc.), however, it is possible to combine information from multiple contexts and infer the next action to take.

Graph software’s ability to uncover context is being used to make AI and ML (Machine Learning) applications more robust. This means outcomes that are far superior to results from AI systems that rely on disparate data points. That’s part of why, between 2010 and 2018, the number of AI research publications mentioning graphs rose more than threefold, from fewer than 1,000 to over 3,750.

One example where graph-enhanced AI can have a high-value impact today is fraud. According to Stratistics MRC, the global fraud detection and prevention market was valued at $17.5 billion in 2017 and is expected to grow to $120 billion by 2026. We can use graphs today to find predictive elements in data (feature engineering) that are highly indicative of fraud, and then use those features to increase our machine learning accuracy.
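As a rough, library-free sketch of what graph feature engineering for fraud can look like (not Neo4j's implementation), assume a hypothetical table of links between accounts and identifiers such as device IDs or phone numbers. Two simple graph features per account are the number of other accounts sharing an identifier with it and the size of the connected cluster of accounts it belongs to; both could then be fed into an ordinary classifier alongside non-graph features. The function name shared_identifier_features is invented for illustration.

```python
from collections import defaultdict


def shared_identifier_features(links):
    """links: iterable of (account_id, identifier) pairs, e.g. a device ID or
    phone number used by that account (hypothetical schema).
    Returns {account: (neighbour_count, component_size)}."""
    by_account = defaultdict(set)
    by_identifier = defaultdict(set)
    for account, ident in links:
        by_account[account].add(ident)
        by_identifier[ident].add(account)

    # Feature 1: the other accounts sharing at least one identifier
    neighbours = {
        a: {b for i in idents for b in by_identifier[i]} - {a}
        for a, idents in by_account.items()
    }

    # Feature 2: size of the connected cluster of accounts,
    # found with a depth-first traversal over shared identifiers
    component_size = {}
    for start in by_account:
        if start in component_size:
            continue
        seen, stack = {start}, [start]
        while stack:
            current = stack.pop()
            for nxt in neighbours[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        for member in seen:
            component_size[member] = len(seen)

    return {a: (len(neighbours[a]), component_size[a]) for a in by_account}
```

With links [("A", "dev1"), ("B", "dev1"), ("B", "phone9"), ("C", "phone9"), ("D", "dev7")], accounts A, B and C form one cluster of three, so the function returns {"A": (1, 3), "B": (2, 3), "C": (1, 3), "D": (0, 1)}. A graph database would run this kind of query at far larger scale; the dictionaries here only show the idea.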

In another area, knowledge graphs are being used to help AI systems make smarter decisions by dynamically adding rich, connected data. For example, with the eBay App on Google Assistant, a knowledge graph holds the probabilistic models that help its AI understand conversational shopping scenarios.

Responsible AI

Finally, an issue in AI is avoiding the danger that we will automate human flaws and biases, creating systems that efficiently discriminate against certain groups.

Context-supported AI could also help accountable humans better map and visualise AI decision paths. This helps reduce the ‘black box’ aspect of decision-making that can undermine confidence in why AI systems reached particular conclusions or recommendations.

It is our belief at Neo4j that context should be incorporated into AI to ensure we apply these technologies in ways that do not violate societal and economic principles. We know that context can help guide AI systems, and we’re so convinced of this that we have submitted a graph and AI proposal to NIST, the US government’s National Institute of Standards and Technology, which is creating a plan for the next wave of US government AI standards.

Could graph technology help us become the beneficiaries of a more accurate, insightful and responsible technology of the future? For more and more of us, the answer is yes.

The author is Director, Analytics and AI Program at Neo4j, a graph database company, and co-author of Graph Algorithms: Practical Examples in Apache Spark & Neo4j, published by O’Reilly Media.

October 11, 2019  2:17 PM

Why businesses should be concerned by the notion of ‘data lock-in’

Brian McKenna

This is a guest blogpost by James Fisher, SVP of data firm Qlik

A little over a year ago, we at Qlik started to outline our vision for the “third generation” of BI and analytics, which we predicted would be based around three key pillars: the democratisation of data, augmented intelligence and more universally embedded analytics. At the time, our predictions were driven by the belief that the way businesses and consumers want to interact with, own and use their data had fundamentally changed.

Fast forward to 2019 and we’re seeing this sea change in attitudes to data play out in front of our eyes as enterprises strive for better data literacy skills among their workforces – and individuals explore the boundaries of true data ownership. Yet despite the very clear shift towards data democratisation and the resulting benefits of flexibility and innovation for businesses, the so-called mega-vendor cloud platforms in the BI and analytics market seem to be running against the grain of the wider consensus.

Salesforce’s decision to buy Tableau and Google’s acquisition of Looker are plays to lock in and hoard vast tranches of data in the cloud, an implicit acknowledgement of the value of using analytics as a way to consolidate ownership of customer data. Fundamentally, though, these deals are at odds with the changing mindsets of people everywhere around data management.

History repeating

These acquisitions raise fears that history is set to repeat itself. Back in 2007, we saw a slew of high-profile acquisitions of first-generation BI tools by IBM, SAP and Oracle, which were making plays to lock as much data as possible into their on-premise offerings. Subsequently, customers voted with their feet, and a second generation of much more analytically minded BI platforms grew to meet shifting customer needs.

It’s now large cloud platforms that are making forays into this space, but just as before, this approach will likely fail to deliver the value that customers expect. For one, these deals are being driven by the desire to lock as much data as possible into their cloud platforms, and as a result they deny customers flexibility and the use of the platforms they want. This integration and exploitation of synergies will, in turn, stifle the ideas of community and innovation. Disconcertingly, data hoarding has also been proven to drive up cost, an unwelcome consequence for long-time customers of the likes of Tableau and Looker.

Single platforms also present difficulties around compliance. Customers are looking for flexibility in meeting government obligations too, and single platforms make it harder to comply with complex multi-geographical regulation. The US-implemented CLOUD Act is a case in point. US companies that are subject to the CLOUD Act must disclose data that is responsive to valid US legal process, regardless of where the company stores the data.

The bigger picture

This shift in enterprise attitudes to data is being underpinned by changing social ideas. The breakout popularity of Netflix’s The Great Hack has helped to underline how some have been willing to play fast and loose with personal data in recent times. This has created widespread mistrust around the usage of data that has extended far beyond the industry and to people everywhere.

The silver lining is that this has, in turn, helped to galvanise unprecedented levels of engagement by people with their own data. This widespread awareness is encouraging but needs to be matched with greater confidence and competency in handling data, through a better grasp of data literacy. This will allow individuals to be in control of, confident in and skilled at how they manipulate data, which consequently will help businesses tap into the $320 million to $534 million in higher enterprise value that companies with high data literacy scores command.

It’s not just about grasping what to do with data to reveal its value, but also for the individual or organisation to understand how their data is being used. The notion of having these massive tranches of data locked away by single cloud platforms is at odds with these changing attitudes to data ownership.

Ultimately, data lock-in is not good for anyone, whether businesses, individuals or society as a whole, as it will foster mistrust and reduce the opportunity we all have to do amazing things with the data that is available to us.

October 7, 2019  4:10 PM

Business, data and analytics strategies – connecting the dots or just collecting?

Brian McKenna

This is a guest blogpost by Michael Corcoran, senior vice president at Information Builders.

Speaking at our Information Builders’ Summit, IDC group vice president Dan Vesset estimated that knowledge workers spend less than 20% of their time on data analysis. The rest of their time is taken up with finding, preparing and managing data. “An organisation plagued by the lack of relevant data, technology and processes, employing 1000 knowledge workers, wastes over $5.7 million annually searching for, but not finding information,” warned Vesset.

Vesset’s comments underline the fact that data must be business-ready before it can generate value through advanced analytics, predictive analytics, IoT, or artificial intelligence (AI).

As we’ve seen from numerous enterprise case studies, co-ordination of data and analytics strategies and resources is the key to generating return on analytics investments.

Building the case for aligning data and analytics strategies

As data sources become more abundant, it’s important for organisations to develop a clear data strategy, which lays out how data will be acquired, stored, cleansed, managed, secured, used and analysed, and the business impact of each stage in the data lifecycle.

Equally, organisations need a clear analytics strategy which clarifies the desired business outcomes.

Analytics strategy often follows four clear stages: starting with descriptive analytics; moving to diagnostic analytics; advancing to predictive analytics and ultimately to prescriptive analytics.

These two strategies must be aligned because the type of analytics required by the organisation will have a direct impact on data management aspects such as storage and latency requirements. For example, operational analytics and decision support will place a different load on the infrastructure to customer portal analytics, which must be able to scale to meet sudden spikes in demand.

If operational analytics and IoT are central to your analytics strategy, then integration of new data formats and real-time streaming and integration will need to be covered in your data strategy.

Similarly, if your organisation’s analytics strategy is to deliver insights directly to customers, then data quality will be a critical factor in your data strategy.

When the analytics workload is considered, the impact on the data strategy becomes clear. While a data lake project will serve your data scientists and back office analysts, your customers and supply chain managers may be left in the dark.

Putting business outcomes first

Over the past four decades, we have seen the majority of enterprise efforts devoted to back-office analytics and data science in order to deliver data-based insights to management teams.

However, the most effective analytics strategy is to deliver insights to the people who can use them to generate the biggest business benefits.

We typically observe faster time to value where the analytics strategy focuses on delivering insights directly to operational workers to support their decision-making; or to add value to the services provided to partners and customers.

How to align data and analytics strategies

One proven approach is to look at business use cases for each stage in the analytics strategy. This might include descriptive management scorecards and dashboards; diagnostic back-office analytics and data science; operational analytics and decision support; M2M and IoT; AI; or portal analytics created to enhance the customer experience.

Identify all the goals and policies that must be included in your strategies. Create a framework to avoid gaps in data management so that the right data will be captured, harmonised and stored to allow it to be used effectively within the analytics strategy.

Look at how your organisation enables access to and integration of diverse data sources. Consider how it uses software, batch or real-time processing and data streams from all internal systems.

By looking at goals and policies, the organisation can accommodate any changes to support a strong combined data and analytics strategy.

Focus on data quality

Once you have defined your data and analytics strategies, it’s critical to address data quality. Mastering data ensures that your people can trust the analytic insights derived from it. Taking this first step will greatly simplify your organisation’s subsequent analytics initiatives.

As data is the fuel of the analytics engine, performance will depend on data refinement.

The reality for many data professionals is that they struggle to gain organisation-wide support for a data strategy. Business managers are more inclined to invest in tangibles, such as dashboards. Identifying the financial benefits of investing in a data quality programme or a master data management initiative is a challenge, unless something has previously gone wrong that has convinced the management team that valuable analytics outputs are directly tied to quality data inputs.

To gain their support for a data strategy, consider involving line-of-business managers by asking them what the overall goals and outputs are for their analytics initiatives. An understanding of the desired outputs of data will then guide the design of the data infrastructure.

Pulling together

Often we see data management, analytics and business intelligence being handled by different teams, using different approaches, within the same organisation. This can create a disconnection between what the business wants to achieve from data assets and what is possible. Data and analytics strategies need to be aligned so that there is a clear link between the way the organisation manages its data and how it gains business insights.

  • Include people from different departments who possess a cross section of skills: business, finance, marketing, customer service, IT, business intelligence, data science and statistics. Understand how these colleagues interact and what is important to them in terms of data outputs.
  • Take into account how data interconnects with your organisation’s daily business processes. This will help answer questions about the required data sources, connections, latency and inputs to your analytics strategy. Ensuring that they work together connects data to business value.
  • Finally, consider the technology components that are required. This entails looking at different platforms that deliver the required data access, data integration, data cleansing, storage and latency, to support your required business outcomes.

Measuring the benefits

The following organisations aligned their data and analytics strategies to deliver clear business outcomes:

  • Food for the Poor used high-quality data and analytics to reach its fundraising target more quickly: reducing the time taken to raise $10 million from six months to six days, so that it could more quickly help people in dire need.
  • Lipari Foods integrated IoT, logistics and geolocation data, enabling it to analyse supply chain operations so that it uses warehouse space more efficiently, allowing it to run an agile operation with a small team of people.
  • St Luke’s University Health Network mastered its data as part of its strategy to target specific households to make them aware of specialised medications, reaching 98 per cent uptake in one of its campaigns focused on thirty households. “Rather than getting mired in lengthy data integration and master data management (MDM) processes without any short-term benefits, stakeholders decided to focus on time-to-value by letting business priorities drive program deliverables,” explains Dan Foltz, program manager for the EDW and analytics implementation at St. Luke’s. “We simultaneously proceeded with data integration, data governance, and BI development to achieve our business objectives as part of a continuous flow. The business had new BI assets to meet their needs in a timely fashion, while the MDM initiative improved those assets and enabled progressively better analysis,” he adds. This approach allowed the St. Luke’s team to deliver value throughout the implementation.

These are just a few examples of organisations whose cohesive data and analytics strategies have enabled them to generate better value from diverse and complex data sets.

Gaining better value from data

While analytics initiatives often begin with one or two clear business cases, it’s important to ensure that the overall data analytics strategy is bigger than any single initiative. Organisations that focus on individual projects may find that they have overlooked key data infrastructure requirements once they try to scale.

As Grace Auh, Business Intelligence and Decision Support manager at Markham Stouffville Hospital, observed during Information Builders’ Summit, “Are you connecting the dots? Or are you just collecting them?”

Capturing data in silos to serve tactical requirements diminishes the visibility and value that it can deliver to the whole organisation. The ultimate path to creating value is to align your data and analytics strategies with each other and, most importantly, with the overall strategy and execution of your organisation.

August 7, 2019  10:18 AM

The Enterprise Data Fabric: an information architecture for our times

Brian McKenna

This is a guest blogpost by Sean Martin, CTO and co-founder, Cambridge Semantics.

The post-big data landscape has been shaped by two emergent, intrinsically related forces: the predominance of cognitive computing and the unveiling of the data fabric architecture. The latter is an overlay atop the existing assortment of distributed computing technologies, tools and approaches, enabling them to interact for specific use cases across the enterprise.

Gartner describes the data fabric architecture as the means of supporting “frictionless access and sharing of data in a distributed network environment.” The data fabric architecture joins these decentralized data assets (and their respective management systems).

Although this architecture involves any number of competing vendors, graph technology and semantic standards play a pivotal role in its implementation. Together, they give data business meaning and flexibly integrate data sources of any structural type, delivering rapid data discovery and integration across distributed computing resources.

This combination provides the means of understanding and assembling heterogeneous data across the fabric, which is what makes the architecture work.


The primary driver behind the need for the data fabric architecture is that traditional data management options have reached their limits. Hadoop-inspired data lakes can co-locate disparate data successfully, but struggle to actually find and integrate datasets. The more data that disappears into them, the harder it becomes for organizations to govern them or derive value from them. These options can sometimes excel at cheaply processing vast, simple datasets, but have limited utility when operating over complex data involving many entities, which restricts them to only the simplest integrations.

Data warehouses can offer excellent integration performance for structured data, but were designed for the slower pace of the pre-big data era. They’re too inflexible and difficult to change in the face of the sophisticated and ever-increasing demands of today’s data integrations, and are poorly suited to tying together the unstructured (textual and visual) data inundating enterprises today. Cognitive computing applications like machine learning require far more data and many more intricate transformations, necessitating modern integration methods.

Semantic Graphs

The foremost benefit that semantic graphs bring to the data fabric architecture is seamless data integration. This approach blends together not only various datasets, data types and structures, but also the outputs of entirely distinct toolsets and their supporting technologies. By placing a semantic graph integration layer atop this architecture, organizations can readily reconcile the most fundamental differences at the data and tool levels of these underlying data technologies. Whether organizations choose to use different options for data virtualization, storage tiering, ETL, data quality and more, semantic graph technology can readily integrate this data for any use.

The data blending and data discovery advantages of semantic graphs stem from their ability to define, standardize and harmonize the meaning of all incoming data. Moreover, they do so in terms that are comprehensible to business end users, fostering an intuitive understanding of relationships between data elements. The result is a rich, contextualized understanding of data’s interrelations for informed data discovery, culminating in timely data integrations for cutting-edge applications or analytics like machine learning.
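To make the idea of harmonizing the meaning of incoming data more concrete, here is a deliberately small Python sketch. It is not RDF, OWL or any vendor’s semantic layer. A plain dictionary stands in for the shared vocabulary, and the source names (`crm`, `billing`), field names and vocabulary terms are all hypothetical. Each source’s fields are mapped onto business-friendly terms and turned into subject-predicate-object triples. The triples are then indexed by subject, so facts from different systems land on the same node.

```python
from collections import defaultdict

# Hypothetical shared vocabulary: each source-specific field name maps to one
# business-friendly term. In a real semantic layer an ontology plays this role.
VOCABULARY = {
    "crm": {"cust_id": "customer", "full_name": "name", "region_code": "region"},
    "billing": {"customerNumber": "customer", "amountDue": "balance"},
}

def to_triples(source, records):
    """Turn flat records into (subject, predicate, object) triples using shared terms."""
    mapping = VOCABULARY[source]
    # Find which field in this source identifies the customer.
    key_field = next(f for f, term in mapping.items() if term == "customer")
    triples = []
    for record in records:
        subject = f"customer/{record[key_field]}"
        for field_name, value in record.items():
            term = mapping.get(field_name)
            if term is not None and term != "customer":
                triples.append((subject, term, value))
    return triples

def build_graph(triples):
    """Index triples by subject so facts from every source sit on the same node."""
    graph = defaultdict(dict)
    for subject, predicate, obj in triples:
        graph[subject][predicate] = obj
    return graph

crm_rows = [{"cust_id": 42, "full_name": "Acme Ltd", "region_code": "UK"}]
billing_rows = [{"customerNumber": 42, "amountDue": 1200.0}]

graph = build_graph(to_triples("crm", crm_rows) + to_triples("billing", billing_rows))
print(graph["customer/42"])  # {'name': 'Acme Ltd', 'region': 'UK', 'balance': 1200.0}
```

In a production semantic graph, the vocabulary would be an ontology and the triples would live in a graph store. The point here is only that the shared mapping, not each source’s schema, determines how the data is joined.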

With the Times

Although the data fabric architecture includes a multiplicity of approaches and technologies, semantic graph technology can integrate them (and their data) for nuanced data discovery and timely data blending. This approach is adaptable to modern data management demands and makes the data fabric architecture the most suitable choice for today’s decentralized computing realities.

The knowledge graph produced as a by-product of these integrations can be quickly spun up in containers and deployed in whichever cloud or hybrid cloud setting best serves relevant factors such as compute functionality, regulatory compliance or pricing. With modern pay-on-demand cloud delivery mechanisms, in which APIs and Kubernetes software enable users to automatically position their compute where needed, the data fabric architecture is becoming the most financially feasible choice for the distributed demands of the modern data ecosystem.

July 24, 2019  2:39 PM

Why empathy is key for Data Science initiatives

Brian McKenna

This is a guest blogpost by Kasia Kulma, a senior data scientist at Mango Solutions.

When we think of empathy in a career, we perhaps think of a nurse with a good bedside manner, or a particularly astute manager or HR professional. Data science is probably one of the last disciplines where empathy would seem important. However, this misconception frequently leads to the failure of data science projects: a solution that technically works but doesn’t consider the problem from the business’s point of view. After all, empathy isn’t just about compassion or sympathy; it’s the ability to see a situation from someone else’s frame of reference.

To examine the role of empathy in data science, let’s take a step back and think about the goal of data science in general. At its core, data science in the enterprise context is aimed at empowering the business to make better, evidence-based decisions. Success with a data science project isn’t just about finding a solution that works; it’s about finding one that meets the following criteria:

  • The project is completed on time, on budget, and with the features it originally set out to create
  • The project meets business goals in an implementable and measurable way
  • The project is used frequently by its intended audience, with the right support and information available

None of these outcomes can be achieved by a technical solution in isolation; instead, they require data scientists to approach the problem empathetically. Why? Because successful data science outcomes rely on actually understanding the business problem being solved, and on strong collaboration between the technical and business teams to ensure everyone is on the same page. Both are essential, and both are key to getting senior stakeholder buy-in.

In short, empathy factors in at every stage of the process, helping to create an idea of what success looks like and of the business context behind it. Without this, a data scientist will not be able to understand the data in context, including technical aspects such as what defines an outlier and how it should be treated in data cleaning. Business stakeholders, even with less technical understanding, will have far better insight into why data may look “wrong” than a data scientist working alone could ever guess. Finally, empathy helps build trust – critically in getting the support of stakeholders early in the process, but also in the deployment and evaluation stages.
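The following Python sketch illustrates why business context matters when deciding what counts as an outlier. It uses Tukey’s interquartile-range rule to flag unusual days, then splits the flagged days into those business stakeholders can explain and those they cannot. The rule, the function names and the sales figures are all assumptions made for this example, not part of the author’s method.

```python
import statistics

def iqr_fences(values, k=1.5):
    """Return (low, high) fences using Tukey's interquartile-range rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread

def review_outliers(daily_sales, known_events):
    """Split statistical outliers into ones the business can explain and ones it cannot.

    daily_sales: dict mapping a date string to that day's sales figure.
    known_events: dict mapping a date string to an explanation from business stakeholders.
    """
    low, high = iqr_fences(list(daily_sales.values()))
    explained, unexplained = {}, []
    for day, value in daily_sales.items():
        if low <= value <= high:
            continue  # inside the fences: not an outlier
        if day in known_events:
            # Looks "wrong" statistically, but the business knows why: keep it as real data.
            explained[day] = known_events[day]
        else:
            # No business explanation yet: a cleaning candidate, to be checked with stakeholders.
            unexplained.append(day)
    return explained, unexplained

# Invented figures for illustration only.
daily_sales = {
    "2019-11-25": 100, "2019-11-26": 104, "2019-11-27": 98, "2019-11-28": 102,
    "2019-11-29": 400, "2019-11-30": 101, "2019-12-01": 5, "2019-12-02": 99,
}
known_events = {"2019-11-29": "Black Friday promotion"}

explained, unexplained = review_outliers(daily_sales, known_events)
print(explained)    # {'2019-11-29': 'Black Friday promotion'}
print(unexplained)  # ['2019-12-01']
```

A purely statistical pipeline would treat both flagged days the same way. The stakeholder-supplied explanation is what stops the promotion spike from being cleaned away as an error.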

Given the benefits, empathy is key in data science. To develop this skill, there are some simple techniques to drive more empathetic communication and successful outcomes. The three key questions that data scientists should be looking to answer are: “What do we want to achieve?” “How are we going to achieve it?” and “How can we make sure we deliver?”

What do we want to achieve?

For the first point, one approach is to apply agile development methodology, writing user stories for the different users of a potential solution and iterating to find the core problem – or problems – we want to solve. For each stakeholder, the data science function needs to consider what type of user they represent, what their goals are and why they want this, so that the data scientists understand the context in which the solution needs to work. By ensuring that a solution addresses each of these users’ “stories”, data scientists are working empathetically to recognise the business context in their approach.

How are we going to achieve it?

Then it’s a case of how to go about achieving a successful outcome. One helpful way to think about it is to imagine that we are writing a function in our code: given our desired output, what are the necessary inputs? What operation does our function need to perform in order to turn one into the other? The “function” approach applies not only to data, but also to the process of creating a solution. Data scientists should be looking at an input of “the things I need for a successful solution”, a function for “how to do it”, and then an output of the desired goal. For example, if the goal is to build a successful churn model, we need to consider high-level inputs such as sign-off from relevant stakeholders, available resources and even budget agreements that might constrain the project. Then, in the function stage, it may be time to discuss the budget and scope with senior figures, work out whether additional resources need to be hired, and handle any other items needed to drive the right output at the end. This can then be broken down into more detailed input-function-output processes. For example, working out whether additional resources need to be hired can itself become an output, with its own set of relevant inputs and actions driving the solution.
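The input-function-output framing can be sketched literally in Python. In this hypothetical sketch, a `Step` holds an output, the inputs it needs and the actions that turn one into the other. `plan` then reports either the actions to run or the inputs that are still missing. The step names loosely follow the churn model example, and the structure illustrates the framing rather than a prescribed tool.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One input-function-output unit of planning (a hypothetical structure, not a library type)."""
    output: str                                        # the desired result of this step
    inputs: list[str]                                  # what must be in place before the step can run
    actions: list[str] = field(default_factory=list)   # the "function": how inputs become the output

def plan(step, available):
    """Return the actions if every input is in place, otherwise list the missing inputs.

    Each missing input can be broken down into its own Step with its own inputs and actions.
    """
    missing = [item for item in step.inputs if item not in available]
    if missing:
        return {"status": "blocked", "missing": missing}
    return {"status": "ready", "actions": step.actions}

churn_model = Step(
    output="successful churn model",
    inputs=["stakeholder sign-off", "available resources", "budget agreement"],
    actions=[
        "discuss budget and scope with senior figures",
        "work out whether additional resources need to be hired",
        "build and evaluate the model",
    ],
)

# Decomposition: part of the "function" becomes a step with its own (hypothetical) inputs.
hiring_decision = Step(
    output="decision on additional hires",
    inputs=["current team capacity", "agreed project scope"],
    actions=["compare the skills the project needs with the skills the team has"],
)

print(plan(churn_model, available={"stakeholder sign-off", "budget agreement"}))
# {'status': 'blocked', 'missing': ['available resources']}
print(plan(hiring_decision, available={"current team capacity", "agreed project scope"}))
# {'status': 'ready', 'actions': ['compare the skills the project needs with the skills the team has']}
```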

How can we make sure we deliver?

Finally, there are questions that need to be asked in every data science project, no matter what the scope or objective. To ensure that none of them are omitted, stakeholders should draw up a checklist, a technique that has been used successfully in aviation and surgery to reduce failure. For example, preparing to build a solution that suits the target environment shouldn’t be a final consideration, but a foundational part of planning any data science project. A good checklist for data scientists to consider in the planning stage could include:

  • Why is this solution important?
  • How would you use it?
  • What does your current solution look like?
  • What other solutions have you tried?
  • Who are the end-users?
  • Who else would benefit from this solution?

Only with this input can data scientists build a deployable model or data tool that will actually work in context, designed for its eventual users rather than for use purely in a theoretical context.

Empathy may seem an unusual skill for a data scientist, but embracing it fits a wider need for a culture of data science within organisations, one that links business and data science teams rather than keeping them in silos. By encouraging dialogue and ensuring all data science projects are undertaken with the stakeholders in mind, data scientists have the best chance of building the most effective solutions for their businesses.

June 4, 2019  2:28 PM

Why the real value of AI in business is in automating backend tasks

Brian McKenna

For all the hype around artificial intelligence (AI), and the excitement around some of its potential – personal assistants that develop a personality, robot-assisted microsurgery, etc. – it is arguably adding the most value to businesses in less glamorous ways, says Nuxeo’s Dave Jones, in a guest blogpost.

Backend tasks in a business are few people’s favourite. They are hugely time-consuming and rarely rewarding, but vitally important. Automating these tasks is an area where AI has the potential to add considerable value for businesses.

AI and information management

Information management is one area where AI can help in many ways. It allows organisations to streamline how they manage information, reduce storage, increase security, and deliver faster and more effective searches for content and information.

Many companies are struggling with the volume of information in modern business, and their users find it difficult to locate important information that resides in multiple customer systems and transaction repositories. The key to solving this problem is having accurate metadata about each content and data asset. This makes it easy to find information quickly, and also provides context and intelligence to support key business processes and decisions.

Metadata enrichment is one area where AI really excels. Before AI, populating and changing metadata was a laborious task, not made any easier by the fixed metadata schemas employed by many content management systems. However, metadata schemas in an AI-infused Content Services Platform (CSP) are flexible and extensible. Much more metadata is being stored and used than ever before, so the ability to use AI to process large volumes of content and create numerous meaningful metadata tags is a potential game-changer.

Unlocking the content in legacy systems

Another powerful way in which AI can address backend tasks is by connecting to content from multiple systems, whether on-premise or in the cloud. The content itself is left in place, but the AI-infused CSP still provides access to that content and data.

It also provides the ability for legacy content to make use of a modern metadata schema from the CSP – effectively enriching legacy content with metadata properties without making any changes to the legacy system at all. This is a compelling proposition in itself, but when combined with the automation of AI, even more so.

By using a CSP to pass content through an AI enrichment engine, each and every one of the files currently stored can potentially be enriched with additional metadata attributes. This injects more context, intelligence and insight into an information management ecosystem.

Using an AI-driven engine to classify content stored within legacy systems makes this much easier. Even simple AI tools can tell the difference between a contract and a resume, but advanced engines extend this principle by building AI models based on content specific to an organisation. These deliver much more detailed classifications than generic classification ever could.
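The contract-versus-resume distinction can be illustrated with a small multinomial naive Bayes classifier in Python. Naive Bayes is a simple, well-known text classification technique, used here as a stand-in for an AI enrichment engine. It is an assumption for the sake of the example, not how Nuxeo or any particular CSP works, and the training texts and file names are invented. The predicted label goes into a separate metadata record, which mirrors the idea of enriching legacy content without changing the legacy system.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case words with surrounding punctuation removed."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w]

class NaiveBayes:
    """Minimal multinomial naive Bayes classifier with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter(labels)        # label -> number of training documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[word] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training examples; an organisation-specific engine would learn from its own content.
training = [
    ("This agreement is made between the supplier and the client. The parties agree to the terms and payment schedule.", "contract"),
    ("The contractor shall indemnify the client. This agreement terminates upon breach of its terms.", "contract"),
    ("Experienced data analyst with skills in SQL and Python. Education: BSc Mathematics.", "resume"),
    ("Career history: project manager at a retail firm. Skills include budgeting and team leadership. References available.", "resume"),
]
model = NaiveBayes().fit([t for t, _ in training], [label for _, label in training])

# Enrichment: each legacy file gains a metadata property; the file itself is untouched.
legacy_files = {
    "scan_0001.txt": "Both parties agree that this agreement may be terminated with notice.",
    "scan_0002.txt": "Data analyst skills: SQL, Python and team leadership.",
}
metadata = {name: {"document_type": model.predict(text)} for name, text in legacy_files.items()}
print(metadata)
# {'scan_0001.txt': {'document_type': 'contract'}, 'scan_0002.txt': {'document_type': 'resume'}}
```

Training on an organisation’s own documents would mean changing only the `training` list. More detailed classifications would mean adding more labels.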

Backend AI in action

A manufacturing firm I met with recently has been automating the classification and management of its CAD drawings. There is a misconception that AI needs to be super intelligent to add real value. But in this example, the value of AI lies not in the intelligence required to identify what qualifies as a particular kind of design drawing, but in being ‘smart enough’ to recognise the documents that definitely ‘aren’t’ the right type: essentially, to sift out the rubbish and allow people to focus on the relevant information much faster.

Information management and associated backend tasks may not be the most glamorous AI use cases, but if done well, they can provide significant value to businesses all over the world.

Dave Jones is on the Board of Directors at the Association for Intelligent Information Management (AIIM) and is also Director of Product Marketing, Nuxeo.
