Data Matters


August 4, 2017  10:08 AM

Blockchain beyond the realm of financial services

Brian McKenna

This is a guest blog post, in which Jody Cleworth, CEO of MTI [Marine Transport International], explains why blockchain is, in his view, good for businesses in the real economy, including the shipping industry.

Blockchain is best known as the technology behind the bitcoin cryptocurrency. Although this is its most common association, it is important for businesses to recognise that blockchain is a standalone technology, with its value reaching beyond the realm of cryptocurrencies and financial services.

Blockchain is unique in its ability to provide a secure, yet transparent, platform that enables permissioned access to private transactions and digital records. Even more impressively, it can do this in real time and can be used in just about any industry. To name but a few, shipping, manufacturing, and real estate are beginning to recognise and exploit the promise of blockchain: that of absolute trust and accessibility.

Simply put, blockchain is a way of recording data: that is, anything that needs to be individually recorded and certified as having happened. It is a type of digital ledger that can be used to record a wide range of different processes. It is of most value when it comes to recording the steps in a process that involve a wide range of parties, where responsibility is handed off at each point. If a supply chain is blockchain enabled, absolute certainty can be created at each step of the process. Placing a digital ‘block’ on the ledger indicates that a process has taken place, and all parties have the ability to view this. It is similar to using a Google document, which is shareable and visible to all those inputting data.
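To make the ledger idea concrete, here is a minimal, hypothetical sketch in Python – not any real blockchain platform, just an illustration of the principle that each entry commits to the hash of the one before it, so no party can quietly rewrite history:

```python
import hashlib
import json
import time


def block_hash(block: dict) -> str:
    """Hash the block's contents: timestamp, event and the link to the previous block."""
    payload = json.dumps(
        {k: block[k] for k in ("timestamp", "event", "prev_hash")}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def make_block(prev_block: dict | None, event: dict) -> dict:
    """Append-only ledger entry; each entry commits to the hash of the one before it."""
    return {
        "timestamp": time.time(),
        "event": event,  # e.g. {"step": "container loaded", "party": "port operator"}
        "prev_hash": block_hash(prev_block) if prev_block else "0" * 64,
    }


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash link; any edit to an earlier entry breaks the chain."""
    return all(
        later["prev_hash"] == block_hash(earlier)
        for earlier, later in zip(chain, chain[1:])
    )


booking = make_block(None, {"step": "booking confirmed", "party": "shipper"})
loading = make_block(booking, {"step": "container loaded", "party": "port operator"})
print(verify_chain([booking, loading]))  # True; tamper with booking["event"] and it becomes False
```

Real platforms add consensus, permissions and replication on top, but the hand-off record that every party can independently verify is the core idea.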

In the business world, blockchain technology is becoming increasingly attractive for tracking the movement of items through supply chains which link a variety of organisations. A container logistics company, for example, is obligated to interact with stakeholders such as shipping lines and port authorities, and so there are many points at which accountability could become an issue. In order to ensure that containers are shipped and received in a good state, a safeguard is needed. Blockchain ensures that each party within the supply chain takes responsibility for their own dataset, but also shares their data with all other parties.

Early proofs of concept in the shipping industry have shown that greater efficiency is achieved by supporting supply chains with blockchain technology. Shipping companies, suppliers and distributors have recognised that ‘smart contracts’ can decrease costs and increase profitability by capitalising on the links between supply chain actors. The use of blockchain in shipping serves as a promising example for all sectors that engage in a series of interrelated processes, as blockchain increases collaboration between parties, heightens visibility and reduces friction.

In addition, the distributed nature of a blockchain database makes it harder for hackers to attack. Central servers that store data are easy targets for cyberattacks, but the blockchain model does not rely on them. Instead, the data is copied identically across each ‘node’ in the network, meaning that a successful attack on one computer does not result in business devastation. Ultimately, in order to hack a blockchain database, simultaneous access to every copy of the database would be needed.

Blockchain has the ability to empower a multitude of industries to better adapt to the digital economy. Embracing this technology allows executives to assert greater control over their transactions, whilst protecting the privacy of terms and conditions between parties. Essentially, blockchain creates a secure business environment, where reliable transactions become a reality without the need for a centralised authority.

August 3, 2017  1:49 PM

How to get the best business value out of data scientists

Brian McKenna

This is a guest blog post by Jane Zavalishina, CEO, Yandex Data Factory

Well-established enterprises like retailers or manufacturing companies now have an abundance of data at their disposal. Unfortunately, merely possessing vast amounts of raw data does not lead directly to increased efficiency or the rapid development of new revenue streams. Instead, everyone must now figure out exactly how to make this data work for them.

Following in the footsteps of the internet giants – Google, Facebook and others – established enterprises are eager to invest in advanced analytics solutions to capitalise on the opportunities this data presents. An increasing number of businesses are deciding to bring machine learning in-house, introducing new departments and resources to accommodate it. Others are choosing to collaborate with external teams. Whichever approach is chosen, each brings its own distinct set of challenges.

The main challenge is revealed in the name of the discipline itself: “data science”. To succeed, enterprises need to merge two very different worlds – an economics-driven business and a scientific, data-driven department. While the cultural and organisational clashes are hard to avoid, they are rarely foreseen.

Here are a few things to keep in mind:

The business must set the goals, and that may not be easy

Decision-making in businesses is far from data-driven –  with authority, persuasion and vision playing a significant role. Science, however, is based purely on evidence and experiments. Synthesizing these two approaches is the primary challenge when you start to work with data science.

Businesses will have to learn to formulate tasks in terms of what they want to predict or recommend, rather than in terms of “understanding” or “insights”. Although they are used to explainable measurements and debatable arguments, they will have to learn to work with uninterpretable results by managing them through correctly defined metrics. Translating the business problem into a mathematical statement with precisely defined constraints, and setting a goal in a way that actually measures its influence on the business, is an art in itself.

For example, if the goal is to improve the efficiency of marketing offers, it would be incorrect to task a data scientist with investigating the top ten reasons for refusals or delivering an innovative way to segment the audience. Instead, they should be tasked with building a “recommender” system that will optimise a meaningful sales metric: the margin of the individual order, the number of repeating purchases, or increase in sales of a specific product group. This must be identified beforehand based on the business’ strategic priorities.
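As a purely illustrative sketch of what “optimise a meaningful sales metric” can look like in code (the model, offers and margin figures below are invented, not any particular product):

```python
from typing import Callable

# Hypothetical setup: an "offer" is a dict, and a model is any callable that
# predicts the expected order margin if a given offer is shown to a customer.
PredictMargin = Callable[[dict, dict], float]


def recommend_offer(customer: dict, offers: list[dict], model: PredictMargin) -> dict:
    """Pick the offer the model expects to maximise order margin for this customer."""
    return max(offers, key=lambda offer: model(customer, offer))


def margin_per_order(orders: list[dict]) -> float:
    """The agreed business metric the recommender will be judged on."""
    return sum(order["margin"] for order in orders) / len(orders)


# Toy stand-in for a trained model: prefer offers on product groups the
# customer already buys (a real model would be learned from historical data).
def toy_model(customer: dict, offer: dict) -> float:
    affinity = 1.5 if offer["product_group"] in customer["purchased_groups"] else 1.0
    return offer["expected_margin"] * affinity


customer = {"id": 42, "purchased_groups": {"coffee"}}
offers = [
    {"name": "10% off tea", "product_group": "tea", "expected_margin": 4.0},
    {"name": "coffee bundle", "product_group": "coffee", "expected_margin": 3.5},
]
print(recommend_offer(customer, offers, toy_model)["name"])  # "coffee bundle"
```

The point is that the metric (here, margin per order) is fixed before the data scientist starts, so the system can be judged on it rather than on how interpretable its inner workings are.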

The whole business will be affected, not just one department

When it comes to data science, it is wrong to think that a new means to solve a set of given tasks will be handled by one single department. The introduction of highly precise “black box” models will eventually affect company culture, the organisational structure and approaches to management – all of which must be taken into account to succeed.

One aspect of this is data access. Businesses should work on ways to make it easy to share data, which is too often siloed within individual departments.

Another is the ability to experiment. The only way to estimate the success of a machine learning model is by putting it into practice and measuring the effect against the existing approach, isolating all other factors. However, running such A/B tests in an established business is not always straightforward. For example, retailers aren’t going to just stop promotional activities in a few stores to create a baseline group for comparison. Preparing, organising and strictly following such procedures to measure the effect of data science applications on a business is part of the job, and so will need to be integrated into the company’s DNA.
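A rough sketch of the comparison involved, with invented numbers and a simple two-sample test standing in for whatever methodology a business would actually agree on:

```python
from math import sqrt
from statistics import mean

# Invented per-store weekly sales uplift (%) under the existing promotions
# (control) and under the machine-learning-driven promotions (treatment).
control = [2.1, 1.8, 2.4, 1.9, 2.2, 2.0, 1.7, 2.3]
treatment = [2.6, 2.9, 2.4, 3.1, 2.7, 2.8, 2.5, 3.0]


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic: the difference in means scaled by its standard error."""
    var_a = sum((x - mean(a)) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean(b)) ** 2 for x in b) / (len(b) - 1)
    return (mean(b) - mean(a)) / sqrt(var_a / len(a) + var_b / len(b))


print(f"average uplift: control {mean(control):.2f}%, treatment {mean(treatment):.2f}%")
print(f"Welch t statistic: {welch_t(control, treatment):.1f}")  # a large value suggests the difference is not noise
```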

Last, but not least: managing blame. For decision automation to succeed, top management must support the initiative. However, once the models are put to work, it is no longer possible to place responsibility on any individual person – say, the one who used to sign off the demand forecast. Instead, roles should be redefined, and new ways of assigning responsibility and controlling the results introduced.

Every data science project is a small research project

Leading an in-house data science team also requires a different approach to project management. Data science differs from software development, or other activities where the project can be broken down into pieces, and the progress easily monitored. Building machine learning models involves a trial-and-error approach, and it is impossible to track whether your model is, say, 60% or 70% done.

Businesses, on the other hand, are used to the following process: planning ahead, tracking progress and looking at tangible intermediate results. With data science, it is no longer viable to plan a whole project and expect smooth movement towards the end goal. Project managers should instead carefully plan quick iterations, keeping in mind that failure is always a possible outcome. Adjusting to a no-blame culture should also be part of the job.

As always with science, you cannot expect to make a discovery according to a plan. But managing a lab efficiently will deliver a product of predictable quality, and not just exploratory research. To succeed businesses will need to understand how to make this work for them, rather than transferring old project management guidelines to a new team.

You will have to get comfortable with a lack of understanding

Data science is a complex discipline, involving major chunks of statistics, probability theory, and, in essence, years of rigorous studies. Sadly, managers have very little chance of acquiring this knowledge quickly.  

When acting as team leads, or as internal clients for data science, managers will need to get comfortable with this lack of technical knowledge. Instead, they should measure success through results, according to defined experimental procedures put in place to check the quality of the models.

 You cannot do everything in-house

Build or buy is a question often faced by organisations. Many businesses with stronger IT teams are tempted to run their own data science department, sometimes without fully comprehending the challenge and its implications.

If data science is very close to the core business, that is probably the right choice – it would be hard to imagine Amazon outsourcing its recommendation engine. But when you are an offline retailer, a factory or a bank, having data science in-house often amounts to building a separate R&D business. This presents new business challenges, such as competing with the internet giants when hiring rare talent. What’s more, many soon discover that either the financial resources are unavailable, or that it is strategically unwise to add a new data science line of business.

At the other extreme, full outsourcing means lacking in-house expertise on a matter of major importance to modern businesses. The answer lies in combination: businesses should strive to build teams that can tackle the challenges that are close to the core business, or that require deep domain knowledge, while competently outsourcing the rest.

Developing this “data science purchasing” capability is of utmost importance to stay on top of the game. You should be able to formulate the tasks and establish business cases, to easily test and compare the quality of external algorithms – perhaps even running several different ones at a time – and to switch to the best one without any disruption.
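One hedged sketch of that “purchasing” capability: wrap each external algorithm behind the same interface, score every candidate on the same held-out data and the same business metric, and switch to whichever currently wins. The vendor names, metric and data are assumptions for illustration only.

```python
from typing import Callable, Dict, List, Tuple

# Each external provider is wrapped behind the same signature: a callable that
# takes a customer record and returns a predicted order margin.
Algorithm = Callable[[dict], float]


def score(algorithm: Algorithm, holdout: List[Tuple[dict, float]]) -> float:
    """Mean absolute error against the agreed business metric on held-out orders."""
    return sum(abs(algorithm(c) - actual) for c, actual in holdout) / len(holdout)


def pick_champion(candidates: Dict[str, Algorithm],
                  holdout: List[Tuple[dict, float]]) -> str:
    """Run every vendor on the same data and keep the one with the lowest error."""
    return min(candidates, key=lambda name: score(candidates[name], holdout))


holdout = [({"basket_size": 3}, 12.0), ({"basket_size": 1}, 4.5), ({"basket_size": 5}, 19.0)]
candidates = {
    "vendor_a": lambda c: 4.0 * c["basket_size"],        # stand-ins for external services
    "vendor_b": lambda c: 3.5 * c["basket_size"] + 1.0,
}
print(pick_champion(candidates, holdout))  # "vendor_b" on this toy data
```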

Summing up

The role of a data scientist within a business is to access data and “extract value” from it, based on the objectives they have been given. The truth is, data scientists are not magicians. They must be given the right metrics, large amounts of data and the time to experiment in order to deliver the required outcomes, with failure still remaining a possible option.

This scientific approach is unusual in business, and understanding how to work with data scientists as a business leader is key to success.


August 1, 2017  3:49 PM

Advancements in automation: why AI won’t replace the human touch

Brian McKenna

This is a guest blog post by Larry Augustin, CEO, Sugar CRM

The way we engage with technology is changing constantly and hardly a day goes by without us hearing about Artificial Intelligence (AI) and the ‘rise of the machines’. While AI-related technologies are still in their infancy, it’s a fact that workplace automation is already here across all sectors. From farming to fashion, businesses are using automation to reduce costs and pick up the pace of production. A report from PwC earlier this year claims that in the next 15 years, 30% of jobs in the UK could be affected by automation, with 46.4% of manufacturing jobs and 56.4% of storage jobs being automated by the early 2030s. Despite these high figures, PwC argues that for the most part, automation will enhance productivity, cut costs and add real value to sectors that depend on intelligent human interaction.

AI requires machine learning. And, to learn, machines need data – lots of it. But humans are also becoming better at their jobs because of the wealth of data out there, which gives us a universe of information whenever and wherever we need it. We need only turn to the likes of Google to see how data-driven applications have reinvented the way we exist, and all this freely available information has improved the way businesses operate. But getting the right data fast can be a cumbersome process – after all, not all data is good data. The good news is that help is at hand: technology is advancing to automate the lengthiest and most mundane data entry tasks.

The evolution of CRM

Technology has always progressed at a dizzying rate. In the 1960s, Gordon Moore made the observation now known as Moore’s Law, predicting that the number of transistors on a chip – and with it computing power – would double roughly every two years. Although technology has in many respects surpassed Moore’s predictions, the basic premise remains the same: technology, by its very nature, will continually advance.

Customer Relationship Management (CRM) systems have not followed Moore’s prediction. For a long time, legacy CRM was stuck as an online record-keeping system that was good for generating reports and telling you how effective you were last quarter, but did little to help you improve in the future. Finally, with more data, analytics and automation, the evolution of CRM has enabled businesses to streamline data to get optimum results. It has evolved from a somewhat cumbersome platform into a productivity enabler, providing relationship intelligence and bringing in data from outside sources – something we have been proud to bring to the market in our own recent launch of Sugar Hint. Ultimately, it’s about automating mundane tasks, allowing humans to focus on the tasks that, at present, only humans can do.

The importance of human interactions

Despite the rapid growth of technology, a report by Gartner outlines that 89% of companies now believe that the customer experience they can offer is the most important benchmark by which they are judged. It’s a figure that encapsulates the vital role that humans still have in the workplace. Although data can be effectively sorted by a machine in under a millisecond, humans are the ones who implement technology and give it meaning.

In the context of AI and chatbots managing customer interactions, there are benefits in using computing power to handle high-volume tasks – in e-commerce, for example. But I’d argue that in high-value situations – car sales or investment banking, for example – people still want to deal with people. Where the technology comes in is in supporting humans: giving them the information they need, intelligently, at the time they need it. It complements the job, rather than replacing it.

What will become of humans?

The rise of automation will reinvent traditional business practices. We don’t need to sit and cower in fear of a global AI takeover, we need to understand how automation will enable us as humans to do more of what we do best: being human. Automating menial tasks such as data sorting and emailing will enable businesses to re-work the goalposts of traditional jobs. Technology will be the enabler that allows employees to focus on their own skillsets. A report by Accenture maintains that 80% of businesses believe that technology will enable a more fluid workforce. It would seem that if we get automation to do the groundwork, employees, employers and customers will all see positive benefits.

History is littered with prophecies that automation will make humans redundant. We need only look back to the First Industrial Revolution, when textile Luddites feared that machines would take their jobs and steal their livelihoods. With hindsight, we can see these advancements only enhanced workers’ lives and increased productivity. As we now enter the Fourth Industrial Revolution, this fear has once again come into play. Although there will be a marked increase in automation, technology will aid businesses and employees – making for a better employee and a more successful business.


July 12, 2017  10:30 AM

The Role of the CDO: gatekeeper or innovator?

Brian McKenna

This is a guest blogpost by Yasmeen Ahmad, director of Think Big Analytics, Teradata.

In recent years, the hype of big data has fuelled board and executive level awareness of the value to be gained from data driven opportunities, a big step from when data did not even appear on the corporate agenda. This awareness has unavoidably resulted in increased scrutiny of data quality, accuracy, transparency and privacy, as well as necessitating further compliance and regulatory reporting. Thus, we see the rise of the Chief Data Officer (CDO).

But does anybody know exactly what a CDO does? It’s a frequently asked question, and one that continues to be shaped as companies become increasingly data driven.

When it comes to defining the role of the Chief Data Officer, Usama Fayyad, the world’s first CDO (at Yahoo), stated: “it’s not just about internal decisions from data; it’s what can we provide the customers in terms of data.”

Fayyad initially suggested the CDO title light heartedly to ex-Yahoo CEO Jerry Yang, but the definition and creation of this role was crucial to Yahoo and now many other companies around the globe. As Fayyad indicated, this role was essential to push organisations beyond simply capturing and storing data, to driving significant business value from data through analytics.

In businesses today, the level of investment in data technologies and platforms makes it clear that data is widely recognised as an asset and a means of achieving competitive advantage. The CDO role is essential to winning a return on this investment by driving business outcomes from big data.

The world’s most successful CDOs use data and analytics to fuel business value. By proliferating analytics throughout an organisation, a CDO can use insight gained from data to develop strategic advantage as well as preserve a competitive edge.

This means striking a balance between the roles and responsibilities of gatekeeper and innovator. As gatekeeper, the CDO is focused on the important tasks of managing to a data strategy, fine-tuning and implementing a data governance process, and making sure regulatory compliance is adhered to. Security is also a top concern.

Tom Davenport, professor in IT and management at Babson College puts it like this: “Defence is a tricky area to inhabit as CDO, because if you succeed and prevent breaches and privacy problems and security issues, no one necessarily gives you any credit for it or even knows if your work was successful. And if you fail, it is obviously very visible and bad for your career.” CDOs must supplement defence with offence – productionising analytic processes, adding insights to enable business actions and creating digitalised data products.

To enable innovation and company transformation, the onus will be on the CDO to define and execute an organisational data vision, and map out a future that drives business value. To do so, the CDO strategy will need to:

  • Direct data into the hands of business analysts as quickly as possible: in seconds not minutes, hours not days, and days not weeks or months.
  • Push iterative learning –test-and-learn; fail fast, learn faster – “quick wins” that build organisational credibility, alignment, and momentum, as well as demonstrate true business value.

Ultimately, CDOs must be defensive about data and comply with all of the regulations, governance and security requirements – however, compliance alone delivers no value to the business. It is in fact the creation of data-driven insights for the company, and the monetisation of data and analytics, that creates value and a real competitive edge.


July 11, 2017  3:35 PM

GDPR and data portability – how do we solve a problem like Maria?

Brian McKenna

This is a guest blogpost by Michael Corcoran, Chief Marketing Officer at Information Builders

In less than a year the EU General Data Protection Regulation (GDPR) will come into force, updating and superseding the UK’s Data Protection Act 1998, which currently governs the protection of personal data in the UK.

From 25 May 2018, UK organisations will be bound by the new regulation until Britain leaves the European Union; thereafter, any organisation that deals with customers in the European Union will still have to comply with the GDPR, regardless of where it is based.

What does it cover?

GDPR stipulates that personal data should only be collected for specific purposes and that organisations must not store data for longer than necessary. Citizens have the right to be informed that their personal data is being processed. They have the right to access their own data and the right to rectify information held about them that is incorrect or incomplete. Citizens also have the right to receive compensation from data controllers for any damages suffered as a result of their data being misused.

Any breach must be reported as soon as possible (ideally within 24 hours) to limit the damage to citizens whose data has been accessed, corrupted or stolen, and organisations found to be non-compliant face fines of up to 4% of their annual turnover.

A key data governance issue that organisations face is identifying which information to protect. Good data management practices will help to mitigate some of the risk.

Data Quality and Data Portability

A notable reform contained in the new regulation provides citizens with, “a specific right to be forgotten. This is a fundamental modernisation of the rules establishing a number of new rights for citizens, for instance the right to freely transfer personal data from one service provider to another”.

While the GDPR fact sheet points out that ‘the right to be forgotten is not absolute’, it also states that partner organisations must also be informed of the customer’s wishes, stating: ‘companies should take every reasonable step to ensure that third parties, to whom the information has been passed on, are informed that the individual would like it deleted. In most cases this will involve nothing more than writing an email.’

This right to data portability poses a problem for organisations that are storing several versions of the same customer’s details. Take for example a customer named Maria Brown, née Maria Green. Your databases may have her recorded as Mrs Maria Brown; Ms Maria Brown and Miss Maria Green. The mobile numbers might match, the addresses might not. You may also have an entirely different customer, Miss Maria Brown, who is also known as Mrs Maria Curtis. The issue becomes more complex if an organisation has undergone a merger or acquisition and Maria Brown also appears in several instances on the second company’s databases.

How do we solve a problem like Maria?

To manually comply with this aspect of the new EU law, a data steward needs to go into your CRM application, search for Maria Brown by name, extract any data found and put it into a spreadsheet. However, customer data can be stored in a multitude of systems: order processing systems, customer support databases, marketing automation tools, website databases. Each of these systems may contain variations on the customer’s name, address, email and telephone contact details. This will require many employees to search for all the possible variations of data relating to that customer, to extract the data. To verify that your organisation has complied with the new law, you will then need to take this data from all of those systems and put them into a format that can be transferred to another company.
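A hedged sketch of the matching step, using only Python’s standard library – the fields, thresholds and records are illustrative assumptions, not a compliance recipe:

```python
from difflib import SequenceMatcher

records = [  # the same person may appear differently in each system
    {"system": "CRM", "name": "Mrs Maria Brown", "mobile": "07700 900123"},
    {"system": "orders", "name": "Ms Maria Brown", "mobile": "07700900123"},
    {"system": "support", "name": "Miss Maria Green", "mobile": "07700 900123"},
    {"system": "CRM", "name": "Miss Maria Brown", "mobile": "07700 900987"},
]


def normalise(record: dict) -> dict:
    """Strip titles and whitespace so superficially different records can be compared."""
    name = record["name"].lower()
    for title in ("mrs ", "ms ", "miss ", "mr "):
        name = name.removeprefix(title)
    return {"name": name, "mobile": record["mobile"].replace(" ", "")}


def likely_same_person(a: dict, b: dict) -> bool:
    """Flag a probable match when the phone number agrees and the names are similar."""
    a, b = normalise(a), normalise(b)
    name_similarity = SequenceMatcher(None, a["name"], b["name"]).ratio()
    return a["mobile"] == b["mobile"] and name_similarity > 0.6


matches = [
    (x["system"], y["system"])
    for i, x in enumerate(records)
    for y in records[i + 1:]
    if likely_same_person(x, y)
]
print(matches)  # links the CRM, orders and support entries; the other Maria Brown stays separate
```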

The data portability clause, ‘requires controllers to provide personal data to the data subject in a commonly used format and to transfer that data to another controller if the data subject so requests.’ Therefore, the data extracted must be in an accessible, transferrable format.

This means that one (or several) of your employees will have to massage the data extracted from all the different systems that stored it and enter it into an Excel or CSV file format ready for transfer.

If Maria Brown contacts your organisation, citing her rights under EU GDPR, and requests that her details are transferred to another service provider and removed from your own databases, would your organisation be able to identify which Maria it was dealing with and whether the correct details had been transferred then deleted?

Many organisations will be tempted to employ manual processes to handle such requests under GDPR. However, this is exactly the type of manual, time-consuming, tedious work that is ripe for automation.

Show evidence

EU GDPR also requires that organisations show how they have complied, for example, by documenting the decisions taken about processing an individual’s personal information.

As a result, data cleansing and data quality processes currently used for marketing and customer relationship management purposes, will now have a role to play in organisations’ preparation for GDPR compliance. We will see large enterprises, financial organisations, telecoms providers and the public sector moving from data swamps to data lakes as they focus on the data assets that they need to protect. Auditing will also play a central role to enable organisations to demonstrate compliance, with clear, watertight processes for documenting customer opt-ins and opt-outs.

Moving from the data swamp to the data lake

Having examined the requirements under the new law, we recognise some familiar data quality pain points which master data management (MDM) technology was specifically designed to address.

MDM can be used to define rules that connect data from different systems, creating a golden record: a single view of each customer, citizen, patient, or employee which contains the most recent correct information. Once organisations have established which is the right personal data, they can focus on protecting it.
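As an illustration only – the field names and survivorship rule below are assumptions, not a description of any particular MDM product – a golden record might be assembled from a group of matched records like this:

```python
from datetime import date

# Records already matched to the same customer, e.g. by rules like those sketched above.
matched = [
    {"name": "Maria Green", "address": "1 Old Road", "updated": date(2015, 3, 2)},
    {"name": "Maria Brown", "address": "1 Old Road", "updated": date(2016, 7, 9)},
    {"name": "Maria Brown", "address": "22 New Street", "updated": date(2017, 1, 15)},
]


def golden_record(records: list[dict]) -> dict:
    """Survivorship rule: per attribute, keep the value from the freshest record that has one."""
    golden = {"sources": len(records)}  # simple lineage: how many records were merged
    for field in ("name", "address"):
        candidates = [r for r in records if r.get(field)]
        golden[field] = max(candidates, key=lambda r: r["updated"])[field]
    return golden


print(golden_record(matched))
# {'sources': 3, 'name': 'Maria Brown', 'address': '22 New Street'}
```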

If customers do exercise their right to data portability, MDM can also be used to minimise manual efforts. This will help organisations to reduce the pain and administrative cost associated with identifying the right records, verifying that they are correct and transferring the right data in the right format on behalf of customers.

MDM addresses just one aspect of the EU GDPR requirements, but as we prepare for May 2018 it’s reassuring to know that there are already tools in place that can help to reduce the pain and cost of compliance.


June 19, 2017  11:52 AM

Forensic financial analysis software used to combat fraud

Brian McKenna

This is a guest blogpost by Ian Watson, CEO of Altia-ABM. It reflects his experience and judgement.

Specialist financial analysis software is now a crucial tool in the investigation and prosecution of crime. It not only increases the speed of criminal investigations but also enables prosecutions in circumstances where they would have been very difficult and time consuming previously.

The following two significant fraud cases highlight prosecutions that were made possible because specialist software can aggregate huge volumes of paper-based and electronic data into a standard format and demonstrate links between separate items of evidence in complex cases.

Housing fraud in Southwark

The first case involves a homelessness case officer, employed by Southwark Council, who was found to have committed serial housing fraud over a period of three years. Ibrahim Bundu, a Southwark Council housing officer, fraudulently obtained council property for his mother, his ex-wife, his estranged wife and her aunt, and others who paid him in cash – 23 properties in total allotted to people who should never have received them, some of whom were not legally entitled to be in the UK.

Southwark Council, HMRC and the UK Borders Agency worked in partnership to establish the circumstances of the fraud.

Bundu had created reams of bogus documents including fake identification papers, references and forged medical certificates claiming that some of the applicants were pregnant and therefore should be considered a priority for accommodation.

The investigating officers used specialist software to bring together all of Bundu’s case notes and supporting documents over a three-year period. All of this documentation was combined with information held by HMRC and the UK Borders Agency and from this vast volume of data, investigators and prosecutors were able to pinpoint key evidence which demonstrated Bundu’s involvement in each suspicious case and ultimately led to his conviction.

The software was also used to establish the immigration status of individuals he had assisted, to enable UK Borders Agency to take action against individuals who were in the country illegally.

Investigators from all three agencies reported that they would not have been able to construct a viable case against Ibrahim Bundu without the ability to use technology to compile this amount of data into a format where investigators could begin their forensic analysis that demonstrated his guilt.

Southwark has one of the longest council house waiting lists in the UK and, it could be said, this fraud meant that people in genuine need were pushed further down the queue.

He was sentenced to four years in prison, later increased to six years after he failed to make reasonable efforts to pay back the £100,000 for which he was personally liable.

NHS fraud

The second case was a brazen £3.5m fraud siphoning money from NHS funds over a seven-year period before the perpetrators were brought to justice.

Neil Wood was a senior manager at Leeds and York Partnership NHS Trust and also worked with Leeds Community Healthcare Trust before moving to NHS England. Wood was responsible for the awarding of training contracts for NHS staff in all three roles. He awarded the vast majority of these contracts to a company called The Learning Grove, which was run by his friend Huw Grove. The Learning Grove gradually transferred a total of £1.8m to LW Learning Ltd, a company registered in the name of Lisa Wood – Neil Wood’s wife. While at NHS England, Wood awarded a training contract worth £231,495 to a company in Canada called Multi-Health Systems, which was run by Terry Dixon. Dixon was a contact of Neil Wood. He kept £18,000 of the money and transferred the rest back to LW Learning Ltd.

The investigation was conducted jointly by Police North East, HMRC and NHS Protect. Seven years of financial data had to be compiled into a format whereby investigators from the three organisations could identify the relevant financial transfers between the many bank accounts used in the UK and Canada and follow the money trail to show, beyond doubt, that these individuals had knowingly and intentionally committed the fraud.

Neil Wood, Huw Grove and Terry Dixon were sentenced to a total of 9 years, 8 months in prison, for fraud, abuse of position and money laundering. Lisa Wood was given a suspended sentence for money laundering.

Investigative teams from NHS Protect, HMRC and police forces nationally and internationally are increasingly relying on investigation software as a key weapon in their fight against crime.

My own company Altia-ABM has, I would say, been at the forefront of developing software to enable investigators to achieve more in a shorter time and to assist in the development of cases against criminals.

Our technology automates much of the data mapping and cross-referencing process, allowing trained and experienced investigative staff to home in on the key transactions that prove wrongdoing. It has been estimated that complex investigations would take more than 10 times the man-hours to complete if the data were cross-referenced manually. Our software also generates documentation that is accurate enough to stand up in court and withstand close scrutiny.


June 12, 2017  10:55 AM

12 Months to GDPR: the year of metadata

Brian McKenna

This is a guest blogpost by Ciaran Dynes, senior vice president of Product, Talend.

The General Data Protection Regulation (GDPR) is a bit like Brexit for some: you secretly hoped the day was never going to arrive, but GDPR is coming and there will be major penalties if companies don’t have a strategy for how to address it.

Just under a year from now, on 25 May 2018, GDPR will go into effect. That means all businesses and organisations that handle EU customer, citizen or employee data, must comply with the guidelines imposed by GDPR. It forces organisations to implement appropriate technical and organisational measures that ensure data privacy and usage is no longer an after-thought.

GDPR applies to your organisation, regardless of the country in which it’s based, if it does any processing of personal data from European citizens or residents. So, depending on how your organization manages personal data on behalf of its customers, such as “opt-in” clauses, GDPR could become your worst nightmare in the coming year if you aren’t properly prepared.

As an industry, we talk a lot about digital transformation, being data-driven, data being the new oil, and any other turn of phrase you might care to consider – but for a moment spare a thought for metadata. Metadata is your friend when it comes to addressing the many requirements stipulated by the GDPR. Of course, metadata has been in the news for different reasons in the recent past, but I would reiterate that it is critical to solving GDPR.

The regulation applies if the data controller (organisation that collects data from EU residents) or processor (organisation that processes data on behalf of data controller e.g. cloud service providers) or the data subject (person) is based in the EU.

Does GDPR apply to your company?

If the answer is ‘yes’ to any of the following questions, then GDPR should be a high priority for your company:

  • Do you store or process information about EU customers, citizens or employees?
  • Do you provide a service to the EU or persons based there?
  • Do you have an “establishment” in the EU, regardless of whether or not you store data in the EU?

Where to begin when addressing GDPR for your customers

First, you need to understand the rights that your customers have in regards to their personal data. When it comes to GDPR there are many regulations around personal data privacy.

For example, perhaps you implement the following GDPR data privacy guidelines in your systems (a minimal sketch of one of them, an auditable consent record, follows the list):

  • Customer has the right to be forgotten
  • Customer has the right to data portability across service providers
  • Customer has the right to accountability and redress
  • Customer has the right to request proof that they opted in
  • Customer is entitled to rectification of errors
  • Customer has the right of explanation for automated decision-making that relates to their profile
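The “proof that they opted in” right, for instance, implies keeping an auditable consent log rather than a single flag. A minimal, hypothetical sketch – the names and fields are assumptions, not a prescribed GDPR data model:

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ConsentEvent:
    """One auditable consent decision: who, for what purpose, when, and how it was given."""
    subject_id: str
    purpose: str          # e.g. "marketing emails"
    granted: bool
    recorded_at: datetime
    evidence: str         # e.g. "web signup form v3, box actively ticked"


consent_log: list[ConsentEvent] = []


def record_consent(subject_id: str, purpose: str, granted: bool, evidence: str) -> None:
    consent_log.append(
        ConsentEvent(subject_id, purpose, granted, datetime.now(timezone.utc), evidence)
    )


def has_valid_opt_in(subject_id: str, purpose: str) -> bool:
    """The latest decision for this subject and purpose wins: an opt-out revokes an opt-in."""
    decisions = [e for e in consent_log
                 if e.subject_id == subject_id and e.purpose == purpose]
    return bool(decisions) and decisions[-1].granted


record_consent("cust-42", "marketing emails", True, "signup form, box actively ticked")
record_consent("cust-42", "marketing emails", False, "unsubscribe link clicked")
print(has_valid_opt_in("cust-42", "marketing emails"))  # False - the opt-out is the latest decision
```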

In a world where customer data is ‘king’, being captured by the terabyte, you need a controlled way to collect, reconcile, and recall data from multiple, disparate sources in order to truly comply with GDPR regulations. It should be stated that GDPR impacts all lines of business, not just marketing, so a holistic approach is fundamentally required in order to be compliant with the regulations. That’s where metadata comes in.

The value of metadata

In order to have a complete view of all the data you have about a person, you need to have access to the associated metadata.

Metadata sets the foundation for compliance as it brings clarity to your information supply chain, for example (see the sketch after this list):

  • Where does data come from?
  • Who captures or processes it?
  • Who publishes or consumes it?
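As an illustrative sketch – not Talend’s metadata model – a minimal lineage entry answering those three questions for one dataset might look like this:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DatasetLineage:
    """Minimal metadata for one dataset: origin, processors, consumers and policies."""
    name: str
    source_system: str            # where does the data come from?
    processed_by: list[str]       # who captures or processes it?
    consumed_by: list[str]        # who publishes or consumes it?
    contains_personal_data: bool
    retention_days: int
    audit_trail: list[str] = field(default_factory=list)

    def record(self, event: str) -> None:
        """Append an auditable note, e.g. an anonymisation or deletion step."""
        self.audit_trail.append(f"{datetime.now(timezone.utc).isoformat()} {event}")


customer_emails = DatasetLineage(
    name="customer_emails",
    source_system="web_signup_form",
    processed_by=["marketing_etl"],
    consumed_by=["campaign_tool", "analytics_warehouse"],
    contains_personal_data=True,
    retention_days=730,
)
customer_emails.record("anonymised for analytics extract")
print(customer_emails.audit_trail)
```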

This critical information is the backbone of establishing a data governance practice capable of addressing GDPR. Your organization needs to define policies – such as anonymization, ownership and data privacy – throughout the organization, including an audit trail for proof of evidence should an auditor arrive at your door.

Stephen Cobb of welivesecurity.com has published a great article on GDPR in which he compiles the following list highlighting the key implications of the forthcoming regulation, including financial consequences and costs. I strongly recommend reading the article in full.

11 things GDPR does

  1. Increases an individual’s expectation of data privacy and the organization’s obligation to follow established cybersecurity practices.
  2. Establishes hefty fines for non-compliance. An egregious violation of GDPR, such as poor data security leading to public exposure of sensitive personal information, could result in a fine of millions or even billions of dollars (there are two tiers of violations, and the higher tier is subject to fines of up to 20 million euros or 4% of the company’s global annual turnover, whichever is higher).
  3. Imposes detailed and demanding breach notification requirements. Both the authorities and affected customers need to be notified “without undue delay and, where feasible, not later than 72 hours after having become aware of [the breach]”. Affected companies in America that are accustomed to US state data breach reporting may need to adjust their breach notification policies and procedures to avoid violating GDPR.
  4. Requires many organizations to appoint a data protection officer (DPO). You will need to designate a DPO if your core activities, as either a data controller or data processor, involve “regular and systematic monitoring of data subjects on a large scale.” For firms who already have a chief privacy officer (CPO), making that person the DPO would make sense, but if there is no CPO or similar position in the organization, then a DPO role will need to be created.
  5. Tightens the definition of consent. Data subjects must confirm their consent to your use of their personal data through a freely given, specific, informed, and unambiguous statement or a clear affirmative action. In other words: silence, pre-ticked boxes, or inactivity no longer constitute consent.
  6. Takes a broad view of what constitutes personal data, potentially encompassing cookies, IP addresses, and other tracking data.
  7. Codifies a right to be forgotten so individuals can ask your organization to delete their personal data. Organisations that do not yet have a process for accommodating such requests will need to establish one.
  8. Gives data subjects the right to receive data in a common format and ask that their data be transferred to another controller. Organisations that do not yet have a process for accommodating such requests will need to establish one.
  9. Makes it clear that data controllers are liable for the actions of the data processors they choose. (The controller-processor relationship should be governed by a contract that details the type of data involved, its purpose, use, retention, disposal, and protective security measures. For US companies, think Covered Entities and Business Associates under HIPAA.)
  10. Increases parental consent requirements for children under 16.
  11. Enshrines “privacy-by-design” as a required standard practice for all activities involving protected personal data. For example, in the area of app development, GDPR implies that “security and privacy experts should sit with the marketing team to build the business requirements and development plan for any new app to make sure it complies with the new regulation”.

But there is much more…

All of the above points are noteworthy, but as a parent of three children, #10 is worth a special callout. If organisations are gathering data from underage people, they must have systems in place to verify ages and gain consent from guardians.

Article 8 of the GDPR requires that companies:

  • Identify who is, or is not, a child
  • Identify who the parents or guardians of those children are.

So, as you can see, GDPR puts an enormous onus on any organisation that collects, processes and stores personal data for EU citizens. I’ve got a feeling that 2018 will be the year metadata becomes even more important than we ever previously considered. For more on metadata management, see how Air France-KLM is using Talend Metadata Management to implement data governance with data stewards and data owners to document data and processes.


June 9, 2017  11:12 AM

Humans and machines: the partnership

Brian McKenna

This is a guest blogpost by Yasmeen Ahmad, Director of Think Big Analytics, Teradata

Will you be superseded by an intelligent algorithm? Despite the fear mongering of robots taking over the world (let alone stealing our jobs), it will be a significant length of time before machines are versatile enough to do the breadth of tasks that humans can. Automating the human mind is out of reach for now but we can leverage intelligent algorithms to support and automate certain levels of decision making. Our focus is thus on augmentation – machines and humans working collaboratively.

Here are three predictions for the ever-developing relationship of the human analyst and the algorithmic machine:

  1. The ongoing need for human expertise

In the future, there will be many jobs that exist alongside smart machines, either working directly with them or doing things they cannot. But which jobs are going to get displaced, and which enhanced?

Although algorithms and AI can automate more and more labour-intensive roles, they are not yet able to complete complicated tasks such as persuading or negotiating, and they cannot generate new ideas as readily as they solve problems. As a result, jobs that require a certain level of creativity and emotional or social intelligence are not likely to be superseded by algorithms any time soon. It’s likely that job titles such as entrepreneur, illustrator, leader and doctor will stay human for now.

In addition, acting upon the intelligence of machines will still require a human in many cases. A sophisticated algorithm may predict a high risk of cancer, but it is the doctor who will relay that information to a patient. Self-driving cars may move us from point A to point B, but it is the human who will be the ultimate navigational influence, deciding the destination of the journey and changes along the way.

As these applications of intelligent machines develop, the most advanced technology companies have kept their human support teams. When there is an issue with automated processes, the fixing is often carried out by a human. The need for onsite human expertise dealing with smart machines is not being eliminated: the new systems require updates, corrections, ongoing maintenance and fixes. The more we rely on automation, the more we will need individuals with the relevant skills to deal with the complex code, systems and hardware. This creates a raft of new careers, disciplines and areas of expertise not existing today.

  2. Humans will adapt skillsets to sync with machines

The jobs of tomorrow do not exist in the job ads of today. We will need to change human skillsets, and become digital-industrial people. But what exactly does this mean?

In the future machines will automate a range of tasks and survival will belong to those who are most adaptable and learning agile. Digitalisation and algorithms are creating a new interface between the worker and end task. Humans are now faced with dashboards providing indicators of machine performance. Interpreting, understanding and acting upon the data in this dashboard becomes the task of the future. A new interface is emerging for the human.

This new interface drives a change in the skillsets required. In order to adapt to the possibilities that Artificial Intelligence creates, businesses globally will have to hire a multitude of individuals who are data and digital savvy, as well as understand how to interact with machine interfaces. We will see the continued rise of new teams with data and analytical expertise to create the intelligent algorithms of the future.

Not only has technology opened up new jobs and departments within businesses, it has also created the need for completely new organisations and business models. Siemens is an example of a traditional rail business transforming from selling trains to providing an on-time transportation service. The need for data and analytical expertise is only likely to increase as analytical automation grows: autonomous vehicles will still need mechanics, as will the self-driving systems within those vehicles.

  3. Humans in the loop: a new role will be established for analysts and business users

As we embrace AI and deep learning algorithms that automate the detection of insights – we must not lose sight of the importance of the analyst who deploys the algorithm and the user who consumes insights.

Analysts explore data, generating new ideas, being creative and solving problems by using algorithms. Machine learning and AI models will be able to harness complex data and make more accurate predictions, but it is still the human analyst that will make the decisions on what type of data to feed the algorithm, which algorithms to deploy and how best to interpret the results.

As algorithms create more and more predictions, can we leave all decision making to automated algorithms? Is there a danger that this automation will become a crutch for business users – allowing human judgement to be overlooked? It is crucial that business users are equipped to understand the value of human judgement and how to manage algorithms making questionable decisions.

If CIOs want to take the lead in introducing AI to their organisations, they should begin to identify which business processes have cognitive bottlenecks, need fast and accurate decisions or involve too much data for humans to analyse. These are the areas that can be positively impacted by human analysts leveraging algorithmic machines.

When it comes to the next step for businesses globally, augmentation with smart humans alongside smart machines is the most likely future.


May 22, 2017  10:35 AM

Multi-model databases as way to tame data management complexity

Brian McKenna

This is a guest blogpost by Luca Olivari, president, ArangoDB.

Data is one of today’s biggest assets, but organizations need the right tools to manage it. That’s why, during the last decade, we’ve seen new data-management technologies emerging and getting past the initial hype. The so-called “big data” products – Hadoop and its surrounding ecosystem first, and NoSQL immediately thereafter – promised to help developers go to market faster, administrators reduce operational overhead and spend less time on repetitive tasks, and ultimately companies innovate faster.

The need for change

One could argue there’s been some success, but the craving for new and different approaches has given birth to hundreds of products all addressing a specific niche. Key value stores are blazingly fast with extremely simple data, document stores are brilliant for complex data and graph solutions shine with highly interconnected data.

Every product solves a part of the problem in modern applications. But besides imposing a steep learning curve (in truth, many steep learning curves), this fragmentation makes it rather difficult to keep your data consistent, your application fault-tolerant and your architecture lean.

Teams were forced to adopt far too many technologies, resulting in the same issues faced at first: complexity and inelasticity. NoSQL companies realized they were narrowing their use cases too much, and started to add new features in the direction of relational data models.

Relational incumbents reacted and vendors added document-based or graph-based structures and features, removing schemas and generally trying to mimic the characteristics that made NoSQL successful at first, and semi-successful at last.

Relational databases have been so successful for one main reason: broad applicability and adaptability. Every developer knows relational, and it comes to mind as the first underlying data-management technology when building something new, regardless of the application itself. True, relational databases are good for almost everything, but they never excel.

There’s something wrong with two worlds colliding

Of course, you can take a petrol car, find a way to put a battery plus electric motor in and call it an electric car. You’ll end up with low range, low performance and no users. The huge success of Tesla is by design. A Tesla is superior because it is designed from scratch for e-mobility.

The reality is, the underlying architecture is so important that something that’s not built from the ground up, will never be as effective as a product conceived with the end in mind. Exponential innovations are architected in a different way and that’s why they are disruptive.

This is happening in the database world as well. There’s a new category of products that’s solving old issues in a completely different way.

Native multi-model databases

Native multi-model databases, like my own company’s ArangoDB, are built to process data in different shapes: key/value pairs, documents and graphs. They allow developers to naturally use all of them with a simple query language that feels like coding. That’s one language to learn, one core to know and operate, one product to support, thus an easier life for everyone.
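To illustrate the idea – this is a toy in-memory sketch, not ArangoDB’s actual API or query language – the point is that key/value lookups, document access and graph traversal can all operate over the same underlying records:

```python
class MultiModelStore:
    """Toy in-memory illustration of one engine serving three data models."""

    def __init__(self) -> None:
        self.documents: dict[str, dict] = {}         # key -> document; a key lookup is just a document fetch
        self.edges: list[tuple[str, str, str]] = []  # (from_key, relation, to_key) for graph queries

    # Key/value and document models share the same storage.
    def put(self, key: str, document: dict) -> None:
        self.documents[key] = document

    def get(self, key: str) -> dict:
        return self.documents[key]

    # Graph model: edges reference the same documents by key.
    def relate(self, from_key: str, relation: str, to_key: str) -> None:
        self.edges.append((from_key, relation, to_key))

    def neighbours(self, key: str, relation: str) -> list[dict]:
        return [self.get(to) for frm, rel, to in self.edges
                if frm == key and rel == relation]


store = MultiModelStore()
store.put("customer/1", {"name": "Acme Ltd"})
store.put("order/1", {"total": 120})
store.relate("customer/1", "placed", "order/1")
print(store.get("customer/1"))                   # document / key-value access
print(store.neighbours("customer/1", "placed"))  # graph traversal over the same records
```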

Let’s quantify the benefits for a fictitious Fortune 500 customer. When you have an army of tens of thousands of developers and so many different on-premise or cloud based databases to administer, even a small improvement in productivity means a lot. An approach like ours allows you to build more things with fewer things and simplify your stack in the process. You could say the mission is to improve the productivity of every developer on earth.


April 21, 2017  10:24 AM

Beware the AI black box

Brian McKenna

A guest blog by Matt Jones, Analytics Strategist at Tessella 

Big tech vendors have been piling into analytics and now AI. IBM’s Watson has been charming the media with disease diagnosis and Jeopardy prowess, and Palantir has been finding terrorists. Other major vendors such as Oracle and SAP have also been joining the scene, albeit with slightly less fanfare.

These companies have been leading tech innovation for years; one would expect them to offer a credible AI solution, and indeed their technology is good.

But the black box approach that many such companies currently offer presents problems. The first is that analytics is not a plug and play solution; it needs to be built with an understanding of data and context.

The second is that, in buying a black box solution, you lose control of your data. This means you are not sure what it’s telling you, you allow others to benefit from it for free, and you may not be able to access it in future.

And now you may even be in for a big bill for the privilege. A recent court case ruled that drinks company Diageo had to pay SAP additional licence fees for all customers that indirectly benefitted from SAP software used in its organisation – costs that could run to £55m.

If upheld, this could have a chilling effect on the analytics platform industry. Could every beneficiary of data insights across and beyond your organisation now need a licence? Companies should now think a lot harder before committing to a platform on which their whole business relies.

Your data is your company’s lifeblood – value it

Even before this case, it was stunning how much control of their data companies were willing to give up to vendors, and how little thought went into the consequences of over-reliance on one technology.

Vendor lock-in is nothing new, of course: consumers have been allowing Apple and Google to track their every move, and global businesses have been putting everything in the Microsoft cloud, for years.

But data and AI represent this problem on steroids. Data projects done right are embedded throughout the entire organisation. Some solutions will suck in all your data from every system, lock it away, and even refuse you access to it. So these platforms have all your data, the context, and the insights it provides. And now they have even more power to charge you to benefit from it.

How to plan for an analytics solution

So, how can you benefit from data analytics without storing up future problems?

Before you even think about technology, get the right mix of technical and business people to look at what your business needs to achieve and how data can help you. Then get people in with data science expertise who can explore how your data can be used to support those needs.

Only then should you look at what platforms you need to achieve this. The most powerful is not necessarily the most suitable; find one suited to your needs. In doing so, consider licensing models and demands on your data – does it leave your site? Can you access it when you need to? Look at the company’s overall culture – are they transparent or opaque? This will guide you in how they are likely to handle your data.

Perhaps more important is to consider whether you need a black box at all. Google, Microsoft and Facebook, amongst others, all offer openly available Artificial Intelligence (AI) APIs on which anyone can build bespoke AI or machine learning platforms – as sophisticated as any black box on the market. Furthermore, this gives you complete control and transparency over how the data is fed in, processed and presented, so you can identify causal links between data and outcomes, rather than having to trust that someone else’s insights into your business are correct.
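As a hedged illustration of that transparency argument (using scikit-learn and invented data, not any vendor’s product): a model you train yourself exposes every input and weight, so you can sanity-check what drives its predictions instead of trusting an opaque verdict.

```python
# Invented data and feature names; scikit-learn's plain logistic regression
# stands in for "a model you build and can inspect yourself".
from sklearn.linear_model import LogisticRegression

feature_names = ["visits_last_month", "support_tickets", "days_since_last_order"]
X = [
    [10, 0, 3],
    [2, 4, 40],
    [8, 1, 7],
    [1, 5, 60],
    [12, 0, 2],
    [0, 3, 90],
]
y = [1, 0, 1, 0, 1, 0]  # 1 = customer renewed, 0 = churned (toy labels)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Unlike a black box, every weight is visible, so you can see which inputs drive
# the prediction and sanity-check them against domain knowledge.
for name, weight in zip(feature_names, model.coef_[0]):
    print(f"{name}: {weight:+.3f}")

new_customer = [[3, 2, 30]]
print("churn risk:", round(model.predict_proba(new_customer)[0][0], 2))  # probability of class 0 (churned)
```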

If you do need a black box solution – and there are times when they are the right option – you should ask whether the vendor is a partner or just a platform. Do they understand your business context? Do they integrate with your particular data setup? Do they leave you with control of your data? Do they make the data analysis process clear, so you can understand whether your business insight is based on a causal link or just an unsupported pattern spotted in the data?

The approach you take should be driven by the most appropriate approach to solving your challenge or finding the insight you require to make better decisions – not by the platform itself – and it should consider what level of data control and oversight you are willing to give up. Once you have properly defined that, then you can make the best decision about how to use your data to meet those goals.


