Software definitely helped cause the pernicious problem of fake news – but could also ameliorate it, says data expert Emil Eifrem, in a guest blogpost.
Time was it was the tabloid newspapers that cornered the market when it came to fake news (remember ‘Freddie Starr Ate My Hamster’?). These days it’s a whole industry in itself – with the technology in the hands of people who don’t even pretend to be objective journalists.
In fact, traditional journalism’s decline has meant the loss of checks and balances, allowing fake news free and disruptive rein. Technological advances also support the ways misinformation is spawned and travels so rapidly, as social media is after all about active sharing. According to one of the biggest studies to date on the problem, conducted by researchers at MIT and published in March 2018 in the journal Science, the truth takes six times longer to be seen on Twitter than misinformation.
Researchers tell us lies are 70% more likely to be retweeted than the truth — even when they controlled for factors such as whether the account was verified or not, the number of followers, and how old the account was. This is not good news for society, democracy and human knowledge, one can argue.
Interestingly, while technology certainly is an enabler of fake news, it may also be the answer to helping combat it. Specifically, graph database technology, a powerful way of recognising and leveraging connections in large amounts of data, may offer some hope of salvation.
Indeed graph software is already used by legitimate investigative journalists: it helped the International Consortium of Investigative Journalists track its way through the TBs of data known as the Panama and the Paradise Papers, for instance. But graph software also turns out to be a way to potentially combat fake news.
Visualising patterns that indicate fake content
It’s reported that Russia used social media in a bid to influence the 2016 US presidential election – and with graph technology’s help, the US’s NBC News has uncovered the mechanism of how that was achieved.
That’s because NBC’s researchers found that the key to detecting fake news is connections – between accounts, posts, flags and websites. By visualising those connections as a graph, we can understand patterns that indicate fake content. The group behind the Russian trolling was small, but very effective, working to leverage Twitter with popular hashtags and posting reply Tweets to popular accounts to gain traction and followers. In one account, for example, of 9,000 Tweets sent, only 21 were actually original, and overall 75% of all the material was retweets, specifically designed to broadcast the messages to as wide an audience as possible. While some accounts posed as real-world citizens, others took on the guise of local media outlets and political parties.
When graph software was used to analyse the retweet network, it revealed three distinct groups – one tweeting mainly about right-wing politics, a second group with left leanings, and a final group covering topics in the Black Lives Matter movement, in an invidious but effective triangulation of extreme content sharing and emotional manipulation.
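To make the idea of a retweet network concrete, here is a minimal sketch in Python. It is not NBC’s method or any graph database’s API: the sample accounts are invented, and simple connected components stand in for the richer community detection a graph platform would offer. It shows the two signals described above: the share of an account’s posts that are retweets, and the groups that emerge from who retweets whom.

```python
from collections import defaultdict

# Hypothetical sample data: (account, original_author) pairs, one per post.
# An original_author of None marks an original (non-retweet) post.
posts = [
    ("acct_a", "acct_b"),
    ("acct_b", "acct_a"),
    ("acct_a", None),
    ("acct_c", "acct_d"),
    ("acct_d", "acct_c"),
    ("acct_e", None),
]


def retweet_ratio(posts):
    """Return the share of each account's posts that are retweets."""
    total = defaultdict(int)
    retweets = defaultdict(int)
    for account, source in posts:
        total[account] += 1
        if source is not None:
            retweets[account] += 1
    return {account: retweets[account] / total[account] for account in total}


def retweet_groups(posts):
    """Group accounts linked, directly or indirectly, by retweets."""
    neighbours = defaultdict(set)
    for account, source in posts:
        neighbours[account]  # register the account even if it never retweets
        if source is not None:
            neighbours[account].add(source)
            neighbours[source].add(account)

    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk collects everything reachable from this account.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


print(retweet_ratio(posts))
print(retweet_groups(posts))
```

An account whose ratio sits close to 1.0 is doing almost nothing but amplifying others, which is the pattern the 21-originals-in-9,000 example points to.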
At internet scale, fake news is just too hard to spot without the right tools. We should look to graph technology, specifically designed to expose connections in data, as a possible way of helping to address the issue.
The author is co-founder and CEO of Neo4j, the world’s leading graph database (http://neo4j.com/)
This is a guest blog by Claudia Imhoff, CEO Intelligent Solutions and founder, Boulder BI Brain Trust
As with any new initiative, there are both challenges and benefits to weigh when deciding whether cloud computing is suitable for your company’s analytic environment. Let’s start with understanding the challenges.
IT governance and control – IT departments are still leery of letting go of their data. There are many reasons but certainly job loss and the concerns about security and privacy over data rank high on the list. IT is generally responsible for corporate data assets being implemented and used according to agreed-upon corporate policies and procedures. This means that service level agreements between the company’s IT department and the cloud provider are critical to ensure acceptable standards, policies and procedures are upheld. IT personnel may also want insight into how the data is obtained, stored, and accessed by its business personnel. Finally, it is recommended that IT determine whether these cloud-deployed assets are supporting your organisation’s strategy and business goals.
Changes to IT workflows – IT workflows dealing with compliance and security become more complicated in hybrid environments (those consisting of both on-premises and cloud deployments). The workflows must take into consideration the need of advanced analysts and data scientists to combine data that is on-premises with data in various cloud computing sites. Keeping track of where the data resides can be quite difficult if good documentation and lineage reports are not available.
Managing multiple cloud deployments – Often, companies have more than one cloud computing implementation; they may use a mix of both private and public deployments – maybe even multiple ones in each category. The company must determine if each cloud provider is in compliance with regulatory requirements. Also, when considering your cloud provider(s), determine how security breaches are prevented or detected. If data security concerns are great, it may make sense for the corporation to maintain highly sensitive data (like customer social security numbers, medical health records, etc.) within its own premises rather than deploying it to the cloud.
Managing costs – The on-demand and scalable nature of cloud computing services can make it difficult to determine and predict all the associated costs. Different cloud computing companies have different cost plans. Some charge by volume of data stored, others by the number of active users, and others still by cluster size. Some have a mixture of all three. Be sure to watch out for hidden costs like requested customizations, database changes, etc.
Performance – It is clear that if your provider is down, so are you. All you can do is wait for the provider to come back up. A second concern is your internet bandwidth. A slow internet means slow connectivity.
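The pricing models described under “Managing costs” above can be compared with a few lines of arithmetic. The sketch below is purely illustrative: the rates are invented, not any provider’s actual prices, and real plans usually add the hidden extras mentioned earlier. It simply shows how the same usage profile produces different bills depending on whether a plan charges by volume, by active users, by cluster size, or by a mixture.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    stored_tb: float     # volume of data stored, in terabytes
    active_users: int    # number of active users
    cluster_nodes: int   # size of the compute cluster


# Hypothetical monthly rates, invented for illustration only.
RATE_PER_TB = 20.0
RATE_PER_USER = 5.0
RATE_PER_NODE = 300.0


def monthly_cost(usage, by_volume=False, by_users=False, by_cluster=False):
    """Estimate a monthly bill for whichever charging bases the plan applies."""
    cost = 0.0
    if by_volume:
        cost += usage.stored_tb * RATE_PER_TB
    if by_users:
        cost += usage.active_users * RATE_PER_USER
    if by_cluster:
        cost += usage.cluster_nodes * RATE_PER_NODE
    return cost


usage = Usage(stored_tb=50, active_users=200, cluster_nodes=4)
print("volume only: ", monthly_cost(usage, by_volume=True))
print("users only:  ", monthly_cost(usage, by_users=True))
print("cluster only:", monthly_cost(usage, by_cluster=True))
print("mixture:     ", monthly_cost(usage, True, True, True))
```

Running the same comparison with projected growth in data, users and cluster size is a quick way to see which plan becomes expensive first.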
Now let’s turn to the many benefits of migrating to a cloud computing environment:
Lowered operating costs – This is perhaps the first benefit that companies realize when considering a move to the cloud. There is a significant difference between capital expenses and operating expenses. Basically, you are “renting” the infrastructure rather than bearing the costs upfront of building your own environment. The cloud computing provider bears all the system and equipment costs, the costs of upgrades, new hardware and software, as well as the personnel and energy costs.
No maintenance or upgrade hassles – These are again the headaches for the cloud computing provider. This frees up all resources to have a laser focus on obtaining, accessing, and using the data, not on managing the infrastructure.
Ease of implementation – For most companies, purchasing a cloud computing environment is as easy as swiping your credit card. It takes only minutes to access the environment because the technological infrastructure is all ready to go. This must be differentiated from the data infrastructure that must also be established. Whether you implement a data lake, a data vault, or a data warehouse, design and development work must be performed in addition to the technological set up.
Innovation from new cloud companies – Cloud technologies have been “born” from very innovative new companies. They make full use of all the advantages that the cloud has to offer. These technology companies can also add new features, functions, and capabilities, making them available to all customers immediately.
Elastic scalability – Many customers say this is the most appealing attribute of cloud computing. You can quickly scale up and down based on real needs. There is no need to buy extra computing capacity “just in case” you may need it at a later date. Cloud data warehouses can increase or decrease storage, users and clusters with little or no disruption to the overall environment.
Ability to handle the vast diversity of data available for analytics – Cloud computing providers can handle both well-structured data (like from operational systems) and the “unusual” data so popular today (like social media, IoT, or sensor data). Cloud implementations can support both fixed schemas and dynamic ones, making them well suited to routine production analytics like Key Performance Indicators or financial analyses as well as the unplanned, experimental, or exploratory analyses so popular with data scientists.
Taking the time to identify both the challenges and benefits associated with the cloud is the first step in evaluating whether a move to the cloud is right for your organisation.
Why should Computer Weekly be any different? After all, one of our more distinguished alumni is Paul Mason, these days a public intellectual in the classical Marx tradition.
Mason’s book PostCapitalism (2015) contains an intriguing chapter in which he expands on a neglected text of Marx (the “Fragment on Machines”, published in English in 1973, in that massive tome, the Grundrisse) that seems to predict today’s information economy. Marx figures here as a “prophet of postcapitalism”, according to Mason:
“The productive power of machines like the ‘self-acting’ cotton-spinning machine, the telegraph and the steam locomotive was ‘out of all proportion to the direct labour time spent on their production, but depends rather on the general state of science and on the progress of technology, or the application of this science to production’.
“Organization and knowledge, in other words, made a bigger contribution to productive power than the labour of making and running the machines.”
Gartner’s Frank Buytendijk is another who finds value in Marx when it comes to reading today’s IT-saturated economy. In his book Socrates Reloaded: the case for ethics in business and technology (2012), he writes:
“Marx would have been the first to say [of Facebook, Google et al.] that all the ingredients of a revolt [by internet users] are there. What could happen to Internet giants if they shoot themselves in the feet by pushing their data collection and analyses too far?”
Buytendijk argues that Google and Facebook are as hungry for data as Victorian capitalists were for capital. They want to collect as much data as they can for the benefit of advertisers, not for the producers of [that data]. People alienate their data in the way Marx says we alienate our labour power. Moreover, our love of Google and Facebook’s “free” services is the counterpart of the religious “opium of the people”, in his view.
But – in a twist of the dialectic – in the networked world we can “leverage the same community-based business model of the Internet giants to overthrow them”. Go somewhere else and they crumble.
What would Marx, the nineteenth-century economist, make of the particular, venture-capital fuelled economy of Silicon Valley? Would he throw up his hands in horror? More likely, he’d analyse it; most probably at tedious length. And he’d probably applaud its dynamism, just as he hailed the dynamism of industrial capitalism in the Communist Manifesto of 1848.
But surely the hyper-individualist political and social culture of Silicon Valley would be anathema to this enemy of individualism and promoter of the collective?
Well, maybe. However, Marx’s Economic and Philosophical Manuscripts of 1844 would seem to demonstrate an advocacy of all humans realising their individual powers, their “species-being”, once economic scarcity has been consigned to the past. It is just that they can only do so through other beings. You can’t be an individual on your own, seems to be the paradoxical gist.
Was Marx right?
The cultural theorist Terry Eagleton makes this argument in his book Why Marx was right (2011):
“For Marx, we are equipped by our material natures [as labouring, linguistic, desiring creatures] with certain powers and capacities. And we are at our most human when we are free to realise these powers as an end in themselves, rather than for any purely utilitarian purpose”.
And Eagleton contends elsewhere in the same text:
“Marx was an implacable opponent of the state. In fact, he famously looked forward to a time when it would wither away. His critics might find this hope absurdly utopian, but they cannot convict him at the same time of a zeal for despotic government”.
Even so, the movers and shakers of Silicon Valley are famously more indebted to the libertarian thinker Ayn Rand (exiled by the Russian Revolution) than Marx.
And yet Marx figures more prominently than does Rand in Silicon Valley luminary Peter Thiel’s book Zero to One (2014). Here is Thiel:
“As Karl Marx and Friedrich Engels saw clearly, the 19th-century business class ‘created more massive and more colossal productive forces than all preceding generations together. Subjection of Nature’s forces to man, machinery, application of chemistry to industry and agriculture, steam navigation, railways, electric telegraphs …. what earlier century had even a presentiment that such productive force slumbered in the lap of social labour?’”
Among his many ventures, Thiel is co-founder and chair of Palantir Technologies, a big data analysis company whose CEO, Alex Karp, wrote his PhD in dialogue with the Frankfurt School tradition of Theodor Adorno and Jürgen Habermas.
Not poles apart, then, Marx and today’s laboratory of the future on the West coast of the US (and its clones elsewhere)? (It’s moot).
But would he fit into the geek house in Silicon Valley, the HBO comedy? Given his roistering fondness for pub crawls in Soho, and his famous maxim, “Nihil humani a me alienum puto” [nothing human is alien to me], one would have thought so.
PS: In the interests of balance, fellow economist and philosopher Adam Smith’s birthday is on 16 June; we’ll have to wait till 2023 to register his 300th, along with the rest of the media. What would the author of The Wealth of Nations make of Silicon Valley?
The House of Lords report on AI and the UK economy and society came out this week, with the guardedly bullish title: “AI in the UK: ready, willing and able?” The question mark is moot.
I think a strong case can be reasonably made that the government has been using AI as a fig leaf to cover the economic uncertainty generated by the Brexit decision of June 2016. It is hard to blame the prime minister or her chancellor for making this rhetorical move. Neither of them wanted the country to leave the European Union, and vaunting the UK’s putative special strengths in AI, as part of a “global Britain” narrative, provides a quantum of solace. So, why not?
And having an industrial strategy is common ground between Conservative and Labour parties today. The hands-off neo-liberalism of Thatcher and Blair seems to belong in the past.
Moreover, the report has emphasized the strategic need for the government to do more to bolster the UK’s network infrastructure to support artificial intelligence – not just to spawn new start-ups, but to improve economic productivity more generally.
Britain leads the world in AI. Really?
Deep in the House of Lords report (paragraphs 392 to 403) is a judicious dissection of the claim that “Britain leads the world in AI”. It is a cup of cold water realism rather than a bowl thereof. Nevertheless, it is realistic and balanced, and makes an argument well worth thinking about. Essentially, the report acknowledges that the US and China are the real leaders in AI, and contends that the UK should find itself a specialist niche, putting forward the ethics of AI as its preferred candidate.
Would attending to the ethics of AI give the UK enough heft in the field? We do, in the UK, have a tendency to reach for a claim to “lead the world” in doing good things. It was the Christian missionary flip-side of our gunboat diplomacy in the days of the Empire.
The CND movement, at its several peaks in the late 1950s and 1980s, is a good example of this: we can lead the world by moral example, said Bertrand Russell and Bruce Kent. I marched for unilateralism myself in the 1980s, but I digress. Suffice to say it is a noble part of the British liberal tradition, and the House of Lords has often given it a home, sometimes outflanking the House of Commons on the left, ironically for the non-elected chamber. (Indeed, the Lords did so this week, with the vote to demand the government includes a customs union in its negotiation agenda with the EU).
It should not, in other words, be ruled out as an idea, this proposed UK specialization in the ethics of AI. Someone should do it.
The case for specializing in ethics
This is the train of argument in the Lords committee’s report:
“we have discussed the relative strengths and weaknesses of AI development in the UK, but questions still remain regarding Britain’s distinctive role in the wider world of AI. The Government has stated in its recent Industrial Strategy White Paper that it intends for the UK to be ‘at the forefront of the AI and data revolution’. What this means in practice is open to interpretation.
“Some of our respondents … made comparisons with the United States and China, especially in terms of funding. For example, Nvidia drew attention to the large investments in AI being made in these countries, including the $5 billion investment announced by the Tianjin state government in China, and the estimated $20–30 billion investments in AI research from Baidu and Google. Balderton Capital emphasised the ‘many billions of funding’ being invested in AI and robotics in China and the US, and argued that the UK Government needed to invest more in academic research to ensure that the UK ‘remains [?] a global leader in the field’.
“Microsoft also highlighted the disparities in computer science education, noting that ‘in a year when China and India each produced 300,000 computer science graduates, the UK produced just 7,000’ ….
“However, it was more commonly suggested that it was not plausible to expect the UK to be able to compete, at least in terms of investment, with the US and China …. [W]e were greatly impressed by the focus and clarity of Canada and Germany’s national strategies when we spoke with Dr Alan Bernstein, President and CEO of CIFAR and Professor Wolfgang Wahlster, CEO and Scientific Director of the DFKI. Dr Bernstein focused on the Pan-Canadian AI Strategy’s bid to attract talented AI developers and researchers back to Canada from the United States, while Professor Wahlster emphasised that Germany was focusing on AI for manufacturing”.
There then follows the proposed UK focus on the ethics of AI:
“In January 2018, the Prime Minister said at the World Economic Forum in Davos that she wanted to establish ‘the rules and standards that can make the most of artificial intelligence in a responsible way’, and emphasised that the [UK’s] Centre for Data Ethics and Innovation would work with international partners on this project, and that the UK would be joining the World Economic Forum’s new council on artificial intelligence, which aims to help shape global governance in the area.
“On the basis of the evidence we have received, we are convinced that vague statements about the UK ‘leading’ in AI are unrealistic and unhelpful, especially given the vast scale of investment in AI by both the USA and China. By contrast, countries such as Germany and Canada are developing cohesive strategies which take account of their circumstances and seek to play to their strengths as a nation. The UK can either choose to actively define a realistic role for itself with respect to AI, or be relegated to the role of a passive observer ….
“We believe it is very much in the UK’s interest to take a lead in steering the development and application of AI in a more co-operative direction, and away from this riskier and ultimately less beneficial vision of a global ‘arms race’. The kind of AI-powered future we end up with will ultimately be determined by many countries, whether by collaboration or competition, and whatever the UK decides for itself will ultimately be for naught if the rest of the world moves in a different direction. It is therefore imperative that the Government, and its many internationally-respected institutions, facilitate this global discussion and put forward its own practical ideas for the ethical development and use of AI.”
Finally, the Lords committee has called on “the Government [to] convene a global summit in London by the end of 2019, in close conjunction with all interested nations and governments, industry (large and small), academia, and civil society, on as equal a footing as possible. The purpose of the global summit should be to develop a common framework for the ethical development and deployment of artificial intelligence systems. Such a framework should be aligned with existing international governance structures”.
It’s a thoughtful argument. And it’s surely better than wrapping AI in the Union Jack, trying to gain an edge over nations in a necessarily global field?
Nevertheless, the UK’s most obvious comparative advantage in AI is located at GCHQ, with its special relationship with the US’s NSA. Might cyber-security prove a better niche than ethics?
This is a guest blogpost by Jim Conning, Managing Director of Royal Mail Data Services (RMDS).
The forthcoming 25 May implementation date for the General Data Protection Regulation (GDPR) is focusing businesses on the whole topic of customer data. How can they ensure that they are compliant and avoid potential fines of up to 4% of global turnover? Research into customer data management from my organisation, Royal Mail Data Services, highlights the pressure that companies are under – and how collaboration between IT and marketing is necessary for effective customer data management strategies.
GDPR – varying confidence levels
In a recent survey carried out by Royal Mail Data Services among key decision makers, we found that compliance with the GDPR was the number-one concern for survey respondents, with 29% citing it as their biggest worry.
Focusing on specific areas, the study asked how confident respondents were that their internally held and third-party customer data was GDPR compliant. The positive news is that 78% were either “very” or “reasonably” confident that their internally held customer data complied – although 11% were not confident, including 2% who even more worryingly didn’t know if they were compliant or not.
However, when it comes to third-party data, confidence levels drop dramatically. Just 43% of respondents were “very” or “reasonably” confident when it came to compliance, which demonstrates the difficulty of gathering evidence that the right permissions are in place when data has come from other sources. Only 9% of brands said they were very confident in their data compliance, which shows that there is plenty of work to do ahead of 25 May 2018.
Collaboration is the key
When it comes to data strategy, companies are adopting a range of approaches. Just over half (51%) of marketing teams set data strategies, while other groups such as central data management (26%) and the board (25%) were also involved. Legal and compliance teams were naturally heavily involved in privacy and permissions decisions, taking lead responsibility within 38% of organisations. Forty-four per cent of marketing departments led in this area, compared to 20% of IT/IS teams.
Responsibility for actually managing customer data is also split between different departments. IT/IS was in charge in 30% of cases, behind marketing (37%) and central data management teams (also 37%).
This demonstrates the need for departments to work closely together – each has different skills and approaches that together provide the complete solution for a business and help it to achieve its overall objectives.
Data quality is still an issue
Poor-quality data hits the bottom line, and survey respondents recognise this – they estimated the average cost to the business of poor-quality customer data to be around 6% of annual revenue. For major brands this is measured in millions of pounds – and this excludes any potential GDPR fines.
So what leads to poor-quality data? Respondents saw basic errors as the main culprits, specifically out-of-date information and incomplete data. Increasing automation around validation helps overcome this – but a significant minority (19%) of survey respondents said they didn’t validate website data, and 16% didn’t check data coming into internal systems at all. A similar gap is visible when it comes to data cleansing. While 22% of companies undertake this daily or continuously, one-third (33%) still have no formal processes in place to clean customer contact data. Overall, many businesses are putting themselves at risk of data-quality issues – and potential GDPR investigations over non-compliance.
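The automated validation mentioned above can start very simply. The sketch below is an illustration only, not the checks used in the survey or by any particular product: the required fields, the one-year staleness threshold and the `last_verified` field are all assumptions. It flags the two culprits respondents named, incomplete and out-of-date records, plus one basic format error.

```python
from datetime import date

# Assumed rules, chosen for illustration only.
REQUIRED_FIELDS = ("name", "email", "postcode")
MAX_AGE_DAYS = 365  # treat records not verified within a year as stale


def validate_record(record, today):
    """Return a list of data-quality problems found in one customer record."""
    problems = []

    # Incomplete data: a required field is absent or blank.
    for field in REQUIRED_FIELDS:
        if not str(record.get(field) or "").strip():
            problems.append(f"missing {field}")

    # Basic error: an email address with no "@" cannot be valid.
    email = str(record.get("email") or "")
    if email and "@" not in email:
        problems.append("malformed email")

    # Out-of-date information: never verified, or verified too long ago.
    last_verified = record.get("last_verified")
    if last_verified is None or (today - last_verified).days > MAX_AGE_DAYS:
        problems.append("out of date")

    return problems


today = date(2018, 5, 25)
good = {"name": "A Customer", "email": "a@example.com",
        "postcode": "AB1 2CD", "last_verified": date(2018, 1, 10)}
bad = {"name": "", "email": "nobody",
       "postcode": "AB1 2CD", "last_verified": date(2016, 1, 1)}

print(validate_record(good, today))
print(validate_record(bad, today))
```

Running a check like this at the point of capture, and again on a schedule, covers both the validation and the cleansing gaps the survey highlights.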
The Royal Mail Data Services research demonstrates that GDPR is acting as a wake-up call to organisations, providing an opportunity to focus on how they collect, manage and store customer data. Successfully achieving compliance and getting the best out of customer data therefore requires IT and other departments to work together, now and in the future.
You can download a full copy of the research report, “The use and management of customer data”, here.
This is a guest blogpost by Neo4j’s CEO Emil Eifrem, in which he says graph databases are about to grow up.
Graph technology has come a long way: from financial fraud detection in the Panama and Paradise papers to contextual search and information retrieval in NASA’s knowledge graph and its support for true conversational ecommerce in eBay’s ShopBot.
What propels this success is graph’s unique focus on data relationships. And we’ve witnessed the value of connected data explode as businesses look to drive innovation by connecting supply chains, IoT devices, marketing technology, logistics and payment history, with the value of connectedness across all those data elements increasing exponentially.
But only a decade ago, the graph industry was just Neo4j and a few niche players. In the subsequent years other startups made their entrance as part of the NoSQL revolution, while more recently tech giants such as Oracle, Microsoft, SAP and IBM have each produced graph offerings of their own. Today the graph paradigm offers choices – with native platforms, NoSQL multi-model containers and embedded-in-SQL variants.
Amazon’s long-standing absence from this list of tech behemoths was always a notable irony, given that its business models, in both ecommerce and the data centre, are so graph-influenced. So the recent launch of Amazon Neptune is a welcome progression, marking the full acceptance of graph software into the mainstream. Amazon’s entrance should be welcomed by the graph database market, as it will drive the growth generally and contribute to graph technology’s commercial success.
As with all markets, more competition and choice mean a stronger market and better products. Ultimately, customers will benefit.
Still in the graph database kindergarten
Now that all of the major database players are in the graph game, the next phase of the market’s development will be all about solutions – though it’s evident that we are only at the beginning of this journey. Graph platforms will likely become foundational elements of corporate technology stacks, interweaving different types of data sources, applying comprehensive graph analytics, deploying easy-to-use graph visualisation tools and constructing purpose-built graph-based applications, which will speed widespread adoption.
Creating the graph ‘SQL’
Second, to achieve widescale adoption, the market needs a standard graph query language analogous to SQL that is simple as well as easy to learn and implement.
I believe Cypher will become this standard, because in addition to years of real-world validation it has by far the widest adoption among actual graph end users.
Cypher is overseen by the openCypher project, whose governance model is open to the community; it now has over 30 participants including academics, vendors and enthusiasts. To date, Cypher is used by Neo4j, SAP HANA, Redis Graph and Splunk, and the project has released Cypher for Apache Spark and Gremlin. Amazon is interesting, having hedged bets on two older languages; its decision here may well have an influence.
The graph community is growing
Finally, along with this commercial success comes a growing interest in graph skills and awareness. The community needs to ensure that every developer, data scientist, data architect and even business analyst is skilled in graph technology.
2017 was a massive year for graphs. More entrants into the graph community means 2018 will be even bigger.
Philip Hammond’s Spring statement, as UK chancellor, reached, predictably, for the rhetoric of the so-called fourth industrial revolution.
Not for the first time. Whenever he gets the chance to say the UK is in the forefront of artificial intelligence, big data analytics, and so on, and so forth, he takes it. He might be taking his “spreadsheet Phil” moniker a bit too seriously.
This nationalistic appropriation of AI/machine learning functions as a fig leaf for Brexodus, it almost goes without saying. “Don’t worry about Brexit, we’ve got the AIs and the hashtags to keep us warm”, is the gist of government patter here, whether from Hammond or Amber Rudd, home secretary. How much any of them know about technology is anyone’s guess.
Hammond seems to believe Matt Hancock, secretary of state for culture, media and (also) sport, is himself a product of the software industry — of which he is, admittedly, a scion. This is Hammond, speaking in the House of Commons this week:
“Our companies are in the vanguard of the technological revolution.
And our tech sector is attracting skills and capital from the four corners of the earth.
With a new tech business being founded somewhere in the UK every hour.
Producing world-class products including apps like TransferWise, CityMapper,
And Matt Hancock.”
Hilarious. And Theresa May, the prime minister, is always keen to get in on the 4IR act. Her speech in Davos, to a half-empty hall, was long on technology rhetoric, and short on detail about what the global elite are interested in – viz Brexit.
Now, there is no denying the UK does have some unusual strengths in AI, at least in terms of academic research, and the start-ups therefrom. One can only wonder at the world-class work undoubtedly going on at GCHQ under the AI banner. The UK must, surely, have an advantage to squander?
Hopefully, the forthcoming House of Lords Select Committee report on artificial intelligence will provide a balanced, cool, rational, non-flag-waving description of the state of the art in the UK, and offer some policy that will make a positive difference to our economy. But it will only do so if it takes the measure of some of the AI scepticism expressed in the committee’s hearings towards the end of last year, and appreciates that there are different sides in the debate on AI among people who know what they are talking about. It’s not all Tiggerish enthusiasm, whether nescient or not.
This is a guest blogpost by Luiz Aguiar, data scientist at GoCompare.
We produce a massive amount of data every day.
Not only that, our attitudes towards the data we produce are also changing. We’re becoming more comfortable sharing the data we produce with apps, businesses, and other entities, if it means getting better services.
Most of us are happy for companies like Google, Amazon or Netflix to know our preferences to better tailor the content we are served, or recommend the things we want to buy. We’re even inviting these companies into our homes by embracing AI systems, like Alexa, Google Home or Siri to make our lives easier, by using the data we provide them.
So if we produce data at exponential speed and are happy to share it to get tailored services, why aren’t more companies taking advantage of this? Why do so many still rely solely on traditional market research and guesswork?
The key problem is that the sheer amount of data available means it’s hard for companies to analyse it effectively. It would take forever for a person to analyse all the data we provide and get some insight from it, let alone design better services as a result.
The problem of unstructured data
The sheer volume of information is not the only problem for analysts: the majority of this data is unstructured, making it incredibly hard to classify and compare.
That’s because the information we produce is not in the right format or shape, or requires some enrichment.
As an example, imagine you are in a restaurant deciding what to order. The likelihood is you’ll look through the menu and choose one of the options based on the information available – this is structured data.
In comparison, unstructured data would be like sitting down to a list of every single raw ingredient and cooking utensil available in the kitchen, then having to piece it all together to figure out what you want. All the information is there – but just not in an easily accessible way.
Obviously, the first option is the easier one to process; the second would be too daunting and complex for a person to analyse and make a quick decision – and this is where machine learning can help.
Machine learning runs information through a series of algorithms that classify and group data, then uses the results to find patterns and predict future behaviours, all on an enormous scale. In short, machine learning techniques can extract insights hidden deep inside your data that would otherwise be impossible to detect.
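To make the group-then-predict idea concrete, here is a minimal sketch in Python. It is an illustration only, not GoCompare’s method: it assumes a simple k-means grouping, the customer figures are made up, and the helper names `assign` and `k_means` are hypothetical.

```python
from math import dist
from statistics import mean

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        for p in points
    ]

def k_means(points, centroids, rounds=10):
    """Group points by repeatedly assigning them and averaging each group."""
    for _ in range(rounds):
        labels = assign(points, centroids)
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            # Keep the old centroid if no point was assigned to it.
            updated.append(
                tuple(mean(axis) for axis in zip(*members)) if members else old
            )
        centroids = updated
    return centroids

# Made-up toy data: (average spend in pounds, visits per month) per customer.
customers = [(8, 12), (10, 10), (9, 11), (45, 1), (50, 2), (48, 1)]

# Step 1: group the existing data (starting centroids are chosen by hand here).
centroids = k_means(customers, centroids=[customers[0], customers[3]])

# Step 2: use the groups to predict where a new, unseen customer belongs.
new_customer = (12, 9)
group = assign([new_customer], centroids)[0]
```

Real systems do the same thing with far more data and far more dimensions, which is where the scale mentioned above comes in.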
Thinking back to our restaurant example, while a person might struggle to sift through the unstructured data for just one establishment, a well-trained AI could do this for any restaurant in the country, or even the world.
Then, using other information about you, it could make an informed decision about what you should eat, when you should eat and where you should eat – giving you the best possible experience, without you even having to think about it.
And that’s just one example. Algorithms such as artificial neural networks, which try to mimic the functions of a biological neural network, are very powerful in pattern recognition and image classification. They have the potential to do a better job than humans at recognising stock market trends, house prices, insurance costs, medical diagnoses, you name it. The possibilities are almost endless.
This is why you should care about machine learning, and why over the next few years machine learning and AI won’t just be the buzzword that everyone is talking about, but will be the fundamental difference between successful tech companies and those that get left behind.
GoCompare has opened access to its APIs to other fintech organisations through a new community development, Machine Learning for Fintech. For more information, or to apply for a developer token, go to https://www.communityapis.com/
Originally from Rio de Janeiro, Luiz completed an MSc in Computer Science Optimisation and Machine Learning at the Pontifical Catholic University of Rio de Janeiro.
Luiz moved to England in July 2015 and worked for Formisimo as lead data scientist on the Nudgr project and Perform Group as a Data Scientist, before joining the Data Science team at GoCompare.
This is a guest blogpost by Dave Wells, practice director, data management at Eckerson Group.
If there’s one thing the IT industry is exceptionally good at, it’s proclaiming the death of a particular technology. In the mid-1980s industry observers sagely pronounced that COBOL was dead. Fast forward to today and COBOL is still playing a role in healthcare for 60 million patients daily, 95% of ATM transactions, and more than 100 million lines of code at the IRS and Social Security Administration alone. I can’t help but recall Mark Twain’s famous quote, ‘the reports of my death have been greatly exaggerated!’
It’s not only COBOL that people want to consign to history. In 2013 SQL was declared dead, yet thousands of SQL job postings can be found on the web today. Just recently I heard that popular programming language Ruby was on its last legs. And then we have the data warehouse: over the last few years, there’s been a steady stream of obituaries announcing that the data warehouse was about to be consigned to the technology graveyard. But when surveys such as that conducted by Dimensional Research show that 99% of respondents see their data warehouse as important for business operations and 70% are increasing their investment in data warehousing, it appears the data warehouse remains very much alive.
But here’s the issue: while the data warehouse is alive, it also faces many challenges today. The root of the “data warehouse is dying” claim comes from the opinion that it hasn’t ever completely delivered on its promised value. The original vision was a seductive one – got a ton of data but no way to leverage it? No problem. Put it in a data warehouse and you’ll be extracting valuable insights to drive competitive edge in hours. Except, you couldn’t. Companies found that, using traditional and very manual tools and processes, building and managing data warehouses wasn’t quite as easy as promised. Once built, the data warehouses typically didn’t scale well, weren’t particularly agile or easy to rely on (due to performance variability), and, later on as needs evolved, weren’t particularly well equipped for coping with the challenges of big data.
Data warehousing in the cloud
But, but, but…. The very fact that so many companies have clung doggedly to their (imperfect) data warehouse tells us that they are extracting some value. It’s just that it could be so much more. Enter the data warehouse of the cloud computing age. By migrating to the cloud, some classic data warehouse challenges disappear. Can’t scale or be agile in providing data quickly to those who need it? The cloud data warehouse changes that. Need to deploy rapidly but also dial up (and down) investment? The cloud data warehouse allows you to do that. And if you’re faced with the argument that the cloud erodes confidence in data governance and compromises the reliability of the data warehouse, well, there’s an answer to that too.
However, if we’re to constructively stem the expert proclamations of data warehouse demise, we must re-evaluate the original simplistic expectations of data warehousing as a one-size-fits-all, never-evolving data infrastructure model for every organisation to reach its best use of data. Data warehousing must be fluid as organisational needs change and new data technologies and opportunities arise. And to accomplish that, we need to modernise how IT teams design, develop, deploy and operate data infrastructure. Expensive, redundant, laborious and time-intensive efforts intertwined with the use of traditional, non-automated approaches have limited organisational value greatly and cast a heavy cloud over data warehousing. However, organisations using automation software, such as Wherescape’s, to develop and operate data warehouses are providing far-reaching value to business leaders at greater speed and lower cost, while at the same time positioning IT to incorporate timely technologies and new data sources more easily, and to flex as business needs demand. With these adjustments, the reality of the data warehouse can better live up to the associated vision, and continue to deliver much more to organisations for many years to come.
This is a guest blogpost by Matt Jones, lead analytics strategist at Tessella, in which he argues companies with physical products and infrastructure cannot simply cut and paste the tech giants’ AI strategy.
Much written about AI seems to assume everyone wants to emulate Google, Facebook, or other companies built around data.
But many organisations look nothing like these tech giants. Companies in manufacturing, energy, and engineering – long-standing, multi-billion-pound industries – derive revenues from physical products and infrastructure, not from targeting adverts at groups or individuals. Their data is usually collected from industrial machines and R&D processes, not people and internet spending habits. Their data collection is often bolted onto decades-old, long-lived internal processes, not built in by design.
This type of data will deliver insights such as whether a factory can operate safely, or predictions of the active properties of a new drug-like molecule; not whether clicks turn into sales. This is very different from the insights that companies like Google are generating and looking at, and these pre-digital companies must take a very different approach to deriving benefit from AI.
CIOs at these companies can learn from the tech giants, but trying to cut and paste their approach is a route to AI failure. Based on our work with companies built in the pre-digital age, we at Tessella recently produced a white paper outlining 11 steps that these pre-digital companies must take if they are to drive growth and stay competitive with AI. Broadly, these steps fall into three categories: building trust into AI, finding the right skills, and building momentum for AI programme delivery.
Trust is important
A key difference between the digital native companies and pre-digital enterprises is that the latter are often looking for very specific insights. Digital companies can afford to experiment and accommodate imprecision; a badly targeted advert will do little harm. But an AI designed to spot when a plane engine or off-shore oil rig subsurface structure might fail demands absolute certainty.
Pre-digital companies cannot simply let an AI loose on all their data and see what patterns emerge. Such unsupervised training experiments may provide estimations or suggestions, but they cannot be depended upon to inform an empirical solution. In these high-risk cases, there is a greater need to find the right data in order to effectively train AIs in a supervised learning regime.
Too many companies start by trying to pool all their data, perhaps looking admiringly at what Facebook and Amazon can do. For most, this is costly and unnecessary, at least in the short term. Companies should start by defining the problems AI can solve, then identify the data needed to solve each problem, put people, technology and processes in place to collect and tag that data, and turn it into AI training data.
As AI is developed, there is also a need to maintain oversight to ensure the AI is delivering trustworthy results. Basic AI governance in high-risk situations must include random sampling of AI outcomes and checking them against human experts for accuracy.
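As a rough illustration of that kind of spot check, the Python sketch below draws a random sample of AI outcomes and compares each with an expert’s verdict. It is a sketch under assumptions rather than a prescribed governance process: the function name `audit_sample`, the labels and the example data are all hypothetical.

```python
import random

def audit_sample(outcomes, expert_label, sample_size, seed=None):
    """Randomly sample AI outcomes and compare each with an expert's verdict.

    outcomes: list of (item_id, ai_label) pairs.
    expert_label: callable returning the expert's label for an item_id.
    Returns (agreement_rate, disagreements).
    """
    if not outcomes or sample_size < 1:
        raise ValueError("need at least one outcome and a positive sample size")
    rng = random.Random(seed)  # a fixed seed makes the audit repeatable
    sample = rng.sample(outcomes, min(sample_size, len(outcomes)))
    disagreements = [
        (item_id, ai_label)
        for item_id, ai_label in sample
        if expert_label(item_id) != ai_label
    ]
    agreement_rate = (len(sample) - len(disagreements)) / len(sample)
    return agreement_rate, disagreements

# Made-up example: the AI marks engine parts as "ok" or "inspect".
ai_outcomes = [
    ("p1", "ok"), ("p2", "ok"), ("p3", "inspect"), ("p4", "ok"), ("p5", "inspect"),
]
expert_verdicts = {
    "p1": "ok", "p2": "inspect", "p3": "inspect", "p4": "ok", "p5": "inspect",
}

rate, flagged = audit_sample(ai_outcomes, expert_verdicts.get, sample_size=5, seed=1)
```

In practice the sample would be much smaller than the full set of outcomes, and any disagreements would go back to the domain experts for review.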
Finally, AI interaction, the user experience, must be intuitive, or it will not be taken up. AI decision support must take advantage of data visualisation and search technologies to ensure results are presented in meaningful ways. We can learn from digital native companies here, who are experts at making things easy for users: Google Photos runs neural networks, image analysis, and natural language understanding, but all the user needs to master is a search bar.
People not platforms
The temptation can be to completely hand over the problem to so-called data experts, or to buy in expensive technology platforms. But this misses an important point: that AI isn’t about spotting patterns, it’s about understanding what those patterns mean.
AI needs people who understand that data represents something in the real world – material strain, temperature readout, chemical reactions, maintenance schedules – and who can put together effective training regimes. AI should therefore be designed by people who understand the underlying data and what it represents within this business context. The best teams include representatives from IT, operations and business teams, domain experts partnered with embedded AI and data analytics experts who not only possess technical expertise but can also translate between these different roles.
We can again learn from the digital native companies. It is notable that these companies spend their budgets hiring the best people to design AIs which are right for them, not on buying in off-the-shelf technologies. Whilst the pre-digital companies will need different skill sets and more specific industry understanding in their AI teams, the focus must remain upon finding the right skills. This is the key to AI success, regardless of industry.
The digital native companies started from scratch and created the digital world, which they went on to lead. Longer established companies do not have this luxury – they come with decades of development in a pre-digital world, which has now been upturned and potentially disrupted. Many of their staff and processes are not ready for this new data-driven world. They cannot just switch overnight, however ambitious their CIOs might be.
Such companies should set long-term goals of digitalising processes and identifying where they see AI automating and advising. But they must work towards these goals determinedly and transparently, keeping their people informed and engaged with the digital transformation, gradually shifting the business model and bringing existing staff with them on the journey. Starting too big without a carefully planned digital roadmap often undermines effectiveness and impact.
Pre-digital companies should initially focus on well-understood opportunities that can be executed quickly, with clear measurable milestones to demonstrate success built into their roadmap. This should be accelerated by running multiple agile AI projects in parallel, ensuring the best ideas are progressed rapidly. This will build a critical momentum for AI change programmes.
As they go, they should monitor their many AI projects, checking relative performance of each, immediately abandoning the bad ideas, and using successes (and failures) to improve training regimes. This agility is how digital companies deliver innovation but is lacking in many pre-digital organisations.
To summarise: physical enterprises undergoing digital transformation can and must harness the disruptive potential of AI. If they don’t, they will quickly be outpaced by competitors, startups or even tech giants with an eye on expansion. They start from very different positions to digital native companies. If they want AI to deliver business impact, they must mindfully find their own approach to people, processes, technology and management and form close, strategic partnerships with those that will build momentum behind an AI enabled digital transformation.