It’s indisputable that technology is displacing many of today’s jobs. The question is: what should, or can, we do about it? This series explores the possible consequences of this shift and how information use and decision making support can, and should, drive a better outcome through enhanced and expanded Business Intelligence. Part 3 is dystopian; I apologize in advance.
“The massive forces of globalization and technological progress are removing the need for a lot of the previous kind of white-collar workers,” according to Andrew McAfee of the Center for Digital Business at the M.I.T. Sloan School of Management in a recent New York Times article. It’s just the logical outcome of the trends described in Part 1 and Part 2 of this series. The outcome is the increasing technological displacement of traditional middle and lower-middle job types, combined with continuing downward pressure on wages for these jobs. In response, job seekers are forced to accept lower-skilled, lower-paid jobs, often as gig work without long-term security, and to hold down multiple jobs to make ends meet. Lower incomes and less leisure time will drive down consumption of mass-produced goods, and cause producers to cut costs further, driving further unemployment. A classic race to the bottom. This simple analysis applies to Western economies. In developing economies, further factors come into play, which deserve deeper consideration, but the end result would appear to be largely the same. The economic impacts are severe. In fact, the economic model—as we have operated it for more than two centuries—becomes untenable.
This is not at all about the allegedly coming Singularity; it’s simply about the progression of technology. When technology displaces some yet-to-be-determined percentage of labor, this system becomes unbalanced: there are simply not enough people with sufficient money to buy the products made, no matter how cheaply. We have not yet reached this tipping point because, throughout most of the past two hundred years, the new jobs created by technology have largely offset the losses. However, recent employment trends in the Western world suggest that this offsetting effect is weakening.
The subsequent societal disruption can be imagined to be catastrophic. Increasing inequality, already visible today, drives social unrest. Strikes, both legal and “wildcat”, become widespread. Many people, whose sense of identity and self-worth is tied to a productive job, drop out, abuse drugs and self-destruct. Violent protest against technologically driven change, already visible around Uber, grows by leaps and bounds. Economic migration, within and across national borders, in search of sustenance becomes endemic. Ghettoization of society ensues: vast sprawling near-shanty towns house the disempowered, while the dwindling numbers of the elite retreat into gated communities behind high walls, razor wire and armed patrols.
The outcome may not (yet) reach the “Mad Max” scenario, but a visitor to Brazil or South Africa—to name but two of many examples—can immediately get an idea of how such a dystopian society can emerge, and is already doing so. Crime becomes a way of life, corruption abounds and society disintegrates. I believe that the “head in the sand” stance being taken by many economists and most politicians today is most likely to end in a dystopian nightmare.
Whither BI in such an environment? With marketing, customer service and even worker productivity becoming memories of a bygone era, the role of BI must inevitably move to the maintenance of wealth and power for those who have them. While Mad Max may focus on mean machines built from scrap automobiles—and they make for more visceral movies—the dispossessed will continue to hack communications and computer security, making the role of data analytics as a defense even more important. But it’s a restricted and increasingly inwardly-focused BI in dystopia.
For a technologist or data management expert following this series, I imagine that the head-in-the-sand and dystopian stances make, at best, distressing reading. Must it end like this? Is there anything that we can do to avoid the Fall? I believe there is. There is a better way, and as I shall demonstrate in the fourth and final part of this series, business intelligence, big data and analytics will be important enablers of the utopian stance.
It’s indisputable that technology is displacing many of today’s jobs. The question is: what should, or can, we do about it? This series explores the possible consequences of this shift and how information use and decision making support can, and should, drive a better outcome through enhanced and expanded Business Intelligence. Part 2 looks at the head in the sand reaction.
In Part 1 of this series, I introduced the three common stances that are taken when confronted with the issue of technological unemployment. Let’s take a deeper look at the first of them now.
Head in the sand
Many mainstream technologists and economists suggest that the jobs market is simply going through a period of re-adjustment—albeit a rather large and painful one—as new technology is adopted. This opinion seems founded mostly on the basis that in previous technology revolutions, such as the move from agriculture to industry in the 1800s and the move from industry to services still ongoing, new jobs have always been created to replace those displaced. Of course, the above timeframes apply to Western economies; emerging economies are at different stages in these transitions. The proposed solutions center on improved and ongoing education, as well as skills diversification. The underlying premise is that there exist, or will soon be created, jobs where robots or algorithms cannot perform better, faster and/or especially cheaper than humans.
The history of predictions of what automated software/hardware solutions cannot do gives little confidence, however. To give but one example, in “The New Division of Labor”, in 2004, the authors describe how driving an automobile requires such complex, instantaneous decisions and actions that it would be extremely difficult for a computer ever to handle it; Google debuted its autonomous car within six years. To be fair to the authors, few people actually get the consequences of the exponential growth rate in computing power that doubles every two years or so. Today’s computers are some 30-40 times more powerful and considerably more cost effective than those of 2004. Whether driving cars or analyzing images for cancerous cells, picking goods from warehouse shelves or making evidence-based recommendations or predictions, technology is displacing an ever-increasing number of previously human activities. My recent TechCrunch article gives some idea of the numbers: they’re not pretty.
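As a rough check on the 30-40 times estimate, here is a minimal sketch of the doubling arithmetic. The ten-year span and the exact two-year doubling period are assumptions chosen for illustration, and the function name is hypothetical.

```python
def growth_factor(years: float, doubling_period: float = 2.0) -> float:
    """Return how many times capacity multiplies over `years`
    if it doubles once every `doubling_period` years."""
    if doubling_period <= 0:
        raise ValueError("doubling_period must be positive")
    return 2.0 ** (years / doubling_period)


# Assumed span: ten years of growth, doubling every two years.
print(growth_factor(10))  # 2 ** 5 = 32.0
```

A slightly longer span, or a slightly shorter doubling period, pushes the factor higher, which is why a range rather than a single number is the honest way to state it.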
On the plus side, new job types are indeed being created. However, their numbers seem small in comparison to those being displaced. A brief review of the possible top jobs in the next ten years, including sex workers (!), from three leading futurists does little to convince that the jobs envisaged will replace the roughly 4 million driving and support jobs threatened by autonomous cars and trucks.
A recent Fortune article offers the more hopeful view that jobs demanding human accountability, collaborative decision making and interpersonal skills will both be in demand and resistant to automation. I will return to this possibility, in conjunction with “real” BI (actually, Business unIntelligence), as key aspects of the (somewhat) utopian stance. However, from a more contrary viewpoint, we also see robotics aimed at displacing roles that demand human empathy and interaction. The US National Science Foundation (NSF) is spending roughly $1.2 million to fund research on how robots could dress the elderly. Meanwhile, SoftBank has created Pepper, “a social robot able to converse with you, recognize and react to your emotions, move and live autonomously”—seriously!
In the head in the sand stance, business intelligence (BI) plays the traditional role for which it is widely criticized in many businesses: as a means of justifying, and reporting on, the status quo. There are always facts and figures to be found and trends to be discovered that justify any viewpoint, especially a mainstream, entrenched view. And, who better to do that than those with their heads in the sand and a deep attachment to the mechanistic, overly rational decision making approaches of the past? In these circumstances, BI definitely makes a meaningful contribution for those involved, but it offers nothing to the understanding or solution of the real issue involved here. Namely, from where will the new sources of income emerge that keep the old consumerist wheel turning?
In Part 3 of this series, I address one possible outcome of the wheel seizing up: the dystopian stance where the economy crashes and burns.
It’s indisputable that technology is displacing many of today’s jobs. The question is: what should, or can, we do about it? This series explores the possible consequences of this shift and how information use and decision making support can, and should, drive a better outcome through enhanced and expanded Business Intelligence.
I’ve written occasionally, and at length in a Feb-Mar 2014 series, on the impact of technology advances on employment. My basic thesis was—and is—as follows. Mass production and competition, facilitated by ever improving technology, have been delivering better and cheaper products and improving many people’s lives (at least in the developed world) for nearly two centuries. Capital, in the form of technology, and labor, in the form of people, have worked together relatively well in the consumer society to produce goods that people purchase largely using earnings from their labor. Until now…
As technology grows exponentially better, the return on capital investment in automation technology is improving significantly in comparison to return on investment in labor. The primary goal of the capitalist model is to maximize return on investment. As a result, an ever greater range of jobs becomes open to displacement by technology. To me, at least, the above logic is largely unarguable. For example, driverless vehicles, from trucks to automobiles, are set to eliminate some 4 million jobs in the US alone. Any complacency that only manual/physical jobs will be displaced by automation is erroneous; many administrative and professional roles are already being outsourced to rapidly improving software solutions. Across the entire gamut of industries and job roles, technology—both hardware and software; and, increasingly, a combination of both—is proving better and/or faster than human labor, and is indisputably cheaper, particularly in developed consumer economies.
What are the possible outcomes from such a dramatic shift in the relative roles and importance of capital (technology) and labor (people)? Let’s keep it simple and restrict the discussion to three main stances that I’ll introduce briefly here, but consider later in depth:
- Head in the sand: the belief of many mainstream technologists and economists that we’re simply going through an adjustment period, after which “normal service will resume” in the market
- Dystopian: the story that our economic and social system is so deeply embedded and increasingly fragile that the shock of such change will lead to a rapid descent to a “Mad Max” world order
- (Somewhat) Utopian: the possibility that we can create a better world for everyone through automation and the transformation of our current economic and social paradigms
Of course, my preference is for option three above! But, how might it work and how would we get there? I believe that judicious application of many of the principles and approaches of Business Intelligence (BI), data warehousing and big data governance, in the broadest sense of the concepts, will play a vital role in the new world, and particularly in the transition to it. BI et al. is fundamentally about how decisions are made and how the people who make them can be supported. And business includes the business of government. In the old, narrow sense, BI meant simply providing data from internal systems to decision makers. In the widest sense, which I call “Business unIntelligence”, it encompasses the full scope of such decision making support, from the ingestion and contextualization of all real-world information to the psychological and sociological aspects involved in real humans making optimal decisions. Decisions that increasingly need to go beyond the bottom line of profit.
As of now, I’m not clear where this discussion will take us. But I’d love to incorporate your views and comments. In the next post, I’ll explore the above-mentioned possible stances on the effects of technological unemployment.
Part 2 tackles the head in the sand stance.
The need to clarify the context of information is becoming vital as big data and the Internet of Things become ever more important sources in today’s biz-tech ecosystem.
Suddenly, it seems, it’s almost three months since my last blog entry. My apologies to readers: it’s been a busy time with consulting work, slide preparation for a number of upcoming events in Munich, Rome and Singapore over the coming weeks, and a revamp of my website with a cleaner, fresher look and a mobile friendly layout.
I pick up on a topic that’s close to my heart: the discovery and creation of context around information, triggered by last week’s BBBT appearance of a new startup, Alation, specializing in this very area. It’s a hot topic at present with a variety of new companies and acquisitions making the news over the past 6 to 12 months.
For a number of years now, the IT industry has been besotted with big data. The trend is set to continue as the Internet of Things offers an ever expanding set of bright, shiny, data-producing baubles. The increasing use of data, in real time and at high volumes, is driving a biz-tech ecosystem where business value and competition depend entirely on the effective use of IT. What the press often misses—and many of the vendors and analysts too—is that such data is meaningless and, thus, close to useless unless its context can be determined or created. Some point to metadata as the solution. However, as I’ve explored at length in my book, “Business unIntelligence”, metadata is really too small a word to cover the topic. I prefer to call it context setting information (CSI), because it’s information rather than data, its role is simply to set the context of other information, and, ultimately, it is indistinguishable from business information—one man’s business information is another woman’s CSI. In order to describe the full extent of context setting information, I introduced m³, the modern meaning model, that relates information to knowledge and meaning, as shown above. A complete explanation of this model is beyond the scope of this blog, so let’s return to Alation and what’s interesting about the product.
Alation CEO, Satyen Sangani, @satyx, posed the question of what it means to be data literate. At a basic level, this is about knowing what a field means, what a table contains or how a column is calculated. Pressing a little further, questions about the source and currency of data, in essence its quality, arise. Social aspects of its use, such as how often it has been used and who uses it for what, complete the picture. Understanding this level of context about data is a vital prerequisite for its meaningful use within the business.
When dealing with externally sourced data, where precise meanings of fields or calculations of values are unreliable or unavailable, the social and quality aspects of CSI become particularly important. It is often pointed out that data scientists can spend up to 80% of their time “wrangling” big data (see my last blog on Trifacta). However, what is often missed is that this 80% may be repeated again and again by different data scientists at different times on the same data, because the results of prior thinking and analysis are not easily available for reuse. To address this, Alation goes beyond gathering metadata like schemas and comments from databases and data stores to analyzing documentation from wikis to source code, gathering query and usage data, and linking it all to the identity of people who have created or used the data. Making this CSI available in a collaborative fashion to analysts, stewards and IT enables use cases from discovery and analytics to data optimization and governance.
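To make the idea concrete, here is a minimal sketch of how meaning, source and usage context might be linked to people for a single column, so that prior work can be found rather than repeated. The class, field and user names are hypothetical and do not reflect Alation's actual data model.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ColumnContext:
    """Hypothetical context setting information (CSI) for one column."""
    table: str
    column: str
    description: str  # what the field means
    source: str       # where the data came from
    queries: List[Tuple[str, str]] = field(default_factory=list)  # (user, purpose)

    def log_query(self, user: str, purpose: str) -> None:
        """Record that `user` queried this column for `purpose`."""
        self.queries.append((user, purpose))

    def usage_by_user(self) -> Counter:
        """Count how often each person has used the column."""
        return Counter(user for user, _ in self.queries)

    def prior_work(self, purpose: str) -> List[str]:
        """List, in first-seen order, who has used the column for `purpose`."""
        seen: List[str] = []
        for user, logged_purpose in self.queries:
            if logged_purpose == purpose and user not in seen:
                seen.append(user)
        return seen


ctx = ColumnContext("orders", "net_amount",
                    "Order value after discounts, before tax",
                    "external partner feed")
ctx.log_query("alice", "churn analysis")
ctx.log_query("bob", "revenue forecast")
ctx.log_query("alice", "churn analysis")

print(ctx.usage_by_user())               # alice twice, bob once
print(ctx.prior_work("churn analysis"))  # ['alice']
```

A second data scientist starting a churn analysis could look up `prior_work` first and ask alice what she learned, instead of repeating her wrangling.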
This broad market is red-hot at the moment and rightly so. Big data and the Internet of Things demand a level of context setting previously unheard of. I’ve previously mentioned products in this space, such as Waterline Data Science and Teradata Loom. A challenge they all face is how to define a market that does not carry the baggage of old failed or difficult initiatives such as metadata management, data governance or information quality. Don’t get me wrong, these are all vital initiatives; they have just received very bad press over the years. In addition, there is a strong need to move from perceived IT-centric approaches to something much more business driven. Might I suggest context setting information as a convenient and clarifying category?
When it comes to externally-sourced data, data scientists are left to pick up the pieces. New tools can help, but let’s also address the deeper issues.
Trifacta presented at the Boulder BI Brain Trust (#bbbt) last Friday, 13 March to a generally positive reaction from the members. In a sentence, @Trifacta offers a visual data preparation and cleansing tool for (typically) externally-sourced data to ease the burden on data scientists, as well as other power data users, who today can spend 80% of their time getting data ready for analysis. In this, the tool does a good job. The demo showed an array of intuitively invoked methods for splitting data out of fields, assessing the cleanliness of data within a set, correcting data errors, and so on. As the user interacts with the data, Trifacta suggests possible cleansing approaches, based on both common algorithms and what the user has previously done when cleaning such data. The user’s choices are recorded as transformation scripts that preserve the lineage of what has been done and that can be reused. Users start with a sample of data to explore and prove their cleansing needs, with the scaled-up transformations running on Hadoop within a monitoring and feedback loop.
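The idea of recording a user's cleansing choices as a reusable script with lineage can be sketched as follows. This is an illustration only, not Trifacta's API: the `TransformScript` class, the step names and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


@dataclass
class TransformScript:
    """Hypothetical recorder: each cleansing choice becomes a named, replayable step."""
    steps: List[Tuple[str, Callable[[Row], Row]]] = field(default_factory=list)

    def record(self, name: str, func: Callable[[Row], Row]) -> None:
        """Append a named transformation to the script."""
        self.steps.append((name, func))

    def run(self, rows: List[Row]) -> List[Row]:
        """Apply every recorded step, in order, to a copy of each row."""
        result = []
        for row in rows:
            new_row = dict(row)
            for _, func in self.steps:
                new_row = func(new_row)
            result.append(new_row)
        return result

    def lineage(self) -> List[str]:
        """Return the ordered list of step names that were applied."""
        return [name for name, _ in self.steps]


def split_full_name(row: Row) -> Row:
    """Split 'full_name' into first and last name at the first space."""
    first, _, last = row["full_name"].partition(" ")
    out = {k: v for k, v in row.items() if k != "full_name"}
    out["first_name"] = first
    out["last_name"] = last
    return out


def normalize_city(row: Row) -> Row:
    """Trim whitespace and fix capitalization in 'city'."""
    return {**row, "city": row["city"].strip().title()}


# Explore and prove the cleansing steps on a small sample first.
script = TransformScript()
script.record("split full_name", split_full_name)
script.record("normalize city", normalize_city)

sample = [{"full_name": "Ada Lovelace", "city": "  london "}]
cleaned = script.run(sample)
print(cleaned)
print(script.lineage())
```

The same `script` object could then be run against the full data set, and `lineage()` gives whoever moves the work into production a record of what was done and in what order.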
This is clearly a useful tool for the data scientist and power user that tackles a persistent bottleneck in the journey from data to insight. It also prompts discussion on the process that should exist around the ingestion and use of external data.
There is a persistent desire to reduce the percentage (to zero if possible!) of time spent by data scientists in preparing and cleansing data. Yet, if we accept that such practitioners are indeed scientists, we should recognize that in “real” science, most of the effort goes into experiment design, construction and data gathering/preparation; the statistical validity and longer term success of scientific work depends on this upfront work. Should it be different with data scientists? I believe not. The science resides in the work of experimentation and preparation. Of course, easing the effort involved and automating reuse is always valid, so Trifacta is a useful tool. But, we should not be fooled that the oft quoted 80% can or should be reduced to even 50% in real data science cases. And among power users, their exploration of data is also, to some degree, scientific research. Preparation and discovery are iterative and interdependent processes.
What is often further missed in the hype around analytics is that after science comes engineering: how to put into production the process and insights derived by the data scientists. While there is real value in the “ah-ha” moment when the unexpected but profitable correlation (or even better, in a scientific view, causation) is found, the longer term value can only be wrought by eliminating the data scientists and explorers, and automating the findings within the ongoing processes of the business. This requires reverting to all the old-fashioned procedures and processes of data governance and management, with the added challenge that the incoming data is—almost by definition—dirty, unreliable, changeable, and a list of other undesirable adjectives. The knowledge of preparation and cleansing built by the data scientists is key here, so Trifacta’s inclusion of lineage tracking is an important step towards this move to production.
Remember lastminute.com? How is this for their last word on personal data?
“Important information about your personal data
With effect from today, the lastminute.com business has been acquired by Bravofly Rumbo Group. As a result, your personal data has been transferred to LMnext UK Ltd (a member of the Bravofly Rumbo Group) registered in England and Wales with company registration number 9399258.
LMnext UK Ltd is committed to respect the confidentiality of your personal data and will process it fairly and lawfully and in accordance with applicable data protection law.
You are also reminded that you may exercise your rights of access, rectification or removal of your personal data from our database at any time by sending a written request to lastminute.com, Dukes Court, Duke Street, Woking, Surrey, GU21 5BH providing a copy of your ID.
Please do not hesitate to contact us if you have any queries
The team at lastminute.com and Bravofly Rumbo Group”
I assume that they know my name, since they are holding my personal data, but they can’t rise to a mail-merge process for customer relationship?
More irritatingly, they demand a physical instruction with a scan of my ID for a removal. Why? Is it because there is more interesting data about me to be scraped from said ID? Or is it just to discourage me from asking?
So, no, I won’t be asking for removal from their database. Nor will I ever do business with them or any company to whom they pass my data. This e-mail is symptomatic of the lack of respect in which many companies hold our personal data. In itself, it is not a big deal. But, taken in a broader context, it epitomises the old adage: caveat emptor or even caveat scriptor!
In building out its Internet of Things, is HDS acquiring a data refinery, a data lake or a data swamp? See also Part 1
The Data Lake has been filling up nicely since its 2010 introduction by James Dixon, with a number of vendors and analysts sailing forth on the concept. Its precise, architectural meaning has proven somewhat fluid, to continue the metaphor. I criticized it in an article in April last, struggling to find a firm basis for discussion of a concept that is so architecturally vague that it has already spawned multiple interpretations. Dixon commented in a September blog that I was mistaken and set forth that: “A single data lake houses data from one source. You can have multiple lakes, but that does not equal a data mart or data warehouse” and “A Data Lake is not a data warehouse housed in Hadoop. If you store data from many systems and join across them, you have a Water Garden, not a Data Lake.” This doesn’t clarify much for me, especially when read in conjunction with Dixon’s response to one of his commenters: “The fact that [Booz Allen Hamilton] are putting data from multiple data sources into what they call a ‘Data Lake’ is a minor change to the original definition.”
This “minor change” is actually one of the major problems I see from a data management viewpoint, and Dixon admits as much in his next couple of sentences. “But it leads to confusion about the model because not all of the data is necessarily equal when you do that, and metadata becomes much more of an issue. In practice these conceptual differences won’t make much, if any, impact when it comes to the implementation. If you have two data sources your architecture, technology, and capabilities probably won’t differ much whether you consider it to be one data lake or two.” In my opinion, this is the sort of weak-as-water architectural thinking about data that can drown implementers very quickly indeed. Apply it to the data swamp that is the Internet of Things, and I am convinced that you will end up on the Titanic. Given the obvious focus of HDS on the IoT, alarm bells are already ringing loudly indeed.
But there’s more. Recently, Dixon has gone further, suggesting that the Data Lake could become the foundation of a cleverly named “Union of the State”: a complete history of every event and change in data in every application running in the business, an “Enterprise Time Machine” that can recreate on demand the entire state of the business at any instant of the past. In my view, this concept has many philosophical misunderstandings, business misconceptions, and technical impracticalities. (For a much more comprehensive and compelling discussion of temporal data, I recommend Tom Johnston’s “Managing Time in Relational Databases: How to Design, Update and Query Temporal Data”, which actually applies far beyond relational databases.) However, within the context of the HDS acquisition, my concern is how to store, never mind manage, the entire historical data record of even that subset of the Internet of Things that would be of interest to Hitachi or one of its customers. To me, this would truly result in a data quagmire of unimaginable proportions and projects of such size and complexity that they would dwarf even the worst data warehouse or ERP project disasters we have seen.
To me, the Data Lake concept is vaguely defined and dangerous. I can accept its validity as a holding pond for the vast quantities of data that pour into the enterprise at high speed, with ill-defined and changeable structures, and often dubious quality. For immediate analysis and quick, but possibly dirty, decisions, a Data Lake could be ideal. Unfortunately, common perceptions of the Data Lake are that, in the longer term, all of the data in the organization could reside there in its original form and structure. This is, in my view, and in the view of Gartner analysts and Michael Stonebraker, to name but a few, not only dangerous in terms of data quality but a major retrograde step for all aspects of data management and governance.
Dixon says of my original criticism “Barry Devlin is welcome to fight a battle against the term ‘Data Lake’. Good luck to him. But if he doesn’t like it he should come up with a better idea.” I fully agree: tilting at well-established windmills is pointless. And as we discovered in our last EMA/9sight Big Data survey (available soon, see preview presentation from January), Data Lake implementations, however variously defined, are already widespread. I believe I have come up with a better idea, too, in the IDEAL and REAL information architectures, defined in depth in my book, Business unIntelligence.
To close on the HDS acquisition of Pentaho, I believe it represents a good deal for both companies. Pentaho gets access to a market and investment stream that can drive and enhance its products and business. And, IoT is big business. HDS gets a powerful set of tools that complement its IoT direction. Together, the two companies should have the energy and resources to clean up the architectural anomalies and market misunderstandings of the Data Lake by formally defining the boundaries and describing the structures required for comprehensive data management and governance.
In building out its Internet of Things, is HDS acquiring a data refinery, a data lake or a data swamp?
This week’s announcement of Hitachi Data Systems’ (HDS, @HDScorp) intention to acquire @Pentaho poses some interesting strategic and architectural questions about big data that are far more important than the announcement’s bland declaration about it being “the largest private big data acquisition transaction to date”. We also need to look beyond the traditional acquisition concerns about integrating product lines, as the companies’ products come from very different spaces. No, the real questions circle around the Internet of Things, the data it produces, and how to manage and use that data.
As HDS and Pentaho engaged as partners and flirted with the prospect of marriage, we may assume that for HDS, aligning with Hitachi’s confusingly named Social Innovation Business was key. Coming from BI, you might imagine that Social Innovation refers to social media and other human-sourced information. In fact, it is Hitachi’s Internet of Things (IoT) play. Hitachi, as a manufacturer of everything from nuclear power plants to power tools, from materials and components to home appliances, as well as being involved in logistics and financial services, is clearly positioned at the coalface of IoT. With data as the major product, the role of HDS storage hardware and storage management software is obvious. What HDS lacked was the software and skills to extract value from the data. Enter Pentaho.
Pentaho comes very much from the BI and, more recently, big data space. Empowering business users to access and use data for decision making has been their business for over 10 years. Based on open source, Pentaho have focused on two areas. First, they provide BI, analysis and dashboard tools for end-users. Second, they offer data access and integration tools across a variety of databases and big data stores. Both aspects are certainly of interest to HDS. Greg Knieriemen (@Knieriemen), Hitachi Data Systems Technology Evangelist, agrees and adds big data and cloud embedding for good measure. The BI and analytics aspect is straightforward: Pentaho offers a good set of functionality and it’s open source. A good match for the HDS needs and vision, job done. The fun begins with data integration.
Dan Woods (@danwoodsearly) lauds the acquisition and links it to his interesting concept of a “Data Supply Chain… that accepts data from a wide variety of sources, both internal and external, processes that data in various nodes of the supply chain, passing data where it is needed, transforming it as it flows, storing key signals and events in central repositories, triggering action immediately when possible, and adding data to a queue for deeper analysis.” The approach is often called a “data refinery”, by Pentaho and others. Like big data, the term has a range of meanings. In simple terms, it is an evolution of the ETL concept to include big data sources and a wider range of targets. Mike Ferguson (@mikeferguson1) provides perhaps the most inclusive vision in a recent white paper (registration required). However broadly or narrowly we define data refinery, HDS is getting a comprehensive set of tooling from Pentaho in this space.
However, along with Pentaho’s data integration tooling, HDS is also getting the Data Lake concept, through Pentaho’s cofounder and CTO, James Dixon, who could be called the father of the Data Lake, having introduced the term in 2010. This could be more problematical, given the debates that rage between supporters and detractors of the concept. I fall rather strongly in the latter camp, so I should, in fairness, provide context for my concerns by reviewing some earlier discussions. This deserves more space than I have here, so please stay tuned for part 2 of this blog!
Why, oh why does the relationship between analytics, automation, profit and employment seem to elude so many people?
A nicely rounded post by Scott Mongeau, “Manager-machine: analytics, artificial intelligence, and the uncertain future of management”, from last October came to my attention today via James Kobielus’ recent response, “Cognitive Computing and the Indelible Role of Human Judgment”. Together, they reminded me again of a real-world problem that has been bothering me since the publication of my book, “Business unIntelligence”.
Mongeau gives a reasoned analysis of the likely increasing impact of analytics and artificial intelligence on the role of management. His thesis appears very realistic: over the coming few decades, many of the more routine tasks of management will fall within the capability of increasingly powerful machines. From driverless cars to advanced logistics management, many more tasks only recently considered the sole remit of humans can be automated. Mongeau also provides a list of tasks where analytics and automation may never encroach, or may do so only more slowly: he cites strategic decision making, and tasks requiring leadership and personal engagement, although, even in strategic decisions, IBM’s Watson is already making a play. He also offers some possible new job roles for displaced managers. However, he misses what I believe is the key implication, to which I’ll return in a moment.
Sadly, Kobielus misses the same point, choosing instead to focus on the irrefutable argument (at least for the foreseeable future) that there will always be some tasks where human judgment or oversight is required. Such tasks will remain, of course, with humans. A sideswipe at Luddism also adds nothing to the argument.
So, what is the missed implication? It seems self-evident, to me, at least, that manufacturing and increasingly services can be delivered more cheaply in many cases, using analytics and automation, by machines rather than people. As both analytics and automation improve exponentially according to Moore’s Law, the disparity can only increase. Therefore, industry progressively invests in the capital of hardware and software rather than labor, driven directly by the profit motive. Given that it is through their labor that the vast majority of consumers earn the money needed to buy industry’s goods and services, at what point will consumption be adversely affected by the resulting growing level of unemployment? This is not an argument about when, if ever, machines can do everything a person can do. It is simply about envisaging a tipping point when a sufficient percentage of the population can no longer afford the goods and services delivered by industry, no matter how cheaply.
Hence, the equation implied in the title of this post: analytics and automation, driven by profit, reduce employment. The traditional economic argument is that technology-driven unemployment has always been counteracted by new jobs at a higher level of skill for those displaced by the new technology. This argument simply cannot be applied in the current situation; the “skill level” of analytics and automation is increasing far faster than that of humans, and the increase is actually accelerating.
So, I use this first post of 2015 to reiterate the questions I posed in a series of blogs early last year. To be very frank, I do not know what the answers should be. And the politicians, economists and business leaders, who should be leading the thinking in this area, appear to be fully disengaged. In summary, the question is: how can we reinvent the current economic system in light of the reality that cheaper and more efficient analytics and automation are driving every industry to reduce or eliminate labor costs without consideration for the fact that employment is also the foundation for consumption and, thus, profit?
Image: Nexi. Credit: Spencer Lowell
“Gold is down almost 40% since it peaked in 2011. But it’s still up almost 350% since 2000. Although since 1980, on an inflation-adjusted basis, it’s basically flat. However, since the early-1970s it’s up over 7% per year (or about 3.4% after inflation).” Ben Carlson, an institutional investment manager, provides this wonderful example of how statistical data can be abused, in this case by playing with time horizons. Ben is talking about making investment decisions. Let me replay his conclusions, but with a more general view (my changes in bold).
“It’s very easy to cherry-pick historical data that fits your narrative to prove a point about anything. It doesn’t necessarily mean you’re right or wrong. It just means that the world is full of conflicting evidence because the results over most time frames are nowhere close to average. If the performance of everything was predictable over any given time horizon, there would be no risk.”
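To make the time-horizon effect concrete, here is a minimal sketch in Python. The prices are invented for an imaginary asset and are not real gold data; the function name and the chosen years are likewise my own assumptions for illustration. It simply computes the compound annual return to the same end date from several different start dates.

```python
def annualized_return(start_price, end_price, years):
    """Compound annual growth rate between two prices over a number of years."""
    return (end_price / start_price) ** (1.0 / years) - 1.0


# Hypothetical year-end prices for an imaginary asset (NOT real gold prices).
prices = {2000: 100.0, 2005: 160.0, 2011: 500.0, 2014: 300.0}

end_year = 2014

# Same end point, different start points: each start year tells a different story.
results = {
    start: annualized_return(prices[start], prices[end_year], end_year - start)
    for start in prices
    if start < end_year
}

for start, rate in sorted(results.items()):
    print(f"since {start}: {rate:+.1%} per year")
```

With these made-up numbers, the return measured from the peak year is negative while the return measured from the earliest year is positive, even though the underlying series is identical. Whoever picks the start date picks the narrative.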
We have entered a period of history where information has become super-abundant. It would be wise, I suggest, to consider all the ways this information can be misinterpreted or abused. Through ignorance, so-called confirmation bias, intention to deceive, and a dozen other causes, we can mislead, be misled, or slip into analysis paralysis. How can we avoid these pitfalls? Before attempting my own answer, let’s take a look at an example of dangerous thinking that can be found even among big data experts.
Jean-Luc Chatelain, a Big Data Technology & Strategy Executive, recently declared “an end to data torture” courtesy of Data Lakes. Arguing that a leading driver is cost, he says Data Lakes “enable massive amount of information to be stored at a very economically viable point [versus] traditional IT storage hardware”. While factually correct, this latter statement says nothing about overall cost: the growth in data volumes probably exceeds the rate of decline in computing costs and, more importantly, data governance costs grow with increasing volumes and disparity of data stored.
More worryingly, he goes on to say: “the truly important benefit that Data-Lakes bring to the ‘information powered enterprise’ is… ‘High quality actionable insights’”. This conflation of vast stores of often poorly-defined and -managed data with high quality actionable insights flies in the face of common sense. High quality actionable insights more likely stem from high quality, well-defined, meaningful information rather than from large, ill-defined data stores. Actionable insights require the very human behavior of contextualizing new information within personal or organizational experience. No amount of Lake Data can address this need. Finally, choosing actions may be based on the best estimate of whether the information offers a valid forecast about the outcome… or may be based on the desires, intentions, vision, etc. of the decision maker, especially if the information available is deemed to be a poor indicator of the likely future outcome. And Chatelain’s misdirected tirade against ETL (extract, torture and lose, as he labels it) ignores most of the rationale behind the process in order to cherry-pick some well-known implementation weaknesses.
Whether data scientist or business analyst, the first step with data (especially disparate, dirty data) is always to structure and cleanse it; basically, to make it fit for analytic purpose. Despite the field’s very short history, it is already recognized that 80% or more of data scientists’ effort goes into this data preparation. Attempts to automate this process and to apply good governance principles are already underway from start-ups like @WaterlineData and @AlpineDataLabs, as well as long-standing companies like @Teradata and @IBMbigdata. But, as always, the choice of what to use and how to use it depends on human skill and experience. And make no mistake, most big data analytics moves very quickly from “all the data” to a subset that is defined by its usefulness and applicability to the issue in hand. Big data rapidly becomes focused data in production situations. Returning again and again to the big data source for additional “insights” is governed by the law of diminishing returns.
It is my belief that our current fascination with collecting data about literally everything is taking us down a misleading path. Of course, in some cases, more data and, preferably, better data can offer a better foundation for insight and decision making. However, it is wrong to assume that more data always leads to more insight or better decisions. As in the past evolution of BI, we are again focusing on the tools and technology. Where we need to focus is on improving our human ability to contextualize data and extract valid meaning from it. We need to train ourselves to see the limits of data’s ability to predict the future and the privacy and economic dangers inherent in quantifying everything. We need to take responsibility for our intentions and insights, our beliefs and intuitions that underpin our decisions in business and in life.
“The data made me do it” is a deeply disturbing rationale.