Toward the end of 2012, Quocirca met with an interesting company called DataSift. DataSift is a social data platform company – it takes feeds of data from the majority of social media sites and can then mine through social conversations for content, trends and insights. This is of obvious interest for organisations that are tracking sentiment of their brand in the market – but may have other uses as well.
The one obvious target for DataSift is Twitter – the vast majority of Twitter data is available in the public domain (only direct messages (DMs) are hidden from general view). However, DataSift can also track activity around an organisation’s Facebook page, content from blogs and forums – including other semi-private information the organisation accesses via social networks established between itself and the public.
The platform is cloud-based with prices based on a combination of “complexity”, hours and hourly cost along with a data cost. The hourly cost is the simplest to explain. The price is based on the period being analysed – for a week, this would be 168 hours, for a month (nominally) 720 hours. Complexity is more difficult and is based on a calculation that can only be completed once the query has been created. However, the business model does mean that you only pay for what you get: no on-going subscriptions that have to be paid no matter what – everything is on a per use basis. The data cost is based on a small charge per Tweet analysed. For statistical validity, DataSift recommends that a 10% sample rate is used, which lowers the price significantly.
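The pricing components described above can be sketched in a few lines of Python. This is only an illustration: the way the components combine (complexity multiplied by hours and hourly rate, plus a per-Tweet data charge) is an assumption, and the function name, parameter names and rates are hypothetical rather than DataSift's actual tariff.

```python
def estimate_query_cost(complexity, hours, hourly_rate,
                        tweets_matched, cost_per_tweet, sample_rate=1.0):
    """Rough cost estimate: a processing charge plus a data charge.

    Assumption: processing cost scales as complexity * hours * hourly_rate.
    The article only says the price is "a combination" of these factors.
    """
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in the range (0, 1]")
    processing_cost = complexity * hours * hourly_rate
    # Only the sampled share of matching Tweets attracts the data charge.
    data_cost = tweets_matched * sample_rate * cost_per_tweet
    return processing_cost + data_cost


# Hypothetical rates, chosen purely for illustration.
HOURLY_RATE = 0.20       # currency units per hour per unit of complexity
COST_PER_TWEET = 0.0001  # currency units per Tweet analysed

one_week = 168  # hours, as in the text
full = estimate_query_cost(2.1, one_week, HOURLY_RATE, 2_230_000, COST_PER_TWEET)
sampled = estimate_query_cost(2.1, one_week, HOURLY_RATE, 2_230_000,
                              COST_PER_TWEET, sample_rate=0.10)
print(f"full feed: {full:.2f}, 10% sample: {sampled:.2f}")
```

Under these assumed rates the data charge dominates, which is why a 10% sample rate lowers the overall price so markedly.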
As a test, Quocirca asked DataSift to run a Twitter-only analysis of 2012 Twitter activity for a named set of vendors who are often mentioned in the same breath as big data. The query required just 10 lines of code to be written, and gave a complexity score of 2.1. Without the 10% filter in place, 2.23 million Tweets were analysed.
We selected an interesting topic as the basis for our test and Quocirca will be writing a more detailed piece on the findings, but the highlights below illustrate the potential power of the system:
- Twitter activity around big data grew by 64% over the year. This is not surprising – big data was still an emerging topic back at the beginning of the year, but was being pushed harder and harder by the vendors and the media as the year progressed.
- Nearly three quarters of Tweets contained an active link. People were not just dropping Twitter comments about big data – they were referring people to other content outside of Twitter.
- Apache had the biggest footprint with 9.4% of vendor mentions in Tweets being about it. Apache, with its Hadoop parallel processing engine and Cassandra database, is unsurprisingly the big player here.
- Second placed was 10gen, the commercial entity that looks after MongoDB, with 6.24% of vendor mentions.
- Of the “big guys”, IBM gained a creditable third place with 3.25%, with HP in fourth with 2.38%.
- There were geographic differences – IBM’s strongest country was France; Cloudera’s was Japan. SAP was (unsurprisingly) strong in Germany; DataSift itself was very strong in the UK.
- At a domain level (the sites that people were pointing others to most from their Tweets), Forbes.com was a surprise winner. Behind that, GigaOM.com and TechCrunch.com were the next biggest content sources.
As a single point of interest, a look was taken at HP at a sentiment analysis level. Through the first part of the year, people’s views of HP remained fairly level, with a net sentiment score (positive comments minus negative comments) of 0 – not good news in itself, but it could have been worse. However, between 14th November and 10th December, a lot of sentiment activity took place.
On the 21st November, HP’s sentiment score plunged close to -10,000. It recovered back to zero by the 24th, and then went back down to -5,000 on the 28th, rose again and then crashed down to -7,000 on the 1st December.
Why? On November 20th, HP’s CEO Meg Whitman told Wall Street analysts that HP had massively overpaid for software firm, Autonomy, and accused former executives at Autonomy of cooking the books. Financial and technical analysts went into a frenzy – the very people who use social networking the most to get information out as quickly as possible. The ongoing fall-out was what caused the triple-dip poor sentiment scores over the following weeks.
This shows that, although HP took fourth place in mentions around big data, this attention was not necessarily positive for HP’s brand. This is why a company such as DataSift is important – it not only removes the grunt work of analysing the massive firehose of data that comes from social networks, but also applies solid analytics to this data to ensure that the results a customer sees are presented in context.
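The net sentiment score used in the HP example (positive comments minus negative comments) is simple to compute once each comment has been labelled. The sketch below assumes the labelling has already been done; how DataSift classifies comments is not described here, and the function name and sample data are hypothetical.

```python
from collections import defaultdict


def daily_net_sentiment(comments):
    """Return {day: positive count minus negative count}.

    `comments` is an iterable of (day, label) pairs, where label is
    "positive", "negative" or anything else (treated as neutral).
    """
    scores = defaultdict(int)
    for day, label in comments:
        if label == "positive":
            scores[day] += 1
        elif label == "negative":
            scores[day] -= 1
        else:
            scores[day] += 0  # neutral: make sure the day is still listed
    return dict(scores)


# Illustrative data only, not taken from the analysis described above.
sample = [
    ("2012-11-20", "positive"),
    ("2012-11-21", "negative"),
    ("2012-11-21", "negative"),
    ("2012-11-21", "neutral"),
    ("2012-11-21", "positive"),
]
print(daily_net_sentiment(sample))
```

A score of 0 can therefore mean either no opinion at all or an even split between positive and negative comments, which is why the raw mention count alone says little about brand perception.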
The managers of any successful business must keep a constant focus on productivity. Well implemented IT helps to achieve this, for example through automating manufacturing processes, improving supply chain efficiency or enabling flexible working. The same managers may assume that the IT departments that help deliver these innovations are themselves productive. In many cases they will be wrong.
A recent Quocirca research report – The wastage of human capital in IT operations – shows that many IT teams could improve their productivity dramatically. As much as 40% of a team’s time can be spent on routine low level tasks, for example patching software, dealing with end user device problems or error checking.
IT managers themselves are well aware of the issues and those in mid-market organisations in particular list such wastage of their team’s time as a top frustration. They have a clear understanding of their staff’s skills, but are not able to use them as effectively as they would like. For the individuals involved, work becomes boring and there is general demotivation.
Whilst the wastage should in itself be a major concern, an even bigger concern is that this very issue is holding IT departments back from their raison d’être: helping businesses overall increase their productivity and competitiveness. IT managers admit that if they had 50% more man hours available to them, they would use these to modernise IT infrastructure and deliver new applications.
So what can be done? The truth is that the mundane tasks are not going to go away. IT managers have three options: stick with the status quo and accept the wastage; introduce cheaper, low skilled labour, probably through outsourcing areas of IT operations management; or introduce more automation.
It is estimated that 80% of IT infrastructure is common to most businesses’ IT operations. So, mundane tasks are being repeated by skilled operators on a huge scale. Outsourcing just displaces the problem, when in reality automating these tasks and repeating them across multiple businesses should be straightforward.
The vendors of automation tools are themselves experts at building the procedures that enable repetitive tasks to be carried out time and time again across different organisations’ IT infrastructure. Such tools can recognise exceptions and make an intelligent handover to human operators, be they an internal staff member or an expert from a third party specialist.
Once the investment in the tools has been made, the incremental charge for repeating is negligible compared to outsourcing. Such tools enable the industrialisation of IT; the efficient repetition of certain tasks hundreds or thousands of times over without consuming valuable IT staff time.
There are three options for achieving this:
- Capital investment in new tools installed on-premise from the “big” systems management vendors, namely BMC, HP, CA and IBM (some would add Microsoft’s System Center to this list)
- Freeing budget from operational spending to subscribe to on-demand system management services that support high levels of automation, such as IPsoft and ServiceNow
- A hybrid approach with the flexibility to deliver both of the above, which is possible with the IPsoft tools and a few other vendors such as Kaseya
The ineffectiveness of many IT operations will spiral out of control if action is not taken to improve the way they are managed. Putting in place the necessary IT management tools, services and procedures to maximise automation and to industrialise processes will address this and reduce skills wastage. The ultimate value will be the ability to efficiently manage the increasing complexity of IT infrastructure, whilst delivering new applications that will ensure a business remains competitive.
Quocirca recently had an interesting discussion with an off-shore hosting and cloud company. Jersey-based (as in the UK Channel Islands, not the US New Jersey) Calligo is positioning itself as the right place to be for data – and for running the applications that create and consume the data.
Why is this important? Well, organisations are beginning to wake up to the fact that even when a data centre is in a “friendly” country, there are still potentially high risks to the intellectual property (IP) held within the data.
The US Patriot Act and the Foreign Intelligence Surveillance Act (FISA) make those European companies that have looked into their possible impact shudder. That a foreign power can demand – and get – access to their data just because it is hosted by a company in the US – or is in a facility anywhere in the world that is owned by a company in the US – means that many are looking for alternative arrangements with companies that can still offer a broad range of services, but backed with better data security agreements that cannot be ridden roughshod over by the regional government.
Calligo’s view is that Jersey is highly controlled from a data viewpoint. Although it is nominally “in” the UK, it is actually a separate British Crown Dependency. This means that it is autonomous, makes its own laws and operates outside of the reach of other countries’ legal systems – including the UK’s. Sure, EU laws will still apply when push comes to shove – but a European customer may be happier with a Jersey/EU escalation than a <country>/EU/US three-way battle.
This means that data can be stored in a country where the legal system is subject to fewer overall laws, is overseen by fewer people and can be targeted to specific needs. Jersey has pedigree here with the way it has dealt with financial services in its country.
Jersey is also well connected from a data viewpoint to both the UK and the European mainland through multiple cables, and from these to the rest of the world. Therefore, placing applications and data in a commercial, secure facility on an island that is part of the EU but is autonomous has many things going for it.
But, however well Jersey is connected to the rest of the world, it cannot overcome its relative geographic isolation. When fast, low-latency response is needed, e.g. for transactional work in the US or in Japan, the underlying latency can still be an issue. Calligo recognises this, and is looking at where else in the world it can set up similar facilities and meet the needs of organisations that want to be assured of greater security for their data and therefore their intellectual property.
The Cayman Islands are one option – they are well placed for the south of the US, for Central America and for the major markets of the top of South America. Although the Cayman Islands are a British Overseas Territory with their own legal system, they come under the overall control of the UK and have a Governor appointed by the Queen – but can still enact and follow laws that make sense from a commercial viewpoint to the islands.
Calligo also includes a data ownership clause in its agreements – the data always belongs to and is owned by the customer. Many cloud providers make no statements about this – which can cause issues for the actual data owner. On top of this, Calligo says that it has a special clause in its agreements, which makes it clear that should the untoward happen, the data has to be turned over to the customer (even by a business administrator) – so making it easier for a customer to regain access to the data and move it to another provider.
Similar approaches in other parts of the world could give Calligo an interesting footprint for a global offering. With small, autonomous island states being more likely to provide laws that are data friendly while still retaining strong audit and overall data security capabilities, Calligo’s offerings of IaaS, PaaS and SaaS (for example, it hosts SugarCRM and other applications) combined with the capability to use external cloud offerings where it makes sense (such as Google Maps) will make sense to many organisations.
Overall, Calligo looks like an interesting company. For those who have worries about how their data is secured not just from the baddies out there, but also from the governments who are enacting ever more threatening laws around data access, the use of island nations as a home for data could be just as good as using them for financial affairs.
Two back-to-back events recently saw Quocirca talking to veterans of the software industry: CA and Symantec. The high level message from both is pretty much the same: we help to secure and manage your data and IT infrastructure. Yet, it is rare to find these two head-to-head, because in reality they are more different than they are alike.
True, they are both US headquartered (more or less) pure software companies with annual revenues of a similar order (CA circa $5B, Symantec circa $7B) and both with profits of around $1B. Their current share prices and market caps are similar and their stock market histories have followed similar ups and downs over the last decade. Both are now 30-something; CA was founded in 1976 and Symantec in 1982. Symantec’s higher revenue is reflected in its head count, 20K employees as opposed to CA’s 14K, but that gives them remarkably similar productivity of about $350K per head.
Furthermore, both sit on similar piles of cash of about $13B. This ability to accumulate cash has been key to the way each has grown, through aggressive acquisition; both have acquired tens of companies over the years, in Symantec’s case almost doubling its size when it merged with Veritas in 2004 to move into the storage market.
So, for two companies appearing so similar what are the differences that allow them to operate side by side in the IT industry without too many dogfights? The most obvious is their legacy; CA comes from a background of providing software for mainframes (the ultimate in enterprise computing), whilst Symantec’s origin lies in its consumer focussed Norton anti-virus technology (probably still a more recognised brand than Symantec itself). The main target market shared by both vendors is supplying software for mid-market and enterprise businesses to manage and secure Windows and Linux based systems.
Even here, whilst they may still sound similar their products have historically not overlapped much. When it comes to management Symantec’s main focus is end-points (via its 2007 Altiris acquisition) and storage, whilst CA is listed as one of the big 4 systems management companies (along with BMC, IBM and HP – or 5 if you include Microsoft), focussed on broad management of enterprise IT (in CA’s case including those mainframes).
In security, historically the overlap has also been limited. Many still think of Symantec as primarily a security company, but over the years its acquisitions have taken it beyond its roots in anti-virus to include email security, web security, data loss prevention (DLP) and so on. Few think of CA in the first instance as a security company but it has also always operated in this space, more focussed on identity and access management (IAM), despite also having its own anti-virus.
However, that is changing – CA has been acquiring more and more security assets; for example, it moved into DLP in 2009 when it acquired Orchestria. And Symantec is now moving into IAM with its O3 platform that includes single sign on (SSO) via a partnership with Symplified, secure web access and compliance enforcement/reporting. Whilst Symantec remains by far the bigger of the two in IT security, it can expect to see more and more of CA going forwards.
Both vendors are keen to be seen as innovators (or keeping up depending on your viewpoint) with the key IT trends: cloud, mobile, social media, big data etc. However, this week they were both as keen to talk about people as products and solutions. Symantec has recently replaced its CEO of the last 3 years, Enrique Salem (whose blood was said to flow yellow, the vendor’s corporate colour) with Steve Bennett, who joined the board from Intuit in 2010. In a session on strategy, Symantec had little to say except that the new CEO’s pronouncements could be expected in January 2013. John Brigden, Symantec’s head of Europe, Middle East and Africa (EMEA) for the last 7 years, will be keen to see what that means for his organisation.
CA has already shaken up its EMEA operations bringing a new head Marco Comastri just over a year ago from Poste Italiane (he has also worked at IBM and Microsoft). Comastri is bringing new faces and trying to get CA EMEA more focussed on solution selling than technology.
Whether it is at the global or European level, these two software juggernauts have a momentum all of their own and management may find it frustrating to change direction. They should not try too hard: both have huge legacy customer bases and healthy finances, and shareholders will not be happy to see either compromised.
Energy usage is a focus for many at the moment. For IT, it seems to be a big focus – mainly as organisations become more aware of how much energy is wasted in their data centre facilities. However, it is likely to be brought into even greater focus in the not too distant future, as the looming energy deficit starts to become more apparent.
A mix of short-sightedness and prevarication by politicians means that the UK is now in a position where it is unlikely to be able to meet all its consumers’ energy needs in just a few years – the UK’s energy market overseer, Ofgem, predicts that the UK’s current energy generation over-capacity of 14% could fall to 4% in just 3 years. The failure of only one generation plant, or even the need to take one down for planned maintenance, could lead to insufficient power being available for all the country’s needs.
Therefore, planned outages will be required to be put in place – and the biggest energy users will be targeted where overall country needs will not be adversely impacted.
So – steel and aluminium production is unlikely to be hit. Retail may be asked to cut down on lighting and heating. But the one place that politicians can really point to is the use of IT – many organisations could be asked to reduce their energy usage here, or risk having it cut off for periods of time.
It is widely accepted that data centres are inefficient when it comes to usage of energy – the average utilisation of a server is around 10-20% of CPU, and of storage around 30%. Sure – a move to virtualisation can drive up these utilisation rates and so lower the amount of equipment being used and so lower the energy being needed – but is this the best way to address the overall need?
To take a bigger picture, it is necessary to look at the whole data centre facility and its energy usage. There is a means of gaining a measure of the overall energy efficiency of a facility through the use of power usage effectiveness, PUE. This is the total amount of energy used by a facility divided by the amount that is used to power the IT workloads – i.e. that used by servers, storage and network equipment. The rest of the energy is used in peripheral areas, such as lighting, cooling, and uninterruptible power supplies (UPSs).
A theoretical perfect data centre should therefore have a PUE of 1 – all the energy is used in powering IT workloads. However, in practice, the PUE for an “average” facility is around 2.0 – for each Watt of power used for IT workloads, another Watt is used for peripheral items.
So – only 50% of the facility’s total energy is reaching the servers, storage and networking equipment. Running at 20% IT equipment utilisation means that, at a rough estimate, around 90% of a facility’s total energy input is essentially going to waste. Upping IT equipment utilisation rates to 40% and getting rid of excess equipment could mean a saving of 10% of a data centre’s energy usage – which is wonderful – but still only means that 20% of a data centre’s energy is being used for useful IT work.
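The rough arithmetic above can be reproduced with a short Python sketch. The function names are illustrative, and treating "useful" energy as IT energy multiplied by utilisation is the same simplification the estimate in the text relies on.

```python
def pue(total_facility_energy, it_energy):
    """Power usage effectiveness: total facility energy / IT energy."""
    return total_facility_energy / it_energy


def it_share(pue_value):
    """Fraction of facility energy that reaches the IT equipment."""
    return 1.0 / pue_value


def useful_share(pue_value, utilisation):
    """Rough fraction of facility energy doing useful IT work.

    Simplification: energy drawn by IT equipment is assumed to be
    useful only in proportion to the equipment's utilisation.
    """
    return it_share(pue_value) * utilisation


average_pue = pue(total_facility_energy=2.0, it_energy=1.0)  # "average" facility
for utilisation in (0.20, 0.40):
    useful = useful_share(average_pue, utilisation)
    print(f"utilisation {utilisation:.0%}: useful {useful:.0%}, wasted {1 - useful:.0%}")
```

With a PUE of 2.0, doubling utilisation from 20% to 40% doubles the useful share from 10% to 20% of the facility's energy, which is the figure quoted above.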
However, the majority of data centres utilise UPSs to support pretty much all the energy used across the facility. Unfortunately, many of these devices are pretty old, and will be running at 94% efficiency or less. Modern UPSs run at 98% efficiency or greater. But, is a 4% improvement in energy efficiency at a UPS worth the bother when a 10% improvement at the server and storage layers is possible?
Back to the maths. If all the facility’s energy goes through the UPS, then a 4% improvement across all systems (servers, storage, networking, cooling, lighting) is a 4% saving on the energy bill – without having changed anything but the UPS. Now, introduce the virtualisation mentioned above. The server utilisation rates are upped from 20% to 40% as before, and the saving is 10% of the data centre’s energy bill. But, because we have improved the overall data centre’s energy usage as well, we get a greater saving. Every time we improve the equipment in the data centre – IT or support – then we gain that extra energy efficiency as well.
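How the two savings combine can be sketched as follows. This is a simplified model rather than a measurement: it assumes, as the text does, that all facility energy passes through the UPS, and it assumes the 10% virtualisation saving reduces the load behind the UPS. The function names and the load figure are illustrative.

```python
def grid_energy(load, ups_efficiency):
    """Energy drawn from the grid to deliver `load` through the UPS."""
    return load / ups_efficiency


def saving(before, after):
    """Fractional reduction in energy drawn from the grid."""
    return 1.0 - after / before


load = 100.0  # arbitrary units of energy needed behind the UPS

baseline = grid_energy(load, 0.94)             # older UPS, original load
new_ups = grid_energy(load, 0.98)              # modern UPS only
both = grid_energy(load * (1 - 0.10), 0.98)    # modern UPS plus 10% less load

print(f"UPS upgrade alone:       {saving(baseline, new_ups):.1%}")
print(f"UPS plus virtualisation: {saving(baseline, both):.1%}")
```

Under these assumptions the UPS upgrade alone saves a little over 4% of the energy drawn from the grid, and the two measures together save noticeably more than the 10% from virtualisation alone.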
Modern UPSs also provide a host of other capabilities – as battery technology and battery management systems have improved, a well-implemented UPS can help in bridging some breaks in energy provision without the need for auxiliary generators to switch in. They can also better deal with low voltage situations (“brownouts”), ensuring that an optimised energy feed gets to all equipment.
Should Ofgem be right, there will be planned brownouts and power cuts around the country within a few years. Organisations can help in many ways – improving their data centres so that they are more energy efficient could put this back by a few months. However, ensuring that their data centre facilities have newer, more effective UPSs in place can help in not only providing a far more energy efficient facility, but also in dealing with the problems that an energy deficit could present.
Quocirca has written a report on the subject, which can be downloaded for free here: http://quocirca.com/reports/773/powering-the-data-centre