Algorithm image via Shutterstock
By James Kobielus (@jameskobielus)
People have invested the word “algorithm” with some sort of mystic power. In the popular mind, that word seems to stand for the secret sauce–or evil spirit–that animates big data.
Attributing the power of big-data analytics to some magical resource called “algorithms” isn’t terribly enlightening. It takes much more than algorithms–which are as diverse, malleable, and promiscuous as molecules–to extract meaningful insights from big data.
More than mere algorithms, what you need are data scientists who get the data in shape for statistical analysis and exploratory visualization. As I noted in this blog post from last year, every step of the data scientist’s working method involves selecting from diverse options: analytic problems, subject populations, sources, samples, model versions, predictive variables, visualizations, and so on.
And, oh yes, of course: the right algorithms. Stepping through the standard methodology, as defined in the cited blog post, is a sort of meta-algorithmic discipline at the heart of professional data science. If a data scientist makes the wrong choice at any step, including but not limited to the choice of algorithm(s), they may never find the underlying correlations they seek. Worse yet, they may “find” spurious correlations and thereby inadvertently deceive themselves and others regarding what’s actually going on in their problem space. There is no foolproof mental algorithm to steer statistical analysts in the right direction as they seek the baseline causal factors in any domain.
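The risk of “finding” spurious correlations can be illustrated with a small simulation. What follows is a minimal sketch in Python, not part of the methodology cited above. It assumes purely random, unrelated variables, and the helper name strongest_chance_correlation is hypothetical. The point it tries to show: the more candidate variables an analyst screens against a target, the stronger the best purely accidental correlation tends to look.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def strongest_chance_correlation(n_candidates, n_samples=20, seed=0):
    """Screen n_candidates random series against a random target.

    Every series is independent noise (an assumption of this sketch),
    so any correlation found here is spurious by construction.
    Returns the largest absolute correlation seen.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(candidate, target)))
    return best


if __name__ == "__main__":
    # Screening more unrelated variables yields a stronger "best" match.
    for n in (1, 10, 100, 1000):
        print(n, round(strongest_chance_correlation(n), 2))
```

Because every series here is independent noise, any strong correlation the search reports is an artifact of how many candidates were screened, not evidence of a real relationship.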
If you’re unfamiliar with statistical modeling best practices, you may think that the choice of algorithm is simple: just go with something that everybody talks about called “regression algorithms.” But you would be wrong. Not only are there other types of essential data-science algorithms (e.g., clustering and segmentation), depending on what you’re trying to accomplish, but as Vincent Granville states in this recent blog post, even if you focus only on regression, there are hundreds of those algorithms to choose from. And you can blend them in countless permutations. You might even develop your own, if you have an especially astute mathematical mind.
The most enlightening aspect of Granville’s discussion is how he characterizes the statistical modeling scenarios for which each type of algorithm is best suited. A working data scientist must revisit the trade-offs and optimal blending of diverse algorithmic approaches in every new modeling exercise.
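To make the trade-off between regression algorithms concrete, here is a minimal Python sketch that fits the same data with two different slope estimators. The two estimators (ordinary least squares and the Theil-Sen median-of-slopes method) are chosen here purely for illustration and are not taken from Granville’s post; the data set, with one deliberately planted outlier, is made up.

```python
from itertools import combinations
from statistics import fmean, median


def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    mx, my = fmean(xs), fmean(ys)
    numerator = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denominator = sum((x - mx) ** 2 for x in xs)
    return numerator / denominator


def theil_sen_slope(xs, ys):
    """Theil-Sen estimate: the median slope over all pairs of points."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    return median(slopes)


# Illustrative data: a clean line y = 2x with one corrupted reading.
xs = list(range(10))
ys = [2.0 * x for x in xs]
ys[-1] = 100.0  # planted outlier

if __name__ == "__main__":
    print("OLS slope:      ", ols_slope(xs, ys))
    print("Theil-Sen slope:", theil_sen_slope(xs, ys))
```

On this data the least-squares slope is pulled well away from 2 by the single outlier, while the median-based estimate is not. Neither is “the right algorithm” in general: least squares is the more efficient choice when the noise is well behaved, which is the kind of scenario-dependent judgment the discussion above describes.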
It’s clear that no one uber-algorithm will ever be suitable for illuminating the infinite range of statistical patterns that might inhere within real-world data.
VMware image via Shutterstock
What should you expect at VMworld 2014? Tune in to this week’s roundup to find out.
1. Interests go beyond technology at VMworld 2014 – Tom Walat (SearchVMware)
The annual VMware event is expected to draw more than 20,000 attendees, but for some, the company’s products aren’t the only selling point.
2. VMware-AirWatch integration details, new features revealed – Jake O’Donnell (SearchConsumerization)
VMware continues to drop details about its integration with AirWatch for end user computing. Among the new items coming is a mobile container that brings together many different technologies from both companies.
3. Study: Cloud app data sharing growth increases risks – Rob Wright (SearchCloudSecurity)
Netskope’s Cloud Report shows the average number of cloud apps used in the enterprise is growing — but the majority of those apps lack proper security and policy controls.
4. RackWare expands software into cloud disaster recovery – Sonia Lelii (SearchDisasterRecovery)
RackWare turns its cloud application migration software into an automated DR tool by adding failback, failover and building on migration.
5. Android vulnerability enables app impersonation, heightens BYOD risks – Sharon Shea (SearchSecurity)
News roundup: The ‘Fake ID’ flaw on Android devices allows malicious apps to impersonate trusted ones, putting confidential data at risk and reigniting BYOD security concerns.
Mobile image via Shutterstock
Between Android, iOS and Windows Phone, which is the best choice for you? Find out in this week’s roundup.
1. Android, iOS, Windows mobile OS war a positive for customers – Jake O’Donnell (SearchConsumerization)
Which mobile OS is best for your enterprise? IT pro Michael Thomason took a deep dive this week with the three leaders — Android, iOS and Windows Phone — and found pros and cons for all, which in the end means customers have real choices.
2. AWS expansion in Europe likely under data localization pressure – Beth Pariseau (SearchAWS)
An AWS Germany region is expected as part of the cloud behemoth’s expansion in Europe, along with stronger partnerships between local service providers, but IT pros say data localization is only one piece of the puzzle.
3. IBM SoftLayer a few pieces short of a finished puzzle – Ed Scannell (SearchCloudComputing)
IBM’s heavy investment in SoftLayer over the first 12 months got the attention of many IT shops. But some say IBM needs to deliver more before they can commit.
4. Are BlackBerry security features still an enterprise differentiator? – Brandan Blevins (SearchSecurity)
While BlackBerry’s CEO touts the mobile platform’s security features, experts question whether the advantage over iOS and Android still exists.
5. Cloud growth good but SAP should do more, says one analyst – Todd Morrison (SearchSAP)
In this SAP news roundup, one analyst says SAP has to do more to truly become a cloud company despite strong growth, and SAP launches a new effort to help SMBs.
VMware image via Shutterstock
What can we expect from VMworld 2014? Find out in this week’s roundup.
1. VMware Marvin speculation and VMworld expectations – Nick Martin (SearchServerVirtualization)
In this podcast, Nick Martin talks with Christian Mohn about the VMware Marvin speculation and what we’re expecting to see at VMworld 2014.
2. Microsoft disses DaaS with Azure RemoteApp – Bridget Botelho (SearchVirtualDesktop)
The upcoming Azure RemoteApp cloud service from Microsoft bypasses DaaS and delivers apps to mobile devices without Windows. In part one of this two-part story, we look at why Microsoft sidestepped Windows.
3. Windows 9 features may address unified apps and the cloud – Robert Sheldon (SearchEnterpriseDesktop)
Based on the Windows 8.1 update, it’s reasonable to expect Windows 9 features for universal apps and cloud integration. Will they entice enterprises?
4. July 2014 Oracle CPU: Java security problems persist – Brandan Blevins (SearchSecurity)
With another round of patches for several serious Java flaws, Oracle’s quarterly CPU showed that Java security problems are not receding.
5. Culture shock: Apple, IBM, Microsoft disrupt themselves – Francesca Sales (SearchCIO)
IBM and Apple’s pact to usher in analytics-enabled mobile apps to enterprises could be the start of a powerful friendship — and spell doom for rivals. Plus, Google Q2 earnings and Oracle tackles Hadoop, all in this week’s Searchlight.
Cloud storage image via Shutterstock
With Microsoft expected to make several cloud storage announcements in the near future, what does that mean for Azure? Find out in this week’s roundup.
1. Microsoft cloud storage may lift Azure skyward – Ed Scannell (SearchWindowsServer)
Microsoft will continue to blare its Azure cloud next week with several cloud storage announcements. Will users listen this time?
2. Amazon’s Dropbox answer leaves IT with big questions – Jake O’Donnell (SearchConsumerization)
Amazon introduced Zocalo into the hot file sync and share market. But questions about encryption keys might make it a tough sell in enterprises.
3. More Office 365 subscription plans, pricing changes ahead – Diana Hwang (SearchEnterpriseDesktop)
Microsoft will replace existing Office 365 SMB plans in October, increasing the user cap for all plans and cutting per-user monthly fees for some plans.
4. New VMware beta program aims to kill vSphere 2015 bugs – Colin Steele and Tom Walat (SearchServerVirtualization)
The vSphere 2015 beta party is no longer invite-only. VMware pros hope the new program will reduce the number of bugs in the next version of vSphere.
5. July 2014 Patch Tuesday fixes two dozen IE vulnerabilities – Brandan Blevins (SearchSecurity)
Microsoft’s July 2014 Patch Tuesday release addressed two dozen flaws in Internet Explorer. Adobe also provided a critical update for Flash.
Big data image via Shutterstock
By James Kobielus (@jameskobielus)
Hadoop isn’t just about big data. It’s also about big–as in rich, deep, sophisticated, and diverse–algorithm libraries that execute within Hadoop clusters.
Your choice of a Hadoop analytic-application development platform–aka “sandbox”–is an important factor in realizing the aims of your big-data projects. The sandbox is where most big-data application developers–aka data scientists–will spend most of their productive hours. If you fail to provide them with a common sandboxing platform with a rich library of algorithms and models, you’ll make it difficult for them to pool their expertise on common projects using shared tools.
Developer productivity depends on having rich algorithm libraries that can tap into petabytes of data in HDFS and other storage resources, as well as into the MapReduce, YARN, and other execution engines in Hadoop platforms. For example, IBM PureData System for Hadoop integrates our BigInsights Hadoop analytics software platform and tooling. Key among its features is an extensible, built-in library of machine learning, statistical modeling, data mining, predictive analytics, text analytics, and spatial analytics functions.
As Andrew Oliver notes in this recent post, machine learning libraries are essential to the success of many Hadoop projects. In particular, Apache Mahout is the principal machine-learning library that is optimized for Hadoop, and it has wide adoption. Mahout includes algorithms for K-means clustering, fuzzy K-means clustering, latent Dirichlet allocation, singular value decomposition, logistic regression, naive Bayes, random forests, and other popular machine-learning approaches.
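For readers who have not met the first algorithm on that list, the following is a plain, single-machine sketch of K-means clustering in Python. It is not Mahout’s implementation or API, it does not run on Hadoop, and the function names and sample points are invented for illustration. It only shows the basic loop that a distributed library scales out: assign each point to its nearest centroid, then move each centroid to the mean of its points.

```python
import random


def nearest_index(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans(points, k, iterations=20, seed=0):
    """Basic K-means; returns (centroids, labels) for a list of tuples."""
    rng = random.Random(seed)
    # Start from k distinct data points picked at random.
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep the old centroid if no point was assigned to it.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    labels = [nearest_index(p, centroids) for p in points]
    return centroids, labels


# Illustrative data: two small, well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

if __name__ == "__main__":
    centroids, labels = kmeans(points, k=2)
    print(centroids)
    print(labels)
```

The result of K-means can depend on the starting centroids, which is one reason production libraries offer smarter initialization and multiple variants of the algorithm.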
It’s important to note that Mahout algorithms don’t always need to be run in conjunction with MapReduce (or YARN, for that matter) on Hadoop clusters, so they can conceivably run faster and more efficiently. However, Mahout is by no means the only library that can work with Hadoop clusters or that has been optimized for this big-data platform. For example, you can also execute the algorithms in the IBM Netezza Analytics library directly on BigInsights without invoking the platform’s MapReduce engine.
Regardless of the merits of Mahout or alternatives, this discussion points to the fact that Hadoop is a versatile development platform that is not constrained to one library, one language, or one approach for doing machine learning or statistical modeling in general. As Apache Spark takes hold in the Hadoop arena, we can expect its principal machine-learning library, MLlib, to take up residence alongside Mahout in many data scientists’ sandboxes.
As you evolve your big data environment toward Spark and other new approaches, you should be protecting your investments in big-data analytic libraries. If you implement new big-data platforms but can’t leverage the rich trove of algorithms and models that you’ve implemented on older platforms, you will have squandered intellectual property that may be the key to the success of future analytic initiatives.
Rackspace image via Shutterstock
Will Rackspace move into the private market? Tune in to this week’s roundup to find out.
1. Rackspace’s public struggles may lead to private move – Adam Hughes and Trevor Jones (SearchCloudComputing)
Rackspace is expected to go private this week, rather than continue its search for a suitor. But that move may not help the company compete in the cloud market.
2. New unified workspace delivers apps for $1 per user, per day – Jake O’Donnell (SearchVirtualDesktop)
A Chicago-based psychology school is testing NComputing’s oneSpace to deliver apps and files to remote users at a price far below Citrix XenApp.
3. Mobile security market moves away from FUD – Colin Steele (SearchConsumerization)
Citrix’s chief security strategist says the lock-everything-down mentality can hinder mobile productivity.
4. Samsung Galaxy S5 Mini shrinks the flagship, keeps its features – Jeff Dunn (Brighthand)
Samsung has quietly confirmed the Galaxy S5 mini, which packs less power than the larger Galaxy S5 but carries over many of its trademark features.
5. Physical location of data will become irrelevant by 2020, says Gartner – Archana Venkatraman (ComputerWeekly)
The physical location of data will be irrelevant by 2020, replaced by a combination of location criteria such as legal, political and logical concerns.
Microsoft Office image via Shutterstock
Office 365 users have a surprise coming to them very soon. What could it be? Find out in this week’s roundup.
1. Office 365 users get 1TB storage boost – Diana Hwang (SearchEnterpriseDesktop)
IT pros concerned about shrinking cloud storage for Office 365 have gotten a 1TB lifeline from Microsoft. One major city stands to benefit through a 100,000-user migration.
2. Google throws enterprise IT a bone with Android L – Jake O’Donnell (SearchConsumerization)
Google has finally made good on long-awaited Android enterprise features. But skepticism remains on how the tech giant will implement the changes.
3. Red Hat’s OpenStack strategy progresses with eNovance buy – Trevor Jones (SearchCloudComputing)
Red Hat once again looks to productize an open source platform. This time it turns to OpenStack, but the jury is still out on how it will fare in the cloud.
4. Google begins complying with European takedown requests – Warwick Ashford (ComputerWeekly)
Google has begun removing search results in response to takedown requests from European citizens.
5. Microsoft leads unified communications market in Q1 2014 – Tessa Parmenter (SearchUnifiedCommunications)
Q1 2014 revenue reported by Infonetics Research puts Microsoft ahead as a unified communications market leader, but Cisco is just barely behind.
Amazon image via Shutterstock
Amazon’s smartphone is finally here! What can we expect? Find out in this week’s roundup.
1. Dell PCs to live on as vector for software, services – Diana Hwang (SearchEnterpriseDesktop)
Dell shops needn’t worry about the future of PCs; the company needs its computer hardware to cross-sell software and services, including its EMM platform.
2. BlackBerry BBM Protected secures IM, for a price – Jake O’Donnell (SearchConsumerization)
BlackBerry continues its enterprise focus with a more secure BBM platform, but whether IT will pay for secure mobile messaging remains to be seen.
3. Researchers find critical Android security problem in Google Play – Warwick Ashford (ComputerWeekly)
Researchers have discovered a critical security problem in Google Play, the official Android app store.
4. Target hires CISO as more retail breaches surface – Brandan Blevins (SearchSecurity)
The Target CISO’s first week on the job comes as more retail breaches continue to pile up, highlighting the ongoing risk to such organizations.
5. The Amazon phone is here, and it’s called the Fire Phone – Jeff Dunn (Brighthand)
The long-awaited Amazon Phone has arrived. It’s called the Fire Phone, it’s got lots of cameras, and it’s exclusive to AT&T. Read on to get the rundown on Amazon’s great mobile hope.
Open data image via Shutterstock
By James Kobielus (@jameskobielus)
Openness is the hallmark of a democratic society. Visibility into the workings of our government is utterly essential if elected officials and government agencies are to be held accountable to the public for their actions.
So it stands to reason that government data of all sorts–apart from top secret and other sensitive information–needs to be made freely accessible to the public. The democratic nations of the world have taken up the “open data” imperative in earnest in recent years. In many nations, open data is a key program in reformers’ anti-corruption and transparency initiatives.
Open data is also a key tool for shining light on wasteful government spending, inefficient bureaucracies, and ineffective programs. As US Senator Mark Warner stated about our country’s newly enacted Digital Accountability and Transparency Act: “Right now, federal spending data is not always readily available and, if it is, it’s often in a format that is not very useful. This new [DATA] law requires federal agencies to account for every dollar they spend (and report it) on a single website, in an easy-to-read format. It will help us to identify duplication, waste, and fraud.”
If you think about it, the lack of a readily available, standard digital format for public data is a sign of inefficient government in this new era. In other words, open-data standards can help an informed citizenry root out government malfeasance (e.g., corruption) and misfeasance (e.g., incompetence) in one fell swoop. Standard formats for open-data taxonomies, glossaries, metadata, timestamping, tamperproofing, and reporting are fundamental to this promise.
At least one country, South Korea, has also identified open public data as a resource for improving delivery of e-services to its citizens. As stated in this article, a South Korean official envisions open data and predictive analytics driving more proactive and personalized service delivery.
Politicians everywhere make empty promises and pass hollow legislation all the time. So why should we believe that they will deliver on the promise of their open-data initiatives?
I believe the global movement toward open government data is unstoppable because civic watchdogs everywhere will continue to lobby and apply other pressure where it counts. I’m encouraged by the advocacy and public education programs that have been instituted by transnational groups such as the Open Data Institute, Open Government Partnership, Data Transparency Coalition, the United Nations’ Public Administration Programme, and the World Bank. In addition to serving as watchdogs on disparate nations’ open-data initiatives, they are fostering a climate that encourages convergence among diverse open-data standards and practices.
If open data is to fulfill its role in civic governance, it needs to be managed in every country through standard procedures for data governance. One of the valuable components of the new US law is that it requires federal agency inspectors general to report on the quality and accuracy of the financial data provided to the open-data portal USASpending.gov. The law also requires the US Government Accountability Office to report on the data quality and accuracy and to create a government-wide assessment of the financial data reported.
Will the democratic nations of the world amend their constitutions to enshrine open data as a core value? It’s not a far-fetched thought. Personally, I’d put open access to public data in the category of basic civil rights.