Big data image via Shutterstock
By James Kobielus (@jameskobielus)
Hadoop isn’t just about big data. It’s also about big algorithm libraries (big as in rich, deep, sophisticated, and diverse) that execute within Hadoop clusters.
Your choice of a Hadoop analytic-application development platform (aka “sandbox”) is an important factor in realizing the aims of your big-data projects. The sandbox is where most big-data application developers (aka data scientists) will spend most of their productive hours. If you fail to provide them with a common sandboxing platform with a rich library of algorithms and models, you’ll make it difficult for them to pool their expertise on common projects using shared tools.
Developer productivity depends on having rich algorithm libraries that can tap into petabytes of data in HDFS and other storage resources, as well as into the MapReduce, YARN, and other execution engines in Hadoop platforms. For example, IBM PureData System for Hadoop integrates our BigInsights Hadoop analytics software platform and tooling. Key among its features is an extensible, built-in library of machine learning, statistical modeling, data mining, predictive analytics, text analytics, and spatial analytics functions.
As Andrew Oliver notes in this recent post, machine learning libraries are essential to the success of many Hadoop projects. In particular, Apache Mahout is the principal machine-learning library optimized for Hadoop, and it is widely adopted. Mahout includes algorithms for K-means clustering, fuzzy K-means clustering, latent Dirichlet allocation, singular value decomposition, logistic regression, naive Bayes, random forests, and other popular machine-learning approaches.
It’s important to note that Mahout algorithms don’t always need to run in conjunction with MapReduce (or YARN, for that matter) on Hadoop clusters, and bypassing those engines means they can conceivably run faster and more efficiently. However, Mahout is by no means the only library that can work with Hadoop clusters or that has been optimized for this big-data platform. For example, you can also execute the algorithms in the IBM Netezza Analytics library directly on BigInsights without invoking the platform’s MapReduce engine.
Regardless of the merits of Mahout or alternatives, this discussion points to the fact that Hadoop is a versatile development platform that is not constrained to one library, one language, or one approach for doing machine learning or statistical modeling in general. As Apache Spark takes hold in the Hadoop arena, we can expect its principal machine-learning library, MLlib, to take residence alongside Mahout in many data scientists’ sandboxes.
As you evolve your big data environment toward Spark and other new approaches, you should be protecting your investments in big-data analytic libraries. If you implement new big-data platforms but can’t leverage the rich trove of algorithms and models that you’ve implemented on older platforms, you will have squandered intellectual property that may be the key to the success of future analytic initiatives.
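To make one of the listed techniques concrete, here is a minimal single-machine sketch of K-means clustering (Lloyd's algorithm) in plain Python. It does not use Mahout's API and does not run on Hadoop; the function `kmeans` and the toy data are hypothetical, chosen only to show the assign-then-update loop that the distributed implementations scale out.

```python
import random
from typing import List, Tuple

Point = Tuple[float, ...]

def _sq_dist(a: Point, b: Point) -> float:
    # Squared Euclidean distance between two points of equal dimension.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _mean(points: List[Point]) -> Point:
    # Component-wise mean of a non-empty list of points.
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points: List[Point], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Hypothetical helper: Lloyd's K-means on an in-memory list of points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k input points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append(_mean(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels
```

For example, `kmeans([(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)], 2)` separates the two well-spaced groups. A library like Mahout parallelizes the same two steps across a cluster; this sketch only shows the logic.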
Rackspace image via Shutterstock
Will Rackspace go private? Tune into this week’s roundup to find out.
1. Rackspace’s public struggles may lead to private move – Adam Hughes and Trevor Jones (SearchCloudComputing)
Rackspace is expected to go private this week, rather than continue its search for a suitor. But that move may not help the company compete in the cloud market.
2. New unified workspace delivers apps for $1 per user, per day – Jake O’Donnell (SearchVirtualDesktop)
A Chicago-based psychology school is testing NComputing’s oneSpace to deliver apps and files to remote users at a price far below Citrix XenApp.
3. Mobile security market moves away from FUD – Colin Steele (SearchConsumerization)
Citrix’s chief security strategist says the lock-everything-down mentality can hinder mobile productivity.
4. Samsung Galaxy S5 Mini shrinks the flagship, keeps its features – Jeff Dunn (Brighthand)
Samsung has quietly confirmed the Galaxy S5 mini, which packs less power than the larger Galaxy S5 but carries over many of its trademark features.
5. Physical location of data will become irrelevant by 2020, says Gartner – Archana Venkatraman (ComputerWeekly)
The physical location of data will be irrelevant by 2020, replaced by a combination of location criteria such as legal, political and logical concerns.
Microsoft Office image via Shutterstock
Office 365 users have a surprise coming to them very soon. What could it be? Find out in this week’s roundup.
1. Office 365 users get 1TB storage boost – Diana Hwang (SearchEnterpriseDesktop)
IT pros concerned about shrinking cloud storage for Office 365 have gotten a 1TB lifeline from Microsoft. One major city stands to benefit through a 100,000-user migration.
2. Google throws enterprise IT a bone with Android L – Jake O’Donnell (SearchConsumerization)
Google has finally made good on long-awaited Android enterprise features. But skepticism remains on how the tech giant will implement the changes.
3. Red Hat’s OpenStack strategy progresses with eNovance buy – Trevor Jones (SearchCloudComputing)
Red Hat once again looks to productize an open source platform. This time it turns to OpenStack, but the jury is still out on how it will fare in the cloud.
4. Google begins complying with European takedown requests – Warwick Ashford (ComputerWeekly)
Google has begun removing search results in response to takedown requests from European citizens.
5. Microsoft leads unified communications market in Q1 2014 – Tessa Parmenter (SearchUnifiedCommunications)
Q1 2014 revenue reported by Infonetics Research puts Microsoft ahead as a unified communications market leader, but Cisco is just barely behind.
Amazon image via Shutterstock
Amazon’s smartphone is finally here! What can we expect? Find out in this week’s roundup.
1. Dell PCs to live on as vector for software, services – Diana Hwang (SearchEnterpriseDesktop)
Dell shops needn’t worry about the future of PCs; the company needs its computer hardware to cross-sell software and services, including its EMM platform.
2. BlackBerry BBM Protected secures IM, for a price – Jake O’Donnell (SearchConsumerization)
BlackBerry continues its enterprise focus with a more secure BBM platform, but whether IT will pay for secure mobile messaging remains to be seen.
3. Researchers find critical Android security problem in Google Play – Warwick Ashford (ComputerWeekly)
Researchers have discovered a critical security problem in Google Play, the official Android app store.
4. Target hires CISO as more retail breaches surface – Brandan Blevins (SearchSecurity)
The Target CISO’s first week on the job comes as more retail breaches continue to pile up, highlighting the ongoing risk to such organizations.
5. The Amazon phone is here, and it’s called the Fire Phone – Jeff Dunn (Brighthand)
The long-awaited Amazon Phone has arrived. It’s called the Fire Phone, it’s got lots of cameras, and it’s exclusive to AT&T. Read on to get the rundown on Amazon’s great mobile hope.
Open data image via Shutterstock
By James Kobielus (@jameskobielus)
Openness is the hallmark of a democratic society. Visibility into the workings of our government is utterly essential if elected officials and government agencies are to be held accountable to the public for their actions.
So it stands to reason that government data of all sorts, apart from top-secret and other sensitive information, needs to be made freely accessible to the public. The democratic nations of the world have taken up the “open data” imperative in earnest in recent years. In many nations, open data is a key program in reformers’ anti-corruption and transparency initiatives.
Open data is also a key tool for shining light on wasteful government spending, inefficient bureaucracies, and ineffective programs. As US Senator Mark Warner stated about our country’s newly enacted Digital Accountability and Transparency Act: “Right now, federal spending data is not always readily available and, if it is, it’s often in a format that is not very useful. This new [DATA] law requires federal agencies to account for every dollar they spend (and report it) on a single website, in an easy-to-read format. It will help us to identify duplication, waste, and fraud.”
If you think about it, the lack of a readily available, standard digital format for public data is a sign of inefficient government in this new era. In other words, open-data standards can help an informed citizenry root out government malfeasance (the work of the corrupt) and misfeasance (the work of the incompetent) in one fell swoop. Standard formats for open-data taxonomies, glossaries, metadata, timestamping, tamperproofing, and reporting are fundamental to this promise.
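Timestamping and tamperproofing can be illustrated with a hash chain: each published record carries a timestamp plus the SHA-256 digest of the previous entry, so any later edit to an earlier record breaks verification. This is a minimal sketch under assumed field names (`agency`, `amount_usd`) and hypothetical helpers; it is not drawn from the DATA Act or from any actual USASpending.gov format.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(entry: Dict) -> str:
    # Canonical JSON (sorted keys) so identical content always hashes identically.
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def append_record(chain: List[Dict], record: Dict) -> None:
    """Hypothetical helper: append a record linked to the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {
        "record": record,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prev_hash": prev_hash,
    }
    entry["hash"] = _digest(entry)  # covers the record, timestamp, and back-link
    chain.append(entry)

def verify_chain(chain: List[Dict]) -> bool:
    """Return True if every entry's hash and back-link are intact."""
    prev_hash = GENESIS
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True
```

One design caveat: a chain alone cannot reveal that its newest entries were dropped, so a publisher would also need to publish the latest hash somewhere independent.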
At least one country, South Korea, has also identified open public data as a resource for improving delivery of e-services to its citizens. As stated in this article, a South Korean official envisions open data and predictive analytics driving more proactive and personalized service delivery.
Politicians everywhere make empty promises and pass hollow legislation all the time. So why should we believe that they will deliver on the promise of their open-data initiatives?
I believe the global movement toward open government data is unstoppable because civic watchdogs everywhere will continue to lobby and apply other pressure where it counts. I’m encouraged by the advocacy and public education programs that have been instituted by transnational groups such as the Open Data Institute, Open Government Partnership, Data Transparency Coalition, the United Nations’ Public Administration Programme, and the World Bank. In addition to serving as watchdogs on disparate nations’ open-data initiatives, they are fostering a climate that encourages convergence among diverse open-data standards and practices.
If open data is to fulfill its role in civic governance, it needs to be managed in every country through standard procedures for data governance. One of the valuable components of the new US law is that it requires federal agency inspectors general to report on the quality and accuracy of the financial data provided to the open-data portal USASpending.gov. The law also requires the US Government Accountability Office to report on the data quality and accuracy and to create a Government-wide assessment of the financial data reported.
Will the democratic nations of the world amend their constitutions to enshrine open data as a core value? It’s not a far-fetched thought. Personally, I’d put open access to public data in the category of basic civil rights.
SAP image via Shutterstock
Are more organizations moving their SAP platform to the cloud? Tune into this week’s roundup to find out.
1. HP cloud encryption gives IT pros data security control – Ed Scannell (SearchCloudComputing)
HP’s split-key cloud encryption technology could be just what IT shops concerned about cloud security need to feel comfortable making the move.
2. Microsoft issues critical fixes for Internet Explorer, graphics – Toni Boger & Jeremy Stanley (SearchWindowsServer)
Microsoft’s June Patch Tuesday contains fixes for a large number of vulnerabilities within Internet Explorer. Plus, the company issued fixes for Word 2007.
3. Microsoft admits running out of IP addresses for Azure – Warwick Ashford (ComputerWeekly)
Microsoft has assured US Azure customers data remains in the US, despite running out of US-registered IP addresses at times.
4. Pandemiya banking malware emerges as Zeus-level threat – Brandan Blevins (SearchFinancialSecurity)
RSA researchers say the costly Pandemiya banking malware was written entirely from scratch, a dangerous oddity in the world of malware.
5. SAP landscape cloud migrations increasing, survey finds – Todd Morrison (SearchSAP)
A new survey by HCL Technologies sheds light on just how fast companies are moving their SAP landscape to the cloud.
Surface Pro image via Shutterstock
Is the Surface Pro 3 the tablet of the future for companies worldwide? That might be the case, and the TechTarget writers have the scoop in this week’s roundup.
1. Surface Pro 3 may leave Windows RT Surface in the dust – Diana Hwang (SearchEnterpriseDesktop)
Indications show that large organizations across multiple industries are committed to the Surface Pro 3 for their mobile workers. But the future of Windows RT-based Surface devices remains in question.
2. Apple opens APIs, adds more IT capabilities in iOS 8 – Jake O’Donnell (SearchConsumerization)
Apple iOS 8 will bring many mobile management capabilities along with cloud file sharing, but security and cross-compatibility limitations remain.
3. EBay breach response missteps: What other organizations can learn – Brandan Blevins (SearchSecurity)
The mishandled eBay breach response effort showed that even enterprises with mature information security programs can fumble the ball.
4. New HTC One sheds price, ditches metal for plastic – Michael Epstein (Brighthand)
HTC has confirmed that a less expensive version of its hit HTC One smartphone will be coming to select markets in early June.
5. What does it take to be a CIO: Passion and coding skills – Emily McLaughlin (SearchCIO)
What does it take to be a CIO? In this Searchlight, MIT CIO Symposium speakers share their journeys, while WWDC 2014 says coding skills are a must.
Surface Pro image via Shutterstock
Want to know what IT pros were thinking about the new Surface Pro 3 or XenMobile? Check out this week’s roundup.
1. IT pros sound off on new XenMobile, Surface Pro 3 – Alyssa Wood (SearchConsumerization)
IT pros and analysts on Twitter sound off on Microsoft’s Surface Pro 3, the new Citrix XenMobile and BlackBerry’s interesting MDM move.
2. DRaaS provides peace of mind for accounting firm – Trevor Jones (SearchCloudComputing)
Renee Mengali was 3,000 miles away when Hurricane Sandy hit, but the aftermath hit home and made her realize her accounting firm needed DRaaS.
3. TrueCrypt shutdown: Little warning, explanation given by developers – Brandan Blevins (SearchSecurity)
For enterprises, the sudden shuttering of the disk-encryption utility TrueCrypt highlights the risk of using open source security tools.
4. Aorus X3, X3 Plus and X7 change the face of laptop gaming – Jerry Jackson (NotebookReview)
Gigabyte announced three all-new gaming notebooks at Computex 2014 in Taipei and we were there to take a closer look at what makes these gaming laptops more interesting than a typical gaming rig.
5. AWS attends to cloud security with EBS encryption – Beth Pariseau (SearchAWS)
The encryption of EBS volumes is welcome news for cloud customers and security experts, but key management may be an issue for some customers.
eBay image via Shutterstock
While most people had a relaxing Memorial Day weekend, eBay was in full disaster recovery mode after its recent data breach. Read the whole story and more in this week’s roundup.
1. eBay under fire over handling of data breach – Warwick Ashford (ComputerWeekly)
eBay is coming under increasing criticism over its handling of the data breach that exposed millions of user records.
2. Rackspace’s cloud future in question – Trevor Jones (SearchCloudComputing)
Many IT shops and professionals that rely on Rackspace cloud wonder what will come of the company, which is now actively seeking new partnerships or a sale.
3. Business VoIP services market to reach $35 billion by 2018 – Tessa Parmenter (SearchUnifiedCommunications)
According to a recent report from Infonetics Research, VoIP and Unified Communications services are forecasted to grow from $68 billion to $88 billion by 2018.
4. Surface Pro 3 may stop IT from writing off Microsoft mobile devices – Diana Hwang (SearchConsumerization)
IT pros wondering whether Microsoft will return to its PC roots with its mobile devices have their answer with the launch of the latest Surface Pro tablet.
5. Google Android could get EMM with Divide acquisition – Jake O’Donnell (SearchConsumerization)
Google could be readying a preloaded EMM platform for Android devices after its purchase of mobile container startup Divide.
Data analytics image via Shutterstock
By James Kobielus (@jameskobielus)
Life’s just a rolling calculation grounded in odds. What you know about the world, you pretty much think you sort of know for sure. If René Descartes hadn’t been in such a rush to certainty, he might have admitted that his inner voice really told him “I think therefore I probably am.”
Having confidence in your knowledge means that the probabilities for what you believe are so high that they are practically indistinguishable from certainties. For example, we all tend to believe the evidence of our eyes, ears, and other senses. However, everyone knows that appearances can deceive. Memory is a faulty gauge of factuality, even for sensory impressions that happened a split-second ago and remain in working memory. And, of course, the art of magic demonstrates the infinite range of intentional illusions that can put the senses to shame.
Real cognition involves organically reckoning and hammering the probabilities that surround us down to manageable near-certainties. Humans are not computers that perform deterministic cognitive processing under stored-program control. Instead, our nervous systems are built on probabilistic principles that sift through impressions, heuristics, and odds so that we can get on with the business of living.
Cognitive computing systems should incorporate probabilistic analytic models in order to capture the irreducible uncertainties that inform rational thought. Anybody who wishes to plant cognitive computing in a more solid scientific foundation should check out the research presented in this MIT wiki. As discussed in the wiki, a probabilistic model of cognition should proceed from two axioms.
First, cognition is a process of trial-and-error hypothesis testing and confirmation. In other words, one confirms or rejects an a priori “working model” of a knowledge domain (i.e., cause-and-effect logic) through evaluation of probability-driven empirical observations.
And, second, cognition is a process of learning by conditional inference from confirmed working models. In other words, one’s confidence in any statement about the world rides on the extent to which it derives from a cause-effect model that was confirmed through probabilistic trial-and-error testing.
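Read through a Bayesian lens, the two axioms can be sketched in a few lines: the first as updating belief in a working model M against one rival as observations arrive, the second as the law of total probability, P(S) = P(S | M) P(M) + P(S | not M) (1 - P(M)). This is an illustrative interpretation, not the formulation on the MIT wiki; both functions are hypothetical, and the two-model setup with independent binary observations is a simplifying assumption.

```python
from typing import Iterable

def update_belief(prior_m: float, p_success_m: float, p_success_alt: float,
                  observations: Iterable[bool]) -> float:
    """Axiom 1 (sketch): return P(M | observations) for a working model M
    versus a single rival, each predicting a binary outcome with a fixed
    success probability. Observations are assumed independent given the model."""
    belief = prior_m
    for success in observations:
        # Likelihood of this observation under each model.
        like_m = p_success_m if success else 1.0 - p_success_m
        like_alt = p_success_alt if success else 1.0 - p_success_alt
        # Bayes' rule: posterior is proportional to likelihood times prior.
        numerator = like_m * belief
        belief = numerator / (numerator + like_alt * (1.0 - belief))
    return belief

def confidence_in_statement(p_model: float, p_s_given_model: float,
                            p_s_given_not_model: float) -> float:
    """Axiom 2 (sketch): confidence in statement S, marginalizing over
    whether the working model holds (law of total probability)."""
    return p_s_given_model * p_model + p_s_given_not_model * (1.0 - p_model)
```

Chaining the two, the belief returned by `update_belief` becomes `p_model` in `confidence_in_statement`, so a statement inherits its confidence from how thoroughly its cause-effect model has been confirmed. For instance, with P(M) = 0.9, P(S | M) = 0.95, and P(S | not M) = 0.2, the confidence in S is 0.855 + 0.02 = 0.875.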
These axioms define the extent to which we can trust deterministic approaches to cognitive computing. To the extent that a probabilistic cognitive model has been confirmed over and over through empirical evidence, we can justify coding its cause-effect model into deterministic processing rules. And to the extent that fresh empirical data continues to validate probabilistic models describing those same working models, we can continue to execute those models deterministically.
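The idea of promoting a well-confirmed model to deterministic rules, and keeping it there only while fresh data keeps validating it, can be sketched as a gate around a rule. This is a minimal illustration under assumed choices: the class `GatedRule`, its threshold and minimum-evidence defaults, and the Beta-Bernoulli accuracy estimate are all hypothetical, not features of any particular business rules management system.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class GatedRule:
    """Hypothetical wrapper: execute a cause-effect rule deterministically only
    while fresh empirical evidence keeps confirming it."""
    rule: Callable[[float], bool]   # deterministic rule coded from the working model
    threshold: float = 0.95         # assumed minimum posterior mean accuracy
    min_trials: int = 20            # assumed minimum evidence before trusting the rule
    prior_alpha: float = 1.0        # Beta prior pseudo-count of confirmations
    prior_beta: float = 1.0         # Beta prior pseudo-count of refutations
    confirmations: int = 0
    refutations: int = 0

    def confidence(self) -> float:
        # Posterior mean of the rule's accuracy under a Beta-Bernoulli model.
        a = self.prior_alpha + self.confirmations
        b = self.prior_beta + self.refutations
        return a / (a + b)

    def is_trusted(self) -> bool:
        trials = self.confirmations + self.refutations
        return trials >= self.min_trials and self.confidence() >= self.threshold

    def record(self, x: float, observed: bool) -> None:
        # Score the rule's prediction against what actually happened.
        if self.rule(x) == observed:
            self.confirmations += 1
        else:
            self.refutations += 1

    def decide(self, x: float) -> Optional[bool]:
        # Deterministic answer while trusted; None means "defer to the
        # probabilistic model" (a hypothetical convention).
        return self.rule(x) if self.is_trusted() else None
```

With the defaults, 40 straight confirmations give a posterior mean of 41/42 and switch the rule into deterministic mode; 10 subsequent misses drop it to 41/52 and hand control back to the probabilistic model. Using a lower credible bound instead of the posterior mean would make the gate more conservative.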
In other words, we can’t have full-fledged cognitive computing without predictive models, on the one hand, and business rules management systems on the other.