Big data image via Shutterstock
By James Kobielus (@jameskobielus)
Even as Apache Spark pushes more deeply into big-data environments, it won’t substantially change this trend. Yes, of course Spark is on the fast track to ubiquity in big-data analytics. This is especially true for the next generation of machine-learning applications that feed on growing in-memory pools and require low-latency distributed computations for streaming and graph analytics. But those use cases aren’t the sum total of big-data analytics and never will be.
As we all grow more infatuated with Spark, it’s important to continually remind ourselves of what it’s not suitable for. If, for example, one considers all the critical data management, integration, and preparation tasks that must be performed prior to modeling in Spark, it’s clear that these will not be executed in any of the Spark engines (Spark SQL, Spark Streaming, GraphX). Instead, they’ll be carried out in the data platforms and elastic clusters (HDFS, Cassandra, HBase, Mesos, cloud services, etc.) upon which those engines run. Likewise, you’d be hard-pressed to find anyone who’s seriously considering Spark in isolation for data warehousing, data governance, master data management, or operational business intelligence.
Above all else, Spark is the new power tool for data scientists who are pushing boundaries in the emerging era of in-memory big data analytics in low-latency scenarios of all types. In a recent column, I commented on the likely sweet-spot deployment roles (fog, stream, and cloud) where Spark will prove its value as a development tool for the new generation of data scientists building the in-memory statistical models upon which it all will depend.
Let’s not fall into the delusion that everything is converging toward Spark, as if it were the ravenous maw that will devour every other big-data analytics tool and platform. Spark is just another approach that’s being fitted to and optimized for specific purposes.
And let’s resist the hype, such as in the headline of this recent article, that treats Spark as Hadoop’s “successor.” This implies that Hadoop and other big-data approaches are “legacy,” rather than what they are, which is foundational. For example, no one is seriously considering doing “data lakes,” “data reservoirs,” or “data refineries” on anything but Hadoop or NoSQL.
After all, you can’t spark an analytic combustion engine if there’s no data fuel in the tank.
Data center image via Shutterstock
What data center trends are leading the charge in 2015? Find out in this week’s roundup.
1. Ten data center trends driving change in 2015 – Stephen Bigelow (SearchDataCenter)
There’s technology hype and then there are true trends that affect data centers long-term. These 10 trends are sure to have an impact beyond 2015.
2. Enterprises struggle with SDN development, culture – Antone Gonsalves (SearchSDN)
Enterprises say retraining and cultural changes stemming from SDN development are more difficult to solve than technological challenges.
3. Apple, EMMs team up to deploy mobile apps for business – Jake O’Donnell (SearchConsumerization)
VMware’s AirWatch and MobileIron will work with iOS to deploy secured business apps in a different way than Apple’s IBM partnership.
4. Samsung vulnerability affects up to 600 million Android devices – Michael Heller (SearchSecurity)
A flaw in the default keyboard found on many Samsung Galaxy Android devices may leave as many as 600 million devices at risk for a man-in-the-middle attack.
5. C-level relationships, engagement key to CIO success – Sue Troy (SearchCIO)
CIOs spend 40% of their time engaged with the CEO, CMO, COO and other non-IT peers, according to MIT research. C-level execs explain why that time is so important.
HP image via Shutterstock
Do you think HP is making a wise decision splitting into two? Find out in this week’s roundup.
1. HP Enterprise to focus on agility, efficiency after split – Robert Gates (SearchDataCenter)
HP will soon split in two, and IT pros expect its enterprise business to move faster to catch up in the quickly changing IT market.
2. IBM and Cisco acquire OpenStack providers – Trevor Jones (SearchCloudComputing)
IBM and Cisco both made acquisitions this week to shore up their cloud portfolios and consolidate the private cloud and OpenStack market.
3. Dropbox for Business adds Active Directory integration – Jake O’Donnell (SearchConsumerization)
With an Active Directory connection, added admin features and shared folder APIs, Dropbox has designed its enterprise offering to be part of IT’s big picture.
4. Facebook, Google, Mozilla raise the bar with new user privacy controls – Sharon Shea (SearchSecurity)
News roundup: New settings and options to boost user privacy and security are emerging on major websites, but is it enough?
5. HP, Alcatel venture targets data center connections – Gina Narcisi (SearchNetworking)
HP and Alcatel-Lucent announced new cloud and data center offerings to help service providers and large enterprises connect their disparate environments.
Internet of Things image via Shutterstock
Will Google’s new IoT operating system make waves in the market? Find out in this week’s roundup.
1. New Google IoT OS to connect Android devices – Jake O’Donnell (SearchConsumerization)
Google Project Brillo is a stripped-down IoT OS that could entice enterprises thanks to its connections to existing Android and iOS devices.
2. IBM hopes to Power its way to the hybrid cloud – Ed Scannell (SearchDataCenter)
IBM tries the gumbo approach to creating a hybrid cloud: Some Power Systems with a dash of software reorg and a pinch of revamped software licensing.
3. IRS breach shows the importance of PII security – Maxim Tamarov (SearchSecurity)
A breach of the IRS’ Internet tax form service “Get Transcript” exposed the personal information and tax filings of thousands of people.
4. Toshiba joins list of Ethernet hard drive makers – Carol Sliwa (SearchStorage)
Toshiba joins drive makers testing new devices that combine storage, compute resources and Ethernet ports to scale object stores and big data analytics.
5. Salesforce ties Wave analytics tool to big data platforms – Ed Burns (SearchBusinessAnalytics)
Salesforce is getting more involved in big data analytics, with new partnerships that open up its cloud-based Wave analytics technology to Hadoop and other big data systems.
Government security image via Shutterstock
Will President Obama act on the issue of government backdoors? Find out in this week’s roundup.
1. Government backdoor security concerns prompt letter to President – Sharon Shea (SearchSecurity)
As privacy and security concerns rise, President Obama is urged to dismiss the call for government backdoors.
2. Alcatel-Lucent boosts capabilities of service provider SDN platform – Antone Gonsalves (SearchSDN)
Alcatel-Lucent introduces a service provider SDN platform designed to improve response times in making changes to corporate networks.
3. Cloud threatens traditional IT jobs, forces changes – Robert Gates (SearchDataCenter)
Cloud computing is making corporate-owned data centers less common, and companies continue to reduce IT staff.
4. OpenStack containers, PaaS tie-ins give users a leg up – Trevor Jones (SearchCloudComputing)
OpenStack containers can come in many forms, as the OpenStack Foundation has taken an agnostic approach to the emerging orchestration technologies around them.
5. IoT will force CIOs to enter the realm of operational technology – John Moore (SearchCIO)
IoT is mature enough for enterprise adoption, said a panel of experts at the 2015 MIT Sloan CIO Symposium, but successful deployments require CIOs to engage with operational technology.
VMware image via Shutterstock
What does VMware’s forecast look like after its first-quarter results? Find out in this week’s roundup.
1. VMware forecast: partly cloudy with chance of market gain – Ed Scannell and Tom Walat (SearchServerVirtualization)
VMware has made significant investments in network virtualization, end-user computing and the cloud. Judging by the first-quarter results, those efforts are paying off.
2. Rackspace: Expect more of a leadership role in OpenStack community – Trevor Jones (SearchCloudComputing)
In this Q&A, Rackspace’s Private Cloud VP and GM discusses the state of the OpenStack community and the company’s plan to strengthen its role in it.
3. AMD roadmap redrawn for data center destination – Meredith Courtemanche (SearchDataCenter)
AMD expects to add some processor choice to the Wintel-dominated x86 server market as soon as next year.
4. Bugs, lack of support lead to Tor Cloud Project shutdown – Maxim Tamarov (SearchCloudSecurity)
The Tor Project has shut down Tor Cloud, its AWS bridge effort, but encourages developers to set up their own Tor bridges to promote anonymous cloud usage.
5. Security ethics survey shows honesty is a tricky business – Michael Heller (SearchSecurity)
A security ethics survey conducted at the 2015 RSA Conference indicates that infosec professionals may be wary of media attention in breach and vulnerability reporting.
Windows Update image via Shutterstock
Have we seen the end of Patch Tuesday? Check out the changes Microsoft announced at Ignite 2015 in this week’s roundup.
1. Microsoft debuts password-free Windows Hello, Patch Tuesday changes – Robert Richardson (SearchSecurity)
Microsoft Ignite 2015 showed that Microsoft may have rethought the Tuesday part of Patch Tuesday, but Windows Update is stronger than ever.
2. Outlook for iOS, Android gets MAM, fulfills IT wish list – Jake O’Donnell (SearchConsumerization)
IT has more controls than ever for Microsoft’s new Outlook mobile apps, yet only those with Intune can use its new MAM.
3. All-in-one systems simplify OpenStack private clouds – Robert Gates (SearchDataCenter)
OpenStack has been an option for advanced users for many years, but new combinations match server hardware with enterprise support.
4. Windows Server 2016 preview 2 hits without containers – Ed Scannell (SearchWindowsServer)
Microsoft has refactored Windows Server around containers, but IT pros must wait a little longer before they can actually test them.
5. Sapphire attendees take long view despite S/4HANA roadmap gaps – Jim O’Donnell (SearchSAP)
SAP Sapphire attendees expressed long-term confidence in the company’s S/4HANA platform, but also said the transition will not be simple to make.
Data Science image via Shutterstock
By James Kobielus (@jameskobielus)
What minimal qualifications do you actually need to call yourself a data scientist these days?
The cynical answer would be: whatever you can get away with. But the fuller answer would be: whatever the market can bear.
By the latter I’m referring to the core principle of a dynamic free-market economy: supply will emerge, by hook or by crook, to satisfy demand.
A realist would admit that, even if you have no experience, qualifications, certification, track record, or any other objective evidence that you are on some level a competent data scientist, you can plausibly, without fraudulent intent, call yourself one if you feel that you’re up to the challenge. That’s called “marketing.” And if someone else accepts your self-definition and opts to engage you in an initiative in which you’re expected to deliver on your offer of data scientific services, you are doubly entitled to call yourself one. That’s called “sales.” And if you can indeed, in spite of all appearances, provide data scientific services that they find acceptable, and if you can collect monetary compensation in the process, you’re indeed a legitimate, professional data scientist.
Those are the minimal qualifications for calling yourself a professional in any line of work in a free economy. Of course, there may, in various lines of work and various jurisdictions, be plenty of formal degrees, certifications, union memberships, and other hoops you may need to jump through before you’re entitled to call yourself, say, a doctor, lawyer, certified public accountant, or air-traffic controller. There are few such hoops in the data-science profession so far.
Likewise, there are in various professions codes of conduct that are widely accepted and constrain the extent to which anybody can pass off shoddy work as the valid output of a competent professional. As I noted here, various individuals and groups in the data science profession have proposed such codes of conduct, though none is universally recognized. And none of the ones that have been proposed is being used in any concerted fashion to limit who may or may not market themselves as a data scientist.
To the extent that employers adopt hiring practices that incorporate minimal qualification criteria from these or other codes of conduct, you won’t be able to market your so-called data science bona fides to any of them until you meet those criteria. Clearly, the self-taught, self-appointed data scientists among us must face up to that challenge if they wish to make steady careers in the profession.
Given the persistent undersupply of qualified data scientists relative to growing demand, the autodidacts (who can actually deliver the goods) will be able to prosper in today’s big-data-besotted economy. Many of them will avail themselves of ample free resources to bootstrap themselves into this “sexy” profession on the cheap.
All power to them. Being an aspiring data scientist in this day and age is a bit like being an aspiring soldier in a wartime emergency. The fact that you’ve enlisted or been conscripted doesn’t make you up to the challenge. The fact that somebody handed you a gun and a uniform doesn’t mean you can be trusted to defend your country. The fact that you can follow orders and march in a straight line doesn’t mean that you’re cut out for this line of work.
But the fact that somebody’s giving you orders and that they’re apparently satisfied enough with what you’re doing means you’re indeed a soldier now. You’re as much a soldier as the major general…or your drill sergeant.
And nobody can tell you you’re not.
RSA Conference image via Shutterstock
Do you believe the RSA Conference is getting too big for San Francisco? Find out in this week’s roundup.
1. RSA Conference 2015 recap: Record attendance, record stakes – Eric Parizo (SearchSecurity)
This year’s RSA Conference once again broke the previous year’s attendance record. Is the show getting too big for San Francisco? Plus key takeaways and final words from our executive editor.
2. Apache CloudStack marches on in OpenStack’s shadow – Trevor Jones (SearchCloudComputing)
What CloudStack lacks in corporate sponsorships it makes up for in user appeal, and it isn’t going away any time soon.
3. Government agencies struggling with security data analytics – Maxim Tamarov (SearchSecurity)
Security data analytics are a must-have for government agencies to stay one step ahead of cyber attackers, according to a study conducted by MeriTalk.
4. EMC World 2015: Expect backup, flash, future directions – Dave Raffo (SearchStorage)
EMC will roll out products in its Data Domain, XtremIO and ViPR platforms at EMC World 2015, but what about a CEO succession plan?
5. Universal apps, holographic magic at Microsoft Build 2015 – Francesca Sales (SearchCIO)
At Microsoft Build 2015, the software giant showed it’s determined to do for mobile devices what it did for PCs — only this time it’s the services, not the OS, that it hopes will be the cash cow.
Security attack image via Shutterstock
Are long-duration security attacks the norm now? Find out in this week’s roundup.
1. Long-duration advanced persistent threats now the norm, say experts – Michael Heller (SearchSecurity)
Threat experts at RSA Conference 2015 say today’s most dangerous attack techniques reflect a shift toward long-duration attacks that are often nearly impossible to detect.
2. Spectra Logic Corp. upgrades BlackPearl with more cache, object support – Garry Kranz (SearchDataBackup)
Spectra Logic has tweaked S3 commands on BlackPearl archiving gateway to enable bulk transfer of objects to backend LTFS libraries.
3. Report shows increased failure rates for ERP implementations – Jim O’Donnell (SearchManufacturingERP)
The 2015 ERP report from Panorama Consulting Solutions also shows SAP outpacing Oracle and Microsoft on buyer shortlists.
4. New data center cooling systems slowly displace CRAC – Robert Gates (SearchDataCenter)
While CRAC remains a top data center cooling option, cheaper and more environmentally friendly alternatives will continue to emerge.
5. Google reports 12% revenue increase – Kayleigh Bateman (ComputerWeekly)
Google misses analysts’ $17.5bn target but reports 3.7% profit increase through growth in mobile ad sales.