Big data image via Shutterstock
By James Kobielus (@jameskobielus)
Fending off industry hype requires that we stay focused on the maturity, or lack thereof, of any new technology. Just because pundits, developers, and venture capitalists are currently jazzed by this or that new tech doesn’t mean the bubble is robust enough to withstand full-bore commercialization. Enthusiasm withers fast when people start to realize that the quick riches they expected from promising new technologies may never materialize.
Let’s hope that commercialized Apache Spark offerings start to live up to the incessant hype that touts Spark as the evolutionary advance beyond Hadoop. It’s a promising technology, but, as we’ve seen with Hadoop, development into an enterprise-grade big-data platform takes years, requires substantial investments across the ecosystem, and may not happen unless the new approach does something better and/or cheaper than alternatives. Since the beginning of this decade, the Hadoop industry has steadily addressed those challenges and developed into a substantial and robust big-data platform.
Spark isn’t quite there yet, so we should give it time to come into its own. Almost a year ago, I started to toss my thoughts on Spark into the general pool of big-data punditry. I devoted my first Spark-centric post to a fairly detailed overview of what Spark is, does, and supports. A few months later, I looked at Spark’s evolving role in the hybridized ecosystem of big-data platforms. A few weeks ago, I commented on the arbitrariness of Spark’s inclusion in the Apache Hadoop project’s core scope. On that last point, Spark’s focus on real-time, streaming, in-memory, and graph-centric machine-learning applications makes it quite distinct from “traditional” Hadoop, though both leverage HDFS as a storage subsystem.
Just as Hadoop’s issues have occasionally eclipsed its strengths in the minds of enterprise IT professionals, Spark’s immaturity is coming into clearer focus. For Hadoop professionals, this recent article reads like déjà vu. It cites the following growing pains with Spark on the road to becoming a robust enterprise-grade platform:
- Lack of long-time, broad, or deep experience with Spark within the IT and big-data professions
- Lack of detailed documentation on Spark that includes in-depth guidance on the toughest technical issues and advanced application scenarios
- Lack of comprehensive tools for managing, monitoring, securing, tuning, optimizing, and recovering Spark jobs and clusters
- Lack of Spark integration with a wide range of middleware and databases
- Lack of a broad range of commercial Spark solutions and technical support resources
- Lack of broad API coverage for Spark that includes languages beyond the core of Scala
All of this sounds very much like the Hadoop market 2-3 years ago. Few industry observers doubt that the Spark industry will address each of these issues as the market matures. But, first, Spark must gain mainstream adoption at a reasonably brisk pace in order for that maturation to rise to the level at which Hadoop is now.
Hybrid cloud image via Shutterstock
Do you think IBM can make a mark in the hybrid cloud market? Find out in this week’s roundup.
1. IBM guns for the hybrid cloud market — again – Ed Scannell (SearchCloudComputing)
IBM is going after the hybrid computing market – again — but this time it is armed for bear. Will IT pros actually take notice?
2. Another AWS reboot planned for March – Beth Pariseau (SearchAWS)
For AWS shops, news of another reboot in the EC2 fleet shows the cloud provider is staying on top of its security responsibilities.
3. HP: Threat intelligence sources need vetting, regression testing – Michael Heller (SearchSecurity)
According to HP Security Research, threat intelligence best practices can be difficult to implement, and even the most trustworthy sources must be tested for fidelity.
4. FCC approves net neutrality rules – Katherine Finnell (SearchTelecom)
Internet service providers will be regulated as common carriers under Title II as the FCC approves net neutrality rules.
5. PernixData FVP adds RAM compression to server-side storage cache – Dave Raffo (SearchVirtualStorage)
PernixData Flash Virtualization Platform’s latest version compresses data on RAM, and its server-side caching software pools flash and memory resources.
Google image via Shutterstock
Did you notice Google’s Compute Engine was down last week? Find out all the details in this week’s roundup.
1. Google Compute Engine experiences global cloud outage – Trevor Jones (SearchCloudComputing)
An apparent network connection failure led to a two-hour, cross-region cloud outage for Google Compute Engine customers this week.
2. DH2i first off the line with Windows Server containers – Nick Martin (SearchWindowsServer)
With DH2i’s release of DxEnterprise, IT pros can manage enterprise deployments of Windows Server containers.
3. Carbanak bank malware attack causes nearly $1 billion in losses – Michael Heller (SearchFinancialSecurity)
A malware attack on more than 100 banks around the globe has led to one of the largest bank heist schemes in history, with losses potentially reaching $1 billion.
4. New Cisco collaboration certs expand expertise beyond voice, video – Gina Narcisi (SearchUnifiedCommunications)
The new Cisco collaboration certifications address the need to expand collaboration training for IT professionals beyond voice and video training.
5. EMC Isilon dives deeper into analytics, Hadoop, data lakes – Dave Raffo (SearchStorage)
EMC added a NAS array that can scale to 50 PB and upgraded its operating system to support the latest version of HDFS, plus OpenStack Swift.
CIO image via Shutterstock
What cloud projects do CIOs have in the works? Find out in this week’s roundup.
1. CIOs focus on Office 365, hybrid cloud in 2015 – Kristen Lee (SearchCIO)
With 2015 underway, SearchCIO checked in with CIOs to see what cloud projects they have in the works. Many said they are migrating to Office 365; hybrid cloud is also on the horizon.
2. Microsoft issues fixes for Internet Explorer, Group Policy – Toni Boger and Jeremy Stanley (SearchWindowsServer)
Microsoft issued patches across nine bulletins for February’s Patch Tuesday update. The company fixed issues within Group Policy and Internet Explorer.
3. Cisco targets large SDN deployments with Nexus 9000 improvements – Antone Gonsalves (SearchSDN)
Cisco plans to add an industry standard controller to the Nexus 9000 to give customers the option of using the switch for large-scale SDN deployments utilized by carriers and cloud service providers.
4. Box introduces BYOK encryption key management service – Rob Wright (SearchCloudSecurity)
Box will give enterprise cloud data storage customers the ability to control and store their own encryption keys through its new Enterprise Key Management service.
5. Seagate adds EVault backup appliance for on-premises or cloud – Sonia Lelii (SearchDataBackup)
The EVault backup target appliance expands Seagate’s market reach into an area dominated by EMC’s Data Domain products.
Social media image via Shutterstock
Which cloud expert should you follow on social media? Find out in this week’s roundup.
1. Five cloud experts to follow on social media – Nicholas Rando (SearchCloudComputing)
The cloud computing market is growing and evolving at lightning speed. To keep up, follow five of the top cloud experts in 2015 on social media.
2. Same-origin policy IE vulnerability may signal new attack trend – Michael Heller (SearchSecurity)
A new IE vulnerability has led to a proof-of-concept same-origin policy exploit, and some experts say it highlights a technique that may soon become popular among attackers.
3. Cisco unveils high-speed IE switch to drive industrial IoT – Antone Gonsalves (SearchNetworking)
Cisco introduced its first 40 gigabit per second IE switch for manufacturers, energy companies and government organizations that need higher bandwidth on industrial networks.
4. Exablox speeds up backups for the Hunger Task Force – Sonia Lelii (SearchDataBackup)
The Hunger Task Force was dealing with long backup windows and an archaic tape-based archiving solution. Exablox’s OneBlox solved those problems for the non-profit.
5. Cisco’s cloud networking play targets hybrid cloud shops – Trevor Jones (SearchCloudComputing)
Cisco’s cloud networking strategy continues to focus on its strengths with a new bundled suite for automation and a shift in licensing.
Programming language image via Shutterstock
By James Kobielus (@jameskobielus)
Data scientists are key programmers in the new era of big-data and cognitive-computing applications. They specialize in those business problems that are addressed in whole or in part through statistical analysis.
As with any programmer, a data scientist’s core job is to specify the structured, repeatable logic that drives business computing applications. The key practical difference between data scientists and other programmers is that the former specify execution logic that is grounded in probabilistic application patterns. By contrast, traditional programmers specify deterministic application logic, such as if/then/else, case-based and other rules that were deduced from functional analysis of some problem domain.
Data scientists do statistical analysis, which is all about probabilities and uncertainties. An application instantiates a probabilistic pattern when its execution rules incorporate statistical models that are grounded in uncertain inputs (e.g., customer behavioral propensities revealed from historical data) and/or uncertain outcomes (e.g., customer likelihood of accepting specific offers over others within various circumstances).
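To make the contrast concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration rather than anything drawn from a real system: the offer names, the customer "tier" field, and the historical counts are all assumed. The first function is a deterministic rule of the if/then/else kind; the second picks an offer from acceptance probabilities estimated from historical data, using simple add-one smoothing as one possible estimator.

```python
def deterministic_offer(customer):
    """Traditional rule: the outcome is fixed by an if/then/else test."""
    if customer["tier"] == "gold":  # hypothetical customer attribute
        return "premium_upgrade"
    return "standard_discount"


def acceptance_probability(accepted, shown):
    """Estimate the chance an offer is accepted from historical counts.

    Uses add-one (Laplace) smoothing so that an offer with no history
    gets a neutral estimate of 0.5 instead of a division by zero.
    """
    return (accepted + 1) / (shown + 2)


def probabilistic_offer(history):
    """Pick the offer with the highest estimated acceptance probability.

    `history` maps each offer name to (times accepted, times shown).
    """
    return max(history, key=lambda offer: acceptance_probability(*history[offer]))


if __name__ == "__main__":
    # Hypothetical historical data: (accepted, shown) per offer.
    history = {"premium_upgrade": (12, 100), "standard_discount": (30, 120)}
    print(deterministic_offer({"tier": "gold"}))
    print(probabilistic_offer(history))
```

The deterministic rule always returns the same offer for the same customer attributes, while the probabilistic choice shifts as the historical counts change.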
In keeping with this professional focus, most data scientists use statistically oriented languages, especially R, and other analytic modeling tools such as SAS, SPSS and Matlab. In addition, some data scientists may also use probabilistic programming languages, such as those discussed in this recent article.
Probabilistic programming is an emerging approach that is still unfamiliar to many working data scientists. These specialized languages facilitate the specification of Bayesian reasoning in the programming of machine-learning models for applications with uncertain data or outcomes. To enable this, the languages include operators for inferring probability distributions from uncertain data sets. The languages may support estimation of distributions via sampling; direct computation of them via value flow analysis and other techniques; and/or inference of distributions in spite of the absence of key variables, via machine learning and other approaches.
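As a rough illustration of "estimation of distributions via sampling," the following Python sketch infers a customer's offer-acceptance rate from observed data using rejection sampling. It is not a probabilistic programming language, only a hand-rolled stand-in built on the standard library, and the scenario (7 acceptances out of 20 offers, a uniform prior) and all function names are assumptions made for the example.

```python
import random


def simulate_acceptances(p, n_offers, rng):
    """Simulate how many of n_offers are accepted at acceptance rate p."""
    return sum(rng.random() < p for _ in range(n_offers))


def posterior_samples(observed, n_offers, n_draws=20000, seed=0):
    """Sample the acceptance rate given the observed acceptance count.

    Rejection sampling: draw a candidate rate from a uniform prior,
    simulate data under it, and keep the candidate only when the
    simulation reproduces the observed count exactly.
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n_draws):
        p = rng.random()  # uniform prior over the unknown rate
        if simulate_acceptances(p, n_offers, rng) == observed:
            kept.append(p)
    return kept


if __name__ == "__main__":
    # Hypothetical observation: 7 of 20 offers were accepted.
    samples = posterior_samples(observed=7, n_offers=20)
    mean = sum(samples) / len(samples)
    print(f"kept {len(samples)} samples, posterior mean is about {mean:.2f}")
```

The kept samples approximate the Bayesian posterior over the acceptance rate, so the program yields a whole distribution rather than a single point estimate. Dedicated probabilistic languages automate this kind of inference and use far more efficient algorithms.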
In a world where more application logic is derived (aka “learned”) at run time from probabilistic patterns found in multistructured data, probabilistic programming is indispensable. Cognitive computing applications, in particular, depend on probabilistic programming to specify, for example, how user experience (UX) interfaces should dynamically adjust to reflect changes in users’ browsing behavior, sentiments, intentions, locations, and myriad other situational variables. Every one of these variables is probabilistic in isolation, and in combination their shifting mosaic may render it impractical to build a priori UX logic that optimizes each user’s satisfaction under every possible dynamic circumstance.
If you’re a working data scientist, you need to incorporate probabilistic programming into your core repertoire. Here’s a good technical paper on the topic for data scientists and other programmers who want to bootstrap their understanding without delay.
Apple image via Shutterstock
Do you think Apple’s biometric data will be secure? Find out in this week’s roundup.
1. Apple eyes cloud storage for Touch ID biometric data – Rob Wright (SearchCloudSecurity)
According to a new patent application, Apple is looking to expand its Touch ID biometric verification system through the cloud. But will the biometric data be secure?
2. Video key to future of Web conferencing services – Katherine Finnell (SearchUnifiedCommunications)
Millennials, mobile workers driving changes in organizations’ attitudes and usage of Web conferencing services, study shows.
3. FTC urges vendors to create Internet of Things security and privacy controls – Michael Heller (SearchSecurity)
An FTC report urges vendors to be proactive in creating Internet of Things security and privacy controls, while a Tripwire survey shows IoT devices are a growing corporate risk.
4. Private Docker repositories add to Google containers push – Trevor Jones (SearchCloudComputing)
Private Docker repositories are available through the Google Container Registry — a move to help secure and deploy private container images.
5. Survey: Big data projects sneak up on basic BI on IT priority list – Ed Burns (SearchBusinessAnalytics)
TechTarget’s 2015 IT Priorities Survey shows that while businesses are still investing in basic BI and data warehousing capabilities, big data initiatives are becoming almost as prevalent.
Microsoft Windows image via Shutterstock
What has been the biggest success in Microsoft Windows history? Find out in this week’s roundup.
1. Microsoft Windows history: A 30-year timeline – Diana Hwang (SearchEnterpriseDesktop)
Microsoft celebrates 30 years of the Windows operating system when it ships Windows 10 this fall. Here’s a look at Windows history — bumps and all.
2. IBM’s revenues continue their journey south – Ed Scannell (SearchDataCenter)
IBM’s financial woes continue as the company reports down revenues for 2014 with its server hardware business leading the downward trend.
3. Report: Popularity of biometric authentication set to spike – Michael Heller (SearchSecurity)
Juniper Research claims that the popularity of biometric authentication will rise dramatically in the next five years, incorporating innovative technology beyond today’s fingerprint sensors and voice authentication systems.
4. Polycom RealPresence updates include new audio and visual features – Gina Narcisi (SearchUnifiedCommunications)
Polycom has announced new audio and video enhancements to improve the user experience of video conferencing.
5. 2015 outlook in information technology: Growth and more cloud services – Mark Schlack (SearchCIO)
The 2015 outlook for information technology includes higher budgets and an emphasis on cloud, according to TechTarget’s annual IT priorities survey.
IBM image via Shutterstock
Can IBM’s SoftLayer lure customers away from AWS? Check out this week’s roundup to find out.
1. IBM SoftLayer IaaS stands up to AWS with free support, networking – Beth Pariseau (SearchCloudComputing)
IBM’s SoftLayer IaaS offers low-cost networking and free support, tempting some customers away from AWS.
2. Microsoft patches one critical flaw, rolls out new notification process – Toni Boger and Jeremy Stanley (SearchWindowsServer)
January saw a light Patch Tuesday, but Microsoft’s move to discontinue its advance notification service has rankled security researchers.
3. Riverbed appliance heads to the cloud – Antone Gonsalves (SearchNetworking)
Riverbed’s new WAN optimization appliance is aimed at companies with hybrid environments in which applications stretch from the data center to the cloud.
4. Preview of 2015 Verizon PCI report hints at firewall compliance issues – Eric Parizo (SearchSecurity)
In a sneak preview of its 2015 PCI Compliance Report, Verizon says improper firewall maintenance is among the leading causes of PCI DSS compliance failures.
5. CIOs beef up security tools in wake of 2014 data breaches – Dina Gerdeman (SearchCIO)
What’s different about security strategies in the aftermath of the 2014 data breaches? More money, more monitoring, more employee training, and that’s just for starters.
Verizon image via Shutterstock
Will future customers be turned off by Verizon Cloud’s downtime this past weekend? Find out in this week’s roundup.
1. Verizon Cloud off to rocky start with 48-hour downtime – Trevor Jones (SearchCloudComputing)
Verizon Cloud will be down for up to 48 hours this weekend. But with a relatively small customer base, the biggest impact could be on future customers.
2. Sony Pictures hack recap: Experts debate North Korea’s role – Sharon Shea (SearchSecurity)
News roundup: The FBI maintains North Korea was behind the Sony Pictures hack, in spite of naysayers. Plus: Malware campaign attributed to Russia; new Mac OS X bootkit; cyberattack causes physical damage.
3. AWS Spot Instances get two-minute warning – Beth Pariseau (SearchAWS)
The new two-minute warning is a positive move for IT pros using Spot Instances, but some would like to see more changes to the bidding system.
4. International CES 2015: IoT, wearables and robots ready for takeoff – Francesca Sales (SearchCIO)
Will 2015 be the year the Internet of Things takes hold in the enterprise? International CES 2015 attendees took to Twitter to share their observations and predictions about IoT, wearables and even robots.
5. CES 2015: How Intel aims to power the tech revolution – Clare McDonald (ComputerWeekly)
Intel CEO Brian Krzanich told the International CES audience that 2015 will be the year of the next wave of consumer technology.