The Apache Hadoop distributed data processing framework has clear benefits and is gaining traction. However, it also has drawbacks. Some organizations find that getting started with Hadoop requires rethinking software architecture and acquiring new data skills.
For some, a problem with Hadoop’s batch-processing model is that it assumes there will be downtime in which to run the batch between bursts of data acquisition. That is the case for many businesses that operate locally and handle a large number of transactions during the day but very few (if any) at night. If that nightly window is large enough to process the previous day’s accumulation of data, everything goes smoothly. For some businesses, though, the window of downtime is small or nonexistent, and even with Hadoop’s high-powered processing they still take in more data in a day than they can process in 24 hours.
For organizations with small windows of acceptable downtime, an approach that adds components of stream-based data processing may help, writes GigaSpaces CTO Nati Shalom in a recent blog post on making Hadoop faster. By continuously processing incoming data into useful packets and removing static data that does not need to be processed (or reprocessed), enterprises can significantly accelerate their big data batch processes. – James Denman
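The idea Shalom describes can be sketched in a few lines: fold incoming events into running aggregates as they arrive, so the nightly batch only touches compact summaries rather than the raw firehose. This is a minimal illustration of the pattern, not GigaSpaces’ implementation; the event fields (`store_id`, `amount`) are made up for the example.

```python
from collections import defaultdict

class StreamPreAggregator:
    """Fold incoming events into running per-key totals so the
    nightly batch job sees compact aggregates, not raw events."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def ingest(self, event):
        # Process each event as it arrives instead of storing it raw.
        key = event["store_id"]
        self.totals[key] += event["amount"]
        self.counts[key] += 1

    def drain(self):
        # Hand the batch job one small (count, total) record per key,
        # then reset for the next window.
        snapshot = {k: (self.counts[k], self.totals[k]) for k in self.totals}
        self.totals.clear()
        self.counts.clear()
        return snapshot

agg = StreamPreAggregator()
for e in [{"store_id": "s1", "amount": 10.0},
          {"store_id": "s1", "amount": 5.0},
          {"store_id": "s2", "amount": 7.5}]:
    agg.ingest(e)

print(agg.drain())  # {'s1': (2, 15.0), 's2': (1, 7.5)}
```

However large the daytime transaction volume, the batch window now only has to process one small record per key instead of every raw event.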
Later this month, experts and authors from around the globe will gather in London for the fifth annual SOA, Cloud and Service Technology Symposium. This year’s conference agenda reflects aspects of the progress of SOA – both subtle and profound.
In reviewing this year’s submissions, some vivid trends emerged, said Thomas Erl, prominent SOA author, educator and conference chair. “Many sessions are about the convergence of different areas,” said Erl, noting the original event covered SOA, then it covered SOA and cloud computing, and now it has broadened further.
“As you go through all the submissions, you kind of witness an evolution in the industry. It is a reflection as to where the industry itself is going,” he said. As the naming of the event suggests, Erl sees an emerging field that can be called “service technology.”
“In the early days of SOA, people associated SOA with Web services. There was a communications barrier [with people] who thought it was just a way of implementing Web services,” he said. “Now we are seeing many more sessions that look at how [cloud, SOA and services] are applied together, and what the implications are.”
The Symposium, set for Sept 24 – 25 at Imperial College, is slated to cover a broad variety of SOA and cloud-related topics as well. Among scheduled sessions are “Lightweight BPM and SOA,” “Moving Applications to the Cloud: Migration Options,” and “The Rise of the Enterprise Service Bus.” Also on the agenda is a series of on-site training and certification workshops. Billed as “bootcamp-style training sessions,” the workshops will provide preparation for a number of industry-recognized certifications, including SOA architect and cloud technology professional programs.
A key aim of the conference is to offer SOA, cloud computing and service technologies practitioners a look at real-world implementations and field-tested industry practices. However, the event will also cover emerging trends and innovations in the space.
As more enterprises set their sights on Hadoop’s capabilities, new products aim to ease Hadoop integration. Progress DataDirect’s Connect XE for ODBC driver for Hadoop Hive is an example. It boasts scalable connectivity for multiple distributions of Hadoop.
Enterprises looking to carry out additional analysis of data contained in a Hadoop-based store need a reliable connection to their existing predictive analytics and business intelligence tools. That can prove challenging, especially when dealing with multiple versions of Hadoop; distributions include Apache Hadoop, MapR’s distribution, Cloudera’s distribution of Apache Hadoop and others.
“If I’m an [independent software vendor] and I want to onboard Hadoop as a supportive platform, I can either write a bunch of custom code for each specific flavor of Hadoop that I want to talk to—which has massive cost to it, massive complexity and issues related to support—or I can try to piece together some support matrix with the existing technology that’s out there for connectivity,” said Michael Benedict, vice president and business line manager, Progress DataDirect.
The company’s newest driver provides enterprises with another option. “Customers can plug in our driver under their normal code maps [to] applications that already support ODBC today, and they are able to take advantage of Hadoop for all of their customers,” Benedict explained.
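The appeal of the ODBC approach is that connectivity reduces to a driver name and a connection string. The sketch below shows the general shape; the keyword names (`HostName`, `PortNumber`, `Database`) follow common ODBC conventions and are assumptions here, since the exact keywords for the DataDirect driver would come from its documentation.

```python
def build_hive_conn_str(driver, host, port=10000, database="default"):
    """Assemble an ODBC-style connection string for a Hive data source.
    Keyword names are typical ODBC conventions, used here illustratively."""
    parts = {
        "DRIVER": "{%s}" % driver,
        "HostName": host,
        "PortNumber": str(port),
        "Database": database,
    }
    return ";".join("%s=%s" % kv for kv in parts.items())

conn_str = build_hive_conn_str("DataDirect Hive Driver", "hadoop01.example.com")
print(conn_str)

# With the driver installed, any ODBC-capable application could then
# connect without Hadoop-specific code, e.g. via the pyodbc module:
#   cnxn = pyodbc.connect(conn_str)
#   for row in cnxn.execute("SELECT COUNT(*) FROM web_logs"):
#       print(row)
```

The point Benedict makes is exactly this: the application speaks plain ODBC, and the driver absorbs the differences among Hadoop flavors.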
The driver offers support for several common Hadoop distribution frameworks, including Apache, Cloudera, MapR and Amazon EMR. At the same time, it provides platform support for Windows, Red Hat, Solaris, SUSE, AIX and HP-UX. According to Benedict, the release of the driver reflects a growing need to analyze and process big data.
“[Enterprises are] consuming, analyzing and taking action on a much larger set of data than they have in the past,” he explained. “The reason why that’s changed is that, while you could store that data in the past, you just couldn’t really do it cost effectively. Big data/Hadoop allows you to do it in a slightly more cost-effective manner. Plus you’ve got a lot of technology that’s being built around this to enable you to better monetize and take action on data.”
By offering one unified driver, Progress DataDirect says it is filling demand for better connectivity to all the platforms supporting the major distributions of Hadoop. The driver is set to ship at the end of October; preview access is now available on a limited basis to current customers. -Stephanie Mann
In a recent report on the state of Java, IDC analyst Al Hilwa notes that the Java ecosystem is healthy and on a growing trajectory, with more programming languages than ever now hosted on the Java Virtual Machine (JVM). Hilwa, program director for application development software at IDC, gives credit to Oracle for a mostly successful custodianship of Java, since its acquisition of Sun Microsystems two years ago.
There are some clouds on the horizon, as could be expected for a language and architecture that has been atop the heap of enterprise middleware for so many years. Writes Hilwa: “Java is under pressure from competing developer ecosystems, including the aggressively managed Microsoft platform and ecosystem and the broader Web ecosystem with its diverse technologies and lightweight scripting languages and frameworks.”
While looming lightweight languages, frameworks and runtimes do portend a new state of Java, Java’s ability to evolve and absorb new technologies has proved remarkable to date. There is reason to believe there is more to come.
Q: What is a data scientist? A: It’s a DBA from California. The joke reflects the fact that the world of big data skills right now is pretty much topsy-turvy. If you would like a short list of skills associated with big data initiatives, you are out of luck. Try a long list instead.
The skills list – courtesy of the IT skills specialists at Foote Partners, LLC – includes Apache Hadoop, MapReduce, HBase, Pig, Hive, Cassandra, MongoDB, CouchDB, XML, Membase, Java, .NET, Ruby, C++ and more.
Further, the ideal candidate needs to be familiar with sophisticated algorithms, analytics, ultra-high-speed computing and statistics – even artificial intelligence. The needs of big data, which arise in part from modern computing’s ability to produce more and more bits and bytes, mean that developers have to hone their skills significantly. Suddenly, SQL-savvy developers have to obtain NoSQL skills.
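To make the SQL-to-NoSQL shift concrete, here is one lookup expressed both ways: once in SQL (using Python’s built-in SQLite) and once as a MongoDB-style query document, evaluated in plain Python for the sake of a self-contained example. The data and field names are invented for illustration.

```python
import sqlite3

# Sample records: (name, age). Purely illustrative data.
rows = [("alice", 34), ("bob", 51), ("carol", 29)]

# SQL side: an in-memory SQLite table and a familiar WHERE clause.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, age INTEGER)")
db.executemany("INSERT INTO users VALUES (?, ?)", rows)
sql_hits = [r[0] for r in db.execute("SELECT name FROM users WHERE age > 30")]

# NoSQL side: the same predicate as a MongoDB-like query document
# (a real driver call would look like users.find({"age": {"$gt": 30}})),
# here evaluated by hand against plain dict documents.
docs = [{"name": n, "age": a} for n, a in rows]
query = {"age": {"$gt": 30}}
nosql_hits = [d["name"] for d in docs
              if all(d[f] > cond["$gt"] for f, cond in query.items())]

print(sql_hits)    # ['alice', 'bob']
print(nosql_hits)  # ['alice', 'bob']
```

The predicates are equivalent; what changes is the mental model – declarative set logic over tables versus operator documents over schemaless records – which is precisely the retooling the skills lists point to.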
New technology like Hadoop is so raw that the developer is often forced to create his or her own software tools, which is a skill in itself. Writes the Foote crew:
Hadoop is an extremely complex system to master and requires intensive developer skills. There is a lack of an effective ecosystem and standards around this open source offering and generally poor tools available for using Hadoop.
Foote warns that there is only more of the same to come, especially as unstructured data from sources such as sensors and social media piles up in the in-bin. Note to big data scientists of tomorrow: get ready for the deluge! – Jack Vaughan
Summer is known for vacation and relaxation, as many TV commercials attest. It also can be a time of unrest and revolution, as U.S. and French history attest. Maybe the season explains the timing of some upheaval in the fledgling field of open APIs.
Recent weeks have seen clamor in the ranks of the OAuth API standardization effort, as well as a high-visibility launch of an alternative to Twitter’s APIs. In the first case, an OAuth originator took exception to changes proposed for Version 2.0. In the other, a West Coast start-up took on Twitter, promising a non-ad-supported social media platform based on an open Web API. A sidebar to all this is the earlier craigslist mini-brouhaha surrounding its attempts to close off data-listing URLs that were being repurposed by Web API-wielding third parties.
Over the weekend, the potential Twitter alternative known as App.net garnered considerable attention by enlisting developers, at as much as $100 apiece, to sign up for its paid mobile app service. The company had well exceeded its $500,000 seed goal as of August 13. On one level, the move can be seen as an effort to fill the void caused by Twitter’s recent backtracking on some of its API openness. On another, App.net can be seen as an affront to Twitter’s growing reliance on advertising for revenue.
It was in the wake of Twitter’s efforts to ensure that its APIs maintain a “consistent set of products and tools” that App.net co-creator Dalton Caldwell blogged about what Twitter could have been. (He had earlier criticized Facebook.) He saw the Twitter API originally as a real-time protocol, one that became tainted by Twitter’s advertising model. Subsequently, App.net launched its online promotion, which seemed somewhat akin to crowd-funding undertakings such as Kickstarter.
Dalton Caldwell, who began his career at SourceForge, has seen the upside and the downside of technology. His present company, Mixed Media Labs, has focused increasingly on its App.net developer store, now pitched as a social platform, after backing ran out for its now-shuttered Picplz picture-sharing site. In effect, he has ridden the swells of the open API trend and found a way to get mobile app developers to pay to be part of the App.net effort.
These doings – both Twitter’s and Mixed Media’s – don’t much clarify the trajectory of that recently born technology known as the open, Web or public API.
An era of an open, programmable Web may come about if non-commercial standards can be agreed upon. OAuth 2.0 will provide a testing ground for that. But Caldwell’s App.net does not forgo commerce altogether; his business plan merely pledges to forgo advertising commerce.
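For readers new to the standard under dispute, the first leg of OAuth 2.0’s authorization-code grant is just a redirect to the provider’s authorization endpoint with a handful of query parameters. The sketch below builds such a URL with the standard library; the endpoint, client ID and other values are placeholders, not any real provider’s.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def authorization_url(endpoint, client_id, redirect_uri, scope, state):
    """Build the user-redirect URL for the OAuth 2.0
    authorization-code grant (the flow's first leg)."""
    params = {
        "response_type": "code",   # ask the provider for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,            # anti-CSRF token echoed back by the provider
    }
    return endpoint + "?" + urlencode(params)

url = authorization_url(
    "https://provider.example.com/oauth/authorize",  # placeholder endpoint
    client_id="my-app",
    redirect_uri="https://my-app.example.com/callback",
    scope="read",
    state="xyz123",
)
print(url)
```

The provider later redirects back to `redirect_uri` with a `code`, which the client exchanges for an access token; much of the Version 2.0 controversy concerned how loosely that second leg and token semantics were specified.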
It is early for open APIs. Companies that use Web APIs as part of their business will no doubt take a one-step-forward/one-step-backward approach. They will be eyeing the open API effort but continuing to use Twitter APIs where appropriate. What do you think?
The once staid and steady field of banking is looking for greater agility in rolling out new software and services. As a result, enterprise service buses (ESBs) are being deployed as part of efforts to streamline operations. A case in point is Federal Bank of Bombay, India.
The large commercial bank will use the Fiorano ESB in an attempt to modernize its operations, according to Fiorano. Before deploying the ESB, Federal Bank was already using more than 30 retail banking-related applications from various vendors, including Infosys’ core banking platform Finacle, running on a mixture of hardware including IBM AIX servers. The bank’s deployment of the Fiorano ESB is part of a further plan to expand the number of value-added services available to its customers.
In a statement, K.P. Sunny, head of IT at Federal Bank, explained that the Fiorano ESB was chosen due to its “architectural simplicity which allows the Bank to put in place a flexible architecture that will scale linearly and allow business decisions to be speedily implemented at the IT level.”
The bank hopes the choice will result in savings in the maintenance of its integration code, as well as increased reliability and security. The ESB is expected to power the rollout of a variety of value-added services through multiple delivery channels, including ATMs, kiosks, hand-held devices, mobile and the Web. -Stephanie Mann
While much discussion these days centers on APIs, some players suggest that the Web is all the API you really need. That could be said to be part of the thinking behind the Kapow Katalyst Application Integration Platform 9.0 from Kapow Software.
Be prepared to add another “as a” to the list of Software as a Service, Platform as a Service and so on. Kapow describes its latest edition as the first software platform of its kind to feature “Integration-as-a-Self-Service” through the introduction of lightweight end-user apps, dubbed “Kapplets.”
Kapow was early to field a software type that was often described as the “enterprise mashup.” Using XML, it created a Web data extraction tool for getting catalog and other data from the Web, and performing useful transformations on that data. Kapow continues to add to its product base.
Customers such as car maker Audi are able to use the software to generate real-time feeds to an in-car multimedia system without creating dependencies on individual information providers’ custom APIs, according to Rick Kawamura, vice president, marketing, Kapow.
Big consultancies have a tendency to code to APIs, but this doesn’t scale in an era of ‘get it done quick,’ said Kawamura, who sees mobile, social, cloud and big data changing the workplace dynamic. However, IT still does have an important role, at least for now.
Kapow Kapplets are said to put big data more directly into the hands of business users, but business leaders must first describe what they need to their IT department, which then uses Kapow Katalyst 9.0 to integrate data and applications. IT then makes Kapplets – displayed as clickable icons – available to workers as part of an automated workflow. This lets employees, customers and partners control and run “the automation and integration of disparate systems and data sources.”
Clearly we are in the era of the programmable Web. It may take many forms. Will it be so easily programmable that any interested business person someday could develop against it? What do you think? – Jack Vaughan
Update – Integrated process, rules and event capabilities are something of a holy grail for today’s high-powered corporations. Such capabilities are being pursued in open source software. For example, over the years, Red Hat’s Middleware division has expanded its JBoss server line, adding software to enable these advanced efforts. Last month, Red Hat went live with JBoss Enterprise BRMS 5.3, supporting business process automation and intelligent decision and event processing for applications running on the JBoss Enterprise Application Platform 6 and JBoss Enterprise SOA Platform 5.3.
The new release, as discussed in Boston at Red Hat’s recent JBoss World event by Ken Johnson, director of product management, also adds Apache Camel integration, improved data services and additional messaging support. Combining BPM software, rules and complex event processing is a worthwhile goal, indicated Maureen Fleming, vice president, BPM and middleware research programs, IDC, in a prepared statement.
For some time, SearchSOA.com research has indicated that making SOA, BPM and event processing work together is a challenge. [Ed. note: See “State of SOA 2010,” a downloadable PDF.] Recently, we asked James Taylor how important connecting these disciplines is today. Taylor, an independent consultant specializing in decision-making strategies, has been a long-time observer of business rules trends.
“Using business rules to automate decisions and integrating decisions with events and processes as decision services is essential for organizations trying to add agility to their systems,” Taylor said in an e-mail. “Bringing business rules management to the heart of the enterprise architecture stack is increasingly critical,” he continued.
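The decision-service pattern Taylor describes can be shown in miniature: business rules kept as data, evaluated against an incoming event, returning a decision the calling process can act on. This is a toy illustration of the idea, not JBoss BRMS/Drools syntax; the rule names and fields are invented.

```python
# Business rules as data: (name, condition, decision). In a real rules
# engine these would be authored and versioned outside the application.
RULES = [
    ("large-order",  lambda o: o["amount"] > 10_000,        "manual-review"),
    ("new-customer", lambda o: o["customer_age_days"] < 30, "manual-review"),
    ("default",      lambda o: True,                        "auto-approve"),
]

def decide(order):
    """Decision service: return (rule name, decision) for the
    first rule whose condition matches the incoming event."""
    for name, condition, decision in RULES:
        if condition(order):
            return name, decision

print(decide({"amount": 25_000, "customer_age_days": 400}))
# ('large-order', 'manual-review')
print(decide({"amount": 120, "customer_age_days": 400}))
# ('default', 'auto-approve')
```

Because the rules live apart from the process that calls `decide`, the business can change approval policy without touching the surrounding BPM or event-processing code, which is the agility Taylor is pointing at.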
Red Hat/JBoss has pulled a lot of tools together of late. Are there areas that still need more attention? we asked Taylor. He responded, “Decisions also involve analytics and it would be great to see some more options for bringing [in] analytic techniques.”
“RedHat has some interesting work going on with optimization and business rules, which is great, and I would like to see this extended to include data mining and predictive analytics,” said Taylor. – Stephanie Mann and Jack Vaughan
An odd phenomenon these days is the consumerization of IT, which WhatIs describes as the “blending of personal and business use of technology devices and applications.” Today’s armies of mobile device wielding business users are the most striking symbol of IT consumerization. But it is really not so new. People old enough to remember that Mork came from Ork can recall when the PC and the software spreadsheet were smuggled into the office to end the mainframe’s dominance of corporate computing.
Few application development managers are unaffected by the mobile tsunami. They are now sorting through the costs and benefits of a new category known as mobile middleware, which has arisen to deal with mobile device diversity. As it turns out, mobile apps are a bigger problem for application development managers than mobile email was. They have to support every conceivable type of endpoint and choose among HTML5, native and hybrid programming schemes.
The PC was a game changer. The same appears to be true of the smartphone, which recently crossed an inflection point, surpassing the desktop PC in unit sales. Equally influential are social media, open APIs and app stores.
Social media applications that aggregate news and information have driven a big boost in the use of integration middleware built on “REST” and RSS-style services. SOA laid the foundation, but it is the simple REST version of SOA that is carrying integration development forward today, as seen in social media and mobile application development. REST underlies the big digital consumer success stories – Amazon, Google, Facebook and eBay – and their style of development is now penetrating established enterprises. Software architects must understand how to build these modern-style systems.
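What makes the REST style simple is that a resource is just a URL whose representation (usually JSON) carries both the data and links to related resources. The sketch below parses such a representation with the standard library; the order resource is made up for illustration and is not any particular vendor’s API.

```python
import json

# A hypothetical response body from GET /orders/1138 on a REST-style
# service. The fields and links are illustrative only.
response_body = """{
    "order_id": 1138,
    "status": "shipped",
    "items": [{"sku": "A-100", "qty": 2}],
    "links": {"self": "/orders/1138", "customer": "/customers/42"}
}"""

order = json.loads(response_body)

# The client navigates by following links in the representation
# rather than coding against a vendor-specific RPC interface.
print(order["status"])             # shipped
print(order["links"]["customer"])  # /customers/42
```

Contrast this with a SOAP/WSDL integration: no generated stubs, no envelope, just HTTP verbs and plain representations, which is much of why the REST version of SOA travels so well to mobile and social clients.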
Now we are seeing a type of consumerization of IT integration that resembles the open APIs of the big e-commerce and social media sites. The idea is that you publish APIs that let outsiders hook into Web versions of your enterprise applications. Some SOA houses are building API management tool sets in response. They want their B2B APIs to fly off the virtual shelves the way MP3s do at the iTunes store familiar to consumers.
Consumerization of SOA integration could be taken more broadly still. Seldom, when you call, are operators actually standing by. The Web has enabled – some might say ‘condemned’ – the consumer to take over the role of the key operator of yore. That requires teams to design and deliver much better applications and application interfaces than ever before, and it is becoming ever more true as mobile devices flourish.
Again, aspects of the “new” consumerization of IT can sound like an old story. The notion that end users can, with the right tools, meet the bulk of their own programming needs was heard in the days of the original Visual Basic, Lotus Notes and PowerBuilder. To a point it was true. We hear it now about open APIs. Is it likely to be more true this time? What do you think? -Jack Vaughan