Apache Flink is a powerful open-source platform that can efficiently address the following types of requirements:
- Batch processing
- Interactive processing
- Real-time stream processing
- Graph processing
- Iterative processing
- In-memory processing
Data scientists are big data wranglers. They take huge amounts of messy data points (structured and unstructured) and clean, massage, and organize them with their formidable skills in math, statistics, and programming. Then they apply all their analytic powers to uncover hidden solutions to business challenges and present them to the business. In other words, data scientists utilize their knowledge of statistics and modelling to convert data into actionable insights about everything from product development to customer retention to new business opportunities.
Data scientists need both technical and non-technical skills to perform their jobs effectively. Technical skills are involved at three stages in data science:
- Data capture & pre-processing
- Data analysis & pattern recognition
- Presentation & visualization
This Hadoop tutorial provides a thorough introduction to Hadoop. It covers what Hadoop is, why it is needed, why it is so popular, the Hadoop architecture, data flow, Hadoop daemons, the different flavours of Hadoop, and an introduction to Hadoop components such as HDFS, MapReduce, YARN, etc.
Hadoop is an open-source tool from the ASF (Apache Software Foundation). Being open source means it is freely available and even its source code can be changed as per your requirements: if certain functionality does not fulfill your needs, you can modify it accordingly. Much of Hadoop's code has been contributed by Yahoo, IBM, Facebook, and Cloudera.
It provides an efficient framework for running jobs on the multiple nodes of a cluster (a group of systems connected via a LAN). Hadoop processes data in parallel, as it works on multiple machines simultaneously.
It was inspired by Google, which published papers about the technologies it was using: the MapReduce programming model and its file system (GFS). Hadoop was originally written by Doug Cutting and his team for the Nutch search engine project, but it soon became a top-level Apache project due to its huge popularity.
Hadoop is an open-source framework written in Java, but this does not mean you can code only in Java. You can write jobs in C, C++, Perl, Python, Ruby, etc. Any language works, though Java is recommended because it gives you lower-level control over the code.
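This multi-language support works through Hadoop Streaming, which pipes records between the framework and any executable over standard input/output. The sketch below is a plain-Python simulation of the map, shuffle/sort, and reduce phases for a word-count job, not actual Hadoop API code; the function names and sample input are illustrative only.

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every token.
    In Hadoop Streaming this would read lines from stdin and
    write tab-separated key/value pairs to stdout."""
    for line in lines:
        for word in line.strip().split():
            yield (word.lower(), 1)

def reducer(pairs):
    """Reduce phase: sum the counts for each word.
    Hadoop sorts mapper output by key before the reducer runs,
    so we sort here to simulate the shuffle/sort step."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield (word, sum(count for _, count in group))

if __name__ == "__main__":
    sample = ["hadoop runs jobs", "hadoop scales"]
    print(dict(reducer(mapper(sample))))
    # prints {'hadoop': 2, 'jobs': 1, 'runs': 1, 'scales': 1}
```

In a real streaming job, the mapper and reducer would be two separate scripts passed to the `hadoop jar hadoop-streaming.jar` command, and the framework, not your code, would handle the sorting and the movement of data between nodes.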
It efficiently processes large volumes of data on a cluster of commodity hardware, i.e., low-end, inexpensive machines. Hadoop is designed for processing huge volumes of data, and because it runs on cheap hardware, it is also very economical.
Hadoop can be set up on a single machine (pseudo-distributed mode), but the real power of Hadoop comes with a cluster of machines. It can be scaled to thousands of nodes on the fly, i.e., without any downtime: no system needs to be taken down to add more nodes to the cluster.
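To make pseudo-distributed mode concrete, the two settings it usually hinges on are the default file system URI and the HDFS replication factor. A minimal sketch is shown below; the host and port are illustrative, and your distribution's documentation should be consulted for the full setup:

```xml
<!-- core-site.xml: point the default file system at a local HDFS daemon -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: a single node cannot hold the default 3 replicas, so use 1 -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

With these in place, all the Hadoop daemons run as separate processes on one machine, which is why this mode is useful for learning and testing but not for real workloads.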
Hadoop consists of three key parts: the Hadoop Distributed File System (HDFS), MapReduce, and YARN. HDFS is the storage layer, MapReduce is the processing layer, and YARN is the resource-management layer.
Let us now understand why Hadoop is so popular and why it has captured such a large share of the big data market.
Hadoop is not only a storage system but a platform for both data storage and processing. It is scalable (more nodes can be added on the fly), fault-tolerant (even if a node goes down, its data can be processed by another node), and open source (the source code can be modified if required).
Looking back at the DevOps Enterprise Summit (DOES) in San Francisco, there was a wealth of speakers representing a wide range of organizations, from vendors and enterprise users to subject matter experts. The varied panel of guests spoke about how DOES has evolved over the past few years, offered industry and technical insights into how DevOps is intersecting with the enterprise, and revealed what's on the cutting edge of this concept. Here are some tidbits from four of the popular speakers at the conference.
Cloud and DevOps march forward together
Trace3 Principal Technologist George Kobari pointed out the rather obvious reason why DOES is becoming ever more popular. “A lot of enterprises today are realizing they must be in the DevOps Space or their businesses will not survive.” They are making this decision in concert with some other big changes. “From a technology standpoint, a lot of them are just getting to the point of using cloud. That’s relevant for DevOps because you have to deploy to a cloud. That’s the foundational layer they are sitting on. To capture growth, it’s necessary to take advantage of infrastructure on demand.”
Is DevOps impossible without the cloud? Not everyone agrees. Some would say it is indeed feasible to implement DevOps on premises using many of the same tools, such as Puppet and Chef. But there is certainly agreement that the cloud enables the process in a way that is difficult to achieve otherwise. For enterprises, which often have a mix of on-premises and cloud resources, the goal should be to implement DevOps principles across the organization, leveraging the additional benefits of the cloud where possible.
The database is the new application
Robert Reeves, CTO at Datical, brought up an interesting point about where DevOps stands to make the greatest strides in the next few years. “The application is the first place to implement DevOps since it involves the most people and gets the most attention. But once you automate that and bring DevOps to it and are moving the entire code from Dev to Test to Production, then you look for the next thing.”
According to Robert, that next thing is the database. “The database does become the bottleneck once you have brought DevOps to the application.” Ideally, it should be possible to bring automation and efficiency to the database using similar principles. However, unlike databases, applications don't have state to worry about. With continuous deployment to an app server, it is fine to simply blow away the old version or roll back to a previous version as needed. It doesn't matter so much what the app did yesterday; it matters that it is doing the job right now.
This approach isn't possible with a database, since the consistency and accuracy of the data itself over time are critical. Datical aims to provide better tools for database DevOps, including a forecast feature that lets developers preview a change without actually making it, a rules engine that automatically enforces standards such as naming conventions without anyone watching, and a deployment packager.
Tooling for DevOps
Electric Cloud CEO Steve Brodie spoke about the increased interest of large enterprises in the latest approaches to development and deployment. “If you look at the enterprise, they have some legacy apps that are still monoliths and some things they are starting to do with microservices only—and hybrids that they are refactoring with some traditional architecture paired with microservices and containers.” They need plenty of flexibility in tooling to accomplish everything on this continuum.
To enable DevOps in this space, Electric Cloud seeks to model containers as first class citizens and orchestrate them through the pipeline on their own or with other components. Adding an abstraction layer also allows enterprises to deploy to Kubernetes, Amazon, or Docker Swarm with equal ease. Just as with other aspects of infrastructure, allowing Dev and Ops to focus solely on the app without worrying too much about configuration helps streamline DevOps for the enterprise.
Additional industries are showing interest in DevOps
Electric Cloud's Chris Fulton mentioned financial services as one example of a vertical that is showing increased interest in DevOps. Requests for consultations from these prospective clients are leading to some interesting discussions. The scope of the conversation has to range far beyond software and into very specific business processes. “We haven’t really thought a lot before about how DevOps works with processes. When you’ve got all these legacy processes that you follow along with a bunch of government restrictions, how do you do DevOps in that environment?”
The speed of DevOps may never be as lightning fast in FinServ as it is in other, less regulated industries. But given that the underlying principles and tooling promote better code quality, easier rules enforcement, consistency in processes, and more visibility into what's going on with the code, it may well end up being an excellent match. In fact, next year's DOES may include some interesting stories and case studies from an even wider range of clients in unexpected industries.
Why do DevOps initiatives sometimes fail, and how can they be more successful? Gene Kim, author of The Phoenix Project, admitted that most of the stories that get told and retold about DevOps transitions are glowing successes. This survivor bias means that the fiascos don’t get all the attention they deserve. Glossing over the disasters makes it hard to assess the overall state of DevOps. Some degree of failure is actually very common on the road to DevOps. Kim admitted, “That may actually be a more telling indicator of the movement than the ones that actually survived.” Of course, there have been enough problems for experts to have a good idea of what typically goes wrong.
What makes DevOps implode?
Scott Wilson, Product Marketing Director of Release Automation, touched on why and how DevOps initiatives fail during his presentation. Despite the idealistic notion of Dev and Ops traipsing through a meadow hand in hand in a united utopia, the reality is quite different. The primary reason DevOps fails is that Ops gets left out. Dev is very Agile and has all the cool tools and processes, but they are still throwing code over the wall for deployment. According to Wilson, “We need to focus on making Ops more Agile.” Failing to respect and invest in the role of Ops is a fatal error. When Dev gets all the attention, “You reinforce the wall, especially if Ops is using different deployment tooling or automation mechanics.”
Open source addiction is also to blame
Another key reason that DevOps can fail in a typical enterprise is because of an enthrallment with open source. It’s easy to become dependent on open source in the DevOps world because it is such an integral part of the overall culture. But the DIY effort and security failings that come along with open source don’t always make it a good fit for every business model—even in an era where “every company is a software company.”
In practical terms, “If you are an insurance company, you generate revenue by selling insurance policies. Do you really want to install a bunch of open source software that you have to maintain, that you have to write the glue for and do the API plugwork and then make certain to update to the latest libraries to shield against vulnerabilities?” Scott advocated a balance of open source and vendor supplied code to relieve some of this unnecessary burden on internal DevOps teams.
Predictors of success in DevOps
Although getting teams to “buy in” and support this type of transition is important, popular enthusiasm is clearly not sufficient to effect change at the enterprise level. Wilson pointed to Heather Mickman’s account of transitioning to DevOps at Target. “The real progress was when the new CIO came in.” This senior executive had the clout and vision to roll out DevOps across the entire organization. This seems to be a typical story.
The IT Skeptic, Rob England, agreed. As a consultant, he has noticed that it usually takes some kind of moment of reckoning for upper management to step in and claim ownership of change. Then, Rob recommended pointing to the DevOps efforts that have been happening in small teams at the grassroots level as an example of how to do things better on the big stage. “You can use those quick wins to drive change.” For the enterprise, DevOps may start at the bottom, but it gets its staying power only when it gains support from the top. When an enterprise fully commits as an organization, that’s when things really start to work.
On 8 November 2016, when the honourable Prime Minister of India, Narendra Modi, began his first-ever televised address to the nation, there was great curiosity among the people to know what it was all about. Then came a shock to the nation: the PM announced that, from that same day, the Rs. 500 and Rs. 1000 currency notes would be discontinued in order to track black marketers and the black money they carry. He also announced that anyone holding money in these denominations could exchange the notes or deposit them in their accounts, subject to certain limits that were also declared.
Now the question arises: how is the government, or the Income Tax department, going to track the black money that has been deposited, and how will they segregate black-money holders from genuine taxpayers? With a population of more than 1.25 billion and hundreds of millions of bank accounts, how the IT department will find discrepancies is a big question. Like the software industry, the Income Tax department is also going to use the latest and hottest technology: Big Data.