Coffee Talk: Java, News, Stories and Opinions


May 19, 2017  2:12 PM

“AI First” the mantra for Google I/O 2017

Barry Burd
Uncategorized

Google’s annual worldwide developer conference (Google I/O) kicked off at the Shoreline Amphitheatre in Mountain View, California on Wednesday morning. Seven thousand people are attending live, and others are viewing the event online at 400 Google I/O Extended events in 85 countries.

This year’s mantra is “AI first,” integrating machine learning with each of Google’s software and hardware products. The keynote was a rapid-fire delivery of announcements:

  • A new initiative named Google Lens integrates visual recognition with Google Assistant. Point your phone’s camera at a label containing a network password, and Google Assistant enters the password automatically and connects you to the network. Point the camera at a phone number and Google Assistant dials that number. Point it at a concert advertisement, and Google Assistant plays a sample of the band’s music and offers to book concert tickets.
  • Developing artificially intelligent apps requires two phases — a training phase and an inference phase. In the training phase, the software learns about the problem domain. In the inference phase, the software applies the learning to new situations.
    The training phase is computationally intensive. To address this issue, Google announced its new Cloud TPU, which is available on the Google Compute Engine immediately. The Cloud TPU hardware is optimized for both training and inference, and can deliver a whopping 180 teraflops of computing power. Developers can visit http://g.co/tpusignup to sign up.
  • Google announced its new Google.ai initiative to coordinate AI efforts and teams. The initiative has three parts: research, tools, and applied AI. The research part includes AutoML, in which neural nets design other neural nets. This task is computationally challenging, and Cloud TPU is making it possible.
  • Starting immediately, Google Assistant will accept commands that are typed or tapped as well as spoken. Typing is advantageous because, in public venues, people may not want to speak commands. Typing, tapping and speaking to Google Assistant are all integrated, so an interaction with the Assistant may use all three interaction modes.
  • Google Assistant is now available on the iPhone!
  • Google Assistant is now available in French, German and several other languages. Google Home will launch in Canada, Australia, France, Germany and Japan.
  • Effective immediately, Actions on Google handles purchase transactions. With voice interaction and fingerprint scan, you can use Google Pay. You don’t have to enter an address or credit card number.
  • A new feature in Google Home is called Proactive Assistance. Here’s how it works: Google Home knows about an upcoming event on your calendar, knows where the event takes place, and calculates the travel time to the event given the current traffic conditions. When you say “What’s up?” to Google Home, the device reminds you that it’s time to leave for the upcoming event.
  • In the next few months, Google Home will make no-cost, hands-free calls to any landline within the United States or Canada. Google Home recognizes up to six different voices in a household. So if you say “Call Mom,” the device determines which member of the household is making the request, and calls that person’s mother.
  • Spotify will offer free music service to Google Home.
  • Google Home will have Bluetooth support, so you’ll be able to play music from any Bluetooth enabled device on the Google Home speaker.
  • In addition to its voice responses, Google Home will display information on your phone’s screen and, through Chromecast, on your TV.
  • Google Photos will have three new features. With Suggested Sharing, Photos identifies the people in your images and offers to share the images with those people. With Shared Libraries, Photos automatically shares images with certain characteristics to people you select. With Photo Books, you can purchase a hard copy of your best images based on criteria that you specify.
  • In the next few weeks, YouTube will provide 360 degree video on your Android TV. You’ll issue voice commands to request a certain video. You’ll use your remote to move from side to side within the video scene. Live 360 content will be available.
  • Earlier this year, YouTube launched Super Chat where users pay to pin comments on live streams. Users can now trigger physical actions using Super Chat. During the keynote, users paid to drench two fellows known as the Slow Mo Guys with 500 water balloons. All proceeds went to charitable causes.
  • TensorFlow is Google’s machine intelligence software library. With the newly created TensorFlow Lite version of that library, developers can add deep learning capabilities to apps that run on small, mobile devices. Smartphones will become even smarter.
  • Samsung’s Galaxy S8 and S8+ will add virtual reality features using Google Daydream.
  • HTC and Lenovo will use Google Daydream in their standalone VR headsets. All the processing power will be in the headsets. You’ll experience virtual reality without having to attach a cable or a smartphone.
  • The new Android Go initiative optimizes the Android system to run on entry level phones. In this context, an entry level phone is one with between half-a-gigabyte and one gigabyte of memory. As one part of this initiative, a Data Saver feature economizes on the use of network resources by compressing the data that’s being sent. Another part named YouTube Go Offline Sharing saves videos for viewing when the network isn’t available.
  • With Google Expeditions, students experience things that they couldn’t ordinarily experience, all from the safety and comfort of their own classrooms. Students move around a room while they look at a tablet device’s screen. The tablet shows anything from the terrain of a faraway land to a view from inside the human body.
    Later this year, the Expeditions platform will add augmented reality to its repertoire. The tablet’s display will be able to superimpose virtual images onto real objects in the room.
  • A new Android App Directory helps users discover new apps. Users can try an app before buying it.
  • Google’s Instant Apps API is now available to all Android developers.
  • Many Firebase SDKs will soon be made open-source.
  • The new Play Console Dashboards summarize app diagnostics to help developers analyze and improve their apps. In addition, a developer can add Firebase Performance Monitoring to an app with only one line of code.
  • For enhanced security, Firebase will include phone number authentication.

Beta availability of Android O

I write books about Android development, so for me, the most interesting announcement was the availability of a beta for the next Android version – codenamed Android O. In this version of Android, developers will be able to write code in the Kotlin programming language. This is a big deal for developers because it’s a departure from Android’s long-standing Java-only tradition.

Kotlin is completely interoperable with Java, so existing Java code will work without modification. New apps can be built using Java, using Kotlin, or using any combination of the two languages. JetBrains (the company that created Kotlin) will work alongside Google to help the language evolve as a language for mobile platform development. Best of all, Kotlin is available immediately in the new Android Studio 3.0.

May 10, 2017  3:46 PM

Java modularity’s future takes a hit as Project Jigsaw (JPMS) is voted down

Cameron McKenzie

Can you believe all of this drama surrounding Project Jigsaw and the Java modularity debate? I was so thankful yesterday when President Donald Trump fired the director of the FBI, drowning out all of the Java Jigsaw finger pointing tweets and usurping them with delightful, 140-character opinion pieces on American politics.

In the vote on JSR-376, the Java Platform Module System (JPMS), 13 JCP Executive Committee members voted ‘no,’ while 10 voted ‘yes.’ Unlike US elections, the Java Community Process (JCP) does not employ an ‘electoral college’ system, so votes are won or lost using an archaic ‘majority wins’ type of system.

JPMS Java modularity JSR-376 is voted down.

Java Jigsaw’s missing pieces

Part of the Project Jigsaw melodrama came from the fact that both IBM and Red Hat had announced, before the JCP vote on Java modularity, that they were not going to support JPMS, which is a bit of a break from traditional decorum. JCP members don’t typically announce their intentions before a vote happens. Having said that, very few JCP projects are as contentious as Java’s Jigsaw.

“There is still work required to bring the community closer to an agreement on the proposed standard,” said IBM’s Tim Ellison in an April 28th community email reply to Mark Reinhold, the JSR-376 spec lead and the highly respected Chief Architect of the Java Platform Group at Oracle. “IBM is also voting ‘no’ which reflects our position that the JSR is not ready at this time to move beyond the Public Review stage and proceed to Proposed Final Draft.” The word ‘also’ in Ellison’s quote refers to Red Hat’s prior announcement that it was not satisfied with the way the Java modularity puzzle was coming together.

“Jigsaw’s implementation will eventually require millions of users and authors in the Java ecosystem to face major changes to their applications and libraries, especially if they deal with services, class loading, or reflection in any way,” wrote Red Hat’s Scott Stark in an April 14th bloodletting on what Red Hat perceived as some of the Java Platform Module System’s shortcomings.

All of this public consternation resulted in an impassioned plea from Reinhold to move the Java modularity project forward, despite some of the existing hesitation. “What we have now does not solve every practical modularity-related problem that developers face, but it meets the agreed goals and requirements and is a solid foundation for future work” said Reinhold. “It is time to ship what we have, see what we learn, and iteratively improve. Let not the perfect be the enemy of the good.”
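
For readers who haven’t lived through the JSR-376 debate, the heart of JPMS is the module declaration, which makes a jar’s dependencies and exported packages explicit. Here is a minimal sketch of what one looks like, using hypothetical module and package names:

// module-info.java -- a minimal JPMS module declaration (names are hypothetical)
module com.example.inventory {
    // only packages that are explicitly exported are visible to other modules
    exports com.example.inventory.api;

    // dependencies must be declared up front; java.base is always implied
    requires java.sql;
}

It is exactly this kind of explicit boundary, enforced at compile time and run time, that makes the reflection-heavy and classloader-heavy libraries Stark mentioned so nervous.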

Ulterior motives and Project Jigsaw

Seldom mentioned in these public discussions is the fact that there are often very private motives behind the JCP wranglings of the big players. Red Hat has its own open source modularity project, JBoss Modules, which it uses in its WildFly server, and JBoss Modules has always competed with Jigsaw. IBM’s WebSphere has a long history of supporting OSGi. Who knows how these private interests compete with a company’s public ones.

Falling into the ‘no good deed goes unpunished’ category, many of the JCP committee members who voted in favor of Java modularity found themselves in the unusual position of having to defend the fact that they wanted to move Project Jigsaw forward. Azul’s CTO Gil Tene took to Twitter to defend his company’s ‘yes’ vote. “Can a better module system be built? Yes. So can a much better generics systems. And a better way to do Lambda expressions. And…(sic),” tweeted Tene. “Some will adopt JPMS as it is. Many others won’t until it gets better. And that’s OK.” Personally, I’m kicking myself a little bit, because I had Tene on the phone a few weeks ago talking about Azul’s Falcon LLVM compiler, and I should have asked him more about the JCP vote. We did speak about Project Jigsaw, but it was more in terms of JVM performance and improved startup times, as opposed to the upcoming vote.

Gil Tene defends Azul’s ‘yes’ vote on JPMS

Java modularity and the OSGi spec

A number of years ago, way back in 2011 to be exact, Peter Kriens was OSGi’s Technical Director, and he penned a few interesting articles about implementing modular systems in Java, many of which sparked intense debate in the TSS forums. We spoke a few times about Project Jigsaw, and while our conversations took place too far back for me to quote him accurately, the impression he always gave me was that tackling the classloader issue in Java was an incredibly complicated task, that people who try to implement modularity in Java run into far more unanticipated complications than they could ever have imagined, and that the purveyors of Project Jigsaw were just a little bit naive about how easy or hard it would be to build a system of modularity right into the JDK. Six years later, it would appear that many of Kriens’ concerns have been borne out.

Classloaders are a mess in Java. Their existence is understandable given the evolution of the JDK, but what was fine back in 1996 is a bit of an embarrassment as we do software development in 2017. I’ve got complete faith in a guy like Mark Reinhold to get the Java Platform Module System back on track. Hopefully that will happen sooner rather than later.
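
To see a hint of the mess, here is a tiny, self-contained sketch that just prints which classloaders are in play for even a trivial program:

public class ClassLoaderPeek {
    public static void main(String[] args) {
        // The loader that found this class, the loader the current thread advertises,
        // and the bootstrap loader (reported as null) can all be different actors,
        // which is part of why retrofitting modular boundaries onto the JDK is so hard.
        System.out.println("application loader:    " + ClassLoaderPeek.class.getClassLoader());
        System.out.println("thread context loader: " + Thread.currentThread().getContextClassLoader());
        System.out.println("bootstrap loader:      " + String.class.getClassLoader());
    }
}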

You can follow Cameron McKenzie on Twitter: @cameronmcnz



May 9, 2017  6:12 PM

IBM’s Watson is a joke, and Oracle won’t be ‘winning’ for long

Cameron McKenzie

Hedge Fund manager Kyle Bass used to be my favorite industry analyst, but after watching a short, fire-breathing, three minute clip from CNBC’s Closing Bell, I think Social Capital CEO, Chamath Palihapitiya, may have wrestled away Bass’ title.

“IBM’s Watson is a joke, just to be completely honest,” said Palihapitiya when asked about IBM’s foray into the world of artificial intelligence and machine learning. He asserted that while Big Blue has done a great job marketing the Jeopardy champion, the fact of the matter is that Google and Amazon have done a far better job amassing reams of big data, processing it on their systems, and fundamentally understanding it, which, after all, is the whole point of artificial intelligence.

IBM no longer an innovator

Granted, the Bay Area-based search engine giant has never won a game show, but there are indeed other, less scientific metrics that can be used to evaluate AI systems. “Companies advancing machine learning and AI don’t brand it with some nominally specious name, naming it after a Sherlock Holmes character,” said a laughing Palihapitiya. CNBC’s Brutus then stuck the knife deeper into Endicott’s Caesar, saying, “IBM is a services business. They aren’t building anything. They aren’t innovating.”

For those who love to hate Oracle, the stewards of the JDK weren’t spared the commentator’s scorn either. “Oracle is not a business you can short today, but it is also not a business that is going to win tomorrow,” said Palihapitiya. In the short term, the assertion is that companies like Oracle will keep the coffers full simply by squeezing income out of their existing customers. “It has an unbelievable sales and marketing machinery that will figure out how to tax its existing customers in umpteen numbers of Byzantine ways.”

Profiling the IBM customer

According to Palihapitiya, the problem lies in the fact that the marketing machines of companies like IBM and Oracle are more intelligent and organized than the clients who have to choose between them. “What IBM is excellent at is using their sales and marketing to convince people who have asymmetrically less knowledge to buy something,” said Palihapitiya. “Can you fundamentally be long these two businesses over the next decade? I think the answer is no.”

It’s a short little three-minute video, but the guy simply rains brimstone down on Oracle and IBM. If you like that sort of thing, it’s worth a watch. If not, feel free to go search ‘cat videos’ on YouTube.

IBM’s Watson ‘is a joke,’ says Social Capital CEO Palihapitiya

You can follow Chamath Palihapitiya on Twitter: @chamath
You can follow Cameron McKenzie too: @cameronmcnz



May 8, 2017  3:10 PM

The 12-Factor App is cloud-native development for dummies

Cameron McKenzie

Yegor Bugayenko wrote an amusing blog post the other day entitled “SOLID is OOP for Dummies.” Well, if SOLID is OOP for dummies, I wonder if he’d agree with my assertion that the 12-factor app mantra is the dummies’ equivalent for cloud-native development?

I enjoyed Bugayenko’s article, although it seems like he took quite a bit of flack in the comments for it. But I completely agree with his premise. To me, telling software developers that their apps should follow SOLID principles is like telling a marathon runner that the best strategy for moving forward is to cyclically move one leg in front of the other. Sure, the statement is true, but does something so completely self-evident in its nature actually count as advice?

Revisiting the SOLID principles

Quite quickly, the SOLID principles are as follows:

· Have single responsibilities (S)
· Follow the open/closed principle (O)
· Use polymorphism, or the Liskov substitution principle (L)
· Segregate your interfaces (I)
· Abstract your code using the dependency inversion principle (D)

So I will see Bugayenko’s criticism of the five tenets of SOLID and raise him a similar criticism of the cloud-native world’s 12-factor app. (By the way, if you’re not familiar with all of the latest catch-phrases, Ken Owens provides a great definition of cloud-native in the article Tying Agile, DevOps, 12-factor apps and cloud native computing together)

Revisiting the 12-Factor App

For the uninitiated, these are the dozen tenets of cloud-native computing’s 12-factor App:

1. Codebase: One codebase tracked in revision control, many deploys
2. Dependencies: Explicitly declare and isolate dependencies
3. Config: Store config in the environment
4. Backing services: Treat backing services as attached resources
5. Build, release, run: Strictly separate build and run stages
6. Processes: Execute the app as one or more stateless processes
7. Port binding: Export services via port binding
8. Concurrency: Scale out via the process model
9. Disposability: Maximize robustness with fast startup and graceful shutdown
10. Dev/prod parity: Keep development, staging, and production as similar as possible
11. Logs: Treat logs as event streams
12. Admin processes: Run admin/management tasks as one-off processes

Self-evident truths

Seriously, do we really need to tell software developers to keep production, dev and staging environments as similar as possible, as app factor ten, dev/prod parity, instructs? Honestly, I can’t ever remember working on a project where the team said ‘hey, let’s make DEV and PROD completely different. Like, let’s use MongoDB in DEV, and DB2 in production.’

App factor one, using one codebase tracked in revision control, hardly seems like a revolutionary concept either, nor does it seem like a principle that any rational, cloud-native software development team would violate. Maybe if they were using MSD, the Masochistic Software Development methodology, they might spread their code across Git, CVS, ClearCase and PVCS, but I can’t see anyone who wasn’t a masochist doing so.

Factor nine is outright comical. Has anyone ever actually sat down and written out a user story or non-functional requirement describing how they wanted the application to take a long time to load, and how they wanted general havoc to ensue when the application gets shut down? The 12-factor app’s factor nine, the disposability principle of maximizing robustness with fast startups and graceful shutdowns, would imply that some software development teams weren’t aware that extended startup times and hanging threads at shutdown were a bad thing.
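
To be fair, the graceful-shutdown half of factor nine is at least cheap to honor on the JVM. A minimal sketch, with hypothetical cleanup work standing in for the real thing:

public class GracefulShutdown {
    public static void main(String[] args) throws InterruptedException {
        // Factor nine's 'graceful shutdown' boils down to doing your cleanup when SIGTERM arrives
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            System.out.println("draining in-flight work and closing connections...");
        }));
        System.out.println("serving requests; send SIGTERM or press Ctrl+C to trigger the hook");
        Thread.sleep(60_000); // stand-in for the application's real work
    }
}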

Breaking the unbreakable

Quite honestly, there are tenets that I don’t even know how to violate. App factor four says backing services should be treated as attached resources. I don’t even know how I would write a cloud-native app that didn’t treat backing services as attached resources. Isn’t the statement tautological, in that a backing service is, by definition, a resource attached to your cloud-native application?

Maybe it’s because I’ve been developing on Spring and the Java EE platform for the last twenty years that some of these points seem rather superfluous. App factor three instructs cloud-native developers to store configuration details in the environment and not as a set of constants or if-then-else statements peppered into a variety of different classes throughout the code base. I honestly can’t see an experienced professional adding Java code that brackets every class with conditional statements that change the runtime behavior based on which environment is currently hosting the code. Furthermore, storing configuration outside of the application and abstracting away dependencies has always been a basic principle of Spring and Java EE. It’s exactly why resource references and JNDI bindings exist.
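
For what it’s worth, both flavors of ‘config outside the code’ are one-liners on the Java platform. A rough sketch, with hypothetical environment-variable and JNDI names:

import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public class ConfigLookup {

    // The 12-factor route: configuration lives in the environment, not in the code base
    static String databaseUrl() {
        return System.getenv().getOrDefault("DATABASE_URL", "jdbc:h2:mem:dev");
    }

    // The Java EE route mentioned above: a JNDI resource reference that the container
    // binds to an environment-specific DataSource at deployment time
    static DataSource dataSource() throws NamingException {
        return (DataSource) new InitialContext().lookup("java:comp/env/jdbc/AppDB");
    }
}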

Half-baked ideas

And if it’s not Spring or Java EE enforcing these best practices, it’s the application server itself doing so. App factor eleven states that logs should be captured by the execution environment and collated together with all of the other logging streams used by the cloud-native app. I honestly can’t remember a time when WebSphere didn’t do that. Maybe IBM can swallow up all of the little players in the cloud-native computing industry, create a new, cloud-native application server and show the industry how logging is done? That type of functionality has always been baked right into the application server runtime, even if you’re just using System.out calls instead of a proper logging framework like slf4j.
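
The point can be made in a few lines of Java, assuming slf4j is on the classpath: the app just emits the event, and the execution environment captures and collates the stream, whether the event goes through a logger or straight to System.out.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CheckoutService {

    private static final Logger log = LoggerFactory.getLogger(CheckoutService.class);

    void recordCheckout(String orderId) {
        // Factor eleven: write the event and stay out of the log-routing business
        log.info("order {} checked out", orderId);

        // even the much-maligned System.out route ends up in the same collated stream
        System.out.println("order " + orderId + " checked out");
    }
}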

The 12-factor app does provide some food for thought, with most of what’s intellectually edible coming from the insistence that using a process, as opposed to threading, is the best way to scale, although I would assert that this is really just a semantic argument rather than one grounded in practice. While it’s true that traditional Java application servers ran one process with many threads, Java microservices tend to be single processed, and even within a microservice there will be multiple functions that leverage threading, so it’s not really an either-or type of thing. If anything, factors six and eight, executing apps as processes and scaling out using the process model, really come down to the assertion that monoliths are bad, and breaking up a monolith into smaller pieces is good. I think we’ve all heard that mantra being chanted enough lately. Yes, we get it.

Facebook shares and Internet memes

Each maxim of the 12-factor app credo reads to me like something that you’d put on one of those annoying, inspirational posters you find hanging in offices across the country. I can almost visualize a poster of an Italian sports car with the word DISPOSABILITY at the top, and the phrase “Maximize robustness with fast startup and graceful shutdowns” at the bottom. We do live in an age where in order to be consumed, every message must be delivered as a meme, or as something that can be shared as a Facebook post or delivered in a 140 character tweet. Maybe it’s the new developer, the social-media millennial, who the 12-factor app mantra is catering to?

The 12-factor app as an Internet meme

Telling me to eat less and exercise more in order to lose weight is pretty useless advice. Telling me to pay my taxes on time and avoid interest and penalties is useless advice too. I mean, the statements are true, but it’s nothing I don’t already know, which makes stating these tautological statements a waste of time. You can frame these statements as advice, but they really aren’t advice if they provide nothing new, nothing of added value and nothing that’s particularly actionable. I can’t help but feel as though the entire discussion about building a 12-factor app falls into the same category. I wonder if Yegor Bugayenko would agree with me?

Here’s Bugayenko’s article: SOLID Is OOP for Dummies

You can follow Yegor Bugayenko on Twitter: @yegor256
You can follow Cameron McKenzie too: @cameronmcnz




May 5, 2017  4:50 PM

Can JVM performance in the cloud really compete with bare-metal metrics?

Cameron McKenzie

I always like talking to Gil Tene, the CTO of Azul Systems.

Before we jump on the phone or sit down for a talk, his team usually sends me a bunch of boring slides to go through, typically boasting about their forthcoming Zing or Zulu release, along with their latest JVM performance benchmarks. But it’s been my experience that if I can jump in early with technical questions before Tene starts force-feeding me a PowerPoint presentation, I can hijack the call and get some interesting answers to some tough technical questions about Java and JVM performance. The CTO title often implies that you’ll be chatting with a suit without substance, but Tene is a Chief Technical Officer with some real ‘down-in-the-weeds,’ erudite knowledge of how high-performance computing works.

A clever marketing opportunity missed

The reason for our latest talk was Azul Systems’ 17.3 release of Zing, which includes a new LLVM-based just-in-time compiler code-named Falcon. In my estimation, Azul Systems really dropped the ball here in terms of marketing. Instead of calling it Falcon and doing the release on May 2nd, they could have given it the moniker The Millennium Falcon and released it on May the 4th. Now that would have been clever, but sadly, that opportunity has been missed.


Prognosticating the fate of the JVM

Before being subjected to Tene’s PowerPoint presentation, which would invariably extol Zing and Falcon’s latest JVM performance metrics, I figured I’d bear-bait Tene a bit by suggesting that it must be tough working in a dying industry.

Anyone who’s been paying attention to the cloud computing trend knows that everyone is now architecting serverless applications written in either Golang, Node.js, Python or some other hot non-JVM language, eliminating the need for a product like Zing or Zulu. And all of these serverless applications are getting deployed into pre-provisioned containers running in cloud based environments where there’s no longer a need to install a high-performance JVM. After all, nobody’s going out and buying bare-metal servers anymore, so there must be a decline in people purchasing Zing licenses or downloading Zulu JVMs, right? Tene wasn’t biting.

“A lot of what we see today is virtualized, but we do see a bunch of bare-metal, either in latency sensitive environments, or in dedicated throughput environments.”
-Gil Tene, Azul Systems CTO

“Where the hardware comes from, or whether it’s a cloud environment or a public cloud, a private cloud, a hybrid cloud, a data center, or whatever they want to call it, we’ve got a place to sell our high-performance JVM,” said Tene. “And that doesn’t seem to be happening less, that seems to be happening more.”

To be honest, I was hoping Tene would inadvertently slip and reveal some closely guarded secret about how his team had discovered a trick to unlocking infinite performance capabilities by using Zing and the cloud in some weird and unusual way. As it turns out, Zing in the cloud isn’t that much different from Zing in the local data center. “Most of what is run on Amazon today is run as virtual instances running on a public cloud,” said Tene. “And they end up looking like normal servers running Linux on x86 machines, but they run on Amazon. And they do it very efficiently, very elastically and they are very operationally dynamic. Zing and Zulu run just fine in those environments. Whether people consume them on Amazon or Azure or on their own servers, to us it all looks the same.”

Squaring the JVM performance circle

Of course, when I talk to Tene about things like Falcon, Zing and Zulu, it’s JVM performance that tends to be the central theme of the talks, which is a concept that would appear to run counter to the concept of containers and virtualization. To me, that’s a difficult circle to square, because on the one hand you are selling JVM performance, yet on the other hand the deployment model incorporates various layers of performance-eating abstraction with Docker, VMware, hypervisors and all of the other obnoxious layers of indirection that cloud computing entails. After all, if peak JVM performance is the ultimate goal, why not just buy a massive mainframe, or even clustered, commodity-based hardware, and deploy your LLVM-based JIT compiler to some beautiful bare metal?

“A lot of what we see today is virtualized, but we do see a bunch of bare-metal, either in latency sensitive environments, or in dedicated throughput environments,” said Tene. A hybrid cloud environment leveraging both public and private cloud might use a dedicated, bare-metal machine for their database. Low-latency trading systems and messaging infrastructure are also prime candidates for bare-metal deployments. In these instances, JVM performance is a top priority, and a Zing instance would run on what historians refer to as “a host operating system” without any abstraction or virtualization. “They don’t want to take the hit for what the virtualized infrastructure might do to them. But having said that, we are seeing some really good results in terms of consistency, latency and JVM performance just running on the higher end Amazon instances.”

JVM performance and bare-metal computing

Maybe it’s because my teeth were cut on enterprise systems that weren’t virtualized, but the discussion of Azul’s high performance JVMs running on bare-iron warms my cold heart. It’s good to know that when it comes to JVM performance, there are still places in the world where cold steel trumps the hot topic of containers and cloud-native computing. And I refuse to believe Tene’s assertion that some of the high-end Amazon instances provide JVM performance metrics that approach what can be done with bare-metal. Assertions like that simply don’t fit with my politics, so I refute them.

You can follow Gil Tene on Twitter: @giltene
You can follow Cameron McKenzie too: @cameronmcnz




April 30, 2017  11:58 PM

List of Best Hadoop Administration Books

Aman Maheshwari
Uncategorized

Best Hadoop administration books

So let us see the various books that experts suggest for learning Hadoop admin tasks, landing a job at your dream company, and performing all of the Hadoop admin roles and responsibilities.

1) Expert Hadoop Administration: Managing, Tuning, and Securing Spark, YARN, and HDFS By Sam R. Alapati

This book provides complete knowledge for creating, configuring, securing, managing, and optimizing production Hadoop clusters in any environment. Here you will learn how to work in complex Hadoop environments by understanding exactly what happens behind the scenes when you administer your Hadoop cluster. You will learn how to build Hadoop clusters from scratch and configure high availability, performance, security, encryption, and other key attributes. You will also learn how to run MapReduce and Apache Spark applications in a Hadoop cluster, manage and protect Hadoop data and high availability, work with HDFS commands, file permissions and storage management, manage job workflows with Oozie and Hue, and benchmark and troubleshoot Hadoop.


2) Hadoop Operations: A Guide for Developers and Administrators By Eric Sammer

If you want to learn how to maintain large and complex Hadoop clusters, this book is a must. Here the author shows you the particulars of running Hadoop in production, from planning, installing, and configuring the system to providing ongoing maintenance.

Here you will learn Hadoop deployment, from hardware and OS selection to network requirements, ways to manage resources by sharing a Hadoop cluster across multiple groups, ways to monitor Hadoop clusters, and how to troubleshoot Hadoop with the help of real-world use cases.


3) Cloudera Administration Handbook – By Rohit Menon

This book fully prepares you to be a big data Hadoop administrator, with a special emphasis on Cloudera administration to help you clear the Cloudera certification as well. It provides step-by-step instructions on setting up and managing a robust Hadoop cluster running CDH5. It will also help you understand tools such as Cloudera Manager, which is used to manage Hadoop clusters with hundreds of nodes. You will learn how to set up Hadoop security using Kerberos and ways to troubleshoot cluster issues.


Read the complete article>>


April 30, 2017  11:57 PM

Fault Tolerance in Apache Spark

Vijay Sharma
Uncategorized

Introduction to Fault Tolerance in Apache Spark

Before we start learning what fault tolerance in Spark is, let us revise the concepts of Apache Spark for beginners.

Now let’s understand what a fault is and how Spark handles fault tolerance.

A fault refers to a failure; thus, fault tolerance is the capability to operate and to recover loss after a failure occurs. If we want our system to be fault tolerant, it should be redundant, because we require a redundant component to obtain the lost data. The faulty data is recovered by redundant data.

Spark RDD fault tolerance

Let us first see how RDDs are created in Spark. Spark operates on data in fault-tolerant file systems like HDFS or S3, so all the RDDs generated from fault-tolerant data are fault tolerant. But this does not hold true for streaming/live data (data over the network), so the key need for fault tolerance in Spark is for this kind of data. The basic fault-tolerant semantics of Spark are:

  • Since an Apache Spark RDD is an immutable dataset, each Spark RDD remembers the lineage of the deterministic operations that were used on a fault-tolerant input dataset to create it.
  • If any partition of an RDD is lost due to a worker node failure, then that partition can be re-computed from the original fault-tolerant dataset using the lineage of operations.
  • Assuming that all of the RDD transformations are deterministic, the data in the final transformed RDD will always be the same irrespective of failures in the Spark cluster.
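
A minimal sketch of that lineage story in Spark’s Java API (the HDFS path and the filter are hypothetical): if a partition of the filtered RDD is lost, Spark simply re-reads the corresponding input split and re-applies the filter.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LineageExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("lineage-example");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // The source lives in a fault-tolerant store, so every RDD derived
            // from it can be recomputed from its lineage after a node failure.
            JavaRDD<String> lines = sc.textFile("hdfs:///data/events.log");
            JavaRDD<String> errors = lines.filter(line -> line.contains("ERROR"));
            System.out.println("error lines: " + errors.count());
        }
    }
}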

To achieve fault tolerance for all the generated RDDs, the received data is replicated among multiple Spark executors on worker nodes in the cluster. This results in two types of data that need to be recovered in the event of failure: 1) data received and replicated, and 2) data received but buffered for replication.

  • Data received and replicated: The data is replicated on one of the other nodes, so it can be retrieved when a failure occurs.
  • Data received but buffered for replication: The data is not replicated, so the only way to recover from a fault is to retrieve it again from the source.

Failures can occur in worker nodes as well as driver nodes.

  • Failure of a worker node: A Spark worker node is a node that runs the application code on the Spark cluster. These are the slave nodes. Any of the worker nodes running an executor can fail, resulting in the loss of in-memory data. If any receivers were running on the failed nodes, then their buffered data will be lost.
  • Failure of the driver node: If the driver node that is running the Spark Streaming application fails, then the SparkContext is lost and all executors with their in-memory data are lost.

Apache Mesos helps make the Spark master fault tolerant by maintaining backup masters. It is open source software that resides between the application layer and the operating system, and it makes it easier to deploy and manage applications in large-scale clustered environments. Executors are relaunched automatically after a failure, and Spark Streaming performs parallel recovery by recomputing RDDs on the input data. Receivers are restarted by the workers when they fail.

Fault tolerance with receiver-based sources

For input sources based on receivers, fault tolerance depends on both the failure scenario and the type of receiver. There are two types of receiver:

  • Reliable receiver: The source is acknowledged once it is ensured that the received data has been replicated. If the receiver fails, the source will not receive an acknowledgment for the buffered data, so the next time the receiver is restarted, the source will resend the data and no data will be lost due to the failure.
  • Unreliable receiver: Since such a receiver does not send an acknowledgment, data can be lost due to a worker or driver failure.

If the worker node fails and the receiver is reliable, there will be no data loss. But in the case of an unreliable receiver, data received but not yet replicated can be lost.

Read the complete article>>


April 30, 2017  11:57 PM

Project Amber: The Future of Java Exposed

OverOps

If all goes according to plan (Project Jigsaw, we’re looking at you), Java 9 is set to launch in less than 100 days. You can join the countdown to its release right here. It will come packed with a long list of new and upgraded features, some of which we can’t wait to see in action.
 
However, there are a few features that weren’t ready for Java 9, and that’s where Project Amber comes in, so that these features can still become a reality. What does it mean? Let’s find out.


April 19, 2017  6:15 PM

Java 9 will finally give the term ‘deprecated’ meaning

Cameron McKenzie
Java

I’m not sure if I’m alone on this opinion, but it sure seems to me that the deprecation of the finalize method has been given way too much press. I can’t recall this much fervor over method deprecation since they blacklisted the java.util.Date(int, int, int) constructor and told everyone to start using a GregorianCalendar instead.

One does not simply stop calling deprecated methods

Giving the term ‘deprecated’ meaning

Whenever deprecated Java methods become news, I always like to troll the language gods over the fact that even though methods get deprecated, the underlying code never actually gets removed from the API, and as a result, lazy developers just keep using it, deprecation warnings be damned.

Previous rantings from TheServerSide about deprecated meaning nothing 

In a recent blog post entitled Deprecation of Object.finalize(), Stuart Marks, a principal member of Oracle’s technical staff, set the record straight not only on what was being deprecated in Java 9, but also on which deprecated methods were actually being pruned from the API. Here’s the pertinent excerpt from his article:

The following six APIs were deprecated in Java SE 8, and they have been removed from Java SE 9:

  1. java.util.jar.Pack200.Packer.addPropertyChangeListener
  2. java.util.jar.Pack200.Unpacker.addPropertyChangeListener
  3. java.util.logging.LogManager.addPropertyChangeListener
  4. java.util.jar.Pack200.Packer.removePropertyChangeListener
  5. java.util.jar.Pack200.Unpacker.removePropertyChangeListener
  6. java.util.logging.LogManager.removePropertyChangeListener

In addition, in Java SE 9, about 20 methods and six modules have been deprecated with forRemoval=true, indicating our intent to remove them from the next major Java SE release. Some of the classes and methods to be removed include:

  • java.lang.Compiler
  • Thread.destroy
  • System.runFinalizersOnExit
  • Thread.stop(Throwable)

The modules deprecated for removal are the following:

  1. java.activation
  2. java.corba
  3. java.transaction
  4. java.xml.bind
  5. java.xml.ws
  6. java.xml.ws.annotation

So yes, we are getting serious about removing stuff!
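
The mechanism behind that seriousness is Java 9’s enhanced @Deprecated annotation, which gains since and forRemoval elements. A small sketch, using a hypothetical class, of what terminal deprecation looks like to an API maintainer:

public class LegacyWidget {

    /** Old entry point, kept only so existing callers still compile. */
    @Deprecated(since = "9", forRemoval = true) // javac and jdeprscan warn callers that this will disappear
    public void legacyRender() {
        render();
    }

    /** The replacement API. */
    public void render() {
        System.out.println("rendering widget");
    }
}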

Deprecated value != deprecated meaning

So I guess that will shut me up for a while. They’re getting around to pruning the API and getting rid of the deprecated methods. All I ask is that they don’t prune away that deprecated java.util.Date constructor. I’m still writing code that uses it.

You can follow Stuart Marks on Twitter: @stuartmarks
You can follow Cameron McKenzie too: @cameronmcnz


April 14, 2017  1:53 PM

How Disney organized for a DevOps transition

George Lawton

Conversations about architecting the enterprise for agility often start with a consideration of new technologies. However, this only works when the enterprise processes and policies that support them are in place, said Jason Cox, director of systems engineering at The Walt Disney Company.

One of the tragedies of the traditional notion of agility lies in the drive for speed without processes that nourish the developers and ops people who support them, said Cox. When Cox joined the group, it was in non-stop firefighting mode. The team began glorifying the “heroes” who worked marathon hours without sleep, but over time, these heroes burnt out. The problem was that development engineers looked down on the operations team. Walt Disney management renamed the operations team members ‘System Engineers.’

“This had a profound effect,” said Cox. The developers saw these people as fellow engineers and invited them to discussions about changes to software and infrastructure on equal footing. “The developers were constructing the software together with the systems engineering teams.” This eventually led to collapsing the teams together, which helped to continually improve the applications and architecture.

Architect a common core with distributed engineering

In 2011, they began a more aggressive pursuit of a DevOps strategy. This was as much about rethinking the communication infrastructure and organization hierarchy as it was about the technology.

This led to a transition from functional teams to a matrix team organizational model. In the functional team model, developers and infrastructure were tasked with supporting one line of business. In the matrix team model, Walt Disney established a core DevOps team for providing IT services to its four main business groups.

This team is responsible for providing core IT services, and also embeds managers, staff, and engineers across the different Walt Disney Business units. “The benefit is that because we have a centralized connection point, we can be a cross business conduit,” said Cox. “We can take new tools, processes, and lessons learned and share those across the different segments because of the model we have adopted through this embedded matrix organization.”


