• Hadoop: What’s the difference between Pig and Hive?

    I'm pretty new to the Hadoop world (been using it for about a month) and I've started to get into Hive, Pig and Hadoop using Cloudera's Hadoop VM. Is there a difference between Pig and Hive? I understand they have similar commands so I'm trying to figure out the big differences.

    ITKE (1,124,655 points)
  • What’s the difference between S3 and S3N in Hadoop?

    When we recently connected our Hadoop cluster to our Amazon storage and downloaded a file to HDFS, we noticed that s3:// didn't work but when we tried out S3N, it worked. Why didn't it work with S3? Is there a difference between the two?

  • Hadoop: Safemode recovery is taking too long

    We have a Hadoop cluster with 18 data nodes. We restarted the name node about three hours ago and it's still in safe mode! We're not sure if we should try to restart it. We looked online and found this property to try: dfs.namenode.handler.count, with value 3 and final set to true. Should we try this? If not, has anyone seen...

  • Hadoop: How to handle data streams in real-time

    I've recently been working with Hadoop and now I'm using it to handle data streams in real-time. For this, I would like to build a meaningful POC around it so I could showcase it. I'm pretty limited in resources so any help would be appreciated.

  • How to run Hadoop job without JobConf

    I'm trying to submit a Hadoop job that doesn't use the deprecated JobConf class. But my friend told me that JobClient only supports methods that take a JobConf parameter. Does anyone know how I can submit a Hadoop job using only the Configuration class? Is there Java code for it?

  • Big data: How to get started

    We've been using R for several years and now we're starting to get into Python. We've been using RDBMS systems for data warehousing and R for number-crunching. Now, we think it's time to get more involved with big data analysis. Does anyone know how we should get started (basically how to use...

  • How to compress large files in Hadoop

    I need to process a huge file and I'm looking to use Hadoop for it. From what my friend has told me, the file would get split across several different nodes. But if the file is compressed, then the file won't be split and would need to be processed on a single node (and I wouldn't be able to use...

  • Free space in HDFS

    Is there an HDFS command to see how much free space is available in HDFS? I'm able to see it through the browser using master:hdfsport. But unfortunately, I can't access it and I need a command. I can see disk usage but not free space. Appreciate the help.

  • What framework should I use for fast Hadoop real-time data analysis?

    I'm trying to do some real-time data analysis on data in HDFS but I'm not sure which framework I should use. I'm deciding between Cloudera, Apache and Spark. Which one would best suit me? Thanks!

  • Pass mapped data to multiple reduce functions in Hadoop

    I currently have a large dataset that I need to analyze with multiple reduce functions. What I would like to do is read the dataset only once and then pass the mapped data to multiple reduce functions. Is there a way I can do this in Hadoop? Thank you!

  • Getting warning message when starting Hadoop cluster

    I just started a Hadoop cluster but I keep getting this warning message: $HADOOP_HOME is deprecated. But when I add export HADOOP_HOME_WARN_SUPPRESS="TRUE" into hadoop-env.sh, I don't get the message anymore (when I start the cluster). When I run this: hadoop dfsadmin -report, I see the message...

  • How to install Mahout on Hadoop cluster

    We recently created a Hadoop cluster (with 3 slaves and 1 master, using Ambari server/Hortonworks). Now we're trying to install Mahout 0.9 on the master machine so we can run Mahout jobs in the cluster. Is there a way to do that?

  • Hadoop: The difference between jobconf and job objects

    I'm currently working in Hadoop but I'm having difficulty finding the difference between jobconf and job objects. This is how I'm submitting my job as of today: JobClient.runJob(jobconf); But then my friend sent me this for submitting jobs: Configuration conf = getConf(); Job job = new Job(conf,...

  • How do I produce big data in Hadoop?

    I've been working with Hadoop and Nutch over the past few weeks and I need a massive amount of data. I'm trying to start with 20 GB and would like to reach between 1-2 TB at some point. As of right now, I don't have that much data, so I would like to produce it. The data could be anything...

  • Getting error message when running a job on Hadoop: Mkdirs failed to create /some/path

    I'm trying to run a job in Hadoop but I keep getting this weird exception: Exception in thread "main" java.io.IOException: Mkdirs failed to create /some/path at org.apache.hadoop.util.RunJar.ensureDirectory(RunJar.java:106) at org.apache.hadoop.util.RunJar.main(RunJar.java:150) Has anyone seen this...

  • Is mapred-site.xml included with Hadoop 2.2.0?

    I recently learned that the latest build of Hadoop provides mapred-site.xml.template, but do I have to create a new mapred-site.xml from that? Appreciate the help.

  • Is there anything similar to Hadoop in C++?

    I apologize for the short / newbie question, but would anyone happen to know if there's anything like Hadoop in C++? I'm trying to do distributed computing using MapReduce but I'm not sure of the best way to do it. Thank you.

  • Rename files in Hadoop/Spark

    A friend of mine (he would ask this question but his computer is down) has an input folder that contains over 100,000 files. He wants to do a batch operation and is trying to use Spark but when he tried this piece of code: final org.apache.hadoop.fs.FileSystem ghfs =...

  • Hadoop: When do reduce tasks start?

    I'm using Hadoop and I can't figure out when reduce tasks start up. Do they actually start after a percentage of mappers complete? Is there a fixed threshold? Thank you for the help.
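
    For what it's worth, the threshold is configurable rather than fixed: in Hadoop 2.x the property mapreduce.job.reduce.slowstart.completedmaps (default 0.05) sets the fraction of map tasks that must finish before reducers are scheduled. Reducers that start early only copy map output; the reduce function itself runs once every map has finished. Below is a rough Python sketch of that arithmetic, not Hadoop's actual scheduler code; the function names are made up for illustration, and rounding up is an assumption about how the fraction is turned into a task count.

```python
import math

# Default of mapreduce.job.reduce.slowstart.completedmaps in Hadoop 2.x.
DEFAULT_SLOWSTART = 0.05


def maps_needed_before_reducers(total_maps, slowstart=DEFAULT_SLOWSTART):
    """Number of completed maps required before reducers can be scheduled.

    Hypothetical helper: assumes the fraction is rounded up to a whole task.
    """
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be between 0.0 and 1.0")
    return math.ceil(slowstart * total_maps)


def reducers_may_start(completed_maps, total_maps, slowstart=DEFAULT_SLOWSTART):
    """True once enough maps have completed for reducers to begin shuffling."""
    return completed_maps >= maps_needed_before_reducers(total_maps, slowstart)


if __name__ == "__main__":
    total = 200
    for fraction in (0.0, 0.5, 1.0):
        needed = maps_needed_before_reducers(total, fraction)
        print(f"slowstart={fraction}: reducers eligible after {needed} of {total} maps")
```

    Setting the value to 1.0 makes reducers wait for all maps, which can help on a busy cluster where idle reducers would otherwise hold slots.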

  • How does secondary sorting work in Hadoop

    I'm pretty new to the Hadoop/big data industry but I'm trying to figure out how secondary sorting works in Hadoop. Why would I have to use GroupingComparator when doing this? Does anyone know how this works?
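
    One way to picture it: the usual secondary-sort pattern emits a composite key (natural key plus the field to sort on), partitions on the natural key only, sorts on the full composite key, and then groups on the natural key only. Without that last grouping step, every distinct composite key would get its own reduce call. The plain-Python sketch below imitates those three steps in memory; it does not use the Hadoop API, and the function names and sample records are invented for illustration.

```python
from itertools import groupby
from operator import itemgetter


def partition(natural_key, num_reducers):
    # Partitioner: depends on the natural key only, so all records for one
    # natural key land on the same (simulated) reducer.
    return sum(natural_key.encode("utf-8")) % num_reducers


def secondary_sort(records, num_reducers=2):
    """records: iterable of (natural_key, secondary_field, value) tuples.

    Returns a list of (natural_key, values) pairs, one per simulated reduce
    call, with values ordered by secondary_field.
    """
    partitions = [[] for _ in range(num_reducers)]
    for natural_key, secondary, value in records:
        composite_key = (natural_key, secondary)
        partitions[partition(natural_key, num_reducers)].append((composite_key, value))

    reduce_calls = []
    for part in partitions:
        # Sort comparator: orders by the full composite key.
        part.sort(key=itemgetter(0))
        # Grouping comparator: compares the natural key only, so one reduce
        # call receives every value for that key, already in secondary order.
        for natural_key, group in groupby(part, key=lambda kv: kv[0][0]):
            reduce_calls.append((natural_key, [value for _, value in group]))
    return reduce_calls


if __name__ == "__main__":
    sample = [
        ("nyc", 3, "c"),
        ("sf", 1, "x"),
        ("nyc", 1, "a"),
        ("nyc", 2, "b"),
        ("sf", 0, "w"),
    ]
    for key, values in secondary_sort(sample):
        print(key, values)
```

    In a real job these three roles would typically be filled by a custom Partitioner, a sort comparator, and the grouping comparator set on the Job.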

