• How to install Mahout on Hadoop cluster

    We recently created a Hadoop cluster (3 slaves and 1 master, provisioned with Ambari server/Hortonworks). Now we're trying to install Mahout 0.9 on the master machine so we can run Mahout jobs on the cluster. Is there a way to do that?

  • Hadoop: The difference between jobconf and job objects

    I'm currently working in Hadoop, but I'm having difficulty understanding the difference between JobConf and Job objects. This is how I'm submitting my job as of today: JobClient.runJob(jobconf); But then my friend sent me this for submitting jobs: Configuration conf = getConf(); Job job = new Job(conf,...

  • How do I produce big data in Hadoop?

    I've been working with Hadoop and Nutch over the past few weeks and I need a massive amount of data. I'm trying to start with 20 GB and would like to reach 1-2 TB at some point. But, as of right now, I don't have that much data, so I'd like to generate it. The data could be anything...

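Generating synthetic data is usually easier than finding it. A minimal Python sketch that streams random tab-separated records to a file; the field count, field length, and file name are arbitrary assumptions, and you scale n_lines (or run several copies in parallel) toward the 20 GB and 1-2 TB targets:

```python
import random
import string

def write_random_tsv(path, n_lines, fields=3, field_len=16):
    """Stream n_lines of random tab-separated ASCII records to path."""
    rng = random.Random(42)  # fixed seed so runs are reproducible
    with open(path, "w") as out:
        for _ in range(n_lines):
            record = "\t".join(
                "".join(rng.choices(string.ascii_lowercase, k=field_len))
                for _ in range(fields)
            )
            out.write(record + "\n")

# A 1000-line sample; raise n_lines once the format looks right.
write_random_tsv("sample.tsv", 1000)
```

Once the file exists, `hdfs dfs -put sample.tsv /data/` (or the equivalent for your setup) moves it into the cluster.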
  • Big data books to start a career

    I apologize if this isn't the right area to ask, but I'm looking to get into the big data field (I would like to work in the industry), so would anyone happen to know of some great books on big data? I'm looking for anything on Hadoop or HBase. Thanks so much!

  • Getting error message when running a job on Hadoop: Mkdirs failed to create /some/path

    I'm trying to run a job in Hadoop but I keep getting this weird exception: Exception in thread "main" java.io.IOException: Mkdirs failed to create /some/path at org.apache.hadoop.util.RunJar.ensureDirectory(RunJar.java:106) at org.apache.hadoop.util.RunJar.main(RunJar.java:150) Has anyone seen this...

  • Dropping MongoDB database from the command line

    Would anyone know how to drop a MongoDB database from the command line? Is there a way to do it through the bash prompt? Thanks so much.

  • Ran out of 32-bit address space in Python

    When I was taking the covariance of a large matrix using numpy.cov, I got this weird error message: Python(22498,0xa02e3720) malloc: *** mmap(size=1340379136) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug Process Python bus error So what...

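For scale: the mmap size in the error (1,340,379,136 bytes, about 1.34 GB) is almost exactly what a float64 covariance matrix of ~12,944 variables occupies, and a 32-bit process tops out at 2-4 GB of address space, so the allocation fails on top of the input data. That 12,944 figure is inferred from the error message alone, not from the asker's data. The arithmetic as a sketch:

```python
def cov_matrix_bytes(n_vars, dtype_size=8):
    """Bytes needed for an n_vars x n_vars float64 covariance matrix."""
    return n_vars * n_vars * dtype_size

# ~12,944 variables -> just over 1.3 GB, matching the failed mmap size
print(cov_matrix_bytes(12944))  # 1340377088
```

The usual fixes are a 64-bit Python build, or making sure numpy.cov is treating the smaller dimension as the variables (its rowvar argument controls whether rows or columns are variables).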
  • Limitations to adding NDB cluster on MySQL

    I'm trying to implement an NDB cluster for MySQL 6 with a very large data structure (2 million databases). Are there any limitations to implementing an NDB cluster? RAM size, perhaps? Appreciate the help.

  • Is mapred-site.xml included with Hadoop 2.2.0?

    I recently learned that the latest build of Hadoop provides mapred-site.xml.template, but do I have to create a new mapred-site.xml from it? Appreciate the help.

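Yes: Hadoop 2.2.0 ships only the template, and you are expected to create mapred-site.xml from it yourself in the conf directory (typically $HADOOP_HOME/etc/hadoop). A sketch of that step in Python; the conf_dir argument is an assumption to point at your actual conf directory:

```python
import shutil
from pathlib import Path

def create_mapred_site(conf_dir):
    """Create mapred-site.xml from the shipped template if it is missing."""
    conf = Path(conf_dir)
    template = conf / "mapred-site.xml.template"
    target = conf / "mapred-site.xml"
    if not target.exists():
        shutil.copyfile(template, target)
    return target
```

After copying, add your properties (e.g. mapreduce.framework.name set to yarn) inside the file's <configuration> element.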
  • How to read big data that’s formatted with a fixed width

    Is there a way I can read big data that's formatted with a fixed width? My data is roughly 558 MB and I'm not sure how many lines there are. Anyone have some suggestions for me?

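One approach, assuming the field layout is known: describe each field as a (name, start, end) slice and stream the file line by line, so a 558 MB file never has to fit in memory at once. The field names and widths below are made-up placeholders:

```python
# Hypothetical layout: replace names and column offsets with your format's.
FIELDS = [("id", 0, 6), ("name", 6, 26), ("amount", 26, 36)]

def parse_line(line):
    """Slice one fixed-width line into a dict of stripped field values."""
    return {name: line[start:end].strip() for name, start, end in FIELDS}

def parse_file(path):
    """Stream records one at a time instead of loading the whole file."""
    with open(path) as f:
        for line in f:
            yield parse_line(line)

demo = "000042" + "Alice".ljust(20) + "12.50".ljust(10)
print(parse_line(demo))  # {'id': '000042', 'name': 'Alice', 'amount': '12.50'}
```

If pandas is available, pandas.read_fwf does the same job with less code (and can read in chunks via its chunksize argument).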
  • Is there anything similar to Hadoop in C++?

    I apologize for the short / newbie question, but would anyone happen to know if there's anything like Hadoop in C++? I'm trying to do distributed computing using MapReduce but I'm not sure of the best way to go about it. Thank you.

  • Rename files in Hadoop/Spark

    A friend of mine (he would ask this question himself but his computer is down) has an input folder that contains over 100,000 files. He wants to do a batch operation and is trying to use Spark, but when he tried this piece of code: final org.apache.hadoop.fs.FileSystem ghfs =...

  • How to know which region server to write to in HBase

    Would there be a way in HBase for operations to know which region server a row should be written to? And if several rows need to be read, how are multiple region servers contacted and the results retrieved? Thanks so much.

  • How to sort large text data in Python

    We currently have a large file (over 100 million lines of tab-separated values, about 1.5 GB in size). Does anyone know of a fast way we can sort this file by one of its fields? We already tried Hive but that was too slow. Would Python be able to do this?

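Python can do it without holding the file in memory via an external merge sort: sort fixed-size chunks, spill each sorted run to a temp file, then heapq.merge the runs. A sketch; the 0-based column index and chunk size are assumptions to tune. (If GNU sort is available, plain sort with -t for the tab delimiter and -k for the column is often the fastest zero-code option.)

```python
import heapq
import os
import tempfile
from itertools import islice

def sort_by_field(in_path, out_path, field=1, chunk_lines=1_000_000):
    """External merge sort of a tab-separated file by one column (0-based)."""
    key = lambda line: line.rstrip("\n").split("\t")[field]
    run_paths = []
    with open(in_path) as f:
        while True:
            # Sort one chunk in memory and spill it as a sorted run.
            run = sorted(islice(f, chunk_lines), key=key)
            if not run:
                break
            fd, path = tempfile.mkstemp(text=True)
            with os.fdopen(fd, "w") as tmp:
                tmp.writelines(run)
            run_paths.append(path)
    # k-way merge of the sorted runs, streaming to the output file.
    runs = [open(p) for p in run_paths]
    with open(out_path, "w") as out:
        out.writelines(heapq.merge(*runs, key=key))
    for fh in runs:
        fh.close()
    for p in run_paths:
        os.remove(p)
```

Note this sorts the key column lexicographically; wrap the key in int() or float() if the column is numeric.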
  • Is HBase a better choice than Cassandra for big data?

    We're trying to decide which software would be best for us when it comes to our big data. We're currently between HBase and Cassandra (with Hadoop) and we're leaning more towards HBase. Do you guys think HBase is a better choice for us? Is there really any difference between the two?

  • How to get the last N records in MongoDB

    I'm currently using MongoDB and I'm trying to figure out how I can get the last N records. I know that, by default, find() will get all the records from the beginning. I would appreciate any help available.

  • How to sort big data in C

    I've recently begun working with big data. Today, I started a project that has a file with 10,000,000 ints. I'm trying to perform a number of sorts on the data and time each sort, but I'm not sure how to go about it. The code below is what I would like to do: ./mySort < myDataFile >...

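The ./mySort < myDataFile > sortedFile harness is language-independent: read whitespace-separated ints from stdin, time the sort, print the sorted values to stdout, and send the timing to stderr so redirection captures only the data. Since the question mentions C, the same shape applies there with fscanf and qsort; here is that harness sketched in Python (assuming the file is whitespace-separated ints):

```python
import sys
import time

def timed_sort(values):
    """Sort a list of ints in place and return the elapsed seconds."""
    start = time.perf_counter()
    values.sort()
    return time.perf_counter() - start

if __name__ == "__main__":
    # Usage: python my_sort.py < myDataFile > sortedFile
    values = [int(tok) for tok in sys.stdin.buffer.read().split()]
    elapsed = timed_sort(values)
    sys.stdout.write("\n".join(map(str, values)) + "\n")
    print(f"sorted {len(values)} ints in {elapsed:.3f}s", file=sys.stderr)
```

Swapping timed_sort's body for other algorithms lets you compare their timings on the same input.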
  • Can R handle data that’s bigger than RAM?

    I've been using R as open source...but it's not letting me handle data sets that are bigger than RAM. Would it be possible to handle big data sets by applying PL/R functions inside PostgreSQL? Does anyone know?

  • Fetch data from HBase table in Spark

    We have this huge table in HBase that's named UserAction. It has three different column families. We're trying to fetch all of the data from one column family as a JavaRDD object. We've tried using the code below but it's not working. What else can we do? static SparkConf sparkConf = new...

  • Hadoop: When do reduce tasks start?

    I'm using Hadoop and I can't figure out when reduce tasks start up. Do they actually start after a percentage of mappers complete? Is there a fixed threshold? Thank you for the help.

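Reduce tasks are launched early so the shuffle (copying map output) can overlap with the remaining maps, but the actual reduce() calls only run once every map has finished. The launch point is a configurable threshold, mapreduce.job.reduce.slowstart.completedmaps, which defaults to 0.05 (reducers start after 5% of maps complete). For example, in mapred-site.xml:

```xml
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <!-- do not launch reducers until 80% of map tasks have completed -->
  <value>0.80</value>
</property>
```

Raising the threshold frees cluster slots for maps at the cost of less shuffle overlap; the progress you see on reducers while maps are still running is the shuffle phase, not reduce() executing.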
