  • Load small random sample of a large CSV file into R data frame

    We have a CSV file that needs to be processed but doesn't fit into memory. Is there a way we can read 20K+ random lines from it to do basic stats on the selected data frame?

    ITKE364,120 pointsBadges:
  • Should I learn MongoDB or CouchDB for NoSQL?

    I'm pretty new to everything NoSQL related. I've heard a ton of things about MongoDB and CouchDB, but I'm not sure of the differences. Can anyone help me? Which one do you recommend as a first step to learning NoSQL? Thank you very much!

    ITKE364,120 pointsBadges:
  • How to delete all data in MongoDB

    I'm currently working in development on MongoDB. Every once in a while, I need to delete pretty much everything in that database. Does anyone know if there's a single line of code to do this? I'm going through every single collection right now and it's taking way too long.

    ITKE364,120 pointsBadges:
  • How to install Mahout on Hadoop cluster

    We recently created a Hadoop cluster (3 slaves and 1 master, using Ambari server/Hortonworks). Now we're trying to install Mahout 0.9 on the master machine so we can run Mahout jobs in the cluster. Is there a way to do that?

    ITKE364,120 pointsBadges:
  • Hadoop: The difference between jobconf and job objects

    I'm currently working in Hadoop but I'm having difficulty finding the difference between jobconf and job objects. This is how I'm submitting my job as of today: JobClient.runJob(jobconf); But then my friend sent me this for submitting jobs: Configuration conf = getConf(); Job job = new Job(conf,...

    ITKE364,120 pointsBadges:
  • How do I produce big data in Hadoop?

    I've been working with Hadoop and Nutch over the past few weeks and I need a massive amount of data. I'm trying to start with 20 GB and would like to reach between 1-2 TB at some point. As of right now, I don't have that much data, so I would like to produce it. The data could be anything...

    ITKE364,120 pointsBadges:
  • Big data books to start a career

    I apologize if this isn't the right area to ask but I'm looking to get into the big data field (I would like to work in the industry) so would anyone happen to know of some great books on big data? I'm looking for anything on Hadoop or HBase. Thanks so much!

    ITKE364,120 pointsBadges:
  • Getting error message when running a job on Hadoop: Mkdirs failed to create /some/path

    I'm trying to run a job in Hadoop but I keep getting this weird exception: Exception in thread "main" java.io.IOException: Mkdirs failed to create /some/path at org.apache.hadoop.util.RunJar.ensureDirectory(RunJar.java:106) at org.apache.hadoop.util.RunJar.main(RunJar.java:150) Has anyone seen this...

    ITKE364,120 pointsBadges:
  • Dropping MongoDB database from the command line

    Would anyone know how to drop a MongoDB database from the command line? Is there a way to do it through the bash prompt? Thanks so much.

    ITKE364,120 pointsBadges:
  • Ran out of 32-bit address space in Python

    When I was taking the covariance of a large matrix using numpy.cov, I got this weird error message: Python(22498,0xa02e3720) malloc: *** mmap(size=1340379136) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug Process Python bus error So what...

    ITKE364,120 pointsBadges:
  • Limitations to adding NDB cluster on MySQL

    I'm trying to implement an NDB cluster for MySQL 6. I'm trying to do it with a very large data structure (2 million databases). Are there any limitations to implementing an NDB cluster? RAM size perhaps? Appreciate the help.

    ITKE364,120 pointsBadges:
  • Is mapred-site.xml included with Hadoop 2.2.0?

    I recently learned that the latest build of Hadoop provides mapred-site.xml.template but do I have to create a new mapred-site.xml using that? Appreciate the help.

    ITKE364,120 pointsBadges:
  • How to read big data that’s formatted with a fixed width

    Is there a way I can read big data that's formatted with a fixed width? My data is roughly 558 MB and I'm not sure how many lines there are. Does anyone have some suggestions for me?

    ITKE364,120 pointsBadges:
  • Is there anything similar to Hadoop in C++?

    I apologize for the short / newbie question, but would anyone happen to know if there's anything like Hadoop in C++? I'm trying to do distributed computing using MapReduce but I'm not sure of the best way to do it. Thank you.

    ITKE364,120 pointsBadges:
  • Rename files in Hadoop/Spark

    A friend of mine (he would ask this question but his computer is down) has an input folder that contains over 100,000 files. He wants to do a batch operation and is trying to use Spark but when he tried this piece of code: final org.apache.hadoop.fs.FileSystem ghfs =...

    ITKE364,120 pointsBadges:
  • How to know which region server to write to in HBase

    Is there a way in HBase for get operations to know which region server a row should be written to? Also, when several rows need to be read, how are multiple region servers contacted and how are the results retrieved? Thanks so much.

    ITKE364,120 pointsBadges:
  • How to sort large text data in Python

    We currently have a large file (over 100 million lines of tab-separated values, about 1.5 GB in size). Does anyone know of a fast way we can sort this file by one of the fields? We already tried Hive but that was too slow. Would Python be able to do this?

    ITKE364,120 pointsBadges:
  • Is HBase a better choice than Cassandra for big data?

    We're trying to decide which software would be best for us when it comes to our big data. We're currently between HBase and Cassandra (with Hadoop) and we're leaning more towards HBase. Do you guys think HBase is a better choice for us? Is there really any difference between the two?

    ITKE364,120 pointsBadges:
  • How to get the last N records in MongoDB

    I'm currently using MongoDB and I'm trying to figure out how I can get the last N records. I know that, by default, the find() process will get all the records from the beginning. I would appreciate any help available.

    ITKE364,120 pointsBadges:
  • How to sort big data in C

    I've recently started working with big data. Today, I started a project that has a file with 10,000,000 ints. I'm trying to perform a number of sorts on the data and time the sorts, but I'm not sure how to go about it. The code below is what I would like to do: ./mySort < myDataFile >...

    ITKE364,120 pointsBadges:
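
Regarding the question above on sorting large text data in Python (a tab-separated file of 100 million+ lines, sorted by one field): one possible approach is an external merge sort, which sorts the file in memory-sized chunks and then merges the chunks. The sketch below is only an illustration under stated assumptions, not an answer from the original thread. The function name `sort_tsv_by_field`, its parameters, and the UTF-8 encoding are hypothetical choices, and the comparison is plain string comparison on the chosen field.

```python
import heapq
import itertools
import os
import tempfile


def sort_tsv_by_field(src_path, dst_path, field=0, chunk_lines=1_000_000):
    """Sort a tab-separated file by one column with an external merge sort.

    Hypothetical helper: only `chunk_lines` lines are held in memory at once.
    Fields are compared as strings; convert inside `key` for numeric order.
    """

    def key(line):
        # Extract the sort column from one raw line.
        return line.rstrip("\n").split("\t")[field]

    chunk_paths = []
    try:
        # Pass 1: read fixed-size chunks, sort each, write it to a temp file.
        with open(src_path, encoding="utf-8") as src:
            while True:
                chunk = list(itertools.islice(src, chunk_lines))
                if not chunk:
                    break
                # Ensure the file's final line ends with a newline so that
                # merged lines do not run together.
                if not chunk[-1].endswith("\n"):
                    chunk[-1] += "\n"
                chunk.sort(key=key)
                fd, path = tempfile.mkstemp(suffix=".chunk")
                with os.fdopen(fd, "w", encoding="utf-8") as out:
                    out.writelines(chunk)
                chunk_paths.append(path)

        # Pass 2: lazily merge the sorted chunk files into the destination.
        files = [open(p, encoding="utf-8") for p in chunk_paths]
        try:
            with open(dst_path, "w", encoding="utf-8") as dst:
                dst.writelines(heapq.merge(*files, key=key))
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temporary chunk files whether or not the sort succeeded.
        for p in chunk_paths:
            os.remove(p)
```

A call such as `sort_tsv_by_field("data.tsv", "sorted.tsv", field=2)` (both file names are placeholders) would sort by the third column. A 1.5 GB file may also fit in RAM on many machines, in which case reading all lines and calling `sorted` with the same key is simpler; the chunked version is mainly useful when it does not.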
