  • Free space in HDFS

    Is there an HDFS command to check how much free space is available in HDFS? I can see it in the browser at master:hdfsport, but unfortunately I can't access that page, so I need a command. I can see disk usage, but not free space. I'd appreciate the help.

    ITKE (355,660 points)
  • MongoDB: Checking to see if an array field contains a unique value

    I've been using MongoDB for over a month now. I have a blog post collection, and each post has a tags field (an array). This is what it looks like: blogpost1.tags = ['tag1', 'tag2', 'tag3', 'tag4', 'tag5'] blogpost2.tags = ['tag2', 'tag3'] blogpost3.tags = ['tag2', 'tag3', 'tag4', 'tag5']...

    ITKE (355,660 points)
  • Running a big data process in R

    I've recently collected data from the Twitter Streaming API, and the JSON now sits in a 10 GB text file. I'd like R to handle all of it, but I'm not sure whether it can do a few things, such as reading/processing the data into a data frame, descriptive analysis, and plotting. Can R do this? Or do I...

    ITKE (355,660 points)
  • SanDisk data lifespan

    Can data on a SanDisk CompactFlash card become corrupted if the card has been stored without power for 4 or 5 years?

    A23857410 points
  • Delete millions of rows by ID in SQL

    We're trying to delete roughly 2 million rows from our PostgreSQL (PG) database. We already have a list of the IDs that need to be deleted, but it's turning into a slow process. This is what I tried: DELETE FROM tbl WHERE id IN (select * from ids). It takes about 2 days to finish. Is there a...

    ITKE (355,660 points)
  • What framework should I use for fast Hadoop real-time data analysis?

    I'm trying to do some real-time analysis on data in HDFS, but I'm not sure which framework I should use. I'm deciding between Cloudera, Apache and Spark. Which one would suit me best? Thanks!

    ITKE (355,660 points)
  • How to cluster keys in Cassandra

    I'm pretty new to Cassandra. From what I've learned, a physical node stores the rows for a given partition key in the order induced by the clustering keys, which makes retrieving the rows in that order easy. But I'm not sure what kind of ordering is induced by clustering...

    ITKE (355,660 points)
  • How to create a funnel in MongoDB

    In MongoDB, I have a collection named event that tracks events from mobile applications. Here's the structure of a document: { eventName:"eventA", screenName:"HomeScreen", timeStamp: NumberLong("135698658"), tracInfo: { ..., "userId":"user1", "sessionId":"123cdasd2123",...

    ITKE (355,660 points)
  • How to get a random record in MongoDB

    I have roughly 100 million records and I need to get a random record from MongoDB. What's the best way to do this? The data is ready to go, but there's no field I can use to generate a random number or obtain a random row. I'd appreciate the help.

    ITKE (355,660 points)
  • Pass mapped data to multiple reduce functions in Hadoop

    I have a large dataset that I need to analyze with multiple reduce functions. I would like to read the dataset only once and then pass the mapped data to multiple reduce functions. Is there a way to do this in Hadoop? Thank you!

    ITKE (355,660 points)
  • Getting warning message when starting Hadoop cluster

    I just started a Hadoop cluster, but I keep getting this warning message: $HADOOP_HOME is deprecated. When I add export HADOOP_HOME_WARN_SUPPRESS="TRUE" to hadoop-env.sh, I no longer get the message when I start the cluster. When I run hadoop dfsadmin -report, I see the message...

    ITKE (355,660 points)
  • Load small random sample of a large CSV file into R data frame

    We have a CSV file that needs to be processed but doesn't fit into memory. Is there a way to read 20K+ random lines of it so we can run basic stats on the selected data frame?

    ITKE (355,660 points)
  • Should I learn MongoDB or CouchDB for NoSQL?

    I'm pretty new to everything NoSQL-related. I've heard a lot about MongoDB and CouchDB, but I'm not sure of the differences. Can anyone help me? Which one do you recommend as a first step in learning NoSQL? Thank you very much!

    ITKE (355,660 points)
  • How to delete all data in MongoDB

    I'm currently working in development on MongoDB. Every once in a while, I need to delete pretty much everything in that database. Does anyone know if there's a single line of code to do this? I'm going through every single collection right now and it's taking way too long.

    ITKE (355,660 points)
  • How to install Mahout on Hadoop cluster

    We recently created a Hadoop cluster (3 slaves and 1 master, set up with Ambari server/Hortonworks). Now we're trying to install Mahout 0.9 on the master machine so we can run Mahout jobs on the cluster. Is there a way to do that?

    ITKE (355,660 points)
  • Hadoop: The difference between JobConf and Job objects

    I'm currently working in Hadoop, but I'm having difficulty understanding the difference between JobConf and Job objects. This is how I submit my job today: JobClient.runJob(jobconf); But then my friend sent me this for submitting jobs: Configuration conf = getConf(); Job job = new Job(conf,...

    ITKE (355,660 points)
  • How do I produce big data in Hadoop?

    I've been working with Hadoop and Nutch over the past few weeks and I need a massive amount of data. I'd like to start with 20 GB and eventually reach 1-2 TB. Right now I don't have that much data, so I'd like to generate it. The data could be anything...

    ITKE (355,660 points)
  • Big data books to start a career

    I apologize if this isn't the right place to ask, but I'm looking to get into the big data field (I would like to work in the industry). Does anyone know of some great books on big data? I'm looking for anything on Hadoop or HBase. Thanks so much!

    ITKE (355,660 points)
  • Getting error message when running a job on Hadoop: Mkdirs failed to create /some/path

    I'm trying to run a job in Hadoop, but I keep getting this weird exception: Exception in thread "main" java.io.IOException: Mkdirs failed to create /some/path at org.apache.hadoop.util.RunJar.ensureDirectory(RunJar.java:106) at org.apache.hadoop.util.RunJar.main(RunJar.java:150) Has anyone seen this...

    ITKE (355,660 points)
  • Dropping MongoDB database from the command line

    Does anyone know how to drop a MongoDB database from the command line? Is there a way to do it from the bash prompt? Thanks so much.

    ITKE (355,660 points)
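
For the question about processing a 10 GB file of Twitter Streaming API JSON: the key idea is to stream the file rather than load it whole. The question asks about R; this Python sketch only illustrates the pattern. It assumes the file holds one JSON object per line (a common way of saving streamed tweets) and tallies one top-level field while keeping nothing but the counts in memory.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed object per non-blank line, skipping lines that fail to parse."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank keep-alive lines carry no data
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a record cut off when the stream was stopped
        if isinstance(record, dict):
            yield record


def count_field(lines: Iterable[str], field: str) -> Counter:
    """Tally the scalar values of one top-level field, holding only the counts in memory."""
    counts: Counter = Counter()
    for record in iter_records(lines):
        value = record.get(field)
        if isinstance(value, (str, int, float, bool)):
            counts[value] += 1
    return counts
```

Usage, with a placeholder file name and field: with open("tweets.json", encoding="utf-8") as fh: print(count_field(fh, "lang").most_common(10)). File objects iterate lazily line by line, so memory use does not grow with the file size.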
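
For the question about deleting roughly 2 million rows by ID: one common remedy is to make sure both id columns are indexed and to delete in batches, so each transaction stays short. The sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL; the table names tbl and ids come from the question, while the helper name and batch size are illustrative. In PostgreSQL itself, the join form DELETE FROM tbl USING ids WHERE tbl.id = ids.id is also worth comparing against IN (...).

```python
import sqlite3
from typing import List


def delete_in_batches(conn: sqlite3.Connection, batch_size: int = 10_000) -> int:
    """Delete every row of tbl whose id appears in ids, committing once per batch.

    Assumes tbl.id and ids.id are both indexed (e.g. PRIMARY KEY), so each
    lookup is an index probe rather than a scan. Returns the rows deleted.
    """
    total = 0
    last_id = None
    while True:
        # Keyset pagination over the id list: the next batch_size ids after last_id.
        if last_id is None:
            cur = conn.execute("SELECT id FROM ids ORDER BY id LIMIT ?", (batch_size,))
        else:
            cur = conn.execute(
                "SELECT id FROM ids WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, batch_size),
            )
        batch: List[int] = [row[0] for row in cur.fetchall()]
        if not batch:
            return total
        with conn:  # one short transaction per batch; commits on success
            deleted = conn.executemany(
                "DELETE FROM tbl WHERE id = ?", [(i,) for i in batch]
            )
        total += deleted.rowcount  # executemany sums the per-statement counts
        last_id = batch[-1]
```

The batch size is a tuning knob rather than a recommended value; smaller batches mean shorter transactions at the cost of more round trips.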
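
For the question about the ordering induced by Cassandra clustering keys: within one partition, rows are ordered lexicographically by the clustering columns in declaration order. The first clustering column decides, and the next one is consulted only to break ties. Each column sorts ascending unless CLUSTERING ORDER BY says otherwise, for example PRIMARY KEY ((sensor_id), day, event_time) WITH CLUSTERING ORDER BY (day ASC, event_time DESC), where all the names are hypothetical. The Python sketch below only models that comparison rule; it is not Cassandra's storage code.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def clustering_sort(rows: List[Row], clustering: Sequence[Tuple[str, str]]) -> List[Row]:
    """Order the rows of one partition the way its clustering columns would.

    `clustering` lists (column, "ASC" | "DESC") pairs in declaration order.
    Sorting by the least significant column first and the most significant
    last relies on Python's stable sort to produce lexicographic order.
    """
    ordered = list(rows)
    for column, direction in reversed(clustering):
        ordered.sort(key=lambda r: r[column], reverse=(direction == "DESC"))
    return ordered
```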
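
For the question about building a funnel over the MongoDB event collection: a funnel counts how many users reach each step of an ordered sequence of events. Here is a minimal Python sketch over documents shaped like the ones in the question (eventName, timeStamp, tracInfo.userId). The step names are hypothetical. In MongoDB itself, the grouping by user could instead be pushed into an aggregation pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence


def funnel_counts(events: Iterable[dict], steps: Sequence[str]) -> List[int]:
    """Return, for each funnel step, how many users reached it in order.

    A user reaches step k only if, in timestamp order, a steps[k] event follows
    the events that completed steps 0..k-1; unrelated events are ignored.
    """
    by_user: Dict[str, List[dict]] = defaultdict(list)
    for e in events:
        by_user[e["tracInfo"]["userId"]].append(e)

    reached = [0] * len(steps)
    for user_events in by_user.values():
        user_events.sort(key=lambda e: e["timeStamp"])
        step = 0  # index of the next step this user must hit
        for e in user_events:
            if step < len(steps) and e["eventName"] == steps[step]:
                reached[step] += 1
                step += 1
    return reached
```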
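
For the question about fetching a random record out of roughly 100 million: when no suitable field exists, one common pattern is to add one. You store a random number in each document, index it, and pick the first document whose value is at or above a freshly drawn random number, wrapping around if there is none. The sketch below simulates that lookup in Python, with a sorted list standing in for the index; the field name rand is hypothetical. The pick is only approximately uniform, because documents that follow larger gaps between keys are chosen more often. MongoDB 3.2 and later also offer a $sample aggregation stage, which may be simpler.

```python
import bisect
import random
from typing import List, Optional


class RandomKeyIndex:
    """Simulates a collection where every record carries a random `rand` field
    (hypothetical name) backed by a sorted index on that field."""

    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self._rng = rng or random.Random()
        self._keys: List[float] = []  # sorted rand values, like an index
        self._records: List[str] = []  # records kept parallel to _keys

    def insert(self, record: str) -> None:
        # The random key is assigned once, when the record is stored.
        key = self._rng.random()
        pos = bisect.bisect_left(self._keys, key)
        self._keys.insert(pos, key)
        self._records.insert(pos, record)

    def random_record(self) -> Optional[str]:
        if not self._records:
            return None
        r = self._rng.random()
        # Roughly find({rand: {$gte: r}}).sort({rand: 1}).limit(1) ...
        pos = bisect.bisect_left(self._keys, r)
        if pos == len(self._keys):
            pos = 0  # ... wrapping to the smallest key when nothing is >= r
        return self._records[pos]
```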
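
For the question about passing mapped data to multiple reduce functions: one common workaround is to read the input once and have the mapper emit each value several times, under composite keys tagged with the reduction each copy is meant for. A single reducer then dispatches on that tag. The sketch below simulates the map, shuffle and reduce phases in plain Python; the tags "sum" and "max" and their reductions are illustrative. In a real Hadoop job the tag would be part of the map output key, and it could also select a named output via MultipleOutputs.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical reductions, each identified by a tag that travels in the key.
REDUCERS: Dict[str, Callable[[List[int]], int]] = {
    "sum": sum,
    "max": max,
}


def mapper(record: Tuple[str, int]) -> Iterator[Tuple[Tuple[str, str], int]]:
    """Emit the value once per reduction, tagging the key with its name."""
    word, value = record
    for tag in REDUCERS:
        yield (tag, word), value


def run_job(records: Iterable[Tuple[str, int]]) -> Dict[Tuple[str, str], int]:
    # Shuffle: group every mapped value by its composite (tag, word) key.
    groups: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for record in records:  # the input is read exactly once
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: a single reducer dispatches on the tag to the right function.
    return {key: REDUCERS[key[0]](values) for key, values in groups.items()}
```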
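
For the question about loading a small random sample of a CSV file that does not fit into memory: reservoir sampling (Algorithm R) picks k rows uniformly at random in a single pass while holding only k rows in memory. The sketch is in Python rather than R; the sample could be written out to a small CSV and read into a data frame. It assumes the file has exactly one header row.

```python
import csv
import random
from typing import Iterable, List, Optional, Tuple


def reservoir_sample(rows: Iterable[List[str]], k: int,
                     rng: Optional[random.Random] = None) -> List[List[str]]:
    """Return k rows chosen uniformly at random from `rows` (all rows if fewer)."""
    rng = rng or random.Random()
    sample: List[List[str]] = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)  # fill the reservoir with the first k rows
        else:
            j = rng.randint(0, i)  # inclusive bounds: i + 1 possible slots
            if j < k:
                sample[j] = row  # row i is kept with probability k / (i + 1)
    return sample


def sample_csv(path: str, k: int = 20_000,
               seed: Optional[int] = None) -> Tuple[List[str], List[List[str]]]:
    """Read the header, then reservoir-sample k data rows from a CSV file."""
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        header = next(reader)
        return header, reservoir_sample(reader, k, random.Random(seed))
```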
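
For the question about producing 20 GB to 1-2 TB of test data for Hadoop: since the data "could be anything", one option is to generate synthetic records until a target size is reached. Hadoop's bundled examples also include TeraGen, which may be worth checking first. This Python sketch writes random tab-separated text lines to a stream and stops once the requested number of bytes has been written. The record layout and the vocabulary are hypothetical. At terabyte scale the output would likely need to be split across many files written in parallel.

```python
import io
import random
from typing import Optional, TextIO

# Hypothetical vocabulary for filler text; any word list would do.
WORDS = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"]


def write_synthetic(out: TextIO, target_bytes: int, seed: Optional[int] = None) -> int:
    """Write random "id<TAB>number<TAB>words" lines until at least target_bytes
    of UTF-8 text has been written; returns the number of bytes written."""
    rng = random.Random(seed)
    written = 0
    record_id = 0
    while written < target_bytes:
        words = " ".join(rng.choice(WORDS) for _ in range(rng.randint(3, 12)))
        line = f"{record_id}\t{rng.randint(0, 10**6)}\t{words}\n"
        out.write(line)
        written += len(line.encode("utf-8"))
        record_id += 1
    return written


def synthetic_sample(target_bytes: int, seed: Optional[int] = None) -> str:
    """Convenience wrapper that generates into memory, for small trial runs."""
    buf = io.StringIO()
    write_synthetic(buf, target_bytes, seed)
    return buf.getvalue()
```

Usage, with a placeholder file name: with open("part-00000.txt", "w", encoding="utf-8") as fh: write_synthetic(fh, 20 * 1024**3).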
