  • JAVA_HOME is not set correctly when installing Hadoop on Ubuntu

    I've been trying to install Hadoop on Ubuntu 11.10. I just set the JAVA_HOME variable in the file conf/hadoop-env.sh to:

        # export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk

    Then I tried to execute these commands:

        $ mkdir input
        $ cp conf/*.xml input
        $ bin/hadoop jar hadoop-examples-*.jar grep input...

    ITKE (829,010 points)
  • Case-insensitive query in MongoDB

    Does anyone know if it's possible to make a case-insensitive query in MongoDB? Something like this:

        > db.stuff.save({"foo":"bar"});
        > db.stuff.find({"foo":"bar"}).count();
        1
        > db.stuff.find({"foo":"BAR"}).count();
        0

    Thanks!

    ITKE (829,010 points)
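
One possible approach to the case-insensitive lookup asked about above is a regular-expression filter with the `i` option. The sketch below only builds the filter document in Python; the helper name `case_insensitive_filter` is hypothetical, and it assumes the value should match the whole field rather than a substring.

```python
import re

def case_insensitive_filter(field, value):
    """Build a MongoDB filter that matches `value` exactly, ignoring case."""
    # re.escape keeps characters such as "." or "*" in the value literal.
    pattern = "^" + re.escape(value) + "$"
    return {field: {"$regex": pattern, "$options": "i"}}

query = case_insensitive_filter("foo", "BAR")
print(query)
```

With a driver such as PyMongo, a dictionary like this would be passed to `find()`; in the mongo shell the rough equivalent is `db.stuff.find({"foo": /^bar$/i})`.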
  • Hadoop: What’s the difference between Pig and Hive?

    I'm pretty new to the Hadoop world (been using it for about a month) and I've started to get into Hive, Pig and Hadoop using Cloudera's Hadoop VM. Is there a difference between Pig and Hive? I understand they have similar commands so I'm trying to figure out the big differences.

    ITKE (829,010 points)
  • What’s the difference between S3 and S3N in Hadoop?

    When we recently connected our Hadoop cluster to our Amazon storage and downloaded a file to HDFS, we noticed that s3:// didn't work but when we tried out S3N, it worked. Why didn't it work with S3? Is there a difference between the two?

    ITKE (829,010 points)
  • Hadoop: Safemode recovery is taking too long

    We have a Hadoop cluster with 18 data nodes. We restarted the name node about three hours ago and it's still in safe mode! We're not sure if we should try to restart it again. We looked online and found this to try: dfs.namenode.handler.count 3 true Should we try this? If not, has anyone seen...

    ITKE (829,010 points)
  • MongoDB: Find documents that have name array size greater than one

    We have a MongoDB collection that has documents in this format:

        { "_id" : ObjectId("4e8ae86d08101908e1000001"), "name" : ["Some Name"], "zipcode" : ["2223"] }
        { "_id" : ObjectId("4e8ae86d08101908e1000002"), "name" : ["Another ", "Name"], "zipcode" : ["2224"] }
        { "_id" :...

    ITKE (829,010 points)
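
For the array-size question above, one commonly used trick is to test whether a given array index exists: if element 1 is present, the array has more than one element. The sketch below builds such a filter in Python; the helper name `longer_than` is hypothetical, and it assumes `name` is always a plain array (not an embedded document with a key named "1").

```python
def longer_than(field, n):
    """Filter for documents whose array `field` has more than n elements."""
    # Arrays are zero-indexed, so index n exists only when the length exceeds n.
    return {f"{field}.{n}": {"$exists": True}}

query = longer_than("name", 1)
print(query)

# Plain-Python check of the same condition on the sample documents.
docs = [{"name": ["Some Name"]}, {"name": ["Another ", "Name"]}]
matches = [d for d in docs if len(d["name"]) > 1]
print(matches)
```

The resulting dictionary, `{"name.1": {"$exists": True}}`, has the same shape as the filter one would pass to `find()` in the shell or a driver.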
  • Data set processing and machine learning in R

    I've been using R over the past year and I know it's designed to handle data sets that it can load into memory. Are there any R packages that are recommended for signal processing / machine learning on data sets that can't be loaded into memory? If R can't do it, is there other software that can?

    ITKE (829,010 points)
  • Hadoop: How to handle data streams in real-time

    I've recently been working with Hadoop and now I'm using it to handle data streams in real-time. For this, I would like to build a meaningful POC around it so I could showcase it. I'm pretty limited in resources so any help would be appreciated.

    ITKE (829,010 points)
  • How to run Hadoop job without JobConf

    I'm trying to submit a Hadoop job that doesn't use the deprecated JobConf class. But my friend told me that JobClient only supports methods that take a JobConf parameter. Does anyone know how I can submit a Hadoop job using only the Configuration class? Is there Java code for it?

    ITKE (829,010 points)
  • Tell MongoDB to pretty print output

    Is there a way to tell MongoDB to pretty print output? Right now, everything is output to a single line and it's pretty difficult to read (especially with arrays and documents). I appreciate the help.

    ITKE (829,010 points)
  • Query MongoDB with LIKE

    I'm using MongoDB but I need a query like SQL's LIKE. Something along the lines of this:

        select * from users where name like '%m%'

    Is there a way to do the same in MongoDB? I would appreciate any help.

    ITKE (829,010 points)
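
MongoDB has no LIKE operator, but a regular-expression filter can express the same kind of match. The sketch below translates a LIKE pattern into a `$regex` filter; the helper names `like_to_regex` and `like_filter` are hypothetical, and the translation assumes only the `%` and `_` wildcards, no escape character, and case-sensitive matching.

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into an anchored regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # % matches any run of characters
        elif ch == "_":
            parts.append(".")            # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return "^" + "".join(parts) + "$"

def like_filter(field, pattern):
    """Build a MongoDB filter document for the given LIKE pattern."""
    # The "s" option lets "." match newlines too.
    return {field: {"$regex": like_to_regex(pattern), "$options": "s"}}

query = like_filter("name", "%m%")
print(query)
```

For the simple `'%m%'` case, the mongo shell shorthand `db.users.find({name: /m/})` expresses the same substring match.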
  • Process large text file with ByteStrings and lazy texts

    I'm looking to process a large Unicode text file that is over 6 GB. I need to count the frequency of each unique word. I'm currently using Data.Map to track the count of each word but it's taking way too much time and space. Here's the code:

        import Data.Text.Lazy (Text(..), cons, pack, append)...

    ITKE (829,010 points)
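
The question above is about Haskell, and its code is truncated, so the following is not a fix for it. It is only a minimal Python sketch of the same task, counting word frequencies while reading the input one line at a time so the whole file never has to sit in memory. The function name `count_words` is hypothetical, and words are assumed to be separated by whitespace.

```python
from collections import Counter
import io

def count_words(lines):
    """Count whitespace-separated words from an iterable of lines."""
    counts = Counter()
    for line in lines:
        # Only the current line and the running counts are held in memory.
        counts.update(line.split())
    return counts

# A small in-memory stand-in for a large file.
sample = io.StringIO("a b a\nb c\n")
result = count_words(sample)
print(result["a"], result["b"], result["c"])
```

An open file object can be passed in place of `sample`, since iterating over it also yields one line at a time. Memory use still grows with the number of distinct words.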
  • Big data: How to get started

    We've been using R for several years and now we're starting to get into Python. We've been using RDBMS systems for data warehousing and R for number-crunching. Now, we think it's time to get more involved with big data analysis. Does anyone know how we should get started (basically how to use...

    ITKE (829,010 points)
  • How to compress large files in Hadoop

    I need to process a huge file and I'm looking to use Hadoop for it. From what my friend has told me, the file would get split across several nodes. But if the file is compressed, then the file won't be split and would need to be processed on a single node (and I wouldn't be able to use...

    ITKE (829,010 points)
  • Free space in HDFS

    Is there an HDFS command to see how much free space is available in HDFS? I'm able to see it through the browser using master:hdfsport. But unfortunately, I can't access it and I need a command. I can see disk usage but not free space. Appreciate the help.

    ITKE (829,010 points)
  • MongoDB: Checking to see if an array field contains a unique value

    I've been using MongoDB for over a month now. I have a blog post collection, and each post has a tags field (which is an array). This is what it looks like:

        blogpost1.tags = ['tag1', 'tag2', 'tag3', 'tag4', 'tag5']
        blogpost2.tags = ['tag2', 'tag3']
        blogpost3.tags = ['tag2', 'tag3', 'tag4', 'tag5']...

    ITKE (829,010 points)
  • Running a big data process in R

    I've recently collected data from the Twitter Streaming API and now the JSON sits in a 10 GB text file. I'm looking to have R handle all the big data but I'm not sure if it could do a few things, such as reading / processing the data into a data frame, descriptive analysis, and plotting. Can R do this? Or do I...

    ITKE (829,010 points)
  • Delete millions of rows by ID in SQL

    We're trying to delete roughly 2 million rows from our PostgreSQL database. We already have a list of IDs that need to be deleted but it's turning into a slow process. This is what I tried:

        DELETE FROM tbl WHERE id IN (select * from ids)

    So basically, this is taking about 2 days to finish. Is there a...

    ITKE (829,010 points)
  • What framework should I use for fast Hadoop real-time data analysis?

    I'm trying to do some real-time data analysis on data in HDFS but I'm not sure which framework I should use. I'm deciding between Cloudera, Apache and Spark. Which one would best suit me? Thanks!

    ITKE (829,010 points)
  • How to cluster keys in Cassandra

    I'm pretty new to Cassandra and from what I've learned, a physical node stores the rows for a given partition key in the order induced by the clustering keys. This makes retrieving the rows in that order easy. But I'm not sure of what kind of ordering is induced by clustering...

    ITKE (829,010 points)
