• Hadoop error when accessing application over HTTP

    We have installed Hadoop on our cluster and are now installing HTTPFS to access HDFS content over HTTP. We can reach the normal page, but when we try to access HDFS we get an error: {"RemoteException":{"message":"User: ubantu is not allowed to impersonate ubantu",...
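
    The "is not allowed to impersonate" message usually means the user running the HTTPFS gateway has not been whitelisted as a Hadoop proxy user. A minimal sketch of the usual fix, assuming the gateway runs as the ubantu user shown in the error (added to the NameNode's core-site.xml, followed by a NameNode and HTTPFS restart):

        <property>
          <name>hadoop.proxyuser.ubantu.hosts</name>
          <value>*</value>    <!-- or just the HTTPFS host -->
        </property>
        <property>
          <name>hadoop.proxyuser.ubantu.groups</name>
          <value>*</value>    <!-- or a specific group -->
        </property>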

    Asked by ITKE (361,885 points)
  • What’s faster for big data processing: MongoDB or Redis

    I'm currently working on my big data project and I'm trying to decide between Redis and MongoDB. Which one would be faster at processing large data sets? I would appreciate any advice.

    Asked by ITKE (361,885 points)
  • Output results of Hive query into CSV file

    We're trying to put the results of a Hive query into a CSV file. Here's the command we came up with: insert overwrite directory '/home/output.csv' select books from table; It says it completed, but we can't find the file. Where is it? Or should we extract it a different way?
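
    One likely explanation: without the LOCAL keyword, INSERT OVERWRITE DIRECTORY writes to that path on HDFS, not the local filesystem, and the "file" is really a directory of Ctrl-A-delimited part files. A hedged sketch of two ways to find or regenerate the output (table and column names taken from the question above):

        # the results are probably sitting in HDFS, e.g. /home/output.csv/000000_0
        hdfs dfs -ls /home/output.csv
        hdfs dfs -cat /home/output.csv/*

        # or write a local CSV directly, turning the tab-separated output into commas
        hive -e 'select books from table' | sed 's/\t/,/g' > /home/output.csv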

    Asked by ITKE (361,885 points)
  • How to parse big data JSON file

    I have a JSON file that is roughly 36 GB and I need to access it more efficiently. I've been using RapidJSON's SAX-style API in C++, but it takes about two hours to parse. Now here's my question: should I split the big file into millions of small files, or is there another approach I should take?...
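
    Splitting into millions of tiny files usually makes things worse because of file-system overhead; the SAX route is the right one, and the main wins are a buffered file stream plus doing as little work as possible inside the callbacks. A rough, hypothetical C++ sketch of that shape (the handler just counts string values as a stand-in for real processing):

        // Streams the file through a 64 KB buffer instead of loading it all;
        // only the callbacks you override do any work.
        #include <cstdio>
        #include "rapidjson/reader.h"
        #include "rapidjson/filereadstream.h"

        struct CountHandler : rapidjson::BaseReaderHandler<rapidjson::UTF8<>, CountHandler> {
            size_t strings = 0;
            bool String(const char*, rapidjson::SizeType, bool) { ++strings; return true; }
        };

        int main() {
            std::FILE* fp = std::fopen("big.json", "rb");   // hypothetical file name
            if (!fp) { std::perror("fopen"); return 1; }
            char buffer[65536];
            rapidjson::FileReadStream is(fp, buffer, sizeof(buffer));
            CountHandler handler;
            rapidjson::Reader reader;
            reader.Parse(is, handler);
            std::printf("string values seen: %zu\n", handler.strings);
            std::fclose(fp);
            return 0;
        }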

    Asked by ITKE (361,885 points)
  • Speed up data processing in MongoDB

    I've been using MongoDB to get every document in a collection. It works, but with so many small documents (there are over 100 million) it's very slow. Here's what I'm using: count <- mongo.count(mongo, ns, query) cursor <- mongo.find(mongo, query) name <- vector("character", count)...

    Asked by ITKE (361,885 points)
  • Out of memory error when installing Hadoop

    I recently tried to install Hadoop following a document my friend gave me. When I tried to execute this: bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+' I got this exception: java.lang.OutOfMemoryError: Java heap space Has anyone seen this before? Like I said, I'm pretty new to...
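
    For the standalone example run, this OutOfMemoryError usually just means the client JVM's default heap is too small. A sketch of one common fix (the 2 GB figure is an assumption; pick whatever your machine allows):

        export HADOOP_CLIENT_OPTS="-Xmx2048m"
        bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

        # or set it persistently in conf/hadoop-env.sh, e.g.
        # export HADOOP_HEAPSIZE=2000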

    Asked by ITKE (361,885 points)
  • JAVA_HOME is not set correctly when installing Hadoop on Ubuntu

    I've been trying to install Hadoop on Ubuntu 11.10. I just set the JAVA_HOME variable in the file conf/hadoop-env.sh to: # export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk Then I tried to execute these commands: $ mkdir input $ cp conf/*.xml input $ bin/hadoop jar hadoop-examples-*.jar grep input...
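
    Note that the line quoted above still starts with '#', so it is commented out and never takes effect. A sketch of the fix in conf/hadoop-env.sh (the exact path depends on which JDK directory actually exists under /usr/lib/jvm on your machine):

        # remove the leading '#' and point at the JDK directory that is really installed
        export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk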

    Asked by ITKE (361,885 points)
  • Available material to help teach Data Management

    I work at Houston Community College and am scheduled to teach a Data Management course this summer. Is there any material available that I could use as a resource? I would be very appreciative.

    Asked by enriquej (5 points)
  • Case-insensitive query in MongoDB

    Does anyone know if it's possible to make a case-insensitive query in MongoDB? Something like this: > db.stuff.save({"foo":"bar"}); > db.stuff.find({"foo":"bar"}).count(); 1 > db.stuff.find({"foo":"BAR"}).count(); 0 Thanks!
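
    The usual approach is a case-insensitive regular expression (keeping in mind that an unanchored, case-insensitive regex cannot use an index efficiently):

        > db.stuff.find({"foo": /^bar$/i}).count();
        1
        > db.stuff.find({"foo": {"$regex": "^bar$", "$options": "i"}}).count();
        1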

    Asked by ITKE (361,885 points)
  • Hadoop: What’s the difference between Pig and Hive?

    I'm pretty new to the Hadoop world (been using it for about a month) and I've started to get into Hive, Pig and Hadoop using Cloudera's Hadoop VM. Is there a difference between Pig and Hive? I understand they have similar commands so I'm trying to figure out the big differences.
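
    In short, Hive gives you a declarative SQL-like language (HiveQL) that is comfortable for analysts, while Pig Latin is a step-by-step dataflow language aimed at building pipelines; both compile down to MapReduce jobs. A small, hypothetical example of the same filter in each (table, file and column names are invented for illustration):

        -- HiveQL
        SELECT name, age FROM people WHERE age > 30;

        -- Pig Latin
        people = LOAD 'people.csv' USING PigStorage(',') AS (name:chararray, age:int);
        adults = FILTER people BY age > 30;
        DUMP adults;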

    Asked by ITKE (361,885 points)
  • What’s the difference between S3 and S3N in Hadoop?

    When we recently connected our Hadoop cluster to our Amazon storage and downloaded a file to HDFS, we noticed that s3:// didn't work, but when we tried s3n:// it did. Why didn't it work with s3://? Is there a difference between the two?
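
    Yes: s3:// is Hadoop's block-based filesystem, which stores HDFS-style blocks inside the bucket (so objects uploaded to S3 in the normal way aren't visible through it), while s3n:// is the "native" filesystem that reads and writes ordinary S3 objects. A hedged sketch of the s3n setup, using the legacy property names (bucket and paths are hypothetical):

        <!-- core-site.xml -->
        <property><name>fs.s3n.awsAccessKeyId</name><value>YOUR_ACCESS_KEY</value></property>
        <property><name>fs.s3n.awsSecretAccessKey</name><value>YOUR_SECRET_KEY</value></property>

        # list the bucket, or copy a file into HDFS
        hadoop fs -ls s3n://my-bucket/path/
        hadoop distcp s3n://my-bucket/path/file.txt /user/hadoop/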

    Asked by ITKE (361,885 points)
  • Hadoop: Safemode recovery is taking too long

    We have a Hadoop cluster with 18 data nodes. We restarted the name node about three hours ago and it's still in safe mode! We're not sure if we should restart it again. We looked online and found this property to try: dfs.namenode.handler.count 3 true Should we try this? If not, has anyone seen...
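
    Changing dfs.namenode.handler.count mostly affects RPC throughput; a name node normally leaves safe mode on its own once enough blocks have been reported, so a multi-hour wait usually points to data nodes that haven't checked in or to missing blocks. The standard commands to inspect the state and, if you accept the risk, to force it out:

        hadoop dfsadmin -safemode get     # is safe mode still on?
        hadoop dfsadmin -report           # how many data nodes have reported in?
        hadoop dfsadmin -safemode leave   # force the name node out of safe mode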

    Asked by ITKE (361,885 points)
  • MongoDB: Find documents that have name array size greater than one

    We have a MongoDB collection that has documents in this format: { "_id" : ObjectId("4e8ae86d08101908e1000001"), "name" : ["Some Name"], "zipcode" : ["2223"] } { "_id" : ObjectId("4e8ae86d08101908e1000002"), "name" : ["Another ", "Name"], "zipcode" : ["2224"] } { "_id" :...
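
    If "greater than one" is the only case you need, the simplest trick is to ask whether the array has a second element; $where also works but is much slower. A sketch (the collection name is a placeholder):

        > db.mycoll.find({"name.1": {"$exists": true}});
        > db.mycoll.find({"$where": "this.name.length > 1"});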

    Asked by ITKE (361,885 points)
  • Data set processing and machine learning in R

    I've been using R over the past year and I know it's designed to handle data sets that fit in memory. Are there any recommended R packages for signal processing / machine learning on data sets that don't fit in memory? If R can't do it, is there other software that can?
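
    Packages such as ff, bigmemory and biglm are the usual suggestions: they keep the data on disk (or in shared memory) and process it in chunks. A hedged sketch of the chunk-wise biglm pattern (the file and column names are invented for illustration):

        library(biglm)

        # Fit a linear model over a CSV far larger than RAM by reading it in
        # chunks from an open connection and updating the same model object.
        con   <- file("huge.csv", open = "r")
        chunk <- read.csv(con, nrows = 1e5)
        fit   <- biglm(y ~ x1 + x2, data = chunk)
        repeat {
          chunk <- tryCatch(read.csv(con, nrows = 1e5, header = FALSE,
                                     col.names = names(chunk)),
                            error = function(e) NULL)   # stop at end of file
          if (is.null(chunk) || nrow(chunk) == 0) break
          fit <- update(fit, chunk)
        }
        close(con)
        summary(fit)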

    Asked by ITKE (361,885 points)
  • Hadoop: How to handle data streams in real-time

    I've recently been working with Hadoop and now I'd like to use it to handle data streams in real time. I would like to build a meaningful proof of concept (POC) around this so I can showcase it. I'm pretty limited on resources, so any help would be appreciated.

    Asked by ITKE (361,885 points)
  • How to run Hadoop job without JobConf

    I'm trying to submit a Hadoop job that doesn't use the deprecated JobConf class, but my friend told me that JobClient only supports methods that take a JobConf parameter. Does anyone know how I can submit a Hadoop job using only the Configuration class? Is there Java code for this?
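
    The newer org.apache.hadoop.mapreduce API replaces JobConf/JobClient with the Job class, so you only ever touch Configuration. A sketch of a word-count-style job (class names and paths are placeholders; on very old releases you may need new Job(conf, "name") instead of Job.getInstance):

        import java.io.IOException;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.Mapper;
        import org.apache.hadoop.mapreduce.Reducer;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

        public class MyJob {

            // emits (word, 1) for every whitespace-separated token
            public static class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
                private static final IntWritable ONE = new IntWritable(1);
                @Override
                protected void map(LongWritable key, Text value, Context ctx)
                        throws IOException, InterruptedException {
                    for (String w : value.toString().split("\\s+")) {
                        if (!w.isEmpty()) ctx.write(new Text(w), ONE);
                    }
                }
            }

            // sums the counts for each word
            public static class WordReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
                @Override
                protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                        throws IOException, InterruptedException {
                    int sum = 0;
                    for (IntWritable v : values) sum += v.get();
                    ctx.write(key, new IntWritable(sum));
                }
            }

            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                Job job = Job.getInstance(conf, "word count without JobConf");
                job.setJarByClass(MyJob.class);
                job.setMapperClass(WordMapper.class);
                job.setReducerClass(WordReducer.class);
                job.setOutputKeyClass(Text.class);
                job.setOutputValueClass(IntWritable.class);
                FileInputFormat.addInputPath(job, new Path(args[0]));
                FileOutputFormat.setOutputPath(job, new Path(args[1]));
                System.exit(job.waitForCompletion(true) ? 0 : 1);
            }
        }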

    Asked by ITKE (361,885 points)
  • Tell MongoDB to pretty print output

    Is there a way to tell MongoDB to pretty print output? Right now, everything is output to a single line and it's pretty difficult to read (especially with arrays and documents). I appreciate the help.
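
    In the mongo shell you can append .pretty() to any cursor; the ~/.mongorc.js flag below is a commonly cited but undocumented shell setting, so treat it as an assumption:

        > db.mycollection.find().pretty()

        // in ~/.mongorc.js, to make every query pretty-print by default
        DBQuery.prototype._prettyShell = true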

    Asked by ITKE (361,885 points)
  • Query MongoDB with LIKE

    I'm using MongoDB, but I need a query like SQL's LIKE operator. Something along the lines of: select * from users where name like '%m%' Is there a way to do the same in MongoDB? I would appreciate any help.
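
    Regular expressions are MongoDB's equivalent of LIKE:

        > db.users.find({"name": /m/});     // like '%m%'
        > db.users.find({"name": /^m/});    // like 'm%' (an anchored prefix regex can use an index)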

    Asked by ITKE (361,885 points)
  • SANDISK data life span

    Is it possible to get data corruption on a SanDisk CompactFlash card if it has been stored without power for 4 or 5 years?

    Asked by A2385741 (0 points)
  • Process large text file with ByteStrings and lazy texts

    I'm looking to process a large Unicode text file that is over 6 GB. I need to count the frequency of each unique word. I'm currently using Data.Map to track the counts, but it's taking far too much time and space. Here's the code: import Data.Text.Lazy (Text(..), cons, pack, append)...
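
    A common shape for this is to stream the file with the lazy Data.Text.Lazy.IO reader but accumulate into a strict map (Data.Map.Strict) so counts are forced as you go instead of piling up as thunks. A hedged sketch under that assumption (file name is made up; it prints only the number of distinct words):

        import           Data.List         (foldl')
        import qualified Data.Map.Strict   as M
        import qualified Data.Text.Lazy    as TL
        import qualified Data.Text.Lazy.IO as TLIO

        main :: IO ()
        main = do
          contents <- TLIO.readFile "big.txt"   -- hypothetical file name
          let bump m w = M.insertWith (+) (TL.toStrict w) (1 :: Int) m
              freqs    = foldl' bump M.empty (TL.words contents)
          print (M.size freqs)                  -- how many distinct words were seen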

    Asked by ITKE (361,885 points)
