• What should I choose for file storage: MongoDB or Hadoop?

    For about the past month, I've been looking for the best way to build scalable storage for big files. File sizes range from 1-2 megabytes up to 500-600 gigabytes. I'm deciding between MongoDB and Hadoop and I'm not sure which way to go. I'm thinking of using MongoDB as a file...

    ITKE (355,490 points)
  • Convert .txt file to Hadoop sequence file

    I have big data and I'm trying to store all of it in Hadoop's sequence file format. But all of the data is in a flat .txt format. Is there any way I can convert it? Thank you.

    ITKE (355,490 points)
  • Print documents in MongoDB shell

    Does anyone know of a way to print out more than 20 documents in MongoDB's shell? I've tried this: db.foo.find().limit(300) But this still prints out 20. Then I tried this code: db.foo.find().toArray() db.foo.find().forEach(printjson) But it's printing out an expanded view of each document of the...

    ITKE (355,490 points)
  • How to put the results of a Hive query to a CSV file

    I'm trying to write the results of a Hive query to a CSV file. This is what my command looks like: insert overwrite directory '/home/output.csv' select books from table; When I run it, it reports success, but I can't find the file. Is there a way I can find this file? Thank you.

    ITKE (355,490 points)
  • Process range of HBase rows using Spark

    We've been using HBase as a data source for Spark. We've already created an RDD from an HBase table, but we can't figure out how to create an RDD for a range scan. Does anyone know how to do it?

    ITKE (355,490 points)
  • How to count many lines in large files

    My partner and I usually work with files that are larger than 20 GB, and we often have to count the number of lines in a given file. We've been doing it using cat fname | wc -l but it takes forever. Does anyone know of a way that will make it faster? We're working with a high performance...

    ITKE (355,490 points)
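
    One possible approach to the line-counting question above, offered as a minimal sketch rather than a benchmarked answer: read the file in large binary chunks and count newline bytes, which avoids the extra cat process and any per-line decoding. The function name count_lines and the 1 MB chunk size are assumptions chosen for illustration; like wc -l, it counts newline characters, so a final line without a trailing newline is not counted.

```python
def count_lines(path, chunk_size=1024 * 1024):
    """Count newline bytes in a file, reading it in fixed-size binary chunks."""
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # An empty read means end of file.
                break
            count += chunk.count(b"\n")
    return count
```

    Calling count_lines("fname") returns the count as an integer. Whether this beats wc -l fname on a given system depends on disk throughput, so it is worth timing both.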
  • SSIS XMLNode as input parameter from web service

    Looking for a way to pass in an input parameter from a web service task that is of type XMLNode. After downloading the wsdl file, an input type of XMLNode is required.

    djcurtis25 points
  • New approaches to big data analytics

    What are the new approaches to modeling in analytics?

    rhari222520 points
  • Execute Mongo commands in Shell Script

    I'm trying to execute MongoDB commands from a shell script. Here's what I tried so far: #!/bin/sh mongo myDbName db.mycollection.findOne() show collections When I ran it, it said the connection was established, but the commands were not executed. Can someone help me out?

    ITKE (355,490 points)
  • Getting error message when installing Hadoop 2.2.0

    I've been trying to install a Hadoop 2.2.0 cluster on my servers. All of the servers are 64-bit, I've downloaded Hadoop, and the configuration files are good to go. When I run ./start-dfs.sh, I keep getting this error: 13/11/15 14:29:26 WARN util.NativeCodeLoader: Unable to load...

    ITKE (355,490 points)
  • Data synchronization between AS/400 test server and production server

    The requirement is to make the data on the test server the same as on the live server. This update should happen every week. The test server should hold only the last year of data from production. Please let me know the best possible way to do this. (Taking backup from live and restore in test is an option. But...

    ShajiMohan (50 points)
  • How to merge several small files into one in Hadoop

    I have several small files in my input directory that I want to merge into a single file. I need to do this without using the local file system or writing MapReduce jobs. Is there a Hadoop command I can use to do this?

    ITKE (355,490 points)
  • Error message when handling big data set in R

    I'm currently working on Windows 8 with 8 GB of RAM. I also have a data frame of 1.8 million rows and 270 columns (on which I have to perform a GLM). I've already tried to use FF and BIGGLM packages to handle the data but it hasn't worked. I keep getting this error: Error: cannot allocate...

    ITKE (355,490 points)
  • Get names of keys in MongoDB

    I'm trying to get the names of all the keys in my MongoDB collection. Here's what I have so far: db.things.insert( { type : ['dog', 'cat'] } ); db.things.insert( { egg : ['cat'] } ); db.things.insert( { type : [] } ); db.things.insert( { hello : [] } ); And I'm trying to get the unique keys, like...

    ITKE (355,490 points)
  • Error message when starting Hadoop on OSX

    I keep getting this weird error message when Hadoop starts up on OSX 10.7. Here's the error message: Unable to load realm info from SCDynamicStore put: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /user/travis/input/conf. Name node is in safe mode. Has anyone...

    ITKE (355,490 points)
  • How to export big data from Cassandra to CSV

    My friend and I have been using Cassandra to store big data (over 100 GB) in one column family. We're now trying to export it to CSV quickly, so far without success. We've tried CAPTURE and COPY but they don't produce what we need. Anyone know of a better way exporting...

    ITKE (355,490 points)
  • Writing map only Hadoop jobs

    I'm pretty new to the Hadoop scene and I've run into a problem. Every once in a while, a job only needs the map phase, so I want the map result written directly as output (in other words, the reduce phase isn't needed). Is there a way to do that?

    ITKE (355,490 points)
  • Should I move my SQL database to NoSQL?

    I have a very large table (roughly 100 million rows and 35 columns) and it's currently stored in a SQL database. However, my queries are running very slowly at the moment, so I'm wondering if I should move to NoSQL. But I have a few questions: Which NoSQL database should I use? Is there a way to move...

    ITKE (355,490 points)
  • How to use Python to parse a 12 GB CSV file

    We currently have a 12 GB CSV file. We're now trying to extract some columns from this data and then write a new CSV file that would load into R for data analysis. But we keep getting this error when we're loading the list before writing the new file. Is there a way we can parse the data row by row...

    ITKE (355,490 points)
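
    For the row-by-row parsing asked about above, a minimal sketch using Python's standard csv module might look like the following. It streams one row at a time instead of building a list in memory. It assumes the source file has a header row; the function name extract_columns and its parameters are hypothetical, chosen only for illustration.

```python
import csv


def extract_columns(src_path, dst_path, columns):
    """Copy only the named columns from src_path to dst_path, one row at a time."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)  # uses the first row as the header
        # extrasaction="ignore" drops every field not listed in columns.
        writer = csv.DictWriter(dst, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
```

    Because only one row is held at a time, memory use should stay roughly constant regardless of file size. The output columns appear in the order given in columns.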
  • What’s the difference between CLI and CQL in Cassandra?

    I'm pretty new to Cassandra and I'm looking to find out the difference between CLI and CQL. Which one is better to use? Also, are there any APIs I can use to query Cassandra using .NET? Thanks so much.

    ITKE (355,490 points)
