  • Comparison between the waterfall model and the evolutionary model

    What are the advantages and disadvantages of the waterfall model compared with the evolutionary model in the software development life cycle?

    Coolsnipster (0 points)
  • Rename files in Hadoop/Spark

    A friend of mine (he would ask this question himself, but his computer is down) has an input folder that contains over 100,000 files. He wants to do this as a batch operation with Spark, but when he tried this piece of code: final org.apache.hadoop.fs.FileSystem ghfs =...

    ITKE (357,140 points)
  • How to know which region server to write to in HBase

    Is there a way in HBase to find out which region server a row should be written to? And when several rows need to be read, how are the multiple region servers contacted and the results retrieved? Thanks so much.

    ITKE (357,140 points)
  • How to sort large text data in Python

    We have a large file of over 100 million lines of tab-separated values, about 1.5 GB in size. Does anyone know a fast way to sort this file by one of its fields? We already tried Hive, but it was too slow. Could Python do this?

    ITKE (357,140 points)
  • Is HBase a better choice than Cassandra for big data?

    We're trying to decide which software would be best for our big data. We're currently choosing between HBase and Cassandra (with Hadoop), and we're leaning towards HBase. Do you think HBase is a better choice for us? Is there really any difference between the two?

    ITKE (357,140 points)
  • How to get the last N records in MongoDB

    I'm using MongoDB and I'm trying to figure out how to get the last N records. I know that, by default, find() returns the records starting from the beginning. I would appreciate any help.

    ITKE (357,140 points)
  • How to sort big data in C

    I've recently started working with big data. Today I started a project with a file of 10,000,000 ints. I want to run several sorts on the data and time them, but I'm not sure how to go about it. The code below is what I would like to do: ./mySort < myDataFile >...

    ITKE (357,140 points)
  • Can R handle data that’s bigger than RAM?

    I've been using R, which is open source, but it won't let me handle data sets that are larger than RAM. Would it be possible to handle big data sets by applying PL/R functions inside PostgreSQL? Does anyone know?

    ITKE (357,140 points)
  • Fetch data from HBase table in Spark

    We have a huge HBase table named UserAction with three column families. We're trying to fetch all of the data from one column family as a JavaRDD object. We tried the code below, but it isn't working. What else can we do? static SparkConf sparkConf = new...

    ITKE (357,140 points)
  • Hadoop: When do reduce tasks start?

    I'm using Hadoop and I can't figure out when reduce tasks start. Do they start after a certain percentage of mappers complete? Is there a fixed threshold? Thank you for the help.

    ITKE (357,140 points)
  • How does secondary sorting work in Hadoop?

    I'm fairly new to Hadoop and big data, and I'm trying to figure out how secondary sorting works in Hadoop. Why do I have to use a GroupingComparator for it? Does anyone know how this works?

    ITKE (357,140 points)
  • Is there a .NET equivalent for Hadoop?

    For the past few years I've been a C# developer, and I also know Java. I would like to start learning Hadoop, but I'm not sure where to start. Is there a .NET equivalent of Hadoop? Any help would be greatly appreciated.

    ITKE (357,140 points)
  • How to export data from R to SQL Server quickly

    Is there a way I can export data from R to SQL Server quickly? The standard way (sqlSave) is really slow for a large amount of data. This is what I have tried so far: toSQL = data.frame(...); sqlSave(channel,toSQL,tablename="Table1",rownames=FALSE,colnames=FALSE,safer=FALSE,fast=TRUE); Thank you!

    ITKE (357,140 points)
  • How to cluster big data on a server

    I have thousands of points that I need to plot with Highcharts. Is there a way to cluster the data on the server, so that the chart shows fewer than 1,000 points and, when you zoom in, makes Ajax calls to fetch the data for the zoomed-in region? Hopefully that makes sense. I would appreciate...

    ITKE (357,140 points)
  • Document database for big data

    My department has around 100 million records in a database, but roughly 65% of them are deleted every day and roughly the same number of new records are added. We feel that a big data document database like HBase, Cassandra or Hadoop could do this for us, but we're not sure...

    ITKE (357,140 points)
  • What should I choose for file storage: MongoDB or Hadoop?

    For the past month or so, I've been looking for the best way to build scalable storage for big files. File sizes range from 1-2 megabytes up to 500-600 gigabytes. I'm deciding between MongoDB and Hadoop and I'm not sure which way to go. I'm thinking of using MongoDB as a file...

    ITKE (357,140 points)
  • Convert .txt file to Hadoop sequence file

    I have a large data set that I want to store in Hadoop's sequence file format, but all of the data is in flat .txt files. Is there any way I can convert it? Thank you.

    ITKE (357,140 points)
  • Print documents in MongoDB shell

    Does anyone know of a way to print more than 20 documents in the MongoDB shell? I tried db.foo.find().limit(300), but it still prints only 20. Then I tried db.foo.find().toArray() and db.foo.find().forEach(printjson), but that prints an expanded view of each document of the...

    ITKE (357,140 points)
  • How to put the results of a Hive query to a CSV file

    I'm trying to write the results of a Hive query to a CSV file. My command looks like this: insert overwrite directory '/home/output.csv' select books from table; When I run it, Hive reports success, but I can't find the file. Is there a way I can find it? Thank you.

    ITKE (357,140 points)
  • Process a range of HBase rows using Spark

    We've been using HBase as a data source for Spark. We've already created an RDD from an HBase table, but we can't figure out how to create an RDD for a range scan. Does anyone know how to do it?

    ITKE (357,140 points)
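
For the question on sorting a large tab-separated file in Python: one common approach is an external merge sort, sketched below with only the standard library. It sorts fixed-size chunks in memory, writes each sorted chunk to a temporary file, and then streams a k-way merge of the chunks with heapq.merge, so memory use is bounded by the chunk size rather than the file size. This is an illustration under stated assumptions, not a tuned solution: the function name external_sort_tsv, the 0-based field index, the default chunk size, and the assumptions that every line has the requested column and that keys compare correctly as plain strings are all hypothetical choices.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort_tsv(in_path, out_path, field, chunk_lines=1_000_000):
    """Sort a tab-separated file by one column using bounded memory.

    Hypothetical helper: field is a 0-based column index, keys are compared
    as strings, and chunk_lines should be tuned to the available RAM.
    """
    def key(line):
        return line.rstrip("\n").split("\t")[field]

    chunk_paths = []
    try:
        # Phase 1: sort fixed-size chunks in memory and spill them to temp files.
        with open(in_path, encoding="utf-8") as src:
            while True:
                chunk = list(islice(src, chunk_lines))
                if not chunk:
                    break
                # Make sure every line ends with a newline (the last line may not).
                chunk = [ln if ln.endswith("\n") else ln + "\n" for ln in chunk]
                chunk.sort(key=key)
                fd, path = tempfile.mkstemp(suffix=".tsv")
                chunk_paths.append(path)
                with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                    tmp.writelines(chunk)

        # Phase 2: k-way merge of the sorted chunks, streamed to the output.
        files = [open(p, encoding="utf-8") for p in chunk_paths]
        try:
            with open(out_path, "w", encoding="utf-8") as dst:
                dst.writelines(heapq.merge(*files, key=key))
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temporary chunk files even if something failed.
        for p in chunk_paths:
            os.remove(p)
```

A call such as external_sort_tsv("data.tsv", "sorted.tsv", field=2) (file names hypothetical) would sort by the third column. For a numeric column, the key function would need to convert the field, for example with float(), so that "10" does not sort before "9".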
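
For the question on secondary sorting in Hadoop: the mechanism can be simulated in plain Python to show why a grouping comparator is needed. The sketch below is not Hadoop code. It models each map output key as a composite (natural_key, value) pair, partitions on the natural key only, sorts each partition by the full composite key (the role of the sort comparator), and then groups records by the natural key only (the role of the GroupingComparator). The function name run_reduce_phase and the use of zlib.crc32 as a stand-in for the partitioner's hash are illustrative assumptions.

```python
from itertools import groupby
from operator import itemgetter
from zlib import crc32


def run_reduce_phase(records, num_reducers=2, group_by_natural_key=True):
    """Simulate the partition/sort/group steps of a secondary sort.

    records: iterable of (natural_key, value) pairs used as composite keys.
    Returns a list of (natural_key, values) tuples, one per reduce() call.
    """
    # Partitioner: hash the natural key only, so every record for a given
    # natural key lands on the same reducer.
    partitions = [[] for _ in range(num_reducers)]
    for natural_key, value in records:
        p = crc32(natural_key.encode("utf-8")) % num_reducers
        partitions[p].append((natural_key, value))

    # Grouping comparator: natural key only, or the full composite key,
    # which is what Hadoop falls back to when no grouping comparator is set.
    group_key = itemgetter(0) if group_by_natural_key else (lambda kv: kv)

    calls = []
    for part in partitions:
        # Sort comparator: order by the full composite key (natural_key, value).
        part.sort()
        for _, group in groupby(part, key=group_key):
            group = list(group)
            calls.append((group[0][0], [v for _, v in group]))
    return calls


calls = run_reduce_phase([("b", 3), ("a", 2), ("b", 1), ("a", 5), ("a", 1)])
for key, values in sorted(calls):
    print(key, values)  # each key's values arrive already sorted
```

With group_by_natural_key=False, every distinct composite key becomes its own reduce() call: the values are still sorted, but they are no longer delivered to one call per natural key. That is the gap a GroupingComparator fills.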
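
For the question on clustering points on the server for Highcharts: one simple technique is grid-based aggregation. The server divides the requested viewport into a fixed grid, averages the points that fall in each cell, and returns one point per non-empty cell together with its count. The sketch below assumes points are (x, y) tuples and that a zoom event sends the new axis bounds in an Ajax request; the function name cluster_points, the default grid size, and the request handling around it are hypothetical, and the HTTP endpoint itself is not shown.

```python
from collections import defaultdict


def cluster_points(points, x_min, x_max, y_min, y_max, grid=30):
    """Aggregate points inside a viewport into at most grid * grid clusters.

    Returns a list of (mean_x, mean_y, count) tuples. With grid=30 the
    response holds at most 900 points, however many raw points there are.
    A zoom handler would call this again with the new, smaller bounds.
    """
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("viewport bounds must satisfy min < max")
    cell_w = (x_max - x_min) / grid
    cell_h = (y_max - y_min) / grid
    # Per cell: running sum of x, running sum of y, point count.
    cells = defaultdict(lambda: [0.0, 0.0, 0])
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # outside the requested (zoomed) region
        # Clamp so points on the max edge fall into the last cell.
        col = min(int((x - x_min) / cell_w), grid - 1)
        row = min(int((y - y_min) / cell_h), grid - 1)
        acc = cells[(col, row)]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return [(sx / n, sy / n, n) for sx, sy, n in cells.values()]
```

Because the grid is laid over the requested bounds rather than the whole data set, zooming in automatically yields finer clusters, and at a deep enough zoom each cell holds a single raw point. The count in each tuple could be used to size or label the marker on the chart.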
