I apologize if this isn't the right place to ask, but I'm looking to get into the big data field (I would like to work in the industry). Does anyone know of some good books on big data? I'm especially interested in anything on Hadoop or HBase. Thanks so much!
Is there a way in HBase to find out which region server a given row will be written to? And when several rows need to be read, how are multiple region servers contacted and their results retrieved? Thanks so much
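One way to picture how the client answers both questions: an HBase table is split into regions, each covering a contiguous, sorted range of row keys (start key inclusive, end key exclusive), and a row belongs to the region with the greatest start key that is not greater than the row. In the Java client, `Connection.getRegionLocator(TableName)` returns a `RegionLocator` whose `getRegionLocation(byte[] row)` gives that location; for multi-row reads and batches, the client roughly groups the operations by the server hosting each region and sends one request per server. The sketch below is a stdlib-only model of that lookup and grouping, not the HBase API. The region boundaries and the server names `rs1`, `rs2`, `rs3` are hypothetical, and it compares Java strings where HBase compares raw bytes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RegionLookup {
    // Model only: maps each region's start key to the (hypothetical) server hosting it.
    // The first region starts at the empty key, so every row falls into some region.
    private final TreeMap<String, String> regionStartToServer = new TreeMap<>();

    public void addRegion(String startKey, String server) {
        regionStartToServer.put(startKey, server);
    }

    // A row belongs to the region with the greatest start key <= row.
    public String serverFor(String row) {
        Map.Entry<String, String> region = regionStartToServer.floorEntry(row);
        if (region == null) {
            throw new IllegalStateException("no region covers row " + row);
        }
        return region.getValue();
    }

    // Groups a multi-row read by server, so each server is contacted only once.
    public Map<String, List<String>> groupByServer(List<String> rows) {
        Map<String, List<String>> byServer = new TreeMap<>();
        for (String row : rows) {
            byServer.computeIfAbsent(serverFor(row), k -> new ArrayList<>()).add(row);
        }
        return byServer;
    }

    public static void main(String[] args) {
        RegionLookup lookup = new RegionLookup();
        lookup.addRegion("", "rs1");   // rows < "g"
        lookup.addRegion("g", "rs2");  // "g" <= rows < "p"
        lookup.addRegion("p", "rs3");  // rows >= "p"
        System.out.println(lookup.serverFor("alice"));
        // prints {rs1=[alice, bob], rs2=[henry], rs3=[zoe]}
        System.out.println(lookup.groupByServer(List.of("alice", "henry", "zoe", "bob")));
    }
}
```

The real client caches region locations it has already looked up, so repeated reads against the same regions usually don't need a fresh lookup each time.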
We're trying to decide which software would be best for our big data. We're currently choosing between HBase and Cassandra (with Hadoop), and we're leaning towards HBase. Do you think HBase is the better choice for us? Is there really any difference between the two?
We have a huge table in HBase named UserAction with three column families. We're trying to fetch all of the data from one column family as a JavaRDD object. We've tried the code below, but it's not working. What else can we do? static SparkConf sparkConf = new...
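For reference, limiting a read to one column family means each returned row keeps only the cells in that family, and rows with no cells in it are not returned at all. On the HBase side this is what `Scan.addFamily(byte[])` does; when reading through Spark, `TableInputFormat` also accepts a family via its `TableInputFormat.SCAN_COLUMN_FAMILY` configuration key, and that `Configuration` is typically passed to `JavaSparkContext.newAPIHadoopRDD`. The stdlib-only sketch below models just the projection step on in-memory rows; the nested-map layout, the `info`/`clicks` family names and the `projectFamily` helper are hypothetical and not part of any HBase or Spark API.

```java
import java.util.Map;
import java.util.TreeMap;

public class FamilyProjection {
    // Hypothetical layout: row key -> column family -> qualifier -> value.
    // Returns row key -> qualifier -> value for the requested family only.
    static Map<String, Map<String, String>> projectFamily(
            Map<String, Map<String, Map<String, String>>> table, String family) {
        Map<String, Map<String, String>> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, Map<String, String>>> row : table.entrySet()) {
            Map<String, String> cells = row.getValue().get(family);
            // Like a family-restricted scan, rows with no cells in the family are skipped.
            if (cells != null && !cells.isEmpty()) {
                result.put(row.getKey(), new TreeMap<>(cells));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Map<String, String>>> userAction = new TreeMap<>();
        userAction.put("u1", Map.of(
                "info", Map.of("name", "Ann"),
                "clicks", Map.of("home", "3")));
        userAction.put("u2", Map.of("info", Map.of("name", "Bo")));
        // prints {u1={home=3}}
        System.out.println(projectFamily(userAction, "clicks"));
    }
}
```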
My department has around 100 million records in a database. Roughly 65% of the records are deleted every day, and roughly the same number of new records are added. We think a big data store such as HBase, Cassandra or Hadoop could handle this for us, but we're not sure...
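To put that churn in perspective, here is a back-of-the-envelope calculation using only the figures in the question (100 million records, about 65% deleted and about as many added each day). It assumes the changes are spread evenly over 24 hours; real traffic is usually burstier, so peak rates will be higher than these averages.

```java
import java.util.Locale;

public class ChurnEstimate {
    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    // Records deleted per day, given the table size and the daily churn fraction.
    static long deletesPerDay(long totalRecords, double churnFraction) {
        return Math.round(totalRecords * churnFraction);
    }

    // Average operations per second if opsPerDay were spread evenly over one day.
    static double perSecond(long opsPerDay) {
        return (double) opsPerDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        long deletes = deletesPerDay(100_000_000L, 0.65); // figures from the question
        long inserts = deletes;                           // "roughly the same amount" added back
        System.out.println("deletes per day: " + deletes);                                              // 65000000
        System.out.println(String.format(Locale.ROOT, "avg deletes/s: %.1f", perSecond(deletes)));     // 752.3
        System.out.println(String.format(Locale.ROOT, "avg deletes+inserts/s: %.1f",
                perSecond(deletes + inserts)));                                                        // 1504.6
    }
}
```

That is about 1,500 writes per second on average. One thing worth factoring in: both HBase and Cassandra implement deletes by writing tombstone markers that are only physically removed during compaction, so heavy daily deletion also adds compaction work.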
We've been using HBase as a data source for Spark. We've already created an RDD from an HBase table, but we can't figure out how to create an RDD for a range scan. Does anyone know how to do it?
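For the Spark side, `TableInputFormat` reads its scan boundaries from configuration keys, including `TableInputFormat.SCAN_ROW_START` and `TableInputFormat.SCAN_ROW_STOP`, so setting those on the `Configuration` passed to `newAPIHadoopRDD` is one way to get a range-scan RDD. As with a plain `Scan`, the start row is inclusive and the stop row exclusive. The stdlib-only sketch below models only those boundary semantics, using a sorted in-memory map as a stand-in for the table; the row keys are hypothetical, and real HBase compares keys as raw bytes rather than Java strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class RangeScanModel {
    // Returns row keys in [startRow, stopRow): start inclusive, stop exclusive,
    // mirroring the default boundaries of an HBase range scan.
    static List<String> rangeScan(NavigableMap<String, String> table,
                                  String startRow, String stopRow) {
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String key : new String[] {"row01", "row02", "row03", "row04", "row05"}) {
            table.put(key, "value-" + key);
        }
        // prints [row02, row03]: row04 is the stop row and is excluded
        System.out.println(rangeScan(table, "row02", "row04"));
    }
}
```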
Currently in Apache HBase, I count rows by iterating over a ResultScanner, like this: for (Result rs = scanner.next(); rs != null; rs = scanner.next()) { number++; } But my data is starting to reach millions of rows, so the computation is expensive. I'm trying to compute it in real time but I would like to...
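Since the goal is a real-time count, one common alternative to rescanning is to maintain the count as rows change, so reading it costs a single lookup instead of a full scan. In HBase such a counter would usually live in a cell updated with `Table.incrementColumnValue`, and when an occasional exact recount is needed, HBase ships a `RowCounter` MapReduce job. The stdlib-only sketch below models the idea in memory; `CountedTable` is a hypothetical class, it is not thread-safe, and it assumes every write goes through it so the counter only changes when a row is actually created or removed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class CountedTable {
    private final Map<String, String> rows = new HashMap<>();
    private long rowCount = 0; // updated on every write, so count() is O(1)

    public void put(String rowKey, String value) {
        Objects.requireNonNull(value, "value"); // null values would break the new-row check
        // Only a brand-new row changes the count; overwrites do not.
        if (rows.put(rowKey, value) == null) {
            rowCount++;
        }
    }

    public void delete(String rowKey) {
        // Deleting a row that doesn't exist leaves the count unchanged.
        if (rows.remove(rowKey) != null) {
            rowCount--;
        }
    }

    public long count() {
        return rowCount;
    }

    public static void main(String[] args) {
        CountedTable table = new CountedTable();
        table.put("r1", "a");
        table.put("r2", "b");
        table.put("r1", "c");     // overwrite: count unchanged
        table.delete("r2");
        table.delete("missing");  // no-op
        System.out.println(table.count()); // prints 1
    }
}
```

The trade-off is that every write path must update the counter; a writer that bypasses it makes the count drift, so a periodic scan-based recount is a useful cross-check.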