• Is there a .NET equivalent for Hadoop?

    For the past few years I've been a C# developer, and I also have some Java knowledge. I would like to start learning Hadoop but I'm not sure where to start. Is there something along the lines of a .NET equivalent to Hadoop? Any help would be greatly appreciated.

  • Document database for big data

    My department has around 100 million records in a database, but roughly 65% of the records are deleted on a daily basis and roughly the same number are added back in. We feel like a big data document database like HBase, Cassandra or Hadoop could handle this for us but we're not sure...

  • What should I choose for file storage: MongoDB or Hadoop?

    For about the past month, I've been looking for the best solution for creating scalable storage for big files. File sizes vary from 1-2 megabytes up to 500-600 gigabytes. I'm deciding between MongoDB and Hadoop and I'm not sure which way to go. I'm thinking of using MongoDB as a file...

  • Convert .txt file to Hadoop sequence file

    I have big data that I'm trying to store in Hadoop's sequence file format, but all of it is currently in flat .txt files. Is there any way I can convert it? Thank you.

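    One way to do the conversion without MapReduce is a small Java client that reads the text file line by line and appends each line to a SequenceFile.Writer. This is only a rough sketch; the key/value choice (line number as key, line text as value) and the command-line paths are illustrative, not required.

        import java.io.BufferedReader;
        import java.io.InputStreamReader;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.IOUtils;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.SequenceFile;
        import org.apache.hadoop.io.Text;

        public class TxtToSequenceFile {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                FileSystem fs = FileSystem.get(conf);
                Path in = new Path(args[0]);   // flat .txt input
                Path out = new Path(args[1]);  // sequence file output

                SequenceFile.Writer writer = SequenceFile.createWriter(
                        fs, conf, out, LongWritable.class, Text.class);
                BufferedReader reader = new BufferedReader(
                        new InputStreamReader(fs.open(in)));
                try {
                    String line;
                    long lineNo = 0;
                    while ((line = reader.readLine()) != null) {
                        // key = line number, value = the line itself
                        writer.append(new LongWritable(lineNo++), new Text(line));
                    }
                } finally {
                    IOUtils.closeStream(reader);
                    IOUtils.closeStream(writer);
                }
            }
        }

    Run it with the input and output paths as arguments; for very large inputs the same append logic can also live inside a map-only job.
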
  • Process a range of HBase rows using Spark

    We've been using HBase as a data source for Spark. We've already created an RDD from an HBase table but we can't figure out a way to create an RDD for a range scan. Does anyone know how to do it?

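    One commonly suggested route, sketched here in Java (the table name and row keys are made up), is to set a start and stop row on TableInputFormat before creating the RDD with newAPIHadoopRDD:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.HBaseConfiguration;
        import org.apache.hadoop.hbase.client.Result;
        import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
        import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
        import org.apache.spark.SparkConf;
        import org.apache.spark.api.java.JavaPairRDD;
        import org.apache.spark.api.java.JavaSparkContext;

        public class HBaseRangeScanRDD {
            public static void main(String[] args) {
                JavaSparkContext sc = new JavaSparkContext(
                        new SparkConf().setAppName("hbase-range-scan"));

                Configuration conf = HBaseConfiguration.create();
                conf.set(TableInputFormat.INPUT_TABLE, "my_table");   // hypothetical table
                // Limit the scan to a key range instead of the whole table.
                conf.set(TableInputFormat.SCAN_ROW_START, "row-000100");
                conf.set(TableInputFormat.SCAN_ROW_STOP, "row-000200");

                JavaPairRDD<ImmutableBytesWritable, Result> rdd =
                        sc.newAPIHadoopRDD(conf, TableInputFormat.class,
                                ImmutableBytesWritable.class, Result.class);

                System.out.println("rows in range: " + rdd.count());
                sc.stop();
            }
        }

    The stop row is exclusive, and both values are compared as raw byte strings, so the row-key format matters.
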
  • How to count the number of lines in large files

    My partner and I usually work with files that are larger than 20 GB in size, and we often have to count the number of lines in a given file. We've been doing it using cat fname | wc -l but it takes forever. Does anyone know of a way to make it faster? We're working with a high performance...

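    Dropping the cat and running wc -l fname directly on the file avoids pushing every byte through a pipe, which is usually the quickest win. The same idea can also be hand-rolled by reading the file in large chunks and counting newline bytes; a rough Java sketch (the 16 MB buffer size is an arbitrary choice):

        import java.io.IOException;
        import java.nio.ByteBuffer;
        import java.nio.channels.FileChannel;
        import java.nio.file.Paths;
        import java.nio.file.StandardOpenOption;

        public class FastLineCount {
            public static void main(String[] args) throws IOException {
                long lines = 0;
                try (FileChannel ch = FileChannel.open(Paths.get(args[0]),
                        StandardOpenOption.READ)) {
                    ByteBuffer buf = ByteBuffer.allocateDirect(16 * 1024 * 1024);
                    while (ch.read(buf) != -1) {
                        buf.flip();
                        while (buf.hasRemaining()) {
                            if (buf.get() == '\n') {
                                lines++;   // one newline byte per line
                            }
                        }
                        buf.clear();
                    }
                }
                System.out.println(lines);
            }
        }

    Either way, on a 20 GB file the work is largely I/O bound, so faster storage tends to help more than cleverer counting.
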
  • Getting error message when installing Hadoop 2.2.0

    I've been trying to install a Hadoop 2.2.0 cluster on my servers. All of the servers are 64-bit, I've downloaded Hadoop, and the configuration files are good to go. When I run ./start-dfs.sh, I keep getting this error: 13/11/15 14:29:26 WARN util.NativeCodeLoader: Unable to load...

  • How to merge several small files into one in Hadoop

    I have several small files in my input directory that I want to merge into a single file. I need to do this without using the local file system or writing MapReduce jobs. Is there a Hadoop command I can use to do this?

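    The usual shell answer, hadoop fs -getmerge, writes the merged result to the local file system, which is ruled out here. One HDFS-to-HDFS alternative is FileUtil.copyMerge from the Java API; a sketch with made-up paths (note that copyMerge exists in Hadoop 1.x/2.x but was removed in 3.x):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.FileUtil;
        import org.apache.hadoop.fs.Path;

        public class MergeHdfsFiles {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                FileSystem fs = FileSystem.get(conf);

                Path srcDir = new Path("/user/me/input");       // hypothetical source directory
                Path dstFile = new Path("/user/me/merged.txt"); // hypothetical merged output

                // Concatenates every file under srcDir into dstFile without
                // staging anything on the local disk. The last argument is a
                // string appended after each file (null for nothing).
                FileUtil.copyMerge(fs, srcDir, fs, dstFile,
                        false /* keep the source files */, conf, null);
            }
        }
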
  • Error message when starting Hadoop on OSX

    I keep getting this weird error message when Hadoop starts up on OSX 10.7. Here's the error message: Unable to load realm info from SCDynamicStore put: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /user/travis/input/conf. Name node is in safe mode. Has anyone...

  • Writing map-only Hadoop jobs

    I'm pretty new to the Hadoop scene and I've run into a problem. Every once in a while I only need the map phase of a job, so I want the map output written directly as the final output (i.e. the reduce phase isn't needed). Is there a way to do that?

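    Map-only jobs are supported directly: setting the number of reduce tasks to zero makes the framework write each mapper's output straight to the output directory, with no shuffle or sort. A minimal driver sketch (the pass-through mapper and paths are placeholders for your own logic):

        import java.io.IOException;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.Mapper;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

        public class MapOnlyDriver {

            public static class PassThroughMapper
                    extends Mapper<LongWritable, Text, LongWritable, Text> {
                @Override
                protected void map(LongWritable key, Text value, Context context)
                        throws IOException, InterruptedException {
                    // Do whatever per-record work you need; here the line is passed through.
                    context.write(key, value);
                }
            }

            public static void main(String[] args) throws Exception {
                Job job = Job.getInstance(new Configuration(), "map-only");
                job.setJarByClass(MapOnlyDriver.class);
                job.setMapperClass(PassThroughMapper.class);
                job.setOutputKeyClass(LongWritable.class);
                job.setOutputValueClass(Text.class);

                // Zero reducers: map output becomes the job's final output.
                job.setNumReduceTasks(0);

                FileInputFormat.addInputPath(job, new Path(args[0]));
                FileOutputFormat.setOutputPath(job, new Path(args[1]));
                System.exit(job.waitForCompletion(true) ? 0 : 1);
            }
        }
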
  • Where do I download large data for Hadoop?

    I apologize if this is a 'newbie' question but I'm looking for large data (more than 10 GB) to run a Hadoop demo. Does anyone know if/where I can find it?

  • Available Scala projects for Hadoop and MapReduce

    My team has recently begun a big data analytics project and we're considering using Scala. Are there any Scala projects available for writing Hadoop and MapReduce programs?

  • How to use large datasets in Hadoop

    Would anyone happen to know of any large, low-cost datasets to experiment with in Hadoop? I need at least 1 GB of data, ideally production web server log data. I would appreciate any help. Thank you.

  • Is there a performance difference between Java and Python on Hadoop?

    I've been working on a project in Hadoop for quite some time and now I'm trying to incorporate Java and provide support for Python. Does anyone know if there's any performance impact when choosing between the two? Any help would be appreciated.

  • Does the latest stable Hive (1.0.0) support faster queries and ACID transactions?

    According to this doc: http://hortonworks.com/wp-content/uploads/2013/12/StingerTechnicalPreviewInstall.pdf Stinger is coming to Hive (0.13) with full support for SQL queries. Another advantage is faster performance, and in order to check out the performance advantage of...

  • How can one use HDFS as backend storage for Squid-proxy’s cache?

    HDFS is a critical part of the Hadoop landscape and provides distributed filesystem capabilities. How can one use HDFS to store Squid proxy's cache data? For example: the first time a YouTube video is downloaded, it's stored in Squid's cache residing on HDFS; the next time the same request is catered...

  • What tools are required for Hadoop?

    I'm just starting to study Hadoop from a job point of view. Will it require Java programming? What other tools are required?

  • What are the benefits of integrating SAP HANA with Hadoop Ecosystem?

    HANA is an in-memory database used for real-time in-memory computing and delivering reports on the fly; it is also used for fraud detection. I would like to know the technical details of how to integrate SAP HANA with the Hadoop ecosystem (HDFS, MapReduce, Hive, Pig, etc.) and what benefits can be...

  • Where to carry out my research work in big data

    I want to carry out my research work in big data. There are various fields within big data, so I want to know which field would be the best and most efficient one for my research. The options include the medical field, images (image processing), and videos, but I don't have much idea about those. So...

  • Should I use MongoDB for file storage?

    I'm currently looking for the best available solution for creating scalable storage for my big files. Some of the files are 1-2 megabytes and others are 500-600 gigabytes. I was looking at Hadoop but it seems a bit complicated. Now I'm looking at MongoDB and its GridFS as my next file storage...

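    For reference, this is roughly what storing a file through GridFS looks like with the MongoDB Java driver (the host, database name and the use of the command-line argument as the stored filename are placeholders); whether GridFS is the right fit for 500-600 gigabyte files is a separate question worth testing:

        import java.io.FileInputStream;
        import java.io.InputStream;

        import com.mongodb.MongoClient;
        import com.mongodb.client.MongoDatabase;
        import com.mongodb.client.gridfs.GridFSBucket;
        import com.mongodb.client.gridfs.GridFSBuckets;
        import org.bson.types.ObjectId;

        public class GridFsUploadExample {
            public static void main(String[] args) throws Exception {
                MongoClient client = new MongoClient("localhost", 27017); // placeholder host
                MongoDatabase db = client.getDatabase("files");           // placeholder database
                GridFSBucket bucket = GridFSBuckets.create(db);

                // GridFS splits the stream into fixed-size chunks stored as ordinary documents.
                try (InputStream in = new FileInputStream(args[0])) {
                    ObjectId id = bucket.uploadFromStream(args[0], in);
                    System.out.println("stored with id " + id);
                }
                client.close();
            }
        }
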
