Input split in Hadoop

Tags:
Big Data
Big Data Analysts
Hadoop
What is Input Split in Hadoop?

Answer Wiki

InputSplits are created by the logical division of the input data; each split serves as the input to a single Mapper. Blocks, on the other hand, are created by the physical division of the data on HDFS. One input split can span multiple physical blocks.

The basic purpose of input splits is to feed the correct logical locations of the data to each Mapper, so that a Mapper can process a complete set of records even when they span more than one block. When a job is submitted, Hadoop logically divides the input data into input splits, and each split is processed by one Mapper. The number of Mappers equals the number of input splits created.
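To make the split-to-Mapper count concrete, here is a minimal sketch of the arithmetic Hadoop's FileInputFormat uses to pick a split size (max(minSize, min(maxSize, blockSize))) and derive the number of splits. The file and block sizes below are illustrative, not taken from any real job.

```java
// Sketch of FileInputFormat-style split sizing; self-contained, no Hadoop
// dependency. The min/max bounds stand in for the configurable split limits.
public class SplitCount {
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize  = 128L * 1024 * 1024;  // 128 MB HDFS block (assumed)
        long fileLength = 300L * 1024 * 1024;  // 300 MB input file (assumed)
        long splitSize  = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        long numSplits  = (fileLength + splitSize - 1) / splitSize; // ceiling division
        System.out.println(numSplits); // 3 splits -> 3 Mappers
    }
}
```

With the default bounds the split size equals the block size, so a 300 MB file on 128 MB blocks yields three splits and therefore three Mappers.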

InputFormat.getSplits() is responsible for generating the input splits; each split then becomes the input for one Mapper task.
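The logical division that getSplits() performs can be sketched as walking the file and emitting (start, length) pairs, one per Mapper. This is not the Hadoop API itself (the real implementation also handles record boundaries and a slack factor), just the underlying arithmetic:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: each (start, length) pair stands in for one
// InputSplit. Sizes are assumptions chosen to match a 300 MB file
// on 128 MB blocks.
public class LogicalSplits {
    record Split(long start, long length) {}

    static List<Split> getSplits(long fileLength, long splitSize) {
        List<Split> splits = new ArrayList<>();
        long remaining = fileLength;
        while (remaining > 0) {
            long start  = fileLength - remaining;
            long length = Math.min(splitSize, remaining);
            splits.add(new Split(start, length));
            remaining -= length;
        }
        return splits;
    }

    public static void main(String[] args) {
        for (Split s : getSplits(300L << 20, 128L << 20)) {
            System.out.println(s.start() + ":" + s.length());
        }
    }
}
```

Note how the last split is smaller than the others; in a real job each of these three splits would be handed to its own Mapper.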


Discuss This Question: 1  Reply

  • Subhendu Sen
    The InputFormat is responsible for providing the splits. Files are split into HDFS blocks and the blocks are replicated. Hadoop assigns a node to a split based on the data-locality principle.
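The data-locality principle from the reply above can be sketched as: prefer a node that already holds a replica of the split's block, and fall back to any free node otherwise. The host names and replica placement here are made up for illustration; this is a simplification of what the Hadoop scheduler actually does.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hedged sketch of data-local task assignment. Real schedulers also
// consider rack locality; this only shows the node-local preference.
public class LocalityPick {
    static String pickNode(List<String> replicaHosts, Set<String> nodesWithFreeSlot) {
        for (String host : replicaHosts) {
            if (nodesWithFreeSlot.contains(host)) {
                return host; // data-local assignment: replica lives here
            }
        }
        return nodesWithFreeSlot.iterator().next(); // no local node free
    }

    public static void main(String[] args) {
        List<String> replicas = List.of("node1", "node4", "node7"); // assumed replica hosts
        Set<String> free = new LinkedHashSet<>(List.of("node3", "node4"));
        System.out.println(pickNode(replicas, free)); // node4 holds a replica and is free
    }
}
```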
