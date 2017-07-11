Online Training, Assessment and Talent hiring Platform powered by Analytics and AI

Contact

Hub4Tech

+91 90691 39140

info@hub4tech.com

Last year was the year of Big Data: the year when big data and analytics made tremendous progress through innovative technologies, data-driven decision making and outcome-centric analytics. Worldwide revenues for big data and business analytics are forecast to grow to more than $203 billion in 2020 (source: IDC).

Prepare with these top Hadoop MapReduce interview questions to get an edge in the burgeoning Big Data market, where global and local enterprises, big or small, are looking for quality Big Data and Hadoop experts.

Here is a list of Hadoop MapReduce interview questions frequently asked by employers. These questions and answers will help you meet employers' expectations.

1. What is Hadoop MapReduce?

The Hadoop MapReduce framework is used for processing large data sets in parallel across a Hadoop cluster. Data analysis uses a two-step map and reduce process.

2. How does Hadoop MapReduce work?

Take word count as an example. During the map phase, the input data is divided into splits that are analysed by map tasks running in parallel across the Hadoop cluster, and each map task counts the words in its documents. During the reduce phase, those per-document counts are aggregated across the entire collection.

3. How does the JobTracker schedule a task?

The TaskTrackers send heartbeat messages to the JobTracker at regular intervals (by default, every few seconds) to reassure the JobTracker that they are still alive. These messages also inform the JobTracker of the number of available slots, so the JobTracker can stay up to date with where in the cluster work can be delegated. When the JobTracker tries to find somewhere to schedule a task within the MapReduce operations, it first looks for an empty slot on the same server that hosts the DataNode containing the data. If there is none, it looks for an empty slot on a machine in the same rack.

4. What is shuffling in MapReduce?

The shuffle is the process by which the system sorts the map outputs and transfers them to the reducers as inputs.

5. What is the Distributed Cache in the MapReduce framework?

The Distributed Cache is an important feature provided by the MapReduce framework. It is used when you want to share files across all nodes in a Hadoop cluster. The files can be executable JAR files or simple properties files.

6. What is a task instance in Hadoop? Where does it run?

Task instances are the actual MapReduce tasks that run on each slave node. The TaskTracker starts a separate JVM process to do the actual work (called a task instance), which ensures that a process failure does not take down the TaskTracker. Each task instance runs in its own JVM process. Multiple task instance processes can run on a slave node, depending on the number of slots configured on the TaskTracker. By default, a new task instance JVM process is spawned for each task.

7. What is the NameNode in Hadoop?

The NameNode is the node where Hadoop stores all the file location information for HDFS (Hadoop Distributed File System). In other words, the NameNode is the centrepiece of an HDFS file system. It keeps a record of all the files in the file system and tracks the file data across the cluster or multiple machines.

8. How many daemon processes run on a Hadoop system?

Hadoop comprises five separate daemons. Each of these daemons runs in its own JVM.

The following 3 daemons run on master nodes:
NameNode - Stores and maintains the metadata for HDFS.
Secondary NameNode - Performs housekeeping functions for the NameNode.
JobTracker - Manages MapReduce jobs and distributes individual tasks to the machines running the TaskTracker.

The following 2 daemons run on each slave node:
DataNode - Stores the actual HDFS data blocks.
TaskTracker - Responsible for instantiating and monitoring individual map and reduce tasks.

9. What is a heartbeat in HDFS?

A heartbeat is a signal sent from a DataNode to the NameNode, and from a TaskTracker to the JobTracker. If the NameNode or JobTracker stops receiving the signal, it assumes that there is a problem with that DataNode or TaskTracker.

10. What are combiners, and when should you use one in a MapReduce job?

Combiners are used to increase the efficiency of a MapReduce program. A combiner reduces the amount of data that needs to be transferred across to the reducers. If the operation performed is commutative and associative, you can use your reducer code as a combiner. Hadoop does not guarantee that the combiner will run, so the job must produce correct results without it.
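The map, shuffle, combine and reduce steps described in questions 2, 4 and 10 can be sketched in a few lines of plain Python. This is only an in-memory simulation of word count, not Hadoop code: the function names (map_phase, group_by_key, combine, shuffle, reduce_phase, run_job) are hypothetical and chosen for illustration, and each input string stands in for one input split.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for each word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def group_by_key(pairs):
    """Sort pairs by key and collect the values that share a key."""
    grouped = defaultdict(list)
    for word, count in sorted(pairs):
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def combine(pairs):
    """Combiner: reuse the reducer logic on a single mapper's output.

    This is safe here because addition is commutative and associative.
    """
    return list(reduce_phase(group_by_key(pairs)).items())


def shuffle(map_outputs):
    """Shuffle: merge every mapper's output, sorted and grouped by key."""
    return group_by_key(pair for output in map_outputs for pair in output)


def run_job(documents, use_combiner=True):
    """Run the simulated job; return (word totals, records sent to reducers)."""
    map_outputs = [map_phase(doc) for doc in documents]
    if use_combiner:
        map_outputs = [combine(output) for output in map_outputs]
    records_shuffled = sum(len(output) for output in map_outputs)
    totals = reduce_phase(shuffle(map_outputs))
    return totals, records_shuffled


if __name__ == "__main__":
    docs = ["big data big cluster", "big data"]
    print(run_job(docs, use_combiner=False))
    print(run_job(docs, use_combiner=True))
```

For the two sample inputs, both runs produce the same totals (big: 3, data: 2, cluster: 1), but the run with the combiner passes 5 records to the reduce step instead of 6. The result does not depend on whether the combiner ran, which is why a job must stay correct when Hadoop skips it.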