Sunday, 21 May 2017

YARN versus ZooKeeper


YARN vs ZooKeeper (in brief)

YARN is the resource management layer in the Hadoop 2 architecture. Its role is similar to that of Mesos:
Given a cluster and a set of resource requests, YARN grants access to those resources (by issuing instructions to the NodeManagers, which actually manage the individual nodes). YARN is therefore the central scheduling coordinator of the cluster: it makes sure that job requests are scheduled onto the cluster in an orderly fashion, taking into account resource constraints, scheduling strategies, priorities, fairness, and any other configured rules.
So yes, YARN manages a cluster of nodes, but only from the perspective of resource allocation and scheduling.
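
To make the scheduling role concrete, here is a toy sketch in Python. It is not YARN's scheduler or API: the names `Node`, `Request`, and `schedule`, the first-fit placement, and the "lower priority number is served first" rule are all assumptions made up for illustration. It only shows the general idea of a central coordinator matching resource requests against what the nodes have free.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical view of one worker node's free resources.
    name: str
    free_memory_mb: int
    free_vcores: int


@dataclass
class Request:
    # Hypothetical resource request from an application.
    app: str
    memory_mb: int
    vcores: int
    priority: int = 0  # toy convention: lower value is served first


def schedule(nodes, requests):
    """Place each request on the first node that can fit it (first-fit)."""
    grants = {}
    for req in sorted(requests, key=lambda r: r.priority):
        for node in nodes:
            fits = (node.free_memory_mb >= req.memory_mb
                    and node.free_vcores >= req.vcores)
            if fits:
                # Reserve the resources on that node and record the grant.
                node.free_memory_mb -= req.memory_mb
                node.free_vcores -= req.vcores
                grants[req.app] = node.name
                break
        else:
            # No node has room: the request stays pending.
            grants[req.app] = None
    return grants


nodes = [Node("node-1", 4096, 2), Node("node-2", 8192, 4)]
requests = [
    Request("app-a", 3072, 2, priority=1),
    Request("app-b", 6144, 2, priority=0),
    Request("app-c", 4096, 4, priority=2),
]
grants = schedule(nodes, requests)
print(grants)
```

In this run `app-b` is placed first because of its priority, `app-a` fits on the remaining node, and `app-c` is left pending because no node has enough free resources. A real scheduler adds queues, fairness, and preemption on top of this basic matching.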


ZooKeeper is in a different business: it is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
ZooKeeper is a cluster of its own, typically with 3 or 5 nodes, and it does not manage any cluster outside itself. Superficially it behaves like a database: it allows reads and writes in a consistent fashion (it is a CP system from the CAP perspective).

Now to their relation: YARN has an HA (highly available) variant. In that HA setup, automatic failover (through embedded leader election) is set up via ZooKeeper.
How does this failover work automatically over ZooKeeper, generically? (Nothing here is YARN-specific; imagine any daemon with failover capability over a set of hosts.) You can simply imagine that ZooKeeper holds a piece of information answering the question "which YARN nodes are there?" That list could contain 0 nodes (nasty, YARN is down), 1 node (OK, YARN is up), or 2 nodes (great: the first node in the list is the current YARN master, while the second is a standby failover node, currently waiting and just copying updates from the master so that it is ready if the time comes). Notice that there is an order here, which can be lexicographical, sorting by some attribute of the hosts or by the host names themselves. This is just an example of how leader election could work: the leader is the first element in a sorted list of nodes "competing" to be the leader of the pack.
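
The sorted-list idea above can be sketched in a few lines of Python. This is an in-memory stand-in only: it does not talk to a real ZooKeeper ensemble, and the `ElectionRegistry` class, its method names, and the `rm1.example.com` / `rm2.example.com` host names are hypothetical. It is also not a description of how YARN's own elector is implemented, just the generic "first in sorted order wins" scheme.

```python
class ElectionRegistry:
    """In-memory stand-in for the list of live candidates kept in ZooKeeper."""

    def __init__(self):
        self._candidates = set()

    def join(self, host):
        # A daemon announces itself as a candidate.
        self._candidates.add(host)

    def leave(self, host):
        # Models a crashed or stopped daemon dropping out of the list.
        self._candidates.discard(host)

    def leader(self):
        # The leader is the first host in lexicographic order, or None if empty.
        return min(self._candidates, default=None)

    def standbys(self):
        # Everyone after the leader, in sorted order, is a standby.
        return sorted(self._candidates)[1:]


registry = ElectionRegistry()
leader_when_empty = registry.leader()      # 0 nodes: the service is down

registry.join("rm2.example.com")
registry.join("rm1.example.com")
leader_before = registry.leader()          # 2 nodes: first in order leads
standbys_before = registry.standbys()

registry.leave("rm1.example.com")          # the current leader fails
leader_after = registry.leader()           # the standby takes over

print(leader_when_empty, leader_before, standbys_before, leader_after)
```

Failover falls out of the ordering: when the current leader disappears from the list, the next host in sorted order becomes the first element and therefore the new leader, with no extra negotiation needed.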
