Jul 12, 2015

MongoDB - Sharding


Sharding is the process of storing data records across multiple machines. As data volumes grow, a single machine is no longer sufficient to store the data or to provide acceptable read and write throughput. To solve this problem, MongoDB provides sharding, which addresses it through horizontal scaling. With sharding, you add more machines to support data growth and the demands of read and write operations.
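The idea of horizontal scaling can be illustrated with a small Python sketch. This is a toy model, not how MongoDB places data internally: it assumes a hypothetical `user_id` field as the key and uses a simple hash-modulo scheme to show that adding machines reduces the number of records each one holds.

```python
import hashlib
from collections import defaultdict


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard index from a stable hash of the key (toy scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribute(records, num_shards):
    """Group records by the shard their hypothetical 'user_id' hashes to."""
    shards = defaultdict(list)
    for record in records:
        shards[shard_for(record["user_id"], num_shards)].append(record)
    return shards


# Illustrative data: 1000 records spread over 2 machines, then over 4.
records = [{"user_id": f"user{i}", "score": i} for i in range(1000)]
for n in (2, 4):
    placement = distribute(records, n)
    sizes = [len(placement[i]) for i in range(n)]
    print(f"{n} shards -> records per shard: {sizes}")
```

Every record lands on exactly one shard, so the per-shard load drops as shards are added.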

Why Sharding?

  • In replication, all writes go to the master node
  • Latency-sensitive queries still go to the master
  • A single replica set is limited to 12 nodes (50 as of MongoDB 3.0)
  • Memory can't be large enough when the active dataset is big
  • Local disk is not big enough
  • Vertical scaling is too expensive

Sharding in MongoDB

The diagram below shows sharding in MongoDB using a sharded cluster.

Shards: Shards store the data and provide high availability and data consistency. In a production environment, each shard is a separate replica set.

Config Servers: Config servers store the cluster's metadata, which contains a mapping of the cluster's data set to the shards. The query router uses this metadata to target operations to specific shards. In a production environment, a sharded cluster has exactly 3 config servers.
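The metadata mapping can be pictured as a lookup table from shard key ranges to shards. The Python sketch below is a simplified model under stated assumptions: the range boundaries and the shard names (`shardA`, `shardB`, `shardC`) are invented for illustration, and the real metadata format is not shown here.

```python
import bisect

# Hypothetical metadata: each range starts at its lower bound and runs up to
# (but not including) the next lower bound. Each range is owned by one shard.
RANGE_LOWER_BOUNDS = [float("-inf"), 100, 200, 300]
RANGE_OWNERS = ["shardA", "shardB", "shardC", "shardA"]


def shard_for_key(key):
    """Return the shard that owns the range containing the given key."""
    index = bisect.bisect_right(RANGE_LOWER_BOUNDS, key) - 1
    return RANGE_OWNERS[index]


print(shard_for_key(42))   # falls in (-inf, 100)
print(shard_for_key(100))  # lower bounds are inclusive
print(shard_for_key(250))  # falls in [200, 300)
print(shard_for_key(999))  # falls in [300, +inf)
```

Keeping the lower bounds sorted lets the lookup use binary search, so finding the owning shard stays cheap even with many ranges.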

Query Routers: Query routers are mongos instances that interface with client applications and direct operations to the appropriate shard. The query router processes and targets operations to shards and then returns the results to the clients. A sharded cluster can contain more than one query router to divide the client request load, and most sharded clusters have many. A client sends each request to one query router.
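The routing behavior described above can be sketched in Python. This is a toy model, not the mongos implementation: the `Shard` and `QueryRouter` classes, the `user_id` shard key, and the range boundaries are all hypothetical. It shows two routers sharing the same metadata, a query on the shard key being sent to a single shard, and a query without the shard key being sent to every shard.

```python
import bisect


class Shard:
    """Toy shard that keeps its documents in a list."""

    def __init__(self, name):
        self.name = name
        self.docs = []

    def find(self, predicate):
        return [d for d in self.docs if predicate(d)]


class QueryRouter:
    """Toy stand-in for a mongos instance; it holds no data itself."""

    def __init__(self, lower_bounds, shards):
        self.lower_bounds = lower_bounds  # assumed copy of the cluster metadata
        self.shards = shards

    def _target(self, key):
        return self.shards[bisect.bisect_right(self.lower_bounds, key) - 1]

    def insert(self, doc):
        self._target(doc["user_id"]).docs.append(doc)

    def find_by_key(self, key):
        # The shard key is known, so only one shard is contacted.
        shard = self._target(key)
        return shard.name, shard.find(lambda d: d["user_id"] == key)

    def find_all(self, predicate):
        # No shard key in the query, so every shard is asked and results merged.
        results = []
        for shard in self.shards:
            results.extend(shard.find(predicate))
        return results


shards = [Shard("shardA"), Shard("shardB"), Shard("shardC")]
bounds = [float("-inf"), 100, 200]
routers = [QueryRouter(bounds, shards), QueryRouter(bounds, shards)]

# Writes go through one router, reads through another; both see the same shards.
for user_id in (5, 150, 250, 120):
    routers[0].insert({"user_id": user_id, "active": user_id % 2 == 0})

print(routers[1].find_by_key(150))
print(len(routers[1].find_all(lambda d: d["active"])))
```

Because the routers are stateless with respect to the data, any number of them can be added to spread client load.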