Indexing Limitations in MongoDB


These limitations apply to MongoDB version 2.6 and above.

1- Since indexes are kept in RAM, we should make sure that the total index size does not exceed the available RAM. If the indexes grow larger than RAM, the system will start evicting some of them from memory, causing a loss of performance.

2- The total size of an index entry must be less than 1024 bytes; remember that this can include overhead depending on the BSON type.

3- MongoDB will refuse to create an index on a collection if an index entry for an existing document exceeds the index key limit. In previous versions, MongoDB would not refuse to create the index, but it would not index such documents.

4- A re-indexing operation will give an error if an index entry field exceeds the index key limit.

5- An update to an indexed field will give an error if the resulting entry would exceed the index key limit.

6- A collection can't have more than 64 indexes.

7- An index name (including its namespace) can't be longer than 128 characters.

8- A compound index can have a maximum of 31 indexed fields.


9- If you want to create a 2dsphere index on a field, then the field must hold geometry data in the form of coordinate pairs or GeoJSON data. If you try to insert a document with non-geometry data in a 2dsphere indexed field, or build a 2dsphere index on a collection where the indexed field has non-geometry data, the operation will fail.
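To illustrate point 9, here is a hedged sketch; the collection name places and the field name location are assumptions for illustration:

```javascript
// Insert a document whose indexed field holds valid GeoJSON
db.places.insert({
   name: "Central Park",
   location: { type: "Point", coordinates: [ -73.97, 40.77 ] }
})

// Build the 2dsphere index; this fails if any existing document
// holds non-geometry data in the location field
db.places.createIndex({ location: "2dsphere" })

// Inserting non-geometry data into the indexed field is rejected
db.places.insert({ name: "Bad", location: "not geometry" })
```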

What is a cursor in MongoDB?


As we know, to fetch records we use the find() method in MongoDB. The db.collection.find() method is the primary method for read operations. It queries a collection and returns a cursor to the matching documents.

For good query performance, we need to iterate the cursor. For example, suppose we have a collection with 20 million records and we run the query below:


db.Books.find({"price":{$gte:200,$lte:300}})

this query may return more than a lakh (hundred thousand) documents, which can cause performance issues, so we need to iterate the cursor here.

In the MongoDB shell, if the returned cursor is not assigned to a variable with the var keyword, the cursor is automatically iterated to print up to the first 20 documents in the result. If we are using a driver such as C#, we must iterate the cursor ourselves for performance reasons.


Note: You can use DBQuery.shellBatchSize to change the number of documents printed per iteration from the default value of 20.
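For example, to print 50 documents per shell iteration instead of the default 20:

```javascript
// Raise the shell's print batch from 20 to 50 documents
DBQuery.shellBatchSize = 50
db.Books.find({ "price": { $gte: 200, $lte: 300 } })
```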

Cursor Behavior

Closure of inactive cursors: The server closes a cursor automatically after 10 minutes of inactivity, or once the client has exhausted the cursor. To override this behavior, you can specify the noTimeout flag in your query using cursor.addOption().

Cursor batches: The server returns the query result in batches; the batch size may be overridden using batchSize() or limit().

See the following query

db.Books.find().batchSize(10)

Note: In the MongoDB shell, the batchSize() method does not change the number of documents (20) printed per iteration; use DBQuery.shellBatchSize for that.
Use the limit() method on a cursor to specify the maximum number of documents the cursor will return. limit() is analogous to the LIMIT statement in an SQL database.

Example:

cursor.limit(10)

It will return at most 10 documents. To step through the result set one document at a time, use cursor.next().
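The explicit iteration described above can be sketched with hasNext() and next(), reusing the Books query from earlier:

```javascript
var cursor = db.Books.find({ "price": { $gte: 200, $lte: 300 } });
while (cursor.hasNext()) {
   printjson(cursor.next());   // process one document at a time
}
```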

Analyze Query Performance


When we start programming, one of the first things that comes to mind is performance. Many factors affect performance, but query analysis/optimization and database structure are among the most important.

In this article, we will evaluate the performance of a query. Analyzing queries is one aspect of measuring the effectiveness of the database and its indexes.

The db.collection.explain("executionStats") method provides statistics about the performance of a query. This output is useful for measuring whether, and how, a query uses an index.

Using explain()
It provides statistics about the performance of a query.

Evaluate the performance of a query

Consider the Book collection with the following documents:


{ "_id" : 1, "ISBN" : "22345654562349", "type" : "Paper", "quantity" : 500 }
{ "_id" : 2, "ISBN" : "3345678765678", "type" : "Paper", "quantity" : 100 }
{ "_id" : 3, "ISBN" : "1232234543551", "type" : "Paper", "quantity" : 200 }
{ "_id" : 4, "ISBN" : "5545533344555", "type" : "Paper", "quantity" : 150 }
{ "_id" : 5, "ISBN" : "9988844545457", "type" : "Paper", "quantity" : 300 }


Query With No Index


db.Book.find(
   { quantity: { $gte: 100, $lte: 200 } }
)


The query returns the following result:


{ "_id" : 2, "ISBN" : "3345678765678", "type" : "Paper", "quantity" : 100 }
{ "_id" : 3, "ISBN" : "1232234543551", "type" : "Paper", "quantity" : 200 }
{ "_id" : 4, "ISBN" : "5545533344555", "type" : "Paper", "quantity" : 150 }



Now run explain("executionStats") on the following query:


db.Book.find(
   { quantity: { $gte: 100, $lte: 200 } }
).explain("executionStats")


The above explain() query returns the following analyzed result:


{
   "queryPlanner" : {
         "plannerVersion" : 1,
         ...
         "winningPlan" : {
            "stage" : "COLLSCAN",
            ...
         }
   },
   "executionStats" : {
      "executionSuccess" : true,
      "nReturned" : 3,
      "executionTimeMillis" : 0,
      "totalKeysExamined" : 0,
      "totalDocsExamined" : 5,
      "executionStages" : {
         "stage" : "COLLSCAN",
         ...
      },
      ...
   },
   ...
}

The fields in this result set:

queryPlanner.winningPlan.stage : COLLSCAN, which indicates a full collection scan.

executionStats.nReturned : 3 which indicates that the query matches and returns three documents.

executionStats.totalDocsExamined : 5, which indicates that MongoDB had to scan five documents (i.e., all documents in the collection) to find the three matching documents.


Now let's create an index on the quantity field:


db.Book.createIndex( { quantity: 1 } )


To view the query plan statistics, use the explain("executionStats") method:


db.Book.find(
   { quantity: { $gte: 100, $lte: 200 } }
).explain("executionStats")



The explain() method returns the following results:


{
   "queryPlanner" : {
         "plannerVersion" : 1,
         ...
         "winningPlan" : {
               "stage" : "FETCH",
               "inputStage" : {
                  "stage" : "IXSCAN",
                  "keyPattern" : {
                     "quantity" : 1
                  },
                  ...
               }
         },
         "rejectedPlans" : [ ]
   },
   "executionStats" : {
         "executionSuccess" : true,
         "nReturned" : 3,
         "executionTimeMillis" : 0,
         "totalKeysExamined" : 3,
         "totalDocsExamined" : 3,
         "executionStages" : {
            ...
         },
         ...
   },
   ...
}


queryPlanner.winningPlan.stage : IXSCAN, which indicates that an index scan was used.

executionStats.nReturned : 3 which indicates that the query matches and returns three documents.

executionStats.totalDocsExamined : 3, which indicates that MongoDB examined only three documents (the ones matched via the index) to return the three matching documents.


Covered Queries in MongoDB


Covered query: a query in which all the fields used in the query condition are part of an index, and all the fields returned by the query are in the same index.

MongoDB matches the query conditions and returns the result using the same index, without actually looking inside the documents. Since indexes are kept in RAM, fetching data from an index is much faster than fetching data by scanning documents.

Using Covered Queries
Consider the following document in the customers collection:


{
   "_id": ObjectId("52ffc4a5d85242602e000012"),
   "contact": "987654321",
   "gender": "M",
   "name": "Dilip",
   "user_name": "dksingh"
}


We can create a compound index on the customers collection for the gender and user_name fields using the following query:


>db.customers.ensureIndex({gender:1,user_name:1})


Now this index will cover the following query:


>db.customers.find({gender:"M"},{user_name:1,_id:0})


It fetches the required data directly from the indexed data, which is very fast. Since our index does not include the _id field, we have explicitly excluded _id from the result set of our query, as MongoDB returns the _id field in every query by default.
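You can confirm that a query is covered by running explain() on it; in a covered plan, totalDocsExamined is 0 because the result is answered from the index alone (output abridged; exact stage names vary by MongoDB version):

```javascript
db.customers.find({ gender: "M" }, { user_name: 1, _id: 0 }).explain("executionStats")
// ...
// "totalDocsExamined" : 0   // covered: no documents were fetched
// ...
```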

See the following query: here we did not exclude the _id field, which is not part of the index, so when we execute this query, MongoDB will have to examine the documents in the database.


>db.customers.find({gender:"M"},{user_name:1})


In the end, remember that an index cannot cover a query if:

  • any of the indexed fields is an array, or
  • any of the indexed fields is a subdocument.

MongoDB 3.0: What's New


MongoDB 3.0 is the newest and most significant release of the world’s fastest-growing database. This version radically expands the use cases for MongoDB, allowing you to use it for the vast majority of new applications. By improving performance by 7x - 10x, reducing storage needs by up to 80%, and lowering operations overhead by up to 95%, MongoDB 3.0 also dramatically increases your ability to deliver apps to market faster and more efficiently than ever before.

Download the white paper to learn about the latest features in 3.0:

  •     Pluggable storage engines, including WiredTiger
  •     Higher performance & efficiency
  •     Simplified operations through Ops Manager
  •     End-to-end auditing
  •     Enhanced Query Language and Tools

Companies ranging from startups to Fortune 500s choose MongoDB to build, improve, and innovate.



MongoDB - Relationships


It represents how several types of documents are logically related to each other. Relationships can be modeled via embedded and referenced approaches, such as 1:1, 1:N, N:1, or N:N.

We will consider the example of storing addresses for users. One user can have multiple addresses, making this a 1:N relationship.
The following is the sample document structure of the user document:

{
   "_id":ObjectId("52mmm33cd85242f436000001"),
   "name": "Dilip",
   "contact": "987654321",
   "dob": "01-01-1991"
}


Given below is the sample document structure of the address document:

{
   "_id":ObjectId("52ffc4a5d85242602e000054"),
   "building": "22 A",
   "pincode": 123456,
   "city": "Delhi",
   "state": "Delhi"
}


Modeling Embedded Relationships
In this approach, we will embed the address documents inside the user document.


{
   "_id":ObjectId("52mmm33cd85242f436000001"),
   "contact": "987654321",
   "dob": "01-01-1991",
   "name": "Dilip",
   "address": [
      {
         "building": "E-234",
         "pincode": 110096,
         "city": "Delhi",
         "state": "Delhi"
      },
      {
         "building": "A-24",
         "pincode": 203201,
         "city": "Noida",
         "state": "UP"
      }]
}


The embedded approach maintains all related data in a single document, which makes it easy to retrieve and maintain. The entire document can be retrieved in a single query like this:


>db.users.findOne({"name":"Dilip"},{"address":1})


Note that in the above query, db and users are the database and collection, respectively.
In this approach, if the embedded document keeps growing in size, it can impact read/write performance.
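For instance, appending one more address to the embedded array can be done with $push; each such update is exactly the kind of document growth this caveat warns about:

```javascript
db.users.update(
   { "name": "Dilip" },
   { $push: { "address": { building: "B-12", pincode: 201301, city: "Noida", state: "UP" } } }
)
```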

Modeling Referenced Relationships
This is the standard approach for designing a normalized relationship. In this approach, both the user and address documents are maintained separately, but the user document contains a field that references the address documents' _id fields.


{
   "_id":ObjectId("52mmm33cd85242f436000001"),
   "contact": "987654321",
   "dob": "01-01-1991",
   "name": "Dilip",
   "address_ids": [
      ObjectId("52ffc4a5d85242602e000054"),
      ObjectId("52ffc4a5d85242602e000456")
   ]
}


As shown above, the user document contains the array field address_ids, which holds the ObjectIds of the corresponding addresses. Using these ObjectIds, we can query the address documents and fetch the address details. With this approach, we need two queries: first to fetch the address_ids field from the user document, and second to fetch those addresses from the address collection.



>var result = db.users.findOne({"name":"Dilip"},{"address_ids":1})
>var addresses = db.address.find({"_id":{"$in":result["address_ids"]}})
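The addresses variable from the second query is a cursor; its documents can be printed with forEach():

```javascript
addresses.forEach(printjson)   // print each referenced address document
```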

MongoDB - Sharding


Sharding

Sharding is the process of storing data records across multiple machines. As the demand to manage large amounts of data grows, a single machine is not sufficient to store the data, nor can it provide acceptable read and write throughput. To solve this problem, MongoDB provides a process called sharding. Sharding solves the problem with horizontal scaling: through sharding, you may add more machines to support data growth and the demands of read and write operations.

Why Sharding?

  • In replication, all writes go to the master node
  • Latency-sensitive queries still go to the master
  • A single replica set is limited to 12 nodes
  • Memory can't be large enough when the active dataset is big
  • Local disk is not big enough
  • Vertical scaling is too expensive

Sharding in MongoDB

The diagram below shows sharding in MongoDB using a sharded cluster.



Shards: Shards store the data and provide high availability and data consistency. In a production environment, each shard is a separate replica set.

Config Servers: Config servers store the cluster's metadata. This data contains a mapping of the cluster's data set to the shards. The query router uses this metadata to target operations to specific shards. In a production environment sharded clusters have exactly 3 config servers.

Query Routers: Query routers are mongos instances that interface with client applications and direct operations to the appropriate shard. A query router processes and targets operations to shards and then returns results to the clients. A sharded cluster can contain more than one query router to divide the client request load; a client sends requests to one query router. Generally, a sharded cluster has many query routers.
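A minimal sketch of wiring these components together from a mongos shell (the host names, database name, and shard key below are assumptions for illustration):

```javascript
// Run against a mongos query router
sh.addShard("rs0/shard1.example.net:27017")      // register a shard (a replica set)
sh.enableSharding("test")                        // enable sharding for a database
sh.shardCollection("test.Books", { price: 1 })   // assumed shard key: price
```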

MongoDB- Replication


  • Replication is the process of synchronizing data across multiple servers.
  • It provides redundancy and increases data availability with multiple copies of data on different database servers.
  • It protects a database from the loss of a single server.
  • Replication also allows you to recover from hardware failures and service interruptions. With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.
Why we need to use replication:
1- It is a process to keep your data safe
2- High availability of data
3- Disaster recovery
4- No downtime for maintenance
5- Read scaling, and it is transparent to the application

How replication works in MongoDB

A replica set is a group of MongoDB instances that host the same data set. There are two kinds of nodes: the primary node, which receives all write operations, and secondary nodes, which apply operations from the primary so that they have the same data set. A replica set can have only one primary node.
1. A replica set is a group of nodes; generally, a minimum of three nodes is required. One node is the primary node and the remaining nodes are secondary.
2. At the time of automatic failover or maintenance, an election is held and a new primary node is elected.
3. After the failed node recovers, it rejoins the replica set and works as a secondary node.

A typical diagram of MongoDB replication shows the client application always interacting with the primary node; the primary node then replicates the data to the secondary nodes.

Replica set features


  • A cluster of N nodes
  • Any one node can be primary
  • All write operations go to the primary
  • Automatic failover
  • Automatic recovery
  • Consensus election of the primary

Set up a replica set 

In this article, we will convert a standalone MongoDB instance to a replica set. To convert it, follow the steps given below:
  • Shut down the already running MongoDB server.
  • Start the MongoDB server with the --replSet option. The basic syntax of --replSet is given below:


mongod --port "PORT" --dbpath "YOUR_DB_DATA_PATH" --replSet "REPLICA_SET_INSTANCE_NAME"


 Example:


mongod --port 27017 --dbpath "D:\set up\mongodb\data" --replSet rs0


It will start a MongoDB instance with the replica set name rs0, on port 27017. Now start the command prompt and connect to this mongod instance. In the mongo client, issue the command rs.initiate() to initiate the new replica set. To check the replica set configuration, issue the command rs.conf(). To check the status of the replica set, issue the command rs.status().
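The initialization steps above look like this as a shell session (output omitted):

```javascript
rs.initiate()   // initiate the replica set on this member
rs.conf()       // view the replica set configuration
rs.status()     // check the state of each member
```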

Add members to the replica set

To add members to a replica set, start mongod instances on multiple machines. Then start a mongo client and issue the command rs.add().

Syntax:

The basic syntax of the rs.add() command is as follows:


>rs.add(HOST_NAME:PORT)

Example:


Suppose your mongod instance name is mongod1.net and it is running on port 27017. To add this instance to the replica set, issue the command rs.add() in the mongo client.


>rs.add("mongod1.net:27017")
> 



You can add mongod instances to a replica set only when you are connected to the primary node. To check whether you are connected to the primary, issue the command db.isMaster() in the mongo client.

Indexing in MongoDB


As we know, an index is a technique to arrange data in a database that supports the efficient resolution of queries. Without an index, if we run a query to select data, MongoDB scans every document in the collection. Scanning all documents may affect the performance of MongoDB when it processes large volumes of data.
An index stores the value of a specific field or set of fields, ordered by the value of the field as specified in the index.

The ensureIndex() Method

In MongoDB, to create an index we use the ensureIndex() method (in MongoDB 3.0 and later, createIndex() is the preferred name).
Syntax:
The basic syntax of the ensureIndex() method is given below:


>db.COLLECTION_NAME.ensureIndex({KEY:1})


In MongoDB, to create an index in ascending order we use 1; for descending order, we use -1.

Example


>db.testCol.ensureIndex({"title":1})


We can pass multiple fields to the ensureIndex() method to create a compound index on those fields.

>db.testCol.ensureIndex({"title":1,"description":-1})
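To confirm which indexes exist on the collection, or to remove one, getIndexes() and dropIndex() can be used:

```javascript
db.testCol.getIndexes()                // list all indexes, including the default _id index
db.testCol.dropIndex({ "title": 1 })   // drop a single index by its key pattern
```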


The ensureIndex() method also accepts a list of optional parameters, which are described below:

background (Boolean): Builds the index in the background so that building an index does not block other database activities. Specify true to build in the background. The default value is false.

unique (Boolean): Creates a unique index so that the collection will not accept insertion of documents where the index key or keys match an existing value in the index. Specify true to create a unique index. The default value is false.

name (String): The name of the index. If unspecified, MongoDB generates an index name by concatenating the names of the indexed fields and the sort order.

dropDups (Boolean): Creates a unique index on a field that may have duplicates. MongoDB indexes only the first occurrence of a key and removes all documents from the collection that contain subsequent occurrences of that key. Specify true to create such a unique index. The default value is false.

sparse (Boolean): If true, the index only references documents with the specified field. These indexes use less space but behave differently in some situations (particularly sorts). The default value is false.

expireAfterSeconds (Integer): Specifies a value, in seconds, as a TTL to control how long MongoDB retains documents in this collection.

v (index version): The index version number. The default index version depends on the version of mongod running when the index is created.

weights (Document): A weight is a number ranging from 1 to 99,999 and denotes the significance of an indexed field relative to the other indexed fields in terms of the text search score.

default_language (String): For a text index, the language that determines the list of stop words and the rules for the stemmer and tokenizer. The default value is english.

language_override (String): For a text index, the name of the field in the document that contains the language to override the default language.
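Putting a few of these options together, here is a hedged example (the email field and index name are illustrative):

```javascript
db.testCol.ensureIndex(
   { "email": 1 },
   { unique: true, sparse: true, background: true, name: "idx_email_unique" }
)
```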
