Indexing Limitations in MongoDB

This limitation regards to MongoDB version 2.6 and above.

1- As we know,  that  the indexes are stored in RAM, we should make sure that the index size does not exceed the RAM limit. If index size increases  than RAM will start deleting some indexes and hence causing performance loss.

2- Total size of an index entry must be less than 1024 bytes but remember it can include overhead depending on the BSON type.

3- If index entry of existing document exceed to index key limit, MongoDB will deny to create an index on a collection. In previous version mongoDB would not deny to create but will not index the document.

4- Re-indexing operation will give error if index entry field exceeds the index key limit.

5- Update to indexed field will give error if entry would exceed the index key limit.

6- A collection can't have more than 64 index.

7- The name length can't be more than 128 character.

8- A compound index can have maximum 31 fields indexed.


9- If you want to create 2dsphere index on a field than field must hold geometry data in the form of coordinate pairs or GeoJSON data. If you try to insert a document with non-geometry data in a 2dsphere indexed field, or build a 2dsphere index on a collection where the indexed field has non-geometry data, the operation will fail.

What is the cursor in mongodb

As we know to fetch records we have to use find() method in mongodb. db.collection.find() method is the primary method for the read operation. This method queries a collection and returns a cursor to the queried document.

For the query performance, we need to iterate cursor, for example, suppose we have a collection with 20 million records and fire below the query 


db.Books.find({"price":{$gte:200,$lte:300}})

this query will return data in more than lack, it may create an issue of the performance so we need to iterate of the cursor here.

In mongodb shell, if the returned cursor is not assigned to a variable like var keyword, then the cursor is automatically iterated up to 20 times to print up to the first 20 documents in the results. but if we are using it with c# then it will ask for iteration of cursor due to performance.


Note: You can use the DBQuery.shellBatchSize to change the number of iteration from the default value 20.

Cursor Behavior

Closure of inactive cursor: Server close to cursor automatically after 10 minutes of inactivity or client has exhausted the cursor. To override this behavior, you can specify the noTimeout flag in your query using cursor.addOption();

Cursor Batches: Server returns the query result in batches, the batch size may be override using batchsize() or limit().

See following query

db.Books.find().batchSize(10)

Note : In mongodb shell, it is not possible to change default batch(20) size using method batchSize().
Use the limit() method on a cursor to specify the maximum number of documents the cursor will return. limit() is analogous to the LIMIT statement in a SQL database.

example :

cursor.limit(10)

it will return 10 document

It there are more than 10 document in result then use cursor.next() to get other  10 document.

Analyze Query Performance

When we start programming, one thing come in our mind which is performance. There are many factors of performance but query optimization/analyzing and database structure is the more important.

In this article we will evaluate the performance of the query, Analyzing query is aspect of measurement of database and indexing effectiveness.

db.collection.explain("executionStats")  methods provide statistics about performance of query. This data output can be useful in measuring if and how a query uses an index.

Using $explain
It's provides statistics about performance of query.

Evaluate the performance of query


{ "_id" : 1, "ISBN" : "22345654562349", type: " Paper", quantity: 500 }
{ "_id" : 2, " ISBN " : "3345678765678", type: " Paper ", quantity: 100 }
{ "_id" : 3, " ISBN " : "1232234543551", type: " Paper ", quantity: 200 }
{ "_id" : 4, " ISBN " : "5545533344555", type: " Paper ", quantity: 150 }
{ "_id" : 5, " ISBN " : "9988844545457", type: " Paper ", quantity: 300 }


Query With No Index


db.Book.find(
   { quantity: { $gte: 100, $lte: 200 } }
)


Query Returns result


{ "_id" : 2, " ISBN " : "3345678765678", type: " Paper ", quantity: 100 }
{ "_id" : 3, " ISBN " : "1232234543551", type: " Paper ", quantity: 200 }
{ "_id" : 4, " ISBN " : "5545533344555", type: " Paper ", quantity: 150 }



$explain on the following query:


db.Book.find(
   { quantity: { $gte: 100, $lte: 200 } }
).explain("executionStats")


The above explain() query returns the following analyzed result:


{
   "queryPlanner" : {
         "plannerVersion" : 1,
         ...
         "winningPlan" : {
            "stage" : "COLLSCAN",
            ...
         }
   },
   "executionStats" : {
      "executionSuccess" : true,
      "nReturned" : 3,
      "executionTimeMillis" : 0,
      "totalKeysExamined" : 0,
      "totalDocsExamined" : 5,
      "executionStages" : {
         "stage" : "COLLSCAN",
         ...
      },
      ...
   },
   ...
}

The fields in this result set:

queryPlanner.winningPlan.stage : COLLSCAN which indicate a collection is scanned.

executionStats.nReturned : 3 which indicate that the query matches and returns three documents.

executionStats.totalDocsExamined : 5 which indicate that MongoDB had to scan five documents (i.e. all documents in the collection) to find the three matching documents.


Now we indexing field quantity


db.book.createIndex( { quantity: 1 } )


To view the query plan statistics, use the explain("executionStats") method:


db.Book.find(
   { quantity: { $gte: 100, $lte: 200 } }
).explain("executionStats")



The explain() method returns the following results:


{
   "queryPlanner" : {
         "plannerVersion" : 1,
         ...
         "winningPlan" : {
               "stage" : "FETCH",
               "inputStage" : {
                  "stage" : "IXSCAN",
                  "keyPattern" : {
                     "quantity" : 1
                  },
                  ...
               }
         },
         "rejectedPlans" : [ ]
   },
   "executionStats" : {
         "executionSuccess" : true,
         "nReturned" : 3,
         "executionTimeMillis" : 0,
         "totalKeysExamined" : 3,
         "totalDocsExamined" : 3,
         "executionStages" : {
            ...
         },
         ...
   },
   ...
}


queryPlanner.winningPlan.stage : IXSCAN which indicate a collection is scanned.

executionStats.nReturned : 3 which indicate that the query matches and returns three documents.

executionStats.totalDocsExamined : 3 which indicate that MongoDB had to scan three documents (i.e. all documents in the index) to find the three matching documents.


Covered Queries in MongoDB

According to mongoDB document overview of index,If your application queries a collection on a particular field or set of fields, then an index on the queried field or a compound index on the set of fields can prevent the query from scanning the whole collection to find and return the query results. For more information about indexes click here.

What is a Covered Query

Covered Query : All the fields in the query are part of an index and all fields returned in the query are in the same index.

 MongoDB matches the query conditions and returns the result using the same index without actually looking inside documents. Since indexes are present in RAM, fetching data from indexes is much faster as compared to fetching data by scanning documents.

Using Covered Queries
Consider the following document in customers collection:


{
   "_id": ObjectId("67667989hu9uh665"),
   "contact": "987654321"
   "gender": "M",
   "name": "Dilip",
   "user_name": "dksingh"
}


We should  create a compound index for customers collection on fields gender and user_name using following query:


>db. customers.ensureIndex({gender:1,user_name:1})


Here now, this index will cover the following query:


>db. customers.find({gender:"M"},{user_name:1,_id:0})


It would fetch the required data from indexed data which is very fast. Since our index does not include _id field it means MongoDB did not go into database documents, we have explicitly excluded _id from result set of our query as MongoDB by default returns _id field in every query.
See following query here we did not create index of gender so when we execute this query, mongoDB  will scan the this document in database.


>db. customers.find({gender:"M"},{user_name:1})


At the end , remember that an index cannot cover a query if:

Any of the indexed fields is an array OR indexed fields is a subdocument

MongoDB 3.0: What's New

MongoDB 3.0 is the newest and most significant release of the world’s fastest growing database. The version radically expands the use cases for MongoDB, allowing you to use it for the vast majority of all new applications. By improving performance by 7x - 10x, reducing storage needs by up to 80%, and lowering operations overhead by up to 95%, MongoDB 3.0 also dramatically increases your ability to deliver apps to market faster and more efficiently than ever before.

Download the white paper to learn about the latest features in 3.0:

  •     Pluggable storage engines, including WiredTiger
  •     Higher performance & efficiency
  •     Simplified operations through Ops Manager
  •     End-to-end auditing
  •     Enhanced Query Language and Tools

Companies ranging from startups to Fortune 500s choose MongoDB to build, improve, and innovate:


For more info click here

MongoDB - Relationships

It's represent how several types of documents are logically related to each other. Relationship can be pattern via embedded and referenced approaches like  1:1, 1: N, N: 1 or N: N.

We will consider the example of storing addresses for users. Suppose one user can have multiple addresses making this a 1:N relationship.
Following is the sample document structure of user document:

{
   "_id":ObjectId("52mmm33cd85242f436000001"),
   "name": "Dilip",
   "contact": "987654321",
   "dob": "01-01-1991"
}


given below  is the sample document structure of address document:

{
   "_id":ObjectId("52ffc4a5d85242602e000054"),
   "building": "22 A",
   "pincode": 123456,
   "city": "Delhi",
   "state": "Delhi"
}


Modeling Embedded Relationships
In this approach, we will set the address document inside the user document.


{
   "_id":ObjectId("52mmm33cd85242f436000001"),
   "contact": "987654321",
   "dob": "01-01-1991",
   "name": "Dilip",
   "address": [
      {
         "building": "E-234",
         "pincode": 110096,
         "city": "Delhi",
         "state": "Delhi"
      },
      {
         "building": "A-24",
         "pincode": 203201,
         "city": "Noida",
         "state": "UP"
      }]
}


Embedded maintains all the related data in a single document which makes it easy to recover and maintain. The entirely document can be recovered in a single query like this:


>db.users.findOne({"name":"Dilip"},{"address":1})


Guys, please note down that in the above query, db and users are the database and collection respectively.
In this process if the embedded document keeps on growing too much in size, it will impact the read/write performance.

Modeling Referenced Relationships
It is main process of designing normalized relationship. In this relationships , both the user and address documents will be maintained separately but the user document will contain a field that will reference the address document's id field.


{
   "_id":ObjectId("52mmm33cd85242f436000001"),
   "contact": "987654321",
   "dob": "01-01-1991",
   "name": "Dilip",
   "address_ids": [
      ObjectId("52ffc4a5d85242602e000054"),
      ObjectId("52ffc4a5d85242602e000456")
   ]
}


As above, the user document contains the array field address_ids which contains ObjectIds of corresponding addresses.Using these ObjectIds, we can query the address documents and get address details from there. With this approach, we will need two queries: first to fetch the address_ids fields from user document and second to fetch these addresses from address collection.



>var result = db.users.findOne({"name":"Dilip"},{"address_ids":1})
>var addresses = db.address.find({"_id":{"$in":result["address_ids"]}})

MongoDB - Sharding

Sharding


Sharding is responsible to storing data records across multiple machines. As per demand to manage large amount of data, a single machine is not sufficient to store data nor provide an acceptable medium to read and write data. To resolve this problem MongoDB provide a process called Sharding. Sharding solve the problem with horizontal scaling. Through sahrding, you may add more machines to support data growth and demands of read and write operations.

Why Sharding?

  • In replication all writes go to master node
  • Latency sensitive queries still go to master
  • Single replica set has limitation of 12 nodes
  • Memory can't be large enough when active dataset is big
  • Local Disk is not big enough
  • Vertical scaling is too expensive

Sharding in MongoDB

Below given diagram shows the sharding in MongoDB using sharded cluster.



Shards: It is used to store data and provide high availability and data consistency , shard is a separate replica in production environment.

Config Servers: Config servers store the cluster's metadata. This data contains a mapping of the cluster's data set to the shards. The query router uses this metadata to target operations to specific shards. In production environment sharded clusters have exactly 3 config servers.

Query Routers: Query Routers are basically mongos instances, interface with client applications and direct operations to the appropriate shard. The query router processes and targets operations to shards and then returns results to the clients. A sharded cluster can contain more than one query router to divide the client request load. A client sends requests to one query router. Generally a sharded cluster have many query routers.

MongoDB- Replication

  • Replication is the process of synchronizing data across multiple servers.
  • Provides redundancy and increases data availability with multiple copies of data of different database servers.
  • It is protects a database from the loss of a single server and recover from hardware failure and service interruptions.
  • Replication also allows you to recover from hardware failure and service interruptions. With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.
Why we need to use Replication:
1-It is a process to keep your data safe ,
2-high availability of data,
3-disaster recovery ,
3-no down time for maintenance ,
4-read scaling and it is  a transparent to the application.

How replication works in MongoDB

A replica set is a group of MongoDB instances that host the same data set. There is two node first is primary node that receives all write operations and secondary, apply operations from the primary so that they have the same data set. Replica set can have only one primary node.
1. Replica set is the group of nodes generally minimum three nodes are required, one node is primary node and remaining nodes are secondary.
2. At the time of automatic failover or maintenance, election establishes for primary and a new primary node is elected.
3 .After the recovery of failed node, it again join the replica set and works as a secondary node.

A typical diagram of mongodb replication is shown in which client application always interact with primary node and primary node then replicate the data to the secondary nodes.

Replica set features


A cluster of N nodes
Anyone node can be primary
All write operations goes to primary
Automatic failover
Automatic Recovery
Consensus election of primary

Set up a replica set 

In this codefari we will convert standalone mongodb instance to a replica set. To convert to replica set follow the below given steps:
·         Shutdown already running mongodb server.
Now start the mongodb server by specifying --replSet option. Basic syntax of --replSet is given below:


mongod --port "PORT" --dbpath "YOUR_DB_DATA_PATH" --replSet "REPLICA_SET_INSTANCE_NAME"


 Example:


mongod --port 27017 --dbpath "D:\set up\mongodb\data" --replSet rs0


It will start a mongod instance with the name rs0, on port 27017. Now start the command prompt and connect to this mongod instance. In mongo client issue the command rs.initiate() to initiate a new replica set. To check the replica set configuration issue the command rs.conf(). To check the status of replica sete issue the command rs.status().

Add members to replica set


To add members to replica set, start mongod instances on multiple machines. Now start a mongo client and issue a command rs.add().

Syntax:


Basic syntax of rs.add() command is as follows:


>rs.add(HOST_NAME:PORT)

Example:


Suppose your mongod instance name is mongod1.net and it is running on port 27017. To add this instance to replica set issue the command rs.add() in mongo client.


>rs.add("mongod1.net:27017")
> 



You can add mongod instance to replica set only when you are connected to primary node. To check whether you are connected to primary or not issue the command db.isMaster() in mongo client.

Indexing in MongoDB

As we know index is technique to arrange data in database which support the efficient resolution of queries. If we run a query to select data, MongoDB do scan every document of collection. Scanning of all document may affect performance of MongoDB when it process the large volume of data.
The index stores the value of a specific field or set of fields, ordered by the value of the field as specified in index.

The ensureIndex() Method

In MongoDB to create an index we need to use ensureIndex() method.
Syntax:
 Basic syntax of ensureIndex() method is given below:


>db.COLLECTION_NAME.ensureIndex({KEY:1})


In MongoDB if we want to create index in ascending order the we will use 1 or descending order to use -1. 

Example


>db.testCol.ensureIndex({"title":1})



In ensureIndex() method we can pass multiple fields, to create index on multiple fields.

>db.testCol.ensureIndex({"title":1,"description":-1})


ensureIndex() method also accepts list of options (which are optional), whose list is given below:

Parameter

Type

Description

Background
Boolean
Builds the index in the background so that building an index does not block other database activities. Specify true to build in the background. The default value is false.
Unique
Boolean
Creates a unique index so that the collection will not accept insertion of documents where the index key or keys match an existing value in the index. Specify true to create a unique index. The default value is false.
Name
String
The name of the index. If unspecified, MongoDB generates an index name by concatenating the names of the indexed fields and the sort order.

dropDups
Boolean
Creates a unique index on a field that may have duplicates. MongoDB indexes only the first occurrence of a key and removes all documents from the collection that contain subsequent occurrences of that key. Specify true to create unique index. The default value is false.
Sparse
Boolean
If true, the index only references documents with the specified field. These indexes use less space but behave differently in some situations (particularly sorts). The default value is false.
expireAfterSeconds
Integer
Specifies a value, in seconds, as a TTL to control how long MongoDB retains documents in this collection.
V
index version
The index version number. The default index version depends on the version of mongod running when creating the index.

Weights
Document

The weight is a number ranging from 1 to 99,999 and denotes the significance of the field relative to the other indexed fields in terms of the score.
default_language
String
For a text index, the language that determines the list of stop words and the rules for the stemmer and tokenizer. The default value is english.
language_override
String
For a text index, specify the name of the field in the document that contains, the language to override the default language.

PostgreSQL-Query: Sort result set by specific field values using ORDER BY Clause

Problem: Suppose we have a book_inventory table which has some columns such as id, isbn, title, author, publisher, publish_date, etc.. whe...