Friday, February 27, 2015

Mongodb Installation and crud operation with index overview

Mongodb current version at the time of writing this blog is 2.8. It can be downloaded from

curl -O http://downloads.mongodb.org/osx/mongodb-osx-x86_64-2.6.8.tgz
 
or brew install mongodb on mac machines.
 
update PATH variable using 
 
export PATH=<mongodb-install-directory>/bin:$PATH 

and create data directory using

mkdir -p /data/db
 
now to run mongodb run command mongod or ./mongod from bin directory
 
mongo client - console based which can run through mongo or ./mongo
 
At this stage if you see prompt with mongo > means that you are connected to mongodb server
 
CRUD operations
 
Some understanding before we start on CRUD operations on console
 
1. show dbs will display all databases available
2. use test will connect to database test. If not existing at the time of inserting first
   document database will be created automatically.
3. mongo documents are json format, BSON is binary format type of json document
 
Inserting document in mongodb
 
db.students.insert({firstName:"paresh",lastName:"bhavsar",age:39,interest:["cricket","basket-ball","gardening"],address:{"zip":85298,"city":"phoenix"}})
 
Here, interest is array type and address is complex type element.

TO Retrieve document from collection
 
1. db.students.find({"firstName":"paresh"}) - search by first name
2. db.students.find({"address.city":"phoenix"}) - searching in complex structure
3. db.students.find({interest:{$all:["cricket","gardening"]}}) - search in array. all elements must match 

By default cursor are closed after 20 minutes this timeframe can be configured.

db.getServerStatus() command will provide information about server status.

_id is unique index for each collection in mongodb. you can not remove this index. All documents
inserted in mongodb will have default value of _id. 

To optimize the query performance, index needs to be created in mongodb.

At present as we have not created any index in the collection we can get following output for the query


 db.students.getIndexes()
[
 {
  "v" : 1,
  "key" : {
   "_id" : 1
  },
  "name" : "_id_",
  "ns" : "test.students"
 }
]

db.students.ensureIndex({firstName:1}) this will create single field index while below mentioned
will create compound index, as it is associated with multiple fields
 
db.students.ensureIndex({"firstName":1,"lastName":1})
 
now to check how many indexes are available we can use below mentioned command.
 
db.students.getIndexes()
[
 {
  "v" : 1,
  "key" : {
   "_id" : 1
  },
  "name" : "_id_",
  "ns" : "test.students"
 },
 {
  "v" : 1,
  "key" : {
   "firstName" : 1
  },
  "name" : "firstName_1",
  "ns" : "test.students"
 },
 {
  "v" : 1,
  "key" : {
   "firstName" : 1,
   "lastName" : 1
  },
  "name" : "firstName_1_lastName_1",
  "ns" : "test.students"
 }
]
 
 
 
As index is created we can findout explain command to check whether mongo is using indexes or not for query
 
db.students.find({firstName:"paresh"}).explain()
{
 "cursor" : "BtreeCursor firstName_1",
 "isMultiKey" : false,
 "n" : 1,
 "nscannedObjects" : 1,
 "nscanned" : 1,
 "nscannedObjectsAllPlans" : 2,
 "nscannedAllPlans" : 2,
 "scanAndOrder" : false,
 "indexOnly" : false,
 "nYields" : 0,
 "nChunkSkips" : 0,
 "millis" : 0,
 "indexBounds" : {
  "firstName" : [
   [
    "paresh",
    "paresh"
   ]
  ]
 },
 "server" : "Pareshs-MacBook-Pro.local:27017",
 "filterSet" : false
}

Here, index firstName_1 is used, for query with first name and last name it is mentioned below. 
 
db.students.find({firstName:"paresh","lastName":"bhavsar"}).explain()
{
 "cursor" : "BtreeCursor firstName_1_lastName_1",
 "isMultiKey" : false,
 "n" : 1,
 "nscannedObjects" : 1,
 "nscanned" : 1,
 "nscannedObjectsAllPlans" : 2,
 "nscannedAllPlans" : 2,
 "scanAndOrder" : false,
 "indexOnly" : false,
 "nYields" : 0,
 "nChunkSkips" : 0,
 "millis" : 0,
 "indexBounds" : {
  "firstName" : [
   [
    "paresh",
    "paresh"
   ]
  ],
  "lastName" : [
   [
    "bhavsar",
    "bhavsar"
   ]
  ]
 },
 "server" : "Pareshs-MacBook-Pro.local:27017",
 "filterSet" : false
} 
 
 
For queries which includes firstName and age indexes used are 
 
 db.students.find({firstName:"paresh","age":{$gt:30}}).explain()
{
 "cursor" : "BtreeCursor firstName_1_lastName_1",
 "isMultiKey" : false,
 "n" : 1,
 "nscannedObjects" : 1,
 "nscanned" : 1,
 "nscannedObjectsAllPlans" : 2,
 "nscannedAllPlans" : 2,
 "scanAndOrder" : false,
 "indexOnly" : false,
 "nYields" : 0,
 "nChunkSkips" : 0,
 "millis" : 0,
 "indexBounds" : {
  "firstName" : [
   [
    "paresh",
    "paresh"
   ]
  ],
  "lastName" : [
   [
    {
     "$minElement" : 1
    },
    {
     "$maxElement" : 1
    }
   ]
  ]
 },
 "server" : "Pareshs-MacBook-Pro.local:27017",
 "filterSet" : false
} 
 
 
 
Why is performance better, because index occupies less storage than the document and index information
is available in RAM and mostly sequential on disk.
 
If we are searching only thru ID field it uses IDCursor index to search documents
 
db.students.find({_id:1,"address.zip":85298}).explain()
{
 "cursor" : "BtreeCursor _id_",
 "isMultiKey" : false,
 "n" : 0,
 "nscannedObjects" : 0,
 "nscanned" : 0,
 "nscannedObjectsAllPlans" : 1,
 "nscannedAllPlans" : 1,
 "scanAndOrder" : false,
 "indexOnly" : false,
 "nYields" : 0,
 "nChunkSkips" : 0,
 "millis" : 0,
 "indexBounds" : {
  "_id" : [
   [
    1,
    1
   ]
  ]
 },
 "server" : "Pareshs-MacBook-Pro.local:27017",
 "filterSet" : false
}
 
 
 
IndexOnly Field : Covered Index

If returned fields in the query (called projection) are all covered with indexes, mongo will not
have to read from disk to retrieve the document and this give extremely fast, blazing fast
performance. This type of queries are called covered query.
 
> db.students.find({"firstName":"paresh"},{_id:0,"firstName":1}).explain()
 
{
 "cursor" : "BtreeCursor firstName_1",
 "isMultiKey" : false,
 "n" : 6,
 "nscannedObjects" : 0,
 "nscanned" : 6,
 "nscannedObjectsAllPlans" : 0,
 "nscannedAllPlans" : 12,
 "scanAndOrder" : false,
 "indexOnly" : true,
 "nYields" : 0,
 "nChunkSkips" : 0,
 "millis" : 0,
 "indexBounds" : {
  "firstName" : [
   [
    "paresh",
    "paresh"
   ]
  ]
 },
 "server" : "Pareshs-MacBook-Pro.local:27017",
 "filterSet" : false
} 

As we have index on lastName below mentioned query is displaying indexOnly value as true.

db.students.find({"firstName":"paresh"},{_id:0,"firstName":1,lastName:1}).explain()
{
 "cursor" : "BtreeCursor firstName_1_lastName_1",
 "isMultiKey" : false,
 "n" : 6,
 "nscannedObjects" : 0,
 "nscanned" : 6,
 "nscannedObjectsAllPlans" : 6,
 "nscannedAllPlans" : 12,
 "scanAndOrder" : false,
 "indexOnly" : true,
 "nYields" : 0,
 "nChunkSkips" : 0,
 "millis" : 0,
 "indexBounds" : {
  "firstName" : [
   [
    "paresh",
    "paresh"
   ]
  ],
  "lastName" : [
   [
    {
     "$minElement" : 1
    },
    {
     "$maxElement" : 1
    }
   ]
  ]
 },
 "server" : "Pareshs-MacBook-Pro.local:27017",
 "filterSet" : false
}





 
 
 
 


 


 
 

No comments:

Post a Comment