A. Im, G. Cai, H. Tunc, J. Stevens, Y. Barve, S. Hei
Vanderbilt University
mongodb.org

Content
Part 1: Introduction & Basics
Part 2: CRUD
Part 3: Schema Design
Part 4: Indexes
Part 5: Aggregation
Part 6: Replication & Sharding

History
- mongoDB: “Humongous DB”
- Open-source
- Document-based
- “High performance, high availability”
- Automatic scaling
- C-P on CAP
Sources:
- blog.mongodb.org/post/475279604/on-distributed-consistency-part-1
- mongodb.org/manual

Other NoSQL Types
- Key/value (Dynamo)
- Columnar/tabular (HBase)
- Document (mongoDB)
http://www.aaronstannard.com/post/2011/06/30/MongoDB-vs-SQL-Server.aspx

Motivations
Problems with SQL
- Rigid schema
- Not easily scalable (designed for 90’s technology or worse)
- Requires unintuitive joins
Perks of mongoDB
- Easy interface with common languages (Java, JavaScript, PHP, etc.)
- DB tech should run anywhere (VMs, cloud, etc.)
- Keeps essential features of RDBMSs while learning from key-value NoSQL systems
http://www.slideshare.net/spf13/mongodb-9794741?v=qf1&b=&from_search=13

Company Using mongoDB
“MongoDB powers Under Armour’s online store, and was chosen for its dynamic schema, ability to scale horizontally and perform multi-data center replication.”
http://www.mongodb.org/about/production-deployments/

- Steve Francia, http://www.slideshare.net/spf13/mongodb-9794741?v=qf1&b=&from_search=13

Data Model
- Document-based (max 16 MB per document)
- Documents are in BSON format, consisting of field-value pairs
- Each document is stored in a collection
- Collections
  - Have index set in common
  - Like tables of relational DBs
  - Documents do not have to have uniform structure
docs.mongodb.org/manual/

JSON
- “JavaScript Object Notation”
- Easy for humans to write/read, easy for computers to parse/generate
- Objects can be nested
- Built on
  - name/value pairs
  - ordered list of values
http://json.org/

BSON
- “Binary JSON”
- Binary-encoded serialization of JSON-like docs
- Also allows “referencing”
- Embedded structure reduces need for joins
- Goals
  - Lightweight
  - Traversable
  - Efficient (decoding and encoding)
http://bsonspec.org/

BSON Example
    {
      "_id" : "37010",
      "city" : "ADAMS",
      "pop" : 2660,
      "state" : "TN",
      "councilman" : {
        "name" : "John Smith",
        "address" : "13 Scenic Way"
      }
    }

BSON Types
    Type                      Number
    Double                    1
    String                    2
    Object                    3
    Array                     4
    Binary data               5
    Object id                 7
    Boolean                   8
    Date                      9
    Null                      10
    Regular Expression        11
    JavaScript                13
    Symbol                    14
    JavaScript (with scope)   15
    32-bit integer            16
    Timestamp                 17
    64-bit integer            18
    Min key                   255
    Max key                   127
The number can be used with the $type operator to query by type!
http://docs.mongodb.org/manual/reference/bson-types/

The _id Field
By default, each document contains an _id field. This field has a number of special characteristics:
- Value serves as primary key for collection.
- Value is unique, immutable, and may be any non-array type.
- Default data type is ObjectId, which is “small, likely unique, fast to generate, and ordered.” Sorting on an ObjectId value is roughly equivalent to sorting on creation time.
http://docs.mongodb.org/manual/reference/bson-types/
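
The link between ObjectId order and creation time comes from the ObjectId layout: its first 4 bytes are a Unix timestamp in seconds. The following is a minimal sketch in plain JavaScript that works on the 24-character hex form. The helper name `objectIdSeconds` and the sample ids are made up for illustration.

```javascript
// Hypothetical helper: read the creation time (seconds since the Unix epoch)
// from the first 4 bytes (8 hex characters) of an ObjectId's hex string.
function objectIdSeconds(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  return parseInt(hex.slice(0, 8), 16);
}

// Illustrative ids only; the trailing 16 hex characters are arbitrary.
const ids = [
  "507f1f77bcf86cd799439011",
  "4f0000000000000000000001",
  "5a0000000000000000000002",
];

// A plain string sort on lowercase, fixed-width hex compares the timestamp
// prefix first, so the result is also ordered by creation second.
const sorted = [...ids].sort();
console.log(sorted.map((id) => new Date(objectIdSeconds(id) * 1000).toISOString()));
```

In the mongo shell, `getTimestamp()` on an ObjectId value exposes the same information without manual parsing.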

mongoDB vs. SQL
    mongoDB                    SQL
    Document                   Tuple
    Collection                 Table/View
    PK: _id field              PK: any attribute(s)
    Uniformity not required    Uniform relation schema
    Index                      Index
    Embedded structure         Joins
    Shard                      Partition

CRUD Create, Read, Update, Delete

Getting Started with mongoDB
To install mongoDB, go to this link and click on the appropriate OS and architecture: http://www.mongodb.org/downloads
First, extract the files (preferably to the C drive).
Finally, create a data directory on C:\ for mongoDB to use, i.e. “md data” followed by “md data\db”.
http://docs.mongodb.org/manual/tutorial/install-mongodb-on-windows/

Getting Started with mongoDB
Open your mongodb/bin directory and run mongod.exe to start the database server.
To establish a connection to the server, open another command prompt window, go to the same directory, and run mongo.exe. This starts the mongodb shell. It’s that easy!
http://docs.mongodb.org/manual/tutorial/getting-started/

CRUD: Using the Shell
- To check which db you’re using:
    db
- Show all databases:
    show dbs
- Switch dbs/make a new one:
    use <name>
- See what collections exist:
    show collections
Note: dbs are not actually created until you insert data!

CRUD: Using the Shell (cont.)
To insert documents into a collection/make a new collection:
    db.<collection>.insert(<document>)
SQL equivalent:
    INSERT INTO <table> VALUES(<attributevalues>);

CRUD: Inserting Data
Insert one document:
    db.<collection>.insert({ <field>: <value> })
- Inserting a document with a field name new to the collection is inherently supported by the BSON model.
- To insert multiple documents, use an array.

CRUD: Querying
Done on collections.
Get all docs:
    db.<collection>.find()
SQL equivalent:
    SELECT * FROM <table>;
- Returns a cursor; the shell iterates over it to display the first 20 results.
- Add .limit(<number>) to limit results.
Get one doc:
    db.<collection>.findOne()

CRUD: Querying
To match a specific value:
    db.<collection>.find({ <field>: <value> })
“AND”:
    db.<collection>.find({ <field1>: <value1>, <field2>: <value2> })
SQL equivalent:
    SELECT * FROM <table> WHERE field1 = value1 AND field2 = value2;

CRUD: Querying
OR:
    db.<collection>.find({ $or: [ { <field>: <value1> }, { <field>: <value2> } ] })
SQL equivalent:
    SELECT * FROM <table> WHERE field = value1 OR field = value2;
Checking for multiple values of the same field:
    db.<collection>.find({ <field>: { $in: [ <value>, <value> ] } })
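
To make the matching rules concrete, here is a toy in-memory matcher for equality, implicit AND, `$or`, and `$in`. It is a sketch of the semantics only, not how MongoDB evaluates queries, and the names `matches`, `find`, and the sample documents are assumptions made for this example.

```javascript
// Toy in-memory matcher, NOT MongoDB's implementation. It supports only:
//   { field: value }            equality (several fields are ANDed)
//   { field: { $in: [...] } }   field equals any listed value
//   { $or: [q1, q2, ...] }      at least one sub-query matches
function matches(doc, query) {
  return Object.entries(query).every(([key, cond]) => {
    if (key === "$or") {
      return cond.some((sub) => matches(doc, sub));
    }
    if (cond !== null && typeof cond === "object" && Array.isArray(cond.$in)) {
      return cond.$in.includes(doc[key]);
    }
    return doc[key] === cond;
  });
}

// Hypothetical sample data, loosely modeled on zip-code documents.
const zips = [
  { _id: "37010", city: "ADAMS", state: "TN" },
  { _id: "35004", city: "ACMAR", state: "AL" },
  { _id: "00000", city: "ELSEWHERE", state: "XX" },
];

const find = (query) => zips.filter((doc) => matches(doc, query));

console.log(find({ state: "TN", city: "ADAMS" }));              // AND
console.log(find({ $or: [{ state: "TN" }, { state: "AL" }] })); // OR
console.log(find({ state: { $in: ["TN", "AL"] } }));            // $in
```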

CRUD: Querying
Including/excluding document fields:
    db.<collection>.find({ <field1>: <value> }, { <field2>: 0 })   // 0 excludes field2
    db.<collection>.find({ <field>: <value> }, { <field2>: 1 })    // 1 includes field2
Compare with choosing columns in SQL:
    SELECT field1 FROM <table>;
Find documents with or w/o a field:
    db.<collection>.find({ <field>: { $exists: true } })
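
The projection and `$exists` behavior can be sketched in a few lines of plain JavaScript. This is an illustration of the idea under simplifying assumptions (a projection is either all 1s or all 0s), not MongoDB's implementation; `project` and `fieldExists` are hypothetical helper names.

```javascript
// Toy projection, NOT MongoDB's implementation.
// Inclusion ({ f: 1 }) keeps the listed fields plus _id;
// exclusion ({ f: 0 }) keeps everything except the listed fields.
function project(doc, spec) {
  const fields = Object.keys(spec);
  const including = fields.some((f) => spec[f] === 1);
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    const listed = fields.includes(key);
    const keep = including ? (listed || key === "_id") : !listed;
    if (keep) out[key] = value;
  }
  return out;
}

// $exists: true keeps documents that have the field; false keeps those that lack it.
function fieldExists(doc, field, wanted) {
  return Object.prototype.hasOwnProperty.call(doc, field) === wanted;
}

const doc = { _id: 1, city: "ADAMS", state: "TN", pop: 2660 };
console.log(project(doc, { pop: 0 }));  // every field except pop
console.log(project(doc, { city: 1 })); // only _id and city
console.log(fieldExists(doc, "pop", true), fieldExists(doc, "mayor", true));
```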

CRUD: Updating
    db.<collection>.update(
      { <field1>: <value1> },             // all docs in which field1 = value1
      { $set: { <field2>: <value2> } },   // set field2 to value2
      { multi: true }                     // update multiple docs
    )
upsert: if true, creates a new doc when none matches the search criteria.
SQL equivalent:
    UPDATE <table> SET field2 = value2 WHERE field1 = value1;

CRUD: Updating
To remove a field:
    db.<collection>.update({ <field>: <value> }, { $unset: { <field>: 1 } })
Replace all field-value pairs:
    db.<collection>.update({ <field>: <value> }, { <field>: <value>, <field>: <value> })
*NOTE: This overwrites ALL the contents of a document, even removing fields.
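
The difference between an operator update and a full replacement is easy to miss, so here is a small sketch of the three cases on a plain object. It only mimics the semantics described in these slides; `applyUpdate` is a hypothetical helper, not a MongoDB function.

```javascript
// Toy update, NOT MongoDB's implementation. Returns a new document.
//   { $set: {...} }    add or overwrite the listed fields
//   { $unset: {...} }  delete the listed fields
//   anything else      treated as a full replacement (only _id survives)
function applyUpdate(doc, update) {
  const hasOperators = Object.keys(update).some((k) => k.startsWith("$"));
  if (!hasOperators) {
    return { _id: doc._id, ...update };
  }
  const out = { ...doc };
  Object.assign(out, update.$set || {});
  for (const field of Object.keys(update.$unset || {})) {
    delete out[field];
  }
  return out;
}

const original = { _id: 1, city: "ADAMS", state: "TN", pop: 2660 };
console.log(applyUpdate(original, { $set: { pop: 2700 } }));  // pop changed
console.log(applyUpdate(original, { $unset: { pop: 1 } }));   // pop removed
console.log(applyUpdate(original, { city: "ADAMS" }));        // state and pop are gone
```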

CRUD: Removal
Remove all records where field = value:
    db.<collection>.remove({ <field>: <value> })
SQL equivalent:
    DELETE FROM <table> WHERE field = value;
As above, but only remove the first matching document:
    db.<collection>.remove({ <field>: <value> }, true)

CRUD: Isolation
- By default, all writes are atomic only on the level of a single document.
- This means that, by default, a write that touches several documents can be interleaved with other operations.
- You can isolate writes on an unsharded collection by adding $isolated: 1 in the query area:
    db.<collection>.remove({ <field>: <value>, $isolated: 1 })

Schema Design

    RDBMS          MongoDB
    Database       Database
    Table          Collection
    Row            Document
    Index          Index
    Join           Embedded Document
    Foreign Key    Reference

Intuition: why do databases exist in the first place?
- Why can’t we just write programs that operate on objects?
  - Memory limit
  - We cannot rely on the OS’s page-based memory management alone to swap objects to and from disk
- Why can’t we have the database operate on the same data structures as the program?
  - That is where mongoDB comes in

Mongo is basically schema-free
- The purpose of a schema in SQL is to meet the requirements of tables and of quirky SQL implementations
- Every “row” in a database “table” is a data structure, much like a “struct” in C or a “class” in Java. A table is then an array (or list) of such data structures
- So what we design in mongoDB is basically the same as a compound data type, expressed in JSON

There are some patterns
- Embedding
- Linking

Embedding & Linking

One to One relationship
Linking:
    zip {
      _id: 35004,
      city: "ACMAR",
      loc: [-86, 33],
      pop: 6065,
      State: "AL"
    }
    council_person {
      zip_id: 35004,
      name: "John Doe",
      address: "123 Fake St.",
      Phone: 123456
    }
Embedding:
    zip {
      _id: 35004,
      city: "ACMAR",
      loc: [-86, 33],
      pop: 6065,
      State: "AL",
      council_person: {
        name: "John Doe",
        address: "123 Fake St.",
        Phone: 123456
      }
    }

Example 2
MongoDB: The Definitive Guide
By Kristina Chodorow and Mike Dirolf
Published: 9/24/2010
Pages: 216
Language: English
Publisher: O’Reilly Media, CA

One to many relationship - Embedding
    book {
      title: "MongoDB: The Definitive Guide",
      authors: [ "Kristina Chodorow", "Mike Dirolf" ],
      published_date: ISODate("2010-09-24"),
      pages: 216,
      language: "English",
      publisher: {
        name: "O’Reilly Media",
        founded: "1980",
        location: "CA"
      }
    }

One to many relationship - Linking
    publisher {
      _id: "oreilly",
      name: "O’Reilly Media",
      founded: "1980",
      location: "CA"
    }
    book {
      title: "MongoDB: The Definitive Guide",
      authors: [ "Kristina Chodorow", "Mike Dirolf" ],
      published_date: ISODate("2010-09-24"),
      pages: 216,
      language: "English",
      publisher_id: "oreilly"
    }

Linking vs. Embedding
- Embedding is a bit like pre-joining data
- Document level operations are easy for the server to handle
- Embed when the “many” objects always appear with (are viewed in the context of) their parents
- Link when you need more flexibility
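
The "pre-joining" point can be seen by comparing how a reader gets from a book to its publisher's name in each design. The sketch below uses plain arrays and objects as stand-ins for collections; the function names are hypothetical and the data follows the book example in these slides.

```javascript
// Hypothetical in-memory stand-in for a publishers collection.
const publishers = [
  { _id: "oreilly", name: "O'Reilly Media", founded: "1980", location: "CA" },
];

const embeddedBook = {
  title: "MongoDB: The Definitive Guide",
  publisher: { name: "O'Reilly Media", founded: "1980", location: "CA" },
};

const linkedBook = {
  title: "MongoDB: The Definitive Guide",
  publisher_id: "oreilly",
};

// Embedded: the publisher arrives with the book, so one read is enough.
function publisherNameEmbedded(book) {
  return book.publisher.name;
}

// Linked: a second lookup by _id is needed (an application-level "join").
function publisherNameLinked(book, publisherCollection) {
  const publisher = publisherCollection.find((p) => p._id === book.publisher_id);
  return publisher ? publisher.name : null;
}

console.log(publisherNameEmbedded(embeddedBook));
console.log(publisherNameLinked(linkedBook, publishers));
```

The linked form costs an extra lookup, but the publisher is stored once and can change without touching every book, which is the flexibility the slide refers to.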

Many to many relationship
- Can put the relation in either one of the documents (embedding in one of the documents)
- Focus on how data is accessed and queried

Example
    book {
      title: "MongoDB: The Definitive Guide",
      authors: [
        { _id: "kchodorow", name: "Kristina Chodorow" },
        { _id: "mdirolf", name: "Mike Dirolf" }
      ],
      published_date: ISODate("2010-09-24"),
      pages: 216,
      language: "English"
    }
    author {
      _id: "kchodorow",
      name: "Kristina Chodorow",
      hometown: "New York"
    }
    db.books.find( { "authors.name": "Kristina Chodorow" } )
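
The query on the dotted path "authors.name" matches if any element of the authors array has that name. A rough sketch of that path resolution in plain JavaScript follows; `valuesAt` and `matchesAuthor` are hypothetical helpers, and the real server logic covers many more cases.

```javascript
// Toy resolver for a dotted path such as "authors.name". When it meets an
// array it descends into every element, which is why one condition can match
// any one of several embedded authors. NOT MongoDB's implementation.
function valuesAt(doc, path) {
  let current = [doc];
  for (const part of path.split(".")) {
    current = current
      .flatMap((v) => (Array.isArray(v) ? v : [v]))
      .filter((v) => v !== null && typeof v === "object" && part in v)
      .map((v) => v[part]);
  }
  return current.flatMap((v) => (Array.isArray(v) ? v : [v]));
}

const book = {
  title: "MongoDB: The Definitive Guide",
  authors: [
    { _id: "kchodorow", name: "Kristina Chodorow" },
    { _id: "mdirolf", name: "Mike Dirolf" },
  ],
};

const matchesAuthor = (doc, name) => valuesAt(doc, "authors.name").includes(name);

console.log(valuesAt(book, "authors.name"));
console.log(matchesAuthor(book, "Kristina Chodorow"));
```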

What is bad about SQL (semantically)
- “Primary keys” of a database table are in essence persistent memory addresses for the object. The address may not be the same when the object is reloaded into memory. This is why we need primary keys.
- A foreign key functions just like a pointer in C, persistently pointing to the primary key.
- Whenever we need to dereference a pointer, we do a JOIN.
- This is not intuitive for programming, and JOIN is also time consuming.

Example 3
- A book can be checked out by one student at a time
- A student can check out many books

Modeling Checkouts
    student {
      _id: "joe",
      name: "Joe Bookreader",
      join_date: ISODate("2011-10-15"),
      address: { ... }
    }
    book {
      _id: "123456789",
      title: "MongoDB: The Definitive Guide",
      authors: [ "Kristina Chodorow", "Mike Dirolf" ],
      ...
    }

Modeling Checkouts
    student {
      _id: "joe",
      name: "Joe Bookreader",
      join_date: ISODate("2011-10-15"),
      address: { ... },
      checked_out: [
        { _id: "123456789", checked_out: "2012-10-15" },
        { _id: "987654321", checked_out: "2012-09-12" },
        ...
      ]
    }
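
With checkouts embedded in the student document, the rule from Example 3 (one student per book at a time) has to be enforced by whoever writes the data. The sketch below shows one way to express that rule over in-memory objects. `checkOut` is a hypothetical helper, and the second student is invented for the example.

```javascript
// Hypothetical helper: check a book out to a student, enforcing the rule that
// a book can be held by only one student at a time. In-memory sketch only.
function checkOut(students, studentId, bookId, date) {
  const holder = students.find((s) =>
    (s.checked_out || []).some((c) => c._id === bookId)
  );
  if (holder) {
    throw new Error(`book ${bookId} is already checked out by ${holder._id}`);
  }
  const student = students.find((s) => s._id === studentId);
  if (!student) {
    throw new Error(`no such student: ${studentId}`);
  }
  student.checked_out = [
    ...(student.checked_out || []),
    { _id: bookId, checked_out: date },
  ];
  return student;
}

// Illustrative data; "ann" is made up for this example.
const students = [
  { _id: "joe", name: "Joe Bookreader" },
  { _id: "ann", name: "Ann Example" },
];

checkOut(students, "joe", "123456789", "2012-10-15");
console.log(students[0].checked_out);
```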

What is good about mongoDB?
- find() is more semantically clear for programming
    (map (lambda (b) b.title) (filter (lambda (p) (> p 100)) Book))
- De-normalization provides data locality, and data locality provides speed

Part 4: Index in MongoDB

Before Index
- What does the database normally do when we query?
    db.users.find( { score: { "$lt": 30 } } )
- Without an index, MongoDB must scan every document.
- This is inefficient because it has to process a large volume of data.

Definition of Index
Definition: Indexes are special data structures that store a small portion of the collection’s data set in an easy to traverse form.
[Figure: diagram of a query that uses an index to select ...]
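
To illustrate why an index helps a range query such as score less than 30, the sketch below compares a full scan with a lookup in a sorted array. The sorted array with binary search is a simplified stand-in for an index, and the data set and function names are assumptions made for this example.

```javascript
// Hypothetical data: 1,000 users whose scores cycle through 0..99.
const users = Array.from({ length: 1000 }, (_, i) => ({ _id: i, score: (i * 37) % 100 }));

// Without an index: examine every document.
function scanLessThan(docs, limit) {
  let examined = 0;
  const result = docs.filter((d) => {
    examined++;
    return d.score < limit;
  });
  return { result, examined };
}

// A toy "index": (score, position) pairs kept sorted by score.
const scoreIndex = users
  .map((d, pos) => ({ score: d.score, pos }))
  .sort((a, b) => a.score - b.score);

// With the index: binary search for the first entry with score >= limit,
// then read only the entries before it.
function indexedLessThan(docs, index, limit) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].score < limit) lo = mid + 1;
    else hi = mid;
  }
  return { result: index.slice(0, lo).map((e) => docs[e.pos]), examined: lo };
}

console.log(scanLessThan(users, 30).examined, indexedLessThan(users, scoreIndex, 30).examined);
```

Both functions return the same documents; only the number of entries examined differs, which is the kind of difference explain() reports in the shell.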

Index in MongoDB: Operations
- Create an index:
    db.users.ensureIndex( { score: 1 } )
- Show existing indexes:
    db.users.getIndexes()
- Drop an index:
    db.users.dropIndex( { score: 1 } )
- Explain: returns a document that describes the process and indexes used
    db.users.find().explain()
- Hint: overrides MongoDB’s default index selection
    db.users.find().hint( { score: 1 } )

Index in MongoDB: Types
- Single Field Indexes
- Compound Field Indexes
- Multikey Indexes
Single Field Indexes:
    db.users.ensureIndex( { score: 1 } )

Index in MongoDB: Types
- Single Field Indexes
- Compound Field Indexes
- Multikey Indexes
Compound Field Indexes:
    db.users.ensureIndex( { userid: 1, score: -1 } )
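
The key pattern of a compound index, with 1 for ascending and -1 for descending, describes the order in which entries are kept. A minimal sketch of that ordering as a JavaScript comparator follows; the function name and sample entries are made up for illustration.

```javascript
// Comparator matching the key pattern { userid: 1, score: -1 }:
// ascending userid, then descending score within the same userid.
function compareUseridAscScoreDesc(a, b) {
  if (a.userid !== b.userid) {
    return a.userid < b.userid ? -1 : 1;
  }
  return b.score - a.score;
}

// Illustrative entries only.
const entries = [
  { userid: "bob", score: 10 },
  { userid: "amy", score: 5 },
  { userid: "amy", score: 90 },
  { userid: "bob", score: 70 },
].sort(compareUseridAscScoreDesc);

console.log(entries.map((e) => `${e.userid}:${e.score}`).join(" "));
```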

Index in MongoDB: Types
- Single Field Indexes
- Compound Field Indexes
- Multikey Indexes
Multikey Indexes:
    db.users.ensureIndex( { "addr.zip": 1 } )
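
A multikey index holds one entry per array element rather than one per document. The sketch below assumes, for illustration, that addr is an array of sub-documents each carrying a zip. The `people` data and `buildMultikeyIndex` are hypothetical, and a Map stands in for the real index structure.

```javascript
// Illustrative documents: addr is assumed to be an array of sub-documents.
const people = [
  { _id: 1, addr: [{ zip: "37203" }, { zip: "10001" }] },
  { _id: 2, addr: [{ zip: "37203" }] },
];

// Toy multikey index on "addr.zip": one entry per array element,
// mapping each zip to the _id values of the documents that contain it.
function buildMultikeyIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const a of doc.addr || []) {
      if (!index.has(a.zip)) index.set(a.zip, []);
      index.get(a.zip).push(doc._id);
    }
  }
  return index;
}

const zipIndex = buildMultikeyIndex(people);
console.log(zipIndex.get("37203"), zipIndex.get("10001"));
```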

Demo of indexes in MongoDB
- Import Data
- Create Index
  - Single Field Index
  - Compound Field Indexes
  - Multikey Indexes
- Show Existing Index
- Hint
  - Single Field Index
  - Compound Field Indexes
  - Multikey Indexes
- Explain
- Compare with data without indexes

Demo of indexes in MongoDB (cont.)
Explain: compare the query without an index and with an index.

Aggregation
- Operations that process data records and return computed results.
- MongoDB provides aggregation operations.
- Running data aggregation on the mongod instance simplifies application code and limits resource requirements.

Pipelines
- Modeled on the concept of data processing pipelines.
- Provides:
  - filters that operate like queries
  - document transformations that modify the form of the output document
- Provides tools for:
  - grouping and sorting by field
  - aggregating the contents of arrays, including arrays of documents
- Can use operators for tasks such as calculating the average or concatenating a string.

Pipelines
- $limit
- $skip
- $sort
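
The pipeline idea, where each stage consumes the output of the previous one, can be sketched with ordinary functions over arrays. This is an analogy in plain JavaScript rather than the aggregation framework itself. The stage helpers, `runPipeline`, and the sample documents are all made up for the example.

```javascript
// Each stage is a function from an array of documents to a new array.
const match = (pred) => (docs) => docs.filter(pred);
const sortBy = (field, dir) => (docs) =>
  [...docs].sort((a, b) => dir * (a[field] - b[field]));
const skip = (n) => (docs) => docs.slice(n);
const limit = (n) => (docs) => docs.slice(0, n);

// Run the stages left to right, feeding each stage's output to the next.
const runPipeline = (docs, stages) =>
  stages.reduce((acc, stage) => stage(acc), docs);

// Illustrative documents only.
const cities = [
  { city: "A", state: "TN", pop: 10 },
  { city: "B", state: "TN", pop: 40 },
  { city: "C", state: "AL", pop: 30 },
  { city: "D", state: "TN", pop: 20 },
  { city: "E", state: "TN", pop: 50 },
];

// TN cities by descending population, skipping the largest, keeping the next two.
const page = runPipeline(cities, [
  match((d) => d.state === "TN"),
  sortBy("pop", -1),
  skip(1),
  limit(2),
]);
console.log(page.map((d) => d.city));
```

Stage order matters: moving limit(2) in front of the sort would sort only the first two matching documents.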

Map-Reduce
- Has two phases:
  - A map stage that processes each document and emits one or more objects for each input document
  - A reduce phase that combines the output of the map operation
- An optional finalize stage for final modifications to the result
- Uses custom JavaScript functions
- Provides greater flexibility but is less efficient and more complex than the aggregation pipeline
- Can have output sets that exceed the 16 megabyte output limitation of the aggregation pipeline
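
The map, reduce, and optional finalize stages can be sketched as a small in-memory function. This is a simplified model, not the server's mapReduce command: here emit is passed to the map function explicitly, and the sample documents and the `mapReduce` helper are assumptions made for illustration.

```javascript
// Toy map-reduce over an array of documents.
//   map(doc, emit)        calls emit(key, value) zero or more times per document
//   reduce(key, values)   combines all values emitted for one key
//   finalize(key, value)  optional last adjustment of each reduced value
function mapReduce(docs, map, reduce, finalize = (key, value) => value) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map(doc, emit));
  const out = {};
  for (const [key, values] of groups) {
    out[key] = finalize(key, reduce(key, values));
  }
  return out;
}

// Illustrative documents only.
const towns = [
  { city: "A", state: "TN", pop: 10 },
  { city: "B", state: "TN", pop: 40 },
  { city: "C", state: "AL", pop: 30 },
];

// Total population per state.
const totals = mapReduce(
  towns,
  (doc, emit) => emit(doc.state, doc.pop),
  (key, values) => values.reduce((a, b) => a + b, 0)
);
console.log(totals);
```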

Single Purpose Aggregation Operations
- Special purpose database commands:
  - returning a count of matching documents
  - returning the distinct values for a field
  - grouping data based on the values of a field
- Aggregate documents from a single collection.
- Lack the flexibility and capabilities of the aggregation pipeline and map-reduce.
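
For intuition, the three single purpose operations correspond to very small computations over one collection. The plain JavaScript below mirrors them on an array; the helper names and the data are hypothetical and do not represent the database commands themselves.

```javascript
// Count of documents matching a predicate.
const count = (docs, pred) => docs.filter(pred).length;

// Distinct values of one field, in order of first appearance.
const distinct = (docs, field) => [...new Set(docs.map((d) => d[field]))];

// Number of documents per value of one field.
const groupCount = (docs, field) =>
  docs.reduce((acc, d) => {
    acc[d[field]] = (acc[d[field]] || 0) + 1;
    return acc;
  }, {});

// Illustrative documents only.
const places = [
  { city: "A", state: "TN" },
  { city: "B", state: "TN" },
  { city: "C", state: "AL" },
];

console.log(count(places, (d) => d.state === "TN"));
console.log(distinct(places, "state"));
console.log(groupCount(places, "state"));
```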

Replication & Sharding Image source: http://mongodb.in.th

Replication
- What is replication?
- Purpose of replication/redundancy
  - Fault tolerance
  - Availability
  - Increase read capacity

Replication in MongoDB
Replica Set Members
- Primary
  - Read, Write operations
- Secondary
  - Asynchronous Replication
  - Can be primary
- Arbiter
  - Voting
  - Can’t be primary
- Delayed Secondary
  - Can’t be primary

Replication in MongoDB
- Automatic Failover
  - Heartbeats
  - Elections
- The Standard Replica Set Deployment
- Deploy an Odd Number of Members
- Rollback
- Security
  - SSL/TLS
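
The advice to deploy an odd number of members follows from simple arithmetic: electing a primary takes a strict majority of the voting members. The short sketch below works that out for a few set sizes; the function names are made up for the example.

```javascript
// Votes needed to elect a primary: a strict majority of all voting members.
const majority = (members) => Math.floor(members / 2) + 1;

// Members that can be unavailable while a primary can still be elected.
const faultTolerance = (members) => members - majority(members);

for (const n of [3, 4, 5]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} down`);
}
```

A fourth member raises the majority from 2 to 3 without raising the number of failures the set can survive, so the extra node adds cost but no extra fault tolerance.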

Demo for Replication

Sharding
- What is sharding?
- Purpose of sharding
  - Horizontal scaling out
- Query Routers
  - mongos
- Shard keys
  - Range based sharding
  - Cardinality
  - Avoid hotspotting
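
Range based sharding and hotspotting can be illustrated with a toy router that maps a shard key to the chunk whose range contains it. The chunk table, shard names, and `routeByRange` below are invented for the example; a real query router works from cluster metadata.

```javascript
// Hypothetical chunk table: each chunk covers [min, max) of the shard key.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 2000, shard: "shardB" },
  { min: 2000, max: Infinity, shard: "shardC" },
];

// Route a shard key value to the shard that owns its range.
function routeByRange(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk.shard;
}

// Keys spread over the key space land on different shards.
console.log([5, 1500, 2500].map(routeByRange));

// Monotonically increasing keys all land on the last chunk: a hotspot.
console.log([2001, 2002, 2003, 2004].map(routeByRange));
```

This is why a shard key that only ever grows, such as a timestamp, tends to send all new writes to one shard under range based sharding.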

Demo for Sharding

Thanks
