Channel: Blog Posts From Zenoss Blog: No Node Left Behind Tagged With databases

Datacenter Barometer: Enter the World of NoSQL, Part 2


In last week's episode of Datacenter Barometer, you began the journey into the strange and mystical world of non-relational databases... and discovered that this class of database isn't as strange and mystical as one might think.

 

Thanks to a talk given a couple of weeks ago at Penguicon by Geeknet Senior Developer Mark Ramm, I was able to learn quite a bit of introductory information about non-relational databases, which are popularly known as "NoSQL" databases.

 

What makes NoSQL databases unique is their independence from the Structured Query Language (SQL) used by relational databases. Relational databases all use SQL as the domain-specific language for ad-hoc queries, while non-relational databases have no such standard query language, so they can use whatever they want--including SQL. Non-relational databases also have their own APIs, designed for maximum scalability and flexibility.

 

NoSQL databases are typically designed to excel in one specific area: speed. To do so, they will use techniques that will seem frightening to relational database users--such as not promising that all data is consistent within a system all of the time.

 

Because so much read and write activity is needed in a single relational database transaction, a relational database could never keep up with the speed and scaling necessary to make a company like Amazon work as it does now. So what Amazon does with its proprietary non-relational Dynamo database is apply an "eventually consistent" approach to its data, trading immediate consistency for speed and for uptime when a database server somewhere goes down.
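
To make "eventually consistent" a little more concrete, here is a minimal Python sketch of two replicas that are allowed to disagree for a while and then converge. Everything in it is an assumption made for illustration: the `Replica` class, the `sync` function, and the plain integer version numbers are hypothetical, and a real system such as Dynamo uses far more elaborate versioning and conflict handling than this.

```python
class Replica:
    """One copy of the data; maps key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def write(self, key, value, version):
        current = self.store.get(key)
        # Accept the write only if it is newer than what this replica holds.
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None


def sync(a, b):
    """Exchange entries so both replicas hold the newest version of each key."""
    for key in set(a.store) | set(b.store):
        for src, dst in ((a, b), (b, a)):
            if key in src.store:
                version, value = src.store[key]
                dst.write(key, value, version)


east, west = Replica("east"), Replica("west")
east.write("cart:42", ["book"], version=1)
sync(east, west)

# Pretend "west" is unreachable: the next write lands on "east" only.
east.write("cart:42", ["book", "lamp"], version=2)
stale = west.read("cart:42")   # west still serves the older cart

sync(east, west)               # the replicas catch up later
fresh = west.read("cart:42")   # west now agrees with east
```

The point of the sketch is the window between the second write and the second `sync`: reads from the lagging replica still succeed, they just return older data.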

 

Dynamo is part of a class of non-relational databases known as distributed key-value store (DKVS) databases. DKVS is one of five classes that make up the NoSQL landscape, each with a different architecture and approach to managing data.

 

DKVS databases, also known as eventually consistent key-value store databases, are specifically designed to deal with data spread out over a large number of servers. These systems use distributed hash tables for their key-value stores, and because they're distributed, the database uses peer-to-peer relationships between servers, with no "master" control. Currently most of the databases in this class are either Dynamo itself or Dynamo-based implementations, such as the open source Project Voldemort, Dynomite, and KAI databases.
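
One common way to build a distributed hash table with no master is consistent hashing, where every server can work out on its own which peer owns a given key. The sketch below is an illustration under that assumption, not a description of how any of the databases named above are implemented; the `HashRing` class and the node names are hypothetical, and it leaves out replication and virtual nodes entirely.

```python
import bisect
import hashlib


def _hash(text):
    # Map a string to a large integer position on the ring.
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Keys belong to the first node found clockwise from the key's hash."""

    def __init__(self, nodes):
        self._ring = sorted((_hash(n), n) for n in nodes)

    def add(self, node):
        bisect.insort(self._ring, (_hash(node), node))

    def node_for(self, key):
        points = [point for point, _ in self._ring]
        # Wrap around to the first node when the key hashes past the last one.
        idx = bisect.bisect_right(points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = ["user:%d" % i for i in range(200)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add("node-d")
after = {k: ring.node_for(k) for k in keys}

# Only keys that now belong to the new node change owner.
moved = [k for k in keys if before[k] != after[k]]
```

Because ownership depends only on the hash positions, adding a server reshuffles just the keys that fall to the newcomer, which is part of what makes this approach attractive for data spread over many machines.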

 

Key-value store (KVS) databases are similar in architecture to DKVS: as the name implies, keys are mapped to values. Instead of being distributed across servers, though, the data is held on disk or in RAM. Redis, an open source database that's currently being funded by VMware, is in the KVS family, as are the Berkeley DB and MemcacheDB databases.

 

Imagine, if you can, a single giant database table, with embedded tables of data found within. That gives you a fair mental picture of the architecture found within a column-oriented store. Google's BigTable is a well-known example of this class of NoSQL database, but the popular open source Cassandra and HBase (the column store in the Hadoop ecosystem) projects are in this class, too.
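
The "table with embedded tables" picture can be approximated with nested maps: a row key points to named groups of columns, and each group is itself a small table of column names and values. This is only a mental-model sketch in Python; the `WideTable` class and its method names are invented here and do not correspond to the API of BigTable, Cassandra, or any other product.

```python
from collections import defaultdict


class WideTable:
    """row key -> column family -> column name -> value."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        self._rows[row_key][family][column] = value

    def get(self, row_key, family, column, default=None):
        # .get() avoids creating empty rows or families on a read.
        return self._rows.get(row_key, {}).get(family, {}).get(column, default)

    def family(self, row_key, family):
        # Return a copy of one embedded "table" for a row.
        return dict(self._rows.get(row_key, {}).get(family, {}))


table = WideTable()
table.put("com.example/home", "contents", "html", "<html>...</html>")
table.put("com.example/home", "links", "com.example/about", "About us")
table.put("com.example/home", "links", "com.example/blog", "Blog")

links = table.family("com.example/home", "links")
missing = table.get("com.example/none", "links", "anything")
```

Note that rows do not need to share the same columns: one row can carry two links and another two thousand, which is hard to express in a fixed relational schema.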

 

Some non-relational databases move away from the table/row/column methodology and store and sort entire documents' worth of data. These are the (predictably named) document-oriented store databases. Ramm's own project, MongoDB, is part of this class, using JSON documents as opposed to the more commonly used XML documents.
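
As a rough illustration of what storing whole documents means, the Python sketch below keeps JSON-style documents in a list and looks them up by matching fields. The `Collection` class and its `insert` and `find` methods are hypothetical names chosen for this example; they are not MongoDB's API, and a real document store adds indexing, persistence, and a much richer query language.

```python
import json


class Collection:
    """A toy document store: each document is a JSON-compatible dict."""

    def __init__(self):
        self._docs = []

    def insert(self, doc):
        # Round-trip through JSON to store an independent copy and to
        # reject values that JSON cannot represent.
        self._docs.append(json.loads(json.dumps(doc)))

    def find(self, **criteria):
        # Return every document whose top-level fields match all criteria.
        return [
            doc for doc in self._docs
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


posts = Collection()
posts.insert({"title": "Enter the World of NoSQL", "part": 1, "tags": ["databases"]})
posts.insert({"title": "Enter the World of NoSQL", "part": 2, "tags": ["databases", "nosql"]})
posts.insert({"title": "Unrelated post", "author": "someone"})

part_two = posts.find(part=2)
series = posts.find(title="Enter the World of NoSQL")
```

The third document has no `part` or `tags` field at all, and nothing complains: each document carries its own structure rather than conforming to a table definition.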

 

Finally, there is the graph-oriented store class of NoSQL database. Data is manipulated in an object-oriented architecture, using graphs to map keys, values, and their relationships to each other, instead of just tables. Neo4j is an open source database in this class, as are HyperGraphDB and Bigdata.
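
A graph store treats the relationships themselves as data you can walk. The following Python sketch, with a made-up `Graph` class and made-up people, shows the idea using labeled edges and a breadth-first traversal; it is an assumption-laden toy and says nothing about how Neo4j or the other databases mentioned here work internally.

```python
from collections import defaultdict, deque


class Graph:
    """Nodes carry properties; edges carry a label and a direction."""

    def __init__(self):
        self.props = {}
        self.edges = defaultdict(list)  # source -> [(label, target), ...]

    def add_node(self, name, **props):
        self.props[name] = props

    def relate(self, source, label, target):
        self.edges[source].append((label, target))

    def reachable(self, start, label):
        """Breadth-first walk along edges that carry the given label."""
        found, visited, queue = [], {start}, deque([start])
        while queue:
            node = queue.popleft()
            for edge_label, target in self.edges.get(node, []):
                if edge_label == label and target not in visited:
                    visited.add(target)
                    found.append(target)
                    queue.append(target)
        return found


g = Graph()
for person in ("alice", "bob", "carol", "dave"):
    g.add_node(person, kind="person")
g.relate("alice", "knows", "bob")
g.relate("bob", "knows", "carol")
g.relate("alice", "likes", "dave")

friends_of_friends = g.reachable("alice", "knows")
```

A question like "who can Alice reach through people she knows?" is a single traversal here, whereas in a relational schema it would typically need a self-join for every hop.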

 

As you can see in this brief introduction, there is a lot of variation in the broad family of non-relational databases, which is really part of the point. By tailoring databases to specific tasks, non-relational databases can deliver the speed that any web-based service has to have.

