RavenDB vs MongoDB
RavenDB is an open source, non-relational document database first developed in 2009. It specializes in online transaction processing (OLTP) and is fully transactional both across the database and throughout the cluster. Features include high availability, high performance, zero administration, and self-optimization, resulting in a database system trusted and employed by a global client base, including Fortune 100 companies, spanning several continents. RavenDB is deployed both on-prem and in the cloud (using the RavenDB Cloud DBaaS managed service), in systems ranging from simple single-server deployments to geo-distributed clusters, including a satisfied client with over 1.5 million instances on Point of Sale (PoS) machines deployed worldwide. RavenDB is designed as an easy-to-use, all-in-one database, striving to minimize the need for third-party applications, tools, or support.
MongoDB has been offered as an open source database since 2009. It takes its name from “humongous”, referring to its intended use for storing large amounts of data. MongoDB is used in applications ranging from simple TODO apps to critical business systems and is widely used and known in the non-relational community. It became fully transactional in 2018.
1. Maintaining Data Integrity
How well does each database preserve data integrity, preventing data corruption or loss?
RavenDB is a fully transactional non-relational database. It supports ACID transactions both throughout your database and across your database cluster so your data is safe. You can modify multiple documents in a single transaction and be assured that all changes will be persisted to disk or all of them will be rolled back.
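The all-or-nothing behavior described above can be illustrated with a minimal Python sketch. This is a toy in-memory store, not RavenDB's API: the `ToyStore` class, the `transfer` helper, and the account documents are all hypothetical names made up for the illustration.

```python
import copy


class ToyStore:
    """Toy in-memory document store with all-or-nothing batches (illustration only)."""

    def __init__(self):
        self.docs = {}

    def transact(self, operations):
        # Apply every operation to a private copy of the data.
        staged = copy.deepcopy(self.docs)
        for op in operations:
            op(staged)          # any exception here abandons the staged copy
        self.docs = staged      # commit: all changes become visible together


def transfer(src, dst, amount):
    """Build a two-document change: credit dst, then debit src."""

    def credit(docs):
        docs[dst]["balance"] += amount

    def debit(docs):
        if docs[src]["balance"] < amount:
            raise ValueError("insufficient funds")
        docs[src]["balance"] -= amount

    return [credit, debit]


store = ToyStore()
store.docs = {"accounts/1": {"balance": 100}, "accounts/2": {"balance": 0}}

store.transact(transfer("accounts/1", "accounts/2", 60))       # both documents change

try:
    store.transact(transfer("accounts/1", "accounts/2", 60))   # debit fails after credit
except ValueError:
    pass                                                        # nothing was committed
```

In the second call the credit has already been applied to the staged copy when the debit fails, yet the committed documents are unchanged, which is the guarantee a multi-document transaction is meant to give.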
RavenDB can ensure that your transaction boundaries will be kept even when the data is replicated among the different nodes in a cluster. Transactions can be configured to span the whole cluster for strong consistency or use multi-master mode for high availability.
This is very helpful for hybrid solutions using a document database in conjunction with a relational SQL solution. You can maintain ACIDity throughout your current data architecture while enjoying the ability to rapidly scale up with a Non-Relational solution. You can enjoy the speed, agility, and performance of a Document Database solution while keeping the data guarantees of the traditional database model.
RavenDB has been transactional for over a decade, finding ways to constantly improve your performance within the ACID framework so you don’t have to pay with performance for greater data integrity.
MongoDB became a transactional database in 2018. After 10 years of being ACID only over a single document, MongoDB announced its ACID compliance over multiple documents. Their challenge will be to improve upon their new ACID guarantees without sacrificing performance. Going ACID with MongoDB comes with performance costs.
2. Querying & Aggregating Data
How fast can you query data? How do you get aggregation results?
All queries made with RavenDB will always use an index. You never have to fear a table scan or an unoptimized query plan grinding your business to a halt.
When you make a query, the query optimizer will recognize when a query cannot be answered using an index and will update the database index configuration on the fly. As you make more queries, you are feeding the query optimizer to make better decisions to retrieve your data. Over time, the query optimizer will be able to produce the exact set of indexes that you need in order to answer your queries quickly and efficiently.
You don’t need to have a highly trained administrator constantly monitor and adjust the database indexes to achieve great performance. RavenDB is already doing that for you.
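The idea of creating an index the first time a query needs it can be sketched in a few lines of Python. This is a simplified illustration of the concept only, not RavenDB's query optimizer: the `AutoIndexingCollection` class and the sample documents are hypothetical, and the sketch ignores writes that arrive after an index has been built.

```python
from collections import defaultdict


class AutoIndexingCollection:
    """Toy collection that builds a per-field index on first use (illustration only)."""

    def __init__(self, docs):
        self.docs = list(docs)
        self.indexes = {}   # field name -> {value: [positions in self.docs]}

    def _build_index(self, field):
        index = defaultdict(list)
        for pos, doc in enumerate(self.docs):
            index[doc.get(field)].append(pos)
        return index

    def query(self, field, value):
        # No suitable index yet: create one now and keep it for later queries.
        if field not in self.indexes:
            self.indexes[field] = self._build_index(field)
        positions = self.indexes[field].get(value, [])
        return [self.docs[pos] for pos in positions]


people = AutoIndexingCollection([
    {"name": "Ada", "city": "London"},
    {"name": "Linus", "city": "Helsinki"},
    {"name": "Alan", "city": "London"},
])

londoners = people.query("city", "London")   # first call builds the "city" index
again = people.query("city", "Helsinki")     # later calls reuse it
```

Only the first query on a field pays the cost of building the index; every later query on that field is a dictionary lookup.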
This is most noticeable when you consider aggregation queries, which are handled using RavenDB’s native MapReduce features. When an aggregation query is sent to RavenDB, the query optimizer will create a MapReduce index to answer it if one doesn’t exist already.
Latency is obliterated as query results come faster because RavenDB is not required to comb your database all the time for results. Once an index is set up, query times drop by over 99.9%, using precomputed results that are kept current for you behind the scenes. This not only saves you time; on the cloud, it also saves you money. Aggregation queries in RavenDB are very cheap and require almost no work from developers or admins to get things working. Because aggregation is a native part of RavenDB, you do not need third party components to perform aggregates.
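To make the notion of precomputed results that are kept current more concrete, here is a minimal Python sketch of an incrementally maintained aggregate. It is an assumption-laden illustration of the general technique, not RavenDB's MapReduce implementation: the `TotalsIndex` class and the order documents are hypothetical.

```python
from collections import defaultdict


class TotalsIndex:
    """Toy index that keeps sum(amount) per company current as documents change."""

    def __init__(self):
        self.totals = defaultdict(int)
        self._contrib = {}   # doc id -> (company, amount) currently counted

    def put(self, doc_id, doc):
        self.remove(doc_id)                               # undo the old contribution, if any
        company, amount = doc["company"], doc["amount"]   # "map" step
        self._contrib[doc_id] = (company, amount)
        self.totals[company] += amount                    # incremental "reduce" step

    def remove(self, doc_id):
        if doc_id in self._contrib:
            company, amount = self._contrib.pop(doc_id)
            self.totals[company] -= amount

    def total_for(self, company):
        return self.totals.get(company, 0)                # lookup only, no document scan


index = TotalsIndex()
index.put("orders/1", {"company": "acme", "amount": 120})
index.put("orders/2", {"company": "acme", "amount": 80})
index.put("orders/3", {"company": "globex", "amount": 50})
index.put("orders/2", {"company": "acme", "amount": 30})   # update an existing order
index.remove("orders/3")                                    # delete an order
```

The work happens at write time, one document at a time, so answering the aggregation query never requires re-reading the underlying documents.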
MongoDB supports dynamic queries, but it has no concept of auto indexing or learning from production usage profiles. As such, unless the administrator has taken steps to define indexes ahead of time, queries will scan the entire database and filter the results document by document each time.
This typically results in acceptable performance initially, with small datasets, but quickly causes performance deterioration as the size of the data grows. Creating indexes ahead of time resolves this issue, but creating indexes after the fact is a complex and delicate process.
Indexing in MongoDB is single purpose. If you indexed the fields LastName and FirstName and later need to search by FirstName and a LastName prefix, you’ll need a new index.
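Why field order matters in a compound index can be shown with a sorted list standing in for the index. The following Python sketch illustrates general sorted-index behavior under that assumption; it does not model MongoDB's internals, and the names and helper functions are made up for the example.

```python
import bisect

people = [
    ("Lovelace", "Ada"),
    ("Turing", "Alan"),
    ("Torvalds", "Linus"),
    ("Hopper", "Grace"),
    ("Turner", "Alan"),
]

# A compound index on (LastName, FirstName) is ordered by LastName first.
index = sorted(people)


def by_last_name_prefix(prefix):
    # The leading field is sorted, so a prefix maps to one contiguous range.
    lo = bisect.bisect_left(index, (prefix,))
    hi = bisect.bisect_left(index, (prefix + "\uffff",))
    return index[lo:hi]


def by_first_name(first):
    # FirstName values are scattered through the ordering, so every entry is checked.
    return [entry for entry in index if entry[1] == first]


tu_names = by_last_name_prefix("Tu")
alans = by_first_name("Alan")
```

Both calls return the same two people here, but only the first can use binary search; serving the second efficiently would need a separate index ordered by FirstName.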
For aggregation support, MongoDB provides both MapReduce queries and aggregation pipeline queries. These are more complex than simply using a GROUP BY statement, with MapReduce being more flexible and the aggregation pipeline being faster. In both cases, MongoDB needs to evaluate all the matching documents and then compute the final total. This happens on every query, resulting in workarounds such as spilling the results of a query to a temporary collection and refreshing that on a routine schedule.
In such cases, you need to schedule refreshing the results during off hours and manage it all manually. You need to spend time and effort managing the freshness of the results and the cost of refreshing the query.
How much “new stuff” do you need to learn in order to query each database?
RavenDB uses the Raven Query Language (RQL) for queries. RQL is the SQL of the Document Database. Like SQL, it is built to be user friendly for developers and non-developers alike. RQL gives you an intuitive way to query the database, project results, and work with documents in RavenDB.
If you have any experience with SQL, you can understand the RQL syntax easily. Learning to write queries in RQL with knowledge of SQL is like being a tennis pro and having to learn racquetball.
Let’s use a simple query to aggregate results from a collection of Zip Code statistical data and get the states that have more than 10 million residents and their population.
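As a language-neutral reference for what this aggregation computes, here is a minimal Python sketch. It is not written in SQL, RQL, or MongoDB's query language; the field names (`state`, `pop`) and the sample rows are hypothetical and the figures are invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows; the states and populations are made up.
zip_codes = [
    {"zip": "A1", "state": "AA", "pop": 6_000_000},
    {"zip": "A2", "state": "AA", "pop": 5_500_000},
    {"zip": "B1", "state": "BB", "pop": 4_000_000},
    {"zip": "C1", "state": "CC", "pop": 10_000_001},
]


def states_over(rows, threshold):
    """Group by state, sum the population, keep states above the threshold."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["state"]] += row["pop"]
    return {state: total for state, total in totals.items() if total > threshold}


big_states = states_over(zip_codes, 10_000_000)
```

Each query language expresses these same three steps (group, sum, filter) in its own syntax.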
Here is how you would need to write your query using SQL, RavenDB, and MongoDB:
With MongoDB, you need to have a DBA go over all queries and ensure that they don’t put too much load on the server. You also need to define indexes ahead of time and re-validate your configuration on each deployment of your software.
All of this is handled automatically by RavenDB behind the scenes, as part of the notion that you should have as close to a zero-admin unattended database experience as possible in your application stack.
3. Performance
How fast can each database process your data? How well does each handle Enterprise Level load?
RavenDB is able to reach over 150,000 writes per second per node on commodity hardware (machines selling for less than $1,000), and exceeds 1 million reads per second with sustained low latency throughput. RavenDB enjoys single digit millisecond performance right up until you hit the limits of your hardware.
The RavenDB team keeps improving the database’s performance from version to version, sustaining its rich and complex functionality along with ACID guarantees both database and clusterwide. In its latest version, RavenDB has overhauled most of the database and introduced significant performance improvements to make handling Big Data a small challenge.
Using RavenDB on the cloud whether in your own cloud, as a service with RavenDB Cloud, or in a hybrid architecture will save you time, resources, and money.
Voron, RavenDB’s tailor-made persistent key/value storage engine, was built from the ground up specifically for RavenDB 4.0. When a new feature comes out for RavenDB, the resources that support it are managed by the same team, making it easier to identify and build the optimal set of tools. We aim to consolidate the components and tools available to you in one place, making RavenDB as easy and powerful to use as possible.
The standard performance test for RavenDB is to load the entire Stack Overflow dataset. This includes tens of millions of questions totaling over 50 GB of data. Using RavenDB, it currently takes less than five minutes to load, and our team keeps improving the performance.
MongoDB supports CRUD operations, simple updates, simple indexing, and both simple and aggregate queries at peak performance speeds, just like RavenDB. For non-concurrent, plain write-oriented tasks with no indexing, MongoDB is typically faster.
However, there are many operations that MongoDB does not support: Pre-computed map-reduce operations, complex patch operations, and non-trivial indexes and queries. As the features you will need become more advanced, the performance costs in MongoDB rise significantly.
MongoDB lags behind RavenDB in performance for aggregation queries, transactions support, and real data integrity. RavenDB is also able to handle all your writes in a transactional manner with a speed that can match your needs.
How does each database’s processing method push performance to the max?
The RavenDB native format is called blittable JSON, a zero-overhead format for storing and processing JSON data. The RavenDB team developed the blittable format to allow processing documents efficiently. To make this happen, we restructured how we save things to memory and to disk to make the process of reading a document dirt cheap. This is one of the advantages of creating an all in one database where every component is custom made to work in tandem with one another to maximize performance.
The blittable format allows RavenDB to avoid deserialization of JSON objects when reading them from persistent storage. This saves lots of memory and CPU – especially on the cloud.
Blittable can process data without having to first parse the entire document into its object form. This reduces the costs of most operations in RavenDB significantly. The blittable format was designed to take advantage of the way Voron works to streamline document processing and significantly simplify the amount of work RavenDB needs to do.
For example, instead of writing our own caching subsystems, RavenDB was designed to take advantage of the operating system’s own page cache and access documents in such a way as to make optimal usage of the kernel’s behavior. The kernel has more information about the state of the whole system, which allows RavenDB to be a better team player and share the resources of the system, instead of hogging them all.
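The general idea of reading one field without parsing the whole document can be sketched with a toy binary layout. The Python below is an illustration under stated assumptions only: the layout (a field table with offsets, followed by the raw values) is invented for this example and is not the actual blittable format, and `encode` and `read_field` are hypothetical helpers.

```python
import struct

PREFIX = "<HI"   # field count (uint16), offset where the values section starts (uint32)


def encode(doc):
    """Pack {str: str} as: prefix | (name_len, name, value_offset, value_len)* | values."""
    entries, values, offset = [], [], 0
    for name, value in doc.items():
        name_b, value_b = name.encode(), value.encode()
        entries.append(struct.pack("<H", len(name_b)) + name_b
                       + struct.pack("<II", offset, len(value_b)))
        values.append(value_b)
        offset += len(value_b)
    table = b"".join(entries)
    values_start = struct.calcsize(PREFIX) + len(table)
    return struct.pack(PREFIX, len(doc), values_start) + table + b"".join(values)


def read_field(buf, wanted):
    """Walk the field table only, then slice out the single value that was asked for."""
    view = memoryview(buf)
    count, values_start = struct.unpack_from(PREFIX, view, 0)
    pos = struct.calcsize(PREFIX)
    wanted_b = wanted.encode()
    for _ in range(count):
        (name_len,) = struct.unpack_from("<H", view, pos)
        pos += 2
        name = view[pos:pos + name_len].tobytes()
        pos += name_len
        value_offset, value_len = struct.unpack_from("<II", view, pos)
        pos += 8
        if name == wanted_b:
            start = values_start + value_offset
            return view[start:start + value_len].tobytes().decode()
    raise KeyError(wanted)


buf = encode({"Name": "Ada", "Bio": "x" * 10_000, "City": "London"})
city = read_field(buf, "City")   # the 10,000-byte Bio value is never decoded
```

Because the table records where each value lives, the reader touches only the bytes it needs, which is the property that keeps memory and CPU costs low when documents are large.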
MongoDB uses a format called BSON (Binary JSON) to store documents. Processing BSON is somewhat easier for a computer than processing textual JSON data, but it is still a format that requires deserializing documents whenever you load them, increasing both memory and CPU usage.
RavenDB’s blittable format means that it can access documents directly in the operating system’s page cache, whereas MongoDB uses a separate cache (in addition to the page cache) and needs to deserialize the BSON documents whenever it accesses them.
4. Scaling Out Your Database
How does each database maintain high-availability?
RavenDB makes it easy to set up a cluster of multiple servers to act as nodes for your database. As part of your free license, you can set up a cluster with up to 3 nodes. Setting up a cluster is as simple as point and click in the RavenDB studio, and it’s even easier with RavenDB Cloud, our Database as a Service. The cluster takes care of all the details of replicating data between nodes, ensuring sufficient copies of your data are kept, and dynamic load balancing and failover between the nodes in the cluster.
If one node fails, the other nodes will continue to operate and your users will have continuous access to your database. Once the faulted node is up again, one of the other nodes will replicate the most current state of data to it, keeping your information highly available with multiple copies.
When running in a cluster, RavenDB uses the multi-master model. When you make a write to any node in the cluster, that write will be accepted and then replicated to the rest of the cluster. RavenDB’s multi-master model handles failure more gracefully because each node writes independently of the rest of the cluster and there is no period of unavailability if the leader of the cluster fails.
RavenDB also has a monitoring dashboard that highlights nodes that have gone down, and tells you what the root cause of the problem is, enabling you to perform maintenance on your system faster and more efficiently. Assignment failover makes sure that all outstanding tasks assigned to a downed node are evenly redistributed to the working nodes in your cluster.
MongoDB has a slew of processes with different purposes that you need to calibrate on your own to set up a cluster which will maintain high availability for the database. You have to configure the servers, query routers, and shards. This increases the deployment and management complexity of the cluster. In some cases, to reconfigure settings in your cluster, you have to shut down every node and individually reset the settings.
MongoDB uses the master-slave replication process where the data is written to a single node and then that node will write to the other nodes in the cluster. This can create a single chokepoint in the data architecture and a node failure can stall the entire system while a different node is selected as the new master.
To replicate, MongoDB uses the OpLog, which records the operations that the slave nodes must execute to bring their data up to the master’s state. If a failure occurs and there are enough writes to fill the OpLog, this can put your cluster into a permanently bad state and require admin intervention to recover.
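The failure mode described above follows from the log having a fixed size. The Python sketch below models a generic capped operation log to show how a lagging replica can lose its place; it is a simplified illustration, not MongoDB's implementation, and the `CappedOplog` class and its methods are hypothetical.

```python
from collections import deque


class CappedOplog:
    """Toy fixed-size operation log: once full, the oldest entries are overwritten."""

    def __init__(self, capacity):
        self.entries = deque(maxlen=capacity)
        self.next_seq = 0

    def append(self, op):
        self.entries.append((self.next_seq, op))
        self.next_seq += 1

    def read_from(self, seq):
        """Return operations with sequence >= seq, or None if some are already gone."""
        oldest = self.entries[0][0] if self.entries else self.next_seq
        if seq < oldest:
            return None   # the replica fell too far behind to catch up from the log
        return [op for s, op in self.entries if s >= seq]


oplog = CappedOplog(capacity=3)
for i in range(7):                  # writes w0 .. w6; only the last three are retained
    oplog.append(f"w{i}")

healthy = oplog.read_from(5)        # replica that has applied everything up to w4
stale = oplog.read_from(1)          # replica that went down right after applying w0
```

The healthy replica can replay the tail of the log, while the stale one gets no usable answer and would need a full resynchronization, which corresponds to the manual recovery described above.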