Welcome back to Database Deep Dives, where we get one-on-one with engineers, builders, and leaders in the database space. Recently, we caught up with Richard Kreuter from MongoDB.
Read the interview below to learn about the future direction of Atlas spanning multiple clouds, how MongoDB is moving from being just a database to a data platform with Atlas Data Lake, and how it built and delivered transactions in a NoSQL data store.
Tell us a little about yourself and what you are working on today?
Richard Kreuter (RK): I’m Richard Kreuter, MongoDB’s SVP of Field Engineering. I’m responsible for a number of our customer- and partner-facing teams—our solution architects, business value consultants, project managers, consulting engineers, customer success managers, and the technical architects who support our partners.
How did you get involved with MongoDB?
(RK): I first became aware of MongoDB in November of 2009. I’m a software engineer by background, and I had worked on projects in the previous decade that really needed a more flexible database than what was available in the market. When I first saw MongoDB, I thought, “Wow, I wish I had had that for previous projects.”
I applied for a role and joined the company in January of 2010 to work on our software products. When I joined, we were fewer than 10 people; now, we are a company of more than 1,500 serving more than 14,000 customers.
We went public in October of 2017—at that point, the first database company to reach that milestone in 25 years. We’ve had a few corporate acquisitions along the way that have substantially accelerated our growth, and, in the last couple of years, we have expanded our offerings beyond the core database to a platform of products spanning several aspects of data management and the data lifecycle.
With the success of Atlas and a broader ecosystem of supporting app dev services like Stitch and Charts, where do you see MongoDB going in the next 5-10 years?
(RK): We are working on fleshing out the Intelligent Data Platform, an integrated set of products and capabilities that provides users with the best way to work with data via MongoDB’s document model. Documents—flexible, JSON-inspired documents—are much easier, more natural, more versatile, and more performant than the traditional, rigidly structured way of working with data that many are familiar with.
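To make the document model concrete, here is a minimal sketch using the PyMongo driver; the connection string, database, and field names are hypothetical, not from the interview:

```python
from pymongo import MongoClient

# Hypothetical connection string; point this at your own deployment.
client = MongoClient("mongodb://localhost:27017")
customers = client["shop"]["customers"]

# One flexible, JSON-like document can carry nested structure that would
# otherwise be spread across several rigidly structured tables.
customers.insert_one({
    "name": "Richard",
    "addresses": [
        {"type": "home", "city": "New York"},
        {"type": "work", "city": "Palo Alto"},
    ],
    "loyalty": {"tier": "gold", "points": 1200},
})

# Queries reach naturally into the nested structure.
print(customers.find_one({"addresses.city": "New York"}))
```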
We have capabilities today in the core database and in other products that we are continually building out to enable customers to place their data strategically across the globe. That can mean keeping data closer to large user populations in order to offer a lower-latency experience for those users, or in order to comply with regulatory requirements by keeping data inside national and other geographic boundaries.
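As a hedged illustration of that kind of data placement, MongoDB’s zone sharding lets you pin ranges of a shard key to shards in particular regions. The shard names, zone names, and shard key below are hypothetical:

```python
from pymongo import MongoClient
from bson.min_key import MinKey
from bson.max_key import MaxKey

# Connect through a mongos router of a sharded cluster (hypothetical URI);
# assumes app.users is already sharded on {"region": 1, "userId": 1}.
client = MongoClient("mongodb://mongos.example.com:27017")
admin = client.admin

# Tag geographically placed shards with zones.
admin.command("addShardToZone", "shard-eu-0", zone="EU")
admin.command("addShardToZone", "shard-us-0", zone="US")

# Keep EU users' documents on EU-resident shards.
admin.command(
    "updateZoneKeyRange", "app.users",
    min={"region": "EU", "userId": MinKey()},
    max={"region": "EU", "userId": MaxKey()},
    zone="EU",
)
```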
We’ll also remain committed to offering customers the freedom to run MongoDB wherever they need to. We have versions of MongoDB that can run on mobile devices, on standard server-grade hardware, on cloud instances, on IBM mainframes, and on other sorts of hardware. That capability to move workloads and run MongoDB software in many different environments is a key reason people adopt MongoDB.
Of course, in the big picture, everyone is contending with the macro transition to the cloud. But different sorts of workloads, across various environments, countries, and industries, will still run on-premises for the foreseeable future. MongoDB’s platform runs substantially the same whether you are running it yourself in your own data center, we are operating it for you in the public cloud via MongoDB Atlas, or anything in between.
As part of that larger story, we are broadening the overall set of capabilities our platform offers. For example, we recently announced a new product—the Atlas Data Lake—which is a way to leverage data stored in cloud object stores such as S3.
Atlas Data Lake brings the full power of the MongoDB query language, a very rich query language that people have been enjoying in an operational database context for many years now, to data in object stores. As people store large volumes of data in S3, much of that data tends to be in common formats such as JSON or comma-separated values (CSV). The MongoDB query language is well suited to semi-structured and hierarchical data like JSON, so customers can get the full benefit of the information stored in those S3 buckets.
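As a sketch of what that looks like from a driver, assuming an Atlas Data Lake has been configured to expose JSON files in an S3 bucket as a collection (the URI, database, and collection names here are hypothetical), the same MQL aggregation pipelines used against an operational database apply:

```python
from pymongo import MongoClient

# Hypothetical Data Lake connection string copied from the Atlas UI.
client = MongoClient("mongodb://datalake0.example.mongodb.net")
events = client["lake"]["clickstream"]  # backed by JSON files in S3

# A standard MQL aggregation pipeline over semi-structured event data.
pipeline = [
    {"$match": {"event.type": "purchase"}},
    {"$group": {"_id": "$event.country", "total": {"$sum": "$event.amount"}}},
    {"$sort": {"total": -1}},
]
for doc in events.aggregate(pipeline):
    print(doc)
```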
We also recently acquired a mobile database company, Realm, which has a very MongoDB-like, flexible way of working with data in mobile devices. As MongoDB has grown, we’ve heard from users and customers that the challenges MongoDB solved at the database level still exist in other areas. MongoDB wants to bring its mission of giving developers the best way to work with data to more of the data ecosystem.
Let’s talk multi-document transactions—why were they needed, and how did the business go about delivering the feature?
(RK): MongoDB has always had ACID transactional capabilities at the individual document level. If you are modeling all the data about me, Richard, as a customer in your business, you are probably going to store most of the information you have about me inside a single document. As that document changes from one state to the next, we’ve always had ACID transactions at that single-document level.
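A small sketch of what single-document atomicity means in practice (the URI, collection, and field names are hypothetical): a single update touching several fields of one document commits as a unit, with no transaction API needed.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # hypothetical URI
customers = client["shop"]["customers"]

# Both modifications apply atomically: no reader ever sees the upgraded
# tier without the accompanying points, because they live in one document.
customers.update_one(
    {"name": "Richard"},
    {"$set": {"loyalty.tier": "platinum"}, "$inc": {"loyalty.points": 500}},
)
```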
But multi-document transactions future-proof our customers’ applications: even when customers aren’t sure what their future requirements will be, they have a guarantee that as their application evolves over time, they won’t somehow reach the limits of what MongoDB can do for them. MongoDB can wrap multiple operations across multiple collections and documents inside a single transaction.
That was the business analysis of why we wanted to go in this direction: to provide full, traditional, conversational transactions, where an application can do essentially arbitrary things inside the scope of a transaction without having to, for example, pre-define which operations can go inside a transactional scope or restrict what operations can be performed inside one.
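From the driver’s side, a minimal hedged sketch of such a transaction with PyMongo’s session API (requires MongoDB 4.0+ and a replica set; the URI and collection names are hypothetical):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")  # hypothetical
orders = client["shop"]["orders"]
inventory = client["shop"]["inventory"]

# Writes to two collections succeed or fail together.
with client.start_session() as session:
    with session.start_transaction():
        orders.insert_one({"sku": "abc123", "qty": 2}, session=session)
        inventory.update_one(
            {"sku": "abc123"}, {"$inc": {"qty": -2}}, session=session
        )
# A clean exit commits the transaction; an exception inside aborts it.
```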
The technical path to multi-document transactions began with MongoDB’s first acquisition, a database storage engine called WiredTiger, built by the folks who created the BerkeleyDB embedded database—one of the world’s most popular database engines.
The WiredTiger storage engine, which has been the default storage engine since MongoDB 3.2 some years back, already supported multi-record transactions at the storage layer. Our engineers then threaded that capability upwards, so to speak, through MongoDB’s query language, replication protocol, and sharding architecture, so that MongoDB applications could take advantage of what the underlying storage engine provided.
Today, we have customers that are beginning to leverage these transactions in quite advanced ways. In that respect, transactions make the mental shift from a traditional, tabular database to MongoDB a bit easier.
Where do you think there is room for improvement in the Mongo stack?
(RK): The work we are doing to expand our multi-cloud capabilities with MongoDB Atlas is perhaps the thing we are working on today that is resonating the most with our largest customers.
Today, if you want to spin up a MongoDB Atlas deployment, you have to pick a particular cloud provider: AWS, Azure, or GCP. Each individual MongoDB deployment can span multiple regions within one of those clouds but can’t span multiple clouds.
We are working toward the ability to have a single MongoDB deployment span distinct cloud providers, giving customers the ability to leverage the best-of-breed technologies each cloud provides. So, if they want to take advantage of some capabilities that are particular to Amazon, they can do that and have those capabilities read and write their data in MongoDB. When we provide cross-cloud clusters, the same cluster, under the same management domain and the same user permissions and access control roles, will also be able to exist via replication in other clouds, so that users can take advantage of technologies available in Azure or GCP.
What advice would you have for people running MongoDB themselves?
(RK): Firstly, if you are running MongoDB yourself, you would be remiss not to look at the capabilities MongoDB Atlas can provide. Atlas is the easiest way to get the benefits of MongoDB because, frankly, we will run it for you. You get back whatever resources you were putting into maintenance tasks, upgrade operations, security operations, and so forth. It’s a way for your team to be more productive, faster, all while having the peace of mind that the experts who built MongoDB are running operations.
If you are running MongoDB yourself today in some on-premises or other self-managed situation, you should take a look at MongoDB’s management tools. We have a couple of different management suites—one of them, called MongoDB Cloud Manager, can orchestrate, run, monitor, and back up MongoDB in a cloud environment.
We also have a packaged, on-premises version of the same capabilities called MongoDB Ops Manager, which, again, is the most complete suite of management tools for running MongoDB, comprising all the capabilities you would need around orchestration, upgrades, maintenance tasks, monitoring, and alerting.
There is a very large ecosystem of MongoDB experts that can be a valuable resource as well. We have had over 75 million downloads of our core database and over a million registrations for our online educational platform, MongoDB University. There are very supportive forums, such as Google Groups for user support and Stack Overflow for other Q&A around MongoDB technology. Of course, at MongoDB, we have a large number of experts who are available to assist you in various ways if you are running MongoDB yourself and need additional assistance.
Learn more
Thanks to Richard for joining us.
If you want to provision a fully managed MongoDB instance on the IBM Cloud, you can check out the following:
If you want to read other Database Deep Dives, check out our past interviews:
- Database Deep Dives: PostgreSQL
- Database Deep Dives: CouchDB
- Database Deep Dives: PingCAP and TiKV
- Database Deep Dives: JanusGraph
- MongoDB: An Essential Guide
Thanks to Emily Hu for her assistance on this piece.