MongoDB: Capped Collections & Time-To-Live Index



Using Capped Collections to Reduce Memory/Storage Requirements and Increase Read/Write Capacity

Sofwan A. Lawal
Aug 22, 2022


When you run MongoDB in production, you may eventually need to reduce the memory and storage your database uses or increase its read and write throughput. MongoDB offers several ways to deal with these problems. One of them is capping the size of a collection. A capped collection is a fixed-size collection: you set a maximum size in bytes (and, optionally, a maximum number of documents), and once that limit is reached the oldest documents are overwritten by new ones. This post explains why capped collections and TTL indexes are useful and how to use them with MongoDB.

Introduction to Capped Collection

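A capped collection lets you set a fixed limit on how large a collection can grow. It behaves like a circular buffer: documents are kept in insertion order, and once the allocated space is full, the oldest documents make room for new ones. This bounds the storage the collection needs, and because no index is required to return documents in insertion order, it can support higher insert throughput. Capped collections can also help with data retention, although they limit data by size rather than by age, so they do not guarantee that a document is kept or removed after a specific time. They are created with extra options (a required size and an optional document count) and come with restrictions that regular collections don't have; for example, a capped collection cannot be sharded.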
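To make the "oldest documents go first" behavior concrete, here is a minimal in-memory sketch of a size- and count-limited buffer. It is only a model of the idea, not how MongoDB is implemented: the `CappedBuffer` class, the `LogDoc` type, and the JSON-length size estimate are all assumptions made for illustration.

```typescript
// Illustrative in-memory model of capped-collection behavior.
// NOT MongoDB's implementation; all names here are hypothetical.
interface LogDoc {
  endpoint: string;
  method: string;
  status: number;
}

class CappedBuffer<T> {
  private docs: T[] = [];
  private usedBytes = 0;
  private readonly maxBytes: number;
  private readonly maxDocs: number;

  constructor(maxBytes: number, maxDocs: number = Infinity) {
    this.maxBytes = maxBytes;
    this.maxDocs = maxDocs;
  }

  // Rough size estimate: length of the JSON text (real BSON sizes differ).
  private sizeOf(doc: T): number {
    return JSON.stringify(doc).length;
  }

  insert(doc: T): void {
    const size = this.sizeOf(doc);
    if (size > this.maxBytes) {
      throw new Error("document is larger than the whole buffer");
    }
    this.docs.push(doc);
    this.usedBytes += size;
    // Evict from the oldest end until both limits are satisfied again.
    while (this.usedBytes > this.maxBytes || this.docs.length > this.maxDocs) {
      const oldest = this.docs.shift() as T;
      this.usedBytes -= this.sizeOf(oldest);
    }
  }

  // Documents in insertion order, oldest first.
  all(): T[] {
    return [...this.docs];
  }
}

const buffer = new CappedBuffer<LogDoc>(10_000, 3);
for (const status of [200, 201, 404, 500]) {
  buffer.insert({ endpoint: "/users", method: "GET", status });
}
// The buffer holds at most 3 documents, so the first one (200) has been evicted.
console.log(buffer.all().map((doc) => doc.status));
```

Note that the byte limit and the document limit are both checked on every insert, so whichever is hit first triggers eviction.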

Introduction to TTL Indexes

Time-to-live (TTL) indexes are special single-field indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time or at a specific clock time. Expired documents are removed by a background task that runs every 60 seconds, so a document may remain in the collection for a short while after it expires. Data expiration is useful for information that only needs to stay in a database for a limited time, like machine-generated event data, logs (which we'll look at later in this article), and session information.
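The expiry rule itself is simple date arithmetic. The sketch below models it in plain TypeScript so the rule can be seen without a database; `isExpired`, `removeExpired`, and `ApiLogDoc` are hypothetical names, and the real deletion is done by MongoDB's background task, not by application code.

```typescript
// Simplified model of TTL expiry; names are hypothetical, for illustration only.
interface ApiLogDoc {
  endpoint: string;
  createdAt: Date;
}

// A document counts as expired once indexedDate + expireAfterSeconds is not in the future.
function isExpired(indexedDate: Date, expireAfterSeconds: number, now: Date): boolean {
  return indexedDate.getTime() + expireAfterSeconds * 1000 <= now.getTime();
}

// Returns the documents a TTL sweep would leave behind at time `now`.
function removeExpired(docs: ApiLogDoc[], expireAfterSeconds: number, now: Date): ApiLogDoc[] {
  return docs.filter((doc) => !isExpired(doc.createdAt, expireAfterSeconds, now));
}

const now = new Date("2022-08-22T12:00:00.000Z");
const logs: ApiLogDoc[] = [
  { endpoint: "/old", createdAt: new Date("2022-08-22T10:00:00.000Z") },
  { endpoint: "/recent", createdAt: new Date("2022-08-22T11:30:00.000Z") },
];

// With a one-hour TTL, only the document created 30 minutes ago survives.
const remaining = removeExpired(logs, 60 * 60, now);
console.log(remaining.map((doc) => doc.endpoint));
```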

Why is a capped collection useful?

There are several scenarios in which you might want to use a capped collection.

  • You want to preserve the insertion order, so the most recently inserted documents can be retrieved efficiently.
  • You want a cache. You can cache small amounts of data in a capped collection and index the documents for lookups. Because a cache is read far more often than it is written, the extra write cost of maintaining an index is usually acceptable.
  • You want to store high-volume data that is only useful while it is recent. In a capped collection, the oldest documents are discarded automatically as new ones arrive.
  • You have to adhere to a data retention policy that can be expressed as a size limit. In that case, you can use a capped collection and set its maximum size. If the policy is time-based, a TTL index is usually the better fit.

Why is the TTL index useful?

There are several scenarios in which you might want to use a TTL index on a collection.

  • You want to expire a user's session at a specific time after login.
  • You want to automatically delete data that is no longer useful after a certain period of time.
  • You want to prevent a collection from growing very large.

How to implement a capped collection?

Let's say you want to store the request logs of your API in a MongoDB database. The structure of each log is simple: it has an endpoint, a date, the request method, and the response status code. Each request is stored as a single document, and you want to keep the logs for at most 10 years. You could store the logs in a normal collection.

Here is an example of a normal collection that stores these logs, using TypeScript with Mongoose (an ODM that provides a straightforward, schema-based solution to model your application data).

import mongoose from "mongoose";

const apiLogSchema = new mongoose.Schema(
  {
    endpoint: {
      type: String,
      required: true,
      immutable: true
    },
    method: {
      type: String,
      required: true,
      immutable: true
    },
    status: {
      type: Number,
      required: true,
      immutable: true
    },
    createdAt: {
      type: Date,
      required: true,
      default: () => new Date(),
      immutable: true,
      index: true
    },
  }, {
    // options
  }
)

const ApiLog = mongoose.model('ApiLog', apiLogSchema)
export default ApiLog

But you can't know in advance how many API requests you'll get per year. If traffic is higher than expected, the collection keeps growing, and you may run into performance or storage problems. A better way to store the logs is a capped collection: you set its maximum size, and when the collection is full, the oldest documents are removed automatically to make room for new ones.
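Choosing the cap is mostly back-of-the-envelope arithmetic. The sketch below shows one way to estimate it; the function names, the 200-byte average document size, and the traffic figures are assumptions for illustration, and real on-disk sizes depend on BSON encoding and the storage engine.

```typescript
// Hypothetical helpers for sizing a capped collection; all figures are illustrative.

// Bytes needed to hold `retentionDays` of logs, plus a safety margin in percent.
function estimateCappedSizeBytes(
  avgDocBytes: number,
  requestsPerDay: number,
  retentionDays: number,
  headroomPercent: number = 20
): number {
  const base = avgDocBytes * requestsPerDay * retentionDays;
  return Math.ceil((base * (100 + headroomPercent)) / 100);
}

// Roughly how many documents of a given average size fit into a cap.
function docsThatFit(sizeBytes: number, avgDocBytes: number): number {
  return Math.floor(sizeBytes / avgDocBytes);
}

// Example: 200-byte logs, 50,000 requests per day, keep about 30 days.
const estimatedSize = estimateCappedSizeBytes(200, 50_000, 30);
console.log(estimatedSize); // 360000000 (about 360 MB)

// Example: how many 200-byte logs fit into a 1 GB cap.
console.log(docsThatFit(1024 * 1024 * 1024, 200)); // 5368709
```

If traffic doubles, the same cap simply holds about half as many days of logs, which is the trade-off a capped collection makes: bounded storage in exchange for an unbounded retention window.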

Limiting the size of a collection

In a capped collection, you set the maximum size of the collection in bytes. When this limit is reached, the oldest documents are removed automatically to make room for new ones. This keeps storage from growing without bound, which could otherwise degrade performance. Extending the normal collection above, we can add the capped option as shown below:

const apiLogSchema = new mongoose.Schema({
  // ...endpoint, method, and status fields as defined above
  createdAt: {
    type: Date,
    required: true,
    default: () => new Date(),
    immutable: true,
    index: true
  },
},  {
  capped: {
    size: 1024 * 1024 * 1024, // 1GB Maximum size
    autoIndexId: true
  }
});

Limiting the number of documents in a collection

You can also limit the number of documents in a capped collection by setting the max property in the capped options, as shown below. The size option is still required, and whichever limit is reached first applies. Once a limit is reached, the oldest document is removed automatically when a new one is inserted.

const apiLogSchema = new mongoose.Schema({
  // ...same fields as in the schema above
}, {
  capped: {
    size: 1024 * 1024 * 1024, // size (in bytes) is required even when max is set
    max: 1_000_000, // maximum of 1 million documents
    autoIndexId: true
  }
});

const ApiLog = mongoose.model('ApiLog', apiLogSchema);
export default ApiLog;

How to implement TTL Indexes on a collection?

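To create a TTL index, use the db.collection.createIndex() method with the expireAfterSeconds option on a field whose value is either a date or an array that contains date values. With Mongoose, the same index can be declared on the schema with schema.index(), as in the example below.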

const apiLogSchema = new mongoose.Schema({
  // ...other fields as defined above
  createdAt: {
    type: Date,
    required: true,
    default: () => new Date(),
    immutable: true
    // no `index: true` here: the TTL index below already indexes createdAt
  },
});

// Option 1: delete logs once a number of seconds has elapsed since createdAt
apiLogSchema.index({ createdAt: 1 }, { expireAfterSeconds: 60 * 60 * 24 * 365 }) // 1 year

// Option 2: delete logs at a specific time (requires an expireAt Date field on the schema)
apiLogSchema.index({ expireAt: 1 }, { expireAfterSeconds: 0 }) // each document expires at its own expireAt
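With expireAfterSeconds set to 0, the expiry time lives in the document itself, so the application has to compute it when the document is written. A minimal sketch of that calculation, using a session that should expire 30 minutes after login; `expireAtFor` and the `session` object are hypothetical names, not part of Mongoose or MongoDB.

```typescript
// Hypothetical helper: compute the value to store in an `expireAt` field.
function expireAtFor(start: Date, ttlSeconds: number): Date {
  return new Date(start.getTime() + ttlSeconds * 1000);
}

const login = new Date("2022-08-22T10:00:00.000Z");

// Plain object standing in for a document that has an `expireAt` Date field.
const session = {
  userId: "u1",
  expireAt: expireAtFor(login, 30 * 60), // 30 minutes after login
};

console.log(session.expireAt.toISOString()); // 2022-08-22T10:30:00.000Z
```

Storing the absolute expiry time lets each document have its own lifetime, whereas a positive expireAfterSeconds applies the same lifetime to every document in the collection.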

Conclusion

MongoDB offers several ways to deal with growing data volume and with data that becomes obsolete over time. One is the capped collection, which limits the space a collection can occupy and discards the oldest documents first. Another is the TTL index, which removes documents at a specific time or after a set amount of time has elapsed. Used appropriately, both let you keep only the data you still need and so limit the memory and storage your database requires.

 