MongoDB Backup/Restore
Note

The examples below use so-called “ephemeral disk space”.

This means that disk space may become an issue if the database is large and disk space is limited. The ephemeral disk space is located under /var/lib/kubelet and /var/lib/containers. If /var/lib is on a separate disk, you can check the available space with the following command:

Code Block
df -h /var/lib
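
The manual check above can also be scripted. The following is a minimal sketch, not part of the original procedure: the function name check_space and the 10 GiB threshold are assumptions for illustration, and the threshold should be adapted to the expected size of your dump.

```shell
# check_space PATH REQUIRED_KIB
# Succeeds if the filesystem holding PATH has at least REQUIRED_KIB KiB free.
check_space() {
  cs_path=$1
  cs_required_kib=$2
  # -P forces the single-line POSIX output format, -k reports 1024-byte blocks.
  # Field 4 of the second line is the available space.
  cs_avail_kib=$(df -Pk "$cs_path" 2>/dev/null | awk 'NR == 2 { print $4 }')
  [ -n "$cs_avail_kib" ] && [ "$cs_avail_kib" -ge "$cs_required_kib" ]
}

# Hypothetical threshold: 10 GiB, expressed in KiB.
if ! check_space /var/lib 10485760; then
  echo "Warning: less than 10 GiB available under /var/lib" >&2
fi
```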

Note

Best practices:

  • Label files so that you can identify the contents of the backup as well as the point in time that the backup reflects. The examples below do this already.

  • Use an alternative backup strategy such as Filesystem Snapshots or Cloud Backups in MongoDB Atlas if the performance impact of mongodump and mongorestore is unacceptable for your use case.

  • To ensure mongodump can take a consistent backup of a replica set, you must either use the --oplog option to capture writes received during backup operations or stop all writes to the replica set for the duration of the backup.

    For sharded cluster replica sets, see Back Up a Sharded Cluster with Database Dumps.

  • Ensure that your backups are usable by restoring them to a test MongoDB deployment.

  • To help reduce the likelihood of inconsistencies in a sharded cluster backup, you must stop the balancer, stop all write operations, and stop any schema transformations for the duration of the backup.

Backup MongoDB

Single node MongoDB

Open a shell to your MongoDB node.

To back up the MongoDB instance, run mongodump with the following command-line options:

Code Block
mongodump --gzip --archive=/tmp/mongobackup_$(date "+%Y.%m.%d_%H.%M.%S").gz

After the backup has completed, there are several ways to retrieve the compressed backup file from the container. Here is an example for Kubernetes:

Code Block
kubectl cp cdcm/cdcm-mongodb-0:/tmp/mongobackup_<date>.gz .

Replace <date> with the timestamp in the actual file name.
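
If you would rather not look up the timestamp by hand, both steps can be driven from your workstation so that the file name is generated once and reused. This is only a sketch under assumptions: the function name backup_and_fetch and the KUBECTL override are made up for illustration, and it assumes mongodump can connect without additional credentials (extra arguments are passed through to mongodump).

```shell
# backup_and_fetch NAMESPACE POD [MONGODUMP_ARGS...]
# Runs mongodump inside the pod, then copies the archive to the current directory.
# Set KUBECTL=echo to print the kubectl invocations instead of executing them.
backup_and_fetch() {
  bf_namespace=$1
  bf_pod=$2
  shift 2   # remaining arguments are passed to mongodump unchanged
  bf_name="mongobackup_$(date "+%Y.%m.%d_%H.%M.%S").gz"
  "${KUBECTL:-kubectl}" exec -n "$bf_namespace" "$bf_pod" -- \
    mongodump "$@" --gzip --archive="/tmp/$bf_name" || return 1
  "${KUBECTL:-kubectl}" cp "$bf_namespace/$bf_pod:/tmp/$bf_name" "./$bf_name"
}

# Dry run: prints the two kubectl commands that would be executed.
KUBECTL=echo backup_and_fetch cdcm cdcm-mongodb-0
```

Without the KUBECTL=echo prefix, the same call performs the backup and the copy for real.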

Multi node MongoDB

Open a shell to your primary MongoDB node.

To back up the MongoDB instance, run mongodump with the following command-line options:

Code Block
mongodump --oplog --gzip --archive=/tmp/mongobackup_$(date "+%Y.%m.%d_%H.%M.%S").gz

The --oplog option captures incoming write operations during the mongodump operation to ensure that the backups reflect a consistent data state.

After the backup has completed, there are several ways to retrieve the compressed file from the container. Here is an example for Kubernetes:

Code Block
kubectl cp cdcm/cdcm-mongodb-0:/tmp/mongobackup_<date>.gz .

Replace <date> with the timestamp in the actual file name.
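
After copying, it can be worth confirming that the file arrived intact. The sketch below assumes that an archive written with --gzip --archive is a regular gzip stream, so that gzip -t can test it. The function name verify_archive is hypothetical. This only detects truncated or corrupted files. It does not replace a test restore as recommended in the best practices.

```shell
# verify_archive FILE
# Succeeds if FILE exists, is not empty, and decompresses cleanly as gzip.
verify_archive() {
  va_file=$1
  # -s: file exists and has a size greater than zero.
  # gzip -t: test the compressed stream without writing any output.
  [ -s "$va_file" ] && gzip -t "$va_file" 2>/dev/null
}

# Example usage (the file name is a placeholder):
#   verify_archive ./mongobackup_2024.01.31_12.00.00.gz || echo "archive is damaged" >&2
```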

Restore MongoDB

First, upload the backup to the target instance.

Note

Make sure that you are using the correct context!

Once again, this example applies to Kubernetes:

Code Block
kubectl cp <archive> cdcm-mongodb-0:/tmp/

To restore a single node MongoDB instance, run mongorestore with the following options:

Code Block
mongorestore --gzip --archive=<path-to-archive>

Restoring a multi node MongoDB

If you used the --oplog option with mongodump, you need to run mongorestore with the --oplogReplay option:

Code Block
mongorestore --gzip --oplogReplay --archive=<path-to-archive>
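
The two restore variants differ only in the --oplogReplay option, so they can be wrapped in one small helper. This is a sketch, not a prescribed procedure: the function name restore_archive, the mode argument, and the MONGORESTORE override are assumptions introduced for illustration.

```shell
# restore_archive ARCHIVE [MODE]
# MODE "oplog" adds --oplogReplay (for dumps taken with --oplog).
# Any other value, or no value, restores without replaying the oplog.
# Set MONGORESTORE=echo to print the options instead of running mongorestore.
restore_archive() {
  ra_archive=$1
  ra_mode=${2:-single}
  if [ ! -s "$ra_archive" ]; then
    echo "archive missing or empty: $ra_archive" >&2
    return 1
  fi
  if [ "$ra_mode" = "oplog" ]; then
    "${MONGORESTORE:-mongorestore}" --gzip --oplogReplay --archive="$ra_archive"
  else
    "${MONGORESTORE:-mongorestore}" --gzip --archive="$ra_archive"
  fi
}

# Example usage (paths are placeholders):
#   restore_archive /tmp/mongobackup_2024.01.31_12.00.00.gz          # single node
#   restore_archive /tmp/mongobackup_2024.01.31_12.00.00.gz oplog    # multi node
```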

Please find the complete documentation at https://www.mongodb.com/docs/manual/tutorial/backup-and-restore-tools/

Troubleshooting

  • If a mongodump operation fails with an error message like the following:
    Failed: archive writer: error writing data for collection `oslc-codebeamer.cb-cache` to disk: error reading collection: (CursorNotFound) cursor id 1292809968534318070 not found / Mux ending but selectCases still open 4
    the cause is usually that the operation ran against a secondary node instead of the primary node.
    To resolve this, make sure that all mongodump and mongorestore operations run only against the primary node.
    This can be tricky to spot: when you run the operations against the Kubernetes service, they may succeed or fail depending on which node you are routed to.
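
One way to rule this out is to ask the node for its role before starting. The sketch below assumes that mongosh is available in the container and that the server supports the hello command. On older deployments, the legacy mongo shell with db.isMaster().ismaster would be the equivalent. The function name is_primary and the MONGOSH override are hypothetical.

```shell
# is_primary [MONGOSH_ARGS...]
# Succeeds if the node that mongosh connects to reports itself as writable primary.
# Extra arguments (for example a connection string or credentials) go to mongosh.
is_primary() {
  ip_result=$("${MONGOSH:-mongosh}" --quiet --eval 'db.hello().isWritablePrimary' "$@")
  [ "$ip_result" = "true" ]
}

# Example usage inside the pod:
#   is_primary || echo "Connected to a secondary, switch to the primary node" >&2
```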