Excerpt: MongoDB Backup/Restore

The examples below use so-called “ephemeral disk space”.

Because the backup archive is written to this space, disk space may become an issue if the database is very large and disk space is limited. The ephemeral disk space is located in /var/lib/kubelet and /var/lib/containers. If /var/lib is placed on a separate disk, you can check the available space with the following command:

df -h /var/lib
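
To turn this check into a guard that runs before a backup, the available space can be compared against a threshold. The following is a minimal sketch; the function names and the 10 GiB threshold in the usage comment are illustrative assumptions, not part of the original procedure.

```shell
#!/bin/sh
# Sketch: fail early when the filesystem holding a path has too little free space.

# Print the available space, in KiB, of the filesystem that contains $1.
# -P forces the POSIX single-line output format, -k reports 1024-byte blocks.
avail_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if at least $2 KiB are available on the filesystem of $1.
has_free_space() {
    [ "$(avail_kib "$1")" -ge "$2" ]
}

# Usage (the 10 GiB threshold is an assumption, size it to your database):
#   has_free_space /var/lib $((10 * 1024 * 1024)) || echo "less than 10 GiB free" >&2
```

The threshold should be chosen with the expected size of the compressed dump in mind.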

Best practices:

Label files so that you can identify the contents of the backup as well as the point in time that the backup reflects. The examples below do this already.
Use an alternative backup strategy such as Filesystem Snapshots or Cloud Backups in MongoDB Atlas if the performance impact of mongodump and mongorestore is unacceptable for your use case.
To ensure mongodump can take a consistent backup of a replica set, you must either use the --oplog option to capture writes received during backup operations or stop all writes to the replica set for the duration of the backup.
For sharded cluster replica sets, see Back Up a Sharded Cluster with Database Dumps.
Ensure that your backups are usable by restoring them to a test MongoDB deployment.
To help reduce the likelihood of inconsistencies in a sharded cluster backup, you must stop the balancer, stop all write operations, and stop any schema transformations for the duration of the backup.
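
As an illustration of the labeling advice above, a small helper can build file names that carry both a content label and a timestamp. This is only a sketch: backup_name and its label argument are hypothetical, and the timestamp format is the one used by the mongodump examples on this page.

```shell
#!/bin/sh
# Sketch: build a backup file name that records what was backed up and when.

# $1 is a hypothetical free-form label, for example the deployment name.
backup_name() {
    label=$1
    printf 'mongobackup_%s_%s.gz\n' "$label" "$(date '+%Y.%m.%d_%H.%M.%S')"
}

# Usage:
#   mongodump --gzip --archive="/tmp/$(backup_name cdcm)"
```

Keeping the year first in the timestamp means that an alphabetical listing of the files is also a chronological one.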

Backup MongoDB

Single node MongoDB

Open a shell to your MongoDB node.

To back up the MongoDB instance, run mongodump with the following command-line options:

mongodump --gzip --archive=/tmp/mongobackup_$(date "+%Y.%m.%d_%H.%M.%S").gz

After the backup has completed, there are several ways to retrieve the compressed backup file from the container. Here is an example for Kubernetes:

kubectl cp cdcm/cdcm-mongodb-0:/tmp/mongobackup_<date>.gz .

Replace <date> with the timestamp in the actual file name.
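
If you prefer not to type the timestamp by hand, the newest archive can be looked up inside the pod before copying. The sketch below assumes that the container provides sh, ls, and head, and that the whole archive is a single gzip stream that gzip -t can check. The function names are hypothetical; the namespace and pod name in the usage comment are taken from the example above.

```shell
#!/bin/sh
# Sketch: find the newest backup in the pod, copy it, and check the local copy.

# Print the path of the newest backup archive inside pod $2 in namespace $1.
latest_backup() {
    kubectl exec -n "$1" "$2" -- sh -c 'ls -t /tmp/mongobackup_*.gz | head -n 1'
}

# Succeed only if the local file $1 is a complete, readable gzip stream.
verify_gzip() {
    gzip -t "$1" 2>/dev/null
}

# Copy the newest archive to the current directory and verify the copy.
fetch_latest_backup() {
    ns=$1
    pod=$2
    remote=$(latest_backup "$ns" "$pod") || return 1
    [ -n "$remote" ] || return 1
    local_file=$(basename "$remote")
    kubectl cp "$ns/$pod:$remote" "./$local_file" || return 1
    verify_gzip "./$local_file"
}

# Usage:
#   fetch_latest_backup cdcm cdcm-mongodb-0
```

The integrity check only detects a truncated or corrupted transfer. It does not replace a test restore.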

Multi node MongoDB

Open a shell to your primary MongoDB node.

To back up the MongoDB instance, run mongodump with the following command-line options:

mongodump --oplog --gzip --archive=/tmp/mongobackup_$(date "+%Y.%m.%d_%H.%M.%S").gz

The --oplog option captures incoming write operations during the mongodump operation to ensure that the backups reflect a consistent data state.

After the backup has completed, there are several ways to retrieve the compressed file from the container. Here is an example for Kubernetes:

kubectl cp cdcm/cdcm-mongodb-0:/tmp/mongobackup_<date>.gz .

Replace <date> with the timestamp in the actual file name.

Restore MongoDB

First, upload the backup to the target instance.

Make sure that you are using the correct context!

Once again, this example applies to Kubernetes:

kubectl cp <archive> cdcm-mongodb-0:/tmp/
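
The context warning above can be turned into an explicit guard. The following sketch compares the output of kubectl config current-context with an expected name before anything is copied. The function name require_context and the context name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: refuse to continue unless kubectl points at the expected context.

# Succeed only if the active kubectl context equals $1.
require_context() {
    current=$(kubectl config current-context) || return 1
    if [ "$current" != "$1" ]; then
        echo "kubectl context is '$current', expected '$1'" >&2
        return 1
    fi
}

# Usage ("my-target-cluster" is a placeholder for your own context name):
#   require_context my-target-cluster && kubectl cp <archive> cdcm-mongodb-0:/tmp/
```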

To restore a single node MongoDB instance, run mongorestore with the following options:

mongorestore --gzip --archive=<path-to-archive>

Restoring a multi node MongoDB

If you used the --oplog option with mongodump, you need to run mongorestore with the --oplogReplay option:

mongorestore --gzip --oplogReplay --archive=<path-to-archive>

For the complete documentation, see “Back Up and Restore a Self-Managed Deployment with MongoDB Tools” in the MongoDB Database Manual.

Troubleshooting

If a mongodump operation fails with an error message like the following:
Failed: archive writer: error writing data for collection `oslc-codebeamer.cb-cache` to disk: error reading collection: (CursorNotFound) cursor id 1292809968534318070 not found / Mux ending but selectCases still open 4
it is usually because the operation ran against one of the secondary nodes instead of the primary node.
To resolve this, make sure that all mongodump and mongorestore operations run only against the primary node.
This can be hard to spot: operations run against the Kubernetes service may succeed or fail, depending on which node the service routes you to.
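
One way to avoid the routing problem is to ask each pod directly whether it is the primary and then open the shell on that pod. The sketch below assumes that mongosh is available in the container and can connect without additional credentials; older images may only ship the legacy mongo shell. The function names and the pod names other than cdcm-mongodb-0 are assumptions.

```shell
#!/bin/sh
# Sketch: find the pod that currently hosts the replica set primary.

# Succeed if the mongod in pod $2 (namespace $1) reports itself as primary.
is_primary() {
    result=$(kubectl exec -n "$1" "$2" -- \
        mongosh --quiet --eval 'db.hello().isWritablePrimary') || return 1
    [ "$result" = "true" ]
}

# Print the first pod from the argument list that is currently the primary.
find_primary() {
    ns=$1
    shift
    for pod in "$@"; do
        if is_primary "$ns" "$pod"; then
            echo "$pod"
            return 0
        fi
    done
    return 1
}

# Usage (pod names are assumed to follow the StatefulSet numbering):
#   find_primary cdcm cdcm-mongodb-0 cdcm-mongodb-1 cdcm-mongodb-2
```

Running mongodump or mongorestore in a shell on the pod reported here avoids the load-balanced service entirely.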