How to Benchmark MongoDB with YCSB Workloads on All-Flash Block Storage?
Audio : Listen to This Blog.
Introduction:
MongoDB is a very popular open-source scale-out NoSQL database. It powers modern big data analytics applications that require low latency read and writes, high availability, and advanced data management.
Usecases:
Key use cases for MongoDB includes real-time analytics, product catalogs, content management, and mobile applications.
The storage solution with MongoDB offers the following key benefits:
- Scalability provided by the storage array, which can modularly scale storage with MongoDB
- Non-Disruptive Operations with high availability to deliver maximum uptime and stable performance, specifically during drive failure.
- Predictable high performance with low latency, providing excellent response time for the most demanding analytics applications built on MongoDB.
Summary:
This reference document showcases a validated end-to-end solution design for efficiently deploying a virtualized scale-out MongoDB NoSQL database on the block storage using VMware vSphere Hypervisor Environment.
Note: Storage array hosts MongoDB virtual machines (VMs) and databases.
MongoDB Architecture:
Sharding divides the dataset and distributes the data over multiple servers, or shards. Each shard is an independent database. Collectively, the shards make up a single logical database.
Shard stores data. In a production shard cluster, each shard own the responsibility of maintaining a replica set. This strategy facilitates high availability and data consistency.
Storage Architecture:
4 Key Steps to Remember before Benchmarking MongoDB with YCSB Workloads on All-Flash Block Storage
1.Server Configurations
- 2 Fibre Channel 16Gbps ports.
- ESXi 6.7 Installed on hosts.
- Fibre Channel switch connectivity.
2.Storage controller Configuration
- 2-node HA pair.
- 4 Fibre Channel 16Gbps ports.
- SSD Drives.
3.Configurations on ESXi Host
1.Created three volumes of each 500G on storage controller.
2.Mapped the three volumes to 3-node ESXi hosts from storage controller.
3.Created 3 VMFS6 Datastores on hosts (i.e one storage volume created as one VMFS6 Datastore)
- Created 8 Virtual Machines with 2vdisks (i.e 8 VMs resides in 3 Datastores).
- Size of vdisk1-16G Thin Prov (where RHEL OS is installed), vdisk2-500G Thin Prov (used as DB – Data disk)
- Login to all 8 VMs, created partition on 500G disk (added as vdisk2), mount it on the hosts.
[root@localhost opt]# fdisk /dev/sdb [root@localhost opt]# mkfs -t ext4 /dev/sdb mke2fs 1.42.9 (28-Dec-2013) [root@localhost ~]# mkdir -p /data/db1; mount /dev/sdb /data/db1 [root@localhost ~]# cat /etc/fstab | grep sdb /dev/sdb1 /data/db1 ext4 defaults 0 0
4.Configuring MongoDB on VMs:
Prerequisites:
3 RHEL 7 VMs as Config Replica Sets 10.20.178.30 configsvr1 10.20.178.92 configsvr2 10.20.178.96 configsvr3 4 RHEL 7 VMs as Shard Replica Sets 10.20.178.93 shardsvr1 10.20.178.95 shardsvr2 10.20.178.94 shardsvr3 10.20.178.46 shardsvr4 1 RHEL 7 VMs as mongos/Query Router 10.20.178.91 mongos
Step1: Edit the hosts file on each VMs,
[root@configsvr1 ~]# cat /etc/hosts 10.20.178.30 configsvr1 10.20.178.92 configsvr2 10.20.178.96 configsvr3 10.20.178.93 shardsvr1 10.20.178.95 shardsvr2 10.20.178.94 shardsvr3 10.20.178.46 shardsvr4 10.20.178.91 mongos
Step2: Install MongoDB on all instances/VM’s
[root@configsvr1 ~]# mongod --version db version v4.0.9 git version: fc525e2d9b0e4bceff5c2201457e564362909765 OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013 allocator: tcmalloc modules: none build environment: distmod: rhel70 distarch: x86_64 target_arch: x86_64
Step3: Create Config Server Replica Set
Change the DB storage path to your own directory in /etc/mongod.conf. We will use '/data/db1' on all the VM’s, storage: dbPath: /data/db1
Step4: Create the Shard Replica Sets
Change the default storage to your specific directory in /etc/mongod.conf. storage: dbPath: /data/db1
Step 5 – Configure mongos/Query Router
mongos --configdb "replconfig01/configsvr1:27017,configsvr2:27017,configsvr3:27017"
Step 6 – Add shards to mongos/Query Router
mongo --host mongos --port 27017 sh.addShard( "shardreplica01/shardsvr1:27017") sh.addShard( "shardreplica01/shardsvr2:27017") sh.addShard( "shardreplica01/shardsvr3:27017") sh.addShard( "shardreplica01/shardsvr4:27017") mongos> sh.status() --- Sharding Status --- sharding version: { "_id" : 1, "minCompatibleVersion" : 5, "currentVersion" : 6, "clusterId" : ObjectId("5cb8d38dd5a157f0f404f31f") } shards: { "_id" : "shardreplica01", "host" : "shardreplica01/shardsvr1:27017,shardsvr2:27017,shardsvr3:27017,shardsvr4:27017", "state" : 1 } active mongoses: "4.0.9" : 1 autosplit: Currently enabled: yes balancer: Currently enabled: yes Currently running: no Failed balancer rounds in last 5 attempts: 0 Migration Results for the last 24 hours: No recent migrations databases: { "_id" : "config", "primary" : "config", "partitioned" : true }
Performance Validation with YCSB:
YCSB is a popular Java open-source specification and program suite developed at Yahoo! to compare the relative performance of various NoSQL databases. Its workloads are used in various comparative studies of NoSQL databases
Workloads Used: A, B, C.
- Workload A: Update heavy workload: 50/50% Mix of Reads/Writes
- Workload B: Read mostly workload: 95/5% Mix of Reads/Writes
- Workload C: Read-only: 100% reads.
The following command was used to run workload A,B & C where threads were 8, 16, 32, and 64:
/root/YCSB/bin/ycsb run mongodb -s -P /root/YCSB/workloads/workloada -p mongodb.url="mongodb://mongos:27017/ycsb1" -p operationcount=120000000 -p maxexecutiontime=1800 -threads 8 /root/YCSB/bin/ycsb run mongodb -s -P /root/YCSB/workloads/workloadb -p mongodb.url="mongodb://mongos:27017/ycsb1" -p operationcount=120000000 -p maxexecutiontime=1800 -threads 8 /root/YCSB/bin/ycsb run mongodb -s -P /root/YCSB/workloads/workloadc -p mongodb.url="mongodb://mongos:27017/ycsb1" -p operationcount=120000000 -p maxexecutiontime=1800 -threads 8 /root/YCSB/bin/ycsb run mongodb -s -P /root/YCSB/workloads/workloadd -p mongodb.url="mongodb://mongos:27017/ycsb1" -p operationcount=120000000 -p maxexecutiontime=1800 -threads 8
References:
https://www.howtoforge.com/tutorial/deploying-mongodb-sharded-cluster-on-centos-7/
https://github.com/brianfrankcooper/YCSB/wiki/Running-a-Workload
https://www.netapp.com/us/media/tr-4600.pdf