Sync MongoDB to Elasticsearch with Monstache


Hi Everyone! 

I have been reading about Elasticsearch and MongoDB over the last couple of weeks. People in my Telegram group have also been talking about syncing MongoDB data to Elasticsearch. So, I decided to write about syncing MongoDB data to Elasticsearch.


Scenario: In a company, you are uploading and processing a lot of data into MongoDB, and you want to know what type of data you are processing, visualize the changes, and so on.

To achieve this, I am going to use Elasticsearch and Kibana to query and visualize the data from MongoDB. I will use Monstache to sync the data from MongoDB (MongoDB Atlas), and I will use AWS Elasticsearch to present the data.

Please refer to the planned architecture and you will get the big picture.
Planned Architecture

From the above planned architecture:
Elasticsearch (ES)
   Used to view and query the database. It is deployed in private subnets and can only be accessed via the jump server.

Kibana
   Kibana is deployed on top of Elasticsearch to visualize the data. Kibana's login mechanism is secured using Cognito, so any user who wishes to log in to Kibana must have valid login credentials.

Jump Server
   It is a public-facing instance that can be used to RDP into the tunnel server in order to access Kibana.

Master Node
   It is a private instance used to synchronize MongoDB to Elasticsearch using the Monstache daemon.

Monstache
  This is open-source software that is deployed on the Master Node.
Monstache is a sync daemon written in Go that continuously indexes your MongoDB collections into Elasticsearch. It gives you the ability to use Elasticsearch to do complex searches and aggregations of your MongoDB data, and to easily build real-time Kibana visualizations and dashboards.

VPC Peering
Used to peer the two VPCs: the MongoDB Atlas VPC and the Elasticsearch VPC.

Note: In this tutorial, I will only be talking about how to deploy AWS Elasticsearch and AWS Cognito, and how to configure Monstache.
I won't explain how to build the rest of the architecture.

Let's dive in,

Step 01: Configure Cognito for Elasticsearch
Before deploying Elasticsearch, Cognito has to be deployed to provide an authentication mechanism for Kibana access. For this deployment, you must fulfill several prerequisites.

Amazon Cognito Authentication for Kibana requires the following resources:
  • Amazon Cognito user pool
  • Amazon Cognito identity pool
Configure Cognito User Pool
Navigate to Amazon Cognito → click Manage User Pools and create a user pool.

At this point, name your user pool as you wish.

Note: Choosing the right attributes and policies is very important for proper authentication.

In the attributes section, select Username as the sign-in option and select attributes such as name, birthdate, phone number, and email.

In the policies section, set the password strength, whether only administrators are allowed to change passwords, and the temporary password expiration time.


The pool domain is used to redirect users to the Kibana login page, so it is important to create a domain name for your user pool.

Click save changes and deploy your user pool.
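
If you prefer to script the console steps above, a rough CLI sketch is shown below. The pool name, domain prefix, and password policy values here are placeholders, not necessarily the exact settings used in this post:

# create a user pool with a password policy similar to the one described above
aws cognito-idp create-user-pool \
  --pool-name kibana-users \
  --policies "PasswordPolicy={MinimumLength=8,RequireUppercase=true,RequireLowercase=true,RequireNumbers=true,RequireSymbols=true,TemporaryPasswordValidityDays=7}" \
  --admin-create-user-config "AllowAdminCreateUserOnly=true"

# create the hosted domain used for the Kibana login redirect (the prefix must be unique in the region)
aws cognito-idp create-user-pool-domain \
  --domain my-kibana-auth \
  --user-pool-id <user-pool-id-from-previous-output>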

Creating a User in the Cognito User Pool
Now you need to add a user group and a user, and modify the IAM policy, to sign in to Kibana. Navigate to the Amazon Cognito console and choose Users and groups, then choose Create user.
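
The same user creation can be done from the CLI; a small sketch, assuming the user pool ID from the previous step and a placeholder username, email, and temporary password:

# create a Kibana user with a temporary password (the user must change it at first sign-in)
aws cognito-idp admin-create-user \
  --user-pool-id <user-pool-id> \
  --username kibana-user \
  --user-attributes Name=email,Value=user@example.com \
  --temporary-password 'TempPassw0rd!'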

Identity Pool
Navigate to the Amazon Cognito console, choose Manage Identity Pools, and then choose Create new identity pool.
At this point, enable unauthenticated identities and create the identity pool. Once Elasticsearch is deployed, you will change this to an authenticated provider.

Once you click Create Pool, you will be taken to the IAM page, because the Cognito identity pool requires a role to access your resources, so one will be created for you. Do not modify anything in the role section; just click Allow.
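
For reference, the equivalent identity pool creation from the CLI looks roughly like this (the pool name is a placeholder; unauthenticated identities are enabled only temporarily, as described above):

# create the identity pool with unauthenticated access enabled for now
aws cognito-identity create-identity-pool \
  --identity-pool-name kibana_identity_pool \
  --allow-unauthenticated-identities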

Deploy the Elasticsearch Domain
Navigate to the ES console and click Create a new domain. Select the deployment type and the Elasticsearch version (7.4).

Under the Domain configuration tab, provide the Elasticsearch domain name and the data instances (2 nodes, t2.medium). Note: Only create a worker node if necessary (in my case, I don't need one).

Storage
You can choose between EBS and instance stores, depending on which data node type you picked. Instance store is the more performant option, because there is no network overhead when reading from disk, and you aren’t limited on IOPS. EBS generally makes sense for high storage and low access requirements, since it is slower but about half the cost of instance stores.

Network configuration
As we planned, we need to deploy our Elasticsearch domain into a private subnet. In the network configuration, check the VPC access option.
Note: Provide the VPC ID, subnet, and security group.
Security group: HTTP, HTTPS (optionally 8080).
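
If you are creating that security group by hand, a minimal sketch is below; the VPC ID and CIDR are placeholders, and you should only open the ports your setup actually needs:

# security group for the Elasticsearch domain inside the VPC
aws ec2 create-security-group \
  --group-name es-domain-sg \
  --description "HTTP/HTTPS access to the ES domain" \
  --vpc-id <vpc-id>

# allow HTTPS (and optionally HTTP or 8080) from inside the VPC only
aws ec2 authorize-security-group-ingress \
  --group-id <sg-id> \
  --protocol tcp --port 443 --cidr <vpc-cidr>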


Kibana authentication
Select the user pool and identity pool you created earlier and click Next to set the policy for accessing the Elasticsearch domain.
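
Pulling the domain, storage, network, and Cognito settings together, a hedged CLI sketch of the same deployment is below. The instance counts mirror the ones above, while the IDs, volume size, and role ARN are placeholders (the role is the Cognito access role AWS creates for the Elasticsearch/Kibana integration):

aws es create-elasticsearch-domain \
  --domain-name my-es-domain \
  --elasticsearch-version 7.4 \
  --elasticsearch-cluster-config InstanceType=t2.medium.elasticsearch,InstanceCount=2 \
  --ebs-options EBSEnabled=true,VolumeType=gp2,VolumeSize=20 \
  --vpc-options SubnetIds=<private-subnet-id>,SecurityGroupIds=<sg-id> \
  --cognito-options Enabled=true,UserPoolId=<user-pool-id>,IdentityPoolId=<identity-pool-id>,RoleArn=<cognito-access-role-arn>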

Policy for Kibana

Provide a policy that allows Kibana access only for users authenticated through the identity pool (the identity pool's authenticated role).
Note: This role was created during the identity pool creation.


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::123456789012:role/Cognito_identitypoolAuth_Role",
          "arn:aws:iam::123456789012:root"
        ]
      },
      "Action": [
        "es:ESHttp*",
        "es:*"
      ],
      "Resource": "arn:aws:es:region:123456789012:domain/domain-name/*"
    }
  ]
}

(The second principal ARN additionally allows the account's root user.)
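
The same access policy can also be applied (or updated later) from the CLI; the domain name and the policy file path below are placeholders:

# save the policy above as policy.json, then attach it to the domain
aws es update-elasticsearch-domain-config \
  --domain-name my-es-domain \
  --access-policies file://policy.json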


Now, navigate back to your user pool and note down the USER POOL ID (in the Summary section) and the APP CLIENT ID (in the App clients section). Then go back to the identity pool console and select your identity pool. In the top right corner, select Edit Identity Pool.


In the edit section, expand Authentication Providers. Under the Cognito tab, enter the USER POOL ID and APP CLIENT ID, then click Save.
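
If you would rather script this step than use the console, the update looks roughly like this; the IDs and pool name are placeholders, and the region goes into the provider name. Switching off unauthenticated identities here is an assumption that matches the earlier note about moving to an authenticated provider:

# attach the user pool as the authenticated provider for the identity pool
aws cognito-identity update-identity-pool \
  --identity-pool-id <identity-pool-id> \
  --identity-pool-name kibana_identity_pool \
  --no-allow-unauthenticated-identities \
  --cognito-identity-providers ProviderName=cognito-idp.<region>.amazonaws.com/<user-pool-id>,ClientId=<app-client-id>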
Elasticsearch with Kibana authentication is now deployed. At this point, you can check whether authentication for Kibana works.
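
Since the domain lives in a private subnet, one rough way to test the Kibana login from your own machine is an SSH tunnel through the jump server; the hostnames, key file, and local port below are placeholders, and you may have to accept a certificate warning because of the tunneled hostname:

# forward a local port to the VPC endpoint of the Elasticsearch domain via the jump server
ssh -i my-key.pem -N -L 9200:<vpc-es-domain-endpoint>:443 ec2-user@<jump-server-public-ip>

# then open https://localhost:9200/_plugin/kibana/ in a browser;
# you should be redirected to the Cognito hosted login page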
Step 02: Install Monstache on the Worker Node (VM)

Once all the instances and the Elasticsearch domain are deployed properly, configure Monstache on the worker node to sync MongoDB to our deployed Elasticsearch domain.
Note: For the Monstache installation, you must have an internet connection from your private instance. In this case, create a NAT gateway for your private instances.
SSH into the worker node from the jump server and follow the steps below to install Monstache.
  • Install GoLang
# download and extract the Go toolchain (Monstache needs a reasonably recent go1.x release, so substitute a newer archive if required)
sudo wget https://storage.googleapis.com/golang/go1.4.2.linux-amd64.tar.gz
tar -xzf go1.4.2.linux-amd64.tar.gz
# point GOROOT at the directory where you extracted Go and set up a workspace
export GOROOT=PATH_WHERE_YOU_EXTRACT_GO
export PATH=$PATH:$GOROOT/bin
export GOBIN=$GOROOT/bin
mkdir ~/golang/
export GOPATH=~/golang/
export PATH=$GOPATH/bin:$PATH

  • Download the current version of the Monstache source from GitHub.
(Read the instructions in the Monstache documentation before installing.)
Now cd into the downloaded directory and run:

go install

monstache -version
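
As an alternative to cloning the source, a recent Go toolchain (1.16 or newer) can usually install Monstache straight from its module path. The module path below is an assumption based on the project's GitHub repository, so double-check it against the Monstache docs:

# install the latest v6 release into GOPATH/bin (module path assumed)
go install github.com/rwynn/monstache/v6@latest
monstache -version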

Step 03: Configure Monstache to sync the database

Once Monstache is installed, create the mongo-elastic.toml file in the desired directory:

[linxe@root#] sudo vim /opt/mongo-elastic/mongo-elastic.toml

mongo-url = "Mongo Connection String"
elasticsearch-urls = ["ElasticSearch Domain Endpoint url"]
elasticsearch-max-conns = 10
replay = false
resume = true
resume-name = "default"
direct-read-namespaces = ["ops_test.documents"]
change-stream-namespaces = ["ops_test.documents"]
index-as-update = true
dropped-collections = true
dropped-databases = true
verbose = true

[logs]
info = "/etc/monstache-build/logs/monstache_info.log"
warn = "/etc/monstache-build/logs/monstache_warn.log"
error = "/etc/monstache-build/logs/monstache_error.log"
trace = "/etc/monstache-build/logs/monstache_trace.log"

[aws-connect]
access-key = "xyz"
secret-key = "abc"
region = "ap-south-1"

In the above configuration, the logs matter: Monstache supports logging, so make sure you configure it. Read the full configuration document here.

When you sync data to Elasticsearch, the instance (or Monstache itself) must have permission to access the Elasticsearch domain. So, create an IAM user or IAM role that can invoke the Elasticsearch domain. In my case, I have created an [aws-connect] TOML table to pass my IAM credentials.

Policy for aws-connect

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "es:DescribeReservedElasticsearchInstanceOfferings",
        "es:ListElasticsearchInstanceTypeDetails",
        "es:CreateElasticsearchServiceRole",
        "es:RejectInboundCrossClusterSearchConnection",
        "es:PurchaseReservedElasticsearchInstanceOffering",
        "es:DeleteElasticsearchServiceRole",
        "es:AcceptInboundCrossClusterSearchConnection",
        "es:DescribeInboundCrossClusterSearchConnections",
        "es:DescribeReservedElasticsearchInstances",
        "es:ListDomainNames",
        "es:DeleteInboundCrossClusterSearchConnection",
        "es:ListElasticsearchInstanceTypes",
        "es:DescribeOutboundCrossClusterSearchConnections",
        "es:ListElasticsearchVersions",
        "es:DescribeElasticsearchInstanceTypeLimits",
        "es:DeleteOutboundCrossClusterSearchConnection"
      ],
      "Resource": "*"
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "es:*",
      "Resource": "arn:aws:es:<region>:<accountid>:domain/<es-domain-name>/*"
    }
  ]
}

Note: It is a best practice to attach an IAM role to the instance instead of creating programmatic access keys and attaching them to the instance.

Configure Monstache at startup

Monstache has built-in support for integrating with systemd. The following monstache.service is an example systemd configuration. Before configuring ExecStart, check where the monstache binary and the mongo-elastic.toml file are located.

[Unit]
Description=monstache sync service

[Service]
Type=notify
ExecStart=/usr/local/go/bin/monstache -f /opt/mongo-elastic/mongo-elastic.toml
WatchdogSec=30s
Restart=always

[Install]
WantedBy=multi-user.target

WatchdogSec=30s tells systemd to expect a keep-alive notification from Monstache every 30 seconds and to restart the service if it stops responding, so the MongoDB sync keeps running. Read the documentation here.
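
Once the unit file is in place (for example at /etc/systemd/system/monstache.service, a path I am assuming here), enabling and watching the service looks like this. The final curl is only a rough sanity check and assumes the worker node is allowed to query the domain endpoint (requests may need SigV4 signing depending on your access policy):

# reload systemd, then start Monstache now and at every boot
sudo systemctl daemon-reload
sudo systemctl enable monstache
sudo systemctl start monstache

# follow the sync daemon's output
journalctl -u monstache -f

# rough check that the MongoDB namespace has been indexed into Elasticsearch
curl -s "https://<es-domain-endpoint>/_cat/indices?v"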

That's pretty much it, guys. This was a bit of a long blog, but I believe this content is very important to cover.


Reference

1. Amazon Cognito Authentication for Kibana [ https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-cognito-auth.html ]

2. How to Install golang on CentOS / Amazon Linux [ https://gist.github.com/dasgoll/5e1a46cb3e7a35746bb88f386fcf78fa ]

3. Monstache Installation [ https://rwynn.github.io/monstache-site/start/ ]

4. Monstache Configuration [ https://rwynn.github.io/monstache-site/advanced/#systemd ] 

5. ElasticSearch Domain Installation [ https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-createupdatedomains.html ]

6. Toml Scripting [ https://www.w3schools.io/file/toml-introduction/ ]
