Accessing a MongoDB instance from AWS Lambda using Python

by Emre Yilmaz
Jun 8, 2018



I recently experimented with connecting to MongoDB databases from AWS Lambda functions using Python. In this post I will share what I learned and keep some notes for future reference. We will install MongoDB on an EC2 instance and develop simple Python functions to access it. Let’s start!

About MongoDB on AWS

Unlike relational databases such as MySQL or PostgreSQL, MongoDB is not offered by AWS as a managed service at the time of writing. Although AWS provides whitepapers and examples on how to deploy MongoDB on Amazon EC2 instances, it promotes DynamoDB as its NoSQL database. Therefore, you need to install and run a MongoDB server yourself on an Amazon EC2 instance.

Installing MongoDB on Amazon Linux 2

First of all, note that this post is not about deploying MongoDB on EC2 according to best practices. I will only prepare a MongoDB database for experiments and connect to it from AWS Lambda functions. With that said, let’s continue and launch an Amazon EC2 instance running the Amazon Linux 2 operating system.

Before installing MongoDB, we need to add the MongoDB repository that provides version 3.6.

sudo nano /etc/yum.repos.d/mongodb-org-3.6.repo

Then add the values below and save the file.

[mongodb-org-3.6]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/amazon/2013.03/mongodb-org/3.6/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-3.6.asc

And install MongoDB.

sudo yum install -y mongodb-org

By default, MongoDB only accepts connections from the local instance, much like MySQL server. To allow connections from outside, such as from AWS Lambda, we should remove this constraint. We will protect connections to our MongoDB instance using VPC security groups.

In the /etc/mongod.conf file, you will see the setting bindIp: 127.0.0.1 under the net section, which should be changed to bindIp: 0.0.0.0. This makes MongoDB listen on all network interfaces and so allows connections from outside. I tried commenting the line out instead, but that did not help and connections still failed.

After making this change, start MongoDB:

sudo service mongod start

After completing these steps, your MongoDB server will be ready to accept connections. However, we should create a security group that allows AWS Lambda to connect on port 27017, the default MongoDB port, and denies all other connections.

Security Groups for AWS Lambda and MongoDB

If you recall my post about accessing RDS from AWS Lambda, you may remember that we need to define security groups and subnets for our AWS Lambda function. This is because our MongoDB instance is inside a VPC.

Now, create two security groups as below:

A security group for AWS Lambda functions with no inbound rule. Let’s call it lambda-sg.
A security group for MongoDB instances that allows inbound connections on custom TCP port 27017 from the Lambda security group (lambda-sg). Let’s call it mongodb-sg.

Attach mongodb-sg to the EC2 instance running MongoDB. While deploying your AWS Lambda functions, attach lambda-sg to them and select all subnets in the same VPC as your MongoDB instance. You should then be able to connect to the MongoDB instance from your AWS Lambda function on port 27017.
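Before involving PyMongo at all, it can help to confirm that the security groups really let traffic through. Below is a minimal sketch using only Python’s standard library. The function name can_reach and its default timeout are my own choices for illustration.

```python
import socket


def can_reach(host, port=27017, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling can_reach with the private IP of the MongoDB instance from inside a test Lambda function should return True once lambda-sg and mongodb-sg are set up as described. A False result usually points to a security group or subnet problem rather than a MongoDB one.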

AWS Lambda functions

Now that our MongoDB database is ready, let’s implement simple AWS Lambda functions that create, update, delete and retrieve users. In these functions, we will use the PyMongo module, which provides tools for working with MongoDB databases.

Deployment notes

Deploying the Lambda functions in this example is no different from deploying any function that accesses VPC resources. You should attach an IAM role with the AWSLambdaVPCAccessExecutionRole managed policy, along with the appropriate security groups and subnets, to your functions. If you need more information, this previous blog post might be useful for you.

AWS Lambda function for creating users

import logging
import json
import bson
import os
from pymongo import MongoClient

# Logger settings - CloudWatch
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Set client
client = MongoClient('mongodb://{}:27017/'.format(os.environ['DB_HOST']))

# Set database
db = client.test_database

def handler(event, context):
    logger.info("Received event: " + json.dumps(event, indent=2))

    logger.info("initializing the collection")
    users = db.users

    logger.info("creating the user...")
    user = {
        "first_name": event["first_name"],
        "last_name": event["last_name"],
        "email": event["email"]
    }
    user_id = users.insert_one(user).inserted_id

    # Get created document from the database using ID.
    user = users.find_one({ "_id": user_id })

    return json.loads(json.dumps(user, default=json_unknown_type_handler))

def json_unknown_type_handler(x):
    """
    json.dumps cannot serialize ObjectId values, so convert them to strings.
    Any other unsupported type raises TypeError, as json.dumps expects.
    """
    if isinstance(x, bson.ObjectId):
        return str(x)
    raise TypeError("Cannot serialize type: {}".format(type(x).__name__))

As you can see, it is a simple function that assumes the first_name, last_name and email attributes are all provided in the event when the function is called.
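If you would rather not rely on that assumption, a small check can run before the insert. The sketch below is not part of the original function. The names REQUIRED_FIELDS and validate_create_event are hypothetical, and the rule it applies (every field must be a non-empty string) is an assumption you may want to change.

```python
REQUIRED_FIELDS = ("first_name", "last_name", "email")  # fields the create handler reads


def validate_create_event(event):
    """Raise ValueError if a required attribute is missing or blank."""
    missing = [
        name for name in REQUIRED_FIELDS
        if not isinstance(event.get(name), str) or not event[name].strip()
    ]
    if missing:
        raise ValueError("Missing or empty attributes: " + ", ".join(missing))
    # Return only the known fields so stray keys never reach the database.
    return {name: event[name].strip() for name in REQUIRED_FIELDS}
```

Inside the handler, the returned dictionary could be passed to insert_one in place of the hand-built user document.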

MongoDB creates databases and collections lazily. If they do not exist when a document is created, it creates them at that stage. In the code, we construct a user document, pass it to the collection’s insert_one method and read the new id from the inserted_id attribute of the result. Finally, we retrieve the created document from the database using this id to verify that it was created, and return it in the response.

Python’s json module is unable to serialize ObjectId values, which is the type of the _id attribute of MongoDB documents. To solve this, we pass a helper function (json_unknown_type_handler) to json.dumps when returning the created document in JSON format.
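The helper in the function only converts ObjectId. If your documents also carry dates or decimal numbers, a broader handler may be useful. This is a sketch of my own, not the helper used above. The name json_default is hypothetical, and the try/except around the bson import is only there so the snippet also runs where PyMongo is not installed.

```python
import datetime
import decimal
import json

try:
    from bson import ObjectId  # shipped with PyMongo
except ImportError:  # lets the sketch run where PyMongo is not installed
    ObjectId = None


def json_default(value):
    """Convert values that json.dumps cannot serialize on its own."""
    if ObjectId is not None and isinstance(value, ObjectId):
        return str(value)
    if isinstance(value, (datetime.datetime, datetime.date)):
        return value.isoformat()
    if isinstance(value, decimal.Decimal):
        return str(value)  # a string keeps the exact decimal digits
    raise TypeError("Cannot serialize type {}".format(type(value).__name__))
```

It is used the same way as the original helper: json.dumps(document, default=json_default). PyMongo also ships the bson.json_util module, which serializes documents to MongoDB Extended JSON if you prefer that format.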

In the part that initializes the collection, you see users = db.users. Here, users on the right is the collection name, and the expression is equivalent to users = db["users"]. It simply accesses the collection named users in MongoDB.

A note on database host IP address

We get the MongoDB host IP from the DB_HOST environment variable. You should set this environment variable to the private IP of your MongoDB instance while deploying your AWS Lambda function.

Why should you use the private IP instead of the public one? If you provide the public IP address, the AWS Lambda function has to traverse the Internet to reach your instance. A Lambda function attached to a VPC has no Internet access unless you configure it, for example with a NAT gateway, so the connection will time out. It is best to use the private IP address.
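To catch a public address in DB_HOST early instead of waiting for a timeout, a guard like the one below could run at start-up. It is only a sketch using the standard ipaddress module, and the function name is_private_ipv4 is hypothetical.

```python
import ipaddress


def is_private_ipv4(host):
    """Return True if host is a literal IPv4 address that is not publicly routable."""
    try:
        return ipaddress.IPv4Address(host).is_private
    except ipaddress.AddressValueError:
        return False  # not a literal IPv4 address, e.g. a DNS name
```

For example, the function could log a warning when is_private_ipv4(os.environ["DB_HOST"]) is False. Note that a private DNS name would also return False here, so treat the result as a hint rather than a hard rule.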

AWS Lambda function for updating users

This function uses the user’s email to find the document in the MongoDB database and updates it with the first_name or last_name values provided. I only include the handler function below, as the other parts are the same as in the create function.


def handler(event, context):
    logger.info("Received event: " + json.dumps(event, indent=2))

    logger.info("initializing the collection")
    users = db.users

    user = {}
    if 'first_name' in event:
        user["first_name"] = event["first_name"]
    if 'last_name' in event:
        user["last_name"] = event["last_name"]

    logger.info("updating the user...")
    response = users.update_one({
            "email": event["email"]
        },
        {
           "$set": user
        },
        upsert=False
    )

    # Get updated document from the database using the email.
    user = users.find_one({ "email": event["email"] })

    return json.loads(json.dumps(user, default=json_unknown_type_handler))

Here, I kept the function simple and did not implement any validations, but you should definitely add them if you are developing for production.

In this function, the update_one method does the trick. It takes three parameters:

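A filter, given as a Python dictionary, to find the document. Here we filtered using the email, but we can use the _id value as well. Because _id is stored as an ObjectId, an id received as a string must be converted first. The filter would then become:

{
    "_id": bson.ObjectId(event["id"])
}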

The attributes to be updated and their values, in a dictionary under the $set operator. For example, if you provide only first_name with the value Emre, this becomes:

{
    "$set": {
        "first_name": "Emre"
    }
}

Whether a new document should be inserted if no matching document is found. I chose not to insert by passing upsert=False.

In the end, the function retrieves the updated document and returns it.
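The first two parameters can also be built by a small helper, which makes it easy to reject events that contain nothing to update. The sketch below is my own variation on the handler. The names UPDATABLE_FIELDS and build_update are hypothetical.

```python
UPDATABLE_FIELDS = ("first_name", "last_name")  # fields the handler may change


def build_update(event):
    """Return (filter, update) arguments for update_one, built from the event."""
    changes = {name: event[name] for name in UPDATABLE_FIELDS if name in event}
    if not changes:
        # Older MongoDB servers reject an empty $set, so fail early instead.
        raise ValueError("Nothing to update")
    return {"email": event["email"]}, {"$set": changes}
```

In the handler this could be used as users.update_one(*build_update(event), upsert=False). Keys in the event that are not listed in UPDATABLE_FIELDS are ignored, so a caller cannot overwrite arbitrary attributes.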

AWS Lambda function for retrieving a single user

In this function, I use the user’s email to retrieve the document from the database. It is a short, simple function. Again, I only include the handler part of the Lambda function.

def handler(event, context):
    logger.info("Received event: " + json.dumps(event, indent=2))

    logger.info("initializing the collection")
    users = db.users

    logger.info("retrieving the user...")
    user = users.find_one({ "email": event["email"] })

    return json.loads(json.dumps(user, default=json_unknown_type_handler))

This function uses the find_one method of the collection, which we already used in the previous functions. It takes a filter as a dictionary; here we filter by the email attribute. If no document matches, find_one returns None.

AWS Lambda function for deleting a single user

To delete the user, I will use their email. It is a short, simple function, and I include all of it, as it only deletes the document and returns a response indicating a successful operation.

import logging
import json
import os
from pymongo import MongoClient

# Logger settings - CloudWatch
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Set client
client = MongoClient('mongodb://{}:27017/'.format(os.environ['DB_HOST']))

# Set database
db = client.test_database

def handler(event, context):
    logger.info("Received event: " + json.dumps(event, indent=2))

    logger.info("initializing the collection")
    users = db.users

    logger.info("deleting the user...")
    response = users.delete_one({ "email": event["email"] })

    return { "operation": "success" }

The delete_one method is used for deletion and it is very similar to find_one. It takes a filter as a dictionary and, again, you can use _id instead of email.
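The function above reports success even when no document matched the email. The result returned by delete_one has a deleted_count attribute that tells the two cases apart. Below is a minimal sketch of how the response could reflect it. The function name describe_delete and the "not_found" label are my own.

```python
def describe_delete(result):
    """Map a delete result onto a small response dictionary."""
    # For delete_one, deleted_count is 1 when a document matched, otherwise 0.
    deleted = getattr(result, "deleted_count", 0)
    return {
        "operation": "success" if deleted else "not_found",
        "deleted_count": deleted,
    }
```

In the handler, the last line would then become return describe_delete(response).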

Additional notes for Python and MongoDB

Listing documents in a collection

Retrieving all documents from MongoDB using Python is simple:

for user in users.find():
    print(user)

To filter the documents, just pass a filter to the find method:

for user in users.find({ "first_name": "Emre" }):
    print(user)

This code retrieves and prints all user documents having Emre as the first name.

Adding a new attribute

Unlike relational databases, NoSQL databases are very convenient if your documents have different numbers of attributes. You can easily add a new field while updating the document. For example, let’s say that our document does not have an attribute named link and we would like to store link information for some users. All we need to do is add the new attribute and its value under the $set operator in the update_one call.

users.update_one({
        "email": event["email"]
    },
    {
        "$set": {
            "link": "https://some-url"
        }
    },
    upsert=False
)

Removing an attribute from a document

Adding a new attribute was simple, right? So is removing an existing one. Now, let’s say that we need to remove the link attribute previously added. In this case we use the $unset operator and, as the name suggests, it unsets a set attribute. We provide the attribute name with an empty value and it’s gone.

users.update_one({
        "email": event["email"]
    },
    {
        "$unset": {
            "link": ""
        }
    },
    upsert=False
)
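$set and $unset can also appear together in a single update document, as long as they do not touch the same attribute. The helper below is only a sketch for building such a document. The name build_field_update and its parameters are hypothetical.

```python
def build_field_update(set_fields=None, unset_fields=()):
    """Build one update document that sets some attributes and removes others."""
    set_fields = dict(set_fields or {})
    overlap = set(set_fields) & set(unset_fields)
    if overlap:
        # MongoDB reports a conflict when two operators touch the same field.
        raise ValueError("Fields both set and unset: " + ", ".join(sorted(overlap)))
    update = {}
    if set_fields:
        update["$set"] = set_fields
    if unset_fields:
        # The value given to $unset is ignored; an empty string is conventional.
        update["$unset"] = {name: "" for name in unset_fields}
    return update
```

The result can be passed as the second argument of update_one, for example build_field_update({"first_name": "Emre"}, ["link"]) to rename a user and drop the link in one round trip.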

Connecting to a MongoDB instance using authentication

In my examples, I used unauthenticated connections and protected the connection between the AWS Lambda functions and the MongoDB instance using VPC security groups. However, it is good practice to secure your environment in all layers. So, in case you activate authentication on your MongoDB instance, the only thing you need to do is edit your client’s initialization line.

    client = MongoClient('mongodb://{}:{}@{}:27017/'.format(
            os.environ["DB_USERNAME"],
            os.environ["DB_PASSWORD"],
            os.environ["DB_HOST"],
        )
    )

This connection retrieves username and password values from DB_USERNAME and DB_PASSWORD environment variables respectively.
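One caveat: if the username or password contains characters such as @, : or /, they must be percent-escaped before being placed in the URI, otherwise the connection string cannot be parsed correctly. Below is a minimal sketch using quote_plus from the standard library. The function name build_mongo_uri is hypothetical.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, password, host, port=27017):
    """Return a mongodb:// URI with percent-escaped credentials."""
    return "mongodb://{}:{}@{}:{}/".format(
        quote_plus(username), quote_plus(password), host, port
    )
```

The client line would then read client = MongoClient(build_mongo_uri(os.environ["DB_USERNAME"], os.environ["DB_PASSWORD"], os.environ["DB_HOST"])).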

Conclusion

I enjoyed setting up a MongoDB database on an EC2 instance in minutes and trying Python’s pymongo module within AWS Lambda. Nowadays, NoSQL databases are very popular and MongoDB is among them.

While deploying these Lambda functions, I used the AWS Serverless Application Model to simplify and automate the process, and I plan to discuss it soon along with SAM CLI.

I hope it was a useful post and you enjoyed it, too.

Thanks for reading!