REST Lambda Versioning

So, we want to build a "serverless" application on AWS. We have configured Amazon's "API Gateway" to invoke Lambda functions when certain URLs are accessed, and the lambda functions access back-ends like DynamoDB. So far so good. But we don't want to break the application every time we change some code. Instead, we want separate dev and prod versions so we can experiment with the dev version, then move it to prod once it is working.

To be specific, we want two things. First, when we are using it in "dev" mode, we want the lambda function to access a "dev" set of data by reading from a separate set of DynamoDB tables. We can create two sets of DynamoDB tables, named table-name-dev and table-name-prod, but we somehow need to let the code inside of the Lambda function know which version to connect to: we need to pass the environment to the code of the lambda function.
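As a minimal sketch of that naming convention: the helper below (the name `table_name` is made up for illustration) turns a base table name and an environment into the name of the DynamoDB table to use. It assumes that "dev" and "prod" are the only environments.

```python
VALID_ENVS = ("dev", "prod")  # assumption: only these two environments exist


def table_name(base_name, env):
    """Return the environment-specific table name, e.g. 'table-name-dev'."""
    if env not in VALID_ENVS:
        # Fail loudly rather than silently touching the wrong set of tables.
        raise ValueError(f"unknown environment: {env!r}")
    return f"{base_name}-{env}"
```

Rejecting unknown values is a design choice: a typo in the environment name then causes an error instead of a read from a table that does not exist.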

The second thing we want is to be able to change the code of the lambda (in order to experiment in dev) without breaking the prod version. That means that API Gateway needs to call different versions of the lambda depending on whether it is being invoked in dev or prod mode. Lambda has a facility for this, called aliases: whenever you edit a lambda function it modifies a particular version called "$LATEST", but you can use the option to "publish a new version" and make a permanent, immutable copy of that version numbered 1, 2, etc. (it increments the number each time you publish). Then you can create an alias like "dev" or "prod" and point each alias at a particular published version. So we want API Gateway to call an appropriately aliased version of the lambda based on whether it is being invoked in dev or prod mode.
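To make the version and alias behavior concrete, here is a toy in-memory model. It is not the AWS API, and the class and method names are invented for illustration. It shows only the semantics described above: edits change "$LATEST", publishing takes a numbered snapshot, and an alias is a pointer to one snapshot.

```python
class ToyLambda:
    """Toy model of one Lambda function's versions and aliases (not the real API)."""

    def __init__(self, code):
        self.versions = {"$LATEST": code}
        self.aliases = {}
        self._next_version = 1

    def edit(self, code):
        # Editing only ever changes $LATEST.
        self.versions["$LATEST"] = code

    def publish_version(self):
        # Snapshot $LATEST under the next number; the snapshot is never edited.
        number = str(self._next_version)
        self.versions[number] = self.versions["$LATEST"]
        self._next_version += 1
        return number

    def set_alias(self, alias, version):
        if version not in self.versions:
            raise KeyError(f"no such version: {version}")
        self.aliases[alias] = version

    def resolve(self, qualifier="$LATEST"):
        # A qualifier is either an alias name or a version identifier.
        version = self.aliases.get(qualifier, qualifier)
        return self.versions[version]
```

In this model, pointing "prod" at version 1 and then continuing to edit and publish leaves whatever "prod" resolves to unchanged, which is the property we want.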

Finally, API Gateway needs to know whether it is being invoked in dev or prod mode. Here, we want to use the API Gateway feature called "stages". We can create two stages called "dev" and "prod", and each time we "publish" the API Gateway settings (the console calls this "Deploy API") it will ask which stage we are publishing to. The URL it generates to invoke the API will have the stage name embedded in it, so that's how the client code specifies whether to use dev or prod: by selecting "dev" or "prod" for the stage in the URL.
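For illustration, the stage appears as the first path segment of the invoke URL. The sketch below builds such a URL; the API id, region, and resource path in the usage note are hypothetical placeholders, not values from a real deployment.

```python
def invoke_url(rest_api_id, region, stage, resource_path):
    """Build the invoke URL for one resource in one stage of a REST API."""
    path = resource_path.lstrip("/")  # accept 'people' as well as '/people'
    return f"https://{rest_api_id}.execute-api.{region}.amazonaws.com/{stage}/{path}"
```

A client would call `invoke_url("abc123", "us-east-1", "dev", "/people")` while experimenting and switch the third argument to "prod" for real use.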

So we have a plan -- now to determine how to implement it. We just explained how clients will inform API Gateway what environment is desired, so what remains is to decide how API Gateway will inform the lambda functions what environment is desired (so they can connect to the proper data sources) and how API Gateway will invoke the correct aliased version of the lambda function.

We can go to the settings for a stage of our API in API Gateway and edit the "Stage Variables". In the dev stage, we can create one with a name "envName" and a value of "dev", while the prod stage has one with a name "envName" and a value of "prod". Assuming that the "Integration Request" settings for the API endpoint are connecting to a Lambda Function using "Lambda Proxy integration", the "stage variables" will be included as fields in the event that is passed to the lambda function. For instance, in Python we can execute the following:

def lambda_handler(event, context):
    # With Lambda Proxy integration, stage variables arrive in the event under 'stageVariables'.
    env = event['stageVariables']['envName']

This sets "env" to either "dev" or "prod", which can then be used to connect to the proper tables.
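A slightly fuller sketch of such a handler follows. It assumes Lambda Proxy integration, falls back to "dev" when no stage variable is present (that default is my assumption, chosen so a misconfigured stage never touches prod data), and returns the status/headers/body dictionary that proxy integration expects. The helper name `get_env` is made up for illustration, and the handler only reports the table name rather than querying DynamoDB.

```python
import json


def get_env(event, default="dev"):
    """Read envName from the stage variables, tolerating a missing or null field."""
    stage_variables = event.get("stageVariables") or {}
    return stage_variables.get("envName", default)


def lambda_handler(event, context):
    env = get_env(event)
    table = f"table-name-{env}"  # naming convention from earlier in this post
    # A real handler would query the table here; this sketch just echoes the choice.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"env": env, "table": table}),
    }
```

The `or {}` matters because the "stageVariables" field can be null when a stage defines no variables, in which case indexing into it directly would raise an error.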

Invoking the correct alias of the lambda function is a bit more difficult. First, of course, we need to publish the lambda and associate the proper aliases. This is made a little harder by the fact that from the web console there appears to be no way (that I can find) to re-assign an alias to a new version (despite there being an API call, UpdateAlias, for that). But one can delete an alias and then re-create it. After that, in the API Gateway settings, if we go to the endpoint ("resource") that we want and go to configure the "Integration Request", there is a field for specifying the "Lambda Function". NORMALLY, this contains the name of a lambda function (e.g. "teamdocket-person-list"), which means that it should invoke the $LATEST version of that lambda. But an undocumented feature is that you can instead put the name of a lambda, then a colon, then an alias name: so putting "teamdocket-person-list:dev" will cause it to invoke whichever version of that lambda is currently bound to the "dev" alias.
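Since the console makes re-pointing an alias awkward, here is a sketch of doing it through the API instead. It assumes a boto3 Lambda client is passed in (created with `boto3.client("lambda")`) and that the alias already exists; the function name `publish_and_point_alias` is invented for this example. `publish_version` and `update_alias` are the boto3 names for the PublishVersion and UpdateAlias calls.

```python
def publish_and_point_alias(lambda_client, function_name, alias_name):
    """Publish $LATEST as a new numbered version and move an existing alias to it."""
    published = lambda_client.publish_version(FunctionName=function_name)
    version = published["Version"]  # a string such as "3"
    # Assumption: the alias already exists. For a brand-new alias,
    # create_alias takes the same three arguments.
    lambda_client.update_alias(
        FunctionName=function_name,
        Name=alias_name,
        FunctionVersion=version,
    )
    return version
```

With real credentials this would be called as `publish_and_point_alias(boto3.client("lambda"), "teamdocket-person-list", "dev")`. Taking the client as a parameter also makes it easy to try the function against a stub first.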

ALSO not so well documented is the fact that you can use certain meta-characters to interpolate values into fields like this. We can use this to create something that will invoke the correct lambda function for the environment we want. We can set the "Lambda Function" field to the following (for example): "teamdocket-person-list:${stageVariables.envName}", and *I THINK* that the dev stage (where "envName" is "dev") will attempt to call the version of the lambda with the dev alias, while the prod stage (where "envName" is "prod") will attempt to call whichever version has the prod alias.
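To show what that interpolation is expected to produce, here is a small simulation. It is not API Gateway's actual implementation, just a regular-expression substitution that mimics the `${stageVariables.name}` syntax; the function name `resolve_function_field` is made up.

```python
import re

# Matches ${stageVariables.someName} and captures someName.
_STAGE_VAR = re.compile(r"\$\{stageVariables\.([A-Za-z0-9_]+)\}")


def resolve_function_field(template, stage_variables):
    """Substitute stage variables into a 'Lambda Function' field value."""
    def substitute(match):
        # A missing variable raises KeyError instead of yielding a broken name.
        return stage_variables[match.group(1)]

    return _STAGE_VAR.sub(substitute, template)
```

In the dev stage the template above would resolve to "teamdocket-person-list:dev". If the call fails with a permissions error, one thing to check is whether API Gateway has been granted permission to invoke each alias: when a stage variable appears in this field, the console shows an `aws lambda add-permission` command to run for each one.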

There is an incredibly important thing to know. After publishing a new API version, THINGS WILL NOT WORK for some amount of time (I don't know how long), and the result is that your changes will SEEM to be broken even when they are correct. You will lose hours and hours of time thinking you are doing the wrong thing. I recommend waiting 15 minutes after each change to API Gateway is published. Seriously.
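Rather than watching the clock, one option is to poll until the change shows up. The sketch below is a generic retry loop, not anything AWS-specific: `check` is a caller-supplied function (hypothetical here) that returns True once the endpoint behaves as expected, and the 15-minute default timeout simply mirrors the wait suggested above.

```python
import time


def wait_until(check, timeout_seconds=900, interval_seconds=30,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns True or the timeout passes."""
    deadline = clock() + timeout_seconds
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_seconds)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without really waiting. In normal use they are left at their defaults.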

Posted Sun 02 July 2017 by mcherm in Programming