Musings of Rodos

I stumbled across another AWS blogger, Eric Hammond, who blogs at https://alestic.com.

One of the recent things Eric has done is his Unreliable Town Clock (UTC), which you can use to schedule the triggering of AWS Lambda functions. It's a cool idea.

Eric certainly knows what he is doing. He not only launched a service, he sat down and ensured "this service is as reliable as I can reasonably make it". No wonder he is an AWS Community Hero!

Of course reliability is only one of the elements of an architectural review of an AWS environment. You should cover off such things as Security, Availability, Scalability and Cost Efficiency. Eric has covered some of this. Check out what he has done to ensure UTC is always up and running, there are some great tips in there.

What if you wanted to do an architectural review of your AWS environment? How would you go about that? What questions would you ask? What things require focus? Maybe post in the comments. Saying "I will call my friendly AWS Solution Architect" is cheating, although it's a great idea.

Two items that will really help you get started with a review are these whitepapers.

What would you do beyond this? Here are some small things I would investigate.

Auditing. Are CloudTrail, Config and VPC Flow Logs all turned on? It's hard to do debugging or forensics on something in the past when you were not capturing the data. Is all the activity from the instance logged to CloudWatch Logs?
What dependencies are there that might cause a deployment to fail? That autoscaling group may relaunch an instance if it fails. What AMI is it using? Is it your own AMI sitting in the account, or are you launching from a public one? What if the public one goes away because a new one is released? How is the code deployed into that AMI? Is it baked in, or coming from S3? Does it need to download software from GitHub, and what if it can't?
Monitoring. There are 4 metrics in CloudWatch for SNS. Are there any alarms that could be created to alert on failure? What if the number of published messages dropped below a certain rate? An alarm like that could replace what Eric is using Cronitor.io for. You can even create those alarms with CloudFormation!
Turning on MFA is always a great idea.
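
The SNS alarm mentioned above could look something like the following. This is a minimal sketch and not Eric's actual setup: the topic name `my-chime-topic`, the alarm name and the 15 minute period are assumptions for illustration. It only builds the parameter object accepted by the CloudWatch `putMetricAlarm` call in the AWS SDK, so it runs without any credentials.

```javascript
// Build parameters for a CloudWatch alarm that fires when an SNS topic
// goes quiet. Topic name, alarm name and period are hypothetical values.
function buildQuietTopicAlarm(topicName, periodSeconds) {
    return {
        AlarmName: 'sns-quiet-' + topicName,
        Namespace: 'AWS/SNS',
        MetricName: 'NumberOfMessagesPublished',
        Dimensions: [{ Name: 'TopicName', Value: topicName }],
        Statistic: 'Sum',
        Period: periodSeconds,
        EvaluationPeriods: 1,
        Threshold: 1,
        ComparisonOperator: 'LessThanThreshold',
        // A silent topic reports no data points at all, so treat missing
        // data as a breach rather than as "insufficient data".
        TreatMissingData: 'breaching'
    };
}

var params = buildQuietTopicAlarm('my-chime-topic', 900);
console.log(JSON.stringify(params, null, 2));
```

To actually create the alarm you would pass `params` to `putMetricAlarm` on a `CloudWatch` client from the aws-sdk package, and probably add an `AlarmActions` entry pointing at a notification topic.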

This is the simplest of examples. For your typical system there are hundreds of review items to assess. But you get the idea.

Doing an architectural review is something you should do periodically in your AWS environment. As AWS keeps releasing new features, there are frequently new things you can do to improve your setup.

If only everyone was like Eric! Also, anyone who builds everything in CloudFormation is a winner in my book!

Rodos

Today's post is about an AWS service I have been having some fun with, Lambda.

Essentially, Lambda is a service which executes your code within milliseconds of an "event" happening. An event may be your own action, or it can be triggered by actions in other AWS services such as S3, DynamoDB or Kinesis. The great thing is there is no infrastructure to build or run, and you pay only for the requests served and the compute time required to run your code. Billing is metered in increments of 100 milliseconds! It's "way cool". You can read all about it on the product page if you need an introduction. But this post is not about what's so cool about Lambda.

What I wanted to cover is that you need to make sure the functions you write are idempotent. Idempotency in software "describes an operation that will produce the same results if executed once or multiple times". "It means that an operation can be repeated or retried as often as necessary without causing unintended effects."
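
As a small illustration of that definition, here is a sketch contrasting an operation that is not idempotent with one that is. The function names are made up for the example and have nothing to do with the Lambda API.

```javascript
// Not idempotent: every repeated call changes the result again.
function addVisit(counter) {
    counter.visits += 1;
    return counter;
}

// Idempotent: repeating the call with the same input leaves the same state.
function markSeen(seen, tweetId) {
    seen[tweetId] = true;
    return seen;
}

var counter = { visits: 0 };
addVisit(counter);
addVisit(counter); // a retry counts the visit twice
console.log(counter.visits); // 2

var seen = {};
markSeen(seen, '123');
markSeen(seen, '123'); // a retry changes nothing
console.log(Object.keys(seen).length); // 1
```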

Why is this important to remember with Lambda? Well there is some text in the documentation and FAQ that sort of explains why.

From the documentation. [highlight is mine]

Your Lambda function code must be written in a stateless style, and have no affinity with the underlying compute infrastructure. Your code should expect local file system access, child processes, and similar artifacts to be limited to the lifetime of the request, and store any persistent state in Amazon S3, Amazon DynamoDB, or another cloud storage service. Requiring functions to be stateless enables AWS Lambda to launch as many copies of a function as needed to scale to the incoming rate of events and requests. These functions may not always run on the same compute instance from request to request, and a given instance of your Lambda function may be used more than once by AWS Lambda.

Also from the FAQ.

Q: Will AWS Lambda reuse function instances?
To improve performance, AWS Lambda may choose to retain an instance of your function and reuse it to serve a subsequent request, rather than creating a new copy. Your code should not assume that this will always happen.

Today Lambda functions are written in Node.js. Here is my Lambda function, which returns Twitter data combined with Amazon Machine Learning predictions to tell me whether those tweets are on topic or spam. My use case was creating a tweet board that filtered junk messages based on machine learning. It actually worked really well. But back to our code: you can jump right to the end, there is no need to read it all.

var getTweetsError = function (err, response, body) {
    console.log('ERROR [%s]', err);
};

function retrieveATweetPrediction(tweet) {

    // This is an async operation and we are going to have lots. Therefore we
    // will use a promise which we will
    // return for our caller to track. When we do our actual work we will mark
    // our little promise as resolved.

    var deferred = Q.defer();

    var req = aml.predict({
        MLModelId: '',
        PredictEndpoint: 'https://realtime.machinelearning.us-east-1.amazonaws.com',
        Record: {
            text: tweet['text'].toString(),
            id: tweet['id'].toString(),
            followers: tweet['user']['followers_count'].toString(),
            favourites: tweet['favorite_count'].toString(),
            friends: tweet['user']['friends_count'].toString(),
            lists: tweet['user']['listed_count'].toString(),
            retweets: tweet['retweet_count'].toString(),
            tweets: tweet['user']['statuses_count'].toString(),
            user: tweet['user']['screen_name'].toString(),
            source: tweet['source'].toString()
        }
    });

    // We did not pass a callback to predict, so we can call the .on function and
    // get access to the complete response data. This allows us to look up the original request and
    // tie this async call back to our original data. If we call it the normal way we don't have access
    // to that, just the response, and can't tie it back!
    req.on('success', function(response) {
        if (response.error) {
            console.log(response.error);
        } else {
            var t = "";
            if (response.data.Prediction.predictedLabel == "0") {
                t += 'ON';
            } else {
                t += 'OFF';
            }
            returnData[response.request.params.Record.id]['prediction'] = t;

            var val = response.data.Prediction.predictedScores[response.data.Prediction.predictedLabel];
            if (val < 0.5) {
                val = 1 - val;
            }
            returnData[response.request.params.Record.id]['probability'] = Math.round(val*100000)/1000;
            deferred.resolve(); // This task can now be marked as done
        }
    });
    req.send();
    return deferred.promise;
};

function extractTweets() {

    var deferred = Q.defer();

    twitter.getSearch({'q':'#aws','count': 15}, getTweetsError, 
    
        function (data) {

            var tweets = JSON.parse(data)['statuses'];

            // We need to create a list of tasks as we are going to fire off a bunch of async calls to 
            // do a prediction for each tweet.
            var tasks = [];

            for (var i in tweets) {

                var id = tweets[i]['id'];
                returnData[id] = {}; 
                returnData[id]['text']       = tweets[i]['text'];
                returnData[id]['name']       = tweets[i]['user']['name'];
                returnData[id]['screen_name']= tweets[i]['user']['screen_name'];
                returnData[id]['followers']  = tweets[i]['user']['followers_count'];
                returnData[id]['friends']    = tweets[i]['user']['friends_count'];
                returnData[id]['listed']     = tweets[i]['user']['listed_count'];
                returnData[id]['statuses']   = tweets[i]['user']['statuses_count'];
                returnData[id]['retweets']   = tweets[i]['retweet_count'];
                returnData[id]['favourites'] = tweets[i]['favorite_count'];
                returnData[id]['source']     = tweets[i]['source'];
                returnData[id]['image_url']  = tweets[i]['user']['profile_image_url'];

                // The prediction return a promise which we will push into our list of tasks.
                // When the prediction is returned it will mark its little task as resolved.
                tasks.push(retrieveATweetPrediction(tweets[i]));
            }

            // We have a list of tasks which are happening. Let's wait till ALL of them are done.
            Q.all(tasks).then(function(result) { 
                // Woot woot, all predictions are returned and we have our data!
                // We are therefore resolved ourselves now. Whoever is waiting on us is going to 
                // now get some further stuff done.
                deferred.resolve();
            });
        }
    );
    return deferred.promise;
};

// End of functions, let's look at our main bit of code.

// Setup AWS SDK
var aws = require('aws-sdk');
aws.config.region = 'us-east-1';
var aml = new aws.MachineLearning();

// Setup Twitter SDK
var Twitter = require('twitter-node-client').Twitter;
var twitter = new Twitter({
    "consumerKey": "",
    "consumerSecret": "",
    "accessToken": "",
    "accessTokenSecret": "",
    "callBackUrl": ""
});

// Setup Q for our promises, we have lots of calls to make and we need to track when they are all done!
var Q = require('q');

var returnData = {};

// This is the function required by Lambda
exports.handler = function(event, context) {

    returnData = {}; // We may be reincarnated so ensure we are idempotent 
    
    Q.allSettled([extractTweets()]).then(
        function(result){
            // Return our data and end the Lambda function
            context.succeed(returnData);
        },
        function(reason){
            console.log("Oops : " + reason);
            context.fail(reason); // End the invocation instead of waiting for the timeout
        });

};

See how there are lots of functions, then some code which sets up some variables, Q and returnData, and then the main function which Lambda will call when an event occurs, exports.handler. Notice how I am not a great coder and I used a global variable to store some data which is used by all of the functions. Well, if exports.handler gets called over and over again in the same environment, those global variables will not be re-created or cleared. I did not quite realize this at first and wondered why I was sometimes getting weird data back from Lambda. Not always, just sometimes.
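
You can reproduce the effect locally without Lambda at all. The sketch below is a simplified stand-in for the handler above (the `leakyHandler` name and the event shape are made up) that skips the reset, and then calls it twice the way a reused function instance would.

```javascript
// Module-level state: created once when the code is loaded,
// not once per invocation.
var returnData = {};

// Stand-in for the real handler, without the reset at the top.
function leakyHandler(event) {
    returnData[event.id] = { text: event.text };
    return Object.keys(returnData);
}

// Two invocations landing on the same, reused function instance.
var first = leakyHandler({ id: 'a', text: 'first tweet' });
var second = leakyHandler({ id: 'b', text: 'second tweet' });

console.log(first);  // [ 'a' ]
console.log(second); // [ 'a', 'b' ]  <- data from the first call leaked in
```

When Lambda happens to start a fresh instance the second call would only see its own data, which is why the weird results showed up only sometimes.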

To fix my problem I simply ensured that I cleared the key variable each time the handler function was called, so you can see that the first thing it does above is the "returnData = {}; // We may be reincarnated so ensure we are idempotent". Fixed. Of course I know I could just code better, but this was my first ever time writing Node.js. You can tell me how to improve my function in the comments.
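
One possible improvement, offered as a sketch and not as the definitive way to do it, is to drop the global altogether and let each promise return its own piece of data. The example assumes a Node.js runtime with native promises (with Q, `Q.all` plays the same role), and `predict` is a hypothetical stand-in for the real Amazon Machine Learning call.

```javascript
// Hypothetical stand-in for the async prediction call.
function predict(tweet) {
    return new Promise(function (resolve) {
        setTimeout(function () {
            var onTopic = tweet.text.indexOf('#aws') >= 0;
            resolve({ id: tweet.id, prediction: onTopic ? 'ON' : 'OFF' });
        }, 0);
    });
}

// All state lives inside this call, so nothing survives between invocations.
function handleRequest(tweets) {
    return Promise.all(tweets.map(predict)).then(function (predictions) {
        var result = {};
        predictions.forEach(function (p) {
            result[p.id] = { prediction: p.prediction };
        });
        return result;
    });
}

var done = handleRequest([
    { id: '1', text: 'Loving #aws Lambda' },
    { id: '2', text: 'Buy cheap watches' }
]);
done.then(function (result) {
    console.log(result);
});
```

Because `result` is created inside `handleRequest`, a reused function instance has nothing left over to leak, and there is no reset line to forget.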

I will probably do another write-up on my Amazon Machine Learning experiment and how I trained it to filter tweets. It was really easy, and I have no servers involved: Lambda executes my application logic, so I just have S3, Lambda and AML Live Prediction for a highly scalable site.

Hopefully you won't get caught by the same mistake.

Rodos

Using an architectural review for improving site reliability

Remember to make your Lambda functions idempotent

Rodney Haywood