Relationship between AWS EBS snapshots and EBS volume failure rates

I’ve been doing some research lately on best practices that pertain to the correct use of AWS EBS volumes from the point of view of data redundancy – in particular around the relationship between combining EBS volumes in a software RAID volume at the OS level (either RAID 1 or RAID 10) and the proper use of EBS snapshots.

On the AWS EBS details page (http://aws.amazon.com/ebs/details/) in the section on “Amazon EBS Availability and Durability” we find the following:

Amazon EBS volumes are designed to be highly available and reliable. At no additional charge to you, Amazon EBS volume data is replicated across multiple servers in an Availability Zone to prevent the loss of data from the failure of any single component. For more details, see the Amazon EC2 and EBS Service Level Agreement.

The durability of your volume depends both on the size of your volume and the percentage of the data that has changed since your last snapshot. As an example, volumes that operate with 20 GB or less of modified data since their most recent Amazon EBS Snapshot can expect an annual failure rate (AFR) of between 0.1% – 0.5%, where failure refers to a complete loss of the volume. This compares with commodity hard disks that typically fail with an AFR of around 4%, making EBS volumes 10 times more reliable than typical commodity disk drives.

The statement that really puzzled me for a while was the claim that EBS volumes are more durable (with a lower failure rate) when EBS snapshots for them are created more often. It just doesn’t appear that the two would be related … at least not when we think about drives and backups in a non-cloud, non-redundant way. I thought for a while that this was just some strange marketing statement made by AWS – almost like a reverse Murphy’s law: the more backups/snapshots you make the lower the chance that your drive will fail. I understand that having more frequent EBS snapshots of a volume would enable one to restore more recent versions of that volume’s data from the snapshot but how exactly could taking snapshots affect the rate at which a volume would physically fail?

Read more

HTTPS certificate verification problem in Python 2.7.9

Over the last few days I was working through some AWS security best practices as outlined in a video presentation from AWS re:Invent 2013 called “Intrusion Detection in the Cloud (SEC402)” (video linkPDF presentation link).

One of the interesting ideas that the authors were promoting was this particular Python script (script linkJSON template for IAM policy) which would use the AWS Python SDK to obtain textual descriptions of various critical security settings that would be part of an AWS account (such as IAM users, groups and policies, S3 bucket policies, EC2 security groups and so on). These settings could all be exported and saved in a text file, the script could be scheduled to run at various intervals and alerts could be raised each time differences in the file would be detected.

I’m currently in the middle of setting up the AWS environment for a client project so I liked the idea of having a script such as the one mentioned above which I could use to script various critical settings and monitor how they change over time. Besides … Python is not really a language that I use often (though I do plan to become more familiar with it) so I welcomed the chance to try something new – how hard could it really be? 😉

I initially proceeded to download and install the latest version of Python – 3.4.2. It didn’t take too long to realize that the script authors actually wrote it for the Python 2.x branch. I was getting all sorts of errors trying to run it under 3.4.2 – as soon as I managed to get one of them fixed, another one came up. I didn’t really care much which version of Python I’d use so since 3.4.2 was giving me enough trouble I decided to switch to the 2.x branch and installed 2.7.9.

Success … or so it seemed. The script finally started running until half-way through its execution it died with this strange looking error while trying to connect to some AWS API endpoint:

ERROR: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)

I was pretty sure that Python was not happy about some discrepancy it found in a HTTPS certificate that it received while trying to make a HTTPS call to AWS. The problem was that the HTTPS site it was reaching out to was controlled by AWS and used deep in the AWS Python SDK code – not a resource for which I could do anything about the configuration of its HTTPS certificate.

So what exactly was Python complaining about? After some research I came across this link:

https://www.python.org/dev/peps/pep-0476/

It appears that a fairly recent change was made in the 2.7.x branch in regards to the default behavior for certificate verification. I ended up using the following monkey patch to globally disable verification for all HTTPS calls made by the script:

import ssl

ssl._create_default_https_context = ssl._create_unverified_context

The script finally ran successfully to the end and I had my scripted security settings. All in all it was an interesting first adventure in the land of Python.

Using Entity Framework Code First with an existing database – Part 1

If you do any sort of .NET development using a relational database then chances are pretty good that you’d use Entity Framework as your ORM (object-relational mapping) technology – and in particular Entity Framework Code First since that’s the Entity Framework approach that Microsoft appears to favor going forward.

So what exactly is Entity Framework Code First?

The Entity Framework Code First approach allows us to write Plain Old CLR Objects (POCOs) for our data models and then persist them in a data store using the DbContext class. Model classes are much cleaner this way and they are developer-friendly – gone are the days of complex (and often buggy) XML .edmx files which were needed in the past with Entity Framework in order to describe and map the model layer to the physical data storage layer.

Below is a simple example of such a POCO class:

[Table("Books")] // Table name
public class Book
{
    [Key] // Primary key
    public int Id { get; set; }
    public string Title { get; set; }
    public string ISBN { get; set; }

    // Navigational property for reviews
    public virtual ICollection<Review> Reviews { get; set; }
}

We can see here that certain attributes are used to describe some of the physical properties of the SQL table and its columns. Entity Framework Code First makes heavy use of conventions in order to determine what to expect at the physical data storage layer given a particular POCO class that describes the model.

Read more

Microsoft patch MS14-066 leads to https problems with IIS and Google Chrome

Patch Tuesday – two words that will bring some amount of fear in the hearts of many Microsoft Windows administrators. It’s the day of the month (usually the second Tuesday) when Microsoft releases its monthly batch of software updates across the entire family of Microsoft products. Most of the time updates will just be applied without any problems (other than the mandatory system restart) but every now and then things just don’t go as planned … in some very extreme cases the updates lead to blue screens, application crashes and pretty much all-nighters for the stressed Windows admins.

To some extent a little bit of this happened this week. On 11/11/14 Microsoft made available the updates for November 2014. Among the various updates a particular one of notice is MS14-066 (KB2992611) – https://technet.microsoft.com/library/security/MS14-066.

It’s a security patch marked as critical affecting pretty much every operating system (both client and server) that Microsoft currently supports. The ‘critical’ label comes from the fact that it fixes a remote code execution bug which pretty much means that an attacker can do something bad to a system without any action being needed on part of the user. This particular patch deals with a problem in SChannel which is Microsoft’s component that deals with https traffic and cryptography.

It was pretty hard to read the news this week and not come across some story or article that referred to MS14-066 as the Windows Shellshock bug (in reference to Shellshock and Heartbleed – two very serious security bugs that were discovered in the last few months affecting the open source community and *nix operating systems in particular). Bottom line – this is one patch for Windows that you want to install as soon as possible especially if you maintain IIS web servers that run https sites.

Read more