It’s time to seriously consider Identity as a Service (IDaaS) solutions (such as Azure AD B2C) for user authentication

If you watched the news in 2016 alone, it was pretty clear that many organizations are doing a poor job of protecting the user credentials in their care. Worse, when organizations somehow lose credentials for hundreds of millions of users (looking at you, Yahoo) and users are not up in arms about it, we know we’re in a bad place – a place where users are so used to hearing bad news about security and privacy that they almost don’t care anymore.

We can do better than this. The goal of this post is to present arguments on why authentication mechanisms in many organizations are failing and show why Identity as a Service (IDaaS) solutions such as Azure Active Directory B2C (business-to-consumer) are the future.


According to a Gartner report from June 2016 (see link below in the resources section) by 2020, 40% of identity and access management (IAM) purchases will use the identity and access management as a service (IDaaS) delivery model — up from less than 20% in 2016. If you’re planning to build any new consumer facing applications in the near future or if the security of your existing application credentials is keeping you up at night then you should seriously start to look at IDaaS solutions.

Challenges with traditional consumer identity and access management systems

Security & privacy risks

  • The username/password list is a target for attackers – attackers look for easy targets knowing consumers often reuse login credentials across accounts (the credentials list is often more important than the data it protects)
  • Developers often don’t understand how to properly secure passwords in custom solutions
  • Attacks are getting more sophisticated / threats are constantly evolving

High TCO (total cost of ownership)

  • Development time & costs – lots of code to write for identity management functions (sign-up/sign-in, email verification, password resets, MFA, user experience UI/UX)
  • Software licensing, maintenance and upgrade costs (when using off-the-shelf software for identity management features)
  • Identity management functionality is a moving target – for example in the Microsoft .NET world built-in identity management approaches change every 1-2 years (ASP.NET Membership, ASP.NET Identity) leaving behind fragmented applications that are hard to maintain

Scalability and availability challenges

  • Consumer traffic is highly seasonal
  • Organizations are forced to provision for peak capacity
  • With millions of users this can be very costly

What is Identity as a Service (IDaaS) and where does it fit in?

  • Cloud-based service that provides a set of identity and access management functions
  • An authentication infrastructure that is built, hosted and managed by a third-party service provider
  • This is in contrast to traditional identity and access management (IAM) solutions that are typically completely on-premises and delivered via bundled software and/or hardware means
  • According to Gartner, IDaaS functionality includes:
    • Identity governance and administration (“IGA”) — this includes the ability to provision identities held by the service to target applications
    • Access — this includes user authentication, single sign-on (SSO), and authorization enforcement
    • Intelligence — this includes logging events and providing reporting that can answer questions such as “who accessed what, and when?”

IDaaS sounds interesting … why consider Azure Active Directory B2C?

Yes – there are quite a few IDaaS solution providers out there – so why am I advocating for Azure AD B2C? To answer that I’ll just point to a recent June 2016 study (see link below) where Gartner analyzed the IDaaS space:

Gartner 2016 Magic Quadrant for Identity and Access Management as a Service

Microsoft, with its IDaaS offerings, is currently in the Leaders quadrant. The success of its IDaaS solution (Azure Active Directory) is closely tied to the success and growth of Microsoft Azure – its cloud platform – and by all indications it has a strong future ahead of it.

What exactly is Azure Active Directory B2C?

  • Cloud identity service with support for social accounts and app-specific (local) accounts
  • For enterprises and ISVs building consumer facing web, mobile & native apps
  • Builds on Azure Active Directory – a global identity service serving hundreds of millions of users and billions of sign-ins per day (same directory system used by Microsoft online properties – Office 365, XBox Live and so on)
  • Worldwide, highly-available, geo-redundant service – globally distributed directory across all of Microsoft Azure’s datacenters

How is it better than a custom authentication solution?

  • Easy to integrate consumer self-service capabilities (sign-up, password resets)
  • Site owner controls user experience (custom html & css for sign-in/sign-up)
  • Enterprise-grade security with continuously evolving anomalous activity, anti-fraud and account compromise detection systems (offload security to the real domain experts)
  • Benefits of security-at-scale – uses machine learning to watch billions of authentications per day across the entire Azure AD ecosystem and detect unusual behavior
  • Superior economics compared to on-premises – pay-as-you-go pricing + free tier
  • Based on open protocols and open standards – OAuth 2.0, OpenID Connect
  • Uses open source libraries for .NET, Node.js, iOS, Android and others / REST-based Graph API for management
  • Better and faster development experience for authentication / easy to integrate with existing sites wherever they’re running from (not just those in Azure)
  • Ability to easily integrate social logins if needed (Facebook, Google and such)
  • Support for MFA (multi-factor authentication)
  • Authentication database is separate from the application data / easier to enable SSO (single sign on) later across other apps in the enterprise (unified view of the consumer across apps)
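Because Azure AD B2C builds on the standard OpenID Connect authorization-code flow, integrating it mostly means constructing (or letting a library construct) an authorization request against a B2C policy. Below is a rough Python sketch of what such a sign-in URL looks like – the endpoint path, the `p` policy parameter, and the example tenant/policy names are assumptions based on the B2C conventions of the time, not something to copy verbatim:

```python
from urllib.parse import urlencode

def b2c_authorize_url(tenant, policy, client_id, redirect_uri):
    """Sketch of an OpenID Connect authorization request for an
    Azure AD B2C policy. Parameter names follow the standard OIDC
    authorization-code flow; the exact host/path and policy naming
    may differ for your tenant, so treat this as illustrative."""
    params = {
        "p": policy,                  # B2C policy, e.g. a sign-up/sign-in policy
        "client_id": client_id,
        "response_type": "code",      # authorization-code flow
        "redirect_uri": redirect_uri,
        "scope": "openid",
        "nonce": "defaultNonce",      # real apps must generate a fresh nonce
    }
    base = f"https://login.microsoftonline.com/{tenant}/oauth2/v2.0/authorize"
    return base + "?" + urlencode(params)
```

In practice you would let one of the open source libraries mentioned above build and validate these requests for you; the sketch just shows that nothing proprietary is going on at the protocol level.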

Sounds interesting – tell me more: who is using Azure AD B2C?

  • Azure AD B2C is a natural choice for consumer facing apps hosted on Azure (but certainly not only for those)
  • Some stats (from Microsoft presentations as recent as September 2016) on Microsoft Azure AD (the technology that B2C is built on)
    • 90% of Fortune 500 companies use Microsoft Cloud
    • More than 10 million Azure AD directories
    • More than 750 million user accounts in Azure AD
    • More than 110K third-party applications that use Azure AD each month
    • More than 1.3 billion authentications every day with Azure AD
  • The state of Indiana used Azure AD B2C for user authentication in order to integrate various features into a single citizen portal
  • Real Madrid (one of the most popular soccer clubs on the planet) uses Azure AD B2C to offer authentication services for their mobile app used by more than 450 million fans (Microsoft case study)

What about security for all this authentication data in the cloud?

To address the security of authentication data when IDaaS solutions are used, I’ll include a quote from the Gartner report mentioned above – I fully agree with their assessment:

No security is perfect. Ultimately, prospective customers must decide whether vendors’ stated control sets are sufficient for their needs. IDaaS vendors give significant attention to ensure the security of their platforms. Based on the number of enterprise security breaches that have been made public, and the lack of any such breaches for IDaaS providers, Gartner believes that IDaaS vendors are more likely to provide better security for IAM services than their customers could provide for themselves.

Additional Resources

Amazon, microservices and the birth of AWS cloud computing

I started doing some research on microservices and came across this really interesting video from about five years ago in which Werner Vogels, Amazon’s CTO, talks about how (and why) Amazon switched to a microservices architecture. It’s a really interesting presentation that explains the challenges Amazon was facing in its early years and how internal solutions to those early problems became the basis for AWS cloud computing later.

Werner Vogels – Amazon and the Lean Cloud

It’s a relatively short presentation – about 30 minutes – but it’s full of interesting details about those ‘early days’ of cloud computing. Here are some highlights:

  • In the early 2000’s Amazon’s main e-commerce site, Amazon.com, was facing some technical challenges. Its architecture at the time was typical of the web applications we still build today – a single monolithic application code base, a common technology stack in all web areas, with massive relational databases on the backend. What were some of the problems they were having in those early days? Code compiles and deployments were taking too long. The backend databases were massive and hard to manage. Bottlenecks existed everywhere – it was getting harder and harder to make progress, release new features and keep up with growth.
  • Amazon’s technical architects analyzed the problem and realized that the path they were on would not take them far into the future. The decision was made to move toward a microservices architecture (they didn’t call it that back then, but that’s essentially what they were building). The idea was that every little feature and capability of the retail site would be provided by a mini-service interacting with other services through well-defined interfaces. This is the path they went down for the next few years. According to Werner, the current homepage for Amazon.com is put together by a few hundred such microservices.
  • It’s hard to believe that such an architecture could actually work at the scale Amazon needed – it sounds like the perfect recipe for chaos. Specific changes to how Amazon’s internal teams worked were needed to make it function. The idea of “two-pizza teams” was at the core – a team supporting a particular microservice should not be bigger than the number of developers who could eat two pizzas. This usually meant no more than 10 technical folks on such a team – a perfect size for a team that could get work done without needing complex meetings to bring everybody up to date on progress. Teams chose the technology stack they would use for their particular microservice. Another critical concept was “you build it, you run it”: these small teams were in charge of both development and operations for their service (they were doing DevOps before it was actually cool). Amazon soon had hundreds of such teams working on the site.
  • Things went well initially, but after a while they realized that the rate of progress and productivity was slowing down. A more careful analysis showed that these teams were now spending close to 70% of their time on operations work – making sure their services stayed operational according to the high-availability standards required for Amazon.com. Engineers were solving the same problems over and over on their own because they had no common internal infrastructure they could use.
  • This is when the idea of infrastructure on demand started to come up – the beginning of the AWS cloud operations. First, object storage (S3) … then compute (EC2) and on they went from there. Somehow along the way these internal elastic ‘cloud’ capabilities were exposed to external customers and the rest is history.

It’s indeed a fascinating inside look at how the AWS cloud was born. If you’ve ever wondered how Amazon, an online book retailer, ended up a cloud computing powerhouse, this video will give you some of the answers.

4 tell-tale signs of Entity Framework performance problems (for SQL Server DBAs)

Here’s the scenario: you are a SQL Server DBA and you manage at least one database server that’s used by custom applications created by in-house developers using Entity Framework. From your experience you believe that the database server is powerful enough (hardware-wise) to handle the applications using it but the fact is that the database server is actually struggling to keep up. You often see high CPU usage, high network traffic and a much larger volume of queries than you’d expect from the applications using it. Management wants to know what the problem is: is the server not powerful enough from a hardware point of view? Is it mis-configured? Is the application not properly coded?

You have a suspicion that the application code is not as tight as it could be but you’re not really a .NET developer and you don’t really know much about Entity Framework. What would you look for from the SQL Server side in order to make recommendations to management (and the development team maybe) about what steps to take to improve performance?

If your first instinct is to say that Entity Framework is evil, that all ORM (object-relational mapping) tools should be banned and that your databases can only be queried using stored procedures … then you’re probably on the wrong path and you’re not likely to make many friends with that approach.

Entity Framework is a tool – a very powerful tool I would say – in the .NET development stack. It enables developers to write applications faster by focusing on business logic and business models instead of having to worry about the low-level plumbing necessary to get data in and out of a database. The problem with Entity Framework (when it comes to performance) is that often it hides the database layer so well that developers forget that their various object manipulations end up generating all sorts of queries against the database – queries that they would probably be a lot more careful with if they actually had to write them from scratch in SQL code.

As is the case with most powerful tools Entity Framework has quite a set of instructions and best practices that must be followed in order to get the best performance out of it. The best single-page collection of Entity Framework performance tips (in my opinion) can be found at the location below:

Performance Considerations for Entity Framework 4, 5, and 6

That page is full of information and it can be quite intimidating. Much of it though applies to best-practices that belong in the .NET layer and would not directly be visible to a SQL Server DBA. If you want to see from the database side if best-practices were followed what are some of the tell-tale signs you should look for?

Some observations before we get started:

  • I mentioned that these tips would be for SQL Server DBAs because Entity Framework apps usually use SQL Server as a database backend. The truth is that probably most of these also apply to applications that use Entity Framework with other types of databases.
  • As any good IT troubleshooter will know one trick to solving IT problems is to try to isolate the actual issue from background noise. In the context of what we’re trying to do here you need to be able to run the offending application in a way that you can easily study its SQL Server activity – either against a SQL Server where you’re the sole active user or maybe against a staging system with low activity. You would then proceed to use the application through its UI and record its SQL activity for later analysis using tools such as SQL Server Profiler, Extended Events or any other third-party tool you prefer.

What should you then look for?


Queries that explicitly select all columns

Entity Framework will not literally issue SELECT * FROM commands – what it will do is emit explicit SELECT statements that include ALL columns of a particular table. If you see that most SQL queries select all columns this way (especially from large tables, when it appears the UI is not using all that data) then you know the developers got a little sloppy. It’s very easy in Entity Framework to bring back entire entities – it takes more work and thought to build a custom LINQ projection query that selects only the needed columns.
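Since this is something a DBA can check mechanically against a captured trace, here is a minimal Python sketch (the function name and the naive substring matching are my own simplifications) that flags queries whose SELECT list explicitly names every column of a known wide table:

```python
def selects_all_columns(sql, table_columns):
    """Heuristic check: does this captured query's SELECT list
    explicitly name every column of the table? That is Entity
    Framework's equivalent of SELECT * and a hint that a narrower
    LINQ projection may be warranted. Substring matching is crude
    (e.g. 'Id' also matches 'OrderId') but good enough for triage."""
    select_list = sql.upper().split(" FROM ")[0]
    return all(col.upper() in select_list for col in table_columns)
```

Run it over the distinct queries in your trace, using the column lists from the tables you worry about most.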

The famous N+1 problem

Here’s how to spot this pattern: say that in the UI you go to a page that shows a grid of data with 100 rows. You would typically expect the application to be able to return all that grid data with a single query. Instead, from the SQL activity you captured you see an initial query that gets ‘most’ of the data and then … surprise … 100 other queries that all look very similar (usually with some different values in the WHERE clause).

In Entity Framework it’s very easy to fall into this trap that deals with loading related data. By default Entity Framework in an application has a setting called Lazy Loading set to true – you’re probably guessing already what this does. Let’s say you have one query where you request data for the grid in our example. If you display fields related to that data that were not directly included in the main query Entity Framework (out of the goodness of its heart) will just quietly go and grab the additional data from the database on a need-to-have basis. As the code loops over the various rows being displayed you’ll see additional queries being executed for each row. For N rows of data we then end up with N + 1 queries (or possibly even worse). This is mentioned in section ‘8 Loading Related Entities’ on the page above. The solution is to use Eager Loading (and possibly disable Lazy Loading) as a way to be very explicit about the data that is needed for display at the time when the original query is executed.
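You can also spot this signature programmatically in a captured trace: normalize away literal values so structurally identical queries collapse to one fingerprint, then count the repeats. A minimal sketch (the function names and the repeat threshold are illustrative):

```python
import re
from collections import Counter

def normalize(sql):
    """Replace literal values with '?' so structurally identical
    queries collapse to a single fingerprint."""
    sql = re.sub(r"'[^']*'", "?", sql)   # string literals
    sql = re.sub(r"\b\d+\b", "?", sql)   # numeric literals
    return re.sub(r"\s+", " ", sql).strip().upper()

def suspected_n_plus_one(captured_queries, threshold=20):
    """Return fingerprints that repeat at least `threshold` times
    in one captured batch -- a classic N+1 signature."""
    counts = Counter(normalize(q) for q in captured_queries)
    return {fp: n for fp, n in counts.items() if n >= threshold}
```

A grid page that triggers one main query plus a hundred near-identical per-row queries lights up immediately with this kind of grouping.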

Implicit type conversions and unused indexes in queries against VARCHAR fields

This one might be harder to detect but in the ‘perfect storm’ kind of scenarios it can bring a server to its knees with high CPU usage and heavy IO activity. Here’s what’s going on: let’s say that you see queries that in the WHERE clause search against a VARCHAR column in a table (for example a GUID that’s used as a primary key). Nothing really wrong with that – the table has an index on that VARCHAR column so all should be well … except that it’s not. Simple queries from Entity Framework against that table take much longer than expected. You look at the query plan and you see that the index on that column is not being used and that SQL Server is warning you that an implicit type conversion took place.

What exactly is going on there? As you look at the original query more carefully you see that Entity Framework is passing that parameter as a NVARCHAR value. SQL Server must convert it to match the data type in the table and in the process it will not make use of that index. The problem comes from the fact that in Entity Framework all string fields by default are considered to be NVARCHAR types when queries are created. If you have VARCHAR fields in the database then the developers must specifically declare the column types in the Entity Framework data model as VARCHAR. This way Entity Framework will use the proper data types when running the query and all will be well again.
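A rough way to hunt for this pattern in a captured trace is to flag queries that compare a known VARCHAR column against an N'...' (NVARCHAR) literal. Here is a hedged sketch – the column list you feed it and the matching logic are simplifications of what a real trace analysis would do:

```python
import re

# An N-prefixed string literal, e.g. N'abc-123' -- how NVARCHAR
# values typically appear in captured SQL text.
UNICODE_LITERAL = re.compile(r"\bN'[^']*'")

def flags_possible_implicit_conversion(sql, varchar_columns):
    """Flag queries that mention a known VARCHAR column and also
    contain an N'...' literal -- the combination that can force
    SQL Server into an implicit conversion and disable the index."""
    sql_upper = sql.upper()
    mentions_varchar_col = any(col.upper() in sql_upper for col in varchar_columns)
    return mentions_varchar_col and UNICODE_LITERAL.search(sql) is not None
```

Any hits are worth confirming against the actual query plan, where the implicit-conversion warning will show up.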

Missing caching layer for metadata / lookup tables

This is not exactly an Entity Framework problem but rather an application architecture issue that’s easy to overlook when using an ORM tool. Let’s say you have data in the database that rarely changes – such as the list of US states. Once the application fetches it, it should hold on to it and serve it from some sort of in-memory cache layer whenever it’s needed. Instead, you see queries being made against such tables all over the place. The fastest query is the one that never needs to execute at all. Developers should implement a caching layer in the application and keep such data close to the application for long-term use.
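The caching layer itself can be very simple. Here is a small illustrative sketch of an in-memory cache with a time-to-live, so lookup data such as the list of US states hits the database once and is then served from memory (the class and method names are my own, not a specific library’s API):

```python
import time

class LookupCache:
    """Tiny in-memory TTL cache for rarely-changing lookup data,
    so repeated requests skip the database entirely."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        expires_at, value = self._store.get(key, (0, None))
        if time.time() < expires_at:
            return value          # served from memory, no query issued
        value = loader()          # hits the database exactly once per TTL window
        self._store[key] = (time.time() + self.ttl, value)
        return value
```

Production apps would typically reach for a framework-provided cache (or a distributed one for multiple servers), but the idea is the same: the loader runs once, then the data is local.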

So there you have it. I’ve outlined here a few usage patterns along with a very detailed page to help you, as a SQL Server DBA, determine if the Entity Framework application hitting one of your databases is probably lacking some of the performance best-practices mentioned above. Sometimes just throwing more hardware at the problem might be cheaper in the short term but as it usually happens bad application code combined with enough user activity will eventually bring even powerful servers to their knees.

Resources for Amazon Web Services (AWS) migrations from EC2-Classic to EC2-VPC

AWS EC2 (Elastic Compute Cloud) is one of the best known cloud services delivered by AWS. It is a service that allows customers to purchase resizable cloud hosting resources. It is – in my opinion – the best current implementation of IaaS (Infrastructure as a Service) among all cloud service providers. It truly delivers on the promise that within minutes you can spin up a new server instance and get on with meaningful work, without having to wait days or weeks for a vendor (or IT department) to purchase and deliver properly configured hardware.

The original implementation of EC2 (which over time became known as EC2-Classic) had one interesting shortcoming that only became more apparent as customers built increasingly complex solutions: EC2 instances for ALL customers in a given AWS geographic region share private IP addresses in the 10.x.x.x space (technically some ranges from that Class A network are not used, but that’s not relevant for this discussion). For example, you could have in your account a web server at one 10.x.x.x address and another web server at a completely unrelated 10.x.x.x address – and you had no idea of, or control over, which other customer’s instances were using the neighboring private IP addresses.

AWS did provide the technology of security groups (basically software firewalls that wrap around instances) to allow customers to group together EC2 instances of similar functions (and not allow intruders access) but as customers were building more and more complex solutions it was getting harder and harder to manage the instances that one owned in AWS.

This model, where one’s servers were spread all over the 10.x.x.x range, was not the way networking professionals were used to running networks in their own data centers.

In 2009 AWS introduced an improvement to the original EC2 approach – VPC (Virtual Private Cloud). In a VPC customers now had the ability to use their own private IP range, divide the network as they saw fit and pretty much come back to networking models that they were used to. There are many, many more features to AWS VPC that make it a clear winner over EC2-Classic but those are not the main focus of this article.
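To make the contrast concrete: with a VPC you pick your own private range and subdivide it yourself. Python’s standard ipaddress module can sketch the kind of address plan this enables – the 10.0.0.0/16 range and the /24 split below are just illustrative choices, not AWS requirements:

```python
import ipaddress

# Pick your own private range for the VPC (a hypothetical 10.0.0.0/16 here)
vpc = ipaddress.ip_network("10.0.0.0/16")

# Carve it into /24 subnets -- e.g. one per availability zone or per tier
subnets = list(vpc.subnets(new_prefix=24))

# Assign roles to subnets the way you would in your own data center
public_subnet, private_subnet = subnets[0], subnets[1]
```

That deliberate carving of a known range into purpose-built subnets is exactly what EC2-Classic never offered.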

For a while, then, we had the two technologies side by side – EC2-Classic and EC2-VPC – and customers could create EC2 instances using either model. It was becoming clear, though, that EC2-VPC was the superior technology, and AWS confirmed it on 2013-12-04: after that date, all new AWS accounts support only EC2-VPC. In accounts created after that date, AWS automatically creates a default VPC and places all EC2 instances in that context.

Many of the older AWS customers slowly came to face a dilemma: what were they supposed to do with their aging EC2-Classic instances? AWS is constantly innovating, adding new instance types and new features, but many of those are now only available on the EC2-VPC side. Customers who want to take advantage of the latest AWS features need to consider migration paths from EC2-Classic to EC2-VPC.

So what are the migration options available?
