Why create orgtomate?
One of our current projects uses Landing Zone to manage an estate of nearly 200 AWS Accounts and 50 Organizational Units. While ownership of delivery is devolved to workload teams, adherence to quality and standards must be maintained. A shared responsibility model exists where some resources are centrally managed while workload-specific resources are designed and implemented in their own way.
We’re often asked whether we can implement an asset database so that we know exactly what we have and how it is configured. On the face of it, this is a valid question, but when you are working on a dynamic, best-practice cloud platform, your estate is defined not by your resources but by the code that creates those resources. What you have now may not be what you have in five minutes, as automated deployments continually create, delete and update resources to meet demand or deliver change.
The question "Can we implement an asset database?" is not a problem statement, it is a solution statement. And it's a solution to a problem that has not been defined. This kind of precedent-led solutioneering approach can, and often does, lead to a lot of wasted delivery time and resource cost in cloud projects.
This is usually most relevant in the realm of Observability, i.e., Logging, Monitoring, Reporting & Alerting. The approach often taken is to produce as many metrics and logs as possible, attempt to centralise them all into a single tool, create one or two simple dashboards, create lots of alerts that are mostly informational, and then justify the very high costs of both engineering the solution and storing and processing all the data with the word “auditability”. The desire to easily see anything and everything the service does is understandable, but the result is a white elephant.
What the business really needed was the ability to ensure the product was secure, healthy, available and performant. If Service Delivery Managers, in conjunction with Architects, Engineers and Delivery Managers, can declare the definitions of sufficient security, health, availability and performance, then only the metrics and logs required to determine those values need be created, and delivered only where they are required. Alerts can then be created on those bounds, directed at the teams who will react to them. In this way we can ensure that the amount of time and money spent delivering Observability solutions is no more than is necessary and sufficient.
In this sense we need to respond to the request for an asset database by focusing on the problem first, rather than the proposed solution.
The Problem Statement?
The truth is that there are many problems to be solved, and some of them are not yet known: we cannot write problem statements for questions we don’t yet know we will need to ask.
The most generic version of a problem statement is one of resource discovery: "As a cloud platform stakeholder, I want to be able to get answers to unknown future questions as to what resources exist and how they are configured in all accounts and regions".
Here are some real-world examples of questions that have been asked:
- "A patch has been released for a critical Windows exploit. Where are all of the Windows EC2 instances in the estate?"
- "Where are there AWS Certificate Manager certificates that are not going to automatically renew so that their renewal can be assured before they expire and cause an incident?"
- "What is our total consumption of AWS Workspaces in the London AWS region?"
- "In which accounts do we currently have AWS SageMaker Notebooks deployed?"
- "What AWS Support cases do we have open in the estate, and what is their status?"
Each of the example questions above could potentially be answered using an asset database, presuming the database was: a) up to date, b) configured to support the services in question and c) configured to track all the data needed to answer the question.
It’s possible, for example, to configure an AWS Config Aggregator to aggregate information from multiple accounts and regions. However, the information you can ascertain depends on exactly what information AWS Config collects and your information is only as current as the latest updates to the aggregator. For the EC2 and ACM examples, AWS Config may be sufficient, but the number of services AWS Config supports is surprisingly small. You could not answer the queries about Workspaces, SageMaker or Support Cases. On the face of it, when presented with only the EC2 question, it might be easy to look at AWS Config as a solution to the problem; but in fact it only solves some examples of the problem.
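Where AWS Config does cover a service, an aggregator can be queried with a SQL-like expression. The sketch below is illustrative only: the aggregator name is an assumption, and the echo prefix makes it a dry run so it prints the command rather than calling AWS.

```shell
# Hypothetical sketch: answering the EC2 question against an AWS Config
# Aggregator using an advanced query. "my-org-aggregator" is an assumed
# name; the echo prefix makes this a dry run (remove it to call AWS).
expression="SELECT resourceId, accountId, awsRegion WHERE resourceType = 'AWS::EC2::Instance'"
echo aws configservice select-aggregate-resource-config \
  --configuration-aggregator-name my-org-aggregator \
  --expression "$expression"
```

Even here, the result is only as fresh as the aggregator's last update, and only as broad as the resource types AWS Config records.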
This will be true of any third-party asset inventory tool: you can only get out of it what you put in. Total coverage is prohibitively expensive to implement, and collecting it can impact production workloads by consuming a significant proportion of the AWS API request quotas they share. The compromises made to keep such a tool viable therefore limit its utility. This issue is not unique to this problem statement; as in Observability, the desire to capture and centralise every possible log entry is soon quashed when it is understood just how much effort and cost is involved in handling every type of log – just so that some of them may be used one day.
Is the time and effort required to deliver a full-coverage asset database solution, as well as the expense of its data collection, storage and indexing, worth it if the problem statement can be answered just as efficiently by other means? This is not to suggest that asset inventory databases are of no value; for some much more well-defined problem statements they absolutely can be, and they are the reason AWS Config exists. They're just not necessarily the answer in this case. The problem statement contains the word "unknown", and that is its downfall.
Is there a quick and efficient way to get the information we need? All of the information we need is already stored in a database, and it is a database we have direct access to via the AWS APIs. All we must do is query the database for the information we need.
AWS is its own asset database!
AWS already provides a command-line tool to query this database, and effect change in it at the same time: the AWS Command Line Interface (CLI). It is an exceptional tool that has made the lives of AWS Platform Engineers significantly easier in both manual and automated operations. I would argue it has been a driving force behind the adoption of AWS.
However, the AWS CLI has failed to keep up with the scaling demands of Enterprise workloads. The CLI lets you operate on a single AWS account in a single region at a time, and offers little in the way of extensibility, having been designed as a self-contained tool with private libraries for its own use. It’s possible to use the AWS CLI to meet the problem statement, but it’s cumbersome and slow, and the larger your Organization grows, the slower it becomes. Every AWS Platform Engineer is familiar with:
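A sketch of that familiar pattern: call the AWS CLI once per region, serially. Region names are illustrative, and the echo prefix makes this a dry run that prints each command instead of executing it.

```shell
# The familiar pattern: one AWS CLI call per region, one after another.
# Region names are illustrative; echo makes this a dry run.
for region in eu-west-1 eu-west-2 eu-central-1; do
  echo aws ec2 describe-instances --region "$region" --output json
done
```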
This is how you get a response from three regions: one after the other, with one output per region, for you to process as needed once the three loops eventually complete. But let’s say you wanted to query multiple regions and multiple accounts. Assuming you have the functions assumerole() and droprole() pre-defined in your shell, you might use something like the below:
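A sketch of that multi-account, multi-region version. The assumerole() and droprole() functions below are stubs standing in for your real credential-switching helpers, the account IDs and regions are illustrative, and echo keeps the AWS call a dry run.

```shell
# Stubs standing in for real credential-switching helpers (assumptions).
assumerole() { echo "# assumed role in account $1"; }
droprole()   { echo "# dropped assumed role"; }

for account in 111111111111 222222222222; do
  assumerole "$account"
  for region in eu-west-1 eu-west-2 eu-central-1; do
    # Dry run: echo the command rather than calling AWS.
    echo aws ec2 describe-instances --region "$region" --output json
  done
  droprole
done
```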
It's elegant in its own way, but it's complicated and it's slow - and have you ever tried it on 175 AWS Accounts? Have you tried wrapping it all in GNU Parallel just to get a result before tea-time, which failed or wasn’t quite right, leaving you to modify it and try again?
Orgtomate is the solution to the problem statement and, with the power to make changes as well as read data, it’s an answer to other problems, too.
- Installation – Orgtomate is available from the npm Registry (https://www.npmjs.com/package/orgtomate) and can be installed globally with npm or Yarn, e.g., npm install -g orgtomate
- Documentation – https://bjsscloud.github.io/orgtomate-js
- Collaboration – https://github.com/bjsscloud/orgtomate-js