Using automation effectively to enhance your business
Oct 10, 2019 - Brian Jones
At AlasConnect we have embraced a software development culture that allows us to create and deploy relatively bug-free software with a high level of confidence. As a result, we spend less time fixing bugs and manually deploying software, which leaves more time for building the features our customers love.
What follows are some examples of how AlasConnect uses software development automation, which we hope will serve as a starting point for readers who want to explore the topic further.
These days software development isn’t just about the code written to solve a business problem (though a focus on business problems is critical). Excellence in software development also requires strong technology operations: building proper environments, robust infrastructure, extensive test pipelines, and automation around the software are also primary criteria for success. This topic is typically encompassed by the relatively new term “DevOps”; it should be noted, however, that DevOps is a loosely interpreted philosophy and not a job title.
As the timeline for delivering business features shrinks (and competition increases), be it new features or maintenance, the chance significantly increases that a developer will inadvertently introduce a bug, which detracts from the team shipping value-added features. Worse, a critical web page failing to load, due to a bug or improperly deployed software, can cost a business thousands of dollars per minute in revenue on a high-traffic website. A team’s best course of action to mitigate this risk is to use modern tools and practices that help them avoid introducing buggy software into production, and to automate away the tedious, repeatable steps which are highly prone to error.
A prime example of a failure caused by a lack of proper infrastructure management is Knight Capital, a company which lost $440 million in 45 minutes back in 2012 because one of its eight high-speed trading servers was configured improperly. DevOps as a practice started gaining traction as a serious methodology around the time of this highly visible software failure, and this event is one of the myriad scenarios that prompted better infrastructure and deployment management within the software development community.
In a report by Tricentis, the industry lost an estimated $1.1 trillion to software failures in 2016. While it is hard to pinpoint exactly where in the process most failures are introduced, we can at the very least work toward reducing the chances of a failure occurring.
End to End Automation
While it is impossible to completely eliminate the chance of introducing costly bugs into your software, there are a handful of ways to mitigate the risk posed by software changes. Part of the equation is having a proper development and user testing culture built around the entire software development process, and another part of the equation is using automation tooling to reduce the chance of human error, especially for tasks which are easily repeatable and manual in nature.
The following describes some of the tools and processes we use at AlasConnect to help deliver quality software to our customers.
Version Control
Distributed version control systems like git allow developers to track their code changes over long periods of time, create branches from the primary code base to quickly test and iterate on new features, and cleanly merge the changes back with little hassle. More importantly, git allows multiple developers to collaborate on large and complex projects, giving them the ability to cleanly share and merge changes with each other.
If a bug is discovered, a team can go through the project’s code history to pinpoint the exact point in time where the breaking change was introduced and quickly fix it. If git is used with one of the third-party collaboration websites such as GitHub, the code can also be linked to the issues (tickets) that were created before the code was written; these act as documentation that gives developers a much better chance of understanding why code written in the distant past came into existence.
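Pinpointing the breaking change is typically done with `git bisect`, which binary-searches the commit history so that only a logarithmic number of test runs is needed. A toy sketch of the underlying idea (the commit names and test predicate here are hypothetical, purely for illustration):

```python
# Toy model of what `git bisect` automates: binary search through a
# linear history for the first commit where a test starts failing.

def first_bad_commit(commits, is_bad):
    """Return the earliest commit for which is_bad(commit) is True.

    Assumes history is "monotonic": all good commits precede all bad ones.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # bug already present here; look earlier
        else:
            lo = mid + 1      # still good here; bug was introduced later
    return commits[lo]

# Hypothetical ten-commit history where commit "c6" introduced the bug.
history = [f"c{i}" for i in range(10)]
print(first_bad_commit(history, lambda c: int(c[1:]) >= 6))  # → c6
```

With ten commits, the culprit is found in about four test runs instead of ten; over a history of thousands of commits the savings are dramatic.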
Infrastructure as Code
A phrase which gets commonly thrown around in technology circles these days is “infrastructure as code”. In essence, this means encoding your actual infrastructure (networking, servers, OS configuration, application deployment) in human-readable configuration languages, which in turn can be executed repeatably to build the exact same environment for your software over and over again.
At AlasConnect we use Chef for server provisioning, however, other tools such as Ansible and SaltStack are popular as well. Terraform allows us to spin up different network and server VM architectures, either on our virtualized data center servers, or remotely on cloud services such as Microsoft Azure, Amazon Web Services (AWS), or Google Compute Engine (GCE).
Most importantly, the configuration code we write to build these environments goes through the exact same process as our software, ultimately ending up in a git repository which acts as living documentation of how the entire server infrastructure is currently configured.
Even more interesting, what traditionally took weeks to build and configure can now be built up and torn down dozens of times per day at the push of a button (or, better yet, by an automated platform). Further, cloud infrastructure provides unique opportunities for dynamically scaling services based on utilization and demand, which were difficult to achieve in traditional enterprise data centers. Our ability to quickly iterate not just on code, but on server and network infrastructure as well, enhances the pace at which we provide value to our customers.
The end goal of automating our infrastructure is to give us the ability to consistently deploy the same exact server and network configurations in a reproducible and repeatable fashion. Regardless of whether we are deploying to dev or prod, Azure or AWS, our software will continue to run as intended and with no second guessing.
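The “reproducible and repeatable” property rests on idempotence: you declare the desired state rather than imperative steps, and the tool applies only the differences, so running the same configuration twice is a no-op. A toy sketch of that convergence model (the resource names are hypothetical; real tools like Chef and Terraform do this across whole fleets):

```python
# Toy convergence loop in the style of configuration-management tools:
# compare desired state to actual state and apply only the differences.

def converge(actual, desired):
    """Mutate `actual` toward `desired`, returning the list of changes made."""
    changes = []
    for key, value in desired.items():
        if actual.get(key) != value:
            actual[key] = value          # a real tool would install/configure here
            changes.append(f"set {key} = {value}")
    return changes

server = {"nginx": "absent"}
desired = {"nginx": "installed", "firewall": "enabled"}

print(converge(server, desired))  # two changes on the first run
print(converge(server, desired))  # → [] second run is a no-op: idempotent
```

Because re-running the configuration is always safe, the same code can build dev, test, and prod environments without second guessing.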
Two heads are better than one.
A code review is the process in which peers on a team read and give feedback on code written by their teammates.
When a developer is satisfied with their new code, they push the changes to a repository such as GitHub and open what is commonly referred to as a Pull Request. This process attaches their code to a reviewable issue for the purposes of merging it into the primary code base. This Pull Request (or PR for short) is then scrutinized by the other members of their team, ideally people with domain knowledge of the problem being solved. Before the changes are merged, they are carefully analyzed for correctness, readability, style, documentation, and whether they have accompanying tests. Reviewers may comment on each individual line of code, thereby providing direct feedback to the original developer in the form of suggested changes. A feedback cycle develops wherein the developer iterates on their work until the reviewers are satisfied, at which time the PR is approved, and the code is merged into the primary code base.
This process adds an extra level of scrutiny on a code base long before deployment into production.
Automated Testing
Without proper tests, all is lost. Even with the most modern and rigorous statically typed programming language, capable of catching errors at compile time, there is always a chance that you will ship code with a flipped equality check, an incorrect domain assumption, or a complete oversight.
At AlasConnect we bundle tests with our code commits. These tests exercise the business logic, the core code that manipulates the data users interact with to achieve their goals, in an automated fashion and help prove that our software works as intended.
Running these tests locally isn’t enough, though, because there may be in-flight changes from other developers which can’t be predicted. How do we guarantee that the final version of the code is always properly tested and functional?
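As a small illustration of the kind of test we mean, assuming a hypothetical invoicing function (the function and values here are invented for the example):

```python
# Hypothetical business-logic function with a test bundled alongside it.

def invoice_total(line_items, tax_rate):
    """Sum (price, quantity) line items and apply a tax rate."""
    subtotal = sum(price * qty for price, qty in line_items)
    return round(subtotal * (1 + tax_rate), 2)

def test_invoice_total():
    assert invoice_total([(10.00, 2), (5.00, 1)], 0.05) == 26.25
    assert invoice_total([], 0.05) == 0.0
    # A flipped comparison or wrong domain assumption fails here, loudly,
    # rather than silently in production.

test_invoice_total()
print("all tests passed")
```

Tests like this run on every commit, so a regression in the business logic is caught the moment it is introduced.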
This leads us to Continuous Integration and Continuous Deployment (CI/CD) pipelines.
The first half of a good DevOps process is CI (Continuous Integration): a system which automatically picks up code that developers check in to a git repository and attempts to build it, run the associated tests, and trivially run the application to check for configuration errors. If anything throughout the whole process goes wrong, the build is marked as failed and the developer is notified that their changes did not work. They can then iterate on and push their changes until the CI build succeeds.
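Conceptually, a CI run is just an ordered list of steps that short-circuits on the first failure. A minimal sketch of that control flow (the step names are hypothetical):

```python
# Minimal model of a CI pipeline: run steps in order, stop at first failure.

def run_pipeline(steps):
    """steps: list of (name, zero-arg callable returning True on success).

    Returns ("success" | "failed", log of (name, result) pairs).
    """
    log = []
    for name, step in steps:
        ok = step()
        log.append((name, "passed" if ok else "FAILED"))
        if not ok:
            return "failed", log   # notify the developer; skip later steps
    return "success", log

status, log = run_pipeline([
    ("build", lambda: True),
    ("unit tests", lambda: True),
    ("smoke run", lambda: False),  # hypothetical config error caught here
    ("deploy", lambda: True),      # never reached when an earlier step fails
])
print(status, log)
```

The important property is that the deploy step can only ever run after every earlier check has passed, which is what gives the team confidence in what reaches production.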
The second half of the process is CD (Continuous Deployment), in which the system automatically deploys properly built and tested software to the relevant development, testing, and production environments without any human intervention. As part of this process various other tasks are completed:
- Running any database migrations to catch the database schema up
- Reconfiguring servers and services (infrastructure as code)
- Other miscellaneous tasks such as notifications or data migration scripts
- Note: In practice deploying to production may have a manual human step, since you may want to do user acceptance testing in your testing environment first, and then deploy the user verified release build later. That said, there are a lot of variations to this methodology, some of which are fully automated end to end.
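The database-migration step above usually works by tracking a schema version and applying, in order, only the migrations the database hasn’t seen yet. A toy sketch of that idea (the migration names are hypothetical; real tools run actual SQL):

```python
# Toy schema-migration runner: apply pending migrations in order.

MIGRATIONS = [  # hypothetical ordered list, checked into the repository
    (1, "create users table"),
    (2, "add email column"),
    (3, "create orders table"),
]

def migrate(db):
    """Apply every migration newer than the database's current version."""
    applied = []
    for version, description in MIGRATIONS:
        if version > db["schema_version"]:
            db["schema_version"] = version   # a real tool runs SQL here
            applied.append(description)
    return applied

db = {"schema_version": 1}  # e.g. prod is one release behind
print(migrate(db))  # applies versions 2 and 3, in order
print(migrate(db))  # → [] already up to date; safe to re-run
```

Because the runner is idempotent, the deployment pipeline can invoke it on every release without worrying about which environment is how far behind.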
The key takeaway is that a CI/CD architecture allows a team to quickly iterate on their software and get those changes into production the very same day, with a high level of confidence that the software functions as intended thanks to automated testing, configuration checking, and so on. Long gone are the multi-month build and release cycles.
Other automation ideas
With the power of a fully configurable and automated pipeline the list of things a team can do is limitless. The following are just a few such ideas:
- Code linting and style consistency
- Documentation generation
- Static site deployment (such as this blog itself)
- Server and network configuration
- Infrastructure and application security analysis
- Integration and microservice architecture testing
- Dynamic Application Security Testing (DAST)
- Automated compliance reporting
- Artifact generation and publishing (i.e. docker containers, deb/rpm/tar.gz)
There are numerous ways to approach automation, and a lot of problems that automation can solve. AlasConnect leverages automation to reduce the time invested in manual tasks and, more importantly, to reduce the chances of people introducing errors into our customers’ software solutions.
The lower the chance of bugs being introduced into our end to end software processes, the higher the chance of long term business success, whether it is our customers’ or our own.
AlasConnect is a technology support and consulting business that helps many organizations leverage the benefits of automation to improve their outcomes. If you are facing challenges scaling your enterprise software systems or just need an outside perspective, please contact us. We offer an array of services from initial assessments to full scale CI/CD deployment to help you bring your software platform to market faster and improve your user experience.