Deployment - Consider the details
This section covers the questions around the details of a deployment. Some questions may not apply to your application or to a particular change. The goal here is to help you think about how your change gets deployed, what is impacted, and what steps are needed both before and after the deployment.
A change to the data model may require an update to the data itself. Think through those data updates and document them so that your teammates (or your future self) can review them later. This allows another person to look for errors in your logic. Additionally, planning out what data needs to change allows you to practice a deployment if you can create a production-like environment.
For example, if you change a relationship between two models from one-to-many to many-to-many, at some point the new intermediate table will need to be populated with the existing relationship data.
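A minimal sketch of that backfill, using an in-memory SQLite database and a hypothetical authors/posts schema (the table and column names are illustrative, not from any real application):

```python
import sqlite3

# Old design: posts.author_id is a one-to-many foreign key.
# New design: a post_authors join table supports many-to-many.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    CREATE TABLE post_authors (post_id INTEGER, author_id INTEGER,
                               PRIMARY KEY (post_id, author_id));
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO posts VALUES (10, 'Intro', 1), (11, 'Notes', 2);
""")

# Backfill the join table from the existing foreign key. The old
# author_id column would be dropped in a later deployment, once
# nothing reads it anymore.
conn.execute("""
    INSERT INTO post_authors (post_id, author_id)
    SELECT id, author_id FROM posts WHERE author_id IS NOT NULL
""")
conn.commit()
print(conn.execute("SELECT * FROM post_authors ORDER BY post_id").fetchall())
# → [(10, 1), (11, 2)]
```

Note the two-phase shape: populate the new table first, remove the old column later, so either deployment can be rolled back on its own.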
The other facet of this question is how much data has to change and how it needs to be incorporated into your deployment process. I’ve worked on an application that limited database migrations to one hour of run time. This was perfectly reasonable, but occasionally there would be a data migration that would take hours. To handle these cases, we had a background worker update the data as part of the application’s flow rather than as part of the deployment itself.
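One common shape for that kind of worker is a batched backfill: process a chunk, commit, repeat, so no single transaction holds locks for hours. A sketch under assumed names (the `users`/`email_normalized` schema is hypothetical):

```python
import sqlite3

def backfill_in_batches(conn, batch_size=1000):
    """Normalize user emails one batch at a time, committing between
    batches so locks are released and the site stays responsive."""
    total = 0
    while True:
        rows = conn.execute(
            "SELECT id, email FROM users WHERE email_normalized IS NULL "
            "LIMIT ?", (batch_size,)).fetchall()
        if not rows:
            return total
        conn.executemany(
            "UPDATE users SET email_normalized = ? WHERE id = ?",
            [(email.lower(), user_id) for user_id, email in rows])
        conn.commit()  # many short transactions instead of one giant one
        total += len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, "
             "email_normalized TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [("Ada@Example.COM",), ("Grace@Example.com",)])
print(backfill_in_batches(conn, batch_size=1))  # → 2
```

Because the function only ever touches rows where the new column is still NULL, it is safe to re-run after a crash or a deploy, which is exactly what you want from a job that runs alongside normal traffic.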
Finally, you need to keep in mind what your production database looks like. Your local and testing environments may be very different from the production environment. If this is the case, and you’re unsure of how something will perform in production, create a temporary production-like environment to confirm before deploying.
While we strive for fully automatic deployments, inevitably there will be some manual steps. Perhaps it’s something easy, like enabling a feature flag. Sometimes it may be a little more involved, such as creating a maintenance window in your alert system or posting a notification for your users. Regardless of what the steps actually are, you should document them and include them in the pull request. This helps your future self and any teammates understand the change.
If you have any paying customers, you should make significant efforts to keep your system online. However, keeping your application up during a deployment and hoping nobody will use it or notice transient issues can be worse than a planned outage. Consider what changes you’re making and what the impact will be. Perhaps you’re doing something obviously difficult, like migrating your database to another server or provider. Or maybe it’s something a little more complicated, and rather than run a 10-day data migration, you’d rather have an hour of downtime during a period of low usage. In either case, you should be aware of the impacts, and you should communicate those impacts to your users.
A relatively straightforward option is to include a notice on the application itself when systems are degraded. You may also consider notifying your customers directly if the issue will be particularly inconvenient for them.
Does your application have background workers that run independently of each other and of the web processes? Or maybe you’re using a series of APIs as microservices. If the change touches the interface between two services, it must be evaluated for impact. You need to determine whether the change is backwards and forwards compatible on each end. You should also determine how much downtime is allowable, or what would happen during a temporary incompatibility.
Sometimes interfaces have retry logic configured so that nothing is ever lost. In that case a temporary incompatibility may result in some noise, but no functional problems, and perhaps you don’t need a mitigation strategy. On the other hand, if you can’t have any outages between services, you must find a way to remain compatible throughout the deployment. This can be achieved by including which version of the interface each side is using. Then update the consumer first to be forwards compatible with the publisher’s future change. Once that’s been rolled out, you can deploy the publisher’s changes. As you can see, sometimes a deployment is actually a series of physical code deployments.
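The consumer-first step can be sketched as a handler that accepts both message shapes. This is a minimal illustration, assuming a version field in the payload; the field names are hypothetical:

```python
def display_name(payload: dict) -> str:
    """Consumer-side handler, deployed first: it understands both the
    current v1 message and the v2 shape the publisher will begin
    sending after its own later deployment."""
    version = payload.get("version", 1)  # old publishers send no version
    if version == 1:
        return payload["user_name"]
    if version == 2:
        return f"{payload['first_name']} {payload['last_name']}"
    raise ValueError(f"unsupported interface version: {version}")

# Works for messages from the old publisher...
print(display_name({"user_name": "Ada Lovelace"}))
# ...and from the new one, once it starts sending v2.
print(display_name({"version": 2, "first_name": "Ada",
                    "last_name": "Lovelace"}))
# → Ada Lovelace (both)
```

Once every publisher is on v2, a third deployment can delete the v1 branch, completing the series.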
This is a fairly technical question about databases. You may not need to know it, but knowing it won’t hurt you. There are some database changes that require locks on entire tables or other significant portions of the data. If that lock prevents reads and/or writes, you would be disabling your site for as long as the lock is held.
A common example is adding a non-nullable column to a table. This takes a lock on the entire table while the database validates that all rows have a value for that column. The typical approach is to add the column as nullable, add the constraint during times of low usage, or use a check constraint to prevent nulls from being written in the future.
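The safe pattern can be sketched in three steps. The runnable part below uses SQLite with an illustrative `orders` table; the final enforcement step is shown as PostgreSQL-style SQL in comments, since that’s where the long-lock problem typically bites:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO orders (total) VALUES (?)", [(9.5,), (20.0,)])

# Step 1: add the column as nullable -- a quick metadata change with no
# full-table validation, so the lock is brief.
conn.execute("ALTER TABLE orders ADD COLUMN currency TEXT")

# Step 2: backfill existing rows (in batches and during low usage, for
# a large table).
conn.execute("UPDATE orders SET currency = 'USD' WHERE currency IS NULL")
conn.commit()

# Step 3 (PostgreSQL): enforce non-null going forward without scanning
# the whole table under an exclusive lock:
#   ALTER TABLE orders ADD CONSTRAINT currency_not_null
#       CHECK (currency IS NOT NULL) NOT VALID;
#   ALTER TABLE orders VALIDATE CONSTRAINT currency_not_null;
print(conn.execute(
    "SELECT COUNT(*) FROM orders WHERE currency IS NULL").fetchone())
# → (0,)
```

The key idea is splitting one blocking operation into several cheap ones, each of which holds its lock only briefly.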
There are probably other ways to manage this. I’m not a database expert. The point is to be aware of where you may lock up the database, and to check the cases where you’re not sure. When in doubt, search for your particular use case; it’s very likely someone has solved it before and published the solution.
If you have identified that a lock is being held, the next step is to understand the implications. Are you rebuilding a materialized view? Is it adding a unique constraint? Or maybe adding a multi-column index?
You need to determine how long the lock will be held so that you know how long the rest of your application will be stalled. This is where a staging environment with a clone of the production database comes in handy. Sometimes you can take a whole-table lock if it’s only held for 100ms. But if it takes ten minutes, you had better find a different way.
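Measuring this can be as simple as timing the candidate statement against the staging clone. A sketch with a tiny stand-in database (SQLite’s locking differs from a production server’s, so treat the numbers as an estimate; the schema is illustrative):

```python
import sqlite3
import time

def time_statement(conn, sql):
    """Run a candidate migration against a staging clone and report
    roughly how long its lock would be held in production."""
    start = time.perf_counter()
    conn.execute(sql)
    conn.commit()
    return time.perf_counter() - start

# A stand-in for a production-like clone.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")
conn.executemany("INSERT INTO events (kind) VALUES (?)",
                 [("click",)] * 10_000)
conn.commit()

elapsed = time_statement(conn, "CREATE INDEX idx_events_kind ON events (kind)")
print(f"index build held the table for {elapsed:.3f}s")
```

The closer the clone’s size and data distribution are to production, the more trustworthy the measurement.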
When manipulating something in the database, it’s worth knowing what that object looks like in your production environment. Your local or development environment may be a fraction of the size of production, and its tables may be populated in entirely different ratios. If a production table has millions of rows, you need to keep that in mind when planning changes to it. You don’t need a perfect memory, just a general sense of “oh, that’s a big table, I should be cautious here”. The bigger the object, the longer things take to run and the worse the performance impact. You need to be aware of that to avoid releasing a flub.
If an object in the database is rarely used, you can be more aggressive about locks and updates to it. On the flip side, if the object is used constantly and is critical to your application, you need to be cautious. Most databases offer some type of analytics package to help determine this. Or you can add an Application Performance Monitor (APM). There are probably dozens of products out there dedicated to performance analytics. Find one that looks suitable to you and test it out. Then refer to the metrics occasionally to be aware of what’s going on.
[^1] Bonus: now you have documentation for your deployment process.