The How Itself

Marcel Sanders
Oct 22, 2021

We've spoken of vision, cadence, and strategy - but we have not yet talked through building out a picture of how to actually accomplish any of these things with code and tools. That, then, is the subject of this post. This question - how to build something - is one of my favorites because of just how rich it is. How you build something is about more than just the ingredients required to build to your use cases; it also requires that you think about the developers themselves so that you can understand what kind of design and tools will enable them to build well and build efficiently. It's about understanding how they'll debug issues, transition new features from development to production, deal with failures, reuse and recycle work, and just generally be creative in their problem solving. Because of this, thinking through the how is as much an exercise in thinking about humans as it is thinking about the features themselves.

Being such a rich problem, it also turns out to be a huge space to work through and solve. Like any other part of our cartography, carefully thinking through the how is no trivial feat. So, as with all of our other big problems, we'll work through it by breaking it into smaller and smaller pieces until we have a clear hierarchy of questions that need to be answered. Let's dive in!

Three Phases, One End

The end aim of all of this is to answer a seemingly simple question - how will I accomplish the things I've set out in my vision? Yet this sense of "accomplish" really divides into two kinds of things. The first is material or logical: literally what will be in the final solution. Think of this as the logic required to make things work, the scale needed to make things work effectively, security requirements, etc. On the other hand there is a human side of "accomplish" that also needs to get sorted out. Every feature we build is going to start in a development phase and end up in production. Not only do these two realms have completely different properties and requirements to be sound and efficient, but there has to be a bridge between them that gets managed as well. Thinking through each of these phases is therefore also a part of "accomplishing" our use cases. So we can start by breaking things out into four categories - the material along with the three phases of development, transition, and production.

The Material

Within the material, everything starts with the abstract-logical question of how any of this will actually work. Think of the logical as primarily about information and resource flow. Given the series of requirements built out in our vision, how will we actually build out resources and hook them together in order to achieve those requirements? This will build out a skeleton of a design for the material that we can start layering specific tissues onto.

Once we have the logical flow sorted out, we next need to consider the physics of the situation. How much information is flowing around? How many API calls do we expect? What are we looking at in terms of resourcing in order to meet this demand? How much money do we actually have lying around? Put another way, this stage is all about attaching numbers to everything we possibly can. In the logic stage we qualified, now we quantify. With this quantification in hand we'll know what we need in order to supply what we're going to build.
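To make "attaching numbers" a little more concrete, here is a minimal back-of-the-envelope sketch. Everything in it is a hypothetical placeholder rather than a measurement from a real system: the `LoadEstimate` class, its field names, and the figures plugged in at the bottom are assumptions chosen only to show the shape of the arithmetic.

```python
import math
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class LoadEstimate:
    """Hypothetical inputs for a rough capacity and cost estimate."""

    daily_requests: int          # expected API calls per day (assumed)
    peak_factor: float           # how much busier the peak is than the average (assumed)
    per_server_rps: float        # requests per second one server can handle (assumed)
    server_monthly_cost: float   # cost of one server per month (assumed)

    @property
    def average_rps(self) -> float:
        return self.daily_requests / SECONDS_PER_DAY

    @property
    def peak_rps(self) -> float:
        return self.average_rps * self.peak_factor

    @property
    def servers_needed(self) -> int:
        # Size for the peak, and never plan for fewer than one server.
        return max(1, math.ceil(self.peak_rps / self.per_server_rps))

    @property
    def monthly_cost(self) -> float:
        return self.servers_needed * self.server_monthly_cost


# Illustrative numbers only.
estimate = LoadEstimate(
    daily_requests=8_640_000,
    peak_factor=3.0,
    per_server_rps=50.0,
    server_monthly_cost=120.0,
)
print(estimate.average_rps, estimate.peak_rps, estimate.servers_needed, estimate.monthly_cost)
```

The value of a sketch like this is less the answer than the list of inputs it forces you to write down; each one is a number you either know or need to go find out.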

So far we've spoken about the system when things are going well. It's therefore a logical step to now consider what happens when they don't. To me there are two kinds of "bad day" that can arise - one coming from chaos and the other arising from malicious intent. Chaos is all about the unexpected - you have a server that goes out, the data for your model suddenly changes, an ETL doesn't run for some unknown reason, you get more load than you were ever expecting to have to deal with. These are things that arise from the ether of randomness rather than any kind of intent. To make sure your design can deal with chaos you have to think about what can go wrong and then design safety mechanisms in your product to handle it.
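One common safety mechanism for transient chaos, such as a server that briefly drops out, is retrying with exponential backoff. The sketch below is one possible shape for it, not a prescription: `call_with_retries` and its parameters are hypothetical names, and a real system would likely catch narrower exception types than `Exception`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on failure with exponentially growing delays.

    `sleep` is injectable so the behavior can be exercised without waiting.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            # Out of attempts: let the caller see the original failure.
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, then 4x, ... before trying again.
            sleep(base_delay * 2 ** (attempt - 1))

    raise AssertionError("unreachable")
```

Retries only cover failures that go away on their own. Problems like changed input data or a skipped ETL run need their own mechanisms, such as validation and alerting.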

On the other hand there's the bad day that comes from malicious intent - someone intentionally trying to break or bring down your system. These really are matters of security and protection and cannot be ignored when building out a design. Consider here the ways in which others will try to wreck you and build accordingly. Develop defensively.

At this point we've got a pretty great sense of the material side of things. We've been able to describe things both qualitatively and quantitatively while also having a clear idea of how to deal with situations where things go wrong. So, with the material side sorted, let's move on to the first of our phases - development.

Enabling Development

This phase is all about the question - how will I build it? In other words, this part is all about making your developers as effective and efficient as possible at being creative. In my experience this breaks into three parts - simplification, scalability, and iteration.

Simplification

Your developers are going to be dealing with large, complex systems. For the same reasons that we divide and conquer with the rest of this stuff, you're going to want to ensure your developers are dividing and conquering as well. As a general rule, the simpler you can make the pieces you're working on, the better those pieces will be and the better the whole will be as a result. Therefore the very first question you should ask yourself is: how do I make my system as modular as possible?

Once you've broken your problem into smaller parts, the other thing you want to consider to simplify the tasks ahead is what can I borrow rather than build? What tools can already fill out some of the modules (or perhaps cover broad swathes of them)? What projects can I borrow ideas from? What tools are going to make my developers' lives easier? How can I build reusable components across modules and projects? If you can turn large parts of the problem from things that need to be solved into things that just need to be integrated, you'll have greatly simplified the space your developers are trying to work in, allowing them to use their time much more effectively.

Scalability

Related to simplification is the notion of scalability. You want to design your projects in such a way that you can scale across many developers. The more people you're able to pull in, the greater your theoretical efficiency and development speed. Modularization and simplification help here, but there are some other things to be considered as well.

First, how hard would it be to train up someone new? I've often seen developers scoff at documentation until they become overwhelmed and need help. Then suddenly they wish they had more documentation because they don't have enough time to even teach new people how the code works. Documentation is essential to ensuring that you're able to bring new people on quickly and get others using and developing your projects.

The second issue is version control. When you have a lot of cooks in the kitchen there needs to be clear tracking of who's done what and when, and of what the agreed upon source of truth is. Without this, chaos ensues and good code, ideas, and data end up getting lost or corrupted. Version control is key to managing large projects with many developers. Solving the version control problem, though, is not always as simple as choosing a technical solution like Git. Oftentimes there are other things that need to get version controlled, like data or documentation, and questions about how to move in and out of production need to get answered as well. So generally speaking, go through all the artifacts and resources you have and create a clear strategy for version controlling them all.
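For artifacts that don't fit comfortably in a code repository, such as large data files, one lightweight option is to record a content fingerprint in a small manifest that does get version controlled. The following is only a sketch of that idea; `fingerprint`, `record_version`, and the manifest layout are hypothetical, and dedicated data versioning tools offer far more than this.

```python
import hashlib
from typing import Dict, Iterable


def fingerprint(chunks: Iterable[bytes]) -> str:
    """Return a SHA-256 hex digest of the content, read chunk by chunk."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def record_version(manifest: Dict[str, str], name: str, chunks: Iterable[bytes]) -> bool:
    """Store the artifact's fingerprint in the manifest.

    Returns True if the content differs from what was previously recorded
    (or if the artifact is new), False if nothing changed.
    """
    new_fingerprint = fingerprint(chunks)
    changed = manifest.get(name) != new_fingerprint
    manifest[name] = new_fingerprint
    return changed


# Illustrative usage with in-memory bytes standing in for a data file.
manifest: Dict[str, str] = {}
first = record_version(manifest, "training_data", [b"id,score\n", b"1,0.5\n"])
second = record_version(manifest, "training_data", [b"id,score\n", b"1,0.5\n"])
third = record_version(manifest, "training_data", [b"id,score\n", b"1,0.9\n"])
print(first, second, third)
```

Committing the manifest alongside the code gives you a record of which data each version of the code was built against, even when the data itself lives elsewhere.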

Iteration

Every creative process is a process of iteration and development is no exception. The faster you can iterate the more ideas people will explore and the better the results will be. It can also be the difference between development being a boon or a bane. So it follows that ensuring fast and easy iteration is pretty essential.

The first thing we have to ensure to even allow for iteration is a development environment. This is an environment people can either set up or have quick and easy access to that acts as a sandbox - putting all the tools and data they need in easy reach so they can spend their time being creative and not trying to wrangle everything into place over and over again. Think data sources, tools, debuggers, logging, etc. here.

With a development environment in place the next thing you want to ensure is rapid iteration. Think through all the things that would make an iteration cycle long and get rid of all of them. I say all of them because there's usually no real excuse for long iteration cycles. Data's too large? Create a sample subset or bring in big data tooling. The faster you drive those cycles the more creative and effective your developers can become.
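As one example of shortening the cycle when the data is too large, here is a minimal sketch of drawing a small, reproducible sample to develop against. The `sample_subset` function and its parameters are hypothetical names, and a fixed seed is assumed so that every developer sees the same subset.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_subset(records: Sequence[T], fraction: float, seed: int = 0) -> List[T]:
    """Return a reproducible random sample covering `fraction` of `records`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    if not records:
        return []
    # Always keep at least one record so the sample is never empty.
    size = max(1, round(len(records) * fraction))
    # A dedicated generator keeps the sample stable without touching global state.
    rng = random.Random(seed)
    return rng.sample(records, size)


# Illustrative usage: develop against 1% of a hypothetical dataset.
records = list(range(1000))
subset = sample_subset(records, fraction=0.01, seed=42)
print(len(subset))
```

A uniform sample like this can miss rare cases, so it is worth checking that whatever you are iterating on still gets validated against the full data before it moves on.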

All of our projects get founded on certain hypotheses and the list grows over time as the project gets larger and larger. Not all of these hypotheses can be held in our brains at all times, and in order to ensure you're building something that's actually tenable you will have to test them. Just assuming your code works is insanity. You can either make the process of testing tedious and error prone or you can make it fast and standardized by building automated testing (think unit tests and the like). Consider your suite of hypotheses, think about what can go wrong, and then build out a test suite that's automatic and has the highest coverage it can. Once again, you have to test things one way or another; it's just a question of whether it's painful and slow or not.
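To show what turning hypotheses into automated tests can look like, here is a small sketch using Python's built-in `unittest` module. The function under test, `normalize_scores`, is a hypothetical stand-in for whatever your project actually does; each test method writes down one hypothesis about it.

```python
import unittest
from typing import List, Sequence


def normalize_scores(scores: Sequence[float]) -> List[float]:
    """Hypothetical function under test: scale scores so they sum to 1."""
    total = sum(scores)
    if total == 0:
        raise ValueError("scores must not sum to zero")
    return [score / total for score in scores]


class NormalizeScoresTest(unittest.TestCase):
    def test_output_sums_to_one(self):
        # Hypothesis: normalized scores always sum to 1.
        self.assertAlmostEqual(sum(normalize_scores([1, 2, 3, 4])), 1.0)

    def test_preserves_proportions(self):
        # Hypothesis: relative sizes of the scores are unchanged.
        self.assertEqual(normalize_scores([1, 1, 2]), [0.25, 0.25, 0.5])

    def test_rejects_zero_total(self):
        # Hypothesis: degenerate input fails loudly rather than silently.
        with self.assertRaises(ValueError):
            normalize_scores([0, 0])


def run_suite() -> unittest.TestResult:
    """Load and run the tests programmatically, returning the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeScoresTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

In practice a suite like this would run automatically on every change, so nobody has to remember to check these hypotheses by hand.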

Alright, so we've now got a beautiful environment for developers. We've simplified the problem by making it modular and reusing and recycling as much as we can. We've made it scalable through simplification, version control, and documentation. And we've ensured a fast iteration cycle with a development environment, an automated testing suite, and by smoothing out wrinkles that would otherwise slow us down. Now that we've got sweet features pouring out of our developers, we've got to get them into production. So let's move on to our transition phase.

Managing Risk

The production environment and the development environment are two totally different worlds. All of those unknown unknowns that have been lurking around are going to rear their ugly heads when we push to production, so the whole point of this transition stage is to try to gradually move from one world into the other so that things don't simply crash and burn. As a result there are going to be three things we want to consider here - stages, monitoring, and pushes and reversions.

The first is pretty obvious. We want to think about the whole suite of changes that make production different from our development world and we want to create a series of transitory stages that allow us to bring on these changes slowly rather than all at once. We also want to order these changes from lowest to highest risk, in our estimation, so that we're managing the risk as well as we can.

Second, these stages are going to do us no good unless we can actually monitor them. In the spirit of helping our developers be as creative as possible, we want to ensure this monitoring is automatic so that they don't have to spend cycles re-monitoring each time. Not only will this reduce the long term workload but it will also mean that there's a smaller chance something gets missed. We also want to ensure that our monitoring makes debugging the problems we do find easy, as we need to be able to efficiently create a solution at the end of the day.

In a similar fashion we want quick and easy tooling for moving through the stages both in terms of pushing forward and reverting backward. If we do catch a problem we want to be able to move quickly and efficiently to remedy the situation and we also don't want pushing forward to be onerous either.
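The staged push and revert idea can be sketched as a tiny state machine. This is a toy model under stated assumptions, not a deployment tool: the stage names in `STAGES`, the `Rollout` class, and the `healthy` flag (standing in for whatever your automated monitoring reports) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stages, ordered from lowest to highest risk.
STAGES = ["development", "staging", "canary", "production"]


@dataclass
class Rollout:
    """Tracks which stage a release is in and moves it forward or back."""

    stages: List[str] = field(default_factory=lambda: list(STAGES))
    position: int = 0

    @property
    def current(self) -> str:
        return self.stages[self.position]

    def push(self, healthy: bool) -> str:
        """Advance one stage if the current stage looks healthy, else revert."""
        if not healthy:
            return self.revert()
        if self.position < len(self.stages) - 1:
            self.position += 1
        return self.current

    def revert(self) -> str:
        """Step back one stage; the first stage is the floor."""
        if self.position > 0:
            self.position -= 1
        return self.current


# Illustrative usage: two healthy pushes, then a failed health check.
rollout = Rollout()
history = [rollout.push(healthy=True), rollout.push(healthy=True), rollout.push(healthy=False)]
print(history)
```

The point of the model is that pushing forward and reverting backward are equally cheap, single calls, which is the property you want from the real tooling too.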

Alright, so we've got a pipeline that's largely automated, gets us the monitoring we need, and allows us to manage risk as we move from the development environment out into production. Our features are now making their way to our users, so let's think through the production environment.

Out in the Real World

Once what we have is out in production there are really only two things (generally) that we're worried about: are the assumptions we made about how the product would work actually true, and are users having a good time with it? You might wonder why security or chaos is not also a concern. Assuming you've designed well and thought these things through, managing security and chaos is just another part of your assumptions. So in terms of what to ensure we have in production, it really comes back to user happiness and constantly checking assumptions.

For checking assumptions it's all about monitoring. We want to have every assumption we've made, every form of chaos, and every security concern covered by thorough, automated monitoring that gives us everything we need to debug if something does go wrong. Without this kind of monitoring we are blind.
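One way to picture assumption monitoring is as a list of named checks evaluated against whatever metrics production is reporting. The sketch below assumes the metrics arrive as a plain dictionary; the `Assumption` class, `find_violations`, the metric names, and the thresholds are all hypothetical and only illustrate the pattern.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Assumption:
    """A named expectation about a single production metric."""

    name: str
    metric: str
    holds: Callable[[float], bool]


def find_violations(assumptions: List[Assumption], metrics: Dict[str, float]) -> List[str]:
    """Return a readable message for every assumption that does not hold."""
    violations = []
    for assumption in assumptions:
        value = metrics.get(assumption.metric)
        if value is None:
            # A metric that stops reporting is itself a broken assumption.
            violations.append(f"{assumption.name}: metric '{assumption.metric}' is missing")
        elif not assumption.holds(value):
            violations.append(f"{assumption.name}: {assumption.metric}={value}")
    return violations


# Illustrative assumptions and metrics; names and thresholds are made up.
assumptions = [
    Assumption("latency stays low", "p95_latency_ms", lambda v: v < 200),
    Assumption("errors are rare", "error_rate", lambda v: v < 0.01),
    Assumption("traffic is flowing", "requests_per_minute", lambda v: v > 0),
]
metrics = {"p95_latency_ms": 350.0, "error_rate": 0.002}
violations = find_violations(assumptions, metrics)
print(violations)
```

Writing assumptions down as data like this makes the list itself reviewable: if an assumption from the design isn't in the list, it isn't being monitored.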

In terms of user happiness we want to think through two things. First, is our tool easy to use? Walk in the user's shoes and determine if the product actually does work as expected and has the documentation and guidance to make usability straightforward. Second, provide a mechanism for feedback and encourage it. Indeed there should be some way you know you'll get feedback from your users because without it you'll be blind to any issues (or strengths) in your product.

Production, then, is all about monitoring. Are things working as expected, and what do the users actually think? With this monitoring in place you'll be able to detect issues and respond to them quickly. And obviously, make sure there's a means for quick reversions.

Tying It All Back Together

Phew, that was a lot. But we've now got our map of the how and the questions to answer to ensure we've got a good overall design:

Consider the material
1. Understand the qualitative/logical design by thinking through information and resource flow from beginning to end and everything in between.
2. With your qualitative skeleton in place, quantify everything. Think both in terms of scale and cost and in terms of space and time.
3. Now consider what happens if things go wrong. Think through the chaos that can occur as well as how people with malicious intent may try to wreck your product. Design and build defensively.
Consider the process
1. Development - creatively building features
  1. Simplify the project
    1. How can you modularize into the least common denominators and build good interfaces between them?
    2. How can you reuse and recycle both your own tools and externally built ones?
  2. Design for development scalability
    1. Ensure ramp up time is short with great documentation
    2. Catalog all of your resources and artifacts and have a version control strategy for all of them
    3. Yet another reason for modules
  3. Drive fast iterations
    1. Think through how to build a development environment that allows people to focus on being creative rather than worrying about getting set up and hooked in
    2. Find every wrinkle that might slow things down and find ways to speed it up
    3. Build an automated testing framework
2. Transition - managing risk
  1. Catalog the differences between production and development and break things up into stages that allow you to slowly increment the risk
  2. Build automated, thorough monitoring that gives you all the information you need to debug raised issues
  3. Ensure your push and revert actions are as easy and automatic as possible
3. Production - getting feedback and checking assumptions
  1. Build automated monitoring here too
  2. Ensure there's a mechanism for feedback from your users and pursue that feedback
  3. Think carefully through the usability of your product and iron out any wrinkles you find

With all of that now laid out in one place we can see just how nontrivial putting together a complete, thorough design can be. If we don't answer the material questions we won't know whether what we're building will meet our requirements. Yet if we do not answer the rest of the questions we put ourselves at risk of low efficiency, unnoticed bugs, risky deployments, and unhappy users. Designing the how is as much about how we will build it as it is about how it will work.