Developing software in the face of uncertainty

April 04, 2024

Introduction

This is the advice I wish I had been given when I first started writing code, written down so that other developers can avoid years of incredible pain. My coding journey began at the age of 12, before I knew about UML diagrams or engineering processes, Scrum or Kanban. Those days were filled with wild hacking, cobbling things together until the result became such a ball of mud that "mud" is too kind; a ball of shit is more accurate. Eventually I would be forced to start a new project, because the old one had become immovable and the end result was unusable.

In college and my first few years as a professional developer I liked to design up front, drawing UML diagrams and other formal specification artifacts. I would spend whole days on designs. I respected people who could come up with designs that I could then go implement. I thought I was finally doing "true" engineering: rigorous development, science. Eventually I became as frustrated with this approach as I had been with the wild hacking. Somehow it produced just as big a ball of mud as the first method. Why? The reason, I think, was that some piece wouldn't work the way I had assumed it would, so I had to go back and rethink giant chunks that relied on that one assumption and use the piece in a completely different way. Often I didn't even realize I was making these assumptions. Days later, with unit tests rewritten, code re-thought and docs updated, I would discover that another assumption was wrong as well. Except now there wasn't enough time for repairs; stakeholders were getting impatient because they hadn't seen anything, and this needed to get done now. So, despite my best efforts, I created a ball of mud anyway. This was doubly frustrating: I had done a large amount of work up front to avoid the ball of shit and failed miserably.

Over the years I have found another way: a process for exploring designs without relying on hidden assumptions, built on years of training in both technical and creative disciplines. I form well-defined questions to answer with code. Questions like: "How does this API work?", "How do I return a list of x, given a list of y?", "How do I draw images to the screen?" I then pick the code I am most unsure about, the riskiest and toughest part of the project, and write the smallest amount of code needed to answer that question, without complicating factors like testing infrastructure, databases and UI windows. Once I have little pieces of code that answer specific questions outside the context and complications of the larger program, and I've kept track of what I have learned, then, and only then, do I put them together with a well-thought-out design. With this method I have found you can quickly create software with fewer bugs, and I even (gasp) started to finish sprints and side projects on time. This method works because code does not lie; code gives you something concrete to work with, instead of leaving you to imagine what the system might look like one day.

Scope of the process

This process is not for every situation under the sun. There are jobs out there involving technology we already know, where the scope is small, the change is well defined, and there isn't much question about how it should work.

The advantage of this process is most apparent when at least a few of the following conditions are met:

  • You have the technology picked out, but you don't know what the end result will look like.
  • You don't know the algorithm to use.
  • You are uncertain about how an API works.
  • You've never used a certain API before.
  • The entire program itself is brand new.

Or, said another way: when a large portion of the problem in front of you requires a large amount of creativity to solve.

The Process at a high level

This process was developed over time using several sources of inspiration. The creative and technical disciplines I have studied have each taught me lessons on how to approach difficult problems that require creativity and ingenuity to solve.

Any drawing course will tell you not to "overwork" an area too soon. If you overwork an area too soon, the drawing comes out misshapen, because you don't yet know where the pieces fit, so it's too difficult to size them appropriately. When I first heard this piece of advice I could think of a multitude of examples in software development where I had overworked an area before I knew how the other pieces were going to fit, and the result felt lopsided, both when writing code on top of it and when using the end result. Something didn't quite fit. The APIs were difficult to change. The performance was not good enough for a great user experience.

As developers we are often given vague requirements from a user, and can be left with a feeling of "oh shit, how do I build this thing? And what do they actually want? shrug I think this is what they meant?" I went from dreading this feeling to relishing it, by embracing the philosophy of other disciplines that also deal with abstract domains and very large questions that even the pros don't know how to solve: physics and mathematics. When physicists and mathematicians are faced with a very large question, they come up with "simplifying assumptions" that make the problem tractable. Physicists will assume no friction; mathematicians will aim to prove something not about any polynomial, but about a specific type of polynomial. Only after the case with the simplifying assumption is analyzed will they move on to handling more details and more general cases.

The final piece of inspiration is from Pixar. Pixar has a motto when creating their movies: it's better to have something real to fix than something perfect in your head. So if you find your first iteration sucks, that is great news, because now you have something real to fix! Even the desire to fix something is indicative of having something of high enough value to be worth fixing. Otherwise you could just leave it and you wouldn't care. The very fact that you care means it's providing some value to somebody. Every hour spent drawing diagrams and writing docs is an hour the software isn't being proven out by real customers in real use cases.

From these pieces of inspiration the main ideas of the process were born:

  • Build everything from end to end, so we don't overwork the frontend or the backend and have everything come out lopsided.

  • Simplify the discovery of how each piece should work by making simplifying assumptions that reduce the number of cases we need to handle. One effective way to reduce the number of cases is to set aside the cases that cause errors, the cases that seem a little strange, and the cases that are difficult to come up with. When you hit a case you don't want to forget about, you can always make a task or a story in Azure DevOps or Jira to go back and fix it later (see the sketch after this list).

  • Try to get something in front of stakeholders as soon as possible. Note, for those of you who, like me, work in a highly regulated, complex domain where we can't just throw spaghetti at the wall: this does not mean fully released. A demo to a key stakeholder, such as the product owner, a sales manager, or even just QA, counts.

  • Then, and only then, should we add the complications back in, one at a time. Which complications we deal with, and in what order, should be decided by either the user or a representative of the user.
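
To make the deferral concrete, here is a hypothetical sketch (the function, file paths and ticket numbers are all made up): answer the question on the happy path only, and leave tracked TODOs for the cases you are setting aside.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
#include <string>

// Hypothetical sketch: answer "can I combine two photos?" on the happy
// path only. The strange cases become tracked tickets, not code.
cv::Mat combineTwo(const std::string& pathA, const std::string& pathB)
{
    cv::Mat a = cv::imread(pathA);
    cv::Mat b = cv::imread(pathB);
    // Simplifying assumptions: both files load, and both images share
    // the same size and type.
    // TODO(PHOTO-101): handle unreadable or missing files.
    // TODO(PHOTO-102): handle mismatched sizes and formats.
    cv::Mat out;
    cv::addWeighted(a, 0.5, b, 0.5, 0.0, out); // naive 50/50 blend
    return out;
}
```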

The steps of the process are:

  • Generate initial questions, usually by brainstorming.
  • Refine and simplify each question until it's something a motivated intern could answer.
  • Write code that JUST answers the question and no more.
  • Document your lessons: ideally in an integration test, but a shareable OneNote page, an Azure DevOps task or a Jira ticket can work if a good integration test is infeasible at the time.

Potential Blockers to using the process

There are two big issues I have seen when people try this process for the first time. The first is that people tend to get bogged down in trying to handle errors too soon. Don't do that. Error cases are almost always tangential to the problem at hand; they don't illuminate anything, and there are only a limited number of ways they'll ever be handled:

  • An exception crashing the program.
  • A message in a log.
  • An error message in a window displayed to the user.

The other issue is with testing. The problems with testing broadly split into two categories. Either no automated tests are written at all, so every change needs to be retested by hand, which becomes tedious and error prone pretty quickly. Or people write a bunch of unit tests really early and end up rewriting them over and over. You want to avoid both. The idea is to remain flexible in case the definition of correctness changes as you write the code, while still defending against regressions in an automated way when tacking on new cases or new pieces. Snapshot tests are great for this purpose. Snapshot tests let us define what the output should look like without getting too specific. They are not effective for proving correctness, but they are really effective at defending against regressions.
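
A minimal sketch of the pattern, with hypothetical names throughout: run the code under test, serialize its output, and compare against a stored "golden" file. On the first run there is nothing to compare against, so record the output for a human to review and commit.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical snapshot helper: compare `actual` against the snapshot
// stored at `path`. On the first run the snapshot doesn't exist, so we
// write it out for a human to review and commit.
bool matchesSnapshot(const std::string& actual, const std::string& path)
{
    std::ifstream in(path);
    if (!in) {
        std::ofstream(path) << actual; // first run: record the snapshot
        return true;
    }
    std::stringstream stored;
    stored << in.rdbuf();
    return stored.str() == actual; // later runs: any diff is a regression
}

int main()
{
    std::string output = "hello, snapshot"; // stands in for the code under test
    assert(matchesSnapshot(output, "snapshots/output.txt"));
}
```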

Example Problem Statement

The rest of this post is an example of how I applied this process to develop code in a domain I knew nothing about before diving in. The project was successful, and my wife and I have used it to take some pretty cool pictures.

The software I wrote helps with long exposures. Long exposures are a giant pain in the ass, because you can't see the end result until 15 minutes in. So it could take 15 minutes just to figure out that a camera setting is wrong and your photos are green. When they do work, though, long exposures create beautiful, breathtaking photos.

The main intuition behind the program is that one long exposure is equivalent to a bunch of short exposures. By combining a bunch of short exposures and showing the photographer the intermediate result, they can get feedback sooner. Combining a bunch of short exposures is nothing new; it's known as photo stacking. The problem is that without a program like the one I wrote, you have to stack in photo editing software once you get home, not while you're in the field with the camera in front of you. I also wanted to be able to blend two photos together, so that a subject didn't need to sit perfectly still for 15 minutes while the world moved around them.
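
The intuition can be sketched in a few lines of OpenCV. This is an illustration of the idea, not the project's actual code: captured light is roughly additive, so summing N short exposures approximates one exposure N times as long, and the running sum doubles as the photographer's preview.

```cpp
#include <opencv2/core.hpp>
#include <vector>

// Sketch: sum short exposures to approximate one long exposure.
cv::Mat sumExposures(const std::vector<cv::Mat>& frames)
{
    cv::Mat accum = cv::Mat::zeros(frames.front().size(), CV_32FC3);
    for (const cv::Mat& frame : frames) {
        cv::Mat f;
        frame.convertTo(f, CV_32FC3); // widen to float so the sum doesn't clip
        accum += f;                   // light adds up across exposures
    }
    return accum; // show a scaled copy of this as the intermediate preview
}
```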

Things I knew for sure before diving in

  • I wanted to use C++, because Rust didn't have nice UI frameworks for what I wanted to do.
  • C++ has nice bindings for OpenCV.
  • C++ has Qt, a nice, mature framework, and I didn't really feel like experimenting with frameworks.

Initial set of Questions

  • How do I control the camera?
  • Is it true that a large number of short exposures equals one large exposure?
  • How do I blend two photos together seamlessly, taken under two different lighting conditions?
  • How do I stack the photos in an efficient way?
  • How do I draw to the screen using Qt?

The astute reader will notice these questions have one big selling feature: each question has nothing to do with the others. Controlling the camera is a separate issue from stacking, and Poisson blending is separate from controlling the camera. A big team could take these questions and answer them individually without relying on one another. I have found this to be a good litmus test for whether you are at a decent level of granularity; if the questions depend on each other, you probably need to make them less granular. A side effect of defining your questions like this is that your design comes out decoupled as well, because each piece was made to work without the other pieces being there.

Testing Strategy

I did in fact practice what I preached and applied snapshot testing. Apply some sort of processing over a set of pictures, then use OpenCV to write out the result. Manually check whether the results look good enough; if they do, put them in a "target images" folder and compare against them from then on. It is not difficult to imagine getting the target images approved by somebody like a product owner or a domain expert, to ensure we are on the right track. I liked this testing strategy because it is simple to understand and covers a lot of code with very little test code.
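
A sketch of what one of these tests can look like (the paths are hypothetical and the processing step is a stand-in):

```cpp
#include <cassert>
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>

// Hypothetical image snapshot test: run the processing, then compare the
// result against a previously approved "target image".
int main()
{
    cv::Mat input  = cv::imread("test_images/input.png");
    cv::Mat target = cv::imread("target_images/expected.png");

    cv::Mat result = input.clone(); // stand-in for the real processing

    assert(!result.empty() && result.size() == target.size());
    // L1 norm of the per-pixel differences; a small tolerance absorbs
    // harmless encoding noise without letting real regressions through.
    assert(cv::norm(result, target, cv::NORM_L1) < 1e3);
}
```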

Answering the questions

Controlling the camera

The question of how I would control the camera was answered in two commits: https://github.com/seadavis/PhotoC/blob/6481e7b29749eec46a396ca5d8465c080f527fb2/main.c and https://github.com/seadavis/PhotoC/commit/896eb9889ba62292905d8218306ed85e17c295af#diff-a0cb465674c1b01a07d361f25a0ef2b0214b7dfe9412b7777f89add956da10ec. The acceptance criteria for the corresponding user story was simply: take a picture and a preview with the program and display them on the screen. Nothing about stacking or saving or sizing or any of those finicky issues. This was just about discovery.

You will notice that at this point I only have one file. There's nothing in here that's extensible; we don't have a good class structure. Key pieces of infrastructure are missing that I'll add much later: CMake, a build script, automated testing. Without excess infrastructure I can focus on the algorithm that needs to get written, I avoid the risk of over-engineering entirely, and I can start to feel good about the progress I'm making instead of getting bogged down in minute details. We can't really test this automatically anyway, since it requires a camera on USB, so I drop testing entirely until a good place to introduce the tests reveals itself.
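
For flavour, here is roughly what a minimal single-file "can I take a picture at all?" program can look like. I'm assuming libgphoto2, the usual library for tethered capture on Linux; this is a sketch under that assumption, not the code from the commits above.

```cpp
#include <gphoto2/gphoto2.h>
#include <cstdio>

// Minimal sketch: initialize the first USB-connected camera, trigger a
// capture, and save the result locally. Error handling deliberately deferred.
int main()
{
    GPContext* ctx = gp_context_new();
    Camera* camera = nullptr;
    gp_camera_new(&camera);
    gp_camera_init(camera, ctx); // autodetects the connected camera

    CameraFilePath path;
    gp_camera_capture(camera, GP_CAPTURE_IMAGE, &path, ctx);

    CameraFile* file = nullptr;
    gp_file_new(&file);
    gp_camera_file_get(camera, path.folder, path.name,
                       GP_FILE_TYPE_NORMAL, file, ctx);
    gp_file_save(file, "capture.jpg");
    std::printf("saved capture.jpg\n");

    gp_camera_exit(camera, ctx);
    return 0;
}
```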

Drawing images to Qt

This was answered in one commit: https://github.com/seadavis/PhotoC/blob/9028be672e8727f76e9a5865b8f5c0fde69e3334/main.cpp. The acceptance criteria was simply to display an image in a Qt window.

Here I'm using source control to act like a journal. I erased the other portions because they are completely irrelevant to the question of how to display a photo. Complications like the photos coming from a camera are absent, as they too are irrelevant. Now, when I go to put the two together, I have a point of abstraction: namely, the idea of an image. This can be seen in this commit: https://github.com/seadavis/PhotoC/commit/b342a53463fc65b7b5a5b8006d417c8476e707c6#diff-8ba8edb3dd24d08481efa83edb4bd0279b6c61aa66c2fd3ef06642a82f2aa895.
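
A minimal sketch of an answer to "how do I draw an image with Qt?" (the image path is hypothetical; the linked commit is the real thing):

```cpp
#include <QApplication>
#include <QLabel>
#include <QPixmap>

// Minimal sketch: load an image from disk and put it in a window.
// Nothing about cameras, stacking or resizing; just the one question.
int main(int argc, char* argv[])
{
    QApplication app(argc, argv);

    QLabel label;
    label.setPixmap(QPixmap("photo.jpg")); // hypothetical test image
    label.show();

    return app.exec();
}
```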

Heck, when I go to move the camera stuff into its own library (https://github.com/seadavis/PhotoC/commit/864df28be0767cc19070c19715adc6393936108e#diff-2bb8def89d3344dddedf8972976fa28ec2d48a9f463ed4196118723e3202d825), the code from the first commit stays pretty much intact, which tells me that the first commit was actually pretty close to a good abstraction and, more importantly, that it actually works. This brings me to another benefit of this approach: if a stakeholder asks to see progress, I can actually show them something after a day or two. From the stakeholder's point of view this is a huge win, and they'll be much more likely to keep investing.

Blending

When it came time to start blending, I dropped the camera portion entirely to focus solely on the problem of blending two images together. For this I settled on Poisson image editing, as the algorithm was fairly well explained and came with a decent code example.
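
As an aside, OpenCV ships its own Poisson blending as cv::seamlessClone, which is a handy way to see the effect before writing your own. A sketch with hypothetical file names (this is not the hand-rolled implementation the project ends up with):

```cpp
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/photo.hpp> // cv::seamlessClone

// Sketch: blend a subject shot into a background shot taken under
// different lighting, using OpenCV's built-in Poisson blending.
int main()
{
    cv::Mat src = cv::imread("subject.png");    // hypothetical inputs
    cv::Mat dst = cv::imread("background.png");
    cv::Mat mask(src.size(), CV_8UC1, cv::Scalar(255)); // blend all of src

    cv::Point center(dst.cols / 2, dst.rows / 2); // where src lands in dst
    cv::Mat blended;
    cv::seamlessClone(src, dst, mask, center, blended, cv::NORMAL_CLONE);

    cv::imwrite("blended.png", blended);
}
```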

In the first commit (https://github.com/seadavis/PhotoC/commit/5812810395d9c314792dc0d71efa2ab2276ce4b2#diff-e37b1f8a7fbafcedc4239544672ee4975703d539629f6048310c79c5ec527b9d) I stub out where I want the actual algorithm to go. By now the project is a little more mature and I have some infrastructure in place, so I can start to use things like CMake and turn the pieces into separate libraries. And we have the first hints of it working: https://github.com/seadavis/PhotoC/commit/e772484b9fb3d066ff25447427ca7bf05cb15377. Notice that I don't worry about every possible category of image. Instead I add images one at a time and handle the errors as they come up, until I am satisfied that the software is robust.

It isn't until I am deep into the project that I really start worrying about error cases: https://github.com/seadavis/PhotoC/commit/3a5777c60684934bc76c87320a671a49ae813334. By this point the error cases are built around the project. And it isn't until way later that I start adding abstractions: https://github.com/seadavis/PhotoC/commit/078183c4b1902ca0e0a0df270db3a01f770d542b. By the time I do get around to adding the abstractions, they actually have a purpose. I found that these abstractions felt very natural and didn't get in the way of the essential algorithm the code was trying to express.

Photo Stacking

Now this is where things get interesting. I wanted to see if the approach would work, but at this point in the project there is a lot of infrastructure. I couldn't just plop the stacking algorithm anywhere, and I didn't want to spend a lot of time coding boilerplate just to find out the algorithm fundamentally doesn't work and have to rethink things. To solve this, I wrote the basic algorithm in a unit test: https://github.com/seadavis/PhotoC/commit/8ee816e47bee54e524380bd62f8341c009846456#diff-5b7e79fbd4d12fc3a63bb0bd63850d3f2e69f0b171d53c4479c78451032dedbc

There are a few things missing, like controlling the camera and processing the camera feed in parallel. However, these are things I can tackle with some grinding. The big thing I was uncertain of was whether stacking would work at all.

Indeed, the algorithm did not work right off the bat. I thought averaging each pixel would work, and it did not. What did work was the idea of a "brightest pixel". You can see the early "brightest pixel" experiments here: https://github.com/seadavis/PhotoC/commit/50ad070832eab9f163133b62dc451b41d91f332b#diff-137017cff9006843eee7c5b32e7a21f768a09b0f446d0dc575faa2c59310274e
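
"Brightest pixel" stacking reduces to a per-pixel maximum across the frames. Here is a sketch of my reading of the idea, not the committed code:

```cpp
#include <opencv2/core.hpp>
#include <vector>

// Sketch of "brightest pixel" stacking: for every pixel, keep the
// brightest value seen across all the short exposures.
cv::Mat stackBrightest(const std::vector<cv::Mat>& frames)
{
    cv::Mat stacked = frames.front().clone();
    for (size_t i = 1; i < frames.size(); ++i) {
        cv::max(stacked, frames[i], stacked); // per-channel, per-pixel max
    }
    return stacked;
}
```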

Connection to Scrum and Other Processes

  • Essentially, this process turns an hour of work into a miniature Scrum iteration, with you as the reviewer; think of it as fractally nesting the ideas of Scrum.
  • If you're finding your team cannot finish stories in one sprint, the process can be applied to sizing stories as well: throw out the error cases and focus just on what you need.
  • Each person can then take a different error case afterwards.

Final Thoughts and Further Applications

This is a process I now apply to most new development. It answers questions quickly and it is simple to use. It also enhances Scrum, improving transparency and enabling experimentation. Transparency improves because everyone can see what is implemented, play with what is implemented, and, with integration tests, see which cases the code currently covers. If you have kept track of what is not done yet by making new stories, then at the sprint review the team can debate whether an edge case needs to be handled, or whether it can safely be ignored for this version. The process enables the experimentation part of Scrum by giving everyone guidelines for what the experiments should look like. Finally, it enhances the fast feedback of Scrum if you follow the sizing guidelines given in this post: if the team thinks it doesn't have enough time for review and the other activities that need to fit into a sprint, one good way to trim a story down is to eliminate error and pathological cases, or to focus on one key component of the implementation and review that instead of the full thing.

This process is not perfect, and there are pitfalls to watch out for. At some point the pieces do need to be put together, and overusing the process leaves you with pieces that are too disconnected and very difficult to join back up. But as long as you apply it when the circumstances are appropriate, I think you'll find this is a very good method for software development.



Written by Sean Davis, a developer who lives and plays in Calgary, building useful and some not-so-useful things.