Green Software Development

    By Carl Austin and Ricky Barefield, CTO and Engineering Lead

    Carl Austin and Ricky Barefield

    The next blog in this series follows from on from green architecture, to discuss green software development – where designing and writing efficient code is critical to reducing energy use.

    Simplicity and Efficiency Trade-offs

    There are many trade-offs when developing software - an important one in green software development is between simplicity and efficiency. One we’ve seen examples of already in the architecture section, and closely aligned to the box-out on cost and sustainability. For example, consider two sorting algorithms - insertion sort and merge sort.

    Insertion sort takes each element of the unsorted list and moves it into the correct place in a new list. The code will be simple for most developers to understand. This will likely mean faster development, few functional bugs, and an easily maintainable piece of code. But insertion sort has a worst-case time complexity of O(n^2), meaning that as the amount of data increases, the time taken increases proportionally to the square of the increase in data. This can be highly inefficient in terms of running time and thus energy use.

    Merge sort, in contrast, is more complex to explain, involving splitting a list into several sublists, recursively sorting them, then merging the result. The code for this will take longer to develop, be more complex, harder to understand and maintain, and potentially more likely to contain bugs. But it has a time complexity of O(n log n), meaning the time taken to sort a list increases linearly with the number of elements to sort, resulting in less energy use, perhaps meaningfully so for large amounts of data.

    Which is the correct choice for software engineers is not straightforward. They will need to consider the system's non-functional requirements, the knowledge level of the developers likely to be maintaining the system, and the variance in data the system will process.

    Of course, this is a contrived example. It would be unusual in an enterprise setting to implement a sorting algorithm. Instead, the complexity tends to be hidden in a library that we expect to be well tested. However, in a particular domain, many comparable trade-offs can be made. It’s likely down to the developer to understand how to apply this trade-off, and their experience will play a significant part in the outcome.

    Over time and due mainly to Moore’s Law, these trade-off decisions have tended to favor simpler and quicker to write code as memory has become cheap and plentiful and processing becomes ever-more efficient. But it’s now time to revisit this, considering the knowledge that these decisions also impact our environment.

    Efficient Software
    Algorithms and data structures

    The example used above is one of algorithmic efficiency. Choosing more efficient algorithms for the expected data will likely have the most significant impact on carbon emissions at the lowest level of the stack.

    We do not always need the perfect result and can sometimes balance the quality of the outcome with the algorithm's efficiency. These are known as approximation algorithms and are very popular in some areas, such as route finding.

    Outcome quality vs. efficiency is also an essential factor in machine learning, where additional algorithmic improvement can come at the cost of exponential increases in model training time, energy, and emissions. It’s not unusual that even a small percentage increase in the performance of an already well-tuned model may require an order of magnitude more processing power.

    Hand in hand with algorithmic efficiency is choosing appropriate data structures. For instance, if, as part of an algorithm, it will be necessary to search for elements in a list, we may consider using collections based on hashing functions and ensure the hashing function has an appropriate distribution over the expected data set. This will reduce processor cycles as we will not need to traverse the entire collection.

    We may also consider an algorithm's trade-off between processing and memory usage. We may have opportunities to store derived data which will reduce duplicate processing. We may do this in the form of a cache or possibly at the database layer using materialized views.

    An algorithm's true efficiency will depend on the actual data it processes. Understanding the characteristics of production data is, therefore, vital. In particular, you need to understand the data variety and expected volume.

    Time efficiency ! = carbon savings

    We must also remember that while the time efficiency of algorithms and software carbon intensity are related, they are not always directly proportional. For example, greater parallelism can improve time efficiency, but greater parallelism means more energy use, which may not decrease our software’s carbon intensity.

    Software reduction

    Reduction in software engineering is the practice of reusing a library that solves a more general problem before reducing the result to what is needed rather than writing more specific new code. This approach is vital to productivity and sometimes security in enterprise engineering but does have a software efficiency trade-off.

    Again, the decision for the software engineer is to determine when the impact of reduction is too great. One way to help make this decision is to understand the library code and how it works and make profiling tools part of the engineering process. There is further discussion on the use of profiling tools below.

    Engineer for change

    As established, there are many trade-offs during development. Often it is hard to quantify a development decision’s carbon intensity impact. But many software development principles can make it easier to defer some of the decisions.

    Most fundamentally, the principle to separate units of code around responsibilities and reasons to change is commonly referred to as the Single Responsibility Principle or Composite Reuse Principle.

    Applying this principle will ensure that when an engineer does decide that a more efficient algorithm is needed, the existing algorithm is not intertwined with other code areas. If we know upfront that change is likely, it would also be wise to ensure you develop a specific interface for the behavior (following the Interface Segregation Principle). Perhaps the specific implementation can be quickly injected into the codebase using an Inversion of Control container or an Abstract Factory if we’re looking to keep things simple.

    Efficient Continuous Delivery
    Efficient integration and delivery pipelines

    The product running in production is not the only contributor to carbon emissions and, therefore, not the only opportunity for reduction. We must also consider:

    • Non-production environments, including various flavors of test environments
    • The integration and continuous delivery pipelines and processes
    • Development machines.

    In this article, we do not consider development machines further though these and associated peripherals, such as monitors, account for some of the carbon cost of delivering software.

    Environments

    What is necessary for a production environment may not be required for each environment on the path to live.

    Production aside, performance testing environments tend to be the only environments that need to mirror the scale. With the advent of cloud computing and Infrastructure as Code (IaC), it has become simpler to create standardized environments. With a little more intelligence in that code, we can consciously vary those environments and scale them for a given usage. For example, Terragrunt extends the out-of-the-box capabilities of Terraform to offer this flexibility.

    Given that IaC lessens the burden of environment creation (and destruction), we should also consider creating environments on demand and destroying them when we are finished.

    Phantom power usage is generally associated with equipment around the home that is on but unused, continually drawing energy but not providing benefit. This term is also relevant to the many cloud environments which run unnecessarily.

    Moving to an on-demand model can also remove process bottlenecks associated with pre-provisioned environments.

    Pipeline processes

    Another path to live carbon cost is in the processing associated with running pipelines. Again, machines doing this processing should be ‘right-sized’ and ephemeral where possible. Even better, you could switch to cloud-native technologies that can make hardware more efficient.

    As stated earlier in this article, the largest carbon saving is to ask the question as to whether the product or functionality is necessary in the first place.

    Applying this to pipelines, we should therefore question whether each build we run is necessary or whether we trigger the build automatically because we can.

    You can design the development team’s branching strategy to consider this and build automatically only for shared branches or when you raise a merge request.

    Build carbon efficiency into quality processes

    Understanding what makes efficient software and what makes efficient delivery pipelines is not necessarily enough. Many software engineers claim to have a good knowledge of software design and maintainability. Still, many software systems suffer from problems in these areas and much unknown technical debt.

    To address this, we follow quality processes and bake them into our delivery approach and culture. These should be updated to include carbon efficiency as a desired outcome.

    Code and design review

    At BJSS, we advocate thorough code and code design reviews. We prefer to shift this left where possible to avoid redundant work or the temptation to stay with a suboptimal solution.

    Such reviews should also consider the abovementioned aspects, such as algorithmic and API efficiencies, and the importance of ensuring that you size environment provisioning appropriately.

    Making technical decisions consciously and allowing scrutiny

    As the green architecture section discussed, you should document decisions using Key Design Decisions (KDDs) or Architectural Decision Records (ADRs). These needn’t be architectural in level but could be lower-level choices. While documentation is prudent, you must make these decisions consciously and with the best information available.

    ‘Cargo cult programming’ - the practice of blindly copying code or approaches without understanding - is all too commonplace in software development. Questioning such methods and insisting on the need to document key decisions can help avoid this.

    Questioning requirements

    In an agile environment, software requirements are not the preserve of business analysts. Engineers should be included in gathering and refining requirements and will bring a unique perspective.

    They should be encouraged to bring an environmental lens to those discussions and use their knowledge to make suggestions early in the process and throughout development to offer more carbon efficient or aware solutions.

    Sustainability debt

    Technical debt is inevitable on projects. We can accrue technical debt consciously or by accident. Perhaps we become aware of alternative, more appropriate technologies or were unaware of all the facts during the initial development of a feature.

    Given the difficulty of estimating carbon costs during development, we could choose to delay adding complexity – see engineer for change. If we do so, we must re-evaluate those decisions at a sensible cadence. The most natural way to do this seems to be to treat ‘sustainability debt’ as a type of technical debt.

    Once you’ve included this new debt, ensuring that your usual processes around technical debt management are robust is essential. You should evaluate, prioritize, and rectify that debt appropriately before the interest rate is overly onerous.

    Profiling tools and static analysis

    Depending on the system’s nature, performance, and load requirements, performance profiling tooling may be used during development or live running.

    These tools can be either intrusive or non-intrusive and can present information showing how load affects different parts of the system, highlighting bottlenecks and sub-optimal code. Although commonly run locally or in conjunction with a performance test suite in a dedicated environment, they can also continuously run during production and connect to automated alerting systems.

    Including these in your system and its associated delivery – even without stringent performance requirements - could have an environmental benefit.

    Similarly, and with less investment, static code analysis can often spot suboptimal code. Developers can use static code analyzers, integrate them with IDEs, or run them in integration pipelines, with results inspected in dashboards.

    There are some early attempts to include green considerations in static analysis. CAST GreenIT Index is an automated green source code measure. ecoCode plugin for SonarQube looks to codify 115 rules based on a French book on green web development, and the green software foundation has plans to develop solutions in this space too.

    It’s important to note that we do not claim to have tried any of these static analysis solutions; they are still very early in their maturity.

    Thanks for reading this blog on green software development. The next article in this series will explore green software testing, a subject that is less commonly discussed when compared to development.

    Sustainability-Lead-Magnet