Scheduled IT jobs are automated processes that run on servers, often hosted in data centres, which consume energy to carry out computation, data processing and network transfers. This energy consumption has an associated carbon footprint, which varies depending on the energy sources being used to provide the power.
In BJSS’ work creating a data platform for the Retail Trust, a charity with the mission of creating hope, health and happiness for everyone in the retail industry, we are conscious that this platform has a carbon footprint. Our vision has been to ensure that we reduce the carbon footprint of our data platform as much as possible.
Retail Trust has a Databricks data pipeline hosted in Microsoft Azure that currently runs on a static daily schedule. The pipeline uses Delta Live Tables and Databricks Notebooks to ingest data from multiple external data sources, then processes and aggregates data resulting in a set of tables for consumption by downstream services.
Power BI reports are set up to refresh their data from the Delta Live Tables daily.
What is Carbon Intensity, and why is this important?
Carbon intensity is a measure of the amount of carbon dioxide emissions produced per unit of energy consumed. It varies depending on the energy source, with renewable sources such as wind and solar having a much lower carbon intensity than fossil fuel sources like coal or natural gas.
For this to be useful, though, predictions of future carbon intensity are needed. Carbon intensity forecasting services can predict carbon intensity into the future, based on an understanding of the weather and the expected mix of generation sources.
The Carbon Intensity API is freely accessible without sign-up. Other services require sign-up and some require payment to use.
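As a sketch of what these services return, the snippet below works with a response shaped like the Carbon Intensity API's 24-hour forecast endpoint. The sample payload is illustrative (hardcoded rather than fetched over HTTP), but it mirrors the structure of the real API's half-hourly forecast data, with intensity values in gCO2/kWh.

```python
# Illustrative response shaped like the Carbon Intensity API's
# GET /intensity/{from}/fw24h endpoint; in practice this would come
# from an HTTP request to https://api.carbonintensity.org.uk
sample_response = {
    "data": [
        {"from": "2023-06-01T00:00Z", "to": "2023-06-01T00:30Z",
         "intensity": {"forecast": 123, "index": "moderate"}},
        {"from": "2023-06-01T03:00Z", "to": "2023-06-01T03:30Z",
         "intensity": {"forecast": 92, "index": "low"}},
        {"from": "2023-06-01T12:00Z", "to": "2023-06-01T12:30Z",
         "intensity": {"forecast": 210, "index": "high"}},
    ]
}

# Extract (start time, forecast gCO2/kWh) for each half-hour period
forecasts = [
    (slot["from"], slot["intensity"]["forecast"])
    for slot in sample_response["data"]
]
print(forecasts)
```

Each half-hourly slot carries a forecast figure, which is what a scheduler can compare to find the greenest window.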
How we made a difference to the Retail Trust data platform
Retail Trust and BJSS decided to run a Hackathon to see if we could reduce the carbon footprint of both the Databricks refresh and the Power BI refresh.
We built a small API which, when called with the original refresh time and a future time window (for when the job must be run), would return the best time to run the job (the time with the lowest carbon intensity) and the percentage reduction in carbon this would achieve.
The API was deployed as a serverless (Lambda) function using the Serverless Framework into a BJSS AWS account, with Swagger endpoints made available for easy consumption.
A simple return of data shows the proposed time to run the job and the carbon intensity at that time. It also returns the carbon intensity saving (31 gCO2/kWh) and the percentage reduction in carbon emissions (25.2%).
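The core calculation behind the API can be sketched as below. The function name and figures are our own illustration, chosen to be consistent with the example saving above (an original slot at 123 gCO2/kWh and a best slot at 92 gCO2/kWh gives a saving of 31 and a 25.2% reduction).

```python
def greenest_slot(original_intensity, forecast_slots):
    """Pick the slot with the lowest forecast carbon intensity and
    report the saving versus the originally scheduled time.

    forecast_slots: list of (start_time, forecast_gco2_per_kwh) tuples
    covering the window in which the job is allowed to run.
    """
    best_time, best_intensity = min(forecast_slots, key=lambda s: s[1])
    saving = original_intensity - best_intensity
    reduction_pct = round(100 * saving / original_intensity, 1)
    return best_time, best_intensity, saving, reduction_pct

# Illustrative window: original run at 09:00 forecast at 123 gCO2/kWh
slots = [("03:00", 92), ("09:00", 123), ("12:00", 210)]
print(greenest_slot(123, slots))  # ('03:00', 92, 31, 25.2)
```

The caller supplies the allowed window, so the job is never pushed past its deadline; the function only ever picks from slots inside that window.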
Joining the services together
With an API that allowed us to easily calculate the best time to run a service, it was time to connect the services together by using a simple Azure DevOps Pipeline to orchestrate the refreshes by:
- Calling the BJSS API to find the greenest time to run the service
- Updating the Power BI Data Refresh Schedule
- Updating the Azure Databricks Refresh Schedule
- Publishing the reduction in carbon statistics to a Power BI Dashboard.
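The two rescheduling steps above can be sketched as follows. Databricks job schedules use Quartz cron syntax, and the Power BI REST API exposes a dataset refresh-schedule endpoint that accepts "HH:mm" times; the helper functions and the specific payload fields here are a sketch of how a pipeline task might build those updates, not the exact code used.

```python
from datetime import datetime

def databricks_quartz_cron(run_at: datetime) -> str:
    # Databricks job schedules use Quartz cron syntax:
    # seconds minutes hours day-of-month month day-of-week
    return f"0 {run_at.minute} {run_at.hour} * * ?"

def power_bi_schedule_payload(run_at: datetime) -> dict:
    # Body for updating a dataset's refresh schedule via the
    # Power BI REST API; times are "HH:mm" strings
    return {
        "value": {
            "times": [run_at.strftime("%H:%M")],
            "localTimeZoneId": "UTC",
            "enabled": True,
        }
    }

greenest = datetime(2023, 6, 1, 3, 0)  # time returned by the API
print(databricks_quartz_cron(greenest))
print(power_bi_schedule_payload(greenest)["value"]["times"])
```

An Azure DevOps pipeline task would call the BJSS API first, then send these values to the Databricks Jobs API and the Power BI REST API respectively.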
How much carbon can we save?
What we’ve demonstrated is that it is entirely possible to reduce the carbon footprint of a data platform by running jobs at the greenest time possible, when carbon intensity is at its lowest.
By following the technical steps outlined above, any organisation that has batch or scheduled jobs can retrieve real-time carbon intensity data, analyse the data to identify optimal scheduling times, reschedule jobs accordingly and monitor the impact.
We would encourage businesses to adopt this approach to make their IT operations more environmentally friendly and contribute to a greener future. And, as these approaches continue to advance, we plan to constantly monitor and review the carbon footprint of the platform we are building and make technology decisions that are as green as possible.
Find out more about how we can help your organisation better action your sustainability ambition at https://www.bjss.com/sustainability