Data & AI Success Story

Creating a Hadoop-based big data system

A global agriculture commodities company wanted to develop a system for forecasting crop yields based on weather patterns.

Our client ran an existing process in Excel and VBA in which analysts spent several days executing jobs to complete a single forecast for an entire region. As a result, credible insights arrived only towards the end of the crop season, leaving traders insufficient time to hedge properly against government-issued forecasts.

We built a Hadoop-based big data system hosted on Microsoft Azure Virtual Machines.

The BJSS platform acquired more granular data from various sources, such as NOAA, and added automated ingest systems for forecast weather data and a high degree of parallelisation in the forecasting algorithms. The base forecasting algorithm was rewritten in R and SparkR, with data held in Impala for quick interactive access.

The rewrite in R identified several bottlenecks and streamlined the algorithm from approximately 12 hours per execution to approximately 90 seconds, a roughly 480-fold speed-up. The algorithm was also written to be parallelisable, using Hadoop Streaming, so that it could be executed across all cores of the Hadoop cluster simultaneously, with results written back to Impala and reported in Tableau dashboards.
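
As a rough illustration of the Hadoop Streaming approach described above, the sketch below shows a mapper that reads tab-separated records from standard input and writes one forecast per record to standard output, which is the contract Hadoop Streaming expects from any executable. It is written in Python rather than the R used in the project, and the record layout (`farm_id`, `rainfall_mm`, `mean_temp_c`) and the `forecast_yield` formula are hypothetical placeholders, not the client's algorithm.

```python
"""Hadoop Streaming mapper sketch: one forecast per input record."""
import sys
from typing import Iterable, Iterator


def forecast_yield(rainfall_mm: float, mean_temp_c: float) -> float:
    """Hypothetical placeholder model, NOT the client's forecasting algorithm."""
    return round(0.01 * rainfall_mm + 0.1 * mean_temp_c, 3)


def run_mapper(lines: Iterable[str]) -> Iterator[str]:
    """Turn 'farm_id<TAB>rainfall_mm<TAB>mean_temp_c' lines into 'farm_id<TAB>forecast'."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines instead of failing the whole task
        farm_id, rainfall_mm, mean_temp_c = line.split("\t")
        estimate = forecast_yield(float(rainfall_mm), float(mean_temp_c))
        # Tab is Hadoop Streaming's default separator between key and value.
        yield f"{farm_id}\t{estimate}"


def main() -> None:
    """Entry point for a streaming task: stdin in, stdout out.

    A deployed mapper script would end with a call to main().
    """
    for output_line in run_mapper(sys.stdin):
        print(output_line)
```

Because each record is processed independently, the cluster can run many copies of such a mapper at once, one per input split, which is what lets the work spread across all available cores.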

With the distributed nature of processing, our client was able to execute more granular forecasts at farm rather than field level, providing more accurate insights.

A 10,000-fold improvement was achieved over the existing business-as-usual (BAU) process. This was accomplished by increasing the speed of data acquisition (capturing and processing several billion rows every day) and by optimising the algorithm. The cost to the client of running each forecast fell from dozens of dollars to a few cents.

With the BJSS rebuild, the client increased its forecast granularity

While it delivered reductions in CapEx and OpEx spend, the BJSS delivery also provided the client with an important competitive advantage. This Big Data solution allowed our client to generate complete reports within 30 minutes of new forecast weather data being published, enabling profitable positions to be taken against official published forecasts.