Major technological shifts tend to have two effects: they level the playing field, while simultaneously requiring a new set of skills to fully take advantage of the new opportunities they bring.
That is particularly true for cloud-based Data Warehouse (DWH) solutions. On the one hand they come with a suite of integrated tools that are designed to simplify data ingestion and transformation. They also eliminate the need for major up-front architecture or licensing investments. At the same time, to utilize their full potential in more complex situations (high data volumes, tight time constrains) custom scripting is often necessary – with Python being the preferred language for many.
This case study covers how we created a custom-scripted Python solution to achieve what would not have been possible with the existing tools in an architecture that combines cloud-based DWH platform with an on-prem BI component, for our client Exante.
Redesigning Exante’s Data Analytics Ecosystem
Exante is a global investment company that provides direct access to the financial markets through client-centric trading and investment platform, as well as other solutions. Established in 2011, Exante now encompasses more than 30 locations and offers more than 1’000’000 instruments to its growing client base.
We started our cooperation with Exante in 2022 with a goal to redesign data analytics ecosystem from the ground up. The project started with a pre-study and interviews of the main stakeholders. These discussions gave us input to prepare potential architecture scenarios that would fulfill Exante’s needs in the most efficient manner. At the end of the pre-study Exante chose to go to cloud with data warehouse but stay on-prem with BI. Hence the unique technology stack – Google Cloud Platform (GCP) in combination with Power BI Report Server and on-prem SQL Server Analysis Services (SSAS).
As one can imagine, a global trading platform like Exante generates a massive volume of transactions each day, and the business needs timely access to this data for operational and analytical purposes. Consequently, full nightly loads were out of question, and we had to design a solution that would facilitate incremental loads form the PostgreSQL source database multiple times per day. Simple business incremental logic would not suffice here, so we turned to PostgreSQL Write-Ahead Logging (WAL) files to incrementally track and load all changes – including updates and deletes - in the source.
Our initial idea was to use a tool called Airbyte for this purpose. That, however, turned out not to be feasible as it came with too many limitations for our setup. Data load speeds were still too slow as it didn’t offer any parallel processing, and adding extra tables was too cumbersome as it would trigger full loads not just for the newly added tables but for all tables.
Creating a Tailored Data Solution with Custom Python Scripting
This is where Python knowledge was paramount to our success. In short, we created custom Cloud Functions. WAL files, in json format (via WAL2JSON plugin), were stored in Cloud Storage. From there, they were imported already in a tabular format to pre-staging and staging areas on GCP. As a reference, with this custom approach we were able to get the load time of ~1mil rows (300MB json) down to about 3 minutes.
But we didn’t stop there. Since Exante chose to go with on-prem BI component – PowerBI Report Server and on-prem SSAS - we extended the load mechanism to also cover SSAS. This ensured that only partitions with changed or deleted data would be refreshed in the SSAS models.
The primary motivation behind this solution was to significantly speed up the load process, but it was not the only benefit. GCP offers a pricing mechanism that is based on on-demand compute utilization. Thus, significantly reducing data volumes to be transferred with each load (full vs incr.), we also achieved considerable cost savings.
After successfully completing the initial project in 2022 and setting up the new architecture, we have continued our work with Exante and completed numerous other projects, addressing a range of business needs: Finance, HR, Customer Support and Jira analytics, to name a few. But that is a story for another day…