Courtney Bormann
Senior Manager, FinOps & Cloud Governance at General Mills
Cloud cost anomalies sometimes originate in data analysis workloads. Here is a case of anomalous data analysis costs at General Mills and how we added better monitoring going forward.
This was part of Courtney’s FinOps X ‘22 talk.
At General Mills, the team was implementing reporting with a third-party Business Intelligence (BI) solution. As part of this work, the datasets feeding the tool needed to be optimized.
During this development, queries were run against Google Cloud Platform (GCP) BigQuery under on-demand pricing. These queries were not optimized and repeatedly scanned a very large dataset, causing a cost anomaly.
The dataset used for the BI reporting was a 64-TB dataset that was neither partitioned nor clustered. During development of the report, this dataset was repeatedly reloaded and queried, causing a very large spike in usage and costs.
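Partitioning and clustering matter here because, under on-demand pricing, BigQuery bills by the bytes a query scans; without them, even a filtered query can scan the full 64 TB. As a minimal sketch (not General Mills' actual schema), assuming a hypothetical table with a DATE column `event_date` and a `customer_id` column, a partitioned and clustered table can be created with the BigQuery Python client:

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

# Hypothetical table and columns, for illustration only.
table_id = "my-project.reporting.events_optimized"

table = bigquery.Table(
    table_id,
    schema=[
        bigquery.SchemaField("event_date", "DATE"),
        bigquery.SchemaField("customer_id", "STRING"),
        bigquery.SchemaField("amount", "NUMERIC"),
    ],
)

# Partition by day so date-filtered queries scan only the relevant partitions.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_date",
)

# Cluster by customer_id so filters on that column scan fewer storage blocks.
table.clustering_fields = ["customer_id"]

client.create_table(table)
```

With this layout, queries that filter on `event_date` and `customer_id` are billed only for the partitions and blocks they touch rather than the whole table.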
During this development cycle, the team's normal reporting caught the increase in usage within three days. It was also caught by the Google team manager, who emailed the General Mills team. The incident added roughly $88,000 USD over three days, or approximately $30,000 per day.
The team did not have any proactive alerting in place, as this was the initial work on the reporting. As a result of the anomaly, GCP quotas were implemented across the environment, with both a standard level and an exceptions level.
The standard quota limits each project to $5,000 of spend in a 24-hour period, with the ability to request an exception for projects that temporarily need to exceed that limit.
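The project-level quota itself is configured through GCP rather than in code, but a related per-query guardrail can be set in the BigQuery Python client via `maximum_bytes_billed`, which fails a query up front if it would bill more than a chosen byte budget. A minimal sketch, assuming a hypothetical dollar budget and an approximate on-demand rate (check current BigQuery pricing for the real figure):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Assumptions for illustration only: a $5,000 budget converted to bytes
# using an approximate on-demand rate per TiB scanned.
DAILY_BUDGET_USD = 5_000
APPROX_USD_PER_TIB = 6.25
MAX_BYTES = int((DAILY_BUDGET_USD / APPROX_USD_PER_TIB) * (1024 ** 4))

job_config = bigquery.QueryJobConfig(
    # The job is rejected before running if it would bill more than this.
    maximum_bytes_billed=MAX_BYTES,
)

query = """
    SELECT customer_id, SUM(amount) AS total
    FROM `my-project.reporting.events_optimized`
    WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    GROUP BY customer_id
"""
rows = client.query(query, job_config=job_config).result()
```

A per-query cap like this complements, rather than replaces, the project-level quota: the quota bounds total daily spend per project, while `maximum_bytes_billed` stops any single runaway query before it is billed.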