First of all, we’d like to apologize for the inconvenience this outage caused each of our affected users.
Straight to the point: today’s outage happened because the container fleet responsible for traffic ingestion was resource constrained. With so many users activating and using our instant preview feature, the number of entries the fleet had to keep track of skyrocketed, which drove the containers to use far more memory than they were allowed. Kubernetes, the container scheduler we use for this feature, kills any pod that crosses its resource limits, and as a result it ended up killing the entire fleet.
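For readers curious about the mechanism: in Kubernetes, a container’s memory limit is a hard cap, and the kubelet terminates (OOM-kills) any container that exceeds it. Here is an illustrative pod spec showing where those limits live; the names and values are placeholders, not our actual manifests:

```yaml
# Hypothetical example — names, image, and sizes are made up for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: ingestion-worker
spec:
  containers:
    - name: ingest
      image: example/ingest:latest
      resources:
        requests:
          memory: "512Mi"   # what the scheduler reserves for the container
        limits:
          memory: "1Gi"     # hard cap: exceeding this gets the container OOM-killed
```

When every pod in a fleet tracks the same fast-growing state, they all tend to hit that cap around the same time, which is how a whole fleet can go down at once.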
To keep this from happening again, we’ve raised the resource limits for the pods in the cluster that ingest and process these requests, and we’ve set up alerts that fire when resource usage approaches those limits.
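An alert of the kind described above can be expressed as a rule that compares a container’s working-set memory to its configured limit. This is a hedged sketch assuming a Prometheus-style monitoring stack with the standard `container_memory_working_set_bytes` and `kube_pod_container_resource_limits` metrics; the namespace, threshold, and alert name are placeholders:

```yaml
# Hypothetical Prometheus alert rule — our actual alerting setup may differ.
groups:
  - name: ingestion-memory
    rules:
      - alert: IngestionPodMemoryNearLimit
        expr: |
          container_memory_working_set_bytes{namespace="ingestion"}
            / on (pod, container)
          kube_pod_container_resource_limits{resource="memory", namespace="ingestion"}
            > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.pod }} is above 85% of its memory limit"
```

Firing a warning at 85% of the limit gives operators time to scale up or raise limits before Kubernetes starts OOM-killing pods.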
Thanks for using Forestry!
We ❤️ you