Summary
The root cause of the incident was a faulty configuration update shipped as part of a deployment.
This led to a period during which the new deployment partially served traffic before being classified as unhealthy.
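The window in which the new instances served traffic before being marked unhealthy suggests the readiness check did not exercise the faulty configuration. As an illustration only (the file path, port, and endpoint below are hypothetical, not the service's actual code), a readiness handler that refuses to report healthy until the instance's configuration loads and parses might look like this:

```go
package main

import (
	"encoding/json"
	"net/http"
	"os"
)

// Hypothetical readiness check: the instance only reports healthy once its
// configuration file can be read and parsed. Path and port are illustrative.
func readyHandler(w http.ResponseWriter, _ *http.Request) {
	raw, err := os.ReadFile("/etc/payment-sessions/config.json")
	if err != nil {
		http.Error(w, "config unreadable", http.StatusServiceUnavailable)
		return
	}
	var cfg map[string]any
	if err := json.Unmarshal(raw, &cfg); err != nil {
		// A faulty configuration keeps the node out of the load balancer.
		http.Error(w, "config invalid", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/ready", readyHandler)
	http.ListenAndServe(":8080", nil)
}
```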
The impacted API resources were as follows:
v1/payment-sessions

Timeline
The erroneous deployment went live at 5:21pm UTC. Live traffic was switched to the new instances at 5:23pm.
On-site developers noticed elevated errors originating from the new nodes at 5:25pm and initiated a rollback at 5:30pm.
The rollback completed at 5:49pm and produced an immediate reduction in the errors introduced by the faulty deployment.
The total impact window, from the traffic switch at 5:23pm to the completion of the rollback at 5:49pm, was approximately 26 minutes.
What are we doing about it?