Elevated API errors

Incident Report for Ryft

Postmortem

Summary

The root cause of the incident was a faulty configuration update during a deployment.

This led to a period during which the new deployment served part of our live traffic before it was classified as unhealthy.

The impacted API resources were as follows:

  • v1/payment-sessions

Timeline

The erroneous deployment went live at 5:21pm UTC. Live traffic was switched to the new instances at 5:23pm.

On-site developers noticed elevated errors originating from the new nodes at 5:25pm and initiated a rollback at 5:30pm.

The rollback was completed at 5:49pm, and the errors introduced by the faulty deployment dropped immediately.

The total impact time was approximately 25 minutes.
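
For readers who want to see how the intervals break down, the timeline above can be worked through as a short calculation. This is only an illustrative sketch: the event names are labels we made up for this example, the times are the ones listed above converted to 24-hour UTC, and it assumes impact began when live traffic was switched to the new instances.

```python
from datetime import datetime

# Times taken from the timeline above, written in 24-hour UTC.
# The dictionary keys are illustrative labels, not names from any real tooling.
FMT = "%H:%M"
events = {
    "deployed": "17:21",
    "traffic_switched": "17:23",
    "errors_noticed": "17:25",
    "rollback_started": "17:30",
    "rollback_completed": "17:49",
}
times = {name: datetime.strptime(value, FMT) for name, value in events.items()}


def minutes_between(start: str, end: str) -> int:
    """Whole minutes elapsed between two named timeline events."""
    return int((times[end] - times[start]).total_seconds() // 60)


time_to_detect = minutes_between("traffic_switched", "errors_noticed")
time_to_rollback_start = minutes_between("errors_noticed", "rollback_started")
rollback_duration = minutes_between("rollback_started", "rollback_completed")
# Assumption: impact began when live traffic reached the new instances.
impact_window = minutes_between("traffic_switched", "rollback_completed")

print(time_to_detect, time_to_rollback_start, rollback_duration, impact_window)
```

Under that assumption, most of the window is the rollback itself rather than detection, which is consistent with the focus on rollback speed in the follow-up actions.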

What are we doing about it?

  • Developers have introduced additional checks to detect faulty configuration updates. These checks are intended to stop bad configuration from being deployed in the future.
  • Improvements to our rollback policies will ensure a more timely rollback in the future.
  • The team will adjust our rolling deployments so that live traffic continues to be served by the existing instances for longer before it is switched over to the newly deployed instances. This gives a larger window in which bad updates can be detected and averted before they impact our customers.
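
To illustrate the idea behind the longer switchover window, here is a minimal sketch of a promotion check that could run before live traffic is moved to new instances. It does not describe our actual deployment tooling: `BakeSample`, `should_promote`, and the threshold values are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class BakeSample:
    """One observation of the new instances during the bake window (hypothetical shape)."""
    requests: int
    errors: int


def should_promote(
    samples: Iterable[BakeSample],
    min_requests: int = 500,        # hypothetical: minimum evidence before deciding
    max_error_rate: float = 0.01,   # hypothetical: tolerated error rate
) -> bool:
    """Return True only if the new instances look healthy enough to take live traffic."""
    samples = list(samples)
    total_requests = sum(s.requests for s in samples)
    total_errors = sum(s.errors for s in samples)
    if total_requests < min_requests:
        # Not enough evidence yet: keep live traffic on the existing instances.
        return False
    return total_errors / total_requests <= max_error_rate


# Example: a healthy bake window versus one showing elevated errors.
healthy = [BakeSample(requests=300, errors=0), BakeSample(requests=300, errors=2)]
unhealthy = [BakeSample(requests=600, errors=60)]
print(should_promote(healthy), should_promote(unhealthy))
```

In this sketch, a longer window simply means more samples are collected before the check is allowed to pass, so a faulty configuration has more chance of showing up as errors while the existing instances are still serving customers.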
Posted Nov 25, 2025 - 17:03 UTC

Resolved

The incident has now been resolved.
Posted Nov 24, 2025 - 17:55 UTC

Monitoring

The elevated error rates, which lasted approximately 20 minutes, have now been resolved.
We apologise for any inconvenience caused.
Posted Nov 24, 2025 - 17:51 UTC

Identified

The issue has been identified and a fix is being implemented.
Posted Nov 24, 2025 - 17:51 UTC

Investigating

We are currently investigating this issue.
Posted Nov 24, 2025 - 17:40 UTC
This incident affected: Core Products & Services (Payments API).