Reduced balances

Incident Report for Alpaca

Postmortem

What happened

We experienced hardware failures in our underlying cloud infrastructure, which were mostly invisible thanks to planned redundancy in our architecture. However, the underlying job scheduler did not restore its internal state properly, and when it recovered from the failure it reran one of the jobs that is normally scheduled to run before trading hours. This job is responsible for updating cash balances and equity data from the ledger system. As part of its processing, it resets these values to 0 when there are no records for a given trading day. Because the job was running unexpectedly during trading hours, it mistakenly started to zero out accounts.

We detected the discrepancy within a couple of minutes, interrupted the unexpected process, and began restoring the correct account data from the ledger system.
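
To make the reset behavior described above concrete, here is a minimal sketch of a job that falls back to 0 when the ledger has no record for a trading day. It is not our production code: the function name `refresh_balances`, the record layout, and the keying by account and trading day are all assumptions made for illustration.

```python
from datetime import date
from decimal import Decimal


def refresh_balances(account_ids, ledger_records, trading_day):
    """Hypothetical sketch of the balance refresh job.

    ledger_records maps (account_id, trading_day) to a dict holding
    "cash" and "equity". This layout is an assumption, not the real schema.
    """
    updated = {}
    for account_id in account_ids:
        record = ledger_records.get((account_id, trading_day))
        if record is None:
            # No ledger record for this day: values are reset to 0.
            # Run at the wrong time, this overwrites live balances.
            updated[account_id] = {"cash": Decimal("0"), "equity": Decimal("0")}
        else:
            updated[account_id] = {
                "cash": record["cash"],
                "equity": record["equity"],
            }
    return updated
```

The fallback to 0 is a reasonable default before the trading day begins, but it offers no protection when the job runs at an unexpected time, which is the gap the remediation items below aim to close.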

Remediation

Although these failures are unlikely to happen again, we are evaluating several options to make our services more resilient to such failures:

Upcoming versions of Kubernetes (a base infrastructure component in our system) will come with improved support for job status tracking, which should help prevent such failure scenarios
We are evaluating a more robust workflow engine
We are evaluating additional failure prevention logic to avoid running such jobs during trading hours
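
As an illustration of the last item, the following is a minimal sketch of a guard that refuses to run a pre-market job during trading hours. It is one possible approach, not a description of what we have deployed. The names `is_trading_hours`, `run_pre_market_job`, and `TradingHoursError` are hypothetical, and the session window (09:30 to 16:00 America/New_York on weekdays, ignoring market holidays and early closes) is a simplifying assumption.

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

# Assumed regular session; holidays and early closes are not handled.
MARKET_TZ = ZoneInfo("America/New_York")
MARKET_OPEN = time(9, 30)
MARKET_CLOSE = time(16, 0)


class TradingHoursError(RuntimeError):
    """Raised when a pre-market job is started during trading hours."""


def is_trading_hours(now):
    """Return True if the timezone-aware datetime falls in the assumed session."""
    local = now.astimezone(MARKET_TZ)
    if local.weekday() >= 5:  # Saturday or Sunday
        return False
    return MARKET_OPEN <= local.time() < MARKET_CLOSE


def run_pre_market_job(job, now=None):
    """Run job() only outside trading hours; otherwise refuse loudly."""
    if now is None:
        now = datetime.now(MARKET_TZ)
    if is_trading_hours(now):
        raise TradingHoursError(
            f"refusing to run pre-market job at {now.isoformat()}"
        )
    return job()
```

A check like this sits inside the job itself, so it still applies if the scheduler replays the job at the wrong time after a recovery.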

We sincerely apologise for the inconvenience this issue may have caused you, and we remain committed to continuously improving our services.

Posted Aug 22, 2022 - 15:56 EDT

Resolved

This incident has been resolved.

Posted Aug 19, 2022 - 16:14 EDT

Monitoring

We have identified the cause of the issue and have implemented a fix. We are monitoring for any lingering issues.

Posted Aug 19, 2022 - 15:38 EDT

Update

We are continuing to investigate this issue. We don't currently have a timeline for when this will be resolved.

Posted Aug 19, 2022 - 15:27 EDT

Investigating

We are currently investigating an issue which has reduced account equity balances for Trading and Broker API accounts. It doesn't seem to be currently affecting Broker Sandbox. We are looking into it and working to resolve it as soon as possible.

Posted Aug 19, 2022 - 15:20 EDT

This incident affected: Live Trading API (Account API, Orders API).