We experienced hardware failures in our underlying cloud infrastructure, which were mostly invisible thanks to the planned redundancy in our architecture. However, the underlying job scheduler did not restore its internal state properly, and when it recovered from the failure it started to rerun one of the jobs that is normally scheduled to run before trading hours. This job is responsible for updating cash balances and equity data from the ledger system. As part of its processing, it resets these values to 0 when there are no records for a given trading day. Because the job was running unexpectedly during trading hours, it mistakenly started to zero out accounts.
We detected the discrepancy within a couple of minutes, interrupted the unexpected process, and started to recover the correct account data from the ledger system.
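To illustrate the failure mode, here is a minimal sketch of a balance refresh that resets missing values to 0, wrapped in a time-window check so that a replayed run during trading hours changes nothing. It is not our production code: the function names, the account dictionary layout, and the window times are all assumptions made for the example.

```python
from datetime import time

# Hypothetical pre-market window; the real schedule is not stated in this report.
WINDOW_START = time(4, 0)
WINDOW_END = time(9, 0)


def in_pre_market_window(now: time) -> bool:
    """Return True if `now` falls inside the assumed pre-trading window."""
    return WINDOW_START <= now < WINDOW_END


def refresh_balances(balances: dict, ledger_records: dict, now: time) -> dict:
    """Return balances updated from the ledger records.

    Accounts with no ledger record are reset to 0, as the job described
    above does, but only when the run falls inside the assumed window.
    """
    if not in_pre_market_window(now):
        # Out-of-window run (for example a scheduler replay): change nothing.
        return dict(balances)
    return {account: ledger_records.get(account, 0.0) for account in balances}
```

With this check in place, calling `refresh_balances` with a mid-morning timestamp returns the balances unchanged, while a call inside the window applies the ledger values and zeroes accounts that have no record.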
Although these failures are unlikely to happen again, we are evaluating several options to make our services more resilient to such failures:
We sincerely apologise for any inconvenience this issue may have caused you, and we remain committed to continuously improving our services.