Postmortem -
Read details
Mar 13, 12:40 EDT
Resolved -
This incident has been resolved.
Mar 12, 16:18 EDT
Update -
Operational cleanup of orders in the pending_cancel state is still in progress.
Mar 12, 15:40 EDT
Update -
The current phase is operational cleanup and client impact mitigation: normal order flow is working, and the focus is on systematically clearing pending replace/pending cancel states (including crypto) while continuing elevated monitoring.
Mar 12, 15:22 EDT
Update -
We are still cleaning up the backlog of pending cancel and pending replace orders.
Mar 12, 15:10 EDT
Update -
Our messaging system is now operating normally. Our team is focused on cleaning up the remaining orders that were impacted during the disruption. We appreciate your continued patience.
Mar 12, 14:56 EDT
Update -
We are still working on the issue and monitoring the system closely. We will post the next update soon.
Mar 12, 14:37 EDT
Update -
The fix for the primary affected queue has been successfully applied: messages are flowing and the backlog is cleared. During this process, two additional queues were found to have similar synchronisation issues. Our team is now applying the same fix to these remaining queues to restore full stability.
In parallel, we are evaluating longer-term improvements to our messaging infrastructure to prevent recurrence.
We will provide an update once all components are fully restored.
Mar 12, 14:26 EDT
Update -
The majority of services have been restored. Our team has identified one remaining synchronisation issue within the messaging infrastructure and is applying a targeted fix. As a precaution, affected services have been temporarily scaled down during the repair. A small number of orders that did not reach their intended venues are being re-processed. We will confirm resolution once this final fix is verified.
Mar 12, 14:13 EDT
Update -
After a restart of the problematic component in the messaging cluster, the messaging system has been stabilised, and key trading services are back online and operating normally. A final reconnection step for our crypto trading service is currently being completed to ensure all systems are fully restored.
We expect full recovery shortly and will confirm once everything is back to normal. Thank you for your patience.
Mar 12, 14:01 EDT
Update -
The root cause of the instability has been traced to a specific node within our messaging cluster. A targeted restart of that component is being prepared and will be executed shortly.
In the meantime, the order backlog is actively being cleared: pending orders are decreasing and filled orders are increasing as manual re-routing continues.
The crypto exchange component restart has been completed successfully.
The next step to restore full system stability is the targeted restart of the affected messaging cluster node. We will confirm once it is complete.
Mar 12, 13:48 EDT
Update -
We are currently experiencing intermittent delays in order processing due to a system connectivity issue. Our team is actively working to resolve this. Here's what's happening:
1. Orders that are stuck in "pending" status are being manually processed and re-routed.
2. For any orders pending cancellation, we are coordinating directly with our trading partners to ensure they are properly canceled.
3. Crypto order processing is being restored as part of our ongoing recovery efforts.
Our engineering team has identified the underlying cause and is working on a permanent fix. We will share another update once the issue is fully resolved. We apologize for the inconvenience and appreciate your patience.
Mar 12, 13:38 EDT
Update -
While we were able to restart the main messaging infrastructure, we are currently managing some recurring connection instability and have identified a backlog of orders that have not yet been processed.
Our engineering team is fully engaged and focused on clearing this backlog and addressing the underlying causes of the intermittent system disruptions to ensure a complete and stable resolution. We will provide another update once the system has fully stabilised.
Mar 12, 13:25 EDT
Update -
The issue is due to a crash in our main messaging system (RabbitMQ).
We have restarted the messaging system and are now restarting the associated programs to re-establish stable connections. We are actively monitoring the system to ensure it is back to normal as quickly as possible.
Mar 12, 13:13 EDT
Identified -
We are still looking into the issue and will provide the next update shortly.
Mar 12, 13:05 EDT
Update -
We are continuing to investigate this issue.
Mar 12, 12:55 EDT
Investigating -
We are currently seeing intermittent issues with orders being filled. Our team is actively working on the issue.
Mar 12, 12:54 EDT