API Service Degradation

Incident Report for Alpaca

Postmortem

We are committed to providing a reliable and stable platform for our B2B partners. We sincerely apologize for the disruption experienced on December 1, 2025, which resulted in intermittent API service degradation. We have fully resolved the issue and restored all services to normal operation.

What Happened

At approximately 10:37pm ET on December 1, 2025, our internal monitoring systems detected an elevated error rate in our primary production API services.

The disruption was triggered by the unexpected failure of several core internal compute resources (nodes) within our cloud infrastructure. This failure led to the abrupt restart of various underlying financial services and associated processing jobs, which in turn caused the elevated error rate and connectivity issues for our clients.

Our engineering teams immediately mobilised to isolate the failure and restore system health. We quickly identified that a resource scheduling constraint was preventing new nodes from scaling up to replace the failed ones, which delayed recovery.

Impact

The incident was mitigated within 17 minutes.

  • API Connectivity: Clients experienced brief connection failures and elevated 5xx error responses when making API requests.
  • Transaction and Data Impact: Approximately 56,400 API requests failed during the incident.
  • Crucially, we can confirm that no client data was lost, and all funds and transactions remain secure and properly accounted for. Any critical service components that experienced downtime were successfully restarted and recovered without data corruption.
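
To put the figures above in perspective, the short calculation below converts them into an average failure rate. It is only a rough sketch: it assumes the failures were spread evenly across the full 17 minutes, which the report does not state, and because 17 minutes is an upper bound on the window, the true average rate during the period of impact would have been at least this high. The function and constant names are illustrative.

```python
FAILED_REQUESTS = 56_400  # approximate figure quoted in this report
WINDOW_MINUTES = 17       # upper bound on the impact window quoted in this report


def average_failure_rate(failed: int, minutes: float) -> tuple[float, float]:
    """Return (failures per minute, failures per second), averaged over the window."""
    per_minute = failed / minutes
    return per_minute, per_minute / 60


per_minute, per_second = average_failure_rate(FAILED_REQUESTS, WINDOW_MINUTES)
print(f"{per_minute:,.0f} failed requests/minute, {per_second:.1f}/second")
# prints "3,318 failed requests/minute, 55.3/second"
```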

Resolution

Our primary focus was on rapidly mitigating the service impact:

  1. Stabilisation and Service Restoration: The affected systems automatically restarted after new compute resources became available.
  2. Full Service Verification: We confirmed the complete resolution of the underlying node issues and verified the full functionality of our core API and trading systems.

Preventative Measures and Commitment

To prevent a recurrence of this issue and strengthen the resilience of our platform, our teams are prioritising the following action items:

  • System Resilience & Monitoring: We will conduct a thorough review of the replication and redundancy settings for affected systems and dependencies. We are increasing the number of instances for some services to reduce the impact of similar failures.
  • Service Logging and Observability: We identified several log messages that can be adjusted to speed up outage diagnosis in the future, and we are making those changes.
  • Minor Bug Fixes: We identified a few minor improvements in the way our services handled the failure and are updating those services, which should accelerate recovery in the future.
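
As a rough illustration of why running more instances of a service reduces the impact of simultaneous node failures, the sketch below estimates the chance that every instance of a service lands on a failed node. It is a simplified model, not a description of our actual cluster: the node counts are hypothetical, each instance is assumed to run on a different node, and the failed nodes are assumed to be a uniformly random subset of the cluster.

```python
from math import comb


def prob_all_replicas_lost(total_nodes: int, failed_nodes: int, replicas: int) -> float:
    """Probability that every replica of a service sits on a failed node.

    Simplified model: replicas run on distinct nodes, and the failed nodes
    are a uniformly random subset of the cluster.
    """
    if not 0 <= failed_nodes <= total_nodes:
        raise ValueError("failed_nodes must be between 0 and total_nodes")
    if not 1 <= replicas <= total_nodes:
        raise ValueError("replicas must be between 1 and total_nodes")
    # comb() returns 0 when replicas > failed_nodes, so the probability is 0.
    return comb(failed_nodes, replicas) / comb(total_nodes, replicas)


# Hypothetical cluster: 20 nodes, 4 of which fail at the same time.
for replicas in (1, 2, 3):
    print(replicas, round(prob_all_replicas_lost(20, 4, replicas), 4))
# prints:
# 1 0.2
# 2 0.0316
# 3 0.0035
```

Under these assumptions, going from one instance to three lowers the chance of a complete outage for that service from 20% to well under 1%.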

We are committed to maintaining the highest standards of reliability and transparency. We appreciate your patience and understanding as we continue to invest in our platform's stability.

Posted Dec 02, 2025 - 13:24 EST

Resolved

The root cause has been identified as an issue with the cluster nodes. Our team will conduct an internal investigation into how we can make the system more resilient to these types of failures in the future.
Posted Dec 02, 2025 - 00:15 EST

Update

We see errors have dropped to normal levels.
Posted Dec 01, 2025 - 23:54 EST

Investigating

We are currently investigating an incident impacting our API services.
Clients may be experiencing issues with API requests, including failures or unexpected service restarts. This is due to the failure of several k8s nodes.
Posted Dec 01, 2025 - 23:52 EST
This incident affected: Live Trading API (Account API, Orders API, Positions API, Assets API, Trade Update Streaming, Fractional Orders) and Broker API (broker.accounts.get, broker.accounts.account_id.transfers.get, broker.assets.get, broker.trading.accounts.account_id.account.get, broker.trading.accounts.account_id.orders.get, broker.accounts.account_id.documents.get, broker.journals.get, broker.ledgers.get, broker.accounts.activities.get, broker.accounts.account_id.recipient_banks.get, broker.events.accounts.status.get, broker.events.trades.get, broker.events.journals.status.get, Instant funding & Settlements, Sweep and cash interest, FPSL Program, Funding Wallets, Assets, Calendar and Clock, Accounts Events (SSE), Journal Events (SSE), Transfer Events (SSE), Trade Events (SSE), Admin Action Events (SSE), Non Trade Activities (NTAs) Events (SSE), Funding Status Events (SSE), OAuth endpoint, Watchlist Endpoint, Corporate Action Endpoint, KYC Endpoint, Logos Endpoint, Reporting - Aggregate Position endpoint, Reporting - EOD Cash Interest Details, Portfolio History Endpoint, Rebalancer, Crypto Funding).