Trading and Broker API Partial Outage
Incident Report for Alpaca
Postmortem

Market Data Unavailability

Affected APIs: Market Data API, Trading API, Broker API, Trading Dashboard

Executive Summary:
At 16:58:30 EST (~1 hour after the regular market session close), our alerting indicated that services dependent on our real-time market data service, including many Trading and Broker API endpoints, were unable to process requests because our internal price caching service was unavailable. Alpaca Engineering quickly determined that the primary public ingress/egress connectivity for our Market Data systems at Equinix NY2 was unavailable due to an upstream provider issue. At 17:05 we began executing our playbook to fail over streaming market data access to a private interconnect and bring our Broker and Trading services back to a fully operational state. While we were executing the failover, our alerting notified us that the upstream provider's public egress connection had come back up, and all services resumed normal operation by 17:18:00 EST.

Further Context:
While most of our infrastructure is hosted on Google Cloud Platform (GCP), our real-time market data services are hosted on-premises in a private cage at a data center in Secaucus, New Jersey. Because asset pricing is time sensitive and because we receive raw SIP Market Data via UDP multicast, we need to be directly cross-connected to official Exchange data providers. Our market data center is connected to our cloud infrastructure via redundant private interconnects; however, we currently consume streaming market data for our Trading API and Market Data API over the same public egress that serves our Streaming Market Data offering. As noted above, we do have the ability to consume market data directly over a private interconnect, but this currently requires manual failover procedures. Graceful automatic failover is currently being tested in our staging environment and was already scheduled to be provisioned imminently.
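For illustration only, the kind of automatic failover described above can be sketched as a small state machine that counts consecutive health probes of the public path. This is not Alpaca's implementation: the path labels, the thresholds, and the `FailoverSelector` name are all hypothetical, and a real service would also need to run the probes and re-establish the stream on the newly selected path.

```python
from dataclasses import dataclass

# Hypothetical labels for the two connectivity paths described above.
PUBLIC = "public-egress"
PRIVATE = "private-interconnect"


@dataclass
class FailoverSelector:
    """Chooses which path to consume streaming market data over."""

    fail_threshold: int = 3      # consecutive failed probes before failing over
    recover_threshold: int = 5   # consecutive good probes before failing back
    active: str = PUBLIC
    _failures: int = 0
    _successes: int = 0

    def record_probe(self, public_path_healthy: bool) -> str:
        """Record one health probe of the public path; return the active path."""
        if public_path_healthy:
            self._successes += 1
            self._failures = 0
        else:
            self._failures += 1
            self._successes = 0

        if self.active == PUBLIC and self._failures >= self.fail_threshold:
            # Public path looks down: switch consumers to the private interconnect.
            self.active = PRIVATE
        elif self.active == PRIVATE and self._successes >= self.recover_threshold:
            # Public path has been stable for a while: fail back.
            self.active = PUBLIC
        return self.active


if __name__ == "__main__":
    selector = FailoverSelector()
    for healthy in [True, False, False, False]:
        path = selector.record_probe(healthy)
    print(path)  # private-interconnect
```

Requiring several consecutive failures before switching, and a longer run of successes before switching back, is one way to avoid flapping between paths when the upstream connection is intermittently available.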

Next Steps:

  1. Provision the service that automates failover between public and private connectivity to our streaming market data center. This will ensure our Trading and Broker APIs remain operational if the public connection becomes unavailable. This will be completed by December 10, 2021.
  2. Add multiple distinct, redundant public connectivity providers to ensure our Market Data Streaming API is highly available.
  3. Export metrics from our on-premise routers to our centralized metrics and alerting system.
  4. Improve upon our Status Page’s component monitoring to automatically alert on streaming data regressions.
Posted Dec 07, 2021 - 00:12 EST

Resolved
This incident has been resolved.
Posted Dec 06, 2021 - 17:18 EST
Update
We are continuing to investigate this issue.
Posted Dec 06, 2021 - 17:05 EST
Investigating
We are currently investigating this issue.
Posted Dec 06, 2021 - 16:58 EST
This incident affected: Live Trading API (Account API, Orders API, Positions API, Assets API, Trade Update Streaming).