-
Thanks @jakirkham! The incident has been tracked by the Anaconda Infra team since around the time you posted here; @barabo is the incident commander. @jakirkham Do you know if anything unusual was happening on the conda-forge side at the time, such as larger mirror processes or similar?
-
Sorry for the delay on this reply. We had an incident Monday, and then another the following morning (of lesser duration). The root cause of the outage was related to MongoDB. Our server clusters have always been configured for redundancy, with our primary serving all reads and writes. Lately it has been harder to maintain that setup (expensive read queries can affect writes), so we have configured the replica nodes to serve read requests, too. In both the Monday and Tuesday outages, we saw a very expensive read operation on the primary node start right before, and run throughout, the period of impacted server availability. We believe the expensive query was inadvertently triggered by normal user operations, and we'll decide on next steps once we've had a chance to evaluate how the new cluster configuration performs. It may not be an issue going forward, but we're looking into it in the meantime.
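
As an illustration of the kind of change described above (not Anaconda's actual setup), here is a minimal pymongo sketch that routes reads to replica-set secondaries while writes still go to the primary; the hostnames, database, and collection names are placeholders:

```python
# Minimal sketch, assuming a standard MongoDB replica set and pymongo.
# Routing reads to secondaries keeps expensive read queries from competing
# with writes on the primary.
from pymongo import MongoClient
from pymongo.read_preferences import ReadPreference

# Hypothetical replica-set connection string.
client = MongoClient(
    "mongodb://db1.example.com,db2.example.com,db3.example.com/?replicaSet=rs0"
)

# Writes still go to the primary; reads on this handle prefer secondaries.
packages = client.get_database(
    "repo", read_preference=ReadPreference.SECONDARY_PREFERRED
).get_collection("packages")

doc = packages.find_one({"name": "numpy"})
```

The same effect can be had by adding `readPreference=secondaryPreferred` to the connection string, if the whole client should prefer secondaries rather than a single database handle.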
-
I'm seeing degraded cloning with the conda-forge CDN. It's currently been ~25 minutes since the last successful clone.
I also had a bit of trouble reaching anaconda.org that eventually cleared up. Not sure if that is related though.