Understanding BGP session failures in a large ISP


Global Internet routing frequently suffers from cascading routing changes and slow convergence. Such instability can significantly degrade the performance of real-time Internet applications such as VoIP, multimedia conferencing, and online gaming. One major cause of routing instability is the failure of BGP peering sessions, yet the factors that contribute to failures of operational BGP sessions are poorly understood. In this paper, we present a systematic study of the failures of a few hundred BGP sessions, using data collected in a tier-1 ISP network over a 9-month period. We first quantify the impact of session failures on both the control plane and the data plane. We then use syslog events to identify the direct triggers of session failures. Furthermore, we use several heuristics, including link failure information, session down time, and traffic level, to identify the root problems that led to these session failures. We found that the major root causes are administrative session resets and link failures, which account for 46.1% and 30.4% of the observed session failures, respectively. © 2007 IEEE.

Publication Title

Proceedings - IEEE INFOCOM