The internet is still broken, guys…
I complained on December 10th 2008 that The Internet was broken for 4-byte ASN speakers. Since then, Rob Shakir, Jonathan Oddy, and I have been researching in detail how a faulty announcement by an end-site network in Ukraine was able to break BGP for hosts that supported ASN4. BGP is the protocol that glues different networks on the internet together, and is one of the internet's most significant building blocks. ASN4 is the evolution of the protocol to support ‘large’ AS numbers (unique network IDs).
Some history in very brief terms – all networks on the internet that participate in BGP need a number to identify themselves. On the public internet, this number normally needs to be globally unique. The number can be between 1 and 65,535, and we have close to 50,000 of these numbers in use. To grow past this limit, the BGP standard needs to be modified. The modification is described in a document called RFC 4893, and this document was accepted by the community last May.
The first incarnations of router software that support these large AS numbers are now circulating. Due to flaws in the standards as they exist in January 2009, if you install one, you may become disconnected from the internet.
Why? Some more background first: BGP allows large networks to configure ‘hints’ in their router configuration by dividing their network into several small networks (confederations). The information about these ‘virtual’ divisions should be removed from BGP messages before they are sent to other networks. But if a network supports large ASNs in some parts and not in others, the routers in the legacy part of the network may not know to check the ‘large number’ section of a BGP message for the presence of an internal confederation ID. The standard tries to take this into account by explicitly forbidding confederation IDs from being passed between networks in the ASN4 part of the BGP message.
However, should this occur by accident, what are the effects? Elsewhere in the large ASN standard, it states that the connection between two networks should be severed if confederation ASNs appear to have leaked into the ASN4 part of the BGP message. This means that networks which do not understand large ASNs can forward a broken message to a network which does, at which point the network which does understand large ASNs should tear down the session.
Since this message can be delivered over a transit session, the receiving ISP loses its connection to the internet via that upstream ISP. If it learns the route over every ISP, then the network can lose its connection to the internet entirely.
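The rule described above can be sketched in a few lines of Python. This is only an illustration of the behaviour as this post describes it, not code from any router. The function names and the returned strings are hypothetical. The segment type codes are the standard AS path segment types, with 3 and 4 being the confederation ones.

```python
# Path segment type codes used in AS_PATH / AS4_PATH attributes.
AS_SET = 1
AS_SEQUENCE = 2
AS_CONFED_SEQUENCE = 3
AS_CONFED_SET = 4

CONFED_TYPES = {AS_CONFED_SEQUENCE, AS_CONFED_SET}


def as4_path_has_confed(segments):
    """Return True if any (segment_type, [asn, ...]) pair is a confederation segment."""
    return any(seg_type in CONFED_TYPES for seg_type, _asns in segments)


def handle_update(segments):
    """Hypothetical receiver policy mirroring the behaviour described in the text."""
    if as4_path_has_confed(segments):
        # The session is dropped, even though the peer that forwarded the
        # message may be a legacy speaker that never inspected AS4_PATH.
        return "send NOTIFICATION and tear down session"
    return "accept update"


# A clean AS4_PATH versus one with leaked confederation segments.
clean = [(AS_SEQUENCE, [196629, 23456])]
leaked = [(AS_CONFED_SEQUENCE, [65044, 65048, 65062]),
          (AS_SEQUENCE, [196629, 23456])]

print(handle_update(clean))
print(handle_update(leaked))
```

The uncomfortable part is that the penalty lands on the session with the direct neighbour, who may have done nothing more than pass the attribute along untouched.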
The message that I reported was leaking in December is still leaking. AS196629 (AS3.21 in legacy asdot notation) is announcing to AS35320, who are not stripping their confederation information from the large-asn section of BGP messages. If you learn the prefix via AS196629’s other transit, AS6886, then you are fine. If you learn the prefix via AS35320, you are (today) receiving a broken message.
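For anyone unsure how AS196629 and AS3.21 can be the same network: asdot notation splits the 32-bit number into its high and low 16 bits. A minimal sketch of the conversion follows; the helper names are my own, not taken from any library.

```python
def asplain_to_asdot(asn: int) -> str:
    """Format a 32-bit AS number in asdot notation (high.low)."""
    if not 0 <= asn <= 0xFFFFFFFF:
        raise ValueError("AS number must fit in 32 bits")
    high, low = divmod(asn, 65536)
    # In asdot, numbers that fit in 16 bits are written as a plain integer.
    return str(low) if high == 0 else f"{high}.{low}"


def asdot_to_asplain(text: str) -> int:
    """Parse 'high.low' (or a plain integer) back into a single number."""
    if "." not in text:
        return int(text)
    high, low = (int(part) for part in text.split("."))
    if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
        raise ValueError("each half must fit in 16 bits")
    return high * 65536 + low


print(asplain_to_asdot(196629))   # 3.21
print(asdot_to_asplain("3.21"))   # 196629
```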
We tested how Cisco IOS copes with this broken message, using the first generation of code for the Cisco 7200 router that understands ASN4. We peered the router to NetSumo’s research and development network, AS15653. Cisco honours the standard/RFC, and breaks the session. Since it learns the dirty message on the transit session, the router disconnected our test network from the internet entirely:
- *Jan 16 11:29:58.531: %BGP-5-ADJCHANGE: neighbor 193.239.32.2 Up
- *Jan 16 11:30:02.595: %BGP-6-ASPATH: Invalid AS path (65044 65048 65062) 3.21 23456 received from 193.239.32.2: Confederation found in AS4_PATH
- *Jan 16 11:30:02.595: %BGP-5-ADJCHANGE: neighbor 193.239.32.2 Down BGP Notification sent
- *Jan 16 11:30:02.595: %BGP-3-NOTIFICATION: sent to neighbor 193.239.32.2 3/1 (update malformed) 27 bytes E0111803 030000FE 140000FE 180000FE 26 FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 0050 0200 0000 3540 0101 0240 020C 0205 3D25 2114 89F8 5BA0 5BA0 4003 04C1 EF20 02E0 1118 0303 0000 FE14 0000 FE18 0000 FE26 0202 0003 0015 0000 5BA0 175B CFDA
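The 27 bytes quoted in the notification above are the offending AS4_PATH attribute itself, and they can be decoded by hand. Below is a rough sketch of that decoding, assuming the usual attribute layout (one flags byte, one type byte, one length byte, then segments of type, count, and 4-byte AS numbers). The function name `parse_as4_path` is hypothetical, and the hex string is copied from the log.

```python
import struct

SEGMENT_NAMES = {
    1: "AS_SET",
    2: "AS_SEQUENCE",
    3: "AS_CONFED_SEQUENCE",
    4: "AS_CONFED_SET",
}


def parse_as4_path(value: bytes):
    """Split an AS4_PATH attribute value into (segment_type, [asn, ...]) pairs."""
    segments = []
    offset = 0
    while offset < len(value):
        if offset + 2 > len(value):
            raise ValueError("truncated segment header")
        seg_type, count = value[offset], value[offset + 1]
        offset += 2
        end = offset + 4 * count  # each AS number is 4 bytes here
        if end > len(value):
            raise ValueError("truncated segment body")
        asns = list(struct.unpack(f"!{count}I", value[offset:end]))
        segments.append((seg_type, asns))
        offset = end
    return segments


# Attribute as logged: flags E0, type 0x11 (17, AS4_PATH), length 0x18 (24).
ATTRIBUTE_HEX = (
    "E01118"
    "0303" "0000FE14" "0000FE18" "0000FE26"
    "0202" "00030015" "00005BA0"
)
attribute = bytes.fromhex(ATTRIBUTE_HEX)
flags, type_code, length = attribute[0], attribute[1], attribute[2]
segments = parse_as4_path(attribute[3:3 + length])

for seg_type, asns in segments:
    print(SEGMENT_NAMES.get(seg_type, seg_type), asns)
# AS_CONFED_SEQUENCE [65044, 65048, 65062]
# AS_SEQUENCE [196629, 23456]
```

The first segment is the leaked confederation sequence (65044 65048 65062) that IOS complains about, followed by the ordinary sequence containing 196629, which is AS3.21.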
If you work in this field, I implore you to read the more thorough analysis on the NANOG list, and participate in the discussion to work out how we should correct the standard so that routers can behave differently when a dirty message is received. If we do not, then there is a simple, easy to understand, and easy to implement mechanism to break the internet, as soon as networks upgrade to the current version of their routing software.