Internet broken for ASN32 speakers today.

Not trying to point fingers or name-and-shame, just to raise the profile of a nasty little bug in the handling of breaches of RFC 4893.  This post is basically shaped from a message I posted to NANOG earlier.

AS196629 (3.21 in asdot) announces 91.207.218.0/23.  Experienced eyes will notice that this is quite a large AS number.  It’s a ‘new’ 4-byte ASN.  When an OpenBGPd speaker with 4-byte ASN support receives the UPDATE for this prefix, the session is torn down with the daemon logging a ‘fatal error’. Why?
OpenBGPd checks AS4_PATH to ensure that it contains only AS_SET and AS_SEQUENCE segment types, as per RFC 4893.  When processing the UPDATE for 91.207.218.0/23 it sees:

91.207.218.0/23
Path Attributes – Origin: Incomplete
Flags: 0x40 (Well-known, Transitive, Complete)
Origin: Incomplete (2)
AS_PATH: xx xx 35320 23456 (13 bytes)
AS4_PATH: (65044 65057) 196629 (7 bytes)

See the confederation ASNs in the AS4_PATH? That’s forbidden:

To prevent the possible propagation of confederation path segments outside of a confederation, the path segment types AS_CONFED_SEQUENCE and AS_CONFED_SET [RFC3065] are declared invalid for the AS4_PATH attribute. RFC 4893.
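
The check that rule implies can be sketched in a few lines of Python. This is only an illustration, not OpenBGPd's code: the function names are made up for this post, and the sample bytes are built by hand to resemble the path shown above rather than captured from the wire. It assumes the usual segment layout of one type byte, one count byte, then that many 4-byte ASNs, with segment type codes 1 to 4 as defined for AS_PATH and its confederation extensions.

```python
import struct

# Path segment type codes (AS_PATH and the confederation extensions).
AS_SET, AS_SEQUENCE, AS_CONFED_SEQUENCE, AS_CONFED_SET = 1, 2, 3, 4

# RFC 4893 only permits these two segment types inside AS4_PATH.
ALLOWED_IN_AS4_PATH = {AS_SET, AS_SEQUENCE}


def parse_as4_path(data: bytes):
    """Split an AS4_PATH attribute value into (segment_type, [asns]) tuples."""
    segments = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated segment header")
        seg_type, count = data[offset], data[offset + 1]
        offset += 2
        end = offset + 4 * count  # every ASN in AS4_PATH is 4 bytes
        if end > len(data):
            raise ValueError("truncated segment body")
        asns = list(struct.unpack(f"!{count}I", data[offset:end]))
        segments.append((seg_type, asns))
        offset = end
    return segments


def invalid_segment_types(segments):
    """Return the segment types that RFC 4893 forbids in AS4_PATH."""
    return [seg_type for seg_type, _ in segments
            if seg_type not in ALLOWED_IN_AS4_PATH]


def build_segment(seg_type, asns):
    """Helper for the example only: encode one path segment."""
    return struct.pack(f"!BB{len(asns)}I", seg_type, len(asns), *asns)


# Hand-built sample resembling "(65044 65057) 196629".
sample = (build_segment(AS_CONFED_SEQUENCE, [65044, 65057])
          + build_segment(AS_SEQUENCE, [196629]))
segments = parse_as4_path(sample)
bad_types = invalid_segment_types(segments)
print(segments, bad_types)
```

Anything left in `bad_types` is a violation. What the speaker should then do about it is the real question.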

The RFC does not suggest how to handle AS4_PATH violations, but if the bad path is learned from every upstream, a network with OpenBGPd edges will disconnect from the internet.  Modifying the OpenBGPd software to permit AS_CONFED_SEQUENCE and AS_CONFED_SET in an AS4_PATH causes the path to be accepted, and the session is not torn down.  This isn’t a great fix.
The impact today is fairly limited, as relatively few BGP speakers honour the 4-byte ASN protocol extension rules. But as code that supports these features creeps around the internet, the next time this happens the impact could be much greater, so we need to understand which implementation of which BGP software caused this illegal origination.

From a software point of view, I want to see a configurable option to reject the route but keep the session, reject the route and drop the session, accept the route but log/send trap, etc.
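
To make that wish list concrete, here is a rough Python sketch of what such a knob could look like. Everything in it is hypothetical: the policy names, the `Decision` record and the `decide` function are invented for illustration and do not correspond to any existing OpenBGPd option.

```python
from dataclasses import dataclass
from enum import Enum


class As4PathPolicy(Enum):
    """Hypothetical operator-selectable reactions to an invalid AS4_PATH."""
    REJECT_KEEP_SESSION = "reject-keep-session"
    REJECT_DROP_SESSION = "reject-drop-session"
    ACCEPT_AND_LOG = "accept-and-log"


@dataclass(frozen=True)
class Decision:
    install_route: bool   # should the route go into the table?
    keep_session: bool    # should the BGP session stay up?
    log_message: str      # empty when there is nothing to report


def decide(prefix: str, has_invalid_segments: bool,
           policy: As4PathPolicy) -> Decision:
    """Map a validation result and the configured policy to an action."""
    if not has_invalid_segments:
        return Decision(True, True, "")
    message = f"{prefix}: AS4_PATH contains confederation segments"
    if policy is As4PathPolicy.REJECT_KEEP_SESSION:
        return Decision(False, True, message)
    if policy is As4PathPolicy.REJECT_DROP_SESSION:
        return Decision(False, False, message)
    # ACCEPT_AND_LOG: install the route but leave a trail for the operator.
    return Decision(True, True, message)


decision = decide("91.207.218.0/23", True, As4PathPolicy.REJECT_KEEP_SESSION)
print(decision)
```

With `REJECT_KEEP_SESSION` the offending prefix is dropped while the rest of the table from that peer stays in place, which is the behaviour that would have avoided today's outage.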

In any case we need to publish the arrangement that led to this mistake, so that other networks using the same toolset to originate prefixes can avoid the same situation.  I have made contact with an engineer at the NOC, who is investigating.

Discussion

4 Responses to “Internet broken for ASN32 speakers today.”

  1. thanks for posting this – I was tearing out hair this morning trying to figure out what was causing the session to die every 10 seconds (serves me right for not staying current on NANOG lately). I am hopeful that this has been fixed in OpenBSD-4.4, but if not, I’m sure a patch will be forthcoming quickly.

    Posted by darkuncle | December 11, 2008, 1:48 am
  2. Hi, Darkuncle

    I had an email from the AS196629 NOC saying “WE have stopped sending anounces and will work with our uplink to solve the problem”.

    Best wishes
    Andy

    Posted by andy | December 11, 2008, 8:39 am
  3. Details regarding what implementation is used by 196629 as well as 35320 would be useful, as both originating and propagating the invalid attribute is clearly a mistake. Possibly a classic interop issue where one implementation assumes no one will originate crap and the other assumes no one will propagate crap? These appear to be different prefixes than in your NANOG post; I presume this is the actual root issue that appeared at first to be as65000 in-path?

    Cheers!

    jzp

    Posted by JZP | December 11, 2008, 2:57 pm
  4. No new info has come out from either of those networks yet (that I have seen), but 196629 did tell me they don’t use confeds internally.

    It was my second post of yesterday which detailed this error – the 65000 appearing in the dfz was throwing me off course at the start of the debugging.

    Posted by andy | December 11, 2008, 3:00 pm
