Google recently announced that they’d done a front-to-back implementation of IPv6, using engineers’ spare time, in 18 months. Cue well over 100 comments on Slashdot claiming that this goes to show how hard implementing any sort of v6 service is, given that it takes a company known for hiring smart people as long as 18 months.
I decided to put the timetable to the test. On Wednesday at 2:30PM UK time, I applied for a /32. One hour later, we were allocated 2a02:c30::/32. I straight away assigned a /48 for our network infrastructure, another for our production hosting LAN, and another for our development hosting LAN. From these /48s, several /64s were reserved: one for router loopbacks, another for point-to-point links, and more for individual hosting applications. An hour later, this was implemented on our network – routers had loopbacks, and a v6 IGP was up and running. I filed a ticket with our upstreams, and the first announcement was turned up minutes later – check BGPlay for exact times. Around 2 hours after making our application to RIPE, we were participants on the IPv6 internet.
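To make the addressing plan concrete, here is a minimal sketch using Python’s standard ipaddress module. Only the 2a02:c30::/32 allocation comes from the text above; which /48s and /64s were actually chosen is not stated, so the "first available" choices and the names below are assumptions for illustration only.

```python
import ipaddress

# The /32 allocated by RIPE, as quoted in the post.
allocation = ipaddress.ip_network("2a02:c30::/32")

# Assumption: take the first three /48s, one per purpose.
site_names = ["infrastructure", "production-lan", "development-lan"]
subnets_48 = allocation.subnets(new_prefix=48)
sites = {name: next(subnets_48) for name in site_names}

# Assumption: reserve the first two /64s of the infrastructure /48.
role_names = ["router-loopbacks", "point-to-point-links"]
subnets_64 = sites["infrastructure"].subnets(new_prefix=64)
infra = {role: next(subnets_64) for role in role_names}

for name, net in {**sites, **infra}.items():
    print(f"{name:22} {net}")
```

A /32 holds 65,536 /48s and each /48 holds 65,536 /64s, so a plan like this leaves plenty of room to hand out further /64s per hosting application.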
Now this is not a front-to-back implementation, but in just two hours we had something to hand over to our systems teams for testing and training. If you rely on the internet for your business, this is the stage you need to get to urgently. In fact, by close of play Wednesday, some of our simpler services were already running dual stack, and we are now also running dual stack for DNS – Nominet having the simplest method for adding IPv6 glue.
Full disclosure: in reality our v6 rollout started months ago, with monitoring advice on operational mailing lists and attending v6 seminars. We had also already been engaged in the rollout of IPv6 on several customer networks, so this rollout was not frightening for us.
We are now in the position that we can integrate IPv6 addressing as part of every configuration refresh or maintenance on our services, so that v6 is rolled out in a controlled, monitored, and careful manner. By moving now, we have bought ourselves time, which is a luxury. Firms that wait longer to start their v6 rollout will have a harder time, with the whole migration feeling like a ‘y2k bug’.
I complained on December 10th 2008 that The Internet was broken for 4-byte ASN speakers. Rob Shakir, Jonathan Oddy, and I have been researching in detail the mechanism by which a faulty announcement by an end-site network in the Ukraine was able to break BGP (the protocol that glues different networks on the internet together, one of the most significant building blocks of the internet) for hosts that supported ASN4 – the evolution of the protocol to support ‘large’ AS numbers (unique network IDs).
Some history in very brief terms – all networks on the internet that participate in BGP need a number to identify themselves. On the public internet, this number normally needs to be globally unique. The number can be between 1 and 65,535, and we have close to 50,000 of these numbers in use. To grow past this number, the BGP standard needs to be modified. The modification is described in a document called rfc4893, and this document was accepted by the community last May.
The first incarnations of router software that support these large AS numbers are now circulating. Due to flaws in the standards as they exist in January 2009, if you install one, you may become disconnected from the internet.
Why? Some more background, first: BGP allows for large networks to configure ‘hints’ in their router configuration, by dividing their network into several small networks (confederations). The information about the ‘virtual’ divisions of the network should be removed from the BGP messages which are sent to other networks, but if a network supports large ASN in some parts, and not in others, the routers in the legacy part of the network may not know to test the ‘large number’ section of a BGP message for the presence of an internal confederation ID. The standard tries to take this into account by explicitly forbidding that confederation ID be passed between networks in the asn4 part of the BGP message.
However, should this occur by accident, what are the effects? Well, elsewhere in the Large ASN standard, it states that the connection between two networks should be severed if Confederation ASN appear to be leaked in the ASN4 part of the BGP message. This means that networks which do not understand large ASN can forward a broken message to a network which does understand large ASN. At which point the network which does understand large ASN should tear down the session.
Since this message can be delivered over a transit session, this means the receiving ISP loses their connection to the internet via that ISP. If it learns the route over every ISP, then the network can lose its connection to the internet entirely.
The message that I reported was leaking in December is still leaking. AS196629 (AS3.21 in legacy asdot notation) is announcing to AS35320, who are not stripping their confederation information from the large-asn section of BGP messages. If you learn the prefix via AS196629’s other transit, AS6886, then you are fine. If you learn the prefix via AS35320, you are (today) receiving a broken message.
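The equivalence between AS196629 and 3.21 is plain arithmetic: the 32-bit number is split into its high and low 16-bit halves. A minimal sketch of the conversion in both directions follows; the function names are hypothetical, not taken from any BGP implementation.

```python
def asplain_to_asdot(asn: int) -> str:
    """Render a 32-bit AS number in asdot notation (high.low).

    Numbers that fit in 16 bits are printed as a plain integer.
    """
    if not 0 <= asn <= 0xFFFFFFFF:
        raise ValueError("AS number must fit in 32 bits")
    high, low = divmod(asn, 65536)
    return str(low) if high == 0 else f"{high}.{low}"


def asdot_to_asplain(text: str) -> int:
    """Parse 'high.low' (or a plain integer) back into a 32-bit AS number."""
    if "." not in text:
        return int(text)
    high, low = (int(part) for part in text.split("."))
    if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
        raise ValueError("each half must fit in 16 bits")
    return high * 65536 + low


print(asplain_to_asdot(196629))   # 3 * 65536 + 21
print(asdot_to_asplain("3.21"))
```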
We tested how Cisco IOS copes with this broken message, using the first generation of code for the Cisco 7200 router that understands ASN4. We peered the router to NetSumo’s research and development network, AS15653. Cisco honours the standard/RFC, and breaks the session. Since it learns the dirty message on the transit session, the router disconnected our test network from the internet entirely.
If you work in this field, I implore you to read the more thorough analysis on the nanog list, and participate in the discussion to work out how we should correct the standard, to allow routers to behave differently when a dirty message is received. If we do not, then there is a simple, easy to understand, and easy to implement mechanism to break the internet, as soon as networks upgrade to the current version of their routing software.
Not trying to point fingers or name-and-shame, just to raise the profile of a nasty little bug handling breaches of RFC4893. This post is basically shaped from a message I posted to nanog earlier.
AS196629 (3.21 in asdot) announces 91.207.218.0/23. Experienced eyes will notice that this is quite a large AS number. It’s a ‘new’ 4-byte ASN. When an OpenBGPd speaker with 4-byte ASN support receives the update for this prefix, the session is torn down with the daemon logging a ‘fatal error’. Why?
OpenBGPd is checking AS4_PATH to ensure that it contains only AS_SET and AS_SEQUENCE types, as per RFC4893. When processing the UPDATE for 91.207.218.0/23 it sees :
91.207.218.0/23
Path Attributes – Origin: Incomplete
Flags: 0x40 (Well-known, Transitive, Complete)
Origin: Incomplete (2)
AS_PATH: xx xx 35320 23456 (13 bytes)
AS4_PATH: (65044 65057) 196629 (7 bytes)
See the confederation ASNs in the AS4_PATH? That’s forbidden:
To prevent the possible propagation of confederation path segments outside of a confederation, the path segment types AS_CONFED_SEQUENCE and AS_CONFED_SET [RFC3065] are declared invalid for the AS4_PATH attribute. RFC 4893.
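The check described above can be sketched in a few lines. This is not OpenBGPd’s code, only an illustration under stated assumptions: the attribute value is taken to be a series of segments, each made of a one-byte type, a one-byte ASN count, and that many 4-byte ASNs, with the usual segment type codes (1 AS_SET, 2 AS_SEQUENCE, 3 AS_CONFED_SEQUENCE, 4 AS_CONFED_SET). The example bytes are hand-built from the path shown in the dump, not captured from the wire, and the function names are hypothetical.

```python
import struct

AS_SET, AS_SEQUENCE, AS_CONFED_SEQUENCE, AS_CONFED_SET = 1, 2, 3, 4
ALLOWED_IN_AS4_PATH = {AS_SET, AS_SEQUENCE}


def parse_as4_path(data: bytes):
    """Split an AS4_PATH attribute value into (segment_type, [asns]) tuples."""
    segments = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated segment header")
        seg_type, count = data[offset], data[offset + 1]
        offset += 2
        end = offset + 4 * count
        if end > len(data):
            raise ValueError("truncated segment body")
        asns = list(struct.unpack(f"!{count}I", data[offset:end]))
        segments.append((seg_type, asns))
        offset = end
    return segments


def forbidden_segments(segments):
    """Return the segments whose type is not valid inside AS4_PATH."""
    return [seg for seg in segments if seg[0] not in ALLOWED_IN_AS4_PATH]


# Hand-built equivalent of "(65044 65057) 196629".
example = (
    bytes([AS_CONFED_SEQUENCE, 2]) + struct.pack("!2I", 65044, 65057)
    + bytes([AS_SEQUENCE, 1]) + struct.pack("!I", 196629)
)

segments = parse_as4_path(example)
bad = forbidden_segments(segments)
print(segments)
print("forbidden:", bad)
```

What a speaker does once `forbidden_segments` returns something non-empty is the real question; the detection itself is trivial.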
The RFC does not suggest how to handle AS4_PATH violations, but if the bad path is learned on every upstream, this will cause a network with OpenBGPd edges to disconnect from the internet. Modifying the OpenBGPd software to permit AS_CONFED_SEQUENCE and AS_CONFED_SET in an AS4_PATH causes the path to be accepted, and the session is not torn down. This isn’t a great fix.
The impact today is fairly limited, as there are relatively few BGP speakers honouring the 4-byte ASN protocol extension rules. But as code that supports these features creeps around the internet, the next time this happens the impact could be much greater, so we need to understand which implementation of which BGP software caused this illegal origination.
From a software point of view, I want to see a configurable option to reject the route but keep the session, reject the route and drop the session, accept the route but log/send trap, etc.
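As a rough illustration of what such an option might look like, here is a sketch of the three behaviours as a single configurable setting. None of these names exist in OpenBGPd, IOS, or any other implementation; they are hypothetical and only model the decision, not a real BGP session.

```python
from dataclasses import dataclass
from enum import Enum


class BadAs4PathAction(Enum):
    """Hypothetical operator-selectable responses to an invalid AS4_PATH."""
    REJECT_ROUTE_KEEP_SESSION = "reject-keep"
    REJECT_ROUTE_DROP_SESSION = "reject-drop"
    ACCEPT_ROUTE_AND_LOG = "accept-log"


@dataclass(frozen=True)
class Outcome:
    install_route: bool   # does the prefix enter the routing table?
    keep_session: bool    # does the BGP session stay up?
    log_event: bool       # is a log entry / trap generated?


def handle_bad_as4_path(action: BadAs4PathAction) -> Outcome:
    """Map the configured action to what happens to the route and session."""
    if action is BadAs4PathAction.REJECT_ROUTE_KEEP_SESSION:
        return Outcome(install_route=False, keep_session=True, log_event=True)
    if action is BadAs4PathAction.REJECT_ROUTE_DROP_SESSION:
        return Outcome(install_route=False, keep_session=False, log_event=True)
    return Outcome(install_route=True, keep_session=True, log_event=True)


for action in BadAs4PathAction:
    print(action.value, handle_bad_as4_path(action))
```

Only the second setting can take a network off the internet when the bad path arrives on every upstream, which is why making it the only available behaviour is dangerous.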
In any case, we need to publish the arrangement that led to this mistake so that other networks using the same toolset to originate prefixes can avoid the same situation. I have made contact with an engineer at the NOC, who is investigating.