I complained on December 10th 2008 that The Internet was broken for 4-byte ASN speakers. Rob Shakir, Jonathan Oddy, and I have been researching in detail the mechanism by which a faulty announcement by an end-site network in the Ukraine was able to break BGP (the protocol that glues different networks on the internet together, one of the most significant building blocks of the internet) for hosts that supported ASN4 – the evolution of the protocol to support ‘large’ AS numbers (unique network IDs).
Some history in very brief terms – all networks on the internet that participate in BGP need a number to identify themselves. On the public internet, this number normally needs to be globally unique. The number can be between 1 and 65,535, and we have close to 50,000 of these numbers in use. To grow past this limit, the BGP standard needs to be modified. The modification is described in a document called RFC4893, and this document was accepted by the community last May.
The first incarnations of router software that support these large AS numbers are now circulating. Due to flaws in the standards as they exist in January 2009, if you install one, you may become disconnected from the internet.
Why? Some more background, first: BGP allows for large networks to configure ‘hints’ in their router configuration, by dividing their network into several small networks (confederations). The information about the ‘virtual’ divisions of the network should be removed from the BGP messages which are sent to other networks, but if a network supports large ASN in some parts, and not in others, the routers in the legacy part of the network may not know to test the ‘large number’ section of a BGP message for the presence of an internal confederation ID. The standard tries to take this into account by explicitly forbidding that confederation ID be passed between networks in the asn4 part of the BGP message.
However, should this occur by accident, what are the effects? Well, elsewhere in the Large ASN standard, it states that the connection between two networks should be severed if confederation ASNs appear to have leaked into the ASN4 part of the BGP message. This means that networks which do not understand large ASN can forward a broken message to a network which does understand large ASN, at which point the receiving network should tear down the session.
Since this message can be delivered over a transit session, this means the receiving ISP loses their connection to the internet via that ISP. If it learns the route over every ISP, then the network can lose its connection to the internet entirely.
The message that I reported was leaking in December is still leaking. AS196629 (AS3.21 in legacy asdot notation) is announcing to AS35320, who are not stripping their confederation information from the large-asn section of BGP messages. If you learn the prefix via AS196629’s other transit, AS6886, then you are fine. If you learn the prefix via AS35320, you are (today) receiving a broken message.
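The relationship between AS196629 and 3.21 is plain arithmetic: asdot writes a 4-byte AS number as its high and low 16-bit halves. A minimal sketch of the conversion follows; the function names are my own and are not taken from any router software.

```python
def asplain_to_asdot(asn: int) -> str:
    """Render a 32-bit AS number in asdot notation (high.low)."""
    if not 0 <= asn <= 0xFFFFFFFF:
        raise ValueError("AS number must fit in 32 bits")
    high, low = divmod(asn, 65536)
    # asdot leaves 2-byte AS numbers as a plain integer
    return str(low) if high == 0 else f"{high}.{low}"


def asdot_to_asplain(text: str) -> int:
    """Parse either a plain integer or high.low notation."""
    if "." not in text:
        return int(text)
    high, low = (int(part) for part in text.split("."))
    if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
        raise ValueError("each half must fit in 16 bits")
    return high * 65536 + low


print(asplain_to_asdot(196629))  # 3.21
print(asdot_to_asplain("3.21"))  # 196629
print(asplain_to_asdot(35320))   # 35320
```

3 × 65536 + 21 = 196629, which is why the same network shows up under both names.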
We tested out how Cisco IOS is coping with this broken message using the first generation of code for the Cisco 7200 router that understands ASN4. We peered the router to NetSumo’s research and development network, AS15653. Cisco honours the standard/RFC, and breaks the session. Since it learns the dirty message on the transit session, the router disconnected our test network from the internet entirely:
If you work in this field, I implore you to read the more thorough analysis on the nanog list, and participate in the discussion to work out how we should correct the standard, to allow routers to behave differently when a dirty message is received. If we do not, then there is a simple, easy to understand, and easy to implement mechanism to break the internet, as soon as networks upgrade to the current version of their routing software.
The behaviour of Asterisk has been altered since 1.4.21, possibly in error, with regard to answering calls from call queues.
There is a feature that requires agents to press # when they are ready to speak to a caller. Since we forward calls to agents via their mobiles, rather than auto-answer calls in a desk environment, we disabled that feature with ackcall=no in agents.conf.
After upgrading to 1.4.22 we see this configuration is no longer honoured. Diffing chan_agent.c between versions 1.4.21 and 1.4.22 shows a new section of code saying (in English) that ‘if there is no per-channel override specified in the dialplan, default to the configured variable’ (line 2048). I looked at where the default was read from the config file and it looks like a lot of different chunks of chan_agent want to set the ackcall default!
The bug shows up in the asterisk console as :
– Agent/xxx is ringing
– SIP/voip-out-081e5a88 is making progress passing it to Local/447xxxxxxxxx@uk_all-a667,2
– SIP/voip-out-081e5a88 answered Local/447xxxxxxxxx@uk_all-a667,2
– Local/447xxxxxxxxx@uk_all-a667,1 answered, waiting for ‘#’ to acknowledge
The workaround is that the only safe place to set the default ackcall behaviour is for each channel in the dialplan. If you want to disable the ‘waiting for ‘#’ to acknowledge’ behaviour, configure your dialplan as such :
exten => 1701,1,Answer()
exten => 1701,n,Set(AGENTACKCALL=no)
exten => 1701,n,Queue(noc|r|||40)
exten => 1701,n,Voicemail(xxxxx)
exten => 1701,n,Hangup
This is a response to Lee Dryburgh’s article on Skype. We had a debate on Twitter, but I have not yet mastered the art of debate in 140 characters!
Lee’s premise is that “Certainly Skype is not a walled garden. All things being relative, it’s certainly not overly closed either.” Lee claims that the accusations of closedness are unfair, because they are levelled by commentators who advocate SIP based addressing and dialing rather than any other system.
This is not my premise. I claim that Skype is closed because calls are signalled and completed using protocols that are entirely secret as a matter of policy. Skype’s founder presented at Spring VON 2007 and stated that if Skype did not keep their protocols entirely secret, then Skype would be full of spam and attacks like email is. I think this is a poisonous claim: telephone networks have been interconnecting around the world since telephony was conceived. By not allowing telecoms firms to interconnect between the Skype namespace and other networks, Skype have prevented openness in order to develop and maintain a monopoly position. That’s perfectly acceptable business, but it is not in the slightest bit open.
Randy Bush googled Walled Garden for a recent presentation and found this cartoon. I like this definition because it’s correct. Is Skype a Walled Garden? Lee says a Walled Garden is a commercial restriction, for example, “sharing of ringtones via Bluetooth, using WiFi from a PDA, having access to all Web sites”. I think that only allowing interconnection with the purchase of an upgrade like SkypeOut is a restrictive practice that suggests Skype is a Walled Garden. Worst of all, a call between two VoIP networks using this method requires default PSTN routing, which harms signal quality, and prevents the expansion of next-generation services such as Wideband/High Definition audio.
The meshing of networks, whether they are traditional voice or IP networks, leads to higher audio quality and increased reliability. Keeping telephony systems and protocols secret in order to prevent meshing may well be a viable business model, but it is not an open business model.
Not trying to point fingers or name-and-shame, just to raise the profile of a nasty little bug handling breaches of RFC4893. This post is basically shaped from a message I posted to nanog earlier.
AS196629 (3.21 in asdot) announces 91.207.218.0/23. Experienced eyes will notice that this is quite a large AS number. It’s a ‘new’ 4-byte ASN. When an OpenBGPd speaker with 4-byte ASN support receives the UPDATE for this prefix, the session is torn down with the daemon logging a ‘fatal error’. Why?
OpenBGPd is checking AS4_PATH to ensure that it contains only AS_SET and AS_SEQUENCE types, as per RFC4893. When processing the UPDATE for 91.207.218.0/23 it sees :
91.207.218.0/23
Path Attributes – Origin: Incomplete
Flags: 0x40 (Well-known, Transitive, Complete)
Origin: Incomplete (2)
AS_PATH: xx xx 35320 23456 (13 bytes)
AS4_PATH: (65044 65057) 196629 (7 bytes)
See the confederation ASNs in the AS4_PATH? That’s forbidden:
To prevent the possible propagation of confederation path segments outside of a confederation, the path segment types AS_CONFED_SEQUENCE and AS_CONFED_SET [RFC3065] are declared invalid for the AS4_PATH attribute. RFC 4893.
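To make that check concrete, here is a rough sketch of walking an AS4_PATH value and flagging the forbidden segment types. It assumes the usual path segment encoding (one byte of type, one byte of AS count, then four bytes per AS number) and the standard type codes 1 to 4. The bytes are hand-built for illustration and are not a capture of the real UPDATE, and the function names are hypothetical rather than OpenBGPd’s.

```python
import struct

# Path segment type codes
AS_SET, AS_SEQUENCE, AS_CONFED_SEQUENCE, AS_CONFED_SET = 1, 2, 3, 4
ALLOWED_IN_AS4_PATH = {AS_SET, AS_SEQUENCE}


def parse_as4_path(value: bytes):
    """Split an AS4_PATH attribute value into (segment_type, [asns]) tuples."""
    segments, offset = [], 0
    while offset < len(value):
        if offset + 2 > len(value):
            raise ValueError("truncated segment header")
        seg_type, count = value[offset], value[offset + 1]
        offset += 2
        end = offset + 4 * count  # AS4_PATH carries 4-byte AS numbers
        if end > len(value):
            raise ValueError("truncated segment body")
        asns = list(struct.unpack(f"!{count}I", value[offset:end]))
        segments.append((seg_type, asns))
        offset = end
    return segments


def forbidden_segments(segments):
    """Return the segments whose type is not permitted in AS4_PATH."""
    return [seg for seg in segments if seg[0] not in ALLOWED_IN_AS4_PATH]


# Illustrative bytes only: a confed sequence followed by a normal sequence
leaked = (
    bytes([AS_CONFED_SEQUENCE, 2]) + struct.pack("!2I", 65044, 65057)
    + bytes([AS_SEQUENCE, 1]) + struct.pack("!I", 196629)
)
segments = parse_as4_path(leaked)
print(segments)                      # [(3, [65044, 65057]), (2, [196629])]
print(forbidden_segments(segments))  # [(3, [65044, 65057])]
```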
The RFC does not suggest how to handle AS4_PATH violations, but if the bad path is learned on every upstream, this will cause a network with OpenBGPd edges to disconnect from the internet. Modifying the OpenBGPd software to permit AS_CONFED_SEQUENCE and AS_CONFED_SET in an AS4_PATH causes the path to be accepted and the session is not torn down. This isn’t a great fix.
The impact today is fairly limited as there are relatively few BGP speakers honouring the 4-byte ASN protocol extension rules, but as code that supports these features creeps around the internet, the next time this happens the impact could be much greater, so we need to understand which implementation of which BGP software caused this illegal origination.
From a software point of view, I want to see a configurable option to reject the route but keep the session, reject the route and drop the session, accept the route but log/send trap, etc.
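As a sketch of what such a knob could look like, the three behaviours can be modelled as a policy setting that only takes effect when a forbidden segment type is seen. Every name below is hypothetical; none of it is an option in OpenBGPd, IOS, or any other implementation.

```python
from dataclasses import dataclass
from enum import Enum


class As4PathPolicy(Enum):
    """Hypothetical operator setting for AS4_PATH violations."""
    REJECT_ROUTE_KEEP_SESSION = "reject-route-keep-session"
    REJECT_ROUTE_DROP_SESSION = "reject-route-drop-session"
    ACCEPT_ROUTE_AND_LOG = "accept-route-and-log"


@dataclass(frozen=True)
class Decision:
    accept_route: bool
    keep_session: bool
    log_event: bool  # log locally and/or send a trap


ALLOWED_SEGMENT_TYPES = {1, 2}  # AS_SET and AS_SEQUENCE


def decide(segment_types, policy: As4PathPolicy) -> Decision:
    """Choose what to do with an UPDATE given its AS4_PATH segment types."""
    if all(t in ALLOWED_SEGMENT_TYPES for t in segment_types):
        return Decision(accept_route=True, keep_session=True, log_event=False)
    if policy is As4PathPolicy.REJECT_ROUTE_KEEP_SESSION:
        return Decision(accept_route=False, keep_session=True, log_event=True)
    if policy is As4PathPolicy.REJECT_ROUTE_DROP_SESSION:
        return Decision(accept_route=False, keep_session=False, log_event=True)
    return Decision(accept_route=True, keep_session=True, log_event=True)


# A path with a confed sequence (type 3) followed by a normal sequence (type 2)
print(decide([3, 2], As4PathPolicy.REJECT_ROUTE_KEEP_SESSION))
```

With the first policy the bad route is dropped and logged while the session stays up, so a transit that forwards one dirty UPDATE no longer takes every other prefix with it.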
In any case we need to publish the arrangement that has led to this mistake so that other networks using the same toolset to originate prefixes can avoid the same situation happening. I have made contact with an engineer at the NOC who is investigating.
Yesterday I gave a talk to Sheffield GeekUp on preparing enterprises for IPv6 [download]. The premise of the talk was :
The advice I gave was :
My hope is that this talk is improved upon and delivered internationally to enterprises.
These are the slides that I presented at NANOG44 in Los Angeles on Sunday, “VoIP For Network Operators”.
This talk was for network operators looking to build voice segments of their network, and the slides cover
I tried to open some dialogue with colleague members of the ITSPA about Vodafone’s legal challenge to Ofcom’s two-hour number port ruling. Instead I got a number of offlist replies suggesting Vodafone’s challenge is still news to many in the industry.
Today, if you want to port your number from one service provider to another, it relies on two major coincidences – firstly that your old and new provider have an agreement in place to manage the technical transfer between the two networks, and secondly that your old provider remains fully willing to forward all calls destined for your old number to your new service provider.
There are several issues with such a system – the first is that your old provider are still very much involved, so their technical or commercial failure causes a problem long after you have ported away, another is that the process is slow and manual, and a third is that not all service providers have agreements to permit number porting (called a Mutual Porting Agreement in the industry).
Vodafone are concerned about the costs of the new system, even though an industry group, UKPorting, has only just begun to gather information about how the system should work. I think that it’s a flawed premise to argue that a system is too expensive before a system is selected (and associated costs are announced). Instead Vodafone should get involved with designing a perfect system.
The UKPorting system to facilitate fast, reliable, and simple porting must happen, and must succeed. We have to protect consumers who port their number from failures caused by their former service provider.
I am concerned that the system may mean all multihomed telephone networks will need to move to an all-call-query model that’s run by one natural monopoly. If a single entity holds the industry to ransom, we have not moved forward – there’s still a single party whose commercial or technical failure can break your port. The single all-call-query model also lends itself well to governments having access to a single point where recording of most call attempts can be made.