Archive for 2009

The internet is still broken, guys…

I complained on December 10th 2008 that The Internet was broken for 4-byte ASN speakers.  Rob Shakir, Jonathan Oddy, and I have been researching in detail the mechanism by which a faulty announcement by an end-site network in Ukraine was able to break BGP (the protocol that glues different networks on the internet together, one of the most significant building blocks of the internet) for hosts that supported ASN4 – the evolution of the protocol to support ‘large’ AS numbers (unique network IDs).

Some history in very brief terms – all networks on the internet that participate in BGP need a number to identify themselves.  On the public internet, this number normally needs to be globally unique.  The number can be between 1 and 65,535, and we have close to 50,000 of these numbers in use.  To grow past this limit, the BGP standard needs to be modified.  The modification is described in a document called RFC 4893, and this document was accepted by the community last May.

The first incarnations of router software that support these large AS numbers are now circulating. Due to flaws in the standards that exist in January 2009, if you install one, you may become disconnected from the internet.

Why? Some more background, first: BGP allows for large networks to configure ‘hints’ in their router configuration, by dividing their network into several small networks (confederations).  The information about the ‘virtual’ divisions of the network should be removed from the BGP messages which are sent to other networks, but if a network supports large ASNs in some parts, and not in others, the routers in the legacy part of the network may not know to test the ‘large number’ section of a BGP message for the presence of an internal confederation ID.  The standard tries to take this into account by explicitly forbidding confederation IDs from being passed between networks in the ASN4 part of the BGP message.

However, should this occur by accident, what are the effects?  Well, elsewhere in the Large ASN standard, it states that the connection between two networks should be severed if confederation ASNs appear to have leaked into the ASN4 part of the BGP message.  This means that networks which do not understand large ASNs can forward a broken message to a network which does understand them.  At which point the network which does understand large ASNs should tear down the session.

Since this message can be delivered over a transit session, this means the receiving ISP loses their connection to the internet via that ISP.  If it learns the route over every ISP, then the network can lose its connection to the internet entirely.

The message that I reported was leaking in December is still leaking.  AS196629 (AS3.21 in legacy asdot notation) is announcing to AS35320, who are not stripping their confederation information from the large-asn section of BGP messages.  If you learn the prefix via AS196629’s other transit, AS6886, then you are fine.  If you learn the prefix via AS35320, you are (today) receiving a broken message.
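For anyone not used to the two ways of writing a large AS number, here is a minimal Python sketch of the conversion between the plain form (AS196629) and asdot (AS3.21).  The function names are my own for illustration and do not come from any router software.

```python
def asplain_to_asdot(asn: int) -> str:
    """Render a 32-bit AS number in asdot notation (high.low).

    In asdot, numbers that fit in 16 bits are written as plain integers.
    """
    if not 0 <= asn <= 0xFFFFFFFF:
        raise ValueError("AS number must fit in 32 bits")
    high, low = divmod(asn, 65536)
    return str(low) if high == 0 else f"{high}.{low}"


def asdot_to_asplain(text: str) -> int:
    """Parse asdot notation such as '3.21' back into a plain integer."""
    if "." not in text:
        value = int(text)
        if not 0 <= value <= 0xFFFF:
            raise ValueError("undotted asdot values must fit in 16 bits")
        return value
    high_text, low_text = text.split(".", 1)
    high, low = int(high_text), int(low_text)
    if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
        raise ValueError("each half of an asdot value must fit in 16 bits")
    return high * 65536 + low


print(asplain_to_asdot(196629))
print(asdot_to_asplain("3.21"))
```

The arithmetic is simply 3 × 65,536 + 21 = 196,629.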

We tested out how Cisco IOS is coping with this broken message using the first generation of code for the Cisco 7200 router that understands ASN4.  We peered the router to NetSumo’s research and development network, AS15653.  Cisco honours the standard/rfc, and breaks the session.  Since it learns the dirty message on the transit session, the router disconnected our test network from the internet entirely :

  • *Jan 16 11:29:58.531: %BGP-5-ADJCHANGE: neighbor 193.239.32.2 Up
  • *Jan 16 11:30:02.595: %BGP-6-ASPATH: Invalid AS path (65044 65048 65062) 3.21 23456 received from 193.239.32.2: Confederation found in AS4_PATH
  • *Jan 16 11:30:02.595: %BGP-5-ADJCHANGE: neighbor 193.239.32.2 Down BGP Notification sent
  • *Jan 16 11:30:02.595: %BGP-3-NOTIFICATION: sent to neighbor 193.239.32.2 3/1 (update malformed) 27 bytes E0111803 030000FE 140000FE 180000FE 26 FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 0050 0200 0000 3540 0101 0240 020C 0205 3D25 2114 89F8 5BA0 5BA0 4003 04C1 EF20 02E0 1118 0303 0000 FE14 0000 FE18 0000 FE26 0202 0003 0015 0000 5BA0 175B CFDA
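To show what the router is objecting to, here is a rough Python sketch that decodes the AS4_PATH attribute bytes from the dump above (the run starting E0 11 18).  It is not Cisco’s implementation, just my reading of the attribute layout: a flags byte, a type code (17 for AS4_PATH), a length, then path segments of 4-byte AS numbers, where segment types 3 and 4 are the confederation types.  The function names are made up for this example.

```python
import struct

AS4_PATH = 17  # path attribute type code for AS4_PATH
SEGMENT_NAMES = {
    1: "AS_SET",
    2: "AS_SEQUENCE",
    3: "AS_CONFED_SEQUENCE",
    4: "AS_CONFED_SET",
}
CONFED_TYPES = {3, 4}


def parse_as4_path(attr: bytes):
    """Split one AS4_PATH attribute into (segment_type, [asn, ...]) tuples."""
    flags, type_code = attr[0], attr[1]
    if type_code != AS4_PATH:
        raise ValueError("not an AS4_PATH attribute")
    if flags & 0x10:  # extended-length bit: the length field is two bytes
        (length,) = struct.unpack_from("!H", attr, 2)
        offset = 4
    else:
        length = attr[2]
        offset = 3
    end = offset + length
    if end > len(attr):
        raise ValueError("attribute is truncated")
    segments = []
    while offset < end:
        seg_type, count = attr[offset], attr[offset + 1]
        offset += 2
        asns = struct.unpack_from(f"!{count}I", attr, offset)
        offset += 4 * count
        segments.append((seg_type, list(asns)))
    return segments


def has_confed_segment(segments) -> bool:
    """True if any segment is a confederation type, which AS4_PATH must not carry."""
    return any(seg_type in CONFED_TYPES for seg_type, _ in segments)


# Attribute bytes copied from the NOTIFICATION dump above.
ATTR = bytes.fromhex("E01118 0303 0000FE14 0000FE18 0000FE26 0202 00030015 00005BA0")

segments = parse_as4_path(ATTR)
for seg_type, asns in segments:
    print(SEGMENT_NAMES.get(seg_type, seg_type), asns)
print("confederation leaked:", has_confed_segment(segments))
```

The first segment decodes to the three private confederation ASNs shown in brackets in the log line, and the second to 196629 (3.21) followed by 23456.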

If you work in this field, I implore you to read the more thorough analysis on the nanog list, and participate in the discussion to work out how we should correct the standard, to allow routers to behave differently when a dirty message is received.  If we do not, then there is a simple, easy to understand, and easy to implement mechanism to break the internet, as soon as networks upgrade to the current version of their routing software.

Preventing Mailman annoyances

Inspired by TheHodge’s “After you install WordPress” article, I made a note of the things I did to configure a Mailman mailing list, after creating it.  Much of this is to make the look-and-feel replicate how I used to run Majordomo lists.

Firstly, I like the bounce handling and web interface of Mailman, which is why I don’t just run Majordomo for lists any more.  It’s worth pointing this out, in case you wonder why I still use tool B, even though I have to do lots of work to make it work like tool A !

After running newlist, I recommend the following configuration changes (defaults which are changed assume you are running the Debian packaged Mailman) :

  • General Options – make the administrator’s email address a role account that you will not subscribe to the mailing list. This is basically so that if you bounce an administration message from Mailman, due to your spam filters or an error, Mailman won’t decide to unsubscribe you from the mailing list!  I have had this happen to me before when I used to hand-check spam directed at uknot.
  • Decapitalise public name of the list – to make it look neater, and more like the output of the ‘lists’ command in majordomo.  Don’t forget to decapitalise the subject line tag if you do this.  I also tend to make the subject line tag very small so that on little displays, it’s still possible to scan a folder and read threads of interest on subject alone.
  • Disable monthly reminders – they are really annoying to your subscribers, and Debian’s default position is disabled, but some implementations do not disable subscription reminders.
  • My users get confused by the Filter out duplicate messages to list members option.  When a list subscriber is cc:d to a list post, users tend to expect to see a copy of the mail in their inbox, and their mailing list archive.  I turn the filter off so that this happens.
  • I tend to enable the Should administrator get notices of subscribes and unsubscribes option, so that I can track whether promoting a list in a certain place has worked!
  • In “Non Digest options”, if I am migrating from a Majordomo list, I empty the box for message footer, and also tend to remove it when it’s a geek mailing list, as to most high volume mail readers, it’s obvious when an email has been posted to a mailing list, because it’s filtered into the correct mailbox !  For low volume lists that are intended for low volume mail readers, the footer might be useful.
  • Check the reply to option, so that mailing lists that are intended to promote on-list discussion have a header that directs conversation back to the list, and mailing lists that will yield a high proportion of off-list mail do not have this header.
  • Be slightly annoyed with me that the only English option in ‘Language Options’ is English (USA) when there is no English (UK).
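If you would rather not click through the web interface for every new list, some of the settings above can be sketched as an input file for Mailman 2.1’s config_list tool, which reads plain Python assignments.  Treat this as a starting point rather than a tested recipe: the list name (“example”), the owner address and the subject tag are made up, and I am assuming the attribute names match your Mailman version.  The duplicate-filter default is not covered here.

```python
# Hypothetical settings file, applied with something like:
#   config_list -i settings.py example
# All values below are placeholders for illustration.

# Role account that is not itself subscribed to the list
owner = ['list-admin@example.org']

# Lower-case public name and a short subject tag
real_name = 'example'
subject_prefix = '[ex] '

# No monthly password reminders
send_reminders = 0

# Tell the administrator about subscribes and unsubscribes
admin_notify_mchanges = 1

# Empty footer on non-digest deliveries
msg_footer = ''

# 0 = replies go to the poster, 1 = replies go to the list
reply_goes_to_list = 1
```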

Happy list-administration!

And you may find the following lists interesting to your work :

  • mailop, for those who work in the field of mail systems administration.
  • experts, for those who work in expert e-commerce roles.
  • uknof, for those who work in network engineering roles, or systems/ISP environments.

Asterisk 1.4.22 Agent call acknowledgement bug

The behaviour of Asterisk has been altered since 1.4.21, possibly in error, with regard to answering calls from call queues.

There is a feature that requires agents to press # when they are ready to speak to a caller.  Since we forward calls to agents via their mobiles, rather than auto-answer calls in a desk environment, we disabled that feature with ackcall=no in agents.conf.

After upgrading to 1.4.22 we see this configuration is no longer honoured.  Diffing chan_agent.c between versions 1.4.21 and 1.4.22 shows a new section of code saying (in English) that ‘if there is no per-channel override specified in the dialplan, default to the configured variable’ (line 2048).  I looked at where the default was read from the config file and it looks like a lot of different chunks of chan_agent want to set the ackcall default!

The bug shows up in the asterisk console as :

-- Agent/xxx is ringing
-- SIP/voip-out-081e5a88 is making progress passing it to Local/447xxxxxxxxx@uk_all-a667,2
-- SIP/voip-out-081e5a88 answered Local/447xxxxxxxxx@uk_all-a667,2
-- Local/447xxxxxxxxx@uk_all-a667,1 answered, waiting for '#' to acknowledge

The workaround: the only safe place to set the default ackcall behaviour is per channel, in the dialplan.  If you want to disable the ‘waiting for ‘#’ to acknowledge’ behaviour, configure your dialplan as follows :

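; Answer the inbound call before queueing it
exten => 1701,1,Answer()
; Per-channel override: agents do not have to press # to accept the call
exten => 1701,n,Set(AGENTACKCALL=no)
; Ring (rather than play hold music) the 'noc' queue, with a 40 second timeout
exten => 1701,n,Queue(noc|r|||40)
; Nobody picked up: fall through to voicemail, then hang up
exten => 1701,n,Voicemail(xxxxx)
exten => 1701,n,Hangup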

Internet TV is ace

Lots of people have been telling me that IP delivered video will be big.  For a long time, I have disagreed because innovations like the PVR (and therefore simple timeshifting) and the coming of age of multiplexing (and therefore multi channel tv) have expanded choice and allowed me to fit good TV that I like around my schedule.

I disagreed on the grounds that unicast IP is a fairly bad way to broadcast large volumes of data.  Once the stream is wrapped in TCP to guarantee delivery and ensure quality, the infrastructure needed to handle the volumes of viewers for major tv series or live events makes it hard to imagine delivering real-time internet TV. This is before we consider the bandwidth requirements for high-definition video and audio.

So why the change of mind ?

I previously assumed internet delivery was the raison d’être for Net Video.  It’s not.  The internet might not have proved itself as the most viable platform for broadcast video, but it has proved itself time and again as the perfect platform for publishing content.  Internet TV is going to mean that “TV” gets some of the “wisdom of crowds” treatment, and that organisations with interesting things to say will be able to launch video content to worldwide audiences.  Previously, content makers would have needed to work with a huge machine of tens of organisations before video hit the airwaves, and they’d need to work with hundreds if they wanted to distribute to a worldwide audience.  Today ?  All you need is a bit of content, a bit of talent and some hosting.

The second benefit to internet video is how portable it is.  I have syndicated three shows from Revision3 (Diggnation, Systm, and Scam School), Scobleizer TV, and some other geeky video stuff.  It is downloaded to my laptop, and copied to my iPhone.  I can watch this on a normal TV at home in high quality, via a cheap phone to tv cable, or on the move via laptop or phone.  I can watch my usual TV if I am staying away by plugging the phone into a tv in a hotel room.

In the future, I expect to watch niche stuff that is downloaded, and popular things that are spooled to disk via a broadcast system.  I would like to see some really neat devices shipping this year which allow me to combine recorded broadcast TV with downloaded Internet video, and let me carry broadcast TV in the same portable manner.

Also would love to hear what internet TV people are watching via the comments on the blog!

Openness and telecoms

This is a response to Lee Dryburgh’s article on Skype.  We had a debate on Twitter, but I have not yet mastered the art of debate in 140 characters!

Lee’s premise is that “Certainly Skype is not a walled garden. All things being relative, it’s certainly not overly closed either.”  Lee claims that the accusations of closedness are unfair, because they are levelled by commentators who advocate SIP based addressing and dialing rather than any other system.

This is not my premise.  I claim that Skype is closed because calls are signalled and completed using protocols that are entirely secret as a matter of policy.  Skype’s founder presented at Spring VON 2007 and stated that if Skype did not keep their protocols entirely secret, then Skype would be full of spam and attacks, like email is.  I think this is a poisonous claim: telephone networks have been interconnecting around the world since telephony was conceived.  By not allowing telecoms firms to interconnect between the Skype namespace and other networks, Skype have prevented openness in order to develop and maintain a monopoly position. That’s perfectly acceptable business, but it is not in the slightest bit open.

[Image: walled.jpg]

Randy Bush googled Walled Garden for a recent presentation and found this cartoon.  I like this definition because it’s correct.  Is Skype a Walled Garden ?  Lee says a Walled Garden is a commercial restriction, for example, “sharing of ringtones via Bluetooth, using WiFi from a PDA, having access to all Web sites”.  I think that only allowing interconnection with the purchase of an upgrade like SkypeOut is a restrictive practice that suggests Skype is a Walled Garden.  Worst of all, a call between two VoIP networks using this method requires default PSTN routing, which harms signal quality, and prevents the expansion of next-generation services such as Wideband/High Definition audio.

The meshing of networks, whether they are traditional voice or IP networks, leads to higher audio quality and increased reliability.  Keeping telephony systems and protocols secret in order to prevent meshing may well be a viable business model, but it is not an open business model.