SCTP Association Setup Collision

SCTP handles association setup collision such that only a single association is established in case both peers try establishing an association to the other peer, provided that both peers try connecting to an SCTP port which was opened by the peer SCTP instance.

The association setup sequence looks as follows:


SCTP association setup collision sequence


The corresponding Wireshark trace looks as follows:

SCTP association setup collision Wireshark trace

SCTP association setup collision Wireshark trace

The trace is available for download from


Veröffentlicht unter Uncategorized | Hinterlasse einen Kommentar

MQTT Protocol Length Field Encoding

MQTT is a relatively new MQ protocol for communication between distributed systems based on messages (message oriented middleware).

MQTT has been jointly developed by Eurotech and IBM as a future standard protocol for Internet of Things and M2M scenarios. OASIS has adopted the standard (published as version 3.1 under so future updates to the protocol will take place under OASIS’ auspices.

The idea of MQTT is simple: Sensor and actor nodes and applications communicate through message queues (topics in MQ speak). This setup has a very low coupling between the nodes and apps. The only thing apps and nodes need to know in order to exchange data is the address and topic of a message broker.

MQTT was designed for devices with very limited capabilities such as battery-driven sensor nodes and wireless devices. This implies that the protocol needs to be very efficient (low protocol overhead) because any excess byte transmitted over a wireless link would consume precious battery capacity. As a consequence, many fields in the MQTT protocol are optional, so they are only transmitted when needed.

Another interesting field is the remaining length field indicating the actual length of an MQTT message. The length of the remaining length field is between 1 and 4 bytes depending on the payload size (the actual user message).

The most significant bit of a remaining length field byte has the meaning «continuation bit» (CB). If more bytes follow, it is set to 1. Remaining length is encoded as a * 128^0 + b * 128^1 + c * 128^2 + d * 128^3 and placed into the RL field bytes as follows:

MQTT remaining length field              encoding

MQTT remaining length field encoding

MSB: Most Significant Byte
LSB: Least Significant Byte

The following examples show the encoding for single byte (1) and multi-byte remaining field lengths (2).

Example 1: RL = 364 = 108*128^0+2*128^1, a=108, CB0=1, b=2, CB1=0 (c, d, CB2=0)

Example 2: RL = 25’897 = 41*128^0 + 74*128^1 + 1*128^2, a=41, CB0=1, b=74, CB1=1, c=1, (CB2, d=0)

The encoding of the remaining length field requires a bit of additional bit and byte handling, but the benefit is that only a single byte is needed for most messages while preserving the capability to send larger message up to 268’435’455 bytes.

See more details on the MQTT protocol in this slide presentation.

Veröffentlicht unter Uncategorized | Hinterlasse einen Kommentar

Switzerland to lead in IPv6 adoption

To say that IPv6 is a new Internet protocol is a bit distorting historical facts. Work on IPv6 started back in the old millenium (1995 to be precise) and has since reached status of an Internet standard. Nevertheless, IPV6 adoption has been slow over the years despite the impending exhaustion of IPv4 address space.

However, IPv6 is picking up steam recently. Google’s IPv6 adoption statistics (see shows the percentage of IPv6 availability of different countries.

Switzerland is in the lead with over 10% of Internet users having native IPv6 connectivity (no tunneling). Apart from Romania which is a close follower with slightly more than 9% IPv6 connectivity as of the writing of this article, only a few countries have appreciable IPv6 adoption. The high percentage of IPv6 adoption in Switzerland is owed to the fact that Swisscom (the largest ISP in Switzerland) blazed a trail by introducing native IPv6 for its large DSL customer base. This ups the ante for other Internet providers like cable operators. The largest cable provider in Switzerland (UPC Cablecom) is apparently busy with installing the required networking gear to provide native IPv6, but no firm release date has been given thus far.

IPv6 adoption is a classical chicken and egg problem. If ISPs don’t provide it, users will not use it. If users don’t use it, ISPs have no incentive to provide it. So it’s up to the providers to do the first step and provide native (and reliable) IPv6 connectivity. On the user side, applications like Internet of Thinks (IoT) will fuel usage, provided that native and reliable IPv6 service is available.

Veröffentlicht unter Uncategorized | Hinterlasse einen Kommentar

SCTP Congestion Control

Congestion control tries avoiding overload situations in network components like routers.
Congestion in network components can lead to packet loss which is handled by the error control function of SCTP (see below). The goal of congestion control is to avoid packet loss in the first place.

SCTP congestion control key facts

•SCTP congestion control is based on RFC2581 with some minor modifications.
•Congestion control is applied to the entire association, not individual streams.
•SCTP maintains a separate cwnd parameter for each peer destination address in multihomed scenarios.
•As defined in RFC2581, the transmission rate starts slowly at the beginning (slow start phase), based on feedback provided by received SACK chunks. After the slow start phase, SCTP enters the congestion avoidance phase. In case of congestion in the network, SCTP immediately reverts back to the slow start phase.


SCTP Slow Start and Congestion Avoidance Phases

SCTP congestion control window cwnd

cwnd and rwnd (or a_rwnd = advertised receiver window size) define 2 windows where the smaller of the 2 determines the maximum amount of data that can be sent.

After the slow start phase, cwnd is large so that rwnd becomes the dominant window size.


Congestion Control Window

Slow start and congestion avoidance phases

The general mechanism applied in SCTP congestion control (as per RFC2581) is to slowly increase the congestion window size cwnd, but to rapidly collapse the window when there are signs of congestion.

Packet loss is deemed a sign of congestion. Note, however, that this is not always true (e.g. on wireless links there may be packet loss due to radio signal interferences).

Congestion control is applied to the entire association, not individual streams. Nevertheless, cwnd is maintained per destination transport address (multihomed scenarios).

Slow Start and Congestion Avoidance Phases

Slow Start and Congestion Avoidance Phases

More details are available in the SCTP slide presentation under

Veröffentlicht unter Uncategorized | Hinterlasse einen Kommentar

Stream Control Transmission Protocol

SCTP (Stream Control Transmission Protocol) is a transport layer protocol (layer 4 in the OSI stack) that corrects some of the shortcomings of TCP and provides additional features like multihoming and data streams.

SCTP has its origins in telecommunication network signalling transport. The protocol has been in existence for some time, but is gaining popularity in recent years in other applications as well. It is also an integral part of the SDN (Software Defined Network) standard, so packet flows can be specified based on SCTP port numbers too.

Summarizing, SCTP provides the following key features:

  • No head-of-line blocking
  • Message-based data transfer
  • Multihoming
  • Protection against connection flooding
  • Fragmentation in the transport layer
  • Flow control, congestion control, error control (like TCP does)

The following figure depicts the conceptual model of SCTP.


SCTP Conceptual Model

Basically, SCTP sits on top of the IP layer and offers a message passing interface to the application (ULP – Upper Layer Protocol). The messages pertain to a logical stream and the different streams are independent of each other. This means that a packet loss in one stream does not negatively affect other streams.

SCTP offers a wealth of additional functionality. An overview presentation of SCTP can be found under

Veröffentlicht unter Uncategorized | Verschlagwortet mit , | Hinterlasse einen Kommentar

PATRICIA tree/trie simulator for Internet Protocol route lookups

PATRICIA stands for Practical Algorithm To Retrieve Information Coded in Alphanumeric. It’s a binary tree (or trie as in retrieval) based search algorithm devised by D.R. Morrison eons ago (1968 to be more precise).

PATRICIA is a general algorithm for storing {key,value} pairs in a tree. PATRICIA has some very nice characteristics, namely:

  • Nodes are only added to the tree when a bit position in the key of 2 nodes differs, thus keeping the tree small and speeding up lookups
  • Only 1 type of node required thus simplifying implementation (no external nodes as in radix search trees)

I do not want to delve into the details of PATRICIA here, these can be gleaned from numerous books or web pages (e.g. Robert Sedgewick’s “Algorighms in C++” or Wikipedia). However, to demonstrate how PATRICIA works, particularly when used for IP route lookups, I programmed this little simulator (to download save target as…). The PATRICIA tree simulation shows how the search tree is built up and how searches iterate through the tree nodes.

How to use the PATRICIA search tree simulator

To run the PATRICIA tree simulator, install Java 1.6 and optionally JMF (Java Media Framework) for creating JPEG images and movies of the animation. You know where you get these things.

The left hand side of the GUI contains the control elements:

PATRICIA tree simulator controls

PATRICIA tree simulator controls

 Control elements:

  • Load Route File from Disk…: Select a route file from disk (details see below)
  • Save Route File to Disk…: Save a selected route file to disk for modifications
  • Drop-down box with built-in route files
  • Build Tree Animation: Start animating the build-up of the tree
  • Search Tree Animation: Start animating searching the tree for keys (if the Search Key field is empty, all keys of the built-in route file are searched)
  • Clear Search Key: Clear Search Key, Mask and Info fields
  • Search Key, Mask and Info (target) fields: Show key, mask and info encountered during search
  • Stop Animation: Stop a running animation
  • Animation Speed: Well, speed of the animation
  • Sound!: Emit a beep for each node traversed in the search key animation
  • Show Node Info …: Show node detail information during the search key animation
  • Finally controls for saving the current screen as a JPEG file or saving the entire animation as a movie (uses JMF)

To run the PATRICIA tree animation, first load a route file from disk or select one of the built-in route files. Then press the Build Tree Animation button to start the animation of the build-up of the tree. Change the animation speed to your needs.

The tree is built in the draw panel with horizontal lines indicating the bit position in the key that the PATRICIA algorithm evaluates:

PATRICIA tree view

PATRICIA tree view

The blue balls represent PATRICIA tree nodes, connected to other nodes with left and right “pointers” indicated with white and black lines.

The horizontal lines indicate the bit position for which a node was inserted by the PATRICIA algorithm. In the picture above, nodes have been inserted at positions 30, 29, 28, 26, 24 and 22 (see bit position on the left hand side). The node Western union money transfer at bit position b=32 is the head node (tree root node) th Western union money transfer at is always cre Western union money transfer at ed irrespective of the route file chosen.

The tree and sub-trees can be collapsed and expanded by double-clicking a node. A collapsed sub-tree is indicated with a green ball.

PATRICIA tree node information

When the mouse is hovered over a node in the tree panel, the color of the node changes to orange and node information is displayed as follows:

Detail view of PATRICIA tree node

Detail view of PATRICIA tree node

In the example screenshot above, the selected node is inserted for bit position 28. Len is the distance, i.e. number of bit positions, from the head node (4 in the screenshot above). The key for the node is 0×11000000 while the mask (network mask for IP address) is 0xFF000000. The search information stored in the node is 0×11000000 (info field). The pointers node.l and node.r point to other nodes depending on the PATRICIA algorithm.  Finally, node.p points to the parent node.

How the search works

When the search animation starts, either all keys in the route file or the single key in the key field are searched for. The tree iterates through the nodes of the tree, starting at the head node (bit position b=32).

The search wanders down the tree starting at the head node moving towards the leave nodes. The search algorithm compares the bit of the search key at each encountered node’s bit position with the corresponding bit in the node’s key. If they are both 1, the algorithm navigates to the node referenced by node.r (“right” pointer), otherwise the search continues with the node pointed to by node.l (“left” pointer). The beauty of PATRICIA lies in the fact that only a limited number of nodes need to be traversed, in contrast to fully populated binary trees that have a node at each bit position.

When the search arrives at a leaf node (both node.l and node.r point to either the node itself or to nodes with bit positions higher up in the tree towards the head node), the search key is compared to the leaf node’s key (full match comparison, i.e. all bits are compared for equality). If both keys equal, the search was successful and the found target (info) is displayed right to the node with the node’s appearance changed to inverted orange:

Successful key search

Successful key search

So far, things were pretty standard PATRICIA algorithm. In Internet Protocol routing lookups, the netmask defines the bits that are compared in an IP address (search key) and a route table’s route prefix (node’s key). Even though non-contiguous netmask bits are possible in IP routing (and supported by this PATRICIA simulator), it is customary (and wise) to use contiguous netmasks starting at the left side (high order bits).

When the search hits a leaf node, the bits used for comparison of the search key and the node’s key are defined by the mask. When the comparison shows equality, the search was successful and the info (target) is displayed. When the comparison fails, the search backs up towards the root following the node.p (parent) pointers and performing a full match comparison using the mask in each node.

This behavior shows how a route lookup in IP routing works (IP forwarding is the correct term as routing means the protocols used to exchange IP routes between routers). Even though a specific route entry may match a given IP address using the netmask, there may be a better route where more bits match (a more specific route). The longest prefix match is what a route lookup should yield. The PATRICIA algorithm enhanced with evaluating netmasks is a practical algorithm to achieve this in software.

Structure of the route file

The PATRICIA simulator comes with 5 built-in route files:

RouteFile_large.txt Large route file with 88 route entries and default route
RouteFile_small.txt Small route file with 18 route entries and default route
RouteFile_minimal.txt Very small route file with 8 route entries and default route
RouteFile_shallow.txt Route file with few routes resulting in a small and shallow PATRICIA tree
RouteFile_noDefaultRoute.txt Very small route file with 8 route entries but no default route

The route files used by this PATRICIA simulator contain route entries on each line as follows:

# Key       Mask        Info
0x7f000001, 0xffffffff, 0x7f000001
0x11000000, 0xff000000, 0x11000000

Key, mask and info are separated by commas. Lines that start with a hash sign (#) are comment lines. These lines are ignored by the PATRICIA simulator. The Key entry corresponds to an IP address to be looked up, Mask corresponds to the netmask defining the prefix length and Info is the target of the lookup (in a real IP router this would be the interface over which to forward the IP packet). For the sake of simplicity, Info equals the key in the built-in route files.

A final word on default routes. A default route is identified with key=0×00000000 (IP address does not matter) and mask=0×00000000 (no bits are compared thus every IP address looked up matches). When such a default route is present in the route file, it is added to the head node. When an inexistent key (IP address) is input to the search algorithm, there will be no match in any of the tree’s nodes, so after having reached a leaf node, the search will back up towards the root of the tree (head) where there will be a match due to the default route (both key and the node’s key masked with 0×00000000 and compared to equality will match because all bits are masked out). So the default route is a route of last resort (matches if no other route matches). It is the last route to use because it has the shortest prefix length of all routes (zero bits).

This behavior can be demonstrated by loading e.g. RouteFile_minimal.txt. Search for a key (IP address) that none of the routes in the loaded route file matches (simply enter the key=IP address in the control panel’s Search Key field in hexadecimal format such as 0xCAFE1234 and press Search Key Animation). The search will only yield a match in the head node (bit index b=32) where the default route resides. Now load RouteFile_noDefaultRoute.txt which does not contain a default route. Again search for a non-matching key. The search will fail because there is no longest-prefix match in any of the tree nodes and no default route that would normally match in the head node. The search failure is shown as follows:

Failed search for a key (IP address)

Failed search for a key (IP address)

The search failure is indicated with “Not found!” in the Info (target) field in the control panel. Additionally, the information that is displayed for the head node shows info=infoNUL.

Wrap up

PATRICIA tree simulator is a simple Java-based GUI tool for demonstrating the PATRICIA algorithm for IP route lookups. The algorithm is enhanced with functions to evaluate the netmask as well (longest-prefix match). The key, mask and info length are all 32 bits in length as is the case for IPv4 addresses. With some minor modifications, the simulator would work with 128 bit keys as well thus mimicking IPv6 addresses.

Veröffentlicht unter Uncategorized | Verschlagwortet mit , , | Hinterlasse einen Kommentar