P4 SIGCOMM 2015 Tutorial
The original webpage for the tutorial can be found here
Introduction
This repository includes 2 exercises we presented at SIGCOMM: Source Routing
and Flowlet Switching. Both exercises assume that you possess basic networking
knowledge and some familiarity with the P4 language. Please take a look at the
P4 language spec and at the example simple_router target on p4lang.
Source Routing asks you to write a P4 program from scratch to implement a
custom source routing protocol. Flowlet Switching is more difficult: you will
start from a simple P4 routing program (with ECMP) and implement a version of
flowlet switching, which yields better load balancing for bursty TCP flows.
For both exercises, you will find a .tar.gz archive which contains the solution files.
Obtaining required software
To complete the exercises, you will need to clone 2 p4lang Github repositories and install their dependencies. To clone the repositories:
git clone https://github.com/p4lang/behavioral-model.git bmv2
git clone https://github.com/p4lang/p4c-bm.git p4c-bmv2
The first repository (bmv2) is the second version of the behavioral model. It is a C++ software switch that will behave according to your P4 program. The second repository (p4c-bmv2) is the compiler for the behavioral model: it takes a P4 program and outputs a JSON file which can be loaded by the behavioral model.
Each of these repositories comes with dependencies. p4c-bmv2 is a Python repository, and installing the required Python dependencies is very easy to do using pip: sudo pip install -r requirements.txt.
bmv2 is a C++ repository and has more external dependencies. They are listed in the README. If you are running Ubuntu 14.04+, the dependencies should be easy to install (you can use the install_deps.sh script that comes with bmv2). Do not forget to build the code once all the dependencies have been installed:
./autogen.sh
./configure
make
You will also need to install mininet, as well as the following Python packages: scapy, thrift (>= 0.9.2) and networkx. On Ubuntu, it would look like this:
sudo apt-get install mininet
sudo pip install scapy thrift networkx
Before starting the exercises
You need to tell us where you cloned the bmv2 and p4c-bm repositories :). Please update the values of the shell variables BMV2_PATH and P4C_BM_PATH in the env.sh file, located in the root directory of this repository. Note that if you cloned both repositories in the same directory as this one (tutorials), you will not need to change the value of the variables.
That's all :)
Exercise 1: Source Routing
Place yourself in the source_routing directory.
In this problem, we will implement a very simple source routing protocol in P4. We will call this protocol EasyRoute. You will be designing the P4 program from scratch, although you are of course welcome to reuse code from other targets in p4lang. To test your implementation, you will create a Mininet network and send messages between hosts. We provide a skeleton program, source_routing/p4src/source_routing.p4; you need to implement the parser and the ingress control flow.
Description of the EasyRoute protocol
An EasyRoute packet looks like this:
preamble (8 bytes) | num_valid (4 bytes) | port_1 (1 byte) | port_2 (1 byte) |
... | port_n (1 byte) | payload
The preamble is always set to 0. You can use this to distinguish the EasyRoute packets from other packets (Ethernet frames) your switch may receive. We do not guarantee that your P4 switch will exclusively receive EasyRoute packets.
The num_valid field indicates the number of valid ports in the header. If your EasyRoute packet is to traverse 3 switches, num_valid will initially be set to 3, and the port list will be 3 bytes long. When a switch receives an EasyRoute packet, the first port of the list is used to determine the outgoing port for the packet. num_valid is then decremented by 1 and the first port is removed from the list.
We will use the EasyRoute protocol to send text messages. The payload will therefore correspond to the text message we are sending. You do not have to worry about the encoding of the text message.
If I wish to send message "Hello" from h1 to h3, the EasyRoute packet will look like this:
- when it leaves h1:
  00000000 00000000 | 00000002 | 03 | 01 | Hello
- when it leaves sw1:
  00000000 00000000 | 00000001 | 01 | Hello
- when it leaves sw3:
  00000000 00000000 | 00000000 | Hello
Note that the last switch should not remove the EasyRoute header; otherwise the application running in the end hosts won’t be able to handle incoming packets properly.
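The layout above can be sketched in a few lines of Python. This is only an illustration and not the send.py shipped with the exercise: the helper names build_easyroute and parse_easyroute are hypothetical, and big-endian (network) byte order is assumed because the example shows num_valid as 00000002.

```python
import struct

HEADER_LEN = 12  # 8-byte preamble + 4-byte num_valid


def build_easyroute(ports, message):
    """Serialize preamble | num_valid | port list | payload (hypothetical helper)."""
    # "!QI" = big-endian 8-byte preamble (always 0) followed by 4-byte num_valid.
    header = struct.pack("!QI", 0, len(ports))
    return header + bytes(ports) + message.encode("utf-8")


def parse_easyroute(data):
    """Return (ports, payload); raise ValueError for non-EasyRoute data."""
    if len(data) < HEADER_LEN:
        raise ValueError("too short for an EasyRoute header")
    preamble, num_valid = struct.unpack_from("!QI", data, 0)
    if preamble != 0:
        raise ValueError("preamble is not 0, not an EasyRoute packet")
    ports = list(data[HEADER_LEN:HEADER_LEN + num_valid])
    payload = data[HEADER_LEN + num_valid:]
    return ports, payload


# The "Hello" packet as it leaves h1 in the example above.
packet = build_easyroute([3, 1], "Hello")
print(packet.hex())
```

Unlike the P4 parser, this Python sketch can read the whole port list because it knows num_valid before slicing.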
Your P4 implementation needs to adhere to the following requirements:
- all non-EasyRoute packets should be dropped
- if a switch receives an EasyRoute packet for which num_valid is 0, the packet should be dropped
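As a reference model for what each switch should do, the behavior and the two drop rules can be sketched in Python. This is a host-side simulation under the same byte-order assumption as the packet diagram (big-endian), not P4 code; the function name easyroute_hop is hypothetical.

```python
import struct

HEADER_LEN = 12  # 8-byte preamble + 4-byte num_valid


def easyroute_hop(packet):
    """Model one switch: return (egress_port, rewritten_packet) or None if dropped."""
    if len(packet) < HEADER_LEN:
        return None  # too short to be EasyRoute: drop
    preamble, num_valid = struct.unpack_from("!QI", packet, 0)
    if preamble != 0:
        return None  # requirement 1: non-EasyRoute packets are dropped
    if num_valid == 0:
        return None  # requirement 2: no valid port left, drop
    if len(packet) < HEADER_LEN + 1:
        return None  # malformed: num_valid > 0 but no port byte present
    egress_port = packet[HEADER_LEN]  # first port of the list
    # Keep the header, decrement num_valid, and remove the first port.
    new_header = struct.pack("!QI", 0, num_valid - 1)
    return egress_port, new_header + packet[HEADER_LEN + 1:]


# Replay the "Hello" example: h1 -> sw1 -> sw3 -> h3.
pkt = bytes.fromhex("0000000000000000" "00000002" "0301") + b"Hello"
port_a, pkt_after_sw1 = easyroute_hop(pkt)
port_b, pkt_after_sw3 = easyroute_hop(pkt_after_sw1)
print(port_a, pkt_after_sw1.hex())
print(port_b, pkt_after_sw3.hex())
```

Comparing the bytes in your .pcap files against a model like this can help spot an off-by-one in num_valid or a port that was not removed.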
A few hints
- in the start parse state, you can use current() to check if the packet is an EasyRoute packet. A call to current(0, 64) will examine the first 64 bits of the packet, without shifting the packet pointer.
- do not forget that a table can match on the validity of a header. Furthermore if a header is not valid, our software switch will set all its fields to 0.
- a table can "match" on an empty key, which means the default action will always be executed - if configured correctly by the runtime. Just omit the "reads" attribute to achieve this.
- you can remove a header with a call to remove_header()
- when parsing the EasyRoute header, you do not have to parse the whole port list. Actually P4 is currently missing language constructs needed to parse a general Type-Length-Value style header [1], and hence you’ll need to simply extract the first port of the list and ignore the rest (including the payload). Also preamble, num_valid and the port number don't have to all be placed in the same header type.
- finally, we advise you to put all your logic in the ingress control flow and leave the egress empty. You will not need more than 1 or 2 tables to implement EasyRoute.
1: Members of P4.org are working together to come up with language constructs needed to be able to parse TLV-style headers soon.
Populating the tables
Once your P4 code is ready (you can validate it easily by running p4-validate on it), you need to think about populating the tables. We made it easy for you: you just have to fill the commands.txt file with bmv2 CLI commands. We think that you only need to know 2 commands:
- table_set_default <table_name> <action_name> [action_data]: this is used to set the default action of a given table
- table_add <table_name> <action_name> <match_fields> => [action_data]: this is used to add an entry to a table
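If you prefer to generate commands.txt rather than type it by hand, the two command shapes above can be produced with a small Python helper. This is only a sketch: the table and action names (my_table, my_drop, my_forward) and the values 1 and 2 are placeholders, not names from the skeleton program, so substitute the ones from your own P4 code.

```python
def table_set_default(table, action, action_data=()):
    """Format: table_set_default <table_name> <action_name> [action_data]"""
    return " ".join(["table_set_default", table, action, *map(str, action_data)])


def table_add(table, action, match_fields, action_data=()):
    """Format: table_add <table_name> <action_name> <match_fields> => [action_data]"""
    parts = ["table_add", table, action, *map(str, match_fields), "=>"]
    parts += [str(value) for value in action_data]
    return " ".join(parts)


# Placeholder names: replace with the tables and actions of your own program.
commands = [
    table_set_default("my_table", "my_drop"),
    table_add("my_table", "my_forward", [1], [2]),
]
print("\n".join(commands))
```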
You can look at example commands in the flowlet_switching directory (flowlet_switching/commands.txt) and match them with the corresponding P4 tables in flowlet_switching/p4src/simple_router.p4.
Testing your code
./run_demo.sh will compile your code and create the Mininet network described above. It will also use commands.txt to configure each one of the switches. Once the network is up and running, you should type the following in the Mininet CLI:
xterm h1
xterm h3
This will open a terminal for you on h1 and h3.
On h3 run: ./receive.py.
On h1 run: ./send.py h1 h3.
You should then be able to type messages on h1 and receive them on h3. The send.py program finds the shortest path between h1 and h3 using Dijkstra, then sends correctly-formatted packets to h3 through s1 and s3.
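To make the path computation concrete, here is a standard-library sketch of Dijkstra over a small topology, followed by the conversion of the path into an EasyRoute port list. It is not the code of send.py. The topology and port numbers in PORTS are assumptions chosen so that the result matches the "Hello" example (port 3 on the first switch, port 1 on the last); check them against the actual Mininet setup.

```python
import heapq

# Assumed topology: PORTS[node][neighbor] = egress port on node toward neighbor.
# Host-side port numbers are unused and set to 0.
PORTS = {
    "h1": {"s1": 0}, "h2": {"s2": 0}, "h3": {"s3": 0},
    "s1": {"h1": 1, "s2": 2, "s3": 3},
    "s2": {"h2": 1, "s1": 2, "s3": 3},
    "s3": {"h3": 1, "s1": 2, "s2": 3},
}


def shortest_path(ports, src, dst):
    """Dijkstra with unit link cost; returns the node list or None if unreachable."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist[node]:
            continue  # stale heap entry
        for neighbor in ports[node]:
            nd = d + 1
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(heap, (nd, neighbor))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def port_list(ports, path):
    """One egress port per switch on the path (hosts at both ends are skipped)."""
    return [ports[path[i]][path[i + 1]] for i in range(1, len(path) - 1)]


path = shortest_path(PORTS, "h1", "h3")
print(path, port_list(PORTS, path))
```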
Debugging your code
.pcap files will be generated for every interface (9 files: 3 for each of the 3 switches). You can look at the appropriate files and check that your packets are being processed correctly.
Exercise 2: Implementing TCP flowlet switching
Place yourself in the flowlet_switching directory and run sudo ./veth_setup.sh.
What is flowlet switching?
Flowlet switching leverages the burstiness of TCP flows to achieve better load balancing of TCP traffic. In this exercise, you will start from a program that load-balances based on layer 4 flows: this is generally considered "classic" ECMP. To do this, we compute a hash over the 5-tuple and use this value to choose from a set of possible next hops. This means that all packets belonging to the same flow (i.e. with the same 5-tuple) will be routed to the same nexthop. You need to enhance this P4 code with additional logic to implement flowlet switching.
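The "classic" ECMP behavior described above can be modeled in Python as follows. This is a sketch, not the starter P4 code: binascii.crc_hqx (CRC-CCITT) is used as a stand-in 16-bit CRC and may not match the polynomial used by the software switch, and the field packing order and example addresses and ports are assumptions.

```python
import binascii
import socket
import struct


def five_tuple_bytes(src_ip, dst_ip, proto, src_port, dst_port):
    """Pack the 5-tuple into bytes (the field order here is an assumption)."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BHH", proto, src_port, dst_port))


def ecmp_select(flow, next_hops):
    """Pick a next hop from the hash of the 5-tuple, as in per-flow ECMP."""
    flow_hash = binascii.crc_hqx(five_tuple_bytes(*flow), 0)
    return next_hops[flow_hash % len(next_hops)]


next_hops = ["10.0.1.1", "10.0.2.1"]
flow = ("10.0.0.10", "10.0.0.1", 6, 1234, 80)  # example values, TCP = protocol 6
# Every packet of the same flow hashes to the same next hop.
choices = {ecmp_select(flow, next_hops) for _ in range(100)}
print(choices)
```

Because the hash input never changes for a given flow, a single long-lived flow stays on one path however bursty it is, which is the limitation flowlet switching addresses.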
We suggest implementing flowlet switching as follows:
- Compute a crc16 hash over the regular TCP 5-tuple, using the modify_field_with_hash_based_offset() P4 primitive. We already use this primitive in the ECMP starter code, so take a look. This hash will identify each TCP flow (note: we do not care about collisions in this case).
- For each flow, you need to store 2 things: a) a timestamp for the last observed packet belonging to this flow and b) a flowlet_id. Flowlet switching is very simple: for each packet which belongs to the flow, you need to update the timestamp. Then, if the time delta between the last observed packet and the current packet exceeds a certain timeout value (in our case, we suggest using 50ms), then the flowlet_id needs to be incremented. Note that in data centers with mostly short, high-speed links, this timeout value will typically be much smaller. With flowlet switching, packets belonging to the same TCP burst will have the same flowlet_id, but packets in 2 different bursts (i.e. separated by a timeout) will have a different flowlet_id. This also implies that we must maintain some state for each TCP flow. To maintain state in P4, you will need to use 'register' objects (look them up in the spec). In this case, you will need to use two separate registers for each packet (one for the timestamp and one for the flowlet_id). The software switch will generate a timestamp for each new packet and store it in the metadata field intrinsic_metadata.ingress_global_timestamp. This is a 32-bit value, expressed in microseconds. You can read it in the ingress pipeline, but don't try to write to it.
- Once you have obtained the flowlet_id, you can compute a new hash. This time, the hash will include the 5-tuple AND the flowlet_id. You will use this hash exactly like we used our hash in the starter code, as an offset into a nexthop table. This part of the exercise actually mostly reuses the starter code. Your changes to tables ecmp_group and ecmp_nhop should be minimal.
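The three steps above can be prototyped in Python before writing any P4. This is a behavioral sketch under stated assumptions, not the provided solution: the register size NUM_SLOTS, the 16-bit width of flowlet_id, and the use of binascii.crc_hqx as a stand-in crc16 are all illustrative choices. Since the timestamp is a 32-bit microsecond counter, it wraps roughly every 71.6 minutes, so the sketch computes the delta modulo 2^32.

```python
import binascii

FLOWLET_TIMEOUT_US = 50000   # 50 ms, as suggested above
TS_MASK = 0xFFFFFFFF         # the timestamp is a 32-bit value
NUM_SLOTS = 8192             # assumed register size


class FlowletState:
    """Two register arrays indexed by the flow hash: last timestamp and flowlet_id."""

    def __init__(self):
        self.last_seen = [0] * NUM_SLOTS
        self.flowlet_id = [0] * NUM_SLOTS

    def update(self, flow_key, now_us):
        """Process one packet; return the flowlet_id it belongs to."""
        slot = binascii.crc_hqx(flow_key, 0) % NUM_SLOTS
        # Modular subtraction keeps the delta correct across timestamp wraparound.
        delta = (now_us - self.last_seen[slot]) & TS_MASK
        if delta > FLOWLET_TIMEOUT_US:
            self.flowlet_id[slot] = (self.flowlet_id[slot] + 1) & 0xFFFF
        self.last_seen[slot] = now_us & TS_MASK
        return self.flowlet_id[slot]


def select_port(flow_key, flowlet_id, ports):
    """Second hash: 5-tuple AND flowlet_id, used as an offset into the next hops."""
    flow_hash = binascii.crc_hqx(flow_key + flowlet_id.to_bytes(2, "big"), 0)
    return ports[flow_hash % len(ports)]


state = FlowletState()
key = b"example-5-tuple-bytes"  # placeholder for the packed 5-tuple
# Two bursts separated by an 80 ms gap (timestamps in microseconds).
ids = [state.update(key, t) for t in (1000000, 1010000, 1020000, 1100000, 1110000)]
print(ids, [select_port(key, i, [1, 2]) for i in ids])
```

Note that in this model the very first packet of a flow already increments flowlet_id, because the register starts at 0; this is harmless since only changes in flowlet_id matter.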
Running the starter code
To compile and run the starter code, simply use ./run_demo.sh. This time we will not be using Mininet; instead, we will generate simple TCP test packets and send them individually to the switch to observe how it behaves. run_demo.sh will start the switch and populate the tables using the CLI commands from flowlet_switching/commands.txt.
When the switch is running, you can send test packets with sudo ./run_test.py. Note that this script will take a few seconds to complete. The test sends a few hundred identical TCP packets through the switch, in bursts, on port 3. If you take a look at commands.txt, you will see that each TCP packet can either go out of port 1 or port 2, based on the result of the hash computation. The script prints the list of outgoing ports. Since all packets are identical and we are using "regular" ECMP, all the packets should come out of the same port and you will see either all "1"s or all "2"s when you run the test. If you were to alter the test script (example: modify the TTL value of the input TCP packets), the output should randomly choose between port 1 and port 2.
Note that the test script (and commands.txt) assume the following topology:
             --------------------------------- nhop-0 10.0.1.1
             |                                        00:04:00:00:00:00
          1 - 00:aa:bb:00:00:00
             |
-------- 3--sw
             |
          2 - 00:aa:bb:00:00:01
             |
             --------------------------------- nhop-1 10.0.2.1
                                                      00:04:00:00:00:01
Both nhop-0 and nhop-1 have a path to 10.0.0.1, which is the final destination of our test packet.
What you need to do
- Update the provided P4 program to perform TCP flowlet switching. In our case, it requires adding 2 tables to the ingress pipeline. Remember that you can omit the 'reads' attribute for a table. In this case, provided you configure the default action of the table correctly, the default action will always be performed.
- Update commands.txt to configure your new tables.
- Run the above test again. Observe how the list of ports alternates between 1 and 2. You will need to edit the test script if you chose not to use a 50ms (50,000 microseconds!) timeout for the flowlet_id.