# P4 SIGCOMM 2015 Tutorial

The original webpage for the tutorial can be found
[here](http://conferences.sigcomm.org/sigcomm/2015/tutorial-p4.php).

## Introduction

This repository includes the 2 exercises we presented at SIGCOMM: *Source
Routing* and *Flowlet Switching*. Both exercises assume that you have basic
networking knowledge and some familiarity with the P4 language. Please take a
look at the [P4 language spec](http://p4.org/spec/) and at the example
`simple_router` target
[on p4lang](https://github.com/p4lang/p4factory/tree/master/targets/simple_router/p4src).
*Source Routing* asks you to write a P4 program from scratch to implement a
custom source routing protocol. *Flowlet Switching* is more difficult: you will
start from a simple P4 routing program (with ECMP) and implement a version of
flowlet switching, which yields better load balancing for long-lived TCP flows.

For both exercises, you will find a `.tar.gz` archive containing the solution
files.

## Obtaining required software

To complete the exercises, you will need to clone 2 p4lang GitHub repositories
and install their dependencies. To clone the repositories:

- `git clone https://github.com/p4lang/behavioral-model.git bmv2`
- `git clone https://github.com/p4lang/p4c-bm.git p4c-bmv2`

The first repository ([bmv2](https://github.com/p4lang/behavioral-model)) is the
second version of the behavioral model. It is a C++ software switch that
behaves according to your P4 program. The second repository
([p4c-bmv2](https://github.com/p4lang/p4c-bm)) is the compiler for the
behavioral model: it takes a P4 program and outputs a JSON file which can be
loaded by the behavioral model.

Each of these repositories comes with dependencies.
`p4c-bmv2` is a Python
repository, and its Python dependencies are easy to install with `pip`:
`sudo pip install -r requirements.txt`.

`bmv2` is a C++ repository and has more external dependencies. They are listed
in its
[README](https://github.com/p4lang/behavioral-model/blob/master/README.md). If
you are running Ubuntu 14.04+, the dependencies should be easy to install (you
can use the `install_deps.sh` script that comes with `bmv2`). Do not forget to
build the code once all the dependencies have been installed:

- `./autogen.sh`
- `./configure`
- `make`

## Before starting the exercises

You need to tell us where you cloned the `bmv2` and `p4c-bm` repositories :).
Please update the values of the shell variables `BMV2_PATH` and `P4C_BM_PATH` in
the `env.sh` file located in this directory. Note that if you cloned both
repositories into the same directory as this repository (`tutorials`), you will
not need to change the values of these variables.

That's all :)

## Exercise 1: Source Routing

Place yourself in the `source_routing` directory.

In this problem, we will implement a very simple source routing protocol in
P4. We will call this protocol EasyRoute. You will be designing the P4 program
from scratch, although you are of course welcome to reuse code from other
targets in p4lang. To test your implementation, you will create a Mininet
network and send messages between hosts. We provide a skeleton program,
[source_routing/p4src/source_routing.p4](source_routing/p4src/source_routing.p4);
you need to implement the parser and the ingress control flow.

### Description of the EasyRoute protocol

EasyRoute packets look like this:

```
preamble (8 bytes) | num_valid (4 bytes) | port_1 (1 byte) | port_2 (1 byte) |
... | port_n (1 byte) | payload
```

The preamble is always set to 0. You can use it to distinguish EasyRoute
packets from other packets (Ethernet frames) your switch may receive.
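The layout above can be modelled in a few lines of Python, which may help when checking your parser against the hex examples further down in this section. This is a hedged sketch, not part of the tutorial's code: the helper names `build_packet`, `is_easyroute` and `switch_process` are hypothetical, and it assumes every header field is big-endian (network byte order), which is consistent with those hex examples.

```python
import struct

# Hypothetical helpers; header layout taken from the diagram above.
HDR_FMT = "!QI"                     # preamble (8 bytes) + num_valid (4 bytes), big-endian
HDR_LEN = struct.calcsize(HDR_FMT)  # 12 bytes
PREAMBLE = 0                        # the EasyRoute preamble is always 0


def build_packet(ports, payload):
    """What a sender such as h1 would emit: header, one byte per port, then the payload."""
    return struct.pack(HDR_FMT, PREAMBLE, len(ports)) + bytes(ports) + payload


def is_easyroute(pkt):
    """Similar in spirit to current(0, 64): inspect the first 64 bits without consuming them."""
    return len(pkt) >= HDR_LEN and struct.unpack_from("!Q", pkt)[0] == PREAMBLE


def switch_process(pkt):
    """Return (egress_port, rewritten_packet), or None if the packet must be dropped."""
    if not is_easyroute(pkt):
        return None                 # requirement 1: drop non-EasyRoute packets
    _, num_valid = struct.unpack_from(HDR_FMT, pkt)
    if num_valid == 0 or len(pkt) <= HDR_LEN:
        return None                 # requirement 2: drop when num_valid is 0
    egress_port = pkt[HDR_LEN]      # first port of the list
    rest = pkt[HDR_LEN + 1:]        # remaining ports followed by the payload
    # Keep the header, decrement num_valid and pop the first port.
    return egress_port, struct.pack(HDR_FMT, PREAMBLE, num_valid - 1) + rest


if __name__ == "__main__":
    pkt = build_packet([3, 1], b"Hello")    # h1 -> sw1 (port 3) -> sw3 (port 1) -> h3
    while (result := switch_process(pkt)) is not None:
        port, pkt = result
        print(f"out port {port}: {pkt.hex()}")
```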
We do not
guarantee that your P4 switch will exclusively receive EasyRoute packets.

The num_valid field indicates the number of valid ports in the header. If your
EasyRoute packet is to traverse 3 switches, num_valid will initially be set to
3, and the port list will be 3 bytes long. When a switch receives an EasyRoute
packet, the first port of the list is used to determine the outgoing port for
the packet. num_valid is then decremented by 1 and the first port is removed
from the list.

We will use the EasyRoute protocol to send text messages, so the payload is the
text message being sent. You do not have to worry about the encoding of the
text message.

![Source Routing topology](resources/images/source_routing_topology.png)

If I wish to send the message "Hello" from h1 to h3, the EasyRoute packet will
look like this (header fields shown in hexadecimal):

- when it leaves h1:
`00000000 00000000 | 00000002 | 03 | 01 | Hello`

- when it leaves sw1:
`00000000 00000000 | 00000001 | 01 | Hello`

- when it leaves sw3:
`00000000 00000000 | 00000000 | Hello`

Note that the last switch should not remove the EasyRoute header; otherwise the
application running on the end hosts won't be able to handle incoming packets
properly.

Your P4 implementation needs to adhere to the following requirements:

1. **all non-EasyRoute packets should be dropped**
2. **if a switch receives an EasyRoute packet for which num_valid is 0, the
packet should be dropped**

### A few hints

1. In the start parse state, you can use `current()` to check whether the packet
is an EasyRoute packet. A call to `current(0, 64)` will examine the first 64
bits of the packet, **without shifting the packet pointer**.
2. Do not forget that a table can match on the validity of a header.
Furthermore, if a header is not valid, our software switch will set all its
fields to 0.
3. A table can "match" on an empty key, which means the default action will
always be executed (provided the runtime has configured it correctly). Just
omit the "reads" attribute to achieve this.
4. You can remove a header with a call to `remove_header()`.
5. When parsing the EasyRoute header, you do not have to parse the whole port
list. P4 is currently missing the language constructs needed to parse a
general Type-Length-Value style header<sup>[1](#myfootnote1)</sup>, so you will
simply need to extract the first port of the list and ignore the rest
(including the payload). Also, the preamble, num_valid and the port number do
not all have to be placed in the same header type.
6. Finally, we advise you to put all your logic in the ingress control flow and
leave the egress empty. You will not need more than 1 or 2 tables to implement
EasyRoute.

<a name="myfootnote1">1</a>: Members of [P4.org](http://p4.org) are working
together to come up with the language constructs needed to parse TLV-style
headers soon.

### Populating the tables

Once your P4 code is ready (you can validate it easily by running `p4-validate`
on it), you need to think about populating the tables. We made it easy for you:
you just have to fill the `commands.txt` file with `bmv2` CLI commands. We think
that you only need to know 2 commands:

- `table_set_default <table_name> <action_name> [action_data]`: this is used to
set the default action of a given table
- `table_add <table_name> <action_name> <match_fields> => [action_data]`: this
is used to add an entry to a table

You can look at example commands in the `flowlet_switching` directory,
[flowlet_switching/commands.txt](flowlet_switching/commands.txt), and match them
with the corresponding P4 tables in
[flowlet_switching/p4src/simple_router.p4](flowlet_switching/p4src/simple_router.p4).

### Testing your code

`./run_demo.sh` will compile your code and create the Mininet network described
above. It will also use `commands.txt` to configure each one of the switches.
Once the network is up and running, you should type the following in the
Mininet CLI:

- `xterm h1`
- `xterm h3`

This will open a terminal for you on h1 and h3.

On h3 run: `./receive.py`.

On h1 run: `./send.py h1 h3`.

You should then be able to type messages on h1 and receive them on h3. The
`send.py` program finds the shortest path between h1 and h3 using Dijkstra's
algorithm, then sends correctly formatted packets to h3 through sw1 and sw3.

### Debugging your code

`.pcap` files will be generated for every interface (9 files: 3 for each of the
3 switches). You can look at the appropriate files and check that your packets
are being processed correctly.

## Exercise 2: Implementing TCP flowlet switching

Place yourself in the `flowlet_switching` directory.

### What is flowlet switching?

Flowlet switching leverages the burstiness of TCP flows to achieve better load
balancing of TCP traffic. In this exercise, you will start from a program
that load-balances based on layer 4 flows: this is generally considered
"classic" ECMP. To do this, we compute a hash over the 5-tuple and use this
value to choose from a set of possible next hops. This means that all packets
belonging to the same flow (i.e. with the same 5-tuple) will be routed to
the same next hop. You need to enhance this P4 code with additional logic to
implement flowlet switching.

We suggest implementing flowlet switching as follows:

1. Compute a crc16 hash over the regular TCP 5-tuple, using the
`modify_field_with_hash_based_offset()` P4 primitive. We already use this
primitive in the ECMP starter code, so take a look. This hash will identify each
TCP flow (note: we do not care about collisions in this case).

2. For each flow, you need to store 2 things: a) the timestamp of the last
observed packet belonging to this flow and b) a flowlet_id. Flowlet switching
is very simple: for each packet belonging to the flow, compare its timestamp
with the stored one. If the time delta between the last observed packet and the
current packet exceeds a certain timeout value (in our case, we suggest using
50ms), the flowlet_id needs to be incremented. Either way, the stored timestamp
is then updated to the current packet's timestamp. Note that in data centers
with mostly short, high-speed links, this timeout value will typically be much
smaller. With flowlet switching, packets belonging to the same TCP burst will
have the same flowlet_id, but packets in 2 different bursts (i.e. separated by a
timeout) will have different flowlet_ids. This also implies that we must
maintain some state for each TCP flow. To maintain state in P4, you will need to
use `register` objects (look them up in the spec). In this case, you will need
two separate registers, one for the timestamps and one for the flowlet_ids,
both indexed by the flow hash computed in step 1. The software switch will
generate a timestamp for each new packet and store it in the metadata field
`intrinsic_metadata.ingress_global_timestamp`. This is a 32-bit value, expressed
in microseconds. You can read it in the ingress pipeline, but don't try to write
to it.

3. Once you have obtained the flowlet_id, you can compute a new hash. This
time, the hash will include the 5-tuple AND the flowlet_id. You will use this
hash exactly like we used our hash in the starter code, as an offset into a
next-hop table. This part of the exercise mostly reuses the starter code. Your
changes to tables `ecmp_group` and `ecmp_nhop` should be minimal.

### Running the starter code

To compile and run the starter code, simply use `./run_demo.sh`. This time we
will not be using Mininet; instead, we will generate simple TCP test packets and
send them individually to the switch to observe how it behaves. `run_demo.sh`
will start the switch and populate the tables using the CLI commands from
[flowlet_switching/commands.txt](flowlet_switching/commands.txt).

When the switch is running, you can send test packets with
`sudo ./run_test.py`.
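For intuition about what the starter code's ECMP stage computes, the following rough Python model (not P4) may help. It follows the P4-14 semantics of `modify_field_with_hash_based_offset(dest, base, field_list_calculation, size)`, which sets `dest` to `base + (hash % size)`. Assumptions, all hypothetical: the 5-tuple is serialized as source IP, destination IP, protocol, source port, destination port in network byte order; the bitwise CRC-16/ARC below stands in for bmv2's `crc16` and may not match it bit-for-bit; offsets 0 and 1 map to ports 1 and 2; the source address `10.0.0.2` and the TCP ports are made up.

```python
import socket
import struct


def crc16_arc(data: bytes) -> int:
    """Bitwise CRC-16/ARC (poly 0x8005 reflected, init 0); a stand-in for bmv2's crc16."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def five_tuple(src_ip, dst_ip, proto, sport, dport):
    """Serialize the 5-tuple in network byte order (assumed field order)."""
    return struct.pack("!4s4sBHH", socket.inet_aton(src_ip),
                       socket.inet_aton(dst_ip), proto, sport, dport)


def hash_based_offset(base, data, size):
    """Model of modify_field_with_hash_based_offset(): base + (hash % size)."""
    return base + crc16_arc(data) % size


def ecmp_port(src_ip, dst_ip, proto, sport, dport, num_paths=2):
    """Classic ECMP: the offset depends only on the 5-tuple (offset 0 -> port 1, 1 -> port 2)."""
    offset = hash_based_offset(0, five_tuple(src_ip, dst_ip, proto, sport, dport), num_paths)
    return offset + 1


if __name__ == "__main__":
    # Identical packets (same 5-tuple) always land on the same port...
    print({ecmp_port("10.0.0.2", "10.0.0.1", 6, 1234, 80) for _ in range(100)})
    # ...while different flows (here: different TCP source ports) spread across both ports.
    print({ecmp_port("10.0.0.2", "10.0.0.1", 6, sp, 80) for sp in range(1000, 1100)})
```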
Note that `run_test.py` will take a few seconds to complete. The
test sends a few hundred identical TCP packets through the switch, in bursts,
on port 3. If you take a look at `commands.txt`, you will see that each TCP
packet can go out of either port 1 or port 2, depending on the result of the
hash computation. The script prints the list of outgoing ports. Since all
packets are identical and we are using "regular" ECMP, all the packets should
come out of the same port, so you will see either only "1"s or only "2"s when
you run the test. If you were to alter the test script so that the packets no
longer share the same 5-tuple (for example, by varying the TCP source port), the
output should spread across port 1 and port 2.

Note that the test script (and `commands.txt`) assumes the following topology:

```
            --------------------------------- nhop-0 10.0.1.1
            |                                 00:04:00:00:00:00
            1 - 00:aa:bb:00:00:00
            |
-------- 3--sw
            |
            2 - 00:aa:bb:00:00:01
            |
            --------------------------------- nhop-1 10.0.2.1
                                              00:04:00:00:00:01
```

Both `nhop-0` and `nhop-1` have a path to `10.0.0.1`, which is the final
destination of our test packet.

### What you need to do

1. Update the provided [P4 program](flowlet_switching/p4src/simple_router.p4)
to perform TCP flowlet switching. In our case, it requires adding 2 tables to
the ingress pipeline. Remember that you can omit the 'reads' attribute for a
table. In this case, provided you configure the default action of the table
correctly, the default action will always be performed.

2. Update [commands.txt](flowlet_switching/commands.txt) to configure your new
tables.

3. Run the above test again. Observe how the list of ports alternates between 1
and 2. You will need to edit the test script if you chose not to use a 50ms
(50,000 microseconds!) timeout for the flowlet_id.
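The behaviour expected in step 3 can be reasoned about with a plain Python model of the three suggested flowlet switching steps. This is a sketch, not the tutorial's solution: Python lists stand in for the two P4 registers, CRC-16/ARC stands in for bmv2's `crc16`, and the register size, 16-bit flowlet_id, 5-tuple serialization, source address and offset-to-port mapping are all hypothetical choices. Timestamps are in microseconds, like `intrinsic_metadata.ingress_global_timestamp`, but 32-bit wraparound is ignored.

```python
import socket
import struct

TIMEOUT_US = 50_000   # suggested flowlet timeout: 50 ms, in microseconds
REG_SIZE = 8192       # hypothetical number of register slots (flow hash range)
NUM_PATHS = 2         # hypothetical mapping: offset 0 -> port 1, offset 1 -> port 2


def crc16_arc(data: bytes) -> int:
    """Bitwise CRC-16/ARC, used here as a stand-in for bmv2's crc16."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


class FlowletSwitch:
    """Model of the suggested design; the two lists play the role of the two registers."""

    def __init__(self):
        self.last_seen = [0] * REG_SIZE    # register 1: timestamp of the flow's last packet
        self.flowlet_id = [0] * REG_SIZE   # register 2: current flowlet_id of the flow

    def egress_port(self, five_tuple: bytes, now_us: int) -> int:
        # Step 1: hash the 5-tuple to find the flow's register slot.
        idx = crc16_arc(five_tuple) % REG_SIZE
        # Step 2: a gap longer than the timeout starts a new flowlet; then record the time.
        # (32-bit timestamp wraparound is ignored in this sketch.)
        if now_us - self.last_seen[idx] > TIMEOUT_US:
            self.flowlet_id[idx] = (self.flowlet_id[idx] + 1) & 0xFFFF
        self.last_seen[idx] = now_us
        # Step 3: hash the 5-tuple plus the flowlet_id to pick the next hop.
        key = five_tuple + struct.pack("!H", self.flowlet_id[idx])
        return crc16_arc(key) % NUM_PATHS + 1


if __name__ == "__main__":
    # Hypothetical TCP flow towards 10.0.0.1 (source address and ports are made up).
    pkt = struct.pack("!4s4sBHH", socket.inet_aton("10.0.0.2"),
                      socket.inet_aton("10.0.0.1"), 6, 1234, 80)
    sw = FlowletSwitch()
    now = 0
    for burst in range(5):
        # 20 packets, 100 us apart, then a 100 ms pause before the next burst.
        ports = [sw.egress_port(pkt, now + i * 100) for i in range(20)]
        print(f"burst {burst}: {ports}")
        now += 100_000
```

Within each burst the port stays the same, while a pause longer than the timeout bumps the flowlet_id and can move the flow to the other port. Because the choice is still hash-based, consecutive bursts are not guaranteed to alternate strictly.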