first pass at README
This commit is contained in:
parent
a67330c4f4
commit
2bb0d474ae
290
SIGCOMM_2015/README.md
Normal file
@ -0,0 +1,290 @@
# P4 SIGCOMM 2015 Tutorial

The original webpage for the tutorial can be found
[here](http://conferences.sigcomm.org/sigcomm/2015/tutorial-p4.php).

## Introduction

This repository includes the 2 exercises we presented at SIGCOMM: *Source Routing*
and *Flowlet Switching*. Both exercises assume that you possess basic networking
knowledge and some familiarity with the P4 language. Please take a look at the
[P4 language spec](http://p4.org/spec/) and at the example `simple_router`
target
[on p4lang](https://github.com/p4lang/p4factory/tree/master/targets/simple_router/p4src).
*Source Routing* asks you to write a P4 program from scratch to implement a
custom source routing protocol. *Flowlet Switching* is more difficult: you will
start from a simple P4 routing program (with ECMP) and implement a version of
flowlet switching, which yields better load balancing for long-lived TCP flows.

For both exercises, you will find a .tar.gz archive which contains the solution
files.

## Obtaining required software

To complete the exercises, you will need to clone 2 p4lang GitHub repositories
and install their dependencies. To clone the repositories:

- `git clone https://github.com/p4lang/behavioral-model.git bmv2`
- `git clone https://github.com/p4lang/p4c-bm.git p4c-bmv2`

The first repository ([bmv2](https://github.com/p4lang/behavioral-model)) is the
second version of the behavioral model. It is a C++ software switch that will
behave according to your P4 program. The second repository
([p4c-bmv2](https://github.com/p4lang/p4c-bm)) is the compiler for the
behavioral model: it takes a P4 program and outputs a JSON file which can be
loaded by the behavioral model.

Each of these repositories comes with dependencies. `p4c-bmv2` is a Python
repository and installing the required Python dependencies is very easy to do
using `pip`: `sudo pip install -r requirements.txt`.

`bmv2` is a C++ repository and has more external dependencies. They are listed
in the
[README](https://github.com/p4lang/behavioral-model/blob/master/README.md). If
you are running Ubuntu 14.04+, the dependencies should be easy to install (you
can use the `install_deps.sh` script that comes with `bmv2`). Do not forget to
build the code once all the dependencies have been installed:

- `./autogen.sh`
- `./configure`
- `make`

## Before starting the exercises

You need to tell us where you cloned the `bmv2` and `p4c-bmv2` repositories
:). Please update the values of the shell variables `BMV2_PATH` and
`P4C_BM_PATH` in the `env.sh` file, located in this directory. Note that if you
cloned both repositories in the same directory as this one (`tutorials`), you
will not need to change the value of the variables.

That's all :)

## Exercise 1: Source Routing

Place yourself in the `source_routing` directory.

In this problem, we will implement a very simple source routing protocol in
P4. We will call this protocol EasyRoute. You will be designing the P4 program
from scratch, although you are of course welcome to reuse code from other
targets in p4lang. To test your implementation, you will create a Mininet
network and send messages between hosts. We provide a skeleton program,
[source_routing/p4src/source_routing.p4](source_routing/p4src/source_routing.p4);
you need to implement the parser and the ingress control flow.

### Description of the EasyRoute protocol

An EasyRoute packet looks like this:

```
preamble (8 bytes) | num_valid (4 bytes) | port_1 (1 byte) | port_2 (1 byte) |
... | port_n (1 byte) | payload
```

The preamble is always set to 0. You can use this to distinguish the EasyRoute
packets from other packets (Ethernet frames) your switch may receive. We do not
guarantee that your P4 switch will exclusively receive EasyRoute packets.

The num_valid field indicates the number of valid ports in the header. If your
EasyRoute packet is to traverse 3 switches, num_valid will initially be set to
3, and the port list will be 3 bytes long. When a switch receives an EasyRoute
packet, the first port of the list is used to determine the outgoing port for
the packet. num_valid is then decremented by 1 and the first port is removed
from the list.

We will use the EasyRoute protocol to send text messages. The payload will
therefore correspond to the text message we are sending. You do not have to
worry about the encoding of the text message.

![Source routing topology](resources/images/source_routing_topology.png)

If we wish to send the message "Hello" from h1 to h3, the EasyRoute packet will
look like this:

- when it leaves h1:
`00000000 00000000 | 00000002 | 03 | 01 | Hello`

- when it leaves sw1:
`00000000 00000000 | 00000001 | 01 | Hello`

- when it leaves sw3:
`00000000 00000000 | 00000000 | Hello`

Note that the last switch should not remove the EasyRoute header; otherwise the
application running in the end hosts won't be able to handle incoming packets
properly.

Your P4 implementation needs to adhere to the following requirements:

1. **all non-EasyRoute packets should be dropped**
2. **if a switch receives an EasyRoute packet for which num_valid is 0, the
packet should be dropped**

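
The expected per-hop behavior can be modeled in a few lines of Python before writing any P4. The sketch below is only a reference model, not part of the provided scripts: the function names `build_packet` and `switch_process` are made up for illustration, and it assumes the fields are laid out big-endian, as the hex examples above suggest.

```python
import struct

# Assumed layout: 8-byte preamble followed by 4-byte num_valid, big-endian.
HEADER_FMT = ">QI"
HEADER_LEN = struct.calcsize(HEADER_FMT)  # 12 bytes


def build_packet(ports, payload):
    """Build an EasyRoute packet (hypothetical helper) from a port list."""
    return struct.pack(HEADER_FMT, 0, len(ports)) + bytes(ports) + payload


def switch_process(packet):
    """Model one switch hop.

    Returns (egress_port, new_packet), or None when the packet is dropped.
    """
    if len(packet) < HEADER_LEN:
        return None
    preamble, num_valid = struct.unpack_from(HEADER_FMT, packet)
    if preamble != 0:
        return None  # requirement 1: not an EasyRoute packet
    if num_valid == 0:
        return None  # requirement 2: no port left to consume
    if len(packet) < HEADER_LEN + 1:
        return None  # malformed: num_valid > 0 but the port list is empty
    egress_port = packet[HEADER_LEN]
    # Pop the first port and decrement num_valid; the header itself stays.
    remainder = packet[HEADER_LEN + 1:]
    new_packet = struct.pack(HEADER_FMT, 0, num_valid - 1) + remainder
    return egress_port, new_packet


# Walk the "Hello" example from h1 to h3 through two switches.
packet = build_packet([3, 1], b"Hello")
for switch_name in ("sw1", "sw3"):
    egress_port, packet = switch_process(packet)
    print(switch_name, "sends on port", egress_port, "->", packet.hex())
```

Feeding the final packet (num_valid equal to 0) to `switch_process` once more returns `None`, which corresponds to the second requirement.
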
### A few hints

1. in the start parse state, you can use `current()` to check if the packet is
an EasyRoute packet. A call to `current(0, 64)` will examine the first 64 bits
of the packet, **without shifting the packet pointer**.
2. do not forget that a table can match on the validity of a header. Furthermore,
if a header is not valid, our software switch will set all its fields to 0.
3. a table can "match" on an empty key, which means the default action will
always be executed, provided it is configured correctly by the runtime. Just
omit the "reads" attribute to achieve this.
4. you can remove a header with a call to `remove_header()`
5. when parsing the EasyRoute header, you do not have to parse the whole port
list. Actually P4 is currently missing language constructs needed to parse a
general Type-Length-Value style header<sup>[1](#myfootnote1)</sup>, and hence
you'll need to simply extract the first port of the list and ignore the rest
(including the payload). Also, preamble, num_valid and the port number don't
have to all be placed in the same header type.
6. finally, we advise you to put all your logic in the ingress control flow and
leave the egress empty. You will not need more than 1 or 2 tables to implement
EasyRoute.

<a name="myfootnote1">1</a>: Members of [P4.org](http://p4.org) are working
together to come up with language constructs needed to be able to parse
TLV-style headers soon.

### Populating the tables

Once your P4 code is ready (you can validate it easily by running `p4-validate`
on it), you need to think about populating the tables. We made it easy for you:
you just have to fill the `commands.txt` file with `bmv2` CLI commands. We think
that you only need to know 2 commands:

- `table_set_default <table_name> <action_name> [action_data]`: this is used to
set the default action of a given table
- `table_add <table_name> <action_name> <match_fields> => [action_data]`: this
is used to add an entry to a table

You can look at example commands in the `flowlet_switching` directory:
[flowlet_switching/commands.txt](flowlet_switching/commands.txt) and match them
with the corresponding P4 tables in
[flowlet_switching/p4src/simple_router.p4](flowlet_switching/p4src/simple_router.p4).

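
To make the two command shapes concrete, here is a small Python sketch that formats lines following the syntax quoted above. The helper functions are not part of `bmv2`, and the table, action and field values (`example_table`, `example_action`, `example_drop`) are placeholders; substitute the names from your own P4 program.

```python
def table_set_default(table_name, action_name, action_data=()):
    """Format a table_set_default line (hypothetical helper)."""
    parts = ["table_set_default", table_name, action_name]
    parts += [str(value) for value in action_data]
    return " ".join(parts)


def table_add(table_name, action_name, match_fields, action_data=()):
    """Format a table_add line (hypothetical helper)."""
    parts = ["table_add", table_name, action_name]
    parts += [str(value) for value in match_fields]
    parts.append("=>")
    parts += [str(value) for value in action_data]
    return " ".join(parts)


# Placeholder names only; they do not come from the exercise files.
lines = [
    table_set_default("example_table", "example_drop"),
    table_add("example_table", "example_action", ["10.0.0.1"], [2]),
]
print("\n".join(lines))
```

The printed lines have the same shape as the entries expected in `commands.txt`, one command per line.
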
### Testing your code

`./run_demo.sh` will compile your code and create the Mininet network described
above. It will also use `commands.txt` to configure each one of the switches.
Once the network is up and running, you should type the following in the Mininet
CLI:

- `xterm h1`
- `xterm h3`

This will open a terminal for you on h1 and h3.

On h3 run: `./receive.py`.

On h1 run: `./send.py h1 h3`.

You should then be able to type messages on h1 and receive them on h3. The
`send.py` program finds the shortest path between h1 and h3 using Dijkstra, then
sends correctly-formatted packets to h3 through s1 and s3.

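
As a rough illustration of how a shortest path turns into an EasyRoute port list, here is a minimal Dijkstra sketch. It is not the code of `send.py`. The topology dictionary is an assumption chosen to be consistent with the "Hello" example (port 3 on s1 leads to s3, port 1 on s3 leads to h3); the real port numbering comes from the Mininet script.

```python
import heapq

# Assumed topology: node -> {egress_port: neighbor}. Host ports are dummy 0.
TOPOLOGY = {
    "h1": {0: "s1"},
    "h2": {0: "s2"},
    "h3": {0: "s3"},
    "s1": {1: "h1", 2: "s2", 3: "s3"},
    "s2": {1: "h2", 2: "s1", 3: "s3"},
    "s3": {1: "h3", 2: "s1", 3: "s2"},
}


def shortest_path_ports(topology, src, dst):
    """Dijkstra with unit link costs.

    Returns the list of switch egress ports along the path, or None if
    dst is unreachable.
    """
    best = {src: 0}
    previous = {}  # node -> (parent, egress port used on parent)
    queue = [(0, src)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == dst:
            break
        if dist > best[node]:
            continue  # stale queue entry
        for port, neighbor in topology[node].items():
            candidate = dist + 1
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = (node, port)
                heapq.heappush(queue, (candidate, neighbor))
    if dst != src and dst not in previous:
        return None
    ports = []
    node = dst
    while node != src:
        parent, port = previous[node]
        if parent != src:  # the sending host's own interface is not listed
            ports.append(port)
        node = parent
    ports.reverse()
    return ports


print(shortest_path_ports(TOPOLOGY, "h1", "h3"))
```

With the assumed topology, the path from h1 to h3 yields the port list used in the packet examples earlier in this exercise.
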
### Debugging your code

.pcap files will be generated for every interface (9 files: 3 for each of the 3
switches). You can look at the appropriate files and check that your packets are
being processed correctly.

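
If you prefer a script to a packet viewer, the classic pcap format is simple enough to read with the Python standard library. The sketch below is an illustration under assumptions: it handles only the classic microsecond pcap format (not pcapng), it assumes each captured packet starts directly with the EasyRoute preamble, and the helper names `read_pcap` and `describe` are made up here.

```python
import struct


def read_pcap(data):
    """Yield (ts_sec, ts_usec, packet_bytes) from classic pcap file contents."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written by a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset)
        offset += 16
        yield ts_sec, ts_usec, data[offset:offset + incl_len]
        offset += incl_len


def describe(packet):
    """Summarize a captured packet, decoding the EasyRoute header if present."""
    if len(packet) >= 12 and packet[:8] == bytes(8):
        num_valid = struct.unpack_from(">I", packet, 8)[0]
        ports = list(packet[12:12 + num_valid])
        return "EasyRoute num_valid={} ports={}".format(num_valid, ports)
    return "other ({} bytes)".format(len(packet))


def summarize_file(path):
    """Print one line per packet of the pcap file at `path`."""
    with open(path, "rb") as handle:
        for ts_sec, ts_usec, packet in read_pcap(handle.read()):
            print("{}.{:06d} {}".format(ts_sec, ts_usec, describe(packet)))
```

Call `summarize_file()` with the path of one of the generated .pcap files. Comparing the captures on the ingress and egress interfaces of the same switch should show num_valid dropping by 1 and the first port disappearing.
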
## Exercise 2: Implementing TCP flowlet switching

Place yourself in the `flowlet_switching` directory.

### What is flowlet switching?

Flowlet switching leverages the burstiness of TCP flows to achieve better load
balancing of TCP traffic. In this exercise, you will start from a program
that load-balances based on layer 4 flows: this is generally considered
"classic" ECMP. To do this, we compute a hash over the 5-tuple and use this
value to choose from a set of possible next hops. This means that all packets
belonging to the same flow (i.e. with the same 5-tuple) will be routed to
the same nexthop. You need to enhance this P4 code with additional logic to
implement flowlet switching.

We suggest implementing flowlet switching as follows:

1. Compute a crc16 hash over the regular TCP 5-tuple, using the
`modify_field_with_hash_based_offset()` P4 primitive. We already use this
primitive in the ECMP starter code, so take a look. This hash will identify each
TCP flow (note: we do not care about collisions in this case).

2. For each flow, you need to store 2 things: a) a timestamp for the last
observed packet belonging to this flow and b) a flowlet_id. Flowlet switching
is very simple: for each packet which belongs to the flow, you need to update
the timestamp. Then, if the time delta between the last observed packet and the
current packet exceeds a certain timeout value (in our case, we suggest using
50ms), the flowlet_id needs to be incremented. Note that in data centers
with mostly short, high-speed links, this timeout value will typically be much
smaller. With flowlet switching, packets belonging to the same TCP burst will
have the same flowlet_id, but packets in 2 different bursts (i.e. separated by a
timeout) will have a different flowlet_id. This also implies that we must
maintain some state for each TCP flow. To maintain state in P4, you will need to
use 'register' objects (look them up in the spec). In this case, you will need
to use two separate registers for each packet (one for the timestamp and one for
the flowlet_id). The software switch will generate a timestamp for each new
packet and store it in the metadata field
`intrinsic_metadata.ingress_global_timestamp`. This is a 32-bit value, expressed
in microseconds. You can read it in the ingress pipeline, but don't try to write
to it.

3. Once you have obtained the flowlet_id, you can compute a new hash. This
time, the hash will include the 5-tuple AND the flowlet_id. You will use this
hash exactly like we used our hash in the starter code, as an offset into a
nexthop table. This part of the exercise actually mostly reuses the starter
code. Your changes to tables `ecmp_group` and `ecmp_nhop` should be minimal.

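
The three steps above can be prototyped in Python to check the logic before translating it to P4 tables and registers. This is a sketch under assumptions, not the solution: `binascii.crc_hqx` is used as a stand-in 16-bit CRC and may not match the polynomial used by the software switch, and the register size and the `FlowletSwitch` class are invented for illustration.

```python
import binascii
import struct

FLOWLET_TIMEOUT_US = 50000  # 50 ms, the value suggested in the text
NUM_SLOTS = 8192            # assumed register size, not from the exercise
TS_MASK = 0xFFFFFFFF        # the ingress timestamp is a 32-bit value


def crc16(data):
    """Stand-in 16-bit CRC (CRC-CCITT as provided by binascii)."""
    return binascii.crc_hqx(data, 0)


class FlowletSwitch:
    def __init__(self, num_nexthops=2):
        self.num_nexthops = num_nexthops
        # Two "registers", both indexed by the flow hash.
        self.last_seen = [0] * NUM_SLOTS
        self.flowlet_id = [0] * NUM_SLOTS

    def select(self, five_tuple, now_us):
        """Return (nexthop_index, flowlet_id) for one packet."""
        src_ip, dst_ip, proto, sport, dport = five_tuple
        key = struct.pack(">IIBHH", src_ip, dst_ip, proto, sport, dport)
        # Step 1: hash the 5-tuple to find the per-flow state.
        slot = crc16(key) % NUM_SLOTS
        # Step 2: compare timestamps (modulo 2**32), then update the state.
        delta = (now_us - self.last_seen[slot]) & TS_MASK
        if delta > FLOWLET_TIMEOUT_US:
            self.flowlet_id[slot] = (self.flowlet_id[slot] + 1) & 0xFFFF
        self.last_seen[slot] = now_us & TS_MASK
        flowlet = self.flowlet_id[slot]
        # Step 3: hash the 5-tuple AND the flowlet_id to pick the next hop.
        offset = crc16(key + struct.pack(">H", flowlet)) % self.num_nexthops
        return offset, flowlet


switch = FlowletSwitch()
flow = (0x0A000002, 0x0A000001, 6, 1234, 80)  # made-up TCP 5-tuple
for timestamp in (1000, 2000, 3000, 60000, 61000):
    print(timestamp, switch.select(flow, timestamp))
```

Packets within a burst keep the same flowlet_id and therefore the same next hop. After a gap longer than the timeout the flowlet_id changes, and the next hop may change with it.
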
### Running the starter code

To compile and run the starter code, simply use `./run_demo.sh`. This time we
will not be using Mininet; we will instead generate simple TCP test packets and
send them individually to the switch to observe how it behaves. `run_demo.sh`
will start the switch and populate the tables using the CLI commands from
[flowlet_switching/commands.txt](flowlet_switching/commands.txt).

When the switch is running, you can send test packets with `sudo
./run_test.py`. Note that this script will take a few seconds to complete. The
test sends a few hundred identical TCP packets through the switch, in bursts,
on port 3. If you take a look at `commands.txt`, you will see that each TCP packet
can either go out of port 1 or port 2, based on the result of the hash
computation. The script prints the list of outgoing ports. Since all packets are
identical and we are using "regular" ECMP, all the packets should come out of
the same port and you will see either only "1"s or only "2"s when
you run the test. If you were to alter the test script (example: modify the TTL
value of the input TCP packets), the output should randomly choose between port
1 and port 2.

Note that the test script (and `commands.txt`) assume the following topology:

```
                --------------------------------- nhop-0 10.0.1.1
                |                                 00:04:00:00:00:00
             1 - 00:aa:bb:00:00:00
             |
-------- 3--sw
             |
             2 - 00:aa:bb:00:00:01
                |
                --------------------------------- nhop-1 10.0.2.1
                                                  00:04:00:00:00:01
```

Both `nhop-0` and `nhop-1` have a path to `10.0.0.1`, which is the final
destination of our test packet.

### What you need to do

1. Update the provided [P4 program](flowlet_switching/p4src/simple_router.p4)
to perform TCP flowlet switching. In our case, it requires adding 2 tables to
the ingress pipeline. Remember that you can omit the 'reads' attribute for a
table. In this case, provided you configure the default action of the table
correctly, the default action will always be performed.

2. Update [commands.txt](flowlet_switching/commands.txt) to configure your new
tables.

3. Run the above test again. Observe how the list of ports alternates between 1
and 2. You will need to edit the test script if you chose not to use a 50ms
(50,000 microseconds!) timeout for the flowlet_id.

BIN
SIGCOMM_2015/resources/images/source_routing_topology.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 46 KiB |