svn commit: r201728 - user/luigi/ipfw3-head/sys/netinet/ipfw

Luigi Rizzo luigi at FreeBSD.org
Thu Jan 7 09:54:31 UTC 2010


Author: luigi
Date: Thu Jan  7 09:54:31 2010
New Revision: 201728
URL: http://svn.freebsd.org/changeset/base/201728

Log:
  add some temporary documentation

Added:
  user/luigi/ipfw3-head/sys/netinet/ipfw/dummynet.txt   (contents, props changed)

Added: user/luigi/ipfw3-head/sys/netinet/ipfw/dummynet.txt
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ user/luigi/ipfw3-head/sys/netinet/ipfw/dummynet.txt	Thu Jan  7 09:54:31 2010	(r201728)
@@ -0,0 +1,756 @@
+Notes on the internal structure of dummynet (2010 version)
+by Riccardo Panicucci and Luigi Rizzo
+
+*********
+* INDEX *
+*********
+Implementation of new dummynet
+    Internal structure
+    Files
+Packet arrival
+    The reconfiguration routine
+dummynet_task()
+Configuration
+    Add a pipe
+    Add a scheduler
+    Add a flowset
+Listing object
+Delete of object
+    Delete a pipe
+    Delete a flowset
+    Delete a scheduler
+Compatibility with FreeBSD7.2 and FreeBSD 8 ipfw binary
+    ip_dummynet_glue.c
+    ip_fw_glue.c
+How to configure dummynet
+How to implement a new scheduler
+
+
+
+
+Dummynet is a traffic shaper and network emulator. Packets are
+selected by an external filter such as ipfw, and passed to the emulator
+with a tag such as "pipe 10" or "queue 5" which tells what to
+do with the packet. As an example
+
+	ipfw add queue 5 icmp from 10.0.0.2 to all
+
+All packets with the same tag belong to a "flowset", or a set
+of flows which can be further partitioned according to a mask.
+Flowsets are then passed to a scheduler for processing. The
+association of flowsets and schedulers is configurable e.g.
+
+	ipfw queue 5 config sched 10 weight 3 flow_mask xxxx
+	ipfw queue 8 config sched 10 weight 1 ...
+	ipfw queue 3 config sched 20 weight 1 ...
+
+"sched 10" represents one or more scheduler instances,
+selected through a mask on the 5-tuple itself.
+
+	ipfw sched 20 config sched_mask yyy ...
+
+There are in fact two masks applied to each packet:
++ the "sched_mask" sends packets arriving to a scheduler_id to
+  one of many instances.
++ the "flow_mask" together with the flowset_id is used to
+  collect packets into independent flows on each scheduler.
+
+As an example, we can have
+	ipfw queue 5 config sched 10 flow_mask src-ip 0x000000ff
+	ipfw sched 10 config sched_mask src-ip 0xffffff00
+
+means that sched 10 will have one instance per /24 source subnet,
+and within that, each individual source will be a flow.
+	
+Internal structure
+-----------------
+Dummynet-related data is split into five main strucs:
+
+- struct new_pipe: contains data about the physical link such
+  as bandwith, delay, fields to simulate a delay profile and so on.
+
+- struct new_fs: describes a flowset. It contains template values for the
+  specified flowset, and a pointer (alg_fs) to an opaque struct that
+  can contain per-flowset scheduler-specific parameters, such as
+  weight, priorities, slot sizes and the like.
+  It also contains a flow_mask to allow to create more queues
+  depending of the flow id of the packet. All queues are stored into the
+  scheduler instance.
+
+- struct new_sch: it acts as a template for the scheduler used. It contains
+  enqueue and dequeue packet functions, a configure function for
+  possible global parameters, and two functions to create and destroy
+  the scheduler instance.
+  A scheduler can have more scheduler instance: a field sched_mask is
+  used to know how many instance could exist for this scheduler.
+  This struct also contains an hash table of queues pointers
+
+- struct new_sch_inst: it is the struct that represents the instance of the
+  scheduler. It has a pointer to the template, and some general parameter
+  and status variable relative to the single instance.
+  It also contains all queues associated with this instance and the delay line,
+  which is a list of packets that will be sent after a certain amount of time.
+
+- struct new_queue: it contains all data belonging to a single queue, as
+  total bytes, number of packets dropped, list of packet...
+  It can have some extra data about the scheduling algorithm.
+  XXX this is one instance of a flowset ?
+
+
+                                +----------------+
+      +--------+   ptr_sched    |                |         +----------+
+      | new_fs |--------------->|     new_sch    |-------->| new_pipe |
+      `--------'                |                |         +----------+
+          |                     +----------------+
+          |                             |       ^
+          |                             V        \___
+          |                      +------------+      |
+          |                      | hash table |      |
+          |                      |    (m1)    |      |
+          |                      +------------+      |
+          |                         |  ..... \_      |
+   -----------+     ...             v          \     |
+  | new_queue |<-----|          --------------  |    |
+  +-----------+      |         |              | |    |
+       ....          |         | new_sch_inst | |    |
+   -----------+      |         |              | |    |
+  | new_queue |<-----|          --------------  |    |
+  +-----------+      |                          v    |
+      |       .--------------.       --------------  |
+      |       |  hash table  |      |              | |
+      |       |     (m2) +   |<-----| new_sch_inst |_/
+      |       | new_sch_inst |      |              |
+      |       `--------------'       --------------
+      |                                 ^
+      |                                 |
+      `---------------------------------'
+
+Note that the hash table m2 is not mandatory, a scheduler instance
+can use its own struct to store its queues
+
+Three global data structures (hash tables) contain all
+pipes, schedulers and flowsets.
+- pipehash[x]:      contains all pipes in the system
+- schedulerhash[x]: contains all scheduler templates in the system
+- flowsethash[x]:   contains all flowset linked with a scheduler (or pipe).
+Additionally, a list that contains all unlinked flowset:
+- unlinkedflowset:  contains flowset that are not linked with any scheduler
+flowset are put in this list when they refer to a non
+existing scheduler or pipe.
+
+Scheduler instances and the delay lines associated with pipes
+need to be woken up at certain times. Because we have many
+such objects, we keep them in a priority heap (system_heap).
+
+Almost all objects in this implementation are preceded
+by a structure (struct dn_id) which makes it easier to
+identify it.
+
+
+Files
+-----
+New dummynet is split in several files.
+Two headers, a file to implement the userland part and some files for the
+kernel side are provided.
+- ip_dummynet_s.h is the minimal header that is used to implement
+  a scheduler
+- ip_dummynet.h is the main header, that includes the ip_dummynet_s.h and is
+  use by both kernel and user space.
+- dummynet.c is the file used to implement the user side of dummynet.
+  It contains the function to parsing command line, and functions to
+  show the output of dummynet objects.
+- ip_dummynet.c is the main files for the kernel side of dummynet. It contains
+  main functions for processing packets, some functions exported to help
+  writing new scheduler, and the handler for the various dummynet socket
+  options.
+- ip_dummynet_config.c cointains functions to create and delete objects.
+- ip_dummynet_get.c contains functions to prepare the buffer to pass to
+  userland to show object info.
+Moreover, there are two new file (ip_dummynet_glue.c and ip_fw_glue.c) that
+are used to allow compatibility with the "ipfw" binary from FreeBSd 7.2 and
+FreeBSD 8.
+
+
+Packet arrival
+==============
+A packet enter in dummynet process throught the function dummynet_io().
+When a packet arrives, first it is checked the flow set number and the flowset
+is searched in the flowset list and if it isn't found the packet is dropped.
+Dummynet uses a generation number to check if pointers are valid. The
+flowset has a pointer to the scheduler that could be deleted, so in the
+flowset struct there is a value that contain a number. If this number doesn't
+match the generation number, a reconfigure routine is started to check if
+pointers are valid. The generation number is incremented when a change in
+the internal struct occurs (for example when a new object is inserted or
+deleted).
+
+At this point, the pointer to the scheduler is valid, and the scheduler
+instance is searched from all scheduler instance of the scheduler, depending
+of the sched_mask field. If a scheduler instance isn't found, a new one is
+created now.
+
+Now the enqueue() function of the scheduler is called and if the scheduler
+instance was idle the dequeue() function is called now, else it will
+be called by system later. After the dequeue(), the packet returned is moved
+in the delay line and send after a delay depending on link parameters.
+See ip_dummynet.c dummynet_io() function for more details
+
+The reconfiguretion routine
+---------------------------
+The reconfigure routine is called by the system when the flowset ptr_sched_val
+number differs from the generation number. This means that a new object is
+inserted in the system or a object was deleted and no new packet belonging to
+this flowset are yet arrived.
+
+The reconfigure routine first check if the scheduler pointed by the flowset is
+the same that the scheduler located by the number. If so, and if the type are
+the same, it means that the scheduler wasn't changed. Now is check if the pipe
+exist, and eventually the pointer in the scheduler is updated. If scheduler
+and pipe exists, the packet can be enqueued.
+
+If the scheduler type differs from flowset type, it means that the scheduler
+has changed type so the flowset must be deleted and recreated. The pointer
+are update and the packet can be enqueued.
+
+If the scheduler no longer exists, the flowset is remove from flowset list
+and inserted in the unlinked flowset list, so that new packet are discarded
+early.
+
+If scheduler or pipe don't exist,packet shoubl be dropped and the function
+return 1 so that dummynet_io() can drop the packet.
+
+dummynet_task()
+===============
+The dummynet_task() is the the main dummynet processing function and is
+called every tick. This function first calculate the new current time, then
+it checks if it is the time to wake up object from the system_heap comparing
+the current time and the key of the heap. Two types of object (really the
+heap contains pointer to objects) are in the
+system_heap:
+
+- scheduler instance: if a scheduler instance is waked up, the dequeue()
+  function is called until it has credit. If the dequeue() returns packets,
+  the scheduler instance is inserted in the heap with a new key depending of
+  the data that will be send out. If the scheduler instance remains with
+  some credit, it means that is hasn't other packet to send and so the
+  instance is no longer inserted in the heap.
+
+  If the scheduler instance extracted from the heap has the DELETE flag set,
+  the dequeue() is not called and the instance is destroyed now.
+
+- delay line: when extracting a delay line, the function transmit_event() is
+  called to send out packet from delay line.
+
+  If the scheduler instance associated with this delay line doesn't exists,
+  the delay line will be delete now.
+
+Configuration
+=============
+To create a pipe, queue or scheduler, the user should type commands like:
+"ipfw pipe x config"
+"ipfw queue y config pipe x"
+"ipfw pipe x config sched <type>"
+
+The userland side of dummynet will prepare a buffer contains data to pass to
+kernel side.
+The buffer contains all struct needed to configure an object. In more detail,
+to configure a pipe all three structs (new_pipe, new_sch, new_fs) are needed,
+plus the delay profile struct if the pipe has a delay profile.
+
+If configuring a scheduler only the struct new_sch is wrote in the buffer,
+while if configuring a flowset only the new_fs struct is wrote.
+
+The first struct in the buffer contains the type of command request, that is
+if it is configuring a pipe, a queue, or a scheduler. Then there are structs
+need to configure the object, and finally there is the struct that mark
+the end of the buffer.
+
+To support the insertion of pipe and queue using the old syntax, when adding
+a pipe it's necessary to create a FIFO flowset and a FIFO scheduler, which
+have a number x + DN_PIPEOFFSET.
+
+Add a pipe
+----------
+A pipe is only a template for a link.
+If the pipe already exists, parameters are updated. If a delay profile exists
+it is deleted and a new one is created.
+If the pipe doesn't exist a new one is created. After the creation, the
+flowset unlinked list is scanned to see if there are some flowset that would
+be linked with this pipe. If so, these flowset will be of wf2q+ type (for
+compatibility) and a new wf2q+ scheduler is created now.
+
+Add a scheduler
+---------------
+If the scheduler already exists, and the type and the mask are the same, the
+scheduler is simply reconfigured calling the config_scheduler() scheduler
+function with the RECONFIGURE flag active.
+If the type or the mask differ, it is necessary to delete the old scheduler
+and create a new one.
+If the scheduler doesn't exists, a new one is created. If the scheduler has
+a mask, the hash table is created to store pointers to scheduler instances.
+When a new scheduler is created, it is necessary to scan the unlinked
+flowset list to search eventually flowset that would be linked with this
+scheduler number. If some are found, flowsets became of the type of this
+scheduler and they are configured properly.
+
+Add a flowset
+-------------
+Flowset pointers are store in the system in two list. The unlinked flowset list
+contains all flowset that aren't linked with a scheduler, the flowset list
+contains flowset linked to a scheduler, and so they have a type.
+When adding a new flowset, first it is checked if the flowset exists (that is,
+it is in the flowset list) and if it doesn't exists a new flowset is created
+and added to unlinked flowset list if the scheduler which the flowset would be
+linked doesn't exists, or added in the flowset list and configured properly if
+the scheduler exists. If the flowset (before to be created) was in the
+unlinked flowset list, it is removed and deleted, and then recreated.
+If the flowset exists, to allow reconfiguration of this flowset, the
+scheduler number and types must match with the one in memory. If this isn't
+so, the flowset is deleted and a new one will be created. Really, the flowset
+it isn't deleted now, but it is removed from flowset list and it will be
+deleted later because there could be some queues that are using it.
+
+Listing of object
+=================
+The user can request a list of object present in dummynet through the command
+"ipfw [-v] pipe|queue [x] list|show"
+The kernel side of dummynet send a buffer to user side that contains all
+pipe, all scheduler, all flowset, plus all scheduler instances and all queues.
+The dummynet user land will format the output and show only the relevant
+information.
+The buffer sent start with all pipe from the system. The entire struct new_pipe
+is passed, except the delay_profile struct that is useless in user space.
+After pipes, all flowset are wrote in the buffer. The struct contains
+scheduler flowset specific data is linked with the flowset writing the
+'obj' id of the extension into the 'alg_fs' pointer.
+Then schedulers are wrote. If a scheduler has one or more scheduler instance,
+these are linked to the parent scheduler writing the id of the parent in the
+'ptr_sched' pointer. If a scheduler instance has queues, there are wrote in
+the buffer and linked thorugh the 'obj' and 'sched_inst' pointer.
+Finally, flowsets in the unlinked flowset list  are write in the buffer, and
+then a struct gen in saved in the buffer to mark the last struct in the buffer.
+
+
+Delete of object
+================
+An object is usually removed by user through a command like
+"ipfw pipe|queue x delete". XXX sched?
+ipfw pass to the kernel a struct gen that contains the type and the number
+of the object to remove
+
+Delete of pipe x
+----------------
+A pipe can be deleted by the user throught the command 'ipfw pipe x delete'.
+To delete a pipe, the pipe is removed from the pipe list, and then deleted.
+Also the scheduler associated with this pipe should be deleted.
+For compatibility with old dummynet syntax, the associated FIFO scheduler and
+FIFO flowset must be deleted.
+
+Delete of flowset x
+-------------------
+To remove a flowset, we must be sure that is no loger referenced by any object.
+If the flowset to remove is in the unlinked flowset list, there is not any
+issue, the flowset can be safely removed calling a free() (the flowset
+extension is not yet created if the flowset is in this list).
+If the flowset is in the flowset list, first we remove from it so new packet
+are discarded when arrive. Next, the flowset is marked as delete.
+Now we must check if some queue is using this flowset.
+To do this, a counter (active_f) is provided. This counter indicate how many
+queues exist using this flowset.
+The active_f counter is automatically incremented when a queue is created
+and decremented when a queue is deleted.
+If the counter is 0, the flowset can be safely deleted, and the delete_alg_fs()
+scheduler function is called before deallocate memory.
+If the counter is not 0, the flowset remain in memory until the counter become
+zero. When a queue is delete (by dn_delete_queue() function) it is checked if
+the linked flowset is deleting and if so the counter is decrementing. If the
+counter reaches 0, the flowset is deleted.
+The deletion of a queue can be done only by the scheduler, or when the scheduler
+is destroyed.
+
+Delete of scheduler x
+---------------------
+To delete a scheduler we must be sure that any scheduler instance of this type
+are in the system_heap. To do so, a counter (inst_counter) is provided.
+This counter is managed by the system: it is incremented every time it is
+inserted in the system_heap, and decremented every time it is extracted from it.
+To delete the scheduler, first we remove it from the scheduler list, so new
+packet are discarded when they arrive, and mark the scheduler as deleting.
+
+If the counter is 0, we can remove the scheduler safely calling the
+really_deletescheduler() function. This function will scan all scheduler
+instances and call the delete_scheduler_instance() function that will delete
+the instance. When all instance are deleted, the scheduler template is
+deleted calling the delete_scheduler_template(). If the delay line associate
+with the scheduler is empty, it is deleted now, else it will be deleted when
+it will became empy.
+If the counter was not 0, we wait for it. Every time the dummynet_task()
+function extract a scheduler from the system_heap, the counter is decremented.
+If the scheduler has the delete flag enabled the dequeue() is not called and
+delete_scheduler_instance() is called to delete the instance.
+Obviously this scheduler instance is no loger inserted in the system_heap.
+If the counter reaches 0, the delete_scheduler_template() function is called
+all memory is released.
+NOTE: Flowsets that belong to this scheduler are not deleted, so if a new
+      scheduler with the same number is inserted will use these flowsets.
+      To do so, the best approach would be insert these flowset in the
+      unlinked flowset list, but doing this now will be very expensive.
+      So flowsets will remain in memory and linked with a scheduler that no
+      longer exists until a packet belonging to this flowset arrives. When
+      this packet arrives, the reconfigure() function is called because the
+      generation number mismatch with one contains in the flowset and so
+      the flowset will be moved into the flowset unlinked list, or will be
+      linked with the new scheduler if a new one was created.
+
+
+COMPATIBILITY WITH FREEBSD 7.2 AND FREEBSD 8 'IPFW' BINARY
+==========================================================
+Dummynet is not compatible with old ipfw binary because internal structs are
+changed. Moreover, the old ipfw binary is not compatible with new kernels
+because the struct that represents a firewall rule has changed. So, if a user
+install a new kernel on a FreeBSD 7.2, the ipfw (and possibly many other
+commands) will not work.
+New dummynet uses a new socket option: IP_DUMMYNET3, used for both set and get.
+The old option can be used to allow compatibility with the 'ipfw' binary of
+older version (tested with 7.2 and 8.0) of FreeBSD.
+Two file are provided for this purpose:
+- ip_dummynet_glue.c translates old dummynet requests to the new ones,
+- ip_fw_glue.c converts the rule format between 7.2 and 8 versions.
+Let see in detail these two files.
+
+IP_DUMMYNET_GLUE.C
+------------------
+The internal structs of new dummynet are very different from the original.
+Because of there are some difference from between dummynet in FreeBSD 7.2 and
+dummynet in FreeBSD 8 (the FreeBSD 8 version includes support to pipe delay
+profile and burst option), I have to include both header files. I copied
+the revision 191715 (for version 7.2) and the revision 196045 (for version 8)
+and I appended a number to each struct to mark them.
+
+The main function of this file is ip_dummynet_compat() that is called by
+ip_dn_ctl() when it receive a request of old socket option.
+
+A global variabile ('is7') store the version of 'ipfw' that FreeBSD is using.
+This variable is set every time a request of configuration is done, because
+with this request we receive a buffer of which size depending of ipfw version.
+Because of in general the first action is a configuration, this variable is
+usually set accordly. If the first action is a request of listing of pipes
+or queues, the system cannot know the version of ipfw, and we suppose that
+version 7.2 is used. If version is wrong, the output can be senseless, but
+the application should not crash.
+
+There are four request for old dummynet:
+- IP_DUMMYNET_FLUSH: the flush options have no parameter, so simply the
+  dummynet_flush() function is called;
+- IP_DUMMYNET_DEL: the delete option need to be translate.
+  It is only necessary to extract the number and the type of the object
+  (pipe or queue) to delete from the buffer received and build a new struct
+  gen contains the right parameters, then call the delete_object() function;
+- IP_DUMMYNET_CONFIGURE: the configure command receive a buffer depending of
+  the ipfw version. After the properly extraction of all data, that depends
+  by the ipfw version used, new structures are filled and then the dummynet
+  config_pipe() function is properly called. Note that the 7.2 version does
+  not support some parameter as burst or delay profile.
+- IP_DUMMYNET_GET: The get command should send to the ipfw the correct buffer
+  depending of its version. There are two function that build the
+  corrected buffer, ip_dummynet_get7() and ip_dummynet_get8(). These
+  functions reproduce the buffer exactly as 'ipfw' expect. The only difference
+  is that the weight parameter for a queue is no loger sent by dummynet and so
+  it is set to 0.
+  Moreover, because of the internal structure has changed, the bucket size
+  of a queue could not be correct, because now all flowset share the hash
+  table.
+  If the version of ipfw is wrong, the output could be senseless or truncated,
+  but the application should not crash.
+
+IP_FW_GLUE.C
+------------
+The ipfw binary also is used to add rules to FreeBSD firewall. Because of the
+struct ip_fw is changed from FreeBsd 7.2 to FreeBSD 8, it is necessary
+to write some glue code to allow use ipfw from FreeBSD 7.2 with the kernel
+provided with FreeBSD 8.
+This file contains two functions to convert a rule from FreeBSD 7.2 format to
+FreeBSD 8 format, and viceversa.
+The conversion should be done when a rule passes from userspace to kernel space
+and viceversa.
+I have to modify the ip_fw2.c file to manage these two case, and added a
+variable (is7) to store the ipfw version used, using an approach like the
+previous file:
+- when a new rule is added (option IP_FW_ADD) the is7 variable is set if the
+  size of the rule received corrispond to FreeBSD 7.2 ipfw version. If so, the
+  rule is converted to version 8 calling the function convert_rule_to_8().
+  Moreover, after the insertion of the rule, the rule is now reconverted to
+  version 7 because the ipfw binary will print it.
+- when the user request a list of rules (option IP_FW_GET) the is7 variable
+  should be set correctly because we suppose that a configure command was done,
+  else we suppose that the FreeBSD version is 8. The function ipfw_getrules()
+  in ip_fw2.c file return all rules, eventually converted to version 7 (if
+  the is7 is set) to the ipfw binary.
+The conversion of a rule is quite simple. The only difference between the
+two structures (struct ip_fw) is that in the new there is a new field
+(uint32_t id). So, I copy the entire rule in a buffer and the copy the rule in
+the right position in the new (or old) struct. The size of commands are not
+changed, and the copy is done into a cicle.
+
+How to configure dummynet
+=========================
+It is possible to configure dummynet through two main commands:
+'ipfw pipe' and 'ipfw queue'.
+To allow compatibility with old version, it is possible configure dummynet
+using the old command syntax. Doing so, obviously, it is only possible to
+configure a FIFO scheduler or a wf2q+ scheduler.
+A new command, 'ipfw pipe x config sched <type>' is supported to add a new
+scheduler to the system.
+
+- ipfw pipe x config ...
+  create a new pipe with the link parameters
+  create a new scheduler fifo (x + offset)
+  create a new flowset fifo (x + offset)
+  the mask is eventually stored in the FIFO scheduler
+
+- ipfw queue y config pipe x ...
+  create a new flowset y linked to sched x.
+    The type of flowset depends by the specified scheduler.
+    If the scheduler does not exist, this flowset is inserted in a special
+    list and will be not active.
+    If pipe x exists and sched does not exist, a new wf2q+ scheduler is
+    created and the flowset will be linked to this new scheduler (this is
+    done for compatibility with old syntax).
+
+- ipfw pipe x config sched <type> ...
+  create a new scheduler x of type <type>.
+  Search into the flowset unlinked list if there are some flowset that
+  should be linked with this new scheduler.
+
+- ipfw pipe x delete
+  delete the pipe x
+  delete the scheduler fifo (x + offset)
+  delete the scheduler x
+  delete the flowset fifo (x + offset)
+
+- ipfw queue x delete
+  delete the flowset x
+
+- ipfw sched x delete ///XXX
+  delete the scheduler x
+
+Follow now some examples to how configure dummynet:
+- Ex1:
+  ipfw pipe 10 config bw 1M delay 15 // create a pipe with band and delay
+                                        A FIFO flowset and scheduler is
+                                        also created
+  ipfw queue 5 config pipe 10 weight 56 // create a flowset. This flowset
+                                           will be of wf2q+ because a pipe 10
+                                           exists. Moreover, the wf2q+
+                                           scheduler is created now.
+- Ex2:
+  ipfw queue 5 config pipe 10 weight 56 // Create a flowset. Scheduler 10
+                                           does not exist, so this flowset
+                                           is inserted in the unlinked
+                                           flowset list.
+  ipfw pipe 10 config bw... // Create a pipe, a FIFO flowset and scheduler.
+                               Because of a flowset with 'pipe 10' exists,
+                               a wf2q+ scheduler is created now and that
+                               flowset is linked with this sceduler.
+
+- Ex3:
+  ipfw pipe 10 config bw...    // Create a pipe, a FIFO flowset and scheduler.
+  ipfw pipe 10 config sched rr // Create a scheduler of type RR, linked to
+                                  pipe 10
+  ipfw queue 5 config pipe 10 weight 56 // Create a flowset 5. This flowset
+                                           will belong to scheduler 10 and
+                                           it is of type RR
+
+- Ex4:
+  ipfw pipe 10 config sched rr // Create a scheduler of type RR, linked to
+                                  pipe 10 (not exist yet)
+  ipfw pipe 10 config bw... // Create a pipe, a FIFO flowset and scheduler.
+  ipfw queue 5 config pipe 10 weight 56 // Create a flowset 5.This flowset
+                                           will belong to scheduler 10 and
+                                           it is of type RR
+  ipfw pipe 10 config sched wf2q+ // Modify the type of scheduler 10. It
+                                     becomes a wf2q+ scheduler.
+                                     When a new packet of flowset 5 arrives,
+                                     the flowset 5 becomes to wf2q+ type.
+
+How to implement a new scheduler
+================================
+In dummynet, a scheduler algorithm is represented by two main structs, some
+functions and other minor structs.
+- A struct new_sch_xyz (where xyz is the 'type' of scheduler algorithm
+  implemented) contains data relative to scheduler, as global parameter that
+  are common to all instances of the scheduler
+- A struct new_sch_inst_xyz contains data relative to a single scheduler
+  instance, as local status variable depending for example by flows that
+  are linked with the scheduler, and so on.
+To add a scheduler to dummynet, the user should type a command like:
+'ipfw pipe x config sched <type> [mask ... ...]'
+This command creates a new struct new_sch_xyz of type <type>, and
+store the optional parameter in that struct.
+
+The parameter mask determines how many scheduler instance of this
+scheduler may exist. For example, it is possible to divide traffic
+depending on the source port (or destination, or ip address...),
+so that every scheduler instance act as an independent scheduler.
+If the mask is not set, all traffic goes to the same instance.
+
+When a packet arrives to a scheduler, the system search the corrected
+scheduler instance, and if it does not exist it is created now (the
+struct new_sch_inst_xyz is allocated by the system, and the scheduler
+fills the field correctly). It is a task of the scheduler to create
+the struct that contains all queues for a scheduler instance.
+Dummynet provides some function to create an hash table to store
+queues, but the schedule algorithm can choice the own struct.
+
+To link a flow to a scheduler, the user should type a command like:
+'ipfw queue z config pipe x [mask... ...]'
+
+This command creates a new 'new_fs' struct that will be inserted
+in the system.  If the scheduler x exists, this flowset will be
+linked to that scheduler and the flowset type become the same as
+the scheduler type. At this point, the function create_alg_fs_xyz()
+is called to allow store eventually parameter for the flowset that
+depend by scheduler (for example the 'weight' parameter for a wf2q+
+scheduler, or some priority...). A parameter mask can be used for
+a flowset. If the mask parameter is set, the scheduler instance can
+separate packet according to its flow id (src and dst ip, ports...)
+and assign it to a separate queue. This is done by the scheduler,
+so it can ignore the mask if it wants.
+
+See now the two main structs:
+struct new_sch_xyz {
+    struct gen g; /* important the name g */
+    /* global params */
+};
+struct new_sch_inst_xyz {
+    struct gen g; /* important the name g */
+    /* params of the instance */
+};
+It is important to embed the struct gen as first parameter. The struct gen
+contains some values that the scheduler instance must fill (the 'type' of
+scheduler, the 'len' of the struct...)
+The function create_scheduler_xyz() should be implemented to initialize global
+parameters in the first struct, and if memory allocation is done it is
+mandatory to implement the delete_scheduler_template() function to free that
+memory.
+The function create_scheduler_instance_xyz() must be implemented even if the
+scheduler instance does not use extra parameters. In this function the struct
+gen fields must be filled with corrected infos. The
+delete_scheduler_instance_xyz() function must bu implemented if the instance
+has allocated some memory in the previous function.
+
+To store data belonging to a flowset the follow struct is used:
+struct alg_fs_xyz {
+    struct gen g;
+    /* fill correctly the gen struct
+     g.subtype = DN_XYZ;
+     g.len = sizeof(struct alg_fs_xyz)
+     ...
+     */
+    /* params for the flow */
+};
+The create_alg_fs_xyz() function is mandatory, because it must fill the struct
+gen, but the delete_alg_fs_xyz() is mandatory only if the previous function
+has allocated some memory.
+
+A struct new_queue contains packets belonging to a queue and some statistical
+data. The scheduler could have to store data in this struct, so it must define
+a new_queue_xyz struct:
+struct new_queue_xyz {
+    struct new_queue q;
+    /* parameter for a queue */
+}
+
+All structures are allocated by the system. To do so, the scheduler must
+set the size of its structs in the scheduler descriptor:
+scheduler_size:     sizeof(new_sch_xyz)
+scheduler_i_size:   sizeof(new_sch_inst_xyz)
+flowset_size:       sizeof(alg_fs_xyz)
+queue_size:         sizeof(new_queue_xyz);
+The scheduler_size could be 0, but other struct must have at least a struct gen.
+
+
+After the definition of structs, it is necessary to implement the
+scheduler functions.
+
+- int (*config_scheduler)(char *command, void *sch, int reconfigure);
+    Configure a scheduler, or reconfigure if 'reconfigure' == 1.
+    This function performs additional allocation and initialization of global
+    parameter for this scheduler.
+    If memory is allocated here, the delete_scheduler_template() function
+    should be implemented to remove this memory.
+- int (*delete_scheduler_template)(void* sch);
+    Delete a scheduler template. This function is mandatory if the scheduler
+    uses extra data respect the struct new_sch.
+- int (*create_scheduler_instance)(void *s);
+    Create a new scheduler instance. The system allocate the necessary memory
+    and the schedulet can access it using the 's' pointer.
+    The scheduler instance stores all queues, and to do this can use the
+    hash table provided by the system.
+- int (*delete_scheduler_instance)(void *s);
+    Delete a scheduler instance. It is important to free memory allocated
+    by create_scheduler_instance() function. The memory allocated by system
+    is freed by the system itself. The struct contains all queue also has
+    to be deleted.
+- int (*enqueue)(void *s, struct gen *f, struct mbuf *m,
+                 struct ipfw_flow_id *id);
+    Called when a packet arrives. The packet 'm' belongs to the scheduler
+    instance 's', has a flowset 'f' and the flowid 'id' has already been
+    masked. The enqueue() must call dn_queue_packet(q, m) function to really
+    enqueue packet in the queue q. The queue 'q' is chosen by the scheduler
+    and if it does not exist should be created calling the dn_create_queue()
+    function. If the schedule want to drop the packet, it must call the
+    dn_drop_packet() function and then return 1.
+- struct mbuf * (*dequeue)(void *s);
+    Called when the timer expires (or when a packet arrives and the scheduler
+    instance is idle).
+    This function is called when at least a packet can be send out. The
+    scheduler choices the packet and returns it; if no packet are in the
+    schedulerinstance, the function must return NULL.
+    Before return a packet, it is important to call the function
+    dn_return_packet() to update some statistic of the queue and update the
+    queue counters.
+- int (*drain_queue)(void *s, int flag);
+    The system request to scheduler to delete all queues that is not using
+    to free memory. The flag parameter indicate if a queue must be deleted
+    even if it is active.
+
+- int (*create_alg_fs)(char *command, struct gen *g, int reconfigure);
+    It is called when a flowset is linked with a scheduler. This is done
+    when the scheduler is defined, so we can know the type of flowset.
+    The function initialize the flowset paramenter parsing the command
+    line. The parameter will be stored in the g struct that have the right
+    size allocated by the system. If the reconfigure flag is set, it means
+    that the flowset is reconfiguring
+- int (*delete_alg_fs)(struct gen *f);
+    It is called when a flowset is deleting. Must remove the memory allocate
+    by the create_alg_fs() function.
+
+- int (*create_queue_alg)(struct new_queue *q, struct gen *f);
+    Called when a queue is created. The function should link the queue
+    to the struct used by the scheduler instance to store all queues.
+- int (*delete_queue_alg)(struct new_queue *q);
+    Called when a queue is deleting. The function should remove extra data
+    and update the struct contains all queues in the scheduler instance.
+
+The struct scheduler represent the scheduler descriptor that is passed to
+dummynet when a scheduler module is loaded.
+This struct contains the type of scheduler, the lenght of all structs and
+all function pointers.
+If a function is not implemented should be initialize to NULL. Some functions
+are mandatory, other are mandatory if some memory should be freed.
+Mandatory functions:
+- create_scheduler_instance()
+- enqueue()
+- dequeue()
+- create_alg_fs()
+- drain_queue()
+Optional functions:
+- config_scheduler()
+- create_queue_alg()
+Mandatory functions if the corresponding create...() has allocated memory:
+- delete_scheduler_template()
+- delete_scheduler_instance()
+- delete_alg_fs()
+- delete_queue_alg()
+


More information about the svn-src-user mailing list