libdhcp design proposal

This page outlines the basic concepts and structure layout of libdhcp - a library that will handle DHCPv4 and DHCPv6 packets. Its main goals are:

  1. receive DHCPv4 and DHCPv6 packets
  2. parse received packet and verify its correctness
  3. allow easy packet manipulation - this mainly means:
    1. addition of new options
    2. access and modification of existing options
    3. removal of existing options
    4. easy option copying between packets (server's reply often contains copies of client's messages)
  4. building packet - creating over-wire buffer for transmission (with possible modification, e.g. for digest calculation)
  5. design priorities are: safety, ease of use, memory efficiency, speed (in that order). Note that a DHCP server rarely handles many packets at the same time. The largest known deployments process 1M packets/day. That's roughly 12 packets per second (1,000,000 packets / 86,400 seconds). Even assuming a 10-fold increase in the largest deployments, disk I/O performance will likely stay the primary performance bottleneck.

Note: Getters/setters omitted for clarity.

Generic questions

  • Q: Do we expect external scripts to be written in languages other than Python? There are issues that require decision, e.g. int vs enum. Enum is more elegant, but may be difficult from language interface perspective. Python does support it, but what about others (e.g. bash scripts)?

Address class

This is a class that represents an IPv6 address. Due to its ubiquitous usage, it was designed to be memory efficient. Currently sizeof(Addr6) is 16.

    class Addr6 {
    public:
        Addr6(const char* addr, bool plain=false);
        Addr6(const struct in6_addr* addr);
        Addr6(const struct sockaddr_in6* addr);
        inline const char * get() const { return addr_; } // read-only access to raw 16 bytes
        std::string getPlain() const; // textual form of the address
        char* getAddr() { return addr_; } // writable access to raw 16 bytes
        bool equals(const Addr6& other) const;
        bool operator==(const Addr6& other) const;

        bool linkLocal() const;
        bool multicast() const;

        // no dtor necessary (no allocations done)
    private:
        char addr_[16];
    };

  • Q: We can have:
    1. Addr class (common for IPv4 and IPv6)
    2. Separate Addr4 and Addr6 classes?
    3. An (empty) Addr class with Addr4 and Addr6 classes derived from it.
    Third answer looks most object-oriented, but I see a lot of dynamic_cast<Addr6> ahead. Maybe it would be better to have simple boolean field in base class?
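
As an illustration of how small such a class can stay, here is a minimal sketch of the IPv6 variant. It is not the proposed implementation: it only accepts the textual form (the plain flag from the declaration above is left out), it stores unsigned char instead of char to keep byte comparisons free of sign issues, and it relies on the POSIX inet_pton()/inet_ntop() calls.

```cpp
#include <arpa/inet.h>   // inet_pton, inet_ntop (POSIX)
#include <netinet/in.h>  // struct in6_addr, INET6_ADDRSTRLEN
#include <cassert>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical sketch of Addr6: 16 bytes of storage, no allocations.
class Addr6 {
public:
    // Parses the textual form, e.g. "fe80::1"; throws on malformed input.
    explicit Addr6(const char* text) {
        struct in6_addr tmp;
        if (inet_pton(AF_INET6, text, &tmp) != 1) {
            throw std::invalid_argument("malformed IPv6 address");
        }
        std::memcpy(addr_, &tmp, sizeof(addr_));
    }

    // Returns the textual (presentation) form of the address.
    std::string getPlain() const {
        char buf[INET6_ADDRSTRLEN];
        struct in6_addr tmp;
        std::memcpy(&tmp, addr_, sizeof(addr_));
        if (inet_ntop(AF_INET6, &tmp, buf, sizeof(buf)) == nullptr) {
            return std::string();
        }
        return std::string(buf);
    }

    bool operator==(const Addr6& other) const {
        return std::memcmp(addr_, other.addr_, sizeof(addr_)) == 0;
    }

    // fe80::/10: first byte is 0xfe, top two bits of the second byte are 10.
    bool linkLocal() const {
        return addr_[0] == 0xfe && (addr_[1] & 0xc0) == 0x80;
    }

    // ff00::/8: first byte is 0xff.
    bool multicast() const { return addr_[0] == 0xff; }

private:
    unsigned char addr_[16];
};

static_assert(sizeof(Addr6) == 16, "Addr6 is expected to stay at 16 bytes");
```

Note that adding a virtual base class (the third answer above) would add a vtable pointer to every object, so sizeof would no longer be 16.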

Option Structures

Base class (a generic option) class declaration:

class Option {
public:
    Option(unsigned short type); // ctor, used for options constructed during transmission
    Option(unsigned short type, const char* buf, int len); // ctor, used for received options

    virtual char* pack(char* buf, unsigned int len); // writes option in wire-format to buf, returns pointer to first unused byte after stored option
    virtual const char* unpack(const char* buf, unsigned int len); // parses received buffer, returns pointer to first unused byte after parsed option

    virtual unsigned short len(); // returns data length (actual wire format is len()+4 for DHCPv6)
    virtual bool valid(); // returns true if option is valid (e.g. option may be truncated)

    virtual ~Option(); // just to ensure that dtor stays virtual

protected:
    unsigned short type_;
    unsigned int len_;
    char * value_;

    TypeTBD options_; // generic mechanism for suboptions, usually empty
};

  • Q: Do we want to have Option4 and Option6? I think that one unified class is more convenient, but it requires adding an extra parameter to constructors. (This is similar to the universe concept in ISC DHCP. There are universes other than just DHCP4 and DHCP6, e.g. server-specific options. We may decide to reuse this concept in Kea, but I haven't given it much thought yet.) It should be noted that DHCPv4 and DHCPv6 have fundamentally different option layouts (1 octet type/length in v4 vs. 2 octet type/length in v6).
  • Q: Do we want to go with a bool valid() method? Alternatively, we can throw an exception if an option is malformed, truncated or broken in some other unusual way.
  • Q: It would be useful to also pass a parameter that defines the expected option layout. That is similar to format codes in ISC DHCP (see dhcp/common/tables.c). It can be used for option verification and for definition of custom options. For example, A6 means an array of IPv6 addresses, so automated verification checks that the option length is divisible by 16. Defining a separate enum type for this would be more C++-style, but a bit less flexible than leaving it as a string parameter. A string may contain an invalid format, but is more flexible. On the other hand, there is a quite limited number of layouts in use. Which one is preferred for the layout parameter: enum or string?
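
To make the enum variant of the layout question more concrete, here is a rough sketch of what length verification could look like. The OptionLayout enum, its values and both function names are made up for this example. It also shows that the bool and exception styles do not exclude each other: the throwing check is a thin wrapper around the boolean one.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical layout codes; names and the set of values are illustrative.
enum class OptionLayout {
    Opaque,     // arbitrary bytes, any length
    Uint8,      // single 8-bit integer
    Uint16,     // single 16-bit integer
    Uint32,     // single 32-bit integer
    Addr6Array  // array of IPv6 addresses, 16 bytes each
};

// Returns true if an option of the given data length can hold the layout.
bool lengthMatchesLayout(OptionLayout layout, std::size_t len) {
    switch (layout) {
    case OptionLayout::Opaque:     return true;
    case OptionLayout::Uint8:      return len == 1;
    case OptionLayout::Uint16:     return len == 2;
    case OptionLayout::Uint32:     return len == 4;
    case OptionLayout::Addr6Array: return len % 16 == 0;  // empty list passes
    }
    return false;  // unknown layout value
}

// Exception-based variant built on top of the boolean check.
void requireLayout(OptionLayout layout, std::size_t len) {
    if (!lengthMatchesLayout(layout, len)) {
        throw std::length_error("option length does not match its layout");
    }
}
```

With an enum the compiler can warn about a layout that is not handled in the switch. A string parameter would need its own parser and its own error path for invalid format strings.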

An example of derived class, specialized to convey list of IPv6 addresses:

class OptionAddrLst6: public Option {
public:
    OptionAddrLst6(unsigned short type); // for creating empty option
    OptionAddrLst6(unsigned short type, const char* buf, int len); // for parsing received buffer
    OptionAddrLst6(unsigned short type, const std::vector<Addr6>& addrs); // this may be useful instead of using OptionAddrLst6(type) + setAddr(addrs)

    // use pack() and unpack() from base class (or redefine them for putting addresses in addrs_)

    virtual unsigned short len(); // returns data length (actual wire format is len()+4 for DHCPv6)
    virtual bool valid();

    std::vector<Addr6>& getAddr(); // returns reference to vector of addresses
    void setAddr(const std::vector<Addr6>& addrs);

protected:
    std::vector<Addr6> addrs_;
};

It may be somewhat surprising that the option constructor takes a pointer to the actual option content, not to the initial type/length fields. This approach is necessary to instantiate objects of various derived classes (e.g. OptionAddrLst6) based on type: the caller has to read the type first to know which class to construct. Example use case for parsing a received buffer:

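  boost::shared_ptr<Option> opt;
  unsigned short code = ntohs( *((unsigned short*) (buf+pos)));     // option type
  unsigned short length = ntohs( *((unsigned short*) (buf+pos+2))); // option length follows the type
  pos += 4; // skip type and length fields, pos now points to the option content
  switch (code) {
  case 23: // OPTION_DNS_SERVERS (RFC 3646), a list of IPv6 addresses
      opt.reset(new OptionAddrLst6(code, buf+pos, length));
      break;
  default: // everything else is stored as a generic option
      opt.reset(new Option(code, buf+pos, length));
      break;
  }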
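
A slightly more defensive variant of the same idea is sketched below. It is only an illustration: the two structs are simplified stand-ins for the classes described above, parseOption6() is a made-up name, and std::shared_ptr (C++11) is used in place of boost::shared_ptr to keep the example free of external libraries. The type and length fields are assembled byte by byte, which avoids unaligned reads, and both are checked against the buffer size before any object is created.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Simplified stand-ins for the classes described above (not the proposed API).
struct Option {
    Option(std::uint16_t type, const std::uint8_t* data, std::uint16_t len)
        : type_(type), data_(data), len_(len) {}
    virtual ~Option() {}
    std::uint16_t type_;
    const std::uint8_t* data_;  // points into the receive buffer, not owned
    std::uint16_t len_;
};

struct OptionAddrLst6 : Option {
    using Option::Option;  // reuse the base class constructor
    std::size_t count() const { return len_ / 16; }  // number of addresses
};

// Reads one DHCPv6 option starting at buf[pos]. On success, advances pos past
// the option and returns the object; returns nullptr if the option is truncated.
std::shared_ptr<Option> parseOption6(const std::uint8_t* buf, std::size_t buf_len,
                                     std::size_t& pos) {
    if (pos > buf_len || buf_len - pos < 4) {
        return nullptr;  // no room for the type and length fields
    }
    const std::uint16_t code =
        static_cast<std::uint16_t>((buf[pos] << 8) | buf[pos + 1]);
    const std::uint16_t length =
        static_cast<std::uint16_t>((buf[pos + 2] << 8) | buf[pos + 3]);
    const std::size_t content = pos + 4;
    if (buf_len - content < length) {
        return nullptr;  // declared length runs past the end of the buffer
    }
    pos = content + length;
    switch (code) {
    case 23:  // OPTION_DNS_SERVERS (RFC 3646): a list of IPv6 addresses
        return std::make_shared<OptionAddrLst6>(code, buf + content, length);
    default:  // everything else is kept as a generic option
        return std::make_shared<Option>(code, buf + content, length);
    }
}
```

Calling parseOption6() in a loop until pos reaches the end of the buffer (or nullptr is returned) walks through all options of a packet.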


Example usage for preparing buffer for transmission:

  ptr = opt1->pack(buffer, buffer_len); // option is written in wire format to buffer; pointer to the first byte after the stored option is returned
  buffer_len -= opt1->len() + 4; // wire format is 4 bytes (type and length) longer than the data in DHCPv6
  ptr = opt2->pack(ptr, buffer_len); // second option is written right after the first one
  buffer_len -= opt2->len() + 4;
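
For illustration, the body of such a pack() could look roughly like the free function below. This is a sketch under assumptions: packOption6() and its parameters are invented for this example, it handles the DHCPv6 wire format only (2-byte type, 2-byte length, then the data), and it signals lack of space by returning nullptr, which is just one possible convention.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Writes one DHCPv6 option at dst: 2-byte type, 2-byte length, then the data.
// Returns a pointer to the first unused byte after the stored option, or
// nullptr if the option does not fit in the remaining space.
std::uint8_t* packOption6(std::uint8_t* dst, std::size_t space, std::uint16_t type,
                          const std::uint8_t* data, std::uint16_t len) {
    if (space < 4u + len) {
        return nullptr;  // header plus data would overflow the buffer
    }
    dst[0] = static_cast<std::uint8_t>(type >> 8);  // network byte order
    dst[1] = static_cast<std::uint8_t>(type & 0xff);
    dst[2] = static_cast<std::uint8_t>(len >> 8);
    dst[3] = static_cast<std::uint8_t>(len & 0xff);
    if (len > 0) {
        std::memcpy(dst + 4, data, len);
    }
    return dst + 4 + len;
}
```

Since the function returns the position after the stored option, the remaining space can also be derived from the returned pointer (buffer + buffer_len - ptr) instead of being tracked separately.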

To avoid unnecessary copies, the life cycle will roughly look as follows:

  1. allocate buffer for packet (possibly preallocated earlier and reused)
  2. receive packet
  3. parse buffer and instantiate series of Option and Option-derived objects. Most will just store pointer to a specific place within buffer. Lifetime of those Option objects is equal to lifetime of the buffer, so they can be destroyed together with buffer.
  4. once processing is done, we may destroy all Option objects and release buffer (or reuse buffer for next packet to be received).

With that approach, received data is never copied. There may be specific cases when some pieces of data are copied (e.g. a received list of addresses is parsed and stored as vector<Addr6> for more convenient use), but that is expected to be an exception rather than a rule. If we later decide that it is safer and cleaner from an OOP perspective for Option objects to take ownership of the memory they use, a copy on creation may be done. That would require modification of the Option class (and possibly derived classes as well).
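
The no-copy life cycle can be sketched as follows. OptionView and RxPacket are hypothetical names used only here. The packet owns the buffer, and every option is just a view (type, pointer, length) into that buffer, so options and buffer are destroyed together. Copying is disabled, because a copied packet would carry views that still point into the original buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Non-owning view of one option; valid only while the packet buffer is alive
// and unchanged.
struct OptionView {
    std::uint16_t type;
    const std::uint8_t* data;  // points into RxPacket::buffer
    std::uint16_t len;
};

// Hypothetical receive-side packet: owns the buffer and the views into it.
struct RxPacket {
    RxPacket() = default;
    RxPacket(const RxPacket&) = delete;             // views would dangle
    RxPacket& operator=(const RxPacket&) = delete;

    std::vector<std::uint8_t> buffer;   // received bytes, can be reused
    std::vector<OptionView> options;    // filled by parseOptions()

    // Parses DHCPv6 options starting at the given offset. No option data is
    // copied. Returns false if an option is truncated.
    bool parseOptions(std::size_t offset) {
        options.clear();
        std::size_t pos = offset;
        while (pos < buffer.size()) {
            if (buffer.size() - pos < 4) {
                return false;  // truncated type/length header
            }
            const std::uint16_t type =
                static_cast<std::uint16_t>((buffer[pos] << 8) | buffer[pos + 1]);
            const std::uint16_t len =
                static_cast<std::uint16_t>((buffer[pos + 2] << 8) | buffer[pos + 3]);
            pos += 4;
            if (buffer.size() - pos < len) {
                return false;  // truncated option data
            }
            options.push_back(OptionView{type, buffer.data() + pos, len});
            pos += len;
        }
        return true;
    }
};
```

The price of this approach is visible in the comments: any operation that resizes or overwrites the buffer invalidates all views, so the options have to be dropped before the buffer is reused for the next packet.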

Packet structures

There are 2 possible approaches to represent a DHCP packet: define a struct or define a class. Note that structs in C++ can be derived from, have member functions etc. The only language-level difference between struct and class is the default access: members and base classes are public by default in a struct and private by default in a class. Using a class with protected fields and getters/setters is probably cleaner and more OOP-style, but may be minimally slower. We can probably accept that minimal performance loss.

DHCPv4 and DHCPv6 are completely different protocols. The type of information they convey is similar, but packet format, options format, source/destination addresses, state machines, message types and option types are completely different. Therefore it makes no sense to unify the packet format for v4 and v6. It seems reasonable to have separate Pkt4 and Pkt6 structs. Mapping a DHCPv4 client to DHCPv6 is non-trivial (and sometimes just impossible). Nevertheless, server administrators would appreciate it if we could treat v4 and v6 in as similar a way as practically possible. A common core class would be a useful step in that direction.

  • Q: Structures are called Pkt, Pkt4, Pkt6. These names are short, but may be a bit confusing. Proper names would be DhcpPkt, Dhcp4Pkt and Dhcp6Pkt, but they are longer, so a bit less convenient to use. Here's a far-fetched question: do we expect to have anything besides DNS and DHCP in the BIND10 framework? If we do, then proper, longer naming is more appropriate.
class Pkt {
protected:
    // following are meta-information (not packet fields)
    char iface_[64]; // interface identification
    int ifindex_;
    time_t timestamp_;

    unsigned int src_port_;
    unsigned int dst_port_;
};

class Pkt4 : public Pkt {
protected:
    // TBD, will fill this in after DHCPv6 is defined
    Option4Lst options_;
};

class Pkt6 : public Pkt {
protected:
    // on-wire binary packet
    char * data_;
    unsigned int data_len_;
    Addr6 src_addr_;
    Addr6 dst_addr_;
    TypeTBD proto_; // TCP or UDP (there are TCP packets in DHCPv6)

    // parsed packet
    char msg_type_; // message type
    int transid_;   // transaction-id (24 bits on the wire)
    Option6Lst options_; // list of received options
    TypeTBD relays_[];   // list of traversed relays (up to 32)
};

Note: Setters/getters omitted for clarity.
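
To show where msg_type_ and transid_ come from, here is a minimal sketch of parsing the fixed part of a DHCPv6 client/server message: one byte of message type followed by a 3-byte transaction-id, with options after that. Dhcp6Header and parseHeader6() are names invented for this example. Relay messages (RELAY-FORW, RELAY-REPL) use a different header and are not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fixed part of a DHCPv6 client/server message (hypothetical helper type).
struct Dhcp6Header {
    std::uint8_t msg_type;   // e.g. 1 = SOLICIT, 2 = ADVERTISE
    std::uint32_t transid;   // 24-bit transaction-id, stored in the low 3 bytes
};

// Parses the first 4 bytes of a message. Returns false if data is too short.
bool parseHeader6(const std::uint8_t* data, std::size_t len, Dhcp6Header& out) {
    if (len < 4) {
        return false;
    }
    out.msg_type = data[0];
    out.transid = (static_cast<std::uint32_t>(data[1]) << 16) |
                  (static_cast<std::uint32_t>(data[2]) << 8) |
                  static_cast<std::uint32_t>(data[3]);
    return true;
}
```

Options start at offset 4, right after this header.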

  • Q: Naming convention: localPort/remotePort vs srcPort/dstPort. Local/remote naming is more natural for pieces of code that deal with receiving/transmitting. It is also more convenient for replying to a packet - remote stays remote. On the other hand, in other parts of the code that are not directly related to either operation (e.g. relaying message in relay) it is more natural to go with src/dst naming. Which one is preferred?
  • Q: Does Boost.Python provide convenient export of enums? If it does, protocol (and possibly hordes of other enums) can be defined as a nice enum.

Options storage container

There are several aspects that should be taken into consideration:

  1. In DHCPv4, option types are unique. If there is more than one instance of an option with type X, all instances should be concatenated and treated as a single option. This uniqueness may be useful. Many STL containers may be used: vector, deque, list, map.
  2. In DHCPv6, multiple instances of the same type are allowed (and quite frequently used). Lack of uniqueness means that the getOption(int type) approach won't work or will need to return a collection of options. The only reasonable choice seems to be multimap.
  3. There are hooks planned. See DhcpHooks. We should choose a data storage format that could be easily convertible to Python. [1] seems very useful for that purpose.
  4. It may be useful to use shared_ptr (implementation from boost, probably). It is a convenient way to prevent memory leaks and make our lives easier. And once C++0x finally catches on (shared_ptr is part of TR1 that is going to be included in final C++0x spec), we can migrate to standard based solution without requiring extra lib (but my understanding is that we will gradually use more boost, not less). This will ease common operations, e.g. client-id option is copied from client message (e.g. SOLICIT) to server response (e.g. ADVERTISE). With shared_ptr only a single instance of client-id option is necessary. The Option object will be released once server response packet is released.

Taking into consideration all of the above, I propose to use following storage formats:

typedef std::map<unsigned int, boost::shared_ptr<Option> > Option4Lst; // DHCPv4 options
typedef std::multimap<unsigned int, boost::shared_ptr<Option> > Option6Lst; // DHCPv6 options
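
Below is a small sketch of how the DHCPv6 container could be used. It is not a final API: Option is reduced to a bare struct, addOption(), getOptions() and copyOptions() are hypothetical helpers, and std::shared_ptr stands in for boost::shared_ptr so that the example compiles without Boost. It shows the two points from the list above: getOptions() returns a collection because types are not unique, and copying an option between packets copies only the pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Bare-bones stand-in for the Option class.
struct Option {
    unsigned int type;
    std::vector<std::uint8_t> value;
};

typedef std::shared_ptr<Option> OptionPtr;
typedef std::multimap<unsigned int, OptionPtr> Option6Lst;  // DHCPv6 options

void addOption(Option6Lst& opts, const OptionPtr& opt) {
    opts.emplace(opt->type, opt);
}

// Returns all instances of the given type (possibly none).
std::vector<OptionPtr> getOptions(const Option6Lst& opts, unsigned int type) {
    std::vector<OptionPtr> result;
    auto range = opts.equal_range(type);
    for (auto it = range.first; it != range.second; ++it) {
        result.push_back(it->second);
    }
    return result;
}

// Shares all options of the given type with another packet. Only pointers are
// copied; the option data itself exists once. Returns the number of options.
std::size_t copyOptions(const Option6Lst& from, Option6Lst& to, unsigned int type) {
    std::size_t copied = 0;
    auto range = from.equal_range(type);
    for (auto it = range.first; it != range.second; ++it) {
        to.insert(*it);
        ++copied;
    }
    return copied;
}
```

For example, a server could call copyOptions(solicit_opts, advertise_opts, 1) to put the client-id (option code 1) received in SOLICIT into its ADVERTISE. The shared Option object is released once the last packet holding it is gone.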

Relays info

Hmmm, to be determined later. According to [wiki:Relay? support is planned for Useful info:

  1. In DHCPv4, the packet from the server does not travel back through the whole chain of relays. It is sent to the relay agent whose address is in the giaddr field (or directly to the client if giaddr is not set).
  2. In DHCPv6, the packet from the server is sent to the last relay.
  3. Relayed messages in DHCPv6 are encapsulated, e.g. a SOLICIT passing through a relay becomes a RELAY-FORW message that contains a RELAY-MSG option that contains the SOLICIT message in it.
  4. There may be up to 32 relays in DHCPv6.
  5. To simplify relaying support, message will be decapsulated first and normal message (SOLICIT, REQUEST etc.) will be processed.
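
The decapsulation step could be sketched as below. The function and type names are made up for this example. The constants follow the DHCPv6 specification: RELAY-FORW is message type 12, the relay message header is 34 bytes (msg-type, hop-count, link-address, peer-address), and RELAY-MSG is option code 9. The sketch only finds the innermost message; it does not record per-relay information that would be needed to build the RELAY-REPL answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::uint8_t RELAY_FORW = 12;         // message type
const std::uint16_t OPTION_RELAY_MSG = 9;   // option carrying the inner message
const std::size_t RELAY_HEADER_LEN = 34;    // type + hop-count + 2 addresses
const unsigned int MAX_RELAYS = 32;

struct InnerMsg {
    const std::uint8_t* data;  // points into the original buffer
    std::size_t len;
    unsigned int relays;       // number of RELAY-FORW layers removed
};

// Peels RELAY-FORW layers until a non-relay message is found. Returns false
// if the packet is malformed or nested deeper than MAX_RELAYS.
bool decapsulate(const std::uint8_t* data, std::size_t len, InnerMsg& out) {
    unsigned int relays = 0;
    while (len > 0 && data[0] == RELAY_FORW) {
        if (++relays > MAX_RELAYS || len < RELAY_HEADER_LEN) {
            return false;
        }
        std::size_t pos = RELAY_HEADER_LEN;
        bool found = false;
        while (len - pos >= 4) {  // scan this relay's options
            const std::uint16_t code =
                static_cast<std::uint16_t>((data[pos] << 8) | data[pos + 1]);
            const std::uint16_t olen =
                static_cast<std::uint16_t>((data[pos + 2] << 8) | data[pos + 3]);
            pos += 4;
            if (len - pos < olen) {
                return false;  // truncated option
            }
            if (code == OPTION_RELAY_MSG) {
                data += pos;   // continue with the encapsulated message
                len = olen;
                found = true;
                break;
            }
            pos += olen;
        }
        if (!found) {
            return false;  // RELAY-FORW without RELAY-MSG option
        }
    }
    if (len == 0) {
        return false;
    }
    out.data = data;
    out.len = len;
    out.relays = relays;
    return true;
}

// Test helper: wraps a message the way a relay would (addresses left zeroed).
std::vector<std::uint8_t> wrapInRelayForw(const std::vector<std::uint8_t>& inner,
                                          std::uint8_t hop_count) {
    std::vector<std::uint8_t> out(RELAY_HEADER_LEN, 0);
    out[0] = RELAY_FORW;
    out[1] = hop_count;
    out.push_back(static_cast<std::uint8_t>(OPTION_RELAY_MSG >> 8));
    out.push_back(static_cast<std::uint8_t>(OPTION_RELAY_MSG & 0xff));
    out.push_back(static_cast<std::uint8_t>(inner.size() >> 8));
    out.push_back(static_cast<std::uint8_t>(inner.size() & 0xff));
    out.insert(out.end(), inner.begin(), inner.end());
    return out;
}
```

After decapsulate() succeeds, out.data points at the normal message (SOLICIT, REQUEST etc.) inside the received buffer, so it can be processed without copying.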


Python export

There are external hooks planned. See DhcpHooks. The data format used should be easily convertible to at least a Python format. A brief check shows that Boost.Python looks like a nice candidate, but I found in one place that it does not support Python3 yet. That particular piece of documentation was quite old, however.

Last modified on Aug 11, 2011, 2:10:41 PM