Changes between Version 13 and Version 14 of SimpleParser

Aug 20, 2018, 11:21:34 PM (15 months ago)



  • SimpleParser

    v13 v14  
    11''This document is a draft and is a subject for substantial changes''
    22= SimpleParser: A design for re-configuration of Kea Configuration Parsers, attempt !#2 =
    3 For earlier attempt, see ConfigParseDesign.
    5 == the pains of the current code ==
    6 There's unanimous agreement among the team (and some external powers users as well) that the current parser code is difficult to work with and a source of general unhappiness. To be more specific, the following issues and concerns were identified:
    8 1. '''Complexity.'''. Existing parsers are overwhelmingly complex. Even though each parser derived from DhcpConfigParser is relatively small class, the complexity comes from too large number of interdependent parsers.
    9 2. '''Dispersed.''' The code is disorganized, i.e. spread out between multiple directories (src/bin/dhcp6 and src/lib/dhcpsrv/parsers).
    10 3. '''Build/commit fiasco.''' The split into build/commit never worked well. The underlying concept of having intermediate configuration storage (created during build) that gets validated and then, if validation was successful, is commited, never worked in DHCP. It was abused beyond reason. Also, due to dependency of certain parameters (e.g. option definitions had to be already parsed when option data was parsed) many parsers moved the whole logic to build(), and commit() was not used at all.
    11 4. '''No rollback capability.''' It is not trivial to revert configuration change. This split originates from BIND10 days and was inherited from DNS auth that did receive only changes in the configuration, rather than the full configuration. As a result, the split was abused and many of the parsers have commit() being a no-op operation. Several years ago, Marcin implemented great feature of staging configuration that was added to the parsers as afterthought.
    12 5. '''List of parameters.''' There is no way to generate a list of all directives. We do have .spec files, but they're not actually used by the code. The code has the directives spread out in multiple places in multiple files in mutliple directories. Answering a simple question ("can I do X in the scope Y?") requires a session of reverse engineering. What's worse, we have the .spec files that are kinda kept up to date. This is actually more damaging that way, because there's no strict correlation between the code and a .spec file. So there may be parameters that are supported, but are not in .spec files. The opposite is also true - .spec files can be buggy and have different parameters. This is particularly true for default values.
    13 6. '''Comments and file inclusions.''' It's exceedingly complex to add comments that don't start at the first column or span several lines. Both Tomek and Marcin tried to implement it, but failed miserably. The same is true for including files (have include statement in the config that includes other files.
    14 7. '''Default values.''' The current parsers don't handle the default values, i.e. if there's no directive, the parser is not created at all. We have kludgy workarounds for that, but the code for it is in different place than the parser, which leads to the logic being spread out in different places.
    15 8. '''Unclear inheritance.''' It is exceedingly difficult to understand which parameters are inherited from which scopes. Kea development team can probably answer the question, but it is impossible to answer such question without deep understanding of the code.
    16 9. '''Unnecessary intermediate storage.'''. Due to various reasons, we have a total of four stages of configuration processing: first converts string into JSON structures (a tree of Element instances), the second stage creates parsers, the third stage call build/commit on those parses to put the data into intermediate storage (a contraption of various StringStorage, Int32Storage and the like) and then finally, the fourth step is to use that intermediate storage to create actual configuration structures used by the DHCP code.
    17 10. '''Unnecessary O(n) memory complexity.''' If we have 1000 host definitions in the config file, we create a separate parser instance for each host reservation. This is wasteful memory utilization and can become a problem for huge configuration.
    18 11. '''Syntax checking.''' Syntax checking is poor. The existing JSON parser allowed things like empty option-data entries like this:
    19 {{{
    20 "option-data": [ {} ]
    21 }}}
    22   or having trailing commas:
    23 {{{
    24 "option-data": [
    25     {
    26         "code": 12,
    27         "data": "2001:db8:1:0:ff00::1"
    28     },
    29 ]
    30 }}}
    31   or having incorrect types, e.g.
    32 {{{
    33 "renew-timer": "forevah"
    34 }}}
    35 12. '''No includes''' It was next to impossible to extend the current code to provide an ability to include files. Several people (including those posting to kea-users) complained that there is no way to include files from config files.
    37 == Proposed solution ==
    38 To solve those issues a two phase approach was proposed:
    40 PHASE 1: replace isc::data::fromJSON with bison-based parser. This will address the following issues:
    42 1. allow to have a single file that defines the actual syntax, much better syntax checking
    43 2. provide more flexibility, in particular allow various comment types and file inclusions
    44 3. the code is more compact
    45 4. the whole grammar is defined in one file (in two if you need token definitions). This functionally replaces the .spec files. The major advantage over spec files is that bison grammar is actually enforced during parsing (as .spec files were not used during parsing, its content may have been incorrect or outdated and nobody would detect that). The grammar can be used as a source of list of parameters.
    46 5. The lexer code provides support for several nice features: bash comments (# - we had that before, but the old code required comment to start at the first column, now it can start anywhere), C comments (because some people like that comment style) and C++ comments (because commenting out multiple lines is easy). What's more important, the new code allows file inclusions. Anywhere in the JSON file or string you can add {{{<? include "second-file.json"?>}}} and the content of that file will be included. There is a limit of 10 nested files, so we can recover from stupid mistakes like recursive file inclusions.
    48 As a result, the parser returns JSON structures that are guaranteed to be syntatically correct. In case of mistakes, Bison returns very useful, context specific errors, e.g. "unexpected :, expected } or , in line X, column Y".
    50 PHASE 2: replace existing !DhcpConfigParser classes with simpler replacement that does the following:
    52 1. must be significantly less complex than existing code
    53 2. does not use build/commit approach
    54 3. must have rollback capability (i.e. use staging config feature implemented in !CfgMgr)
    55 4. must provide a clear list of default values
    56 5. must provide a clear list of inherited values
    58 The following text explains how those goals were achieved:
    60 == PHASE 1: Syntactic bison-based parser ==
    62 The code is implemented in two files: src/bin/dhcp6/dhcp6_lexer.ll, which defines regular expressions that are used on the input (be it a file, a string in memory or perhaps one day data coming from a socket, and src/bin/dhcp6/dhcp6_parser.yy, which defines the syntax (or 'grammar' in bison nomenclature). This is a structured file that is easy to read. Here are some examples of used:
    64 {{{
    65 // Defines global Dhcp6 object. It starts with "Dhcp6", followed by : and {, followed
    66 // by a list of global parameters and closed with }.
    67 dhcp6_object: DHCP6 COLON LCURLY_BRACKET global_params RCURLY_BRACKET;
    69 // Global parameters could be one or more global parameters.
    70 // This list is defined recursively.
    71 global_params: global_param
    72 | global_params COMMA global_param;
    74 // These are the parameters that are allowed in the top-level for
    75 // Dhcp6.
    76 global_param
    77 : preferred_lifetime
    78 | valid_lifetime
    79 | renew_timer
    80 | rebind_timer
    81 | subnet6_list
    82 | interfaces_config
    83 | lease_database
    84 | hosts_database
    85 | mac_sources
    86 | relay_supplied_options
    87 | host_reservation_identifiers
    88 | client_classes
    89 | option_data_list
    90 | hooks_libraries
    91 | expired_leases_processing
    92 | server_id
    93 | dhcp4o6_port
    94 ;
    95 }}}
    97 Working code that demonstrates this capability for Kea-dhcp6 is available on trac5014 branch.
    99 == PHASE 2: SimpleParser ==
    101 To radically simplify the parsers, a !SimpleParser class has been defined. It is intended to replace DhcpConfigParser. An important factor here is that the existing code can be migrated one parser at a time. It has a number of characteristics:
    103 1. Simplicity. Each parser provides a simple interface, e.g.
    104 {{{
    105 CfgOptionPtr parse(ConstElementPtr json)
    106 }}}
    107   This has the major benefit of not needing any intermediate storages. The JSON structures (returned by bison parser or coming from any other source) are directly converted to the structures used by configuration manager and the DHCP code. Since there is no intermediate storage, the code becomes much simpler and {String,Int,Boolean}Storage is no longer used.
    108 2. The code is almost stateless. The only state kept in most cases is whether the parsing is done in v4 or v6 context. This greatly simplifies the parsers (no contexts, no child parsers list, no separate storage for uint32, strings etc. In fact, there's so little state kept, that this parser is mostly a collection of static methods.
    109 3. No optional parameters (all are mandatory). This simplifies the parser, but introduces a new step before parsing where we insert the default values into client configuration before parsing. This is actually a good thing, because we now have a clear picture of the default parameters as they're defined in a single place (the !DhcpConfigParser had the defaults spread out in multiple files in multiple directories).
    110 4. Clearly defined default values. There is a single place where all default values are specified. This is easily readable by people who are not familiar with the code.
    111 5. Clearly defined inherited values. There is a single place where all the inherited values are specified. This is easily readable by people who are not familiar with the code.
    112 6. Control over the order in which the structures are parsed. The order of parsing is determined by the order the parsers are called. It's easy to call specific parser earlier than others.
    113 7. Better memory complexity. Since the parsers are almost stateless, we can reuse the same parser instance for many elements, e.g. only one parser for 1000s of host reservations. A side effect is that we don't need pointers to parsers any more. That's yet another simplification.
    115 Here is an example of a parser that parses option definition:
    116 {{{
    117 #!c++
    118 /// @brief Parser for a single option definition.
    119 ///
    120 /// This parser creates an instance of a single option definition.
    121 class OptionDefParser : public SimpleParser {
    122 public:
    123     /// @brief Constructor.
    124     ///
    125     /// @param address_family Address family: @c AF_INET or AF_INET6
    126     OptionDefParser(const uint16_t address_family);
    128     /// @brief Parses an entry that describes single option definition.
    129     ///
    130     /// @param option_def a configuration entry to be parsed.
    131     /// @return tuple (option definition, option space) of the parsed structure
    132     ///
    133     /// @throw DhcpConfigError if parsing was unsuccessful.
    134     OptionDefinitionTuple parse(isc::data::ConstElementPtr option_def);
    136 private:
    137     /// @brief Address family: @c AF_INET or @c AF_INET6
    138     uint16_t address_family_;
    139 };
    140 }}}
    142 Parsing the structures is easy. The !SimpleParser provides several convenience methods for obtaining parameters: getInteger, getString, getBoolean. This is a slightly simplified code code that parses option defintion:
    144 {{{
    145 #!c++
    146 std::pair<isc::dhcp::OptionDefinitionPtr, std::string>
    147 OptionDefParser::parse(ConstElementPtr option_def) {
    149     // Get mandatory parameters.
    150     std::string name = getString(option_def, "name");
    151     uint32_t code = getInteger(option_def, "code");
    152     std::string type = getString(option_def, "type");
    154     // Get optional parameters. Whoever called this parser, should have
    155     // called SimpleParser::setDefaults first.
    156     std::string space = getString(option_def, "space");
    157     std::string encapsulates = getString(option_def, "encapsulate");
    159     def.reset(new OptionDefinition(name, code, type, encapsulates));
    160     return make_pair(def, space);
    161 }
    162 }}}
    164 An important point here is that the code is only slightly tweaked code that used to be in build() or commit() methods. This way we retain all the little tweaks that we accumulated over the years (like not requiring option data if the option is empty, requiring only one of option name or option code for defined options etc).
    166 Another simplification comes from the fact that the parsers don't deal with the default values as all parameters are mandatory. Of course expecting users to always specify everything would be a bad idea, therefore there's a code that fills in the defaults with a very simple call:
    167 {{{
    168 #!c++
    169     /// @brief Sets option defaults for a single option
    170     ///
    171     /// This method sets default values for a single option.
    172     ///
    173     /// @param option an option data to be filled in with defaults.
    174     /// @param v6 is it v6 (true) or v4 (false) option?
    175     /// @return number of default values added
    176     static size_t setOptionDefaults(isc::data::ElementPtr option, bool v6);
    177 }}}
    179 The code internally uses an array defined in the following way:
    180 {{{
    181 #!c++
    182 /// This table defines the default values for option definitions in DHCPv6
    183 const SimpleDefaults OPTION6_DEF_DEFAULTS = {
    184     { "record-types", Element::string,  "" },
    185     { "space",        Element::string,  "dhcp6" },
    186     { "array",        Element::boolean, "false" },
    187     { "encapsulate",  Element::string,  "" }
    188 };
    189 }}}
    191 Such an array is easily readable, even by people who have no clue about C++. In the future a solution could be engineered that would generate parts of the User's Guide automatically based on that array. Automatically generated documentation ensures no discrepancies between the code and docs.
    193 Finally, there's the capability to derive parameters between scopes. This is a powerful mechanism. So far the envisaged usage is for deriving timer values from global to the subnet scope, but there are many other applications possible (options for subnets and pools, next-server and bootfile from global to subnet to possibly host reservation etc.). This mechanism is implemented using the following interface:
    195 {{{
    196 #!c++
    197     /// @brief Derives (inherits) parameters from parent scope to a child
    198     ///
    199     /// This method derives parameters from the parent scope to the child,
    200     /// if there are no values specified in the child scope. For example,
    201     /// this method can be used to derive timers from global scope (e.g. for
    202     /// the whole DHCPv6 server) to a subnet scope. This method checks
    203     /// if the child scope doesn't have more specific values defined. If
    204     /// it doesn't, then the value from parent scope is copied over.
    205     ///
    206     /// @param parent scope to copy from (e.g. global)
    207     /// @param child scope to copy from (e.g. subnet)
    208     /// @param params names of the parameters to copy
    209     /// @return number of parameters copied
    210     static size_t deriveParams(isc::data::ConstElementPtr parent,
    211                                isc::data::ElementPtr child,
    212                                const ParamsList& params);
    213 }}}
    215 The code uses an array defined in the following way:
    216 {{{
    217 #!c++
    218 /// This array defines a list of parameters that can be inherited
    219 /// from the global to the subnet scope.
    220 const ParamsList GLOBAL_TO_SUBNET = {
    221     "renew-timer",
    222     "rebind-timer",
    223     "preferred-lifetime",
    224     "valid-lifetime"
    225 };
    226 }}}
    228 Again, this is a piece of code that can be easily read by average users not skilled in C++ programming.
    230 == Proof of concept code ==
    232 There is a working proof of concept code. Code for phase 1 is available on trac5014 branch. Code for
    233 phase 2 (that also includes code for phase 1) is on trac5014_phase2 branch. Note that some parts of the code
    234 require C++11. Make sure you compile it with -std=c++11.
    236 Notable pieces of the code:
    238 1. There's an overview for the bison parser in src/bin/dhcp6/dhcp6.dox. This is recommended first read for people who are not familiar with bison.
    239 2. The details of simple parser are documented in !SimpleParser in src/lib/dhcpsrv/parsers/simple_parser.h
    240 3. An example usage of the parser was added in 813f8529dad949c7c63be11645d7955e36478a78. Four parsers were converted to !SimpleParser: !OptionDataParser, !OptionDataListParser, !OptionDataDefParser, and !OptionDataDefListParser. You may review those classes and how they are used. The aforementioned commit implements the changes in src/bin/dhcp{4,6}/
    241 4. As this is proof of concept and mostly refactoring, the number of new unit-tests is limited. One notable new unit-test is !ParserTest.file (see src/bin/dhcp6/tests/ that loads all example configs from doc/examples/kea6.
    243 This code as written with some extra work (unit-tests and applying similar changes to Dhcp4) will address #5014, #5015, #5036, #3450, #3875, #3960 (phase 1) and #2646, #3998, #3766 and #5039 (phase 2).
    245 == Future improvements ==
    246 1. Since the syntax is enforced by the parser, it duplicates the functionality of .spec files. We may consider whether we want to keep spec files or ditch them.
    247 2. Francis made several useful comments in #5014. Those should be looked at and implemented in one of the upcoming tickets.
    248 3. Since existing parsers became very simple (for example OptionDefParser and OptionDefListParser each consist of a single method), we can consider merging and decrease the number of independent classes. This would further simplify the code.
    249 3. your suggestion goes here. ;-)
    251 == Implementation progress ==
    252 - #5014 - implement a prototype
    253 - #5015 - design parser refactoring
    255 Dhcp6:
    256 - #5036 - Define Dhcp6 grammar (bison)
    257 - #5037 - Migrate Dhcp6 and Subnet6 to bison parser
    258 - #5038 - Dhcp6: Migrate interfaces configuration
    259 - #5039 - Dhcp6: Migrate option values and definitions configuration
    260 - #5040 - Dhcp6: migrate host reservations configuration
    261 - #5041 - Dhcp6: Migrate hook libraries configuration
    262 - #5042 - Dhcp6: Migrate MAC sources, control socket and relay information parsers
    263 - #5043 - Dhcp6: Migrate D2 (DHCP-DDNS) configuration.
    264 - #5044 - Dhcp6: Migrate DUID configuration
    265 - #5045 - Dhcp6: Migrate lease expiration configuration
    267 Dhcp4:
    268 - #5017 - Define Dhcp4 grammar (bison)
    269 - #5019 - Migrate Dhcp4 and Subnet4 to the bison parser
    270 - #5020 - Dhcp4: migrate interfaces configuration to new parser
    271 - #5021 - Dhcp4: Migrate option values and definitions configuration to new parser
    272 - #5030 - Dhcp4: Migrate host reservations configuration
    273 - #5031 - Dhcp4: Migrate hook libraries configuration
    274 - #5032 - Dhcp4: Migrate MAC sources, control socket, relay
    275 - #5033 - Dhcp4: Migrate D2 configuration
    276 - #5034 - Dhcp4: Migrate DUID configuration
    277 - #5035 - Dhcp4: Migrate lease expiration configuration
    279 Also see dedicated milestone for this:
     4Migrated to Gitlab. Please see: