Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#5014 closed task (complete)

Prototype parser implementation using bison

Reported by: tomek Owned by: tomek
Priority: high Milestone: Kea 1.2 - Mozilla Milestone 1
Component: remote-management Version: git
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DHCP Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 40 Internal?: no

Description (last modified by tomek)

Major goal of the 1.2 milestone is to significantly refactor or rewrite parser configuration. This ticket covers an experiment: try to implement bison grammar that will cover some basic capabilities. Bison worked well for us when we needed something to parse classification expressions. Its maintenance is easy, the code is readable and easy to extend. Since we already use bison, using it in a different place does not bring any new dependencies.

The code should cover the following:

  • Dhcp6 clause
  • Interfaces definition
  • Lease-database (only memfile is fine)
  • timers (on a global level)
  • Subnet6 structures
  • Pool structure

This should be sufficient to run bare bones configuration. The code does not need any new unit-tests, but existing tests should be runnable when possible.

After that code is written, it should be assessed whether the major issues of the old implementation are addressed:

  1. parsers are overwhelmingly complex. Even though each parser is relatively simple class, the complexity comes from too large number of interacting parsers.
  2. the split into build/commit never worked well. In particular, it is not trivial to revert configuration change
  3. the code is disorganized, i.e. spread out between multiple directories (src/bin/dhcp6 and src/lib/dhcpsrv/parsers).
  4. there is no way to generate a list of all directives
  5. it's exceedingly complex to add comments that don't start at the first column or span several lines
  6. current parsers don't handle the default values, i.e. if there's no directive, the parser is not created at all. We have kludgy workarounds for that, but the code for it is in different place than the parser, which leads to the logic being spread out in different places.

The new code should be examined against the following requirements:

  • it should be possible to parse part of the configuration
  • it must be possible to read the configuration data from a source different than a local file on disk
  • the code must be more readable compared to old parsers

The requirements mentioned don't have to be implemented in the prototype, but there must be a way to implement them. For example, it's ok for the prototype to read data from a file only as long as it's possible to extend it in the future to read it from somewhere else (e.g. a socket).

This code is not expected to be merged. It's a throw away prototype. Once this prototype is done, the lessons will be used to design actual parser. There will be a separate ticket for that.

Subtickets

Change History (21)

comment:1 Changed 3 years ago by tomek

  • Description modified (diff)

comment:2 follow-up: Changed 3 years ago by fdupont

I have a concern about this: currently the Kea configuration has 2 syntaxes at different levels: one is a subset of JSON (BTW there are a lot of JSON grammars for bison), one is described by this ticket, and IMHO the second is not suitable for a direct use of bison.

comment:3 in reply to: ↑ 2 ; follow-up: Changed 3 years ago by tomek

Replying to fdupont:

I have a concern about this: currently the Kea configuration has 2 syntaxes at different levels: one is a subset of JSON (BTW there are a lot of JSON grammars for bison), one is described by this ticket, and IMHO the second is not suitable for a direct use of bison.

Ack. We definitely need a way to parse generic JSON structures (for example to parse arbitrary structures in hook parameters or to parse commands incoming over command channel).

But I think using bison to parse the configuration file is ok. I did that for dibbler and it worked fine. It was really easy to extend parsers in Dibbler and also get a complete list of supported parameters. Here's an example:

https://github.com/tomaszmrugalski/dibbler/blob/master/SrvCfgMgr/SrvParser.y

Don't laugh, I wrote large parts of that code over 10 years ago. Kea's equivalent will be much cleaner.

Last edited 3 years ago by tomek (previous) (diff)

comment:4 Changed 3 years ago by tomek

Additional information is available in ConfigParseDesign, which is the document describing previous attempt at parsers improvement.

comment:5 Changed 3 years ago by tomek

  • Owner set to tomek
  • Status changed from new to assigned

comment:6 in reply to: ↑ 3 Changed 3 years ago by fdupont

Replying to tomek:

Replying to fdupont:

I have a concern about this: currently the Kea configuration has 2 syntaxes at different levels: one is a subset of JSON (BTW there are a lot of JSON grammars for bison), one is described by this ticket, and IMHO the second is not suitable for a direct use of bison.

Ack. We definitely need a way to parse generic JSON structures (for example to parse arbitrary structures in hook parameters or to parse commands incoming over command channel).

=> one of the things I'd like to investigate is to replace the current home made JSON parser by a flex+bison one: first I think than automata used by flex/bison are faster, second this should lead to cleaner code. BTW the main user visible difference should be in the "location" for errors: flex/bison gives begin and end columns...

But I think using bison to parse the configuration file is ok. I did that for dibbler and it worked fine. It was really easy to extend parsers in Dibbler and also get a complete list of supported parameters.

=> you can do what you want using bison (or using flex, e.g., doxygen is flex based). This doesn't mean it is the best.
About parameters (aka keywords) there are pros and cons:

+ far easier to find unknown stuff
+ names are compared (and in an optimized way) only once and after you can use a switch (vs a chain of if compare)

  • you lose shared parse of common structures, e.g., lists
  • keywords get a global scope, e.g. a DHCPv4 parameter name can conflict within the DHCPv6 config.

I have to write the AFTER config comment I talked about last week. There should be something about the Juniper config style (I don't know how it was done but it was really nice to use).

comment:7 Changed 3 years ago by tomek

  • Description modified (diff)

comment:8 Changed 3 years ago by tomek

  • Component changed from configuration to remote-management

comment:9 Changed 3 years ago by fdupont

IMHO we should use only flex with context stuff. It should be faster (can't be better than a finite state automaton in general cases) and has no issue with keywords. The only drawback is it can be hard to read the .ll file without a good organization. BTW we have an example: doxygen is 100% flex based and can scan some complex languages as C or C++. So it is feasible...

comment:10 Changed 3 years ago by tomek

  • Owner changed from tomek to Unassigned
  • Status changed from assigned to reviewing

comment:11 follow-up: Changed 3 years ago by tomek

This ticket is now under review. Note this code IS NOT expected to be merged in its current state. The goal of this review is to validate the general approach.

An e-mail has been sent to the team asking for feedback.

comment:12 in reply to: ↑ 11 Changed 3 years ago by fdupont

Replying to tomek:

An e-mail has been sent to the team asking for feedback.

=> sent my comments as an answer to that e-mail.

comment:13 Changed 3 years ago by tomek

  • Owner changed from Unassigned to fdupont

comment:14 Changed 3 years ago by fdupont

Fixed %empty reported by bison -W.
Fixed tests including // and /* */ unit tests.
Added <?include "filename"?> with unit test. BTW it is a pure lexical include: the include file can produce 2 objets or be unphased, i.e., the input flow is diverted. IMHO this matches the proposed (by Tomek) syntax. Recursion is limited to 10.
There are a zillion of unit tests to add (but not now as it is a proof of concept).
Next step could be to solve the keyword issue, cf https://www.gnu.org/software/bison/manual/html_node/Lexical-Tie_002dins.html#Lexical-Tie_002dins for a possible solution. Note today the JSON sub-parser is not enable to parse a segment of the dhcpv6 server config...

comment:15 Changed 3 years ago by fdupont

  • Owner changed from fdupont to UnAssigned

I did all the urgent things I wanted. Ready for other eyes?

comment:16 Changed 3 years ago by fdupont

As now we use syntactic contexts to recognize keywords we can go a step further: add a keyword as a new Element subclass encapsulating an enum. This allows to replace chains of if (name == "constant")}} by a {{{switch: cleaner, faster, etc.
Note it works better (bit doesn't require) to have a global list of keywords.

Another point: we can easily improve error handling, for instance for a JSON map:

my_map: KW1 ":" value
 | KW2 ":" value
 ;

can be changed into:

my_map: KW1 ":" value
 | KW2 ":" value
 | STRING { ctx.error("unknown keyword $1 in my_map"); }
 ;

or if we collect all errors:

my_map: KW1 ":" value
 | KW2 ":" value
 | STRING ":" value { print_error("unknown keyword $1 in my_map"); }
 ;

comment:17 Changed 3 years ago by tomek

  • Resolution set to complete
  • Status changed from reviewing to closed
  • Total Hours changed from 0 to 40

Francis, thanks a lot for the improvements you made and for the suggestions.

Updating the JSON parser what phase 1 of the cleanup. I have now pushed the code for phase 2, which covers dealing with the existing configuration parsers complexity (all those DhcpConfigParser? derivatives). That code is available on trac5014_phase2 branch, if you're interested.

I consider this prototype complete. It sufficiently advanced to prove that the concept is reasonable and will work. As such, I'm closing this ticket.

Having said that, the code will continue to be worked on. I will describe it in #5015 (probably on as a wiki page), and then start turning it into production code as part of #5036 and following tickets. The comments made above by Francis will be addressed in #5036 and following tickets.

comment:18 Changed 3 years ago by tomek

  • Resolution complete deleted
  • Status changed from closed to reopened

comment:19 Changed 3 years ago by tomek

  • Owner changed from UnAssigned to tomek
  • Status changed from reopened to assigned

comment:20 Changed 3 years ago by tomek

  • Resolution set to complete
  • Status changed from assigned to closed

comment:21 Changed 3 years ago by stephen

  • Milestone changed from Kea1.2 to Kea 1.2 - Mozilla Milestone 1

Moved to milestone "Kea 1.2 - Mozilla Milestone 1" as a result of Kea meeting on 15 December 2016

Note: See TracTickets for help on using tickets.