Opened 10 years ago

Closed 9 years ago

#175 closed task (complete)

Loadzone: Add verbose options to exactly what is happening with loadzone

Reported by: zhanglikun Owned by: shentingting
Priority: medium Milestone: 05. 3rd Incremental Release: Serious Secondary
Component: loadzone Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity:
Sub-Project: Feature Depending on Ticket:
Estimated Difficulty: Add Hours to Ticket:
Total Hours: Internal?:

Description

Create ticket for task 89. The following part is the requirement for this task provided by Jeremy.

task 89: loadzone:verbose option to exactly what is happening with
name cleanup based on origin, and %done while running

Question: What does ?name cleanup? mean here?

I am guessing that "name cleanup" is to report what was derived for the origin (in the case that the origin is not defined on the command line).

We talked about various different "verbose" options for b10-loadzone, but for now:

  • report to STDERR
  • report origin name
  • show count of how many RRsets or records were loaded
  • report where the data is stored to (for now is a sqlite database file)
  • if possible, report whether or not a new database is created or an existing one is updated
  • maybe report whether a new zone was created or an existing zone was replaced
  • maybe periodically show a count of how many RRSets or records have been loaded

"%done" is for periodically showing the percentage of the master file data that is loaded until complete. I am not sure how relevant or easy that is to do now as I don't know if it pre-parses the loaded data to get something to compare with. It is probably not required in the first "verbose" option.

Subtickets

Attachments (5)

b10-loadzone.py.in (2.9 KB) - added by shentingting 10 years ago.
diff_b10-loadzone.txt (1.6 KB) - added by shentingting 10 years ago.
master.2.py (21.5 KB) - added by shentingting 10 years ago.
master.py (21.5 KB) - added by shentingting 10 years ago.
diff_master.txt (14.1 KB) - added by shentingting 10 years ago.

Download all attachments as: .zip

Change History (20)

Changed 10 years ago by shentingting

Changed 10 years ago by shentingting

comment:1 Changed 10 years ago by shentingting

  • Owner changed from zhanglikun to each
  • Status changed from new to reviewing

hello Evan:

I just finished the loadzone`s code, please help me to review it. I mainly added two function in master.py file for verbose feature, and fixed bug25. Until now i did not write any test case, but I tested it by hand. I do not know whether I should add some unit test cases.
Any suggestions?

Changed 10 years ago by shentingting

Changed 10 years ago by shentingting

Changed 10 years ago by shentingting

comment:2 follow-up: Changed 10 years ago by jreed

I tested loadzone from branches/tingting-loadzone at revision 1901.

The "including" line is printed even if not interactive.

Also carriage returns are printed. Maybe that is not needed when printing the "included" line.

If included zone filename is too long, it is cropped. So may be hard to read with long path. Maybe crop in middle and add ... dots? or just show basename of filename?

Maybe add a space after "seconds(s)" or before "(.*%" percentage)

Also for verbose, it may be useful to know if this data is overwriting or replacing existing zone or existing records.

I am unclear what the % percentage means. Sometimes I finishes with 0%, 100% or 99%. I didn't look at code to figure this out.

Maybe add a space after colon in error messages (so doesn't have text immediately after colon).

Overall looks great!

comment:3 in reply to: ↑ 2 ; follow-up: Changed 10 years ago by shentingting

Replying to jreed:

I tested loadzone from branches/tingting-loadzone at revision 1901.

The "including" line is printed even if not interactive.


hello jeremy.

Could you tell me how to test the loadzone in non-interactive! I tested it

like this:b10-loadzone include.db > /dev/null. There is no "including" line

printed. I checked the code and fixed this bug.

comment:4 in reply to: ↑ 3 Changed 10 years ago by jreed

Replying to shentingting:

Could you tell me how to test the loadzone in non-interactive! I tested it

like this:b10-loadzone include.db > /dev/null. There is no "including" line

printed. I checked the code and fixed this bug.

Yes, it is now fixed.

Tested like:

/home/reed/opt/bind10-loadzone/bin/b10-loadzone zone.foo | tee ~/J123

comment:5 Changed 10 years ago by shentingting

  • Owner changed from each to shane

hello shane.

I have already committed the loadzone code, please help me to review it! It is under branches/tingting-loadzone directory.


comment:6 Changed 10 years ago by shane

  • Owner changed from shane to zhanglikun

I haven't ordered these items. Some are minor, some require quite a bit of reworking. I'm not sure the best strategy given that we have a code freeze today. :(


Can you double-check the TODO file? With this update:

  • you can add origin to $INCLUDE
  • any line can end with a comment (this is done by cleanup() in master.py)
  • the code now issues an error if it can't figure out the origin (rather than using '.')
  • the verbose option gives lots of information

I think the last two TODO items are still open.


I think that there should be a newline, "\n", at the end of the output line:

Loading file "ttl1.db"

Otherwise this disappears immediately.


The output:

including "inclsub.db" from "include.db"

Should be capitalized:

Including "inclsub.db" from "include.db"

To make it consistent with the rest of the output.


I'm not so happy with how the tests work. I'd like to be able to run quick tests, but that is not possible with the current testing framework.

  • The tests do not run out of the source tree (maybe they make a directory in /tmp and use that via the "-d" option?)
  • There is no "pytest" target in Makefile.am (see src/bin/bind10/Makefile.am for a possible example)
  • I am not sure we should install these, so probably the EXTRA_DIST should be removed from the Makefile.am
  • Using "dig" to check the results means we have to have a server running. I think it would be best to check these using the data source API. It also means we need to have "xfrout" installed. I realize we don't have a Python wrapper now so this is not easy, so I guess this is okay for now.
  • Typo in tests: extenstion -> extension

I think some of these can be fixed easily, and should be.

I see that there is a separate ticket for this, #174. I guess some minor fixes from the above list should be done, and the rest of the work should go on in that ticket (copying this there).


b10-loadzone.py.in:

The "Committing changes... done" output is bogus. The idea was to start the output before committing, then finish it when the output was done:

    sys.stdout.write("Committing changes...")
    sys.sdtout.flush()
    # do the work of committing changes... the stuff at the end of sqlite3_ds.py
    sys.stdout.write("done.\n") 

We can either refactor the code to make the final write to disk a separate function, or output directly from that code (not recommended), or simply remove this output. I think that removing is the best option for now.


master.py:

Typo: formal -> format

The parse_ttl() function should probably use the re.findall() method rather than re.split(), maybe something like:

def parse_ttl(s):
    if not is_ttl(s):
        raise MasterFileError('Invalid TTL: ' + s)
    for ttl_expr in re.findall('\d+[wdhms]?', s, re.I):
        ttl = ttl_expr[:-1]
        suffix = ttl_expr[-1].lower()
        if suffix == 'w': 
            . 
            .
            .

It's simpler to read, I think.

In the records() function, I don't understand why the oldsize variable is needed. Since size is never modified, can't you just do something like the following?

    for line in input:
        size = len(line)
          .
          .
          .
        yield ret, size

There may be something about how yield works that I don't understand of course...

In the __init__() function for the MasterFile class, we should use fstat() rather than stat(). There are two reasons for this:

  1. The file may be overwritten with a new one in between the time the stat() and open() are called.
  2. More importantly, there may be an error with the stat() function. All sorts of file system errors could happen here - permissions problems, file not found, and so on. With fstat() you are much less likely to have a problem.

I recommend removing the stat() call and replacing it with something like:

        try:
            self.__zonefile = open(filename, 'r')
        except:
            raise MasterFileError("Could not open " + filename)
        self.__filesize = os.fstat(self.__zonefile.fileno()).st_size 

In the __status() function, you may have a problem if the file is 0 bytes. This is because of this line:

        percent = (self.__cur * 100)/self.__filesize

This should probably be something like:

        if self.__filesize == 0:
            percent = 100
        else:
            percent = (self.__cur * 100)/self.__filesize

This same logic also applies to the division in the zonedata() function.

There should be a space after the word "second(s)", everywhere it is used:

10 RR(s) loaded in 0.44 second(s)(100.00% of ttl1.db)
                                ^^^
                           a space here please

Using a separate thread to do the output is probably a bad idea. It's not that complicated, but it is more complicated than is necessary, and can lead to race conditions unless one is careful. For example, consider the following case:

  • The verbose() thread outputs 80 spaces
  • The main thread takes over and outputs 80 spaces
  • The verbose() thread outputs its update message
  • The main thread outputs that a new include has arrived

In this case, the output is completely ugly and not at all what is wanted. So I propose the code be refactored to not use threads:

  • Remove the threading stuff from the __status() function
  • Remove the verbose() function, and the call to it
  • Remove the icky "\r%s % (80 * " ") output everywhere, which can be better written as "\r" + (80 * " ") anyway. ;)
  • In the zonedata() function, call __status() periodically:
    def zonedata(self):
        name = ''
        last_status = 0.0

        for record, size in records(self.__zonefile):
            now = time.time()
            if now - last_status >= 1.0:
                self.__status()
                last_status = now

The __include() function will not work as written. For example, if you have a file name like: "This file name has spaces.db" then it will break. I suggest something like:

__include_syntax1 = re.compile('\s+(\S+)(?:\s+(\S+))?$', re.I)
__include_syntax2 = re.compile('\s+"([^"]+)"(?:\s+(\S+))?$', re.I)
__include_syntax3 = re.compile("\s+'([^']+)'(?:\s+(\S+))?$", re.I)
def __include(self, s):
    if not s.lower().startswith('$include'):
        return "", ""
    m = __include_syntax1.match(s)
    if not m:
        m = __include_syntax2.match(s)
    if not m:
        m = __include_syntax3.match(s)
    if not m:
        raise MasterFileError('Invalid $include format')
    file = m.group(1)
    if m.group(2):
        if not isname(m.group(2))
            raise MasterFileError('Invalid $include format (invalid origin)')
        origin = self.statedname(m.group(2), s)
    else:
        origin = self.__origin
    return file, origin

The code uses class variables a lot, with MasterFile.foo instead of instance variables, like self.foo. This will work, but effectively defines a global variable. This is wrong. :)

It was in the old version of the code, and it was *sometimes* apropriate, but not always. For example, __maxttl can be a class variable - although it should be written in all caps since it is a constant, __MAXTTL. However, __ttl, __lastttl, __zonefile, __name, and so on, are all actually instance variables and should be declared as such. I think this is okay for now, but I'd like to see a separate ticket opened to refactor this.

This expression is used a lot:

    MasterFile.__ttl or Masterfile.__lastttl

It might be better to make a function for this:

    def getTTL(self):
        return self.__ttl or self.__lastttl

Just to make it a bit shorter... not necessary though!

This code, at line 491:

#            if not ttl:
#                raise MasterFileError("No TTL specified; zone rejected")

Should be deleted.

comment:7 Changed 10 years ago by shane

  • Milestone changed from 04. 2nd Incremental Release: Early Adopters to 05. 3rd Incremental Release

comment:8 Changed 10 years ago by shane

  • Owner changed from zhanglikun to shentingting

comment:9 follow-up: Changed 9 years ago by shentingting

1 About the EXTRA_DIST question.

Tarball should include the tests according Jeremy`s suggestion. So I do not remove EXTRA_DIST from Makefile.am.

2 The default origin: I remembered there was a ticket that said the default origin problem. The code issues an error if it can't figure out the rather than using '.', that means we must give origin directly. I coded according that. But now I can not find the ticket.
So about this problem, any suggestions?

comment:10 in reply to: ↑ 9 Changed 9 years ago by jreed

Replying to shentingting:

2 The default origin: I remembered there was a ticket that said the default origin problem. The code issues an error if it can't figure out the rather than using '.', that means we must give origin directly. I coded according that. But now I can not find the ticket.
So about this problem, any suggestions?

I can't find a ticket either. But it is noted in src/bin/loadzone/TODO -- I think this ticket 175 also covers it but didn't mention it.

comment:11 Changed 9 years ago by zhanglikun

Some comment to the new added code of master.py:

  1. the code in function:'__four':

the code of block 341-345 and block 347-351 are same, need refactor.

  1. code in line 226.

self.filesize ==0 to self.filesize == 0

  1. refactor include():

syntax error in self. include_syntax1

comment:12 Changed 9 years ago by shentingting

  • Owner changed from shentingting to shane

Hello shane.

I amended the code as your suggestion and committed it again, please help me to review it! It is still under branches/tingting-loadzone directory.

Thank you!

comment:13 Changed 9 years ago by shane

  • Owner changed from shane to shentingting

Okay, there are a few minor points, but I think it is basically ready:

  • The TODO file still seems to list things that are fixed with this branch. I am not sure, though!
  • The error_test.sh script needs to add "-d zone.sqlite3" like the correct_test.sh script has.
  • In parse_ttl(), it is probably easier to use ttl_expr.isdigit() rather than `re.match('\d+$', ttl_expr).
  • In parse_ttl() I think there is an unnecessary copy:
        ttl = int(ttl_expr[:])   # this converts a copy of the ttl_expr string to an int
        ttl = int(ttl_expr)      # this does the exact same thing without the copy
    
  • Since this no longer uses threading, the "import threading" can be removed from the top of master.py

If you make these changes, I think it should be merged.

comment:14 Changed 9 years ago by shentingting

The code has been committed in r2340 in trunk, please check.

comment:15 Changed 9 years ago by jreed

  • Resolution set to complete
  • Status changed from reviewing to closed

I will close this ticket since merged to trunk. Thanks for the code! I may open new ticket(s) as needed later.

Note: See TracTickets for help on using tickets.