Opened 9 years ago

Closed 8 years ago

Last modified 8 years ago

#216 closed enhancement (fixed)

Xfrin: Implement the feature items in TODO file.

Reported by: zhanglikun Owned by: shane
Priority: medium Milestone:
Component: xfrin Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 0.0 Add Hours to Ticket: 0
Total Hours: 5.0 Internal?: no

Description (last modified by shane)

I'm moving this to the A-Team backlog, and removing Tingting as the owner, since she's not on the project any more.

Subtickets

Attachments (3)

xfrin.diff (11.7 KB) - added by shentingting 9 years ago.
xfrin.2.diff (11.7 KB) - added by shentingting 9 years ago.
xfrin.3.diff (6.1 KB) - added by shentingting 9 years ago.

Download all attachments as: .zip

Change History (53)

comment:1 Changed 9 years ago by shane

  • Milestone set to feature backlog item

comment:2 follow-up: Changed 9 years ago by shentingting

  • billable set to 1
  • Internal? unset

some changes in xfrin code:

  1. Add config_handler function to make new config data to be applied.[TODO list #1]
  2. Delete unnecessary mutex on recorder, remove quato and repeat check to main thread.#2
  3. I checked that if xfrin fails, the garbage data is not in DB file. but, when all zone data are in the same zone file and multi zone xfrin process at the same time, there is data access conflicts.#7
  4. Deal with connection exception(the master closes connection etc.) during transmission in handle_read function#12
  5. Using socket pair communication mechanism to achieve quickly terminate on shutdown command, and still using a global flag. This is a double-check mechanism. I changed threading.Event() to a global flag to achieve the same function, this eliminates some confuse about thread.Event`s abuse.#13,#15,#16
  6. Amend test code.

comment:3 in reply to: ↑ 2 Changed 9 years ago by shentingting

btw, the code is in /branches/trac216. version 2573

comment:4 Changed 9 years ago by shentingting

  • Owner changed from zhanglikun to jinmei
  • Status changed from new to reviewing

Hello jinmei!

Until now, I have made some changes to XFRIN and committed the code. could you

help me to review it? Thanks!

comment:5 follow-up: Changed 9 years ago by jinmei

  • Owner changed from jinmei to zhanglikun

Quick initial checks:

  • are these changes confirmed by tests?
  • what's the latest code coverage?
  • please add proposed changelog entry

comment:6 in reply to: ↑ 5 Changed 9 years ago by shentingting

Yes, I tested this changes.

coverage report:
Name Stmts Exec Cover


xfrin 362 291 80%
xfrin_test 313 307 98%


TOTAL 675 598 88%

Replying to jinmei:

Quick initial checks:

  • are these changes confirmed by tests?
  • what's the latest code coverage?
  • please add proposed changelog entry

comment:7 Changed 9 years ago by jinmei

  • Owner changed from zhanglikun to shentingting
  • It doesn't pass tests at least in my environment (see the end of this message). Please fix them first.
  • As for the coverage, 91% of xfrin.py is covered in the current trunk. If it decreased to 80% in this patch that wouldn't be good.
  • A minor editorial nit. There's an awkward line currently commented out. If it's not necessary, please remove it. If it plans to be completed but cannot be so for some reason right now, please clarify that in a comment.
    +    def test_connect(self):
    +        #self.assertEqual(, "")
    
  • Another minor editorial nit in the ChangeLog entry. I've directly fixed it in the branch (r2644)
Making check in tests
make  check-local
for pytest in xfrin_test.py ; do \
	echo Running test: $pytest ; \
	env PYTHONPATH=/Users/jinmei/src/isc/bind10/branches/trac216/src/lib/dns/.libs:/Users/jinmei/src/isc/bind10/branches/trac216/src/lib/dns/python/.libs:/Users/jinmei/src/isc/bind10/branches/trac216/src/bin/xfrin:/Users/jinmei/src/isc/bind10/branches/trac216/src/lib/python:/Users/jinmei/src/isc/bind10/branches/trac216/src/lib/python \
	/opt/local/bin/python3.1 /Users/jinmei/src/isc/bind10/branches/trac216/src/bin/xfrin/tests/$pytest ; \
	done
Running test: xfrin_test.py
.[b10-xfrin] test error
[b10-xfrin] Error happened! is the command channel daemon running?
.[b10-xfrin] test exception
.[b10-xfrin] exit b10-xfrin
....E.E.....................................
======================================================================
ERROR: test_command_handler_retransfer_inprogress (__main__.TestXfrin)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jinmei/src/isc/bind10/branches/trac216/src/bin/xfrin/tests/xfrin_test.py", line 453, in test_command_handler_retransfer_inprogress
    self.xfr._zones[TEST_ZONE_NAME] = MockThread()
AttributeError: 'MockXfrin' object has no attribute '_zones'

======================================================================
ERROR: test_command_handler_retransfer_quota (__main__.TestXfrin)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jinmei/src/isc/bind10/branches/trac216/src/bin/xfrin/tests/xfrin_test.py", line 442, in test_command_handler_retransfer_quota
    self.xfr._zones[str(i) + TEST_ZONE_NAME] = MockThread()
AttributeError: 'MockXfrin' object has no attribute '_zones'

----------------------------------------------------------------------
Ran 47 tests in 0.084s

FAILED (errors=2)
make[2]: *** [check-local] Error 1
make[1]: *** [check-am] Error 2
make: *** [check-recursive] Error 1

comment:8 follow-up: Changed 9 years ago by shentingting

I am very sorry! I forgot to change xfrin_test.py after I changed a variable name

from self._zones to self._threads_zones. Now, it works well!

comment:9 in reply to: ↑ 8 Changed 9 years ago by jinmei

Replying to shentingting:

I am very sorry! I forgot to change xfrin_test.py after I changed a variable name

from self._zones to self._threads_zones. Now, it works well!

Okay, thanks.

Before performing full review, however, please improve test coverage:

  • XfrinConnection.send/recv/_loop/close must be tested using an apprproiate mock. The current trunk version of xfrin does that. See also xfrout. It uses a mock Socket class for testing.
  • Likewise, in Xfrin._cc_setup we should check config_data is correctly handled (you'll probably need a mock version of ModuleCCSession).

Other things I noted:

  • please also fixed the editorial issue I mentioned in my previous comment
  • why did we switch from asyncore to select? I don't know this migration is inevitable or a good idea, but in any case such a design decision should be documented.

comment:10 follow-up: Changed 9 years ago by jinmei

  • Milestone changed from feature backlog item to y2 6 month milestone

comment:11 in reply to: ↑ 10 ; follow-up: Changed 9 years ago by shentingting

Replying to jinmei:

Now I am thinking about adding test case problem. for xfrinConnection.send/recv/_loop/close are copied from asyncore library, and those functions are all called based socket API. so I don`t think it is necessary to add new test case for those function.

why did we switch from asyncore to select?

There are several reasons:

  1. It is not necessary to use asyncore, since each thread is responsible for a single xfr connection, socket operations can safely block (with timeouts). this should be easily implemented using the bear socket module, and the code would look like more straightforward by avoiding complicated logic for asynchrony.
  2. we'd need an explicit communication channel (e.g. a socket pair) between the parent thread and xfr connection thread, through which a shutdown notification would be sent to the child. With this approach each thread needs to watch at least two channels, and then it would need some asynchronous communication mechanism. But the socket pair is not asyncore.dispatcher type or its subtype, and it is difficult to switch it into a type that asyncore can listen. So it can not put this socket pair into map, the notify channel is difficult to build. Select can compelte the same function as asyncore in xfrin deamon, and can build a new notify channel to deal with quickly terminate on shutdown command. So I switch from asyncore to select.
  3. The asyncores increases complexity, it is more difficult to read code.


comment:12 in reply to: ↑ 11 Changed 9 years ago by jinmei

Replying to shentingting:

Replying to jinmei:

Now I am thinking about adding test case problem. for xfrinConnection.send/recv/_loop/close are copied from asyncore library, and those functions are all called based socket API. so I don`t think it is necessary to add new test case for those function.

We do need tests for these. Tests are not only for checking they work correctly today, but also for ensuring no (or less if any) regression happens with future changes.

I understand it's not straightforward to test code using low level API such as socket I/O (if that's what you mean by "those functions are all called based socket API"). But it's possible via some indirection techniques I already mentioned, and it's important because behaviors around socket I/O is a core part of network applications.

Please add tests.

why did we switch from asyncore to select?

There are several reasons:

  1. It is not necessary to use asyncore, since each thread is responsible for a single xfr connection, socket operations can safely block (with timeouts). this should be easily implemented using the bear socket module, and the code would look like more straightforward by avoiding complicated logic for asynchrony.
  2. we'd need an explicit communication channel (e.g. a socket pair) between the parent thread and xfr connection thread, through which a shutdown notification would be sent to the child. With this approach each thread needs to watch at least two channels, and then it would need some asynchronous communication mechanism. But the socket pair is not asyncore.dispatcher type or its subtype, and it is difficult to switch it into a type that asyncore can listen. So it can not put this socket pair into map, the notify channel is difficult to build. Select can compelte the same function as asyncore in xfrin deamon, and can build a new notify channel to deal with quickly terminate on shutdown command. So I switch from asyncore to select.
  3. The asyncores increases complexity, it is more difficult to read code.


Okay. The second point makes sense (although I've not confirmed if it's impossible with asyncore - but I trust you:-). Such an implementation decision isn't obvious to others, so please document it in the form of code comment explicitly.

(While I have no opinon/preference on asyncore vs select per se) The first and third reasons aren't convincing to me, btw. In fact, the first point doesn't make sense because the resulting code with select actually uses the asynchronous nature of select. Arguments on complexity (the third point) are often a matter of taste, and may not always be a tie-breaking reason if that's a controversial choice.

Anyway I'm not disagreeing with the change itself. I just wondered why, and I just want the decision to be documented properly.

comment:13 Changed 9 years ago by shentingting

okay, I added some new test case yesterday. And I committed the code to trac216.

coverage report:


Name Stmts Exec Cover


xfrin 363 318 87%
xfrin_test 372 365 98%


TOTAL 735 683 92%

comment:14 Changed 9 years ago by jinmei

  • Owner changed from shentingting to jinmei

comment:15 follow-up: Changed 9 years ago by jinmei

I'm going to review the patch. Meanwhile, please provide suggested ChangeLog? entry.

comment:16 in reply to: ↑ 15 Changed 9 years ago by jinmei

Replying to jinmei:

I'm going to review the patch. Meanwhile, please provide suggested ChangeLog? entry.

(Sorry, I missed the committed one in the branch)

comment:17 Changed 9 years ago by jinmei

I'm reviewing the branch. It's not completed yet, but I'd like to provide some initial feedback on the usage of select.select.

  • the test (test_loop) doesn't actually seem to test what is to be tested. with this test select.select returns simply because the mock socketpair[0] is closed right after setUp(). we should actually test the cases where
    • select returns successfully with data
    • select timeouts
    • select returns due to a shutdown message
  • method names "loop" is now misleading because we switched the API
  • we could avoid having a separate "callback" method for recv after select, that is, handle_read(). in fact, according to the background reason for the switch, it would be better to unify the select+read logic as much as possible, as long as we can perform sufficient tests.
  • the current code returns if select fails due to EINTR. is that the intentional behavior? the convention in this case (as far as I know) is to ignore EINTR and continue the select until a required event happens or select timeouts.

I've committed a proposed change to address these points (with other cleanups) to the branch. It's r2807. Please check it.

comment:18 Changed 9 years ago by jinmei

  • Add Hours to Ticket changed from 0.0 to 3.0
  • Total Hours changed from 0.0 to 3.0

comment:19 Changed 9 years ago by jinmei

Okay, I've completed my review for branches/trac216.

In addition to the initial feedback, here are the review comments:

xfrin.py(.in)

  • I suspect we now don't have to pass shutdown_flag to XfrinConnection?? I also even think we don't need this flag within this class now that we've changed how to notify threads of a shutdown event.
  • I believe we should use select for connect and send while watching a shutdown event. consider, e.g., the case where a shutdown event is sent when the primary server is dead and connect blocks. but it's okay for me to leave this enhancement to another ticket (or we should rather do so)
  • why making the socket blocking explicity? this seems to be redundant and unnecessary. (also note: if and when we make connect asynchronous, we should rather make it non blockng)
    +        self._socket.setblocking(1)
    
  • why using connect_ex instead of normal connect?
    +        err = self._socket.connect_ex(address)
    
    this is not necessarily incorrect, but isn't consistent with other style of our code. also, (according to Evan:-) relying on error codes instead of exceptions is not "pythonic". note, again, we should probably use connect_ex if and when we do asynchronous connect.
  • in the shutdown event handling of XfrinConnection?._select() (note: I renamed it in r2807), I think we should check the received data, not just consider it a shutdown due to the readability of the socket.
  • Xfrin.init: the initialization of _max_transfers_in doesn't make sense because it won't be used anywhere and will soon be overridden in _cc_setup(). Besides, hardcoding such a magic number in the code is not a good practice.
  • it's not clearly what these comments mean. Also the lines are too long.
            #the item in self._threads_zones: zone name and xfr communication thread. 
            #The item in self._conn_sockets: a socket and xfr communication thread, the main thread uses 
            #the socket to communicate with this xfr communication thread.
    
  • the name _conn_sockets is confusing because this is a hash from sockets to threads. something like _conn_sockets_to_threads would be better (and, in whic case _threads_zones would be renamed to _zones_to_threads, etc)
  • I'd rename 'fd' to 'socket' (or something) here:
            for fd in self._conn_sockets.keys():
                fd.send(b"shutdown")
    
    because these are not "FDs (file descriptors)".
  • this is largely a matter of taste, but _filter_hash() could be simpler:
        def _filter_hash(self, hash):
            '''delete zone_name in self._threads_zones or a socket in self._conn_sockets.'''
            for key in [k for k in hash.keys()]:
                if not (hash[key]).is_alive():
                    del hash[key]
    
    btw, the function description is not very clear and not really correct about _conn_sockets in that what's deleted is not 'socket'.
  • isn't it better to delete completed threads as soon as possible? in the current implementation they are not purged until a new xfrin attempt occurs.
  • keys of _threads_zones should be pairs of "zone_name and rrclass". and, this case should be tested.
  • configuration update for "transfers_in" should be tested.

xfrin_test.py

  • I guess MockXfrinConnection?.connect_to_master() should return True.
  • please don't leave a comment line like this.
    +        #self._max_transfers_in = 10
    
    if it's not necessary, remove it; if it's necessary but is intentionally commented out, please also comment why.
  • I believe I've pointed this out multiple times, but it's still not fixed. Please clean up the commented-out part.
    +    def test_connect(self):
    +        #self.assertEqual(, "")
    

comment:20 Changed 9 years ago by jinmei

  • Add Hours to Ticket changed from 0.0 to 2.0
  • Owner changed from jinmei to shentingting
  • Total Hours changed from 3.0 to 5.0

comment:21 Changed 9 years ago by shane

  • Owner changed from shentingting to UnAssigned

comment:22 Changed 9 years ago by shane

  • Owner changed from UnAssigned to shentingting

Tingting, can you please comment on the work done for this ticket so whoever reviews it can know what the last changes were? Thanks!

comment:23 Changed 9 years ago by shentingting

some changes in xfrin code:

  1. Add config_handler function to make new config data to be applied.[TODO list #1]
  2. Delete unnecessary mutex on recorder, remove quato and repeat check to main thread.#2
  3. I checked that if xfrin fails, the garbage data is not in DB file. but, when all zone data are in the same zone file and multi zone xfrin process at the same time, there is data access conflicts.#7
  4. Deal with connection exception(the master closes connection etc.) during transmission in handle_read function#12
  5. Using socket pair communication mechanism to achieve quickly terminate on shutdown command, and still using a global flag. This is a double-check mechanism. I changed threading.Event() to a global flag to achieve the same function, this eliminates some confuse about thread.Event`s abuse.#13,#15,#16
  6. Amend test code.

some changes after JinMei? reviewed.

  1. delete self._socket.setblocking(1).
  2. change err = self._socket.connect_ex(address) to self._socket.connect(address), in order to use exception to deal with error.
  3. delete some comments and some useless lines.
  4. make _filter_hash() function more simpler.
  5. change two variables name. _conn_sockets rename _conn_sockets_to_threads

_threads_zones rename _zones_to_threads.

comment:24 follow-up: Changed 9 years ago by zhanglikun

  • Owner changed from shentingting to jinmei

Jinmei, I think tingting has forgot to reassign the review task to you. could you have a review for the latest change she did. so the latter change for xfrin can be down after this is merged into trunk.

comment:25 in reply to: ↑ 24 ; follow-up: Changed 9 years ago by jinmei

Replying to zhanglikun:

Jinmei, I think tingting has forgot to reassign the review task to you. could you have a review for the latest change she did. so the latter change for xfrin can be down after this is merged into trunk.

Okay, but please provide proposed changelog entry.

comment:26 in reply to: ↑ 25 Changed 9 years ago by jinmei

Replying to jinmei:

Replying to zhanglikun:

Jinmei, I think tingting has forgot to reassign the review task to you. could you have a review for the latest change she did. so the latter change for xfrin can be down after this is merged into trunk.

Okay, but please provide proposed changelog entry.

Ah, I see it's in the branch.

Another request. This branch has been merged from trunk, which makes it difficult to identify the change to review. Could you make a separate branch to rebase it, or provide a separate diff file (if it's sufficiently small) that only contains relevant change?

Changed 9 years ago by shentingting

Changed 9 years ago by shentingting

comment:27 follow-up: Changed 9 years ago by shentingting

Because the latest change is small, so I attach a diff file to this ticket. the xfrin.2.diff is correct.

comment:28 in reply to: ↑ 27 ; follow-up: Changed 9 years ago by jinmei

  • Owner changed from jinmei to shentingting

Replying to shentingting:

Because the latest change is small, so I attach a diff file to this ticket. the xfrin.2.diff is correct.

xfrin.2.diff looks mostly okay except the following two points:

  • I'm not sure why we need this in XfrinConnection.connect():
                if why.args[0] not in (0, EISCONN):
                    raise
    
    Should "0" be really considered in this context explicitly? It seems to be a leftover from the previous version using _ex (where 0 should mean "success"). I'm also not sure why we can ignore EISCONN. Is there any valid scenario we catch EISCONN but it's not an error?
  • you removed the description for _threads_zones and _conn_sockets with renaming the attributes. it's not good. we should revise the description according to the name change.

Also, it's not clear to me whether all of my previous comments were addressed. For example, there doesn't seem to be any change for this point:

  • "Xfrin.init: the initialization of _max_transfers_in doesn't make sense..."
  • "configuration update for "transfers_in" should be tested."

(there are probably more)

Please clarify which of my points were addressed and which were not, and in the latter case, why not. I do not necessarily rqeuire all of the comments be addressed, but without the explanation I cannot be sure whether they were just forgotten or rejected due to some reasoable reason.

Giving the ticket back to tingting.

comment:29 in reply to: ↑ 28 ; follow-up: Changed 9 years ago by shentingting

Replying to jinmei:

Replying to shentingting:

Because the latest change is small, so I attach a diff file to this ticket. the xfrin.2.diff is correct.

xfrin.2.diff looks mostly okay except the following two points:

  • I'm not sure why we need this in XfrinConnection.connect():
                if why.args[0] not in (0, EISCONN):
                    raise
    
    Should "0" be really considered in this context explicitly? It seems to be a leftover from the previous version using _ex (where 0 should mean "success"). I'm also not sure why we can ignore EISCONN. Is there any valid scenario we catch EISCONN but it's not an error?
  • you removed the description for _threads_zones and _conn_sockets with renaming the attributes. it's not good. we should revise the description according to the name change.

Also, it's not clear to me whether all of my previous comments were addressed. For example, there doesn't seem to be any change for this point:

  • "Xfrin.init: the initialization of _max_transfers_in doesn't make sense..."
  • "configuration update for "transfers_in" should be tested."

(there are probably more)

Please clarify which of my points were addressed and which were not, and in the latter case, why not. I do not necessarily rqeuire all of the comments be addressed, but without the explanation I cannot be sure whether they were just forgotten or rejected due to some reasoable reason.

Giving the ticket back to tingting.

  1. "Xfrin.init" problem: I think the initialization is necessary. (1)In Xfrin._cc_setup function, after we new a isc.config.ModuleCCSession variable, we must call it`s start function. (2) the start function call a config_handler callback function, which we rewrite. (3) In config_handler function the first sentence is "self._max_transfers_in = new_config.get("transfers_in") or self._max_transfers_in" to make new config valiable. If we do not set neither xfrin.transfers_in config data nor the initialization of xfrin._max_transfers_in , this will make a error"[b10-xfrin] 'Xfrin' object has no attribute '_max_transfers_in'. Just like self._master_addr and self._master_port, which are initial.
  1. Actually I copy connect function, send and recv function from asyncore.py library, and make a little changes, 0 is success, a connected socket call connect function again, it will make a EISCONN code. Maybe this is make sense.
  1. In fact, I have already add test_config_handler function to test transfers_in.
  1. I am adding description for _conn_sockets_to_threads and _zones_to_threads.

comment:30 in reply to: ↑ 29 ; follow-up: Changed 9 years ago by jinmei

Replying to shentingting:

  1. "Xfrin.init" problem: I think the initialization is necessary.

Ah, okay. Quite tricky:-)

But another minor point of hardcoding a magic number isn't addressed yet.

  1. Actually I copy connect function, send and recv function from asyncore.py library, and make a little changes, 0 is success, a connected socket call connect function again, it will make a EISCONN code. Maybe this is make sense.

It still doesn't make sense to me. The asyncore code uses connect_ex(), so a return value of 0 is valid. Your current code uses connect(), which only handles exceptions, so 0 shouldn't be a valid value.

As for EISCONN, simply because asyncore() expects that doesn't justify the use of it in your specific case. asyncore() is a generic library function and may need to expect more general error cases. Please forget what asyncore() does, and explain whether EISCONN can validly happen *in your usage*.

  1. In fact, I have already add test_config_handler function to test transfers_in.

Okay.

  1. I am adding description for _conn_sockets_to_threads and _zones_to_threads.

I can't see it. Maybe that's a local change in your working directory?

Also, you still haven't answered my previous points one by one. What I'd expect is to go through my previous sets of comments:
http://bind10.isc.org/ticket/216?replyto=29#comment:17
http://bind10.isc.org/ticket/216?replyto=29#comment:19
and answer each bullet one by one.

comment:31 in reply to: ↑ 30 ; follow-up: Changed 9 years ago by shentingting

http://bind10.isc.org/ticket/216?replyto=29#comment:17

I saw your changes, and I agree with you.

http://bind10.isc.org/ticket/216?replyto=29#comment:19

isn't it better to delete completed threads as soon as possible? in the current implementation they are not purged until a new xfrin attempt occurs.

For this, We can delete completed threads as soon as possible. but when we delete completed threads, it will find this threads fd in hash map, this waste time. I purged it when a new xfrin attempt occurs. This will not run delete, it save time. Although It will occupy some memory, but It is little, its number less than max_transfers_in. So I did so.

I believe we should use select for connect and send while watching a shutdown event. consider, e.g., the case where a shutdown event is sent when the primary server is dead and connect blocks. but it's okay for me to leave this enhancement to another ticket (or we should rather do so)

I think it is better to create another ticket to solve this.

For the newest code, I delete connect function. For error code 0, it is success, so there is no exception, And add a DEFAULT_TRANSFER_IN variable to remove the hardcoding.

comment:32 in reply to: ↑ 31 ; follow-up: Changed 9 years ago by jinmei

First off, if you meant you completed your turn and waited for a response (I assumed so), please change the ticket owner. By leaving the owner (currently it's still you), we assume you are still working on it by default.

Replying to shentingting:

http://bind10.isc.org/ticket/216?replyto=29#comment:17

I saw your changes, and I agree with you.

http://bind10.isc.org/ticket/216?replyto=29#comment:19

isn't it better to delete completed threads as soon as possible? in the current implementation they are not purged until a new xfrin attempt occurs.

For this, We can delete completed threads as soon as possible. but when we delete completed threads, it will find this thread's fd in hash map, this waste time. I purged it when a new xfrin attempt occurs. This will not run delete, it save time. Although It will occupy some memory, but It is little, it's number less than max_transfers_in. So I did so.

That's not convincing to me. Searching a hash map is generally cheap, and if we argue that memory consumption is marginal because max_transfers_in is small (but we should remember this is configurable), that should also apply to the search cost.

But, I wouldn't necessarily insist it should be addressed within this ticket. If you don't agree, please move forward with the current code for now.

I believe we should use select for connect and send while watching a shutdown event. consider, e.g., the case where a shutdown event is sent when the primary server is dead and connect blocks. but it's okay for me to leave this enhancement to another ticket (or we should rather do so)

I think it is better to create another ticket to solve this.

I've created one (#370).

For the newest code, I delete connect function. For error code 0, it is success, so there is no exception, And add a DEFAULT_TRANSFER_IN variable to remove the hardcoding.

The diff from r3181 to r3183 looks okay.

But please answer my original question: have you addressed all of my previous comments, or is there any outstanding issue? By only selectively answering a subset of the points I cannot be sure about that.

In general, if I made comments like this:

  • Point 1: ...
  • Point 2: ... ...
  • Point n: ...

I'd expect a response as follows:

  • Point 1: I addressed this in rXXXX
  • Point 2: I don't agree with the suggestion because XXX. So I didn't touch the code for now. ...(covering all other points like this)
  • Point n: Good point, but I'd defer it to a different ticket. Will open one.

Or, if the points are minor, a comprehensive summary would be sufficient:

  • I believe I've addressed all of your comments in rXXXX. Please check.

But the responses I've seen so far are something like this:

  • (citing Point 2) I did XXX for this point in the latest code.

That makes me wonder "so, what about point 1, point3, ... and point n!?"

If you could be more comprehensive in subsequent post-review responses, that would be very helpful.

Changed 9 years ago by shentingting

comment:33 in reply to: ↑ 32 ; follow-up: Changed 9 years ago by shentingting

  • Owner changed from shentingting to jinmei

Thank you very much for all you suggestions, I am very sorry for my previous disordered responses. I will do it as you suggestions in the future.
Replying to jinmei:

First off, if you meant you completed your turn and waited for a response (I assumed so), please change the ticket owner. By leaving the owner (currently it's still you), we assume you are still working on it by default.

I am sorry again.

Replying to shentingting:

http://bind10.isc.org/ticket/216?replyto=29#comment:17

I saw your changes, and I agree with you.

http://bind10.isc.org/ticket/216?replyto=29#comment:19

isn't it better to delete completed threads as soon as possible? in the current implementation they are not purged until a new xfrin attempt occurs.

For this, We can delete completed threads as soon as possible. but when we delete completed threads, it will find this thread's fd in hash map, this waste time. I purged it when a new xfrin attempt occurs. This will not run delete, it save time. Although It will occupy some memory, but It is little, it's number less than max_transfers_in. So I did so.

That's not convincing to me. Searching a hash map is generally cheap, and if we argue that memory consumption is marginal because max_transfers_in is small (but we should remember this is configurable), that should also apply to the search cost.

But, I wouldn't necessarily insist it should be addressed within this ticket. If you don't agree, please move forward with the current code for now.

what you said is meaningful, but I still insist my design. maybe i should move forward firstly.

I believe we should use select for connect and send while watching a shutdown event. consider, e.g., the case where a shutdown event is sent when the primary server is dead and connect blocks. but it's okay for me to leave this enhancement to another ticket (or we should rather do so)

I think it is better to create another ticket to solve this.

I've created one (#370).

okay, thanks!

For the newest code, I delete connect function. For error code 0, it is success, so there is no exception, And add a DEFAULT_TRANSFER_IN variable to remove the hardcoding.

The diff from r3181 to r3183 looks okay.

okay.

But please answer my original question: have you addressed all of my previous comments, or is there any outstanding issue? By only selectively answering a subset of the points I cannot be sure about that.

yes I addressed all your previous comments. In r3288, I made some changes.
point1: I set socket non-blocking. because we use select function, it is asyncronous. So it is unify. In provious version, the socket is blocking.
point2: In connect function, I deleted 0 and EISCONN error type. Just as your said, 0 is success, which is not an exception, and EISCONN can not happen in our design.
ponit3: I changed select function, now it can select the socket which is writable or readable.
you can check those changes in xfrin.3.diff.

In general, if I made comments like this:

  • Point 1: ...
  • Point 2: ... ...
  • Point n: ...

I'd expect a response as follows:

  • Point 1: I addressed this in rXXXX
  • Point 2: I don't agree with the suggestion because XXX. So I didn't touch the code for now. ...(covering all other points like this)
  • Point n: Good point, but I'd defer it to a different ticket. Will open one.

Or, if the points are minor, a comprehensive summary would be sufficient:

  • I believe I've addressed all of your comments in rXXXX. Please check.

But the responses I've seen so far are something like this:

  • (citing Point 2) I did XXX for this point in the latest code.

That makes me wonder "so, what about point 1, point3, ... and point n!?"

If you could be more comprehensive in subsequent post-review responses, that would be very helpful.

comment:34 in reply to: ↑ 33 ; follow-up: Changed 9 years ago by jinmei

Replying to shentingting:

ponit3: I changed select function, now it can select the socket which is writable or readable.

Quick check: why do we need to check writability?
(intentionally leaving the owner being me)

comment:35 in reply to: ↑ 34 ; follow-up: Changed 9 years ago by shentingting

Replying to jinmei:

Replying to shentingting:

ponit3: I changed select function, now it can select the socket which is writable or readable.

Quick check: why do we need to check writability?
(intentionally leaving the owner being me)

because select is asyncronous, so I set the socket non-blocking. The connect function is non-blocking, it will return immediately when we call it, this will generate EINPROCESS exception, we will ignore this and use select to tell us whether it is really readable or writable.

comment:36 in reply to: ↑ 35 Changed 9 years ago by jinmei

Replying to shentingting:

ponit3: I changed select function, now it can select the socket which is writable or readable.

Quick check: why do we need to check writability?
(intentionally leaving the owner being me)

because select is asyncronous, so I set the socket non-blocking. The connect function is non-blocking, it will return immediately when we call it, this will generate EINPROCESS exception, we will ignore this and use select to tell us whether it is really readable or writable.

I've given a more complete review to the latest change, and, I regret to say this but I'm not so happy about it.

But before discussing that point, I'd like to check this point: are you sure you addressed (including rejecting some possibly) all points? From a quick check this one is still open:

  • I guess MockXfrinConnection.connect_to_master() should return True.

Admittedly this is a minor point, but since I've not seen any response on the comment or code change, I cannot be sure if you even did any decision about this, and this one also makes me wonder there may be other remaining issues.

Now, about the latest change. I guess you tried to address this comment:

  • I believe we should use select for connect and send while watching a shutdown event. consider, e.g., the case where a shutdown event is sent when the primary server is dead and connect blocks. but it's okay for me to leave this enhancement to another ticket (or we should rather do so)

I appreciate the effort, but the code doesn't solve the real issue: "while watching a shutdown event". This code essentially does blocking connect/write by doing select() only for the writability followed by the corresponding write operation. Also, scattering select() calls over the source code decreases readability.

As I commented before, I suspect this will be a sufficiently large changeset and we should defer it to a new ticket, especially because we've been spending quite a long time for this ticket already. May I respectfully suggest we cancel the changes for non-blocking connect/send?

Some other comments on specific parts of the code:

  • why do we need to cover EALREADY?
    +            if why.args[0] in (EINPROGRESS, EALREADY, EWOULDBLOCK):
    +                return 
    
    As far as I can see we shouldn't see this error in our scenario.
  • why was the "It's extracted..." comment removed?
             This method is a trivial asynchronous I/O routine using select.
    -        It's extracted from _get_request_response so that we can test the
    -        rest of the code without involving actual communication with a remote
    -        server.'''
    +        It checks which socket is writable or readable.'''
    
    This part still applies with the latest change, and IMO this type of "why we do this" is one of the most important comments. "What we do" type of comment like "It checks which..." may also help, but (IMO) is less important than "why" because, especially in such a trivial code, "what" is often obvious (for that matter, "writable or readable" is not really correct, because, at least according to the method interface, we also catch error events).

p.s. you don't have to feel sorry for anything. what I've been asking is basically "my rule", rather than agreed review protocol of the team. You may also have your own rule about how to handle review comments, and that's probably just different from mine. I've been asking what I've asked based on my rule, but you may think it's not reasonable. If so, please feel free to make an objection. We can then discuss where we can make a compromise.

comment:37 follow-up: Changed 9 years ago by jinmei

  • Owner changed from jinmei to shentingting

comment:38 in reply to: ↑ 37 ; follow-up: Changed 9 years ago by shentingting

  • Owner changed from shentingting to jinmei

Replying to jinmei:
Replying to jinmei:

Replying to shentingting:

ponit3: I changed select function, now it can select the socket which is writable or readable.

Quick check: why do we need to check writability?
(intentionally leaving the owner being me)

because select is asyncronous, so I set the socket non-blocking. The connect function is non-blocking, it will return immediately when we call it, this will generate EINPROCESS exception, we will ignore this and use select to tell us whether it is really readable or writable.

I've given a more complete review to the latest change, and, I regret to say this but I'm not so happy about it.

But before discussing that point, I'd like to check this point: are you sure you addressed (including rejecting some possibly) all points? From a quick check this one is still open:

  • I guess MockXfrinConnection.connect_to_master() should return True.

year, I miss this. I have already fixed this.

Admittedly this is a minor point, but since I've not seen any response on the comment or code change, I cannot be sure if you even did any decision about this, and this one also makes me wonder there may be other remaining issues.

Now, about the latest change. I guess you tried to address this comment:

  • I believe we should use select for connect and send while watching a shutdown event. consider, e.g., the case where a shutdown event is sent when the primary server is dead and connect blocks. but it's okay for me to leave this enhancement to another ticket (or we should rather do so)

I appreciate the effort, but the code doesn't solve the real issue: "while watching a shutdown event". This code essentially does blocking connect/write by doing select() only for the writability followed by the corresponding write operation. Also, scattering select() calls over the source code decreases readability.

As I commented before, I suspect this will be a sufficiently large changeset and we should defer it to a new ticket, especially because we've been spending quite a long time for this ticket already. May I respectfully suggest we cancel the changes for non-blocking connect/send?

Okay, I cancel the latest change and leave it to the next ticket.

Some other comments on specific parts of the code:

  • why do we need to cover EALREADY?
    +            if why.args[0] in (EINPROGRESS, EALREADY, EWOULDBLOCK):
    +                return 
    

I did not consider this. yes, I can not happen in XFRIN function. I deleted the EALREADY error code.

As far as I can see we shouldn't see this error in our scenario.

  • why was the "It's extracted..." comment removed?
             This method is a trivial asynchronous I/O routine using select.
    -        It's extracted from _get_request_response so that we can test the
    -        rest of the code without involving actual communication with a remote
    -        server.'''
    +        It checks which socket is writable or readable.'''
    
    This part still applies with the latest change, and IMO this type of "why we do this" is one of the most important comments. "What we do" type of comment like "It checks which..." may also help, but (IMO) is less important than "why" because, especially in such a trivial code, "what" is often obvious (for that matter, "writable or readable" is not really correct, because, at least according to the method interface, we also catch error events).

okay, I canceled the change.

p.s. you don't have to feel sorry for anything. what I've been asking is basically "my rule", rather than agreed review protocol of the team. You may also have your own rule about how to handle review comments, and that's probably just different from mine. I've been asking what I've asked based on my rule, but you may think it's not reasonable. If so, please feel free to make an objection. We can then discuss where we can make a compromise.

First, thanks again. From your rules I learned a lot. I am not careful enough and can not consider all aspect, which wastes a lot you time. Now, I will do this more carefully.

comment:39 in reply to: ↑ 38 ; follow-up: Changed 9 years ago by jinmei

One quick not: if we cancel asynchronous connect(), EWOULDBLOCK and EINPROGRESS won't normally happen.

comment:40 in reply to: ↑ 39 ; follow-up: Changed 9 years ago by jinmei

  • Owner changed from jinmei to shentingting

Replying to jinmei:

One quick not: if we cancel asynchronous connect(), EWOULDBLOCK and EINPROGRESS won't normally happen.

(I meant quick "check":-)

The same comment applies to XfrinConnection?.send()/recv()/close(): we should catch possible exceptions only. I suggest considering in which case each exception can happen. If you cannot give an explanation, we should probably remove it (of course, we should be careful not to ignore possible exceptions).

I also noticed these methods are not tested yet. As we discussed before earlier in this ticket. Please add tests for these methods.

Other than these it looks okay.

comment:41 in reply to: ↑ 40 ; follow-up: Changed 9 years ago by shentingting

  • Owner changed from shentingting to jinmei

Replying to jinmei:

Replying to jinmei:

One quick not: if we cancel asynchronous connect(), EWOULDBLOCK and EINPROGRESS won't normally happen.

(I meant quick "check":-)

yes, I delete the code. Now the connect function is not necessary, so I remove it.

The same comment applies to XfrinConnection?.send()/recv()/close(): we should catch possible exceptions only. I suggest considering in which case each exception can happen. If you cannot give an explanation, we should probably remove it (of course, we should be careful not to ignore possible exceptions).

I also noticed these methods are not tested yet. As we discussed before earlier in this ticket. Please add tests for these methods.

For send()/recv()/close(), I remove some exceptions that It can not happen. For the rest exception, I do not know how to test them. In xfrin_test, I mode these function to test other function. so Do you have any suggestion for this tests?

Other than these it looks okay.

comment:42 in reply to: ↑ 41 Changed 9 years ago by jinmei

  • Owner changed from jinmei to shentingting

Replying to shentingting:

I also noticed these methods are not tested yet. As we discussed before earlier in this ticket. Please add tests for these methods.

For send()/recv()/close(), I remove some exceptions that It can not happen. For the rest exception, I do not know how to test them. In xfrin_test, I mode these function to test other function. so Do you have any suggestion for this tests?

Can't we use MockSocket?

BTW, I've noticed a possible typo:

    def test_send_data_nodate(self):

I suspect should be "test_send_data_nodata".

On a related note, I don't think "mock_send" is a good name:

    def mock_send(self, data):
        return 0
[...]
        self.conn.send = self.conn.mock_send

because this method is (seeminly) intended to send some unexpected result.

Suggestions:

  • I'd use a more specific name to make the intent clear, e.g., mock_send_nodata.
  • I'd also describe the intent of this test in comments. It's not very clear just from the test code (you'll at least have to chec mock_send()).

comment:43 Changed 9 years ago by shane

  • Description modified (diff)
  • Milestone set to A-Team-Task-Backlog
  • Owner changed from shentingting to shane
  • Status changed from reviewing to accepted

comment:44 Changed 9 years ago by shane

  • Owner changed from shane to UnAssigned
  • Status changed from accepted to assigned

comment:45 Changed 9 years ago by stephen

  • Milestone A-Team-Task-Backlog deleted

Milestone A-Team-Task-Backlog deleted

comment:46 follow-up: Changed 8 years ago by shane

  • Defect Severity set to N/A
  • Owner changed from UnAssigned to jinmei
  • Sub-Project set to DNS

So... um. There seems to have been a lot of work here, and I don't know if it ever made it into trunk.

Jinmei, do you know the status? If not I'll ask Likun.

comment:47 in reply to: ↑ 46 Changed 8 years ago by jinmei

  • Milestone set to New Tasks

Replying to shane:

So... um. There seems to have been a lot of work here, and I don't know if it ever made it into trunk.

Jinmei, do you know the status? If not I'll ask Likun.

I'm not sure about the very latest status in terms of this TODO list
(I almost forgot its existence already:-). From a quick look some of
them seems to have been fixed:

  1. what if xfrin fails after opening a new DB? looks like garbage (intermediate) data remains in the DB file, although it's more about the data source implementation. check it, and fix it if it's the case.
  2. the current use of asyncore seems to be thread unsafe because it

relies on a global channel map (which is the implicit default).

  1. Check soa serial first when doing zone refreshment.
  2. Add configuration items to seperate zone, including ACL, multiple masters, etc.

(maybe more)

IMO we can close this ticket and reopen new tickets for specific
remaining problems.

comment:48 Changed 8 years ago by jinmei

  • Owner changed from jinmei to shane

comment:49 Changed 8 years ago by shane

  • Resolution set to fixed
  • Status changed from assigned to closed

Okay, I'll close this in the interests of keeping me sane. I think xfrin will need some love soon anyway.

comment:50 Changed 8 years ago by shane

  • Milestone New Tasks deleted
Note: See TracTickets for help on using tickets.