Opened 9 years ago

Closed 9 years ago

#957 closed defect (fixed)

IntervalTimerTest.destructIntervalTimer segfaults

Reported by: y-aharen Owned by: y-aharen
Priority: medium Milestone: Sprint-20110614
Component: Unclassified Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: Core Feature Depending on Ticket:
Estimated Difficulty: 3.0 Add Hours to Ticket: 0
Total Hours: 3.0 Internal?: no

Description

http://bind10.isc.org/~tester/2011/05/20110519145655-d2ee006/make-coverage.out

[ RUN      ] IntervalTimerTest.destructIntervalTimer
/bin/sh: line 5:  2412 Segmentation fault      ${dir}$tst
FAIL: run_unittests

http://bind10.isc.org/~tester/builder//BIND10/20110510165000-NetBSD5-amd64/logs/unittests.out

[ RUN      ] IntervalTimerTest.destructIntervalTimer
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::bad_function_call> >'
  what():  call to empty boost::function
[1]   Abort trap (core dumped) ${dir}${tst}
FAIL: run_unittests

IntervalTimerTest.destructIntervalTimer fails occasionally. These are not timing issue.

deadline_timer handlers may be called after canceling the timer. To solve the problem, we should redesign IntervalTimer not to delete resources while deadline_timer is holding the handler.

Subtickets

Attachments (1)

timer.diff (1.1 KB) - added by jinmei 9 years ago.

Download all attachments as: .zip

Change History (18)

comment:1 Changed 50 years ago by y-aharen

  • Add Hours to Ticket changed from 0.0 to 3.0
  • Total Hours changed from 0.0 to 3.0

comment:1 Changed 9 years ago by y-aharen

  • Sub-Project changed from DNS to Core

comment:2 Changed 9 years ago by y-aharen

  • Status changed from new to accepted

comment:3 Changed 9 years ago by y-aharen

  • Owner changed from y-aharen to UnAssigned
  • Status changed from accepted to reviewing

branch trac957 is ready for reviewing.

comment:4 Changed 9 years ago by shane

  • Milestone changed from New Tasks to Year 3 Task Backlog

comment:5 Changed 9 years ago by shane

  • Milestone changed from Year 3 Task Backlog to Next-Sprint-Proposed

comment:6 Changed 9 years ago by stephen

  • Milestone changed from Next-Sprint-Proposed to Sprint-20110614

comment:7 in reply to: ↑ description ; follow-up: Changed 9 years ago by jinmei

Replying to y-aharen:

Quick initial check: there doesn't seem to be any change in the tests.

Normally we should first add a test case that could trigger the bug
without the proposed fix, fix the problem in the main code, and then
confirm the fix actually solves the problem with the previously added
test.

Isn't it possible in this case?

comment:8 Changed 9 years ago by jinmei

  • Owner changed from UnAssigned to y-aharen

comment:9 in reply to: ↑ 7 ; follow-up: Changed 9 years ago by y-aharen

  • Owner changed from y-aharen to jinmei

Replying to jinmei:

Normally we should first add a test case that could trigger the bug
without the proposed fix, fix the problem in the main code, and then
confirm the fix actually solves the problem with the previously added
test.

Isn't it possible in this case?

The segfaults will occur when IntervalTimer is destructed while the timer has been expired and the invocation of the callback function is pending. I couldn't make such circumstances continually in unittest.

I'd like to fix this to correctly manage the lifetime of an instance of IntervalTimer by using boost::shared_ptr and shared_from_this(), which seems to be best practice.

comment:10 in reply to: ↑ 9 ; follow-up: Changed 9 years ago by jinmei

Replying to y-aharen:

Replying to jinmei:

Normally we should first add a test case that could trigger the bug
without the proposed fix, fix the problem in the main code, and then
confirm the fix actually solves the problem with the previously added
test.

Isn't it possible in this case?

The segfaults will occur when IntervalTimer is destructed while the timer has been expired and the invocation of the callback function is pending. I couldn't make such circumstances continually in unittest.

I'd suggest adding some assertion check like the attached one. This
would make the bug more likely to happen without this fix.

I'd like to fix this to correctly manage the lifetime of an instance of IntervalTimer by using boost::shared_ptr and shared_from_this(), which seems to be best practice.

The code basically looks okay. I have some comments and suggestions
however:

  • I'd like to see more detailed comment about why we need shared_from_this, and, if that's a common practice, with a reference to this technique along with ASIO. I myself didn't even know the existence of shared_from_this much less the use of it with ASIO. If it's not only me, we need deeper explanation to help others understand the code.
  • I suspect this should never happen because the access to IntervalTimerImpl? is limited in interval_timer.cc. If this ever happens, it means a severe internal bug. So, I'd rather use assert() here.
    +    } catch (const boost::bad_weak_ptr& e) {
    +        isc_throw(isc::Unexpected, "Failed to update timer");
         }
    
  • And, although it's not for this branch, I'd suggest using more informative error, preferably convert the asio error to string:
         } catch (const asio::system_error& e) {
             isc_throw(isc::Unexpected, "Failed to update timer");
    
    Or, if it's not feasible, the unused 'e' should be removed.

Finally, I think the technique using shared_from_this is useful in
other cases (specifically for the resolver code) and is uncommon. So,
could you introduce it to the bind10-dev list so that everyone will be
aware of it?

comment:11 Changed 9 years ago by jinmei

  • Owner changed from jinmei to y-aharen

Changed 9 years ago by jinmei

comment:12 Changed 9 years ago by stephen

  • Estimated Difficulty changed from 0.0 to 3

comment:13 in reply to: ↑ 10 ; follow-up: Changed 9 years ago by y-aharen

  • Owner changed from y-aharen to jinmei

Replying to jinmei:

Replying to y-aharen:

Replying to jinmei:

Normally we should first add a test case that could trigger the bug
without the proposed fix, fix the problem in the main code, and then
confirm the fix actually solves the problem with the previously added
test.

Isn't it possible in this case?

The segfaults will occur when IntervalTimer is destructed while the timer has been expired and the invocation of the callback function is pending. I couldn't make such circumstances continually in unittest.

I'd suggest adding some assertion check like the attached one. This
would make the bug more likely to happen without this fix.

Thank you for your helpful suggestion. I finally got what was happening in the destruction of an instance of the class. I modified the main code. In commmit 3e1fae0b23d4df32f239e5aa1350e40557d8ada4 fails on IntervalTimerTest.destructIntervalTimer. In commit 49ae23f163b8e4ad45f7146f95235253691a0f4b the test will success.

I'd like to fix this to correctly manage the lifetime of an instance of IntervalTimer by using boost::shared_ptr and shared_from_this(), which seems to be best practice.

The code basically looks okay. I have some comments and suggestions
however:

  • I'd like to see more detailed comment about why we need shared_from_this, and, if that's a common practice, with a reference to this technique along with ASIO. I myself didn't even know the existence of shared_from_this much less the use of it with ASIO. If it's not only me, we need deeper explanation to help others understand the code.

I explained the detail with a link to asio documentation.

  • I suspect this should never happen because the access to IntervalTimerImpl is limited in interval_timer.cc. If this ever happens, it means a severe internal bug. So, I'd rather use assert() here.
    +    } catch (const boost::bad_weak_ptr& e) {
    +        isc_throw(isc::Unexpected, "Failed to update timer");
         }
    

Yes, the exception will be thrown only in case an internal bug. I modified the code to cause assertion failure instead of throwing isc::Unexpected.

  • And, although it's not for this branch, I'd suggest using more informative error, preferably convert the asio error to string:
         } catch (const asio::system_error& e) {
             isc_throw(isc::Unexpected, "Failed to update timer");
    
    Or, if it's not feasible, the unused 'e' should be removed.

I modified the code to convert the error to string.

Finally, I think the technique using shared_from_this is useful in
other cases (specifically for the resolver code) and is uncommon. So,
could you introduce it to the bind10-dev list so that everyone will be
aware of it?

I posted an e-mail to bind10-dev list. I hope it will be helpful.

comment:14 in reply to: ↑ 13 Changed 9 years ago by jinmei

The latest code looks okay. Please merge.

I also noticed your report to bind10-dev about the enabled_shared trick.
Thanks for that.

comment:15 Changed 9 years ago by jinmei

  • Owner changed from jinmei to y-aharen

comment:16 Changed 9 years ago by y-aharen

  • Add Hours to Ticket changed from 0 to 3
  • Resolution set to fixed
  • Status changed from reviewing to closed

Thank you very much for reviewing.
I merged the branch into trunk at e59c215e14b5718f62699ec32514453b983ff603.

Note: See TracTickets for help on using tickets.