Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#641 closed defect (fixed)

Memory leaks in NSAS

Reported by: vorner Owned by: stephen
Priority: medium Milestone: R-Team-Sprint-20110316
Component: Unclassified Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity:
Sub-Project: Feature Depending on Ticket:
Estimated Difficulty: 3.0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

One of the areas pointed to by output of #614 is the NSAS. I filtered out the error reports where the leak is obviously in the tests, not the NSAS itself, but there are still many places where valgrind says it is possibly or definitely lost.

It should be examined and found out if they are harmful and fixed.

The filtered output is attached.

Subtickets

Attachments (1)

nsas.valgrind (131.9 KB) - added by vorner 9 years ago.
Output of valgrind

Download all attachments as: .zip

Change History (11)

Changed 9 years ago by vorner

Output of valgrind

comment:1 Changed 9 years ago by shane

  • Milestone changed from R-Team-Task-Backlog to R-Team-Sprint-20110308

comment:2 Changed 9 years ago by stephen

  • Owner set to stephen
  • Status changed from new to assigned

comment:3 Changed 9 years ago by stephen

  • Owner changed from stephen to UnAssigned

Commit e0122c1e3219488a152350666165369a0e9fbe32

Main changes:

  • The NSAS now holds a "raw" pointer to the resolver.
  • The test resolver now has a destructor to call outstanding callbacks (which releases a shared pointer loop and allows objects in the test to be completely destroyed).
  • The memory leaks in the random number generator tests were due to ASSERT_DEATH statements. The random number generator code has been altered to throw exceptions (instead of calling abort()) on error and ASSERT_DEATHs have been replaced by EXPECT_THROWs.

(Still outstanding is testing the code in the resolver itself. Given the nearness of the end of the sprint, a review is requested in parallel with these tests taking place.)

comment:4 Changed 9 years ago by stephen

  • Status changed from assigned to reviewing

comment:5 Changed 9 years ago by vorner

  • Owner changed from UnAssigned to vorner

As requested in standup call, I'll take this review even from different team.

comment:6 Changed 9 years ago by vorner

  • Owner changed from vorner to stephen

Hello

First, about your concern if NSAS having only raw pointer to the resolver. If we assume that the only client of NSAS is the resolver itself, then it should be safe ‒ the NSAS has some code running only when the resolver requests it (either by asking for IP address or by calling a callback). We must ensure that we don't delete the resolver while it has any methods running, even in multithreaded version of resolver, therefore it should be OK. And the assumption seems reasonable to me (who else should be using NSAS?).

Anyway, in case we delete them both, the resolver should make sure all the pending callbacks are called, just like in the test resolver. But as this happens on shutdown only now, we probably can leave it out for now.

Secondly, the code. There are two minor nits:

  • In nameserver_address_store.cc, the newZone function ‒ there's a comment everything is passed as a pointer, so the compiler has it easier to prove it doesn't need to call the copy constructors of the shared pointers because of their side effects (which I believe it can't, in case of multithreaded application, because the ref count is different). If there was only a pointer (a primitive type), it knew there are no sideeffecs, allowing it to completely eliminate the boost::bind call to create a functor, inlining the whole thing. There's no longer need to have double pointer to the resolver once the there's no shared pointer.
  • The exceptions of random number generator ‒ I believe they should have a common ancestor, but something less general, than just Exception, so someone might just call all the „bad input of random number generator“ at once, without listing all of the cases, but without catching all exceptions. Maybe that common ancestor should also be subclass of InvalidParameter? or BadValue?.

Is it OK to have no changelog entry? Maybe yes, because the resolver didn't really exist in the previous releases in usable state, so a memory leak in the tests wasn't a bug worth noting.

With regards

comment:7 Changed 9 years ago by stephen

  • Owner changed from stephen to vorner

Anyway, in case we delete them both, the resolver should make sure all the pending callbacks are called, just like in the test resolver. But as this happens on shutdown only now, we probably can leave it out for now.

Fair enough. But as a future task should we also add the callbacks to a list inside the NSAS such that the NSAS destructor can call them? Alternatively, do we really need there to be a shared_ptr loop involving the callbacks, even if it is broken when they are called?

In nameserver_address_store.cc ... There's no longer need to have double pointer to the resolver once the there's no shared pointer.

Removed.

The exceptions of random number generator ... Maybe that common ancestor should also be subclass of InvalidParameter? or BadValue?.

The exceptions are now derived from BadValue.

Is it OK to have no changelog entry? Maybe yes, because the resolver didn't really exist in the previous releases in usable state, so a memory leak in the tests wasn't a bug worth noting.

That was my thought - it is a change invisible to the user.

Changes pushed as commit 788d9505e0bec5369ccecc6f51af957874063fd8

comment:8 Changed 9 years ago by vorner

  • Owner changed from vorner to stephen

Hello

Replying to stephen:

Anyway, in case we delete them both, the resolver should make sure all the pending callbacks are called, just like in the test resolver. But as this happens on shutdown only now, we probably can leave it out for now.

Fair enough. But as a future task should we also add the callbacks to a list inside the NSAS such that the NSAS destructor can call them? Alternatively, do we really need there to be a shared_ptr loop involving the callbacks, even if it is broken when they are called?

Well, I have no idea how to avoid the loop currently, because:

  • The zone entry needs to keep it's nameserver entries alive. It would be a bad situation if they just disappear and happen to be needed again.
  • The nameserver entry needs to keep the list of callbacks that should be called when data are ready there. They can be or not shared pointers, but they must not be deleted before they are called.
  • The callbacks need to keep the zone entries alive, because calling them when they are already dead is a bad thing and aborting the callbacks means not providing the answer the NSAS was asked for.

So, in that sense, anything from this list can't be deleted before the callback is called.

But list of the callbacks to delete/fail when NSAS is destroyed could be a solution.

Anyway, the change is OK. Please, merge.

Thanks

comment:9 Changed 9 years ago by stephen

  • Resolution set to fixed
  • Status changed from reviewing to closed

comment:10 Changed 9 years ago by stephen

  • Estimated Difficulty changed from 0.0 to 3
Note: See TracTickets for help on using tickets.