Opened 7 years ago

Closed 7 years ago

#2669 closed task (fixed)

Valgrind issues

Reported by: jreed Owned by: jelte
Priority: medium Milestone: Sprint-20130305
Component: Unclassified Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: Core Feature Depending on Ticket:
Estimated Difficulty: 5 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

This is a ticket to tackle valgrind issues; just spend the allotted time (5 points) on it, tackling more valgrind issues.

Subtickets

Change History (13)

comment:1 Changed 7 years ago by jelte

  • Milestone changed from Next-Sprint-Proposed to Sprint-20130219

comment:2 Changed 7 years ago by jelte

  • Owner set to jelte
  • Status changed from new to assigned

comment:3 follow-up: Changed 7 years ago by jelte

  • Owner changed from jelte to UnAssigned
  • Status changed from assigned to reviewing

I got rid of most of the fork()-related issues by adding --child-silent-after-fork=yes (technically this could result in missed issues in forking unit tests, but currently it is reporting leaks from stack allocations...)

I ran it on the centos machine, and there were two issues left in my checkout:

  • one was an uninitialized memory block, fixed in the second commit
  • the other one appears to be inside the specific version of libsqlite on that old centos. I think I reported this as well last time, and more recent versions don't seem to have this issue. Anyway if we're not gonna upgrade, this is the repression I used on the centos buildbot:
    {
       Old libsqlite3 alloc failure
       Memcheck:Param
       write(buf)
       fun:__write_nocancel
       obj:/usr/lib64/libsqlite3.so.0.8.6
       obj:/usr/lib64/libsqlite3.so.0.8.6
       obj:/usr/lib64/libsqlite3.so.0.8.6
       obj:/usr/lib64/libsqlite3.so.0.8.6
       obj:/usr/lib64/libsqlite3.so.0.8.6
       obj:/usr/lib64/libsqlite3.so.0.8.6
       fun:sqlite3_step
       fun:_ZN3isc7datasrc18StatementProcessor4execEv
       fun:_ZN3isc7datasrc15SQLite3Accessor6commitEv
       fun:_ZN12_GLOBAL__N_128SQLite3Update_addRecord_Test8TestBodyEv
       fun:_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc
    }
    

comment:4 in reply to: ↑ 3 ; follow-up: Changed 7 years ago by jinmei

Replying to jelte:

Hmm, I'm afraid I don't know how to review this branch...

I got rid of most of the fork()-related issues by adding --child-silent-after-fork=yes (technically this could result in missed issues in forking unit tests, but currently it is reporting leaks from stack allocations...)

Could you elaborate a bit more? What does "leaks from stack
allocations" mean? Specifically what kind of error was reported for
which part of the code? Isn't there any false negatives by
suppressing this? (and if there is, are you suggesting suppressing
others at the cost of missing them is more beneficial?)

I ran it on the centos machine, and there were two issues left in my checkout:

  • one was an uninitialized memory block, fixed in the second commit

This one basically looks okay, although I'd do

            memset(data_, 0, sizeof(data_));

Another note is that you may need to include <cstring> for memset.
And, to this end, I'd personally even use vector<uint8_t> for data_
(then we won't have to worry about such low level thing as
initialization with such source-of-all-evil API as memset).

  • the other one appears to be inside the specific version of libsqlite on that old centos. I think I reported this as well last time, and more recent versions don't seem to have this issue. Anyway if we're not gonna upgrade, this is the repression I used on the centos buildbot:

I'm not sure about how to comment on this. Are you suggesting we
upgrade sqlite3 on the buildbox that runs valgrind?

comment:5 Changed 7 years ago by jinmei

  • Owner changed from UnAssigned to jelte

comment:6 in reply to: ↑ 4 ; follow-up: Changed 7 years ago by jelte

  • Owner changed from jelte to jinmei

Replying to jinmei:

Replying to jelte:

Hmm, I'm afraid I don't know how to review this branch...

I got rid of most of the fork()-related issues by adding --child-silent-after-fork=yes (technically this could result in missed issues in forking unit tests, but currently it is reporting leaks from stack allocations...)

Could you elaborate a bit more? What does "leaks from stack
allocations" mean? Specifically what kind of error was reported for
which part of the code? Isn't there any false negatives by
suppressing this? (and if there is, are you suggesting suppressing
others at the cost of missing them is more beneficial?)

Here's the first that gets hit on my system:

==4834==    at 0x4C286E7: operator new(unsigned long) (/tmp/buildd/valgrind-3.7.0/coregrind/m_replacemalloc/vg_replace_malloc.c:287)
==4834==    by 0x572A998: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17)
==4834==    by 0x572C384: char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17)
==4834==    by 0x572C462: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17)
==4834==    by 0x4518D9: isc::util::(anonymous namespace)::InterprocessSyncFileTest_TestLock_Test::TestBody() (/home/jelte/repos/bind10/src/lib/util/tests/interprocess_sync_file_unittest.cc:56)
==4834==    by 0x4A68B9: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (/home/jelte/apps/gtest-1.6.0/src/gtest.cc:2090)
==4834==    by 0x49C3A6: testing::Test::Run() (/home/jelte/apps/gtest-1.6.0/src/gtest.cc:2162)
==4834==    by 0x49C47D: testing::TestInfo::Run() (/home/jelte/apps/gtest-1.6.0/src/gtest.cc:2338)
==4834==    by 0x49C5CC: testing::TestCase::Run() (/home/jelte/apps/gtest-1.6.0/src/gtest.cc:2445)
==4834==    by 0x49C83C: testing::internal::UnitTestImpl::RunAllTests() (/home/jelte/apps/gtest-1.6.0/src/gtest.cc:4237)
==4834==    by 0x4A6439: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (/home/jelte/apps/gtest-1.6.0/src/gtest.cc:2090)
==4834==    by 0x49BA29: testing::UnitTest::Run() (/home/jelte/apps/gtest-1.6.0/src/gtest.cc:3874)
==4834==    by 0x40CE9E: main (/home/jelte/repos/bind10/src/lib/util/tests/run_unittests.cc:24)
==4834== 
{
   <insert_a_suppression_name_here>
   Memcheck:Leak
   fun:_Znwm
   fun:_ZNSs4_Rep9_S_createEmmRKSaIcE
   fun:_ZNSs12_S_constructIPKcEEPcT_S3_RKSaIcESt20forward_iterator_tag
   fun:_ZNSsC1EPKcRKSaIcE
   fun:_ZN3isc4util12_GLOBAL__N_138InterprocessSyncFileTest_TestLock_Test8TestBodyEv
   fun:_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc
   fun:_ZN7testing4Test3RunEv
   fun:_ZN7testing8TestInfo3RunEv
   fun:_ZN7testing8TestCase3RunEv
   fun:_ZN7testing8internal12UnitTestImpl11RunAllTestsEv
   fun:_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc
   fun:_ZN7testing8UnitTest3RunEv
   fun:main
}

(note, perhaps it is slightly more clear that it is about the std::string creation if you change the test to first create a string variable and pass that to the constructor of InterProcessSyncLocker?)

From what I can tell, it's not even so much the fork() itself, it's the call to exit() that makes it not free all. Perhaps we can get around that by adding an ExitPlease? exception that is caught in the main() of our unit tests.

(simply returning instead of exit appears to make other tests fail, and making a local try/catch that exists has the same problem since then the exception itself isn't freed)

If you have any other ideas, or if i missed something obvious, please let me know :)

I ran it on the centos machine, and there were two issues left in my checkout:

  • one was an uninitialized memory block, fixed in the second commit

This one basically looks okay, although I'd do

            memset(data_, 0, sizeof(data_));

Another note is that you may need to include <cstring> for memset.
And, to this end, I'd personally even use vector<uint8_t> for data_
(then we won't have to worry about such low level thing as
initialization with such source-of-all-evil API as memset).

ack, changed to vector.

  • the other one appears to be inside the specific version of libsqlite on that old centos. I think I reported this as well last time, and more recent versions don't seem to have this issue. Anyway if we're not gonna upgrade, this is the repression I used on the centos buildbot:

I'm not sure about how to comment on this. Are you suggesting we
upgrade sqlite3 on the buildbox that runs valgrind?

actually i had my two systems mixed up; the buildbot version seems fine, it is my local centos VM that has the bad libsqlite :) (my original plan was to either upgrade or suppress that one issue)

comment:7 in reply to: ↑ 6 ; follow-up: Changed 7 years ago by jinmei

Replying to jelte:

I got rid of most of the fork()-related issues by adding --child-silent-after-fork=yes (technically this could result in missed issues in forking unit tests, but currently it is reporting leaks from stack allocations...)

Could you elaborate a bit more? What does "leaks from stack
allocations" mean? Specifically what kind of error was reported for
which part of the code? Isn't there any false negatives by
suppressing this? (and if there is, are you suggesting suppressing
others at the cost of missing them is more beneficial?)

Here's the first that gets hit on my system:

[...]

(note, perhaps it is slightly more clear that it is about the std::string creation if you change the test to first create a string variable and pass that to the constructor of InterProcessSyncLocker?)

From what I can tell, it's not even so much the fork() itself, it's the call to exit() that makes it not free all. Perhaps we can get around that by adding an ExitPlease? exception that is caught in the main() of our unit tests.

(simply returning instead of exit appears to make other tests fail, and making a local try/catch that exists has the same problem since then the exception itself isn't freed)

If you have any other ideas, or if i missed something obvious, please let me know :)

This seems to me to be another reason why it's a bad idea to try to
invent such low level wheel ourselves:-) That aside, in this
particular case, can't we delay defining of sync and locker after
fork and have the child wait on the pipe until the parent acquires the
lock? But, even if it works, other cases would have to be handled
case-by-case basis, so it may not help much.

Since I don't know how many other cases we have and how difficult they
are to work around, it's hard to say whether the global suppression is
a reasonable choice. If the cases are not so many (and assuming the
fork generally happens in unit tests), another possibility is to skip
these test cases via gtest filter when running valgrind on buildbots.
That way, we can possibly remove the exceptions one-by-one.

  • the other one appears to be inside the specific version of libsqlite on that old centos. I think I reported this as well last time, and more recent versions don't seem to have this issue. Anyway if we're not gonna upgrade, this is the repression I used on the centos buildbot:

I'm not sure about how to comment on this. Are you suggesting we
upgrade sqlite3 on the buildbox that runs valgrind?

actually i had my two systems mixed up; the buildbot version seems fine, it is my local centos VM that has the bad libsqlite :) (my original plan was to either upgrade or suppress that one issue)

So I interpret it as we don't have to worry about this in the context
of this branch.

comment:8 Changed 7 years ago by jinmei

  • Owner changed from jinmei to jelte

comment:9 in reply to: ↑ 7 ; follow-up: Changed 7 years ago by jelte

  • Owner changed from jelte to jinmei

Replying to jinmei:

This seems to me to be another reason why it's a bad idea to try to
invent such low level wheel ourselves:-) That aside, in this
particular case, can't we delay defining of sync and locker after
fork and have the child wait on the pipe until the parent acquires the
lock? But, even if it works, other cases would have to be handled
case-by-case basis, so it may not help much.

I had looked into that but the problem is more fundamental than that; we'd need to get rid of exit(), and even that may only solve a few issues (valgrind can't seem to handle threads either, I just found out on my centos instance)

Since I don't know how many other cases we have and how difficult they
are to work around, it's hard to say whether the global suppression is
a reasonable choice. If the cases are not so many (and assuming the
fork generally happens in unit tests), another possibility is to skip
these test cases via gtest filter when running valgrind on buildbots.
That way, we can possibly remove the exceptions one-by-one.

Actually, this reminded me of a similar issue in death tests (Also for similar reasons), and for now I have the approach from there; check if valgrind is running and if so skip automatically.

Since this also skips on systems where it might work (or appear to work, as i suspect that technically valgrind is correct) it may not be the very best solution, but at least we don't have to muck around with filters and name conventions (and should we decide to want to go that way, it should be relatively easy to grep for calls to runningOnValgrind() now).

So latest push has more runningOnValgrind() checks (and a few include order fixes), and if we don't like that the alternative I can see is to rename those tests to something like <test_name>_SKIP_FOR_VALGRIND, and set that up as a buildbot filter.

comment:10 in reply to: ↑ 9 Changed 7 years ago by jinmei

Replying to jelte:

So latest push has more runningOnValgrind() checks (and a few include order fixes), and if we don't like that the alternative I can see is to rename those tests to something like <test_name>_SKIP_FOR_VALGRIND, and set that up as a buildbot filter.

I'm okay with this approach. You could have use runningOnValgrind()
like

    if (runningOnValgrind()) {
        return;
    }

without touching the rest of the code to reduce the amount of diff
(and keep it shorter in terms of consumed columns), but at this point
it probably does not make much sense to make that change. I'd leave
it to you.

One last comment: I suggest adding to check_valgrind.h comments (doc)
that we generally need to use this function and skip test code that
uses fork() or threads.

comment:11 Changed 7 years ago by jinmei

  • Owner changed from jinmei to jelte

comment:12 Changed 7 years ago by jelte

I have merged the branch, but am keeping the ticket open for a short while; I suspect we need to slightly modify the -I flags so valgrind detection works on the buildbot, and perhaps this hides some final issues.

comment:13 Changed 7 years ago by jelte

  • Resolution set to fixed
  • Status changed from reviewing to closed

There were some final issues in code I did not compile myself by default (dhcp mysql manager), but I have made a fix and put it in a new ticket, #2821.

That *should* address all of them, so I'm closing this ticket.

Note: See TracTickets for help on using tickets.