Opened 6 years ago

Closed 3 years ago

#3461 closed defect (complete)

Some unit tests in libutil generate valgrind errors

Reported by: tmark Owned by: tomek
Priority: medium Milestone: Outstanding Tasks
Component: libutil Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DHCP Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 1
Total Hours: 7 Internal?: no

Description (last modified by tomek)

libtuil unit tests do no pass cleanly when run with valgrind:

This is only a warning:

[ RUN      ] ForwardTest.badPop
==12494== Warning: invalid file descriptor -1 in syscall close()

Also only a warning:

[ RUN      ] MemorySegmentLocal.TestTooMuchMemory
==12494== Warning: silly arg (-1) to malloc()

FDTest.read, FDTest.write, FDTest.share, FDTest.transfer all report leaking similiar to this:

[ RUN      ] FDTest.read
==12495== 
==12495== HEAP SUMMARY:
==12495==     in use at exit: 16,777,418 bytes in 10 blocks
==12495==   total heap usage: 6,124 allocs, 6,114 frees, 18,255,621 bytes allocated
==12495== 
==12495== LEAK SUMMARY:
==12495==    definitely lost: 0 bytes in 0 blocks
==12495==    indirectly lost: 0 bytes in 0 blocks
==12495==      possibly lost: 0 bytes in 0 blocks
==12495==    still reachable: 16,777,418 bytes in 10 blocks
==12495==         suppressed: 0 bytes in 0 blocks
==12495== Rerun with --leak-check=full to see details of leaked memory
==12495== 
==12495== For counts of detected and suppressed errors, rerun with: -v
==12495== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 6)

This one reports as a leak but may have something to do with the gtest warning about
the death-test. Still it should be investigated.

[ RUN      ] BufferTest.outputBufferReadat

[WARNING] /opt/gtest-1.6.0/src/gtest-death-test.cc:789:: Death tests use fork(), which is unsafe particularly in a threaded context. For this test, Google Test couldn't detect the number of threads.
==12499== 
==12499== HEAP SUMMARY:
==12499==     in use at exit: 36,248 bytes in 598 blocks
==12499==   total heap usage: 7,259 allocs, 6,661 frees, 26,948,784 bytes allocated
==12499== 
==12499== LEAK SUMMARY:
==12499==    definitely lost: 0 bytes in 0 blocks
==12499==    indirectly lost: 0 bytes in 0 blocks
==12499==      possibly lost: 7,626 bytes in 208 blocks
==12499==    still reachable: 28,622 bytes in 390 blocks
==12499==         suppressed: 0 bytes in 0 blocks
==12499== Rerun with --leak-check=full to see details of leaked memory
==12499== 
==12499== For counts of detected and suppressed errors, rerun with: -v
==12499== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 6)

Subtickets

Change History (11)

comment:1 Changed 6 years ago by tmark

  • Description modified (diff)

comment:2 Changed 6 years ago by tmark

  • Component changed from Unclassified to libutil

comment:3 Changed 5 years ago by tomek

  • Milestone changed from Kea-proposed to Kea0.9
  • Priority changed from low to medium

We decided to spend at most 2 days on this issue in 0.9 timeframe. If the underlying issue need more time to fix, we'll disable the test, remove that particular piece of code from compilation or do similar postponement actions.

comment:4 Changed 5 years ago by marcin

  • Add Hours to Ticket changed from 0 to 6
  • Status changed from new to reviewing
  • Total Hours changed from 0 to 6

I have spent some time investigating the problems reported by Valgrind in the util library and I have corrected the code in two places:

  • In fd_share.cc there were cases that the invalid descriptor was being closed. This was harmless because an attempt to close invalid descriptor was expected and handled properly and the close() function would return -1 in this case and the suitable error code. However, valgrind doesn't like this so I decided to check if the descriptor is valid or not on my own.
  • Apparently, Valgrind uses signed size_t type and an attempt to pass a very high positive value for size_t (such as ULONG_MAX) is cast to -1 by Valgrind. This results in false positive that the size passed to malloc is -1 (in fact it is ULONG_MAX). Although, it is the issue in Valgrind I thought it would be harmless to change the passed value to LONG_MAX.

For the fd_read, fd_write and fd_share tests I found what follows:

  • These tests feed the pipe by forking the process. The child process just feeds the pipe and exists. My guess is that because the parent process still exists and there are allocated resources valgrind reports the memory as still accessible. I think there is nothing wrong with this.
  • The implementer of this test seemed to already have this problem and dealt with it by not running the test code if the test was ran under valgrind (see FDTest.read)
  • The code checking if the test is running under valgrind is not perfect because it relies on the presence of valgrind.h. If the valgrind.h header is not present the test assumes that the test is not running under valgrind and runs the body of the test which results in errors. Note that valgrind.h is not installed along with the valgrind tool but rather requires installation of valgrind-devel package (at least on Fedora 19). When this is installed there are no warnings produced by the FD tests.
  • We should either better document the need for valgrind-devel package installation or perhaps disable the tests. I have no strong opinion here! Please advise.

Regarding the BufferTest.outputBufferReadat:

  • This issue is strictly related to the fact that the test uses EXPECT_DEATH from google test. There is nothing wrong with our code.
  • Again, I am not sure what to do: we can remove the EXPECT_DEATH from the test, disable or remove the test.

There is also the unit test for MemorySegmentMapped? class which appears to fail as follows on Fedora 19:

[ RUN      ] MemorySegmentMappedTest.createAndModify
==863== Syscall param msync(start) points to uninitialised byte(s)
==863==    at 0x3AD760E780: __msync_nocancel (in /usr/lib64/libpthread-2.17.so)
==863==    by 0x4C2FC15: isc::util::MemorySegmentMapped::~MemorySegmentMapped() (mapped_region.hpp:713)
==863==    by 0x4C2FC68: isc::util::MemorySegmentMapped::~MemorySegmentMapped() (memory_segment_mapped.cc:285)
==863==    by 0x463E11: (anonymous namespace)::MemorySegmentMappedTest_createAndModify_Test::TestBody() (checked_delete.hpp:34)
==863==    by 0x5299FC9: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2090)
==863==    by 0x528EBF8: testing::Test::Run() (gtest.cc:2162)
==863==    by 0x528ECD6: testing::TestInfo::Run() (gtest.cc:2338)
==863==    by 0x528EE14: testing::TestCase::Run() (gtest.cc:2445)
==863==    by 0x528F184: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:4237)
==863==    by 0x5299B49: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2090)
==863==    by 0x528E460: testing::UnitTest::Run() (gtest.cc:3874)
==863==    by 0x40B476: main (run_unittests.cc:23)
==863==  Address 0x54ad0c4 is not stack'd, malloc'd or (recently) free'd
==863== 
==863== Syscall param msync(start) points to uninitialised byte(s)
==863==    at 0x3AD760E780: __msync_nocancel (in /usr/lib64/libpthread-2.17.so)
==863==    by 0x4C2FC15: isc::util::MemorySegmentMapped::~MemorySegmentMapped() (mapped_region.hpp:713)
==863==    by 0x4C2FC68: isc::util::MemorySegmentMapped::~MemorySegmentMapped() (memory_segment_mapped.cc:285)
==863==    by 0x460F89: (anonymous namespace)::MemorySegmentMappedTest::~MemorySegmentMappedTest() (checked_delete.hpp:34)
==863==    by 0x461022: (anonymous namespace)::MemorySegmentMappedTest_createAndModify_Test::~MemorySegmentMappedTest_createAndModify_Test() (memory_segment_mapped_unittest.cc:107)
==863==    by 0x5299FC9: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2090)
==863==    by 0x528ED0D: testing::TestInfo::Run() (gtest.cc:2344)
==863==    by 0x528EE14: testing::TestCase::Run() (gtest.cc:2445)
==863==    by 0x528F184: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:4237)
==863==    by 0x5299B49: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2090)
==863==    by 0x528E460: testing::UnitTest::Run() (gtest.cc:3874)
==863==    by 0x40B476: main (run_unittests.cc:23)
==863==  Address 0x54ad0c4 is not stack'd, malloc'd or (recently) free'd
==863== 
[       OK ] MemorySegmentMappedTest.createAndModify (79 ms)

and it seems it may be related to the particular implementation of the standard libraries on Fedora. If it turns out to be a problem on other OSes perhaps we should simply disable building it as there is no component using it.

comment:5 Changed 5 years ago by stephen

  • Owner changed from UnAssigned to stephen

comment:6 Changed 5 years ago by stephen

  • Owner changed from stephen to marcin

Reviewed commit b22d5d37ee0a82063d664006c849402253c0bbfb

In fd_share.cc there were cases that the invalid descriptor was being closed. This was harmless because an attempt to close invalid descriptor was expected and handled properly and the close() function would return -1 in this case and the suitable error code. However, valgrind doesn't like this so I decided to check if the descriptor is valid or not on my own.

In the line before the call to close(), "fd" is passed to dup(). It would be more logical check fd before calling that function, e.g..

int new_fd = -1;
int close_error = -1;
if (fd >= 0) {
   new_fd = dup(fd);
   close_error = close(fd);
}

(Whatever you do, please get rid of the "int" initialization using parentheses: I feel that "int x = -1" is more easily readable than "int x(-1)")

Apparently, Valgrind uses signed size_t type and an attempt to pass a very high positive value for size_t (such as ULONG_MAX) is cast to -1 by Valgrind. This results in false positive that the size passed to malloc is -1 (in fact it is ULONG_MAX). Although, it is the issue in Valgrind I thought it would be harmless to change the passed value to LONG_MAX.

That change is OK.

Other errors
I suggest we merge what we have done so far), note it on the ticket, and place this ticket back to unassigned. In particular, I think that some of the features whose tests are failing may well be removed, e.g. where do we use the fd_share functions in Lea? And the MemorySegmentMapped functions were used in the shared memory DNS authoritative servers: we may well not use them if we go for a multiprocess Kea model.

comment:7 Changed 5 years ago by marcin

  • Add Hours to Ticket changed from 6 to 1
  • Milestone changed from Kea0.9 to DHCP Outstanding Tasks
  • Status changed from reviewing to accepted
  • Total Hours changed from 6 to 7

As suggested in the review the issues with the following tests:

have been left unresolved because they test the code that may eventually be removed from Kea and they don't cause fatal failures on the Build farm.

However, since the issues still exist in the current code, the ticket is left open.

Fixes for other issues have been applied and merged to master with the commit: 003533bf6eee65593dee2fb6155564b97b752106

comment:8 Changed 5 years ago by marcin

  • Owner changed from marcin to UnAssigned
  • Status changed from accepted to assigned

comment:9 Changed 4 years ago by tomek

  • Milestone changed from DHCP Outstanding Tasks to Outstanding Tasks

Milestone renamed

comment:10 Changed 3 years ago by tomek

  • Description modified (diff)
  • Owner changed from UnAssigned to tomek

comment:11 Changed 3 years ago by tomek

  • Resolution set to complete
  • Status changed from assigned to closed
Note: See TracTickets for help on using tickets.