Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#2468 closed defect (duplicate)

b10-stats-httpd_test.py not stopping

Reported by: jreed Owned by:
Priority: medium Milestone:
Component: statistics Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: Core Feature Depending on Ticket:
Estimated Difficulty: discuss Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

Also see #1192 for a similar ticket.

Several times on the FreeBSD 8.2 amd64 this test is still running:

jreed  1920  0.0  0.1  3204   708  ??  IN   Fri09PM   0:00.00 make check-local
jreed  1921  0.0  0.1  8296  1512  ??  IN   Fri09PM   0:00.01 [sh]
jreed  1923  0.0  5.8 222700 60044  ??  IN   Fri09PM   0:01.53 /usr/local/bin/python /usr/home/jreed/builder/work/BIND10-cppcheck/20121102210000-FreeBSD8-amd64-GCC/build/src/bin/stats/tests/b10-stats-httpd_test.py

It has been several days.

Here is another current example on FreeBSD 8.1 i386:

jreed  1231  0.0  0.1  1888   600  ??  IN   Thu10AM   0:00.00 make check-local
jreed  1232  0.0  0.2  3628  1208  ??  IN   Thu10AM   0:00.00 [sh]
jreed  1234  0.0  9.2 189204 47296  ??  IN   Thu10AM   0:01.62 /usr/local/bin/python3.1 /b/work/BIND10-distcheck/20121101093000-FreeBSD8-i386/build/bind10-devel-20120817/_build/../src/bin/stats/tests/b10-stats-httpd_test.py

Hanging for a week.

If you want to log into to either of these systems, let jreed know.

Subtickets

Change History (17)

comment:1 follow-up: Changed 7 years ago by jreed

Here is output from when I killed it last time (I think):

http://git.bind10.isc.org/~tester/builder/BIND10-cppcheck/20121029223001-FreeBSD8-amd64-GCC/logs/unittests.out

Maybe hanging in gethostbyaddr?

Also another example, different system:

http://git.bind10.isc.org/~tester/builder/BIND10/20121030203500-FreeBSD8-i386/logs/unittests.out

About 6 failures in the stats/tests in past month and a half.

Last edited 7 years ago by jreed (previous) (diff)

comment:2 in reply to: ↑ 1 ; follow-up: Changed 7 years ago by naokikambe

On my FreeBSD9 environment, I've never seen it. As long as I grepped logs at git.bind10.isc.org, it likely never happened on Linux systems.

> find . -name unittests.out | xargs grep -l 'A deadlock might be detec'
./BIND10-cppcheck/20121029173012-FreeBSD8-amd64-GCC/logs/unittests.out
./BIND10-cppcheck/20121029180746-FreeBSD8-amd64-GCC/logs/unittests.out
./BIND10-cppcheck/20121029223001-FreeBSD8-amd64-GCC/logs/unittests.out
./BIND10/20120508224610-NetBSD5-amd64/logs/unittests.out
./BIND10/20120509055016-NetBSD5-amd64/logs/unittests.out
./BIND10/20120509105016-NetBSD5-amd64/logs/unittests.out
./BIND10/20120511195000-NetBSD5-amd64/logs/unittests.out
./BIND10/20120722185001-Solaris10-sparc-Sunstudio/logs/unittests.out
./BIND10/20120808174510-MacOSX10.6-x86_64-Clang-Static/logs/unittests.out
./BIND10/20120913153047-Solaris10-sparc-GCC/logs/unittests.out
./BIND10/20121030203500-FreeBSD8-i386/logs/unittests.out
>

BTW according to the failure logs, it seems to happen at line 766 or 775 in b10-stats-httpd_test.py. They are inside of test_httpd(). test_httpd() is a little big. So how about splitting test_httpd() as following? If we specify a failed test, we might be able to fix or skip it.

  • src/bin/stats/tests/b10-stats-httpd_test.py

    diff --git a/src/bin/stats/tests/b10-stats-httpd_test.py b/src/bin/stats/tests/b10-stats-httpd_test.py
    index 73ed35b..627d50e 100644
    a b class TestStatsHttpd(unittest.TestCase): 
    761761            self.assertEqual(ht.address_family, socket.AF_INET)
    762762            self.assertTrue(isinstance(ht.socket, socket.socket))
    763763
     764    def test_httpd_anyIPv4(self):
    764765        # any address (IPv4)
    765766        server_addresses = get_availaddr(address='0.0.0.0')
    766767        self.stats_httpd = MyStatsHttpd(server_addresses)
    class TestStatsHttpd(unittest.TestCase): 
    769770            self.assertEqual(ht.address_family,socket.AF_INET)
    770771            self.assertTrue(isinstance(ht.socket, socket.socket))
    771772
     773    def test_httpd_anyIPv6(self):
    772774        # any address (IPv6)
    773775        if self.ipv6_enabled:
    774776            server_addresses = get_availaddr(address='::')
    class TestStatsHttpd(unittest.TestCase): 
    778780                self.assertEqual(ht.address_family,socket.AF_INET6)
    779781                self.assertTrue(isinstance(ht.socket, socket.socket))
    780782
     783    def test_httpd_failed(self):
    781784        # existent hostname
    782785        self.assertRaises(stats_httpd.HttpServerError, MyStatsHttpd,
    783786                          get_availaddr(address='localhost'))}}}

comment:3 in reply to: ↑ 2 ; follow-up: Changed 7 years ago by jreed

Replying to naokikambe:

They are inside of test_httpd(). test_httpd() is a little big. So how about splitting test_httpd() as following? If we specify a failed test, we might be able to fix or skip it.

Thanks. I committed this at 3a32db0518e8ea18bad9f04df67ad04dcc7afe45

comment:4 Changed 7 years ago by jreed

http://git.bind10.isc.org/~jreed/J.backtrace-2468.txt is backtraces for all threads for a process running for a week.

comment:5 in reply to: ↑ 3 ; follow-up: Changed 7 years ago by naokikambe

Thank you for committing. But the new code seems to open more files than before (The new code seems to need more msgq sockets). To improve this issue, I've now pushed a new branch as trac2468. By this improvement, extra stats servers wasn't started. So even if setting ulimit -n 256, revised b10-stats-httpd_test.py doesn't fail on my environment. Could you try the branch?

comment:6 in reply to: ↑ 5 ; follow-up: Changed 7 years ago by jinmei

Replying to naokikambe:

Thank you for committing. But the new code seems to open more files than before (The new code seems to need more msgq sockets). To improve this issue, I've now pushed a new branch as trac2468. By this improvement, extra stats servers wasn't started. So even if setting ulimit -n 256, revised b10-stats-httpd_test.py doesn't fail on my environment. Could you try the branch?

If the server instance is only needed for that specific test, the
change itself may make sense, but I don't understand why the original
code requires too many open files, and I'm a bit reluctant to give it
a go without knowing why it solves the problem.

I'd be also concerned about the use of a separate thread for unit
tests in the first place. Sometimes such a tricky setup might be the
real last resort, but we should at least fully consider other options
with some proper abstraction of the service. It's often possible for
such a dynamic language as Python.

Regarding the code itself, the use of hasattr seems to be a bit
cryptic. While we need more lines of code, it seems to me more
natural if we set it to None in setUp() and do shutdown if it is not
None in tearDown.

comment:7 in reply to: ↑ 6 ; follow-up: Changed 7 years ago by naokikambe

Thank you for the comment.

Replying to jinmei:

If the server instance is only needed for that specific test, the
change itself may make sense, but I don't understand why the original
code requires too many open files, and I'm a bit reluctant to give it
a go without knowing why it solves the problem.

I'd be also concerned about the use of a separate thread for unit
tests in the first place. Sometimes such a tricky setup might be the
real last resort, but we should at least fully consider other options
with some proper abstraction of the service. It's often possible for
such a dynamic language as Python.

I know many problems in b10-stats-httpd_test.py. We should consider other options like #1668. We can also work for revising b10-stats-httpd_test.py on another ticket.

But now b10-stats-httpd_test.py hangs at the buildbot. I think we should fix it instantly. If we don't need to do so, we might need to revert the change splitting the unittest(revert 3a32db0518e8ea18bad9f04df67ad04dcc7afe45). At least before the change, such a failure(too many open files) didn't happen at the buildbot, I think. That change was just for observation of hanging but it caused another problem. So can I revert it?

Regarding the code itself, the use of hasattr seems to be a bit
cryptic. While we need more lines of code, it seems to me more
natural if we set it to None in setUp() and do shutdown if it is not
None in tearDown.

If test_mccs() is unexpectedly stopped, the test needs to shutdown the stats_server object after that. It just does in tearDown(). That doesn't make sense much.

Regards,

comment:8 in reply to: ↑ 7 ; follow-up: Changed 7 years ago by jinmei

Replying to naokikambe:

If the server instance is only needed for that specific test, the
change itself may make sense, but I don't understand why the original
code requires too many open files, and I'm a bit reluctant to give it
a go without knowing why it solves the problem.

I'd be also concerned about the use of a separate thread for unit
tests in the first place. Sometimes such a tricky setup might be the
real last resort, but we should at least fully consider other options
with some proper abstraction of the service. It's often possible for
such a dynamic language as Python.

I know many problems in b10-stats-httpd_test.py. We should consider other options like #1668. We can also work for revising b10-stats-httpd_test.py on another ticket.

But now b10-stats-httpd_test.py hangs at the buildbot. I think we should fix it instantly. If we don't need to do so, we might need to revert the change splitting the unittest(revert 3a32db0518e8ea18bad9f04df67ad04dcc7afe45). At least before the change, such a failure(too many open files) didn't happen at the buildbot, I think. That change was just for observation of hanging but it caused another problem. So can I revert it?

Due to the urgency I'm okay with forgetting the use of thread in this
context, but I still don't like to make a change without understanding
why creating the thread for each test case results in so many open
files. Anyway, my time is going to be up, so please discuss it with
others. If someone else thinks we should apply this branch as an
urgent care for breaking the bots, I don't oppose to the decision.

comment:9 in reply to: ↑ 8 Changed 7 years ago by naokikambe

Replying to jinmei:
I'm ok with either merging or reverting.

BTW on the current master, if I set limit -n 256 and disabled all test classes other than TestStatsHttpd like the following, it passed (no "Too many open files" error happened). So probably python might do multiple unittests at a time.

If some tests take some time (or finishing some tests take some time), the number of opened files might be increased in a short time. Of course, unless the test uses threads, the number wouldn't be increased so sharply:).

  • src/bin/stats/tests/b10-stats-httpd_test.py

    diff --git a/src/bin/stats/tests/b10-stats-httpd_test.py b/src/bin/stats/tests/b10-stats-httpd_test.py
    index 997ac41..7be2a82 100644
    a b def is_ipv6_enabled(address='::1', port=8001): 
    145145            if sock: sock.close()
    146146    return False
    147147
     148@unittest.skip
    148149class TestItemNameList(unittest.TestCase):
    149150
    150151    def test_item_name_list(self):
    class TestItemNameList(unittest.TestCase): 
    223224        self.assertEqual(ans,
    224225                         stats_httpd.item_name_list(dictlist, ''))
    225226
     227@unittest.skip
    226228class TestHttpHandler(unittest.TestCase):
    227229    """Tests for HttpHandler class"""
    228230    def setUp(self):
    class TestHttpHandler(unittest.TestCase): 
    550552            # do validation
    551553            self.assertTrue(xsd.validate(xml_doc))
    552554
     555@unittest.skip
    553556class TestHttpServerError(unittest.TestCase):
    554557    """Tests for HttpServerError exception"""
    555558    def test_raises(self):
    class TestHttpServerError(unittest.TestCase): 
    558561        except stats_httpd.HttpServerError as err:
    559562            self.assertEqual(str(err), 'Nothing')
    560563
     564@unittest.skip
    561565class TestHttpServer(unittest.TestCase):
    562566    """Tests for HttpServer class"""
    563567    def setUp(self):
    class TestHttpServer(unittest.TestCase): 
    579583        for httpd in self.stats_httpd.httpd:
    580584            self.assertTrue(isinstance(httpd, stats_httpd.HttpServer))
    581585
     586@unittest.skip
    582587class TestStatsHttpdError(unittest.TestCase):
    583588    """Tests for StatsHttpdError exception"""

comment:10 Changed 7 years ago by jinmei

  • Milestone changed from New Tasks to Next-Sprint-Proposed

comment:12 in reply to: ↑ 11 Changed 7 years ago by naokikambe

Replying to jreed:

https://lists.isc.org/pipermail/bind10-dev/2013-January/004241.html

As another workaround, this patch disables tests using MockMsgq() on freebsd8.

  • src/bin/stats/tests/test_utils.py

    diff --git a/src/bin/stats/tests/test_utils.py b/src/bin/stats/tests/test_utils.py
    index 96f7046..de35512 100644
    a b import threading 
    2525import tempfile
    2626import json
    2727import signal
     28import unittest
    2829
    2930import msgq
    3031import isc.config.cfgmgr
    class MyStatsHttpd(stats_httpd.StatsHttpd): 
    535536
    536537class BaseModules:
    537538    def __init__(self):
     539        # skipping tests on freebsd8. see #2468 for details.
     540        if sys.platform.startswith('freebsd8'):
     541            raise unittest.SkipTest('Skipping on freebsd8')
    538542        # MockMsgq
    539543        self.msgq = ThreadingServerManager(MockMsgq)
    540544        self.msgq.run()}}}

comment:13 follow-up: Changed 7 years ago by jinmei

I believe #2823 provided a complete solution to this problem.
I'm moving it to next-sprint-proposed just to close the ticket
at the time of sprint planning meeting.

comment:14 Changed 7 years ago by jinmei

  • Milestone set to Next-Sprint-Proposed

comment:15 in reply to: ↑ 13 Changed 7 years ago by jinmei

Replying to jinmei:

I believe #2823 provided a complete solution to this problem.
I'm moving it to next-sprint-proposed just to close the ticket
at the time of sprint planning meeting.

closing.

comment:16 Changed 7 years ago by jinmei

  • Resolution set to duplicate
  • Status changed from new to closed

comment:17 Changed 7 years ago by muks

  • Milestone Previous-Sprint-Proposed deleted
Note: See TracTickets for help on using tickets.