Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#2942 closed defect (fixed)

b10-auth terminated with signal 6, Aborted.

Reported by: jreed Owned by:
Priority: very high Milestone:
Component: b10-auth Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: Very High
Sub-Project: DNS Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

This was from the bind10-1.1.0-release branch merged from master last night.
On the busy as112 server with debugging at debuglevel 1.

It crashed after b10-auth server started and received and responded to 459 DNS messages in less than a second. (Also see ticket #2941.)

2013-05-03 11:27:41.001 INFO  [b10-init.init/59526] BIND10_LOST_SOCKET_CONSUMER consumer 33 of sockets disconnected, considering all its sockets closed
2013-05-03 11:27:41.002 INFO  [b10-init.init/59526] BIND10_PROCESS_ENDED process 59537 of b10-auth-2 ended with status 134
2013-05-03 11:27:41.003 ERROR [b10-init.init/59526] BIND10_COMPONENT_FAILED component b10-auth-2 (pid 59537) failed: process dumped core with exit status 134 (killed by signal 6: SIGABRT)
...
2013-05-03 11:27:41.002 INFO  [b10-init.init/59526] BIND10_PROCESS_ENDED process 59537 of b10-auth-2 ended with status 134
2013-05-03 11:27:41.010 INFO  [b10-init.init/59526] BIND10_PROCESS_ENDED process 59533 of b10-auth ended with status 134
2013-05-03 11:27:42.046 INFO  [b10-init.init/59526] BIND10_PROCESS_ENDED process 59538 of b10-auth-4 ended with status 134
2013-05-03 11:27:42.047 INFO  [b10-init.init/59526] BIND10_PROCESS_ENDED process 59536 of b10-auth-3 ended with status 134

It never logged AUTH_SERVER_STARTED.

Output that was not logged was:

pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called without an active exception

This is FreeBSD 8.1-RELEASE amd64.

backtrace:

#0  0x00000008028c103c in thr_kill () from /lib/libc.so.7
[New Thread 8032551c0 (LWP 101179)]
[New Thread 8032041c0 (LWP 101428)]
[New LWP 100463]
(gdb)  thread apply all bt

Thread 3 (LWP 100463):
#0  0x00000008028c103c in thr_kill () from /lib/libc.so.7
#1  0x00000008021993b5 in pthread_cond_signal () from /lib/libthr.so.3
#2  0x00000008019bffa9 in isc::util::thread::CondVar::wait (this=0x803204418, 
    mutex=@0x803204420) at sync.cc:239
#3  0x000000000043d59e in isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar>::run (
    this=0x803204440) at datasrc_clients_mgr.h:484
#4  0x0000000000433c42 in boost::_mfi::mf0<void, isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar> >::operator() (this=0x80322dd40, p=0x803204440)
    at mem_fn_template.hpp:49
#5  0x0000000000434001 in boost::_bi::list1<boost::_bi::value<isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar>*> >::operator()<boost::_mfi::mf0<void, isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar> >, boost::_bi::list0> (this=0x80322dd50, f=@0x80322dd40, 
    a=@0x7fffffbfeeff) at bind.hpp:253
#6  0x0000000000434046 in boost::_bi::bind_t<void, boost::_mfi::mf0<void, isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar> >, boost::_bi::list1<boost::_bi::value<isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mute
ost::_bi::bind_t<void, boost::_mfi::mf0<void, isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar> >, boost::_bi::list1<boost::_bi::value<isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar>*> > >, void>::invoke (function_obj_ptr=@0x80322dd40)
    at function_template.hpp:153
#8  0x0000000800fddf91 in boost::function0<void>::operator() (this=0x80322dd38)
    at function_template.hpp:1013
#9  0x00000008019c4342 in isc::util::thread::Thread::Impl::run (
    impl_raw=0x80322dd30) at thread.cc:71
#10 0x0000000802191511 in pthread_getprio () from /lib/libthr.so.3
#11 0x0000000000000000 in ?? ()
Cannot access memory at address 0x7fffffbff000
warning: Couldn't find general-purpose registers in core file.


Thread 2 (Thread 8032041c0 (LWP 101428)):
#0  0x000000080219b2bc in __error () from /lib/libthr.so.3
#1  0x00000008021993b5 in pthread_cond_signal () from /lib/libthr.so.3
#2  0x00000008019bffa9 in isc::util::thread::CondVar::wait (this=0x803204418, 
    mutex=@0x803204420) at sync.cc:239
#3  0x000000000043d59e in isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar>::run (
    this=0x803204440) at datasrc_clients_mgr.h:484
#4  0x0000000000433c42 in boost::_mfi::mf0<void, isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar> >::operator() (this=0x80322dd40, p=0x803204440)
    at mem_fn_template.hpp:49
#5  0x0000000000434001 in boost::_bi::list1<boost::_bi::value<isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar>*> >::operator()<boost::_mfi::mf0<void, isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar> >, boost::_bi::list0> (this=0x80322dd50, f=@0x80322dd40, 
    a=@0x7fffffbfeeff) at bind.hpp:253
#6  0x0000000000434046 in boost::_bi::bind_t<void, boost::_mfi::mf0<void, isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar> >, boost::_bi::list1<boost::_bi::value<isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar>*> > >::operator() (this=0x80322dd40)
    at bind_template.hpp:20
#7  0x000000000043406d in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar> >, boost::_bi::list1<boost::_bi::value<isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar>*> > >, void>::invoke (function_obj_ptr=@0x80322dd40)
#9  0x00000008019c4342 in isc::util::thread::Thread::Impl::run (
    impl_raw=0x80322dd30) at thread.cc:71
#10 0x0000000802191511 in pthread_getprio () from /lib/libthr.so.3
#11 0x0000000000000000 in ?? ()
Cannot access memory at address 0x7fffffbff000
warning: Couldn't find general-purpose registers in core file.


Thread 1 (Thread 8032551c0 (LWP 101179)):
#0  0x000000080219b2bc in __error () from /lib/libthr.so.3
#1  0x00000008021993b5 in pthread_cond_signal () from /lib/libthr.so.3
#2  0x00000008019bffa9 in isc::util::thread::CondVar::wait (this=0x803204418, 
    mutex=@0x803204420) at sync.cc:239
#3  0x000000000043d59e in isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar>::run (
    this=0x803204440) at datasrc_clients_mgr.h:484
#4  0x0000000000433c42 in boost::_mfi::mf0<void, isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar> >::operator() (this=0x80322dd40, p=0x803204440)
    at mem_fn_template.hpp:49
#5  0x0000000000434001 in boost::_bi::list1<boost::_bi::value<isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar>*> >::operator()<boost::_mfi::mf0<void, isc::auth::datasrc_c::thread::CondVar> >, boost::_bi::list0> (this=0x80322dd50, f=@0x80322dd40, 
    a=@0x7fffffbfeeff) at bind.hpp:253
#6  0x0000000000434046 in boost::_bi::bind_t<void, boost::_mfi::mf0<void, isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar> >, boost::_bi::list1<boost::_bi::value<isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar>*> > >::operator() (this=0x80322dd40)
    at bind_template.hpp:20
#7  0x000000000043406d in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar> >, boost::_bi::list1<boost::_bi::value<isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar>*> > >, void>::invoke (function_obj_ptr=@0x80322dd40)
    at function_template.hpp:153
#8  0x0000000800fddf91 in boost::function0<void>::operator() (this=0x80322dd38)
    at function_template.hpp:1013
#9  0x00000008019c4342 in isc::util::thread::Thread::Impl::run (
    impl_raw=0x80322dd30) at thread.cc:71
#10 0x0000000802191511 in pthread_getprio () from /lib/libthr.so.3
#11 0x0000000000000000 in ?? ()
Cannot access memory at address 0x7fffffbff000
#0  0x000000080219b2bc in __error () from /lib/libthr.so.3

I reverted to beta1 from a month ago and it works. I was able to reproduce this issue a few times.

Subtickets

Attachments (3)

udp-server.diff (645 bytes) - added by jinmei 7 years ago.
udp-server2.diff (4.4 KB) - added by jinmei 7 years ago.
udp-server3.diff (497 bytes) - added by jinmei 7 years ago.

Download all attachments as: .zip

Change History (25)

comment:1 follow-up: Changed 7 years ago by jreed

More backtrace:

#0  0x00000008028c103c in thr_kill () from /lib/libc.so.7
#1  0x000000080295d1cb in abort () from /lib/libc.so.7
#2  0x00000008024aefa4 in __gnu_cxx::__verbose_terminate_handler ()
   from /usr/lib/libstdc++.so.6
#3  0x00000008024b34a3 in std::set_unexpected () from /usr/lib/libstdc++.so.6
#4  0x00000008024b34e3 in std::terminate () from /usr/lib/libstdc++.so.6
#5  0x00000008024fb75f in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
#6  0x000000000042e875 in AuthSrv::processMessage (this=0x80322f5e0, 
    io_message=@0x7fffffffc590, message=@0x80320f268, buffer=@0x80332e480, 
    server=0x803333000) at auth_srv.cc:502
#7  0x000000000043a69b in MessageLookup::operator() (this=0x80320d9c0, 
    io_message=@0x7fffffffc590, message=@0x7fffffffc600, 
    buffer=@0x7fffffffc5e0, server=0x803333000) at auth_srv.cc:352
#8  0x0000000801293119 in isc::asiodns::IOFetch::operator() (this=0x80321e918, 
    ec={value_ = 52554000, category_ = 8}, length=34413405056)
    at io_fetch.cc:255
#9  0x0000000801295192 in boost::scoped_ptr<isc::asiolink::IOEndpoint>::operator* (this=0x801295192) at scoped_ptr.hpp:89
#10 0x0000000801292e70 in isc::asiodns::IOFetch::stop (this=0x803204388, 
    result=32767) at io_fetch.cc:380
#11 0x00007fffffffc840 in ?? ()
#12 0x8000000000000000 in ?? ()
#13 0x00007fffffffc7b8 in ?? ()
#14 0x00007fffffffc970 in ?? ()
#15 0x00007fffffffc958 in ?? ()
#16 0x00007fffffffc970 in ?? ()
#17 0x00007fffffffc7b8 in ?? ()
#18 0x00007fffffffc7e0 in ?? ()
#19 0x000000080129559d in boost::detail::lexical_stream_limited_src<char, boost::detail::lexical_streambuf_fake, std::char_traits<char> >::operator<< (this=Cannot access memory at address 0x100000000000078
)
    at lexical_cast.hpp:919
Cannot access memory at address 0x100000000000088

comment:2 Changed 7 years ago by jreed

And for thread 1:

(gdb) thread 1
[Switching to thread 1 (Thread 8032551c0 (LWP 100792))]#0  0x000000080219b2bc in __error () from /lib/libthr.so.3
#0  0x000000080219b2bc in __error () from /lib/libthr.so.3
#1  0x00000008021993b5 in pthread_cond_signal () from /lib/libthr.so.3
#2  0x00000008019bffa9 in isc::util::thread::CondVar::wait (this=0x803204418, 
    mutex=@0x803204420) at sync.cc:239
#3  0x000000000043d59e in isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar>::run (
    this=0x803204440) at datasrc_clients_mgr.h:484
#4  0x0000000000433c42 in boost::_mfi::mf0<void, isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar> >::operator() (this=0x80322dd40, p=0x803204440)
    at mem_fn_template.hpp:49
#5  0x0000000000434001 in boost::_bi::list1<boost::_bi::value<isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar>*> >::operator()<boost::_mfi::mf0<void, isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar> >, boost::_bi::list0> (this=0x80322dd50, f=@0x80322dd40, 
    a=@0x7fffffbfeeff) at bind.hpp:253
#6  0x0000000000434046 in boost::_bi::bind_t<void, boost::_mfi::mf0<void, isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar> >, boost::_bi::list1<boost::_bi::value<isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar>*> > >::operator() (this=0x80322dd40)
    at bind_template.hpp:20
#7  0x000000000043406d in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar> >, boost::_bi::list1<boost::_bi::value<isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar>*> > >, void>::invoke (function_obj_ptr=@0x80322dd40)
    at function_template.hpp:153
#8  0x0000000800fddf91 in boost::function0<void>::operator() (this=0x80322dd38)
    at function_template.hpp:1013
#9  0x00000008019c4342 in isc::util::thread::Thread::Impl::run (
    impl_raw=0x80322dd30) at thread.cc:71
#10 0x0000000802191511 in pthread_getprio () from /lib/libthr.so.3
#11 0x0000000000000000 in ?? ()
Cannot access memory at address 0x7fffffbff000

comment:3 in reply to: ↑ 1 ; follow-up: Changed 7 years ago by jreed

Replying to jreed:

#8 0x0000000801293119 in isc::asiodns::IOFetch::operator() (this=0x80321e918,

ec={value_ = 52554000, category_ = 8}, length=34413405056)
at io_fetch.cc:255

Length is odd.

comment:4 Changed 7 years ago by jreed

Answering this from yesterday's jabber:

(10:14:00 PM) jinmei: it's probably due to excessive logging
(10:14:05 PM) jinmei: > abort
(10:14:36 PM) jinmei: I'd first check if it repeats, and then try decrease the log level to INFO (or debug 0 if it's possible)

Yes, I can repeat it.
I also set the severity to INFO and the problem still happened.

comment:5 in reply to: ↑ 3 Changed 7 years ago by jinmei

Replying to jreed:

Replying to jreed:

#8 0x0000000801293119 in isc::asiodns::IOFetch::operator() (this=0x80321e918,

ec={value_ = 52554000, category_ = 8}, length=34413405056)
at io_fetch.cc:255

This part of the backtrace doesn't make sense. AuthSrv's call back
shouldn't be called from IOFetch. Does it refer to the correct
library?

In either case, could we get at least one more backtrace? The first
backtrace also seems odd; it shows all threads are working as a
data source client builder, which is impossible.

comment:6 Changed 7 years ago by jreed

Here is another backtrace using correct libraries. (I realize the LD_LIBRARY_PATH was not being used, even though gdb showed it was set, so I reinstalled libraries for this.)

#0  0x00000008028c103c in thr_kill () from /lib/libc.so.7
#1  0x000000080295d1cb in abort () from /lib/libc.so.7
#2  0x00000008024aefa4 in __gnu_cxx::__verbose_terminate_handler ()
   from /usr/lib/libstdc++.so.6
#3  0x00000008024b34a3 in std::set_unexpected () from /usr/lib/libstdc++.so.6
#4  0x00000008024b34e3 in std::terminate () from /usr/lib/libstdc++.so.6
#5  0x00000008024fb75f in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
#6  0x000000000042e875 in AuthSrv::processMessage (this=0x80322f5e0, 
    io_message=@0x7fffffffc590, message=@0x80320f268, buffer=@0x80332e480, 
    server=0x803333000) at auth_srv.cc:502
#7  0x000000000043a69b in MessageLookup::operator() (this=0x80320d9c0, 
    io_message=@0x7fffffffc590, message=@0x7fffffffc600, 
    buffer=@0x7fffffffc5e0, server=0x803333000) at auth_srv.cc:352
#8  0x0000000801293119 in isc::asiodns::SyncUDPServer::handleRead (
    this=0x803333000, ec=@0x7fffffffc970, length=46) at sync_udp_server.cc:111
#9  0x0000000801295192 in boost::_mfi::mf2<void, isc::asiodns::SyncUDPServer, asio::error_code const&, unsigned long>::operator() (this=0x7fffffffc958, 
    p=0x803333000, a1=@0x7fffffffc970, a2=46) at mem_fn_template.hpp:280
#10 0x0000000801295511 in boost::_bi::list3<boost::_bi::value<isc::asiodns::SyncUDPServer*>, boost::arg<1>, boost::arg<2> >::operator()<boost::_mfi::mf2<void, isc::asiodns::SyncUDPServer, asio::error_code const&, unsigned long>, boost::_bi::list2<asio::error_code const&, unsigned long&> > (this=0x7fffffffc968, 
    f=@0x7fffffffc958, a=@0x7fffffffc790) at bind.hpp:392
#11 0x0000000801295566 in boost::_bi::bind_t<void, boost::_mfi::mf2<void, isc::asiodns::SyncUDPServer, asio::error_code const&, unsigned long>, boost::_bi::list3<boost::_bi::value<isc::asiodns::SyncUDPServer*>, boost::arg<1>, boost::arg<2> > >::operator()<asio::error_code, unsigned long> (this=0x7fffffffc958, 
    a1=@0x7fffffffc970, a2=@0x7fffffffc7b8) at bind_template.hpp:76
#12 0x000000080129559d in boost::detail::function::void_function_obj_invoker2<boost::_bi::bind_t<void, boost::_mfi::mf2<void, isc::asiodns::SyncUDPServer, asio::error_code const&, unsigned long>, boost::_bi::list3<boost::_bi::value<isc::asiodns::SyncUDPServer*>, boost::arg<1>, boost::arg<2> > >, void, asio::error_code const&, unsigned long>::invoke (function_obj_ptr=@0x7fffffffc958, 
    a0=@0x7fffffffc970, a1=46) at function_template.hpp:153
#13 0x0000000801296f81 in boost::function2<void, asio::error_code const&, unsigned long>::operator() (this=0x7fffffffc950, a0=@0x7fffffffc970, a1=46)
    at function_template.hpp:1013
#14 0x0000000801296fb8 in asio::detail::binder2<boost::function<void ()(asio::error_code const&, unsigned long)>, asio::error_code, unsigned long>::operator()
    (this=0x7fffffffc950) at bind_handler.hpp:96
#15 0x000000080129705c in asio::asio_handler_invoke<asio::detail::binder2<boost::function<void ()(asio::error_code const&, unsigned long)>, asio::error_code, unsigned long> > (function=@0x7fffffffc950) at handler_invoke_hook.hpp:63
#16 0x000000080129709b in asio_handler_invoke_helpers::invoke<asio::detail::binder2<boost::function<void ()(asio::error_code const&, unsigned long)>, asio::error_code, unsigned long>, boost::function<void ()(asio::error_code const&, unsigned long)> > (function=@0x7fffffffc9d0, context=@0x7fffffffc9d0)
    at handler_invoke_helpers.hpp:39
#17 0x0000000801297187 in asio::detail::reactive_socket_recvfrom_op<asio::mutable_buffers_1, asio::ip::basic_endpoint<asio::ip::udp>, boost::function<void ()(asio::error_code const&, unsigned long)> >::do_complete (owner=0x8032271d0, 
    base=0x803228350) at reactive_socket_recvfrom_op.hpp:115
#18 0x0000000800fcc495 in asio::detail::task_io_service_operation::complete (
    this=0x803228350, owner=@0x8032271d0) at task_io_service_operation.hpp:34
#19 0x0000000800fdbd67 in asio::detail::task_io_service::do_one (
    this=0x8032271d0, lock=@0x7fffffffcb70, this_idle_thread=0x7fffffffcb80)
    at task_io_service.ipp:277
#20 0x0000000800fdbed2 in asio::detail::task_io_service::run_one (
    this=0x8032271d0, ec=@0x7fffffffcbe0) at task_io_service.ipp:151
#21 0x0000000800fdc876 in asio::io_service::run_one (this=0x80320d1e0)
    at io_service.ipp:69
#22 0x0000000800fc8a19 in isc::cc::SessionImpl::readData (this=0x803275100, 
    data=0x803275148, datalen=4) at session.cc:192
#23 0x0000000800fc93ea in isc::cc::SessionImpl::readDataLength (
    this=0x803275100) at session.cc:160
#24 0x0000000800fc95f0 in isc::cc::Session::recvmsg (this=0x80320e530, 
    env=@0x7fffffffdc40, msg=@0x7fffffffdc30, nonblock=false, seq=18)
    at session.cc:384
#25 0x0000000800fcabb0 in isc::cc::Session::group_recvmsg (this=0x80320e530, 
    envelope=@0x7fffffffdc40, msg=@0x7fffffffdc30, nonblock=false, seq=18)
    at session.cc:502
#26 0x000000080179f6cb in releaseSocket (this=0x803277dd0, token=@0x80321ea00)
    at socket_request.cc:341
#27 0x000000080179618b in setAddresses (service=@0x7fffffffe590, 
    addresses=@0x7fffffffe130, 
    server_options=isc::asiodns::DNSServiceBase::SERVER_SYNC_OK)
    at portconfig.cc:87
#28 0x0000000801796a31 in isc::server_common::portconfig::installListenAddresses (new_addresses=@0x7fffffffe130, address_store=@0x8032043e8, 
    service=@0x7fffffffe590, 
    server_options=isc::asiodns::DNSServiceBase::SERVER_SYNC_OK)
    at portconfig.cc:129
#29 0x000000000042b621 in AuthSrv::setListenAddresses (this=0x80322f5e0, 
    addresses=@0x7fffffffe130) at auth_srv.cc:905
#30 0x000000000043f512 in build (this=0x80340c780, config=@0x7fffffffe410)
    at auth_config.cc:98
#31 0x000000000043ed40 in configureAuthServer (server=@0x80322f5e0, 
    config_set=@0x7fffffffe750) at auth_config.cc:200
#32 0x0000000000445e0d in main (argc=1, argv=0x7fffffffe9e8) at main.cc:218

And other thread:

(gdb) thread 1
[Switching to thread 1 (Thread 8032551c0 (LWP 100792))]#0  0x000000080219b2bc in __error () from /lib/libthr.so.3
(gdb) bt
#0  0x000000080219b2bc in __error () from /lib/libthr.so.3
#1  0x00000008021993b5 in pthread_cond_signal () from /lib/libthr.so.3
#2  0x00000008019bffa9 in isc::util::thread::CondVar::wait (this=0x803204418, 
    mutex=@0x803204420) at sync.cc:239
#3  0x000000000043d59e in isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar>::run (
    this=0x803204440) at datasrc_clients_mgr.h:484
#4  0x0000000000433c42 in boost::_mfi::mf0<void, isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar> >::operator() (this=0x80322dd40, p=0x803204440)
    at mem_fn_template.hpp:49
#5  0x0000000000434001 in boost::_bi::list1<boost::_bi::value<isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar>*> >::operator()<boost::_mfi::mf0<void, isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar> >, boost::_bi::list0> (this=0x80322dd50, f=@0x80322dd40, 
    a=@0x7fffffbfeeff) at bind.hpp:253
#6  0x0000000000434046 in boost::_bi::bind_t<void, boost::_mfi::mf0<void, isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar> >, boost::_bi::list1<boost::_bi::value<isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar>*> > >::operator() (this=0x80322dd40)
    at bind_template.hpp:20
#7  0x000000000043406d in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar> >, boost::_bi::list1<boost::_bi::value<isc::auth::datasrc_clientmgr_internal::DataSrcClientsBuilderBase<isc::util::thread::Mutex, isc::util::thread::CondVar>*> > >, void>::invoke (function_obj_ptr=@0x80322dd40)
    at function_template.hpp:153
#8  0x0000000800fddf91 in boost::function0<void>::operator() (this=0x80322dd38)
    at function_template.hpp:1013
#9  0x00000008019c4342 in isc::util::thread::Thread::Impl::run (
    impl_raw=0x80322dd30) at thread.cc:71
#10 0x0000000802191511 in pthread_getprio () from /lib/libthr.so.3
#11 0x0000000000000000 in ?? ()
Cannot access memory at address 0x7fffffbff000

comment:7 follow-up: Changed 7 years ago by jinmei

I can't find anything wrong yet, but I wonder if the attached patch
suppresses the problem.

Changed 7 years ago by jinmei

comment:8 in reply to: ↑ 7 ; follow-up: Changed 7 years ago by jreed

Replying to jinmei:

I can't find anything wrong yet, but I wonder if the attached patch
suppresses the problem.

It still fails.

comment:9 in reply to: ↑ 8 Changed 7 years ago by jreed

Replying to jreed:

It still fails.

But this time it did not output: "pure virtual method called
terminate called without an active exception".

And different signal:

2013-05-03 18:08:02.768 ERROR [b10-init.init/83744] BIND10_COMPONENT_FAILED comp
onent b10-auth (pid 83751) failed: process dumped core with exit status 139 (kil
led by signal 11: SIGSEGV)
2013-05-03 18:08:02.768 FATAL [b10-init.init/83744] BIND10_COMPONENT_UNSATISFIED
 component b10-auth is required to run and failed

and

#0  0x0000000000430da0 in isc::dns::Opcode::equals (this=0x1, other=@0x560a30)
    at opcode.h:97
#1  0x0000000000430ddd in isc::dns::Opcode::operator== (this=0x1, 
    other=@0x560a30) at opcode.h:100
#2  0x000000000042bec1 in makeErrorMessage (renderer=@0x803204388, 
    message=@0x80320f268, buffer=@0x80332e480, rcode=@0x56082c, 
    stats_attrs=@0x7fffffffbff0, tsig_context=@0x7fffffffbe50)
    at auth_srv.cc:439
#3  0x000000000042e2e6 in AuthSrvImpl::processNormalQuery (this=0x803204380, 
    io_message=@0x7fffffffc580, remote_edns=@0x7fffffffc230, 
    message=@0x80320f268, buffer=@0x80332e480, tsig_context=@0x7fffffffc220, 
    stats_attrs=@0x7fffffffbff0) at auth_srv.cc:673
#4  0x000000000042f98a in AuthSrv::processMessage (this=0x80322f5e0, 
    io_message=@0x7fffffffc580, message=@0x80320f268, buffer=@0x80332e480, 
    server=0x803333000) at auth_srv.cc:620
#5  0x000000000043a69b in MessageLookup::operator() (this=0x80320d9c0, 
    io_message=@0x7fffffffc580, message=@0x7fffffffc610, 
    buffer=@0x7fffffffc5f0, server=0x803333000) at auth_srv.cc:352
#6  0x000000080129312a in isc::asiodns::SyncUDPServer::handleRead (
    this=0x803333000, ec=@0x7fffffffc970, length=55) at sync_udp_server.cc:112
#7  0x00000008012951d2 in boost::_mfi::mf2<void, isc::asiodns::SyncUDPServer, asio::error_code const&, unsigned long>::operator() (this=0x7fffffffc958, 
    p=0x803333000, a1=@0x7fffffffc970, a2=55) at mem_fn_template.hpp:280
#8  0x0000000801295551 in boost::_bi::list3<boost::_bi::value<isc::asiodns::SyncUDPServer*>, boost::arg<1>, boost::arg<2> >::operator()<boost::_mfi::mf2<void, isc::asiodns::SyncUDPServer, asio::error_code const&, unsigned long>, boost::_bi::list2<asio::error_code const&, unsigned long&> > (this=0x7fffffffc968, 
    f=@0x7fffffffc958, a=@0x7fffffffc790) at bind.hpp:392
#9  0x00000008012955a6 in boost::_bi::bind_t<void, boost::_mfi::mf2<void, isc::asiodns::SyncUDPServer, asio::error_code const&, unsigned long>, boost::_bi::list3<boost::_bi::value<isc::asiodns::SyncUDPServer*>, boost::arg<1>, boost::arg<2> > >::operator()<asio::error_code, unsigned long> (this=0x7fffffffc958, 
    a1=@0x7fffffffc970, a2=@0x7fffffffc7b8) at bind_template.hpp:76
#10 0x00000008012955dd in boost::detail::function::void_function_obj_invoker2<boost::_bi::bind_t<void, boost::_mfi::mf2<void, isc::asiodns::SyncUDPServer, asio::error_code const&, unsigned long>, boost::_bi::list3<boost::_bi::value<isc::asiodns::SyncUDPServer*>, boost::arg<1>, boost::arg<2> > >, void, asio::error_code const&, unsigned long>::invoke (function_obj_ptr=@0x7fffffffc958, 
    a0=@0x7fffffffc970, a1=55) at function_template.hpp:153
#11 0x0000000801296fc1 in boost::function2<void, asio::error_code const&, unsigned long>::operator() (this=0x7fffffffc950, a0=@0x7fffffffc970, a1=55)
    at function_template.hpp:1013
#12 0x0000000801296ff8 in asio::detail::binder2<boost::function<void ()(asio::error_code const&, unsigned long)>, asio::error_code, unsigned long>::operator()
    (this=0x7fffffffc950) at bind_handler.hpp:96
#13 0x000000080129709c in asio::asio_handler_invoke<asio::detail::binder2<boost::function<void ()(asio::error_code const&, unsigned long)>, asio::error_code, unsigned long> > (function=@0x7fffffffc950) at handler_invoke_hook.hpp:63
#14 0x00000008012970db in asio_handler_invoke_helpers::invoke<asio::detail::binder2<boost::function<void ()(asio::error_code const&, unsigned long)>, asio::error_code, unsigned long>, boost::function<void ()(asio::error_code const&, unsigned long)> > (function=@0x7fffffffc9d0, context=@0x7fffffffc9d0)
    at handler_invoke_helpers.hpp:39
#15 0x00000008012971c7 in asio::detail::reactive_socket_recvfrom_op<asio::mutable_buffers_1, asio::ip::basic_endpoint<asio::ip::udp>, boost::function<void ()(asio::error_code const&, unsigned long)> >::do_complete (owner=0x8032271d0, 
    base=0x803228350) at reactive_socket_recvfrom_op.hpp:115
#16 0x0000000800fcc495 in asio::detail::task_io_service_operation::complete (
    this=0x803228350, owner=@0x8032271d0) at task_io_service_operation.hpp:34
#17 0x0000000800fdbd67 in asio::detail::task_io_service::do_one (
    this=0x8032271d0, lock=@0x7fffffffcb70, this_idle_thread=0x7fffffffcb80)
    at task_io_service.ipp:277
#18 0x0000000800fdbed2 in asio::detail::task_io_service::run_one (
    this=0x8032271d0, ec=@0x7fffffffcbe0) at task_io_service.ipp:151
#19 0x0000000800fdc876 in asio::io_service::run_one (this=0x80320d1e0)
    at io_service.ipp:69
#20 0x0000000800fc8a19 in isc::cc::SessionImpl::readData (this=0x803275100, 
    data=0x803275148, datalen=4) at session.cc:192
#21 0x0000000800fc93ea in isc::cc::SessionImpl::readDataLength (
    this=0x803275100) at session.cc:160
#22 0x0000000800fc95f0 in isc::cc::Session::recvmsg (this=0x80320e530, 
    env=@0x7fffffffdc40, msg=@0x7fffffffdc30, nonblock=false, seq=18)
    at session.cc:384
#23 0x0000000800fcabb0 in isc::cc::Session::group_recvmsg (this=0x80320e530, 
    envelope=@0x7fffffffdc40, msg=@0x7fffffffdc30, nonblock=false, seq=18)
    at session.cc:502
#24 0x000000080179f6cb in releaseSocket (this=0x803277dd0, token=@0x80321ea00)
    at socket_request.cc:341
#25 0x000000080179618b in setAddresses (service=@0x7fffffffe590, 
    addresses=@0x7fffffffe130, 
    server_options=isc::asiodns::DNSServiceBase::SERVER_SYNC_OK)
    at portconfig.cc:87
#26 0x0000000801796a31 in isc::server_common::portconfig::installListenAddresses (new_addresses=@0x7fffffffe130, address_store=@0x8032043e8, 
    service=@0x7fffffffe590, 
    server_options=isc::asiodns::DNSServiceBase::SERVER_SYNC_OK)
    at portconfig.cc:129
#27 0x000000000042b621 in AuthSrv::setListenAddresses (this=0x80322f5e0, 
    addresses=@0x7fffffffe130) at auth_srv.cc:905
#28 0x000000000043f512 in build (this=0x8032dd3a0, config=@0x7fffffffe410)
    at auth_config.cc:98
#29 0x000000000043ed40 in configureAuthServer (server=@0x80322f5e0, 
    config_set=@0x7fffffffe750) at auth_config.cc:200
#30 0x0000000000445e0d in main (argc=1, argv=0x7fffffffe9e0) at main.cc:218

comment:10 follow-up: Changed 7 years ago by jinmei

I believe I figured out the reason. If my guess is correct, it's
primarily because of #2903, but it has actually revealed a more
fundamental issue that's been existing even before that change.

Anyway, could you try the attached patch (forgetting the previous
one)? If my theory is correct, this should solve this particular
crash, also (to some extent) addressing the "fundamental" issue.

Changed 7 years ago by jinmei

comment:11 Changed 7 years ago by jinmei

  • Milestone changed from New Tasks to Next-Sprint-Proposed

comment:12 Changed 7 years ago by jinmei

#2556 is probably the same problem as this one (if my guess is correct
it's not specific to the SyncUDPServer class).

comment:13 follow-up: Changed 7 years ago by vorner

I can't say I understand the reason why it kept failing. But the patch makes me slightly nervous, deleting the stopped at several places (and not setting it to NULL). I understand this is just a proposal, but can we use something like shared pointer there, or something?

comment:14 in reply to: ↑ 13 Changed 7 years ago by jinmei

Replying to vorner:

I can't say I understand the reason why it kept failing. But the patch makes me slightly nervous, deleting the stopped at several places (and not setting it to NULL). I understand this is just a proposal, but can we use something like shared pointer there, or something?

The attached patch is just a quick hack to prove (or disprove) the theory.
Once proved, we should develop a cleaner fix.

comment:15 in reply to: ↑ 10 Changed 7 years ago by jreed

udp-server2.diff doesn't apply clean for clean source from bind10-1.1.0beta2

...

|--- a/src/lib/asiodns/sync_udp_server.h
|+++ b/src/lib/asiodns/sync_udp_server.h
--------------------------
Patching file src/lib/asiodns/sync_udp_server.h using Plan A...
Hunk #1 succeeded at 70.
Hunk #2 failed at 156.
Hunk #3 failed at 179.
Hunk #4 succeeded at 175 (offset -13 lines).
2 out of 4 hunks failed--saving rejects to src/lib/asiodns/sync_udp_server.h.rej
done

I didn't check why yet.

comment:16 Changed 7 years ago by jinmei

You'll probably first need to apply
fd6c9be84e460ead7e1fd3e0888828854adccb4f

(and this will be needed for beta anyway; without it, it doesn't
compile w/ SunStudio?)

comment:17 Changed 7 years ago by jreed

Okay new patch is in place and running on the as112 server. It appears to be working.

One thing I noticed is that I have four auth processes and they used to be near same WCPU usage, but now they aren't. One is around 47% and another is around 30%. Not sure if this is related though.

Changed 7 years ago by jinmei

comment:18 Changed 7 years ago by jinmei

This is a use-after-free problem.

According to the backtrace, what happened appears to be:

  • b10-auth first sets listen_on at the time of creating ModuleCCSession. It eventually creates corresponding SyncUDPServer (and other DNSServer) objects, which register themselves with read event callbacks for the ASIO io_service.
  • In the case of SyncUDPServer, async_receive_from() is called, which first checks if there's any readable data using non blocking I/O, and if there is post an event to the main loop of the io_service (at this point such events cannot be canceled any more). As the production AS112 server is very busy, I guess the SyncUDPServer effectively starts receiving queries at this point.
  • b10-auth then calls configureAuthServer() to all install user configurations. For listen_on, this means the previously created SyncUDPServer (and other DNSServer) objects are destroyed:
    void
    DNSService::clearServers() {
        BOOST_FOREACH(const DNSServiceImpl::DNSServerPtr& s, impl_->servers_) {
            s->stop();
        }
        impl_->servers_.clear();
    }
    
  • SyncUDPServer::stop() closes the socket, but at this point any already posted completed read event isn't affected. It will be still trigger the read callback eventually.
  • the SyncUDPServer object itself is destroyed at this point.
  • then, to complete configureAuthServer(), b10-auth communicates with other modules via the CC session, which causes calls to io_service::run_one(). One of such calls result in the call to the SyncUDPServer callback (this is actually happening according to the backtrace)
  • but at this point this server object has already been destroyed, so any behavior that relies on the local member variables of the object is undefined. That should be the reason we saw "calling pure virtual" or other strange crash.

Before #2903, this code should somehow prevented catastrophic results:

    if (checkin_callback_ != NULL) {
        (*checkin_callback_)(message);
        if (stopped_) {
            return;
        }
    }

While it still uses local member variables of the destroyed object and
should cause crash or other troubles, we were probably lucky and both
checkin_callback_ and stopped_ retained the original value.

So, simply reverting #2903 wouldn't be a real solution. What we
should do is to control the lifetime of the server objects so they are
not destroyed until all posted events are completed.

But, for the 1.1.0-release (beta), I suggest a workaround that is not
really correct but just as bad as pre-#2903
(http://bind10.isc.org/raw-attachment/ticket/2942/udp-server3.diff),
and developing a complete fix (not only for SyncUDPServer but also
for other DNS server classes) separately.

comment:19 Changed 7 years ago by jreed

Okay I am now running udp-server3.diff on the busy system. It appears to be working fine now. I will check again later.

comment:20 Changed 7 years ago by jinmei

the proposed workaround was committed at 89e0087.

followup ticket #2946 has been created for a complete solution.

closing this ticket.

comment:21 Changed 7 years ago by jinmei

  • Resolution set to fixed
  • Status changed from new to closed

comment:22 Changed 7 years ago by muks

  • Milestone Next-Sprint-Proposed deleted
Note: See TracTickets for help on using tickets.