Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#5336 closed defect (fixed)

COMMAND_SOCKET_WRITE_FAIL Error while writing to command socket 25 : Broken pipe

Reported by: atan Owned by: Unassigned
Priority: medium Milestone: Kea1.3 beta
Component: Unclassified Version: git
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: DHCP Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 5
Total Hours: 5 Internal?: no

Description

I'm trying to get (set) configuration using API. Using 1.2 is not possible because of 64KB server response limit. interface re-detect is false (because of another unsolved bug). First 3-4 config-get request in a short period of time fails with this error. Than it responses with a config json, but following config-test (with added about 500 subnets to config) and config-set fails with the same error.

kea-ca.log

2017-07-17 14:44:16.760 DEBUG [kea-ctrl-agent.http/22667] HTTP_REQUEST_RECEIVE_START start receiving request from 172.20.100.100 with timeout 10
2017-07-17 14:44:16.760 DEBUG [kea-ctrl-agent.http/22667] HTTP_DATA_RECEIVED received 153 bytes from 172.20.100.100
2017-07-17 14:44:16.761 DEBUG [kea-ctrl-agent.http/22667] HTTP_REQUEST_RECEIVED received HTTP request from 172.20.100.100
2017-07-17 14:44:16.761 INFO  [kea-ctrl-agent.commands/22667] COMMAND_RECEIVED Received command 'config-get'
2017-07-17 14:44:16.761 DEBUG [kea-ctrl-agent.ctrl-agent/22667] CTRL_AGENT_COMMAND_FORWARD_BEGIN begin forwarding command config-get to service dhcp4
2017-07-17 14:44:16.763 DEBUG [kea-ctrl-agent.ctrl-agent/22667] CTRL_AGENT_COMMAND_FORWARD_FAILED failed forwarding command config-get: unable to forward command to the dhcp4 service: Bad file descriptor. The server is likely to be offline
2017-07-17 14:44:16.763 DEBUG [kea-ctrl-agent.http/22667] HTTP_RESPONSE_SEND sending HTTP response HTTP/1.1 200 OK
 to 172.20.100.100
2017-07-17 14:44:16.770 DEBUG [kea-ctrl-agent.http/22667] HTTP_CONNECTION_STOP stopping HTTP connection from 172.20.100.100

kea-dhcp4.log

2017-07-17 14:44:15.819 DEBUG [kea-dhcp4.packets/22659] DHCP4_BUFFER_WAIT waiting for next DHCPv4 packet with timeout 1 ms
2017-07-17 14:44:16.768 DEBUG [kea-dhcp4.packets/22659] DHCP4_BUFFER_WAIT_INTERRUPTED interrupted wait for the next packet due to timeout, signal or external socket callback (timeout value is 1)
2017-07-17 14:44:16.768 INFO  [kea-dhcp4.commands/22659] COMMAND_SOCKET_CONNECTION_OPENED Opened socket 25 for incoming command connection
2017-07-17 14:44:16.768 DEBUG [kea-dhcp4.commands/22659] COMMAND_SOCKET_READ Received 51 bytes over command socket 25
2017-07-17 14:44:16.768 INFO  [kea-dhcp4.commands/22659] COMMAND_RECEIVED Received command 'config-get'
2017-07-17 14:44:16.770 ERROR [kea-dhcp4.commands/22659] COMMAND_SOCKET_WRITE_FAIL Error while writing to command socket 25 : Broken pipe
2017-07-17 14:44:16.770 DEBUG [kea-dhcp4.packets/22659] DHCP4_BUFFER_WAIT waiting for next DHCPv4 packet with timeout 1 ms
2017-07-17 14:44:17.772 DEBUG [kea-dhcp4.packets/22659] DHCP4_BUFFER_WAIT_INTERRUPTED interrupted wait for the next packet due to timeout, signal or external socket callback (timeout value is 1)
2017-07-17 14:44:17.772 DEBUG [kea-dhcp4.packets/22659] DHCP4_BUFFER_WAIT waiting for next DHCPv4 packet with timeout 1 ms
2017-07-17 14:44:18.773 DEBUG [kea-dhcp4.packets/22659] DHCP4_BUFFER_WAIT_INTERRUPTED interrupted wait for the next packet due to timeout, signal or external socket callback (timeout value is 1)

Subtickets

Change History (15)

comment:1 Changed 3 years ago by atan

upd: First 3-4 config-get requests in a short period of time fail with this error.

comment:2 Changed 3 years ago by fdupont

EPIPE error means the other side is closed (and SIGPIPE signal ignored) so IMHO the issue should be at the other (than Kea server) side. Again strace (on Linux, other systems have the same tool with another name) will help to understand what happens.

Last edited 3 years ago by fdupont (previous) (diff)

comment:3 Changed 3 years ago by atan

kea-ctrl and kea-dhcp4 where started in separate shells at the same time.

kea-ctrl strace:

write(1, "2017-07-17 16:50:25.851 INFO  [k"..., 1882017-07-17 16:50:25.851 INFO  [kea-ctrl-agent.dctl/24457] DCTL_CONFIG_COMPLETE server has completed configuration: listening on 0.0.0.0, port 8080, control sockets: dhcp4 dhcp6, 0 lib(s):
) = 188
fcntl(3, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0
open("/usr/local/var/log/kea-dhcp4.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 10
lseek(10, 0, SEEK_END)                  = 45591217
open("/usr/local/var/log/kea-dhcp6.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 11
lseek(11, 0, SEEK_END)                  = 0
open("/usr/local/var/log/kea-ddns.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 12
lseek(12, 0, SEEK_END)                  = 0
open("/usr/local/var/log/kea-ca.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 13
lseek(13, 0, SEEK_END)                  = 68850
rt_sigaction(SIGHUP, {sa_handler=0x7fb506776c00, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER, sa_restorer=0x7fb50576e0c0}, NULL, 8) = 0
rt_sigaction(SIGINT, {sa_handler=0x7fb506776c00, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER, sa_restorer=0x7fb50576e0c0}, NULL, 8) = 0
rt_sigaction(SIGTERM, {sa_handler=0x7fb506776c00, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER, sa_restorer=0x7fb50576e0c0}, NULL, 8) = 0
fcntl(9, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0
clock_gettime(CLOCK_REALTIME, {tv_sec=1500303025, tv_nsec=855253311}) = 0
getpid()                                = 24457
write(13, "2017-07-17 16:50:25.855 INFO  [k"..., 127) = 127
fcntl(9, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0
epoll_wait(5, [{EPOLLIN, {u32=223290692, u64=93991287596356}}], 128, 0) = 1
epoll_wait(5, [{EPOLLIN, {u32=223290752, u64=93991287596416}}], 128, -1) = 1
accept(7, NULL, NULL)                   = 14
epoll_ctl(5, EPOLL_CTL_ADD, 14, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=223191040, u64=93991287496704}}) = 0
epoll_ctl(5, EPOLL_CTL_MOD, 7, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=223290752, u64=93991287596416}}) = 0
gettimeofday({tv_sec=1500303078, tv_usec=252328}, NULL) = 0
gettimeofday({tv_sec=1500303078, tv_usec=252620}, NULL) = 0
timerfd_settime(6, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=9, tv_nsec=999708000}}, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=0}}) = 0
getpeername(14, {sa_family=AF_INET, sin_port=htons(46450), sin_addr=inet_addr("172.20.100.100")}, [28->16]) = 0
ioctl(14, FIONBIO, [1])                 = 0
recvmsg(14, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="POST / HTTP/1.1\r\nHost: x.x.x."..., iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 153
epoll_wait(5, [], 128, 0)               = 0
umask(0111)                             = 022
open("/usr/local/var/run/kea/logger_lockfile", O_RDWR|O_CREAT, 0660) = 15
umask(022)                              = 0111
fcntl(15, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0
clock_gettime(CLOCK_REALTIME, {tv_sec=1500303078, tv_nsec=255882180}) = 0
getpid()                                = 24457
write(13, "2017-07-17 16:51:18.255 INFO  [k"..., 109) = 109
fcntl(15, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0
eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK)   = 16
epoll_create1(EPOLL_CLOEXEC)            = 17
timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC) = 18
epoll_ctl(17, EPOLL_CTL_ADD, 16, {EPOLLIN|EPOLLERR|EPOLLET, {u32=223304068, u64=93991287609732}}) = 0
write(16, "\1\0\0\0\0\0\0\0", 8)        = 8
epoll_ctl(17, EPOLL_CTL_ADD, 18, {EPOLLIN|EPOLLERR, {u32=223304080, u64=93991287609744}}) = 0
gettimeofday({tv_sec=1500303078, tv_usec=259114}, NULL) = 0
gettimeofday({tv_sec=1500303078, tv_usec=259455}, NULL) = 0
timerfd_settime(18, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=4, tv_nsec=999659000}}, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=0}}) = 0
socket(AF_UNIX, SOCK_STREAM, 0)         = 19
epoll_ctl(17, EPOLL_CTL_ADD, 19, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=223394480, u64=93991287700144}}) = 0
ioctl(19, FIONBIO, [1])                 = 0
connect(19, {sa_family=AF_UNIX, sun_path="/usr/local/var/kea/dhcp4.socket"}, 33) = 0
epoll_wait(17, [{EPOLLIN, {u32=223304068, u64=93991287609732}}], 128, 0) = 1
sendmsg(19, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="{ \"command\": \"config-get\", \"serv"..., iov_len=51}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 51
epoll_wait(17, [], 128, 0)              = 0
recvmsg(19, {msg_namelen=0}, 0)         = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(17, [], 128, 0)              = 0
recvmsg(19, {msg_namelen=0}, 0)         = -1 EAGAIN (Resource temporarily unavailable)
close(19)                               = 0
epoll_wait(17, [], 128, 0)              = 0
epoll_wait(17, [], 128, 0)              = 0
futex(0x7fb504ab01a0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
close(17)                               = 0
close(18)                               = 0
close(16)                               = 0
gettimeofday({tv_sec=1500303078, tv_usec=271544}, NULL) = 0
sendmsg(14, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="HTTP/1.1 200 OK\r\nContent-Length:"..., iov_len=243}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 243
epoll_wait(5, [], 128, 0)               = 0
epoll_ctl(5, EPOLL_CTL_DEL, 14, 0x7ffe328fc8f4) = 0
close(14)                               = 0
epoll_wait(5, [], 128, 0)               = 0
epoll_wait(5, [], 128, 0)               = 0
epoll_wait(5, [], 128, 0)               = 0

kea-dhcp4 strace:

bind(12, {sa_family=AF_PACKET, sll_protocol=htons(0 /* ETH_P_??? */), sll_ifindex=if_nametoindex("eth0"), sll_hatype=ARPHRD_NETROM, sll_pkttype=PACKET_HOST, sll_halen=0}, 20) = 0
gettimeofday({tv_sec=1500303068, tv_usec=694399}, NULL) = 0
gettimeofday({tv_sec=1500303068, tv_usec=694549}, NULL) = 0
timerfd_settime(7, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=9, tv_nsec=999850000}}, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=0}}) = 0
gettimeofday({tv_sec=1500303068, tv_usec=694782}, NULL) = 0
open("/usr/local/var/log/kea-dhcp4.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 13
lseek(13, 0, SEEK_END)                  = 45591217
open("/usr/local/var/log/kea-dhcp6.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 14
lseek(14, 0, SEEK_END)                  = 0
open("/usr/local/var/log/kea-ddns.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 15
lseek(15, 0, SEEK_END)                  = 0
open("/usr/local/var/log/kea-ca.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 16
lseek(16, 0, SEEK_END)                  = 68977
gettimeofday({tv_sec=1500303068, tv_usec=696080}, NULL) = 0
gettimeofday({tv_sec=1500303068, tv_usec=696220}, NULL) = 0
sendto(9, "\27\3\3\0Q\1\305N+\365\273\357\7\247\333\272&\262\4\366\363\345`:\324G\370\262\254\207\363\265"..., 86, MSG_NOSIGNAL, NULL, 0) = 86
poll([{fd=9, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=9, revents=POLLIN}])
recvfrom(9, "\27\3\3\0\35", 5, 0, NULL, NULL) = 5
recvfrom(9, "\345Pou\323jQ\341\301\271\325\37W#\215\2740\363?C\223\242\251\234\216bS\3142", 29, 0, NULL, NULL) = 29
poll([{fd=9, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=9, revents=POLLIN}])
recvfrom(9, "\27\3\3\0q", 5, 0, NULL, NULL) = 5
recvfrom(9, "\345Pou\323jQ\342\370~\30\362}\342@\314\371\4\240\204\243{}\3122\0343yOs\373w"..., 113, 0, NULL, NULL) = 113
poll([{fd=9, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=9, revents=POLLIN}])
recvfrom(9, "\27\3\3\0<", 5, 0, NULL, NULL) = 5
recvfrom(9, "\345Pou\323jQ\343\346_\303\206\267\32\337\33?\10&\202?^\260Eb\316Hc:J\v\275"..., 60, 0, NULL, NULL) = 60
poll([{fd=9, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=9, revents=POLLIN}])
recvfrom(9, "\27\3\3\0\36", 5, 0, NULL, NULL) = 5
recvfrom(9, "\345Pou\323jQ\344|\3166\256\241\"\216\240v\260\264\371\341\225\362\213`\254|\341<#", 30, 0, NULL, NULL) = 30
gettimeofday({tv_sec=1500303068, tv_usec=699279}, NULL) = 0
gettimeofday({tv_sec=1500303068, tv_usec=699319}, NULL) = 0
gettimeofday({tv_sec=1500303068, tv_usec=699435}, NULL) = 0
gettimeofday({tv_sec=1500303068, tv_usec=699625}, NULL) = 0
gettimeofday({tv_sec=1500303068, tv_usec=699759}, NULL) = 0
gettimeofday({tv_sec=1500303068, tv_usec=699923}, NULL) = 0
gettimeofday({tv_sec=1500303068, tv_usec=699992}, NULL) = 0
gettimeofday({tv_sec=1500303068, tv_usec=700095}, NULL) = 0
gettimeofday({tv_sec=1500303068, tv_usec=700180}, NULL) = 0
gettimeofday({tv_sec=1500303068, tv_usec=700311}, NULL) = 0
gettimeofday({tv_sec=1500303068, tv_usec=700491}, NULL) = 0
gettimeofday({tv_sec=1500303068, tv_usec=700602}, NULL) = 0
rt_sigaction(SIGINT, {sa_handler=0x7fafba5f2c00, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER, sa_restorer=0x7fafb95ea0c0}, NULL, 8) = 0
rt_sigaction(SIGHUP, {sa_handler=0x7fafba5f2c00, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER, sa_restorer=0x7fafb95ea0c0}, NULL, 8) = 0
rt_sigaction(SIGTERM, {sa_handler=0x7fafba5f2c00, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER, sa_restorer=0x7fafb95ea0c0}, NULL, 8) = 0
fcntl(3, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0
clock_gettime(CLOCK_REALTIME, {tv_sec=1500303068, tv_nsec=701187963}) = 0
getpid()                                = 24463
write(13, "2017-07-17 16:51:08.701 INFO  [k"..., 112) = 112
fcntl(3, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0
select(13, [8 12], NULL, NULL, {tv_sec=1, tv_usec=0}) = 1 (in [12], left {tv_sec=0, tv_usec=999996})
recvfrom(11, 0x7ffe8a933da0, 1500, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
read(12, "\0\0^\0\1<\0\26>\252\2#\10\0E\20\1H\374M@\0@\6\312\341Q\31\20\337\254\24"..., 1500) = 342
rt_sigprocmask(SIG_BLOCK, [HUP INT TERM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [HUP INT TERM], NULL, 8) = 0
gettimeofday({tv_sec=1500303068, tv_usec=702215}, NULL) = 0
gettimeofday({tv_sec=1500303068, tv_usec=702472}, NULL) = 0
gettimeofday({tv_sec=1500303068, tv_usec=702535}, NULL) = 0
epoll_wait(6, [{EPOLLIN, {u32=1848698516, u64=94276380845716}}], 128, 0) = 1
select(13, [8 12], NULL, NULL, {tv_sec=1, tv_usec=0}) = 1 (in [12], left {tv_sec=0, tv_usec=999998})
recvfrom(11, 0x7ffe8a933da0, 1500, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
read(12, "\0\26>\252\2#\0\0^\0\1<\10\0E\0\0004\215L@\0:\6A\7\254\24ddQ\31"..., 1500) = 66
umask(0111)                             = 022
open("/usr/local/var/run/kea/logger_lockfile", O_RDWR|O_CREAT, 0660) = 17
umask(022)                              = 0111
fcntl(17, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0
clock_gettime(CLOCK_REALTIME, {tv_sec=1500303068, tv_nsec=703515755}) = 0
getpid()                                = 24463
write(13, "2017-07-17 16:51:08.703 ERROR [k"..., 187) = 187
fcntl(17, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0
rt_sigprocmask(SIG_BLOCK, [HUP INT TERM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [HUP INT TERM], NULL, 8) = 0
epoll_wait(6, [], 128, 0)               = 0
select(13, [8 12], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, [HUP INT TERM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [HUP INT TERM], NULL, 8) = 0
epoll_wait(6, [], 128, 0)               = 0
select(13, [8 12], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, [HUP INT TERM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [HUP INT TERM], NULL, 8) = 0
epoll_wait(6, [], 128, 0)               = 0
select(13, [8 12], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, [HUP INT TERM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [HUP INT TERM], NULL, 8) = 0
epoll_wait(6, [], 128, 0)               = 0
select(13, [8 12], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, [HUP INT TERM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [HUP INT TERM], NULL, 8) = 0
epoll_wait(6, [], 128, 0)               = 0
select(13, [8 12], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, [HUP INT TERM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [HUP INT TERM], NULL, 8) = 0
epoll_wait(6, [], 128, 0)               = 0
select(13, [8 12], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, [HUP INT TERM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [HUP INT TERM], NULL, 8) = 0
epoll_wait(6, [], 128, 0)               = 0
select(13, [8 12], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, [HUP INT TERM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [HUP INT TERM], NULL, 8) = 0
epoll_wait(6, [], 128, 0)               = 0
select(13, [8 12], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, [HUP INT TERM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [HUP INT TERM], NULL, 8) = 0
epoll_wait(6, [], 128, 0)               = 0
select(13, [8 12], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, [HUP INT TERM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [HUP INT TERM], NULL, 8) = 0
epoll_wait(6, [], 128, 0)               = 0
select(13, [8 12], NULL, NULL, {tv_sec=1, tv_usec=0}) = 1 (in [8], left {tv_sec=0, tv_usec=461250})
rt_sigprocmask(SIG_BLOCK, [HUP INT TERM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [HUP INT TERM], NULL, 8) = 0
epoll_wait(6, [{EPOLLIN, {u32=1848699664, u64=94276380846864}}], 128, 0) = 1
accept(8, NULL, NULL)                   = 18
epoll_ctl(6, EPOLL_CTL_ADD, 18, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=1848685952, u64=94276380833152}}) = 0
umask(0111)                             = 022
open("/usr/local/var/run/kea/logger_lockfile", O_RDWR|O_CREAT, 0660) = 19
umask(022)                              = 0111
fcntl(19, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0
clock_gettime(CLOCK_REALTIME, {tv_sec=1500303078, tv_nsec=262419930}) = 0
getpid()                                = 24463
write(13, "2017-07-17 16:51:18.262 INFO  [k"..., 139) = 139
fcntl(19, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0
gettimeofday({tv_sec=1500303078, tv_usec=263393}, NULL) = 0
ioctl(18, FIONBIO, [1])                 = 0
recvmsg(18, {msg_namelen=0}, 0)         = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(6, EPOLL_CTL_MOD, 8, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=1848699664, u64=94276380846864}}) = 0
epoll_wait(6, [], 128, 0)               = 0
select(19, [8 12 18], NULL, NULL, {tv_sec=1, tv_usec=0}) = 1 (in [18], left {tv_sec=0, tv_usec=999314})
rt_sigprocmask(SIG_BLOCK, [HUP INT TERM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [HUP INT TERM], NULL, 8) = 0
epoll_wait(6, [{EPOLLIN, {u32=1848685952, u64=94276380833152}}], 128, 0) = 1
recvmsg(18, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="{ \"command\": \"config-get\", \"serv"..., iov_len=8192}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 51
fcntl(19, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0
clock_gettime(CLOCK_REALTIME, {tv_sec=1500303078, tv_nsec=265503950}) = 0
getpid()                                = 24463
write(13, "2017-07-17 16:51:18.265 INFO  [k"..., 104) = 104
fcntl(19, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0
sendmsg(18, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="{ \"arguments\": { \"Dhcp4\": { \"con"..., iov_len=3348}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = -1 EPIPE (Broken pipe)
epoll_wait(6, [{EPOLLIN|EPOLLHUP, {u32=1848685952, u64=94276380833152}}], 128, 0) = 1
fcntl(19, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0
clock_gettime(CLOCK_REALTIME, {tv_sec=1500303078, tv_nsec=268576857}) = 0
getpid()                                = 24463
write(13, "2017-07-17 16:51:18.268 ERROR [k"..., 138) = 138
fcntl(19, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0
fcntl(19, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0
clock_gettime(CLOCK_REALTIME, {tv_sec=1500303078, tv_nsec=270145234}) = 0
getpid()                                = 24463
write(13, "2017-07-17 16:51:18.270 INFO  [k"..., 139) = 139
fcntl(19, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0
epoll_ctl(6, EPOLL_CTL_DEL, 18, 0x7ffe8a934580) = 0
close(18)                               = 0
epoll_wait(6, [], 128, 0)               = 0
epoll_wait(6, [], 128, 0)               = 0
select(13, [8 12], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, [HUP INT TERM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [HUP INT TERM], NULL, 8) = 0
epoll_wait(6, [{EPOLLIN, {u32=1848698528, u64=94276380845728}}], 128, 0) = 1
gettimeofday({tv_sec=1500303079, tv_usec=274086}, NULL) = 0
gettimeofday({tv_sec=1500303079, tv_usec=274116}, NULL) = 0
timerfd_settime(7, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=14, tv_nsec=420666000}}, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=0}}) = 0
gettimeofday({tv_sec=1500303079, tv_usec=274294}, NULL) = 0
gettimeofday({tv_sec=1500303079, tv_usec=274439}, NULL) = 0
sendto(9, "\27\3\3\0w\1\305N+\365\273\357\10\343\252H\302Tt\302=\265\223\244\377\227\231\320\232\36\211Z"..., 124, MSG_NOSIGNAL, NULL, 0) = 124
poll([{fd=9, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=9, revents=POLLIN}])
recvfrom(9, "\27\3\3\0\35", 5, 0, NULL, NULL) = 5
recvfrom(9, "\345Pou\323jQ\345.\"\311\270\3515\333\365p\377\226\261\372\304 \36\35M\306\376\342", 29, 0, NULL, NULL) = 29
poll([{fd=9, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=9, revents=POLLIN}])
recvfrom(9, "\27\3\3\0010", 5, 0, NULL, NULL) = 5
recvfrom(9, "\345Pou\323jQ\346*\360\27\4OC\232p\3\245}:\375\266Y&\374\21\243V\n\242\331\r"..., 304, 0, NULL, NULL) = 304
poll([{fd=9, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=9, revents=POLLIN}])
recvfrom(9, "\27\3\3\0&", 5, 0, NULL, NULL) = 5
recvfrom(9, "\345Pou\323jQ\347oK\372P\242E\221\223\362$\263\243\337\256N\340\35\206#F\306\272\325C"..., 38, 0, NULL, NULL) = 38
poll([{fd=9, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=9, revents=POLLIN}])
recvfrom(9, "\27\3\3\0\36", 5, 0, NULL, NULL) = 5
recvfrom(9, "\345Pou\323jQ\350[\377\365\356n\221\1s\276F\304\202hC\377k\216R~\206,E", 30, 0, NULL, NULL) = 30
gettimeofday({tv_sec=1500303079, tv_usec=277066}, NULL) = 0
gettimeofday({tv_sec=1500303079, tv_usec=277153}, NULL) = 0
gettimeofday({tv_sec=1500303079, tv_usec=277249}, NULL) = 0
timerfd_settime(7, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=9, tv_nsec=999904000}}, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=14, tv_nsec=417478497}}) = 0
epoll_wait(6, [], 128, 0)               = 0
select(13, [8 12], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, [HUP INT TERM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [HUP INT TERM], NULL, 8) = 0
epoll_wait(6, [], 128, 0)               = 0
select(13, [8 12], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, [HUP INT TERM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [HUP INT TERM], NULL, 8) = 0
epoll_wait(6, [], 128, 0)               = 0
select(13, [8 12], NULL, NULL, {tv_sec=1, tv_usec=0}

comment:4 Changed 3 years ago by marcin

Hello,

Would you be willing to try the Kea version compiled out of master branch in git? We have updated the control channel significantly after 1.2 release. In particular, we have removed the 64k limits and EAGAIN error code handling should be improved.

If you need support in building Kea out of master branch we can help. Refer to https://github.com/isc-projects/kea

Thanks,
Marcin Siodelski
ISC

comment:5 follow-up: Changed 3 years ago by atan

Hello, I already use git version of Kea (see Version of the ticket). Because of 1.2 64KB limit. There was no socket problem with 1.2.

comment:6 Changed 3 years ago by atan

system info:

uname -a
Linux kea1 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26) x86_64 GNU/Linux
cat /etc/debian_version 
9.0

comment:7 in reply to: ↑ 5 Changed 3 years ago by marcin

Replying to atan:

Hello, I already use git version of Kea (see Version of the ticket). Because of 1.2 64KB limit. There was no socket problem with 1.2.

Hi,

Apologies. somehow I missed the fact that you had already been using the latest version of Kea. I will try to reproduce the problem and let you know what we do next.

Marcin Siodelski
ISC

comment:8 Changed 3 years ago by marcin

I tried to reproduce this problem on my local Debian machine but with no luck. Nevertheless, In the past I have already seen issues with EAGAIN and EWOULDBLOCK error codes being signaled on Debian systems when one wouldn't expect them.

I hate providing fixes without being able to reproduce the issue and then verifying that they address it. However, from my visual code inspection I can tell that the code is doing a wrong thing anyway, and that it needs to be corrected.

Can I ask you to apply the following patch, rebuild Kea and see if the issue goes away?

From 96cec624d92f12e8d5412cd2075b3b1d332e75c1 Mon Sep 17 00:00:00 2001
From: Marcin Siodelski <marcin@isc.org>
Date: Fri, 21 Jul 2017 18:11:49 +0200
Subject: [PATCH] [5336] Socket async handlers can't be ran as long as
 EWOULDBLOCK or EAGAIN.

---
 src/lib/asiolink/unix_domain_socket.cc | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/lib/asiolink/unix_domain_socket.cc b/src/lib/asiolink/unix_domain_socket.cc
index 44accc3..f17ec2e 100644
--- a/src/lib/asiolink/unix_domain_socket.cc
+++ b/src/lib/asiolink/unix_domain_socket.cc
@@ -213,8 +213,10 @@ UnixDomainSocketImpl::sendHandler(const UnixDomainSocket::Handler& remote_handle
     if ((ec.value() == boost::asio::error::would_block) ||
         (ec.value() == boost::asio::error::try_again)) {
         doSend(buffer, remote_handler);
+
+    } else {
+        remote_handler(ec, length);
     }
-    remote_handler(ec, length);
 }
 
 void
@@ -246,8 +248,10 @@ UnixDomainSocketImpl::receiveHandler(const UnixDomainSocket::Handler& remote_han
     if ((ec.value() == boost::asio::error::would_block) ||
         (ec.value() == boost::asio::error::try_again)) {
         doReceive(buffer, remote_handler);
+
+    } else {
+        remote_handler(ec, length);
     }
-    remote_handler(ec, length);
 }
 
 void
-- 
2.10.2


Thanks,

Marcin Siodelski
ISC

comment:9 Changed 3 years ago by atan

Thanks, the patch fixed the the problem. At least i can now config-get, config-test and config-set without any errors. But config-write doesn't work as it should. It saves some minimal default configuration, not in memory configuration. I'll open a new ticket. If you don't need to test anything else, please close this ticket. I guess this fix will be pushed to Git soon.

comment:10 Changed 3 years ago by tomek

  • Milestone changed from Kea-proposed to Kea1.3

Accepting in 1.3 as discussed on 2017-07-27 call.

comment:11 Changed 3 years ago by tmark

  • Owner set to tmark
  • Status changed from new to assigned

comment:12 Changed 3 years ago by tmark

  • Add Hours to Ticket changed from 0 to 5
  • Owner changed from tmark to Unassigned
  • Status changed from assigned to reviewing
  • Total Hours changed from 0 to 5

I spent some time trying to reproduce the issue under Debian 7.8 on debian78-amd64-dhcp2 (149.20.57.78 at the moment), as my Debian VM needs updating. Like Marcin I was not able to do so, despite fetching and submitting configs larger than 64k, using kea-shell.

I applied the above patch and retested, and everything still works fine so I submitting this for review.

ChangeLog?:

12xx.   [bug]       marcin,tmark
    Corrected handling of EAGAIN and EWOULDLBOCK errors in UnixDomainSocket.
    This was causing intermittent command channel write errors in
    kea-ctrl-agent on some Debian systems.
    (Trac #5336, git TDB)

comment:13 Changed 3 years ago by marcin

In that case maybe I should simply merge this corrected code in?

comment:14 Changed 3 years ago by marcin

  • Resolution set to fixed
  • Status changed from reviewing to closed

Per my chat with Thomas on jabber I consider this code reviewed by Thomas and thus I am merging it.

comment:15 Changed 3 years ago by vicky

  • Milestone changed from Kea1.3 to Kea1.3 beta

Milestone renamed

Note: See TracTickets for help on using tickets.