Opened 7 years ago

Closed 6 years ago

#3047 closed defect (worksforme)

Use of Kea memfile lease database causes SigAbrt exception

Reported by: jwright Owned by:
Priority: low Milestone: Kea0.9
Component: database-all Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: Medium
Sub-Project: DHCP Feature Depending on Ticket:
Estimated Difficulty: 0 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

Please see following Jabber chat log:

1:38:20 PM jeff: Have you guys seen this in the BIND10 logfile?

2013-07-03 18:54:10.109 ERROR [b10-init.init/5090] BIND10_COMPONENT_FAILED component b10-dhcp4 (pid 5677) failed: process terminated with exit status 6 (killed by signal 6: SIGABRT)

It seems to occur every once in a while.
1:52:10 PM jeff: I think debian 6 is dead.
1:52:11 PM jeff: 149.20.57.86
1:52:33 PM jeff: No answer to pings from inside lab; and all established SSH sessions are unresponsive.
1:55:46 PM TomM: Time to pull back and nuke the planet.
1:59:46 PM jeff: I have to way to remote-boot that puppy. I think TomM's latest Kea checkin caused a kernel panic.
2:00:23 PM TomM: let's see, I know I have the GPS coords for Jeff's new place.... where did I write those down....
2:01:40 PM TomM: Jeff, if you hear a high pitched whine soon, please ignore it... it's nothing
2:02:48 PM jeff: Actually, I *DID* have a way to reboot it and I did so. Thanks, Remote Desktop Client for Mac!
2:24:23 PM tomek: Kea doing SIGABRT may be a problem.
2:24:32 PM tomek: I have never seen this before.
2:44:43 PM jeff: I think I would like to file a bug ticket against this issue if it reoccurs.
2:45:09 PM jeff: Yes, it continues:
2:45:10 PM jeff: >::reference boost::shared_ptr< <template-parameter-1-1> >::operator*() const [with T = isc::dhcp::ClientId?]: Assertion `px != 0' failed.

2013-07-03 20:08:11.872 ERROR [b10-init.init/2000] BIND10_COMPONENT_FAILED component b10-dhcp4 (pid 2007) failed: process terminated with exit status 6 (killed by signal 6: SIGABRT)

with T = isc::dhcp::ClientId?]: Assertion `px != 0' failed.

2013-07-03 20:13:27.035 ERROR [b10-init.init/2000] BIND10_COMPONENT_FAILED component b10-dhcp4 (pid 2143) failed: process terminated with exit status 6 (killed by signal 6: SIGABRT)

with T = isc::dhcp::ClientId?]: Assertion `px != 0' failed.

2013-07-03 20:20:06.906 ERROR [b10-init.init/2000] BIND10_COMPONENT_FAILED component b10-dhcp4 (pid 2146) failed: process terminated with exit status 6 (killed by signal 6: SIGABRT)

2:53:43 PM TomM: looks like there's still a condition that allows a null client id
2:54:26 PM TomM: tomek you are fibbing when you say you have never seen this before ;)
2:55:45 PM jeff: Twice now the debian6 server has hung
2:55:59 PM jeff: I think this might be my first "real" bug find. :-)
2:56:18 PM jeff: What kind of info should I be gathering now to help us diagnose this?
2:58:42 PM tomek: The configuration used and traffic capture of a packet (or packets) that cause this.
3:00:34 PM marcin: I recall that the issues with NULL client id are still present when using Memfile backend.
3:00:50 PM marcin: When using MySQL it shouldn't be the case.
3:01:46 PM marcin: Do you actually use MySQL? when hitting assertion failure?
3:09:11 PM jeff: I do have the build configuired for mySQL backend
3:09:35 PM jeff: I am not even sending DHCP traffic so it's not ANVL packets…although it might be other random packet traffic.
3:10:06 PM jeff: How do you want me to capture the config? Show bindctl commands? Dump bind10-cfg.db?
3:10:27 PM jeff: Would any syslogs be helpful?
3:10:31 PM TomM: if I recall, Jeff, there are Windows machines that periodically make IPv4 requests on on the DHCP lab setup
3:10:56 PM jeff: Those Windows clients are disabled
3:11:20 PM TomM: curious
3:11:51 PM jeff: Well, I *THINK* those are disabled. Let me show you a screenie.
3:12:16 PM TomM: hey I don't want to see you screenie! put that thing away!
3:12:25 PM marcin: I doubt that you can hit this error without traffic at all
3:12:26 PM jeff: Just one peek TomM
3:12:36 PM TomM: GAGGHH!! RUN AWAY!
3:12:37 PM jeff: (being sent in jabber file xfer now)
3:13:15 PM jeff: If there is an alternative place to xfer files let me know
tomek left the room. (3:14:17 PM)
3:14:27 PM TomM: just sez I have one event....
3:14:36 PM TomM: I use Psi for client
3:14:59 PM TomM: nevermind, I got it
3:15:39 PM TomM: no I didn't
3:15:45 PM TomM: error'd out
3:19:37 PM jeff: Yeah it erroed out.
3:19:39 PM jeff: Let me try again
3:21:11 PM jeff: OK well that's a big waste of time
3:21:15 PM jeff: I'll email it to you
3:21:19 PM TomM: yep
3:21:46 PM TomM: what would be better would the the bind10 log output
3:21:55 PM jeff: What else?
3:22:05 PM jeff: Here is how I confogured everything:
3:22:28 PM TomM: you could include the config db
3:22:32 PM jeff: https://confluence-int.isc.org/display/QUAL/DHCP+Testing
3:22:49 PM jeff: I'm going to enter a bug ticket and include the logfiles from dhcpsrv and dhcp4
3:22:57 PM jeff: as well as the bind10 config db
3:23:34 PM marcin: Which machine is it?
3:23:40 PM marcin: debian6?
3:25:39 PM jeff: debian6-64-1-dhcp
3:25:52 PM jeff: (149.20.57.86)
3:25:54 PM marcin: Where is the bind installed?
3:26:10 PM jeff: /home/test/var/test-anvl
3:26:27 PM jeff: I just had to reboot the VM because it was not responding to ICMP Requests
3:26:28 PM marcin: ok, let me login and have a quick look on the configuration.
3:26:54 PM jeff: BIND10 is not running after reboot.
3:27:09 PM marcin: I will kick it of
3:27:54 PM jeff: cmd line:
3:28:12 PM jeff: sudo /home/test/var/test-anvl/sbin/bind10 &> /home/test/var/test-anvl/bind10.log
3:30:43 PM marcin: MySQL is not configured
3:31:00 PM jeff: Not true. I built as follows:
3:31:12 PM marcin: It is true according to bindctl
3:31:21 PM marcin: Building is one thing
3:31:26 PM jeff: sudo /home/test/var/test-anvl/sbin/bind10 &> /home/test/var/test-anvl/bind10.log
3:31:32 PM marcin: configuration is another thing
3:31:59 PM TomM: if someone could review 3030, it correct the build issue
3:32:02 PM marcin: Dhcp4/lease-database/type "" string (default)
3:32:05 PM TomM: its quite small
3:32:16 PM TomM: good catch, marcin
3:32:47 PM marcin: other things are name, user, password, host
3:33:10 PM marcin: they also have to be set to credentials you have added to MySQL
3:33:30 PM TomM: rtfm
3:33:40 PM TomM: ;)
3:34:16 PM marcin: I could add it now, but it will be better for you to do it to keep an eye on what is going on in your config
3:34:50 PM marcin: I now stopped bind and logged off
3:35:58 PM marcin: Tom, in the course of #2977 I found a little issue with d_controller_unittests
3:36:27 PM TomM: is this the logger init thing?
3:36:32 PM marcin: yes
3:37:05 PM TomM: what precisely is the issue?
3:37:27 PM TomM: you know I stole that line from DHCP4/DHCP6....
3:37:39 PM marcin: yes and now
3:37:47 PM marcin: DHCP4 doesn't test main()
3:37:52 PM TomM: ah
3:38:08 PM marcin: D2 tests launch function which does initLogger(blablabla)
3:38:14 PM TomM: right
3:38:15 PM marcin: This should not happen
3:38:25 PM TomM: what should not happen?
3:38:44 PM TomM: initLogger should not be called?
3:38:51 PM marcin: initLogger(...) should not be called when running unit tests
3:39:04 PM marcin: it shouldn't be called in launch()
3:39:14 PM marcin: because it is already called in main() of unit tests
3:39:23 PM TomM: OH
3:39:26 PM marcin: in a way that it should be run for unit tests
3:39:32 PM marcin: (without any arguments)
3:40:01 PM marcin: But, if I remove initLogger in launch I hit worse problem
3:40:10 PM TomM: then I will need to add a command line argument that skips it
3:40:11 PM marcin: crash in commandHandler
3:41:06 PM marcin: commandHandler has this second argument called "args"
3:41:15 PM TomM: right
3:41:17 PM marcin: which seems to be NULL
3:41:32 PM TomM: let me look at the code
3:42:23 PM marcin: The rest is a speculation so far...
3:42:37 PM TomM: let me catch up
3:42:41 PM marcin: genShutdownCommand in unit tests is responsible for this
3:42:59 PM jeff: Hold on guys - talking with my attorney - brb
3:43:01 PM marcin: sorry, genShutdownCallback
3:43:19 PM TomM: oh, because args isn't being checked before being handed to the log statement
3:43:37 PM marcin: well, getShutdownCallback passes empty argyment
3:43:45 PM marcin: NULL argument I should say
3:43:50 PM marcin: arg_set
3:43:55 PM TomM: if it passed in an empty ELementPtr it would run fine
3:44:22 PM marcin: isc::data::ElementPtr? arg_set;
3:44:27 PM marcin: makes arg_set NULL
3:44:47 PM marcin: and in the handler you do arg_set->str()
3:44:55 PM TomM: right
3:44:57 PM marcin: the -> operator causes assertion
3:45:11 PM TomM: yes, the log statement should test it
3:45:18 PM TomM: and adjust accordingly
3:45:42 PM marcin: I am wondering what will be the most elegant solution
3:45:42 PM TomM: (arg_set ? arg_set->str() : "no args")
3:46:02 PM marcin: yes, maybe that should be done
3:46:06 PM TomM: yes
3:46:10 PM TomM: it should be
3:46:16 PM marcin: and maybe we need to identify some more places like this
3:46:23 PM TomM: perhaps
3:46:34 PM marcin: I can do it together with 27977
3:46:37 PM TomM: tyring to figure out why I haven't hit this before
3:46:38 PM marcin: sorry 2977
3:47:06 PM marcin: Well, initialization of the logger has something to do with it, but I don't know why
3:47:19 PM marcin: and I don't know if I will find today :-)
3:47:38 PM marcin: s/find/find out
stephen left the room. (3:47:57 PM)
3:48:00 PM TomM: launch calls the init logger but launch is the first place the command line arguments are examined...
3:48:08 PM TomM: which is as it should be
3:48:19 PM TomM: s/but/because/
3:48:42 PM TomM: I did not want app specific things to be in main()
3:48:47 PM marcin: it would be ok if we added an extra argument to launch
3:48:59 PM TomM: yes
3:48:59 PM marcin: this argument would indicate that you're running in the test mode
3:49:12 PM marcin: and it would be used by unit tests only
3:49:15 PM TomM: yes
3:49:21 PM marcin: it could have a default value = false
3:49:25 PM TomM: it could default to false
3:49:27 PM TomM: lol
3:49:29 PM marcin: but it doesn't have to
3:49:48 PM TomM: in this case I would say it should because it is a test time only thing
3:50:12 PM TomM: might be handy for other things too...
3:50:15 PM marcin: yes, I agree but I don't know if you like defaults :P
3:50:24 PM TomM: case by case basis
3:50:42 PM marcin: in this case, let me fix it
3:50:54 PM marcin: as we just discussed it
3:51:11 PM TomM: ok
3:51:24 PM marcin: It will be added together with #2977
3:51:25 PM jeff: SO (and sorry for being absent) - this is a known issue?
3:51:33 PM marcin: which one?
3:51:35 PM jeff: I have not yet entered a new ticket
3:51:40 PM jeff: My shutdown
3:51:50 PM marcin: your shutdown is a known issue for me
3:51:53 PM jeff: The error message in BIND10 log
3:52:01 PM marcin: I am not sure if there is any ticket yet
3:52:19 PM jeff: I would be happy to put one in, or if you prefer, you can do it
3:52:30 PM TomM: the null client id "crash" is known and we had a ticket but as marcin pointed out it addressed MySQL only
3:52:37 PM jeff: BTW I love talking to attorneys on the phone for $380.00 per hour.
3:53:12 PM jeff: SO clue me in - even though I specified --mysql in the config, and I created the SQL "qa" database to use...
3:53:14 PM TomM: look if you have nothing better to do with your time and money, I'll talk to you and I only charge you $300 an hour
3:53:17 PM jeff: I'm not using SQL?
3:53:37 PM TomM: you have to set the database type
3:53:43 PM marcin: you're not using SQL and you should
3:53:56 PM TomM: in the bind10 config
3:53:56 PM marcin: otherwise you will loose your leases when server is off
3:53:58 PM marcin: :P
3:54:02 PM jeff: COuld you point me to the part of the user guide that says how to do that please
3:54:03 PM TomM: its right there in the manual
3:54:08 PM jeff: What section
3:54:13 PM TomM: 1 sec
3:54:30 PM jeff: I read it last night at 1130pm and I do admit it was very interesting.
3:54:31 PM jeff: Not.
3:55:10 PM TomM: 17.2.1 Database Configuration
3:55:15 PM marcin: http://bind10.isc.org/docs/bind10-guide.html#idm66502448
3:55:18 PM marcin: enjoy!
3:55:46 PM jeff: I found it
3:55:52 PM jeff: > config set Dhcp4/lease-database/type "mysql"
3:55:56 PM jeff: From Section 17
3:56:03 PM TomM: that'll $450.00 ... I'll send you a bill
3:56:31 PM jeff: Bill it to "Kea QA"
3:56:32 PM marcin: apart from setting a type of the database, you will have to setup user, password, host
3:56:46 PM jeff: Sorry I missed that part
3:56:55 PM jeff: Let me make absolutely certain:
3:57:11 PM TomM: were you reading it or just looking at the pretty pictures?
3:57:13 PM jeff: I am NOT going to enter a ticket, either for the error message in the log, or for the assert (debian reboot)
3:57:15 PM jeff: OK??
3:57:45 PM jeff: At 1130pm I think I read "SQL" and said to myself "I already did that, man."
3:57:45 PM marcin: IMHO, you shouldn't
3:58:05 PM jeff: OK then I will add the SQL config to my instructions on the COnfluence page, and retry.
3:58:16 PM jeff: THanks you for your time, gentlemen.
3:58:21 PM jeff: Oh, and TomM too.
3:58:48 PM marcin: nop
3:58:59 PM TomM: you betcha!
3:59:07 PM TomM: next time RTFM
3:59:08 PM TomM: lol
3:59:57 PM jeff: I guess in my case, I need to RTFM several times for comprehension.
4:00:27 PM TomM: or do your beer drinking after not during
4:00:42 PM TomM: lol
4:00:53 PM TomM: speaking of beer.... I need a beer
4:01:10 PM TomM: all this broken build chasing has left me thirsty
4:02:50 PM TomM: marcin, are you going to work much longer or are you thru?
4:03:13 PM marcin: What do you need? :P
4:03:15 PM marcin: a review?
4:03:27 PM TomM: well I was thinking you could look over 3030
4:03:38 PM TomM: its quite small and fixes the build break
4:03:41 PM marcin: let me check how many kilo-lines of code
4:03:53 PM TomM: however, I know it is quite late for you
4:04:26 PM TomM: and I can recruit someone from bind10 perhaps
4:05:52 PM TomM: its a follow on to 3007
4:06:23 PM marcin: Do we really want #if 0?
4:06:32 PM TomM: crap where is that?
4:06:39 PM TomM: must have missed one
4:06:39 PM marcin: ncr_msg.cc
4:06:43 PM TomM: damnit
4:06:44 PM marcin: a couple
4:07:36 PM TomM: no, of course we do not want them
4:07:41 PM TomM: Jeff distracted me
4:08:05 PM marcin: Oh we have time utils in bind10
4:08:08 PM marcin: nice :-)
4:08:54 PM TomM: yes we do
4:09:23 PM TomM: after you pointed out the wisdom of using such things... I went looking for it
4:10:23 PM marcin: hehe, as long as it doesn't affect performance too heavily
4:10:48 PM TomM: oh, it will be slower than honey flowing uphill in Gdansk in January
4:11:03 PM TomM: ;)
4:11:06 PM marcin: :-)
4:11:40 PM TomM: normally we would say "molasses" but I didn't figure you'd know what that is...
4:12:34 PM TomM: I have removed the #if 0s and it still runs/passes unit tests... have not checked that in yet
4:13:37 PM marcin: otherwise it is ok
4:13:45 PM marcin: I can make a comment in the ticket
4:13:58 PM TomM: thank you very much
4:17:18 PM marcin: done
4:17:25 PM marcin: anything else sir?
4:17:39 PM TomM: no, sir
4:18:27 PM marcin: ok, in this case I think I need to go now and do outstaning houseworks
4:18:58 PM TomM: i'd go have a beer if I were you
4:19:13 PM marcin: Usually, this is what I do
4:19:27 PM marcin: but this week, my wife has "angina"
4:19:38 PM marcin: and she is in bed since Sunday
4:19:54 PM marcin: I am pretty much doing everything nowadays
4:20:46 PM TomM: ah, well I hope she feels better soon!
4:20:55 PM TomM: and you need to make sure you get rest too
4:20:59 PM marcin: Yes, me too, otherwise I will get crazy
4:21:10 PM TomM: crazIER
4:21:14 PM marcin: Working and taking care of kids and shopping
4:21:20 PM TomM: yep
4:21:20 PM marcin: in the same time
4:21:27 PM TomM: that will make you nuts
4:21:56 PM marcin: that's also pretty much why I stay so late working
4:22:03 PM marcin: I can finally focus
4:22:06 PM marcin: :P
4:22:10 PM TomM: ah
4:22:34 PM marcin: but it has pitfalls too
4:22:52 PM marcin: If I go to sleep at 1AM I will be woken up by kids at 6AM anyway
4:23:04 PM TomM: lovely, isn't it?
4:23:05 PM marcin: that's what happend yesterday
4:23:15 PM marcin: :-)
4:23:32 PM marcin: anyways, good night gentlemen
4:23:39 PM TomM: good night!

Subtickets

Change History (6)

comment:1 Changed 7 years ago by shane

  • Milestone changed from New Tasks to DHCP Outstanding Tasks

comment:2 Changed 6 years ago by tomek

  • Component changed from dhcp4 to dhcpdb
  • Milestone changed from DHCP Outstanding Tasks to DHCP-Kea-proposed
  • Sub-Project changed from DNS to DHCP

After recent feedback from the Board, memfile importance has increased. Let's reevaluate this ticket.

comment:3 Changed 6 years ago by tomek

  • Milestone changed from DHCP-Kea-proposed to DHCP-Kea0.9-alpha

Moving tasks related to additional backends to Kea0.9-alpha (as discussed with Marcin, Stephen, Vicky and Jeff in London).

comment:4 Changed 6 years ago by tomek

  • Priority changed from medium to low

We need to check if the issue even reproduces after 8 months. There were some fixed in memfile during that time. Lowering priority.

comment:5 Changed 6 years ago by stephen

  • Milestone changed from DHCP-Kea0.9-alpha to DHCP-Kea0.9-beta

comment:6 Changed 6 years ago by tomek

  • Resolution set to worksforme
  • Status changed from new to closed

There are extensive Forge tests being run and there is no sign of SigAbort?. This ticket does not have any details regarding issue reproduction. There were lots of improvements done in memfile in the last 9 months (including many fixes and implementation of disk storage).

Closing ticket.

Note: See TracTickets for help on using tickets.