Opened 8 years ago

Closed 7 years ago

#1896 closed defect (duplicate)

logging continues to old rotated files

Reported by: jreed Owned by:
Priority: medium Milestone:
Component: logging Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: Very High
Sub-Project: Core Feature Depending on Ticket:
Estimated Difficulty: 8 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

I switched to log to file a couple weeks ago. Now multiple logs are written to.
The logging is not sequential. Due to log rotations, maybe it will remove new entries since logged to assumed-old files?

Here is the configuration:

> config show_json Logging
{"loggers": [{"debuglevel": 40, "severity": "DEBUG", "output_options": [{"output": "/var/log/bind10/bind10.log", "maxsize": 1048576, "maxver": 50, "destination": "file"}], "name": "*"}]}

Here are the file timestamps:

%ls -ltr /var/log/bind10/ ; date
total 49994
drwxr-xr-x  2 root  wheel      512 Mar 22 20:24 OLD
-rw-r--r--  1 root  wheel  1048611 Mar 31 07:46 bind10.log.50
-rw-r--r--  1 root  wheel  1048674 Apr  1 00:16 bind10.log.49
-rw-r--r--  1 root  wheel  1048597 Apr  1 03:18 bind10.log.46
-rw-r--r--  1 root  wheel  1048602 Apr  1 10:06 bind10.log.48
-rw-r--r--  1 root  wheel  1048614 Apr  1 22:50 bind10.log.43
-rw-r--r--  1 root  wheel  1048672 Apr  2 11:35 bind10.log.44
-rw-r--r--  1 root  wheel  1048634 Apr  2 18:22 bind10.log.41
-rw-r--r--  1 root  wheel  1048599 Apr  3 09:17 bind10.log.42
-rw-r--r--  1 root  wheel  1048617 Apr  3 13:53 bind10.log.39
-rw-r--r--  1 root  wheel  1048639 Apr  3 17:07 bind10.log.45
-rw-r--r--  1 root  wheel  1048672 Apr  3 22:54 bind10.log.40
-rw-r--r--  1 root  wheel    51479 Apr  4 05:37 output.log
-rw-r--r--  1 root  wheel  1048597 Apr  4 09:26 bind10.log.37
-rw-r--r--  1 root  wheel  1048608 Apr  5 05:00 bind10.log.34
-rw-r--r--  1 root  wheel     2187 Apr  5 05:16 bind10.log.47
-rw-r--r--  1 root  wheel        0 Apr  5 05:16 bind10.log.32
-rw-r--r--  1 root  wheel  1048653 Apr  5 10:14 bind10.log.35
-rw-r--r--  1 root  wheel  1048634 Apr  5 12:10 bind10.log.38
-rw-r--r--  1 root  wheel  1048634 Apr  6 00:32 bind10.log.33
-rw-r--r--  1 root  wheel  1048639 Apr  6 14:22 bind10.log.36
-rw-r--r--  1 root  wheel  1048611 Apr  6 20:04 bind10.log.29
-rw-r--r--  1 root  wheel  1048674 Apr  6 21:33 bind10.log.31
-rw-r--r--  1 root  wheel  1048597 Apr  7 15:37 bind10.log.27
-rw-r--r--  1 root  wheel  1048653 Apr  8 08:52 bind10.log.26
-rw-r--r--  1 root  wheel  1048608 Apr  8 11:10 bind10.log.25
-rw-r--r--  1 root  wheel  1048634 Apr  9 06:43 bind10.log.23
-rw-r--r--  1 root  wheel  1048703 Apr  9 11:33 bind10.log.28
-rw-r--r--  1 root  wheel  1048691 Apr  9 12:26 bind10.log.30
-rw-r--r--  1 root  wheel  1048674 Apr  9 20:11 bind10.log.24
-rw-r--r--  1 root  wheel  1048617 Apr 10 02:14 bind10.log.22
-rw-r--r--  1 root  wheel  1048597 Apr 10 21:47 bind10.log.18
-rw-r--r--  1 root  wheel  1048672 Apr 11 07:30 bind10.log.19
-rw-r--r--  1 root  wheel  1048608 Apr 11 17:19 bind10.log.17
-rw-r--r--  1 root  wheel  1048662 Apr 12 06:57 bind10.log.20
-rw-r--r--  1 root  wheel  1048639 Apr 12 08:48 bind10.log.21
-rw-r--r--  1 root  wheel  1048634 Apr 12 12:52 bind10.log.15
-rw-r--r--  1 root  wheel  1048653 Apr 12 18:50 bind10.log.16
-rw-r--r--  1 root  wheel  1048611 Apr 13 08:25 bind10.log.12
-rw-r--r--  1 root  wheel  1048597 Apr 14 03:58 bind10.log.10
-rw-r--r--  1 root  wheel  1048674 Apr 14 06:10 bind10.log.11
-rw-r--r--  1 root  wheel  1048608 Apr 14 23:32 bind10.log.9
-rw-r--r--  1 root  wheel  1048703 Apr 15 06:00 bind10.log.13
-rw-r--r--  1 root  wheel  1048586 Apr 15 12:40 bind10.log.14
-rw-r--r--  1 root  wheel  1048653 Apr 15 17:31 bind10.log.8
-rw-r--r--  1 root  wheel  1048634 Apr 15 19:05 bind10.log.7
-rw-r--r--  1 root  wheel  1048611 Apr 16 14:38 bind10.log.3
-rw-r--r--  1 root  wheel  1048672 Apr 17 04:51 bind10.log.4
-rw-r--r--  1 root  wheel  1048603 Apr 17 10:10 bind10.log.2
-rw-r--r--  1 root  wheel   885654 Apr 17 15:31 bind10.log.5
-rw-r--r--  1 root  wheel   870966 Apr 17 15:34 bind10.log.6
-rw-r--r--  1 root  wheel   317859 Apr 17 15:34 bind10.log.1
-rw-r--r--  1 root  wheel   288900 Apr 17 15:34 bind10.log
Tue Apr 17 15:34:28 UTC 2012

You can see when the files were rotated below:

%head -1 bind10.log.5 bind10.log.6 bind10.log.1 bind10.log
==> bind10.log.5 <==
2012-04-15 12:40:02.q DEBUG [b10-auth.datasrc] DATASRC_QUERY_NO_ZONE no zone containing 'ord.sns-pb.isc.org.' in class 'IN'

==> bind10.log.6 <==
2012-04-15 06:01:25.q INFO  [b10-xfrout.xfrout] XFROUT_NOTIFY_COMMAND received command to send notifies for bind10.isc.org./IN

==> bind10.log.1 <==
2012-04-17 04:51:29.q ERROR [b10-zonemgr.zonemgr] ZONEMGR_NO_MASTER_ADDRESS internal BIND 10 command did not contain address of master

==> bind10.log <==
2012-04-17 10:11:13.q DEBUG [b10-xfrin.datasrc] DATASRC_SQLITE_NEWCONN SQLite3Database is being initialized

Sorry I need to upgrade the log4cplus (see the .q timestamp). But see the mixed up logging:

%tail -3 bind10.log.5 bind10.log.6 bind10.log.1 bind10.log
==> bind10.log.5 <==
2012-04-17 15:34:52.q DEBUG [b10-auth.datasrc] DATASRC_QUERY_NO_ZONE no zone containing 'n10.isc.org.' in class 'IN'
2012-04-17 15:34:52.q DEBUG [b10-auth.datasrc] DATASRC_QUERY_NO_ZONE no zone containing 'ord.sns-pb.isc.org.' in class 'IN'
2012-04-17 15:34:52.q DEBUG [b10-auth.datasrc] DATASRC_QUERY_NO_ZONE no zone containing 'sfba.sns-pb.isc.org.' in class 'IN'

==> bind10.log.6 <==
2012-04-17 15:35:01.q INFO  [b10-xfrout.xfrout] XFROUT_NOTIFY_COMMAND received command to send notifies for bind10.isc.org./IN
2012-04-17 15:35:32.q INFO  [b10-xfrout.xfrout] XFROUT_NOTIFY_COMMAND received command to send notifies for bind10.isc.org./IN
2012-04-17 15:36:02.q INFO  [b10-xfrout.xfrout] XFROUT_NOTIFY_COMMAND received command to send notifies for bind10.isc.org./IN

==> bind10.log.1 <==
2012-04-17 15:35:32.q ERROR [b10-zonemgr.zonemgr] ZONEMGR_NO_MASTER_ADDRESS internal BIND 10 command did not contain address of master
2012-04-17 15:36:02.q DEBUG [b10-zonemgr.zonemgr] ZONEMGR_REFRESH_ZONE refreshing zone bind10.isc.org. (class IN)
2012-04-17 15:36:02.q ERROR [b10-zonemgr.zonemgr] ZONEMGR_NO_MASTER_ADDRESS internal BIND 10 command did not contain address of master

==> bind10.log <==
2012-04-17 15:36:02.q DEBUG [b10-xfrin.datasrc] DATASRC_SQLITE_CONNOPEN Opening sqlite database file '/var/bind10-devel/zone.sqlite3'
2012-04-17 15:36:02.q DEBUG [b10-xfrin.datasrc] DATASRC_SQLITE_DROPCONN SQLite3Database is being deinitialized
2012-04-17 15:36:02.q DEBUG [b10-xfrin.datasrc] DATASRC_SQLITE_CONNCLOSE Closing sqlite database

Above you can see that files are still written to.
You can see that different modules are associated to different file handles.

We need to somehow notify all modules when a log file is rotated. We can not have old files continue to be written to.

If I had different files per module I assume this problem wouldn't happen.

Also zero byte bind10.log.32 file looks bad. Did we lose something?

Subtickets

Change History (5)

comment:1 Changed 8 years ago by jreed

Here are the log files still open:

[jreed@n10 /var/log/bind10]$ fstat | egrep 'b10|python' | grep /var | grep -v " wd "
root     python     55552    1 /var     942098 -rw-r--r--   51479  w
root     python     55552    2 /var     942098 -rw-r--r--   51479  w
root     python     55552    4 /var     942089 -rw-r--r--       0  w
root     b10-resolver 55551    1 /var     942098 -rw-r--r--   51479  w
root     b10-resolver 55551    2 /var     942098 -rw-r--r--   51479  w
root     b10-resolver 55551    7 /var     942097 -rw-r--r--  1049336  w
root     python     55550    1 /var     942098 -rw-r--r--   51479  w
root     python     55550    2 /var     942098 -rw-r--r--   51479  w
root     python     55550    4 /var     942203 -rw-r--r--    3445  w
root     python     55549    1 /var     942098 -rw-r--r--   51479  w
root     python     55549    2 /var     942098 -rw-r--r--   51479  w
root     python     55549    5 /var     942197 -rw-r--r--  338277  w
root     python     55548    1 /var     942098 -rw-r--r--   51479  w
root     python     55548    2 /var     942098 -rw-r--r--   51479  w
root     python     55548    4 /var     942192 -rw-r--r--  881380  w
root     python     55547    1 /var     942098 -rw-r--r--   51479  w
root     python     55547    2 /var     942098 -rw-r--r--   51479  w
root     python     55547    8 /var     942097 -rw-r--r--  1049336  w
root     python     55546    1 /var     942098 -rw-r--r--   51479  w
root     python     55546    2 /var     942098 -rw-r--r--   51479  w
root     python     55546    5 /var     942198 -rw-r--r--  325800  w
root     b10-auth   55545    1 /var     942098 -rw-r--r--   51479  w
root     b10-auth   55545    2 /var     942098 -rw-r--r--   51479  w
root     b10-auth   55545   11 /var     942193 -rw-r--r--  909786  w
root     b10-auth   55545   12 /var     1012738 -rw-r--r--   31744 rw
root     python     55544    1 /var     942098 -rw-r--r--   51479  w
root     python     55544    2 /var     942098 -rw-r--r--   51479  w
root     python     55544    4 /var     942097 -rw-r--r--  1049336  w
root     python     55543    2 /var     942098 -rw-r--r--   51479  w
root     b10-sockcreator 55542    2 /var     942098 -rw-r--r--   51479  w
root     python     55541    1 /var     942098 -rw-r--r--   51479  w
root     python     55541    2 /var     942098 -rw-r--r--   51479  w
root     python     55541   11 /var     942097 -rw-r--r--  1049336  w
[jreed@n10 /var/log/bind10]$ fstat | egrep 'b10|python' | grep /var | grep -v " wd " | awk '{print $6}' | sort -u | while read inum ; do find /var/log/ -inum $inum -ls ; done
942089        0 -rw-r--r--    1 root             wheel                   0 Apr  5 05:16 /var/log/bind10/bind10.log.32
942098      104 -rw-r--r--    1 root             wheel               51479 Apr  4 05:37 /var/log/bind10/output.log
942192     1760 -rw-r--r--    1 root             wheel              881888 Apr 17 16:17 /var/log/bind10/bind10.log.6
942193     1824 -rw-r--r--    1 root             wheel              909786 Apr 17 16:15 /var/log/bind10/bind10.log.5
942197      704 -rw-r--r--    1 root             wheel              339273 Apr 17 16:17 /var/log/bind10/bind10.log.1
942198      672 -rw-r--r--    1 root             wheel              327600 Apr 17 16:17 /var/log/bind10/bind10.log
942203        8 -rw-r--r--    1 root             wheel                3445 Apr 17 16:10 /var/log/bind10/bind10.log.47

At least bind10.log.32 is explained ... it is:

root    55552  0.0  1.9 78276 20128   1- S    22Mar12   2:07.09 /usr/local/bin/python /usr/local/libexec/bind10-devel/b10-stats-httpd -v

I will open a new ticket for that ... #1897

comment:2 Changed 8 years ago by jreed

On May 1, a log4cplus developer said this problem has been reported and it should be fixed on trunk and upcoming 1.1.0 for FileAppender? and RollingFileAppender?. DailyRollingFileAppender? still has issues with synchronization between multiple processes.

comment:3 Changed 8 years ago by shane

  • Milestone changed from New Tasks to Next-Sprint-Proposed

Since 1.1.0 is not released yet, I guess we need to come up with some sort of workaround. We discussed some sort of shared lock between processes (file, mutex, whatever), which I think we need to add.

comment:4 Changed 8 years ago by jreed

The problem continues and the 1.1.0 is still not released.

These files still open:

%fstat | egrep 'b10|python' | grep /var | grep -v " wd " | awk '{print $6}' | sort -u | xargs -n 1 find /var/log/bind10 -inum | xargs ls -ltr
-rw-r--r--  1 root  wheel   422401 Jul 30 18:50 /var/log/bind10/output.log
-rw-r--r--  1 root  wheel  1049154 Jul 31 06:44 /var/log/bind10/bind10.log.47
-rw-r--r--  1 root  wheel      865 Aug  1 07:21 /var/log/bind10/bind10.log.42
-rw-r--r--  1 root  wheel     5106 Aug  1 13:33 /var/log/bind10/bind10.log.44
-rw-r--r--  1 root  wheel     8892 Aug  1 13:33 /var/log/bind10/bind10.log.43
-rw-r--r--  1 root  wheel   170087 Aug  1 14:19 /var/log/bind10/bind10.log.30
-rw-r--r--  1 root  wheel   523032 Aug  1 14:22 /var/log/bind10/bind10.log

comment:5 Changed 7 years ago by jreed

  • Resolution set to duplicate
  • Status changed from new to closed

This is duplicating #1622. So closing this one.

Note: See TracTickets for help on using tickets.