Opened 8 years ago

Closed 8 years ago

#1698 closed defect (fixed)

log initialization causes real fiasco for MacOS 10.7

Reported by: jinmei Owned by: stephen
Priority: medium Milestone: Sprint-20120320
Component: logging Version:
Keywords: Cc:
CVSS Scoring: Parent Tickets:
Sensitive: no Defect Severity: N/A
Sub-Project: Core Feature Depending on Ticket:
Estimated Difficulty: 5 Add Hours to Ticket: 0
Total Hours: 0 Internal?: no

Description

After upgrading to MacOS 10.7, the unittest for src/lib/cache starts
failing due to segfault. According to the dumped core, it seems to be
a typical symptom of the static initialization fiasco:

#0  0x00007fff95b6d950 in _malloc_initialize ()
#1  0x00007fff95b6e192 in malloc ()
#2  0x00007fff93a1768e in operator new ()
#3  0x00007fff93a0480a in std::string::_Rep::_S_create ()
#4  0x00007fff93a062fa in std::string::_S_construct<char const*> ()
#5  0x00007fff93a06412 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string ()
#6  0x000000010a8b6d49 in isc::log::Logger::Logger ()
    at /Users/jinmei/src/isc/git/bind10/src/lib/log/logger.h:20
#7  0x000000010a8b6d49 in __static_initialization_and_destruction_0 [inlined]
    () at /Users/jinmei/src/isc/git/bind10/src/lib/cache/logger.cc:20
#8  0x000000010a8b6d49 in global constructors keyed to _ZN3isc5cache6loggerE ()
    at logger.cc:24

In fact, if I did this it worked again:

  • make cache/logger.cc empty (not to statically initialize 'logger')
  • add the following cache_messages.cc:
    isc::log::Logger&
    getLogger() {
        static const isc::log::MessageInitializer initializer(values);
        static isc::log::Logger logger("cache");
        return (logger);
    }
    
    and remove the static initialization of "initializer".
  • replace any reference to the 'logger' variable to a call to getLogger().

In my environment this type of fiasco only happened with this library,
but in theory there should be no reason it cannot happen for others.

I think it's now time to make the framework safer. The indirect
access via the proxy function like getLogger doesn't look nice, but
basically something like this is the only guaranteed way to avoid the
fiasco. We can only pray for luck with other ways, and,
unfortunately, it's proven we are not that lucky.

Subtickets

Change History (14)

comment:1 follow-up: Changed 8 years ago by stephen

It does look as if the crash is in the static initialization of Logger(). However, Logger uses the pimpl idiom, with the implementation class not instantiated until the first time Logger is used; all Logger holds is a string storing the name of the logger, and that is constructed when Logger is constructed. I'm surprised by the crash - I can only think that somehow the memory allocation code is not properly initialized at the time the program runs.

Does replacing:

std::string name_;

with something like

char name_[16];

... and copyying the name with which the logger is instantiated to name_ in the constructor solve the problem? (Obviously there would need to be appropriate checks as to the length of the string; however, most logger names are less than 15 characters long.)

With regards to

and remove the static initialization of "initializer".

"initializer" is more a namespace than anything else - the class MessageInitializer has no member attributes of its own. The constructor adds the values passed in the constructor argument to the singleton global message dictionary. The dictionary is obtained via a call to a function that holds it as a static variable (in much the way as the suggested getLogger() code does.) However, I will concede that if there are some operating systems where system libraries are not fully initialized before control is transferred to user code, all bets are off.

I'd be loath to remove the static initialization if we can avoid it because a long-term intention of the logger was to allow users to load their own messages at sratup time, replacing messages already there. Deferring the initialization of the message dictionary until the first time a message is logged means that we can't check that the IDs in the user message list are correct at startup.

On the other hand, the suggested code does couple the messages to the logger used to log them (cache_messages.cc is generated from cache_messages.mes) and given recent discussions of what documentation we give to users about what loggers they need to enable to get particular messages, that may be no bad thing.

We can only pray for luck with other ways, and, unfortunately, it's proven we are not that lucky.

I think that's a bit extreme - we test on a number of systems and have not had a problem, and none of our users have reported anything similar. And although C++ does have an issue with the static initialization fiasco, static initialization is a useful tool.

comment:2 in reply to: ↑ 1 Changed 8 years ago by jinmei

Replying to stephen:

Does replacing:

std::string name_;

with something like

char name_[16];

... and copyying the name with which the logger is instantiated to name_ in the constructor solve the problem?

Not tested this time, but probably not because MessageInitializer?
(also statically initialized) involves more complicated things that
require memory allocation:

    MessageDictionary& global = MessageDictionary::globalDictionary();
    // load() also needs a lot of memory allocation
    std::vector<std::string> repeats = global.load(values);

    // Append the IDs in the list just loaded (the "repeats") to the global list
    // of duplicate IDs.
    if (!repeats.empty()) {
        std::vector<std::string>& duplicates = getDuplicates();
        duplicates.insert(duplicates.end(), repeats.begin(), repeats.end());
    }

In fact, when I first hit this I tried to just use a proxy for the
"logger" object, and then find it crashed in the MessageInitializer?
initializer.

comment:3 Changed 8 years ago by jelte

  • Estimated Difficulty changed from 0 to 5

comment:4 Changed 8 years ago by jelte

  • Milestone changed from Next-Sprint-Proposed to Sprint-20120320

comment:5 follow-up: Changed 8 years ago by stephen

In fact, when I first hit this I tried to just use a proxy for the "logger" object, and then find it crashed in the MessageInitializer initializer.

As I don't have a Lion system on which to test, I've committed a possible solution to trac1698-test. The solution:

  • Stores the name passed to the Logger constructor in a pre-declared char array.
  • Stores a pointer to the array of messages passed to each MessageInitializer in another (pre-declared) array. The message IDs are loaded into the global dictionary once main() starts executing.

This should avoid use of heap storage during the execution of the constructors of statically-initialised objects, something seems to be the cause of the problem. (It should also avoid the need to use a call to getLogger() in the code, which should reduce the scope of the changes required.)

Last edited 8 years ago by stephen (previous) (diff)

comment:6 in reply to: ↑ 5 Changed 8 years ago by jinmei

I've not looked into the changes, but it indeed worked on my Lion box.

comment:7 Changed 8 years ago by stephen

  • Owner set to UnAssigned
  • Status changed from new to reviewing

In that case, please review the code in trac1698-test as the solution to the ticket. The proposed ChangeLog entry is:

xxx.	[bug]		stephen
	Fix crash on OS X 10.7 by altering logging so as not to allocate
        heap storage in the static initialization of logging objects.
        (Trac #1698, git xxx)

comment:8 Changed 8 years ago by jinmei

  • Owner changed from UnAssigned to jinmei

comment:9 Changed 8 years ago by jinmei

general

The proposed solution basically seems to be reasonable. But I'd
suggest increasing MAX_LOGGERS to even much larger value because it's
a hard limit. Maybe something like 1024, and the required memory
footprint is still just 8KB for 64-bit machines. I'd also suggest
this parameter can be modified at least at compile time. And I'd
suggest documenting this restriction (and possible way of extension)
in the header file.

changelog

Maybe "Mac OS (X) 10.7" instead of just "OS X"?

logger.h

  • Does MAX_LOGGER_NAME_SIZE have to be public?
  • Why did you use the enum hack for it? I thought we normally simply use a static const member variable in such cases.
  • Ideally we should do a death test to trigger the assertion in the Logger constructor. At least I'd add a test for the possible longest log name.
  • This note sounds a bit ambiguous:
        /// \note The name of the logger may be no longer than MAX_LOGGER_NAME_SIZE
    
    Does the "name" include the terminating nul character? According to the condition for assert, it does, and then this statement is correct, but I guess this statement would normally be interpreted as "assert(strlen(name) <= MAX_LOGGER_NAME_SIZE)". To be super clear I'd rephrase it to: "The name of the logger (including the terminating NUL character) may be no longer than MAX_LOGGER_NAME_SIZE".
  • While this strcpy is actually safe, I'd be even redundantly paranoid when we deal with this type of low-level dangerous API, e.g.:
        Logger(const char* name) : loggerptr_(NULL) {
            assert(std::strlen(name) < sizeof(name_));
            std::strncpy(name_, name, sizeof(name_));
            // strncpy ensures the following condition as long as name is valid:
            assert(name_[sizeof(name_) - 1] == '\0');
        }
    

message_initializer.h

  • This comment doesn't parse:
    /// -# The constructor a pointer to the values array to a pre-defined array
    ///    of pointers.
    
    "The constructor *adds* a pointer..."?
  • MessageInitializer constructor: If I understand it correctly, the caller must ensure that values array be valid at least until loadDictionary() is called (so for example it cannot be a temporary array defined within some function). In practice a misuse is probably unlikely to happen, but I think we should note that explicitly.

message_initializer.cc

  • typo: s/the a/a/?
    // 1) In the MessageInitializer constructor, the a pointer to the array of
    
  • nit: I don't thinks this is correct:
    //    messages is stored in a pre-defined array.  Since the MessageInitializers
    //    are declared external to a program unit, this takes place before main()
    //    is called.  As no heap storage is allocated in this process, we should
    
    This is not because MessageInitializer objects have external linkage (btw they actually don't even have external linkage because they are defined as const), but because they are initialized at a namespace level (i.e., "statically" in a translation unit).
  • Why is MAX_LOGGERS an int? I think size_t is better for this purpose:
    const int MAX_LOGGERS = 64;
    
  • Like MAX_LOGGER_NAME_SIZE, I'd add a test with a max number of initializers, and ideally a death test when they exceed the limit

misc

I made a few minor editorial changes directly to the branch.

comment:10 Changed 8 years ago by jinmei

  • Owner changed from jinmei to stephen

comment:11 follow-up: Changed 8 years ago by stephen

  • Owner changed from stephen to jinmei

The proposed solution basically seems to be reasonable. But I'd suggest increasing MAX_LOGGERS to even much larger value because it's a hard limit. Maybe something like 1024, and the required memory footprint is still just 8KB for 64-bit machines.

The limit of 64 was chosen on the basis that it's taken three years of development to get up to 26 messages files so it is unlikely that we will significantly increase that number in the remaining life of BIND 10. I've renamed the variable to MAX_MESSAGE_ARRAYS (see below) and upped the value to 256 (10 times what we have now) as a compromise between future expansion and wasted space.

I'd also suggest this parameter can be modified at least at compile time. And I'd suggest documenting this restriction (and possible way of extension) in the header file.

Done. I've also documented the restriction on logger name length (see below).

Maybe "Mac OS (X) 10.7" instead of just "OS X"?

I'll use "Mac OS X" (as used in the Wikipedia entry) as I don't think any literature places the "X" in brackets.

Does MAX_LOGGER_NAME_SIZE have to be public?

It can be private, but I've left it as public for cases where a logger name is constructed and run-time and the logger created dynamically. It allows a check to be made on the name and the caller to take action if the name is too long instead of blindly triggering an assertion failure. (It's also useful to know this length in the test code.)

Why did you use the enum hack for it? I thought we normally simply use a static const member variable in such cases.

I'm not up to date with such modern thinking :-) Changed.

Does the "name" include the terminating nul character?

It did but when I looked at it again I realised that the name of the symbol was incorrect. MAX_LOGGER_NAME_SIZE is now the maximum length of the name excluding the trailing null; the name_ member is declared with a length of "MAX_LOGGER_NAME_SIZE + 1".

While this strcpy is actually safe, I'd be even redundantly paranoid when we deal with this type of low-level dangerous API, e.g.:

It does no harm, so I've changed it as you suggested.

Ideally we should do a death test to trigger the assertion in the Logger constructor. At least I'd add a test for the possible longest log name.

Done.

This comment doesn't parse

Isn't it amazing how what you type often bears no relationship to what you think you've typed? :-) Corrected.

MessageInitializer constructor: If I understand it correctly, the caller must ensure that values array be valid at least until loadDictionary() is called (so for example it cannot be a temporary array defined within some function). In practice a misuse is probably unlikely to happen, but I think we should note that explicitly.

Good point, comment added.

typo: s/the a/a/?

Corrected.

I don't think this is correct:

I used the term "external" to mean "outside" and not "external linkage", but I can see that this is confusing. (I'm trying to distinguish between a static variable declared inside a function, which is initialised when the function is called and one declared outside a function, which is initialized before main() is called.) I've rewritten the comment to remove the ambiguity.

Why is MAX_LOGGERS an int? I think size_t is better for this purpose:

Agreed - changed.

I've changed the name of the constant to MAX_MESSAGE_ARRAYS as better describing its purpose and, for the purposes of testing, made it public within the MessageInitializer class.

Like MAX_LOGGER_NAME_SIZE, I'd add a test with a max number of initializers, and ideally a death test when they exceed the limit

Done. As part of this I needed to add a getPendingCount() method and split the MessageInitializer tests into separate programs to avoid one test affecting the starting conditions of the next.

comment:12 in reply to: ↑ 11 Changed 8 years ago by jinmei

Replying to stephen:

The branch now looks almost okay for merge.

I've fixed a few minor points directly in the branch, and have some
more additional comments (below). I don't think we need another
review cycle for them, so after addressing these please feel free to
merge.

Maybe "Mac OS (X) 10.7" instead of just "OS X"?

I'll use "Mac OS X" (as used in the Wikipedia entry) as I don't think any literature places the "X" in brackets.

By using parentheses I meant "either 'Mac OS 10.7' or 'Mac OS X
10.7'.

Other comments:

  • the newly created test .cc files (e.g. message_initializer_1_unittest.cc)
    • the Copyright year should be 2012?
    • we should probably order the header files as we agreed before (internal first, standard last)
  • message_initializer.h, getPendingCount(): "the message of message arrays" doesn't seem to be correct. You meant "the array of message arrays" or something?
        /// Returns a count of the message of message arrays that have been
        /// registered with this class and will be loaded with the next call
        /// to loadDictionary().
    
  • tests/Makefile.am: I think we normally use AM_CPPFLAGS, etc for things like COMMON_CPPFLAGS, etc. (But I don't insist).
  • about death tests: it may not be available for all buildbots. I see some existing death tests were ifdef'ed out. we should probably do the same.

comment:13 Changed 8 years ago by jinmei

  • Owner changed from jinmei to stephen

comment:14 Changed 8 years ago by stephen

  • Resolution set to fixed
  • Status changed from reviewing to closed

Merged in command a8e53be7039ad50d8587c0972244029ff3533b6e

Note: See TracTickets for help on using tickets.