Re: c++: dynamic_cast woes

From: Dimitry Andric <dim_at_FreeBSD.org>
Date: Tue, 08 Aug 2023 09:33:40 UTC
On 8 Aug 2023, at 02:20, Mark Millard <marklmi@yahoo.com> wrote:
> 
> On Aug 6, 2023, at 14:40, Christoph Moench-Tegeder <cmt@burggraben.net> wrote:
>> 
>> while updating our kicad port I'm facing major problems with
>> dynamic_cast on FreeBSD 13.2/amd64 - issues are seen with both base
>> clang and devel/llvm15, and I guess we're rather talking libc++ here.
>> Specifically, dynamic_cast fails when casting from a base to a derived
>> type ("downcast" as in C++ lingo?). I already know about "--export-dynamic"
>> but that does not help here - and as far as I read other platforms'
>> build scripts, no other platform requires any kind of "special
>> handling" for dynamic_cast to work - what am I holding wrong here,
>> or what's missing from the picture?
>> 
>> But for the gory details: here's exhibit A, the code in question:
>> https://gitlab.com/kicad/code/kicad/-/blob/7.0/include/settings/settings_manager.h?ref_type=heads#L110
>> That m_settings is declared as
>> std::vector<std::unique_ptr<JSON_SETTINGS>> m_settings;
>> and contains objects of types derived from JSON_SETTINGS (there's
>> the intermediate type APP_SETTINGS_BASE, and specific types like
>> KICAD_SETTINGS etc.) The function GetAppSettings<KICAD_SETTINGS>() is
>> called, so that the find_if() in there should return that one
>> KICAD_SETTINGS object from m_settings as only that should
>> satisfy dynamic_cast - but in fact, no object is found.
>> That also happens if I unwind the find_if() and build a simple
>> for-loop (as one did, back in the last millennium).
>> 
>> If I point gdb at that m_settings (with "set print object on" and
>> "set print vtbl on"), I do find my KICAD_SETTINGS object smack
>> in the middle of m_settings:
>> 
>> (gdb) print *(m_settings[5])
>> $18 = (KICAD_SETTINGS) {<APP_SETTINGS_BASE> = {<JSON_SETTINGS> = {
>>     _vptr$JSON_SETTINGS = 0xed1578 <vtable for KICAD_SETTINGS+16>,
>> 
>> so the type info isn't totally lost here.
>> 
>> When testing this, CXXFLAGS passed via cmake are rather minimal,
>> like "-std=c++17 -O0 -fstack-protector-strong -fno-strict-aliasing
>> -Wl,--export-dynamic" (and cmake sprinkles some "-g" and a few
>> other standard flags into the mix), LDFLAGS ("CMAKE_...LINKER_FLAGS")
>> are set up similarly (I have some inkling that these cmakefiles
>> in this project are not always very strict on compiling vs linking).
>> 
>> I had similar issues with dynamic_cast before, as witnessed here:
>> https://cgit.freebsd.org/ports/tree/cad/kicad/files/patch-job_use_dynamic_cast_for_updating
>> but now that I'm facing the current problem, I have a strong feeling
>> that my diagnosis back then was rather bullshit.
>> 
>> Help?
> 
> Maybe the following type of thing from
> 
> https://www.gnu.org/software/gcc/java/faq.html
> 
> might be relevant to your context? I cannot
> tell from the description. (A different ABI
> could have differing details. But the below
> could be at least suggestive of considerations.)
> 
> 
> 
> dynamic_cast, throw, typeid don't work with shared libraries
> 
> The new C++ ABI in the GCC 3.0 series uses address comparisons, rather than string compares, to determine type equality. This leads to better performance. Like other objects that have to be present in the final executable, these std::type_info objects have what is called vague linkage because they are not tightly bound to any one particular translation unit (object file). The compiler has to emit them in any translation unit that requires their presence, and then rely on the linking and loading process to make sure that only one of them is active in the final executable. With static linking all of these symbols are resolved at link time, but with dynamic linking, further resolution occurs at load time. You have to ensure that objects within a shared library are resolved against objects in the executable and other shared libraries.
> 
>    • For a program which is linked against a shared library, no additional precautions are needed.
>    • You cannot create a shared library with the "-Bsymbolic" option, as that prevents the resolution described above.
>    • If you use dlopen to explicitly load code from a shared library, you must do several things. First, export global symbols from the executable by linking it with the "-E" flag (you will have to specify this as "-Wl,-E" if you are invoking the linker in the usual manner from the compiler driver, g++). You must also make the external symbols in the loaded library available for subsequent libraries by providing the RTLD_GLOBAL flag to dlopen. The symbol resolution can be immediate or lazy.
> 
> Template instantiations are another, user visible, case of objects with vague linkage, which needs similar resolution. If you do not take the above precautions, you may discover that a template instantiation with the same argument list, but instantiated in multiple translation units, has several addresses, depending in which translation unit the address is taken. (This is not an exhaustive list of the kind of objects which have vague linkage and are expected to be resolved during linking & loading.)
> 
> If you are worried about different objects with the same name colliding during the linking or loading process, then you should use namespaces to disambiguate them. Giving distinct objects with global linkage the same name is a violation of the One Definition Rule (ODR) [basic.def.odr].
> 
> For more details about the way that GCC implements these and other C++ features, please read the C++ ABI specification. Note the std::type_info objects which must be resolved all begin with "_ZTS". Refer to ld's documentation for a description of the "-E" & "-Bsymbolic" flags.

Yes, this is a typical problem when type info is replicated across dynamic library boundaries. The best way to prevent this is to ensure that the key function for each class (under the Itanium C++ ABI, the first non-inline, non-pure virtual function, which is often the destructor) is defined in only one translation unit (object file), and that object file is linked into only one .so file.
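
As a rough illustration of the key function idea, here is a minimal single-file sketch. The class names (BaseSettings, AppSettings, OtherSettings) and the helpers find_settings() and self_check() are hypothetical stand-ins, not KiCad's actual code. The assumption is that each class declares its virtual destructor first and defines it out of line, so that the destructor becomes the key function and the vtable and type info are emitted only where it is defined.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins for JSON_SETTINGS and its derived types.
struct BaseSettings {
    // Declared here but defined out of line below: this makes the
    // destructor the key function, so vtable and type info are emitted
    // only in the translation unit that contains the definition.
    virtual ~BaseSettings();
    virtual int id() const { return 0; }
};

struct AppSettings : BaseSettings {
    ~AppSettings() override;  // key function of AppSettings
    int id() const override { return 1; }
};

struct OtherSettings : BaseSettings {
    ~OtherSettings() override;  // key function of OtherSettings
    int id() const override { return 2; }
};

// In a real project these definitions would live in exactly one .cpp
// file, and that object file would go into exactly one shared library.
BaseSettings::~BaseSettings() = default;
AppSettings::~AppSettings() = default;
OtherSettings::~OtherSettings() = default;

// Return the first element whose dynamic type is T (or derived from T),
// or nullptr if there is none.
template <typename T>
T* find_settings(const std::vector<std::unique_ptr<BaseSettings>>& all) {
    auto it = std::find_if(all.begin(), all.end(),
        [](const std::unique_ptr<BaseSettings>& p) {
            return dynamic_cast<T*>(p.get()) != nullptr;
        });
    return it == all.end() ? nullptr : dynamic_cast<T*>(it->get());
}

// Small self-check: the downcast finds the matching object.
void self_check() {
    std::vector<std::unique_ptr<BaseSettings>> all;
    all.push_back(std::make_unique<OtherSettings>());
    all.push_back(std::make_unique<AppSettings>());
    assert(find_settings<AppSettings>(all) == all[1].get());
    assert(find_settings<OtherSettings>(all) == all[0].get());
}
```

Within a single executable this always works. The failure described in this thread would only show up once the class definitions are visible from several shared objects and each of them ends up with its own copy of the type info, which is what keeping the out-of-line definitions in one place is meant to avoid.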

-Dimitry