llvm-capstone/lld/MachO/Dwarf.cpp

//===- DWARF.cpp ----------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "Dwarf.h"
#include "InputFiles.h"
#include "InputSection.h"
#include "OutputSegment.h"

#include "llvm/ADT/StringSwitch.h"

#include <memory>

using namespace lld;
using namespace lld::macho;
using namespace llvm;

std::unique_ptr<DwarfObject> DwarfObject::create(ObjFile *obj) {
  auto dObj = std::make_unique<DwarfObject>();
  bool hasDwarfInfo = false;
  // LLD only needs to extract the source file path from the debug info, so we
  // initialize DwarfObject with just the sections necessary to get that path.
  // The debugger will locate the debug info via the object file paths that we
  // emit in our STABS symbols, so we don't need to process & emit them
  // ourselves.
  for (const InputSection *isec : obj->debugSections) {
    if (StringRef *s =
            StringSwitch<StringRef *>(isec->name)
                .Case(section_names::debugInfo, &dObj->infoSection.Data)
                .Case(section_names::debugAbbrev, &dObj->abbrevSection)
                .Case(section_names::debugStr, &dObj->strSection)
                .Default(nullptr)) {
      *s = toStringRef(isec->data);
      hasDwarfInfo = true;
    }
  }

  if (hasDwarfInfo)
    return dObj;
  return nullptr;
}
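The name-based dispatch in `DwarfObject::create` can be illustrated without LLVM. The following is a minimal standalone sketch, not LLD's code: `MiniDwarfObject`, `Section`, `slotFor`, and `createMini` are hypothetical names, and `std::string_view` stands in for `StringRef`. It keeps only the three sections needed to recover the source path and returns a null pointer when none of them is present.

```cpp
#include <cassert>
#include <memory>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for the three slots that DwarfObject fills in.
struct MiniDwarfObject {
  std::string_view info;
  std::string_view abbrev;
  std::string_view str;
};

// Hypothetical (name, contents) pair standing in for an InputSection.
using Section = std::pair<std::string_view, std::string_view>;

// Returns the slot that should receive a section's contents, or nullptr if
// the section is not one of the three we care about. This plays the role of
// the StringSwitch in the original code.
std::string_view *slotFor(MiniDwarfObject &obj, std::string_view name) {
  if (name == "__debug_info")
    return &obj.info;
  if (name == "__debug_abbrev")
    return &obj.abbrev;
  if (name == "__debug_str")
    return &obj.str;
  return nullptr;
}

// Mirrors the shape of DwarfObject::create: fill the known slots and report
// "no debug info" with a null pointer if nothing matched.
std::unique_ptr<MiniDwarfObject>
createMini(const std::vector<Section> &sections) {
  auto obj = std::make_unique<MiniDwarfObject>();
  bool hasDwarfInfo = false;
  for (const Section &sec : sections) {
    if (std::string_view *slot = slotFor(*obj, sec.first)) {
      *slot = sec.second;
      hasDwarfInfo = true;
    }
  }
  if (hasDwarfInfo)
    return obj;
  return nullptr;
}
```

Returning a pointer to the destination slot keeps the loop body to a single assignment, which is the same design choice the original makes with `StringSwitch<StringRef *>`.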
[lld-macho] Emit STABS symbols for debugging, and drop debug sections

Debug sections contain a large amount of data. In order not to bloat the size of the final binary, we remove them and instead emit STABS symbols for `dsymutil` and the debugger to locate their contents in the object files.

With this diff, `dsymutil` is able to locate the debug info. However, we need a few more features before `lldb` is able to work well with our binaries -- e.g. having `LC_DYSYMTAB` accurately reflect the number of local symbols, emitting `LC_UUID`, and more. Those will be handled in follow-up diffs.

Note also that the STABS we emit differ slightly from what `ld64` does. First, we emit the path to the source file as one `N_SO` symbol instead of two. (`ld64` emits one `N_SO` for the dirname and one for the basename.) Second, we do not emit `N_BNSYM` and `N_ENSYM` STABS to mark the start and end of functions, because the `N_FUN` STABS already serve that purpose. @clayborg recommended these changes based on his knowledge of what the debugging tools look for.

Additionally, the current implementation doesn't accurately reflect the size of function symbols. It uses the size of their containing sections as a proxy, but that is only accurate if `.subsections_with_symbols` is set, and if there isn't an `N_ALT_ENTRY` in that particular subsection. I think we have two options to solve this:

1. We can split up subsections by symbol even if `.subsections_with_symbols` is not set, but include constraints to ensure those subsections retain their order in the final output. This is `ld64`'s approach.
2. We could just add a `size` field to our `Symbol` class. This seems simpler, and I'm more inclined toward it, but I'm not sure if there are use cases that it doesn't handle well.

As such I'm punting on the decision for now.

Reviewed By: clayborg
Differential Revision: https://reviews.llvm.org/D89257
2020-12-01 22:45:01 +00:00
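The `N_SO` difference described in this commit message can be sketched as follows. This is only an illustration of the two string layouts, not code from LLD or `ld64`: the function names are hypothetical, and keeping the trailing slash on the directory part is an assumption made here so that the two pieces concatenate back to the full path.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: the single-symbol scheme carries the full source path in
// one N_SO string.
std::vector<std::string> soStringsSingle(const std::string &path) {
  return {path};
}

// Hypothetical: the two-symbol scheme splits the path into a directory part
// and a file name part. Keeping the trailing '/' on the directory is an
// assumption of this sketch.
std::vector<std::string> soStringsSplit(const std::string &path) {
  std::size_t slash = path.rfind('/');
  if (slash == std::string::npos)
    return {"", path}; // No directory component at all.
  return {path.substr(0, slash + 1), path.substr(slash + 1)};
}
```

Either layout carries the same information; the single-string form simply saves one symbol table entry per object file.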

[lld-macho] Don't attempt to emit rebase opcodes for debug sections

This was causing a crash as we were attempting to look up the nonexistent parent OutputSection of the debug sections. We didn't detect it earlier because there was no test for PIEs with debug info (PIEs require us to emit rebases for X86_64_RELOC_UNSIGNED).

This diff filters out the debug sections while loading the ObjFiles. In addition to fixing the above problem, it also lets us avoid doing redundant work -- we no longer parse / apply relocations / attempt to emit dyld opcodes for these sections that we don't emit.

Fixes llvm.org/PR48392.

Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D92904
2020-12-09 01:47:19 +00:00
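The load-time filtering described in this commit message might look roughly like the sketch below. It is a standalone illustration under stated assumptions: `MiniSection`, `MiniObjFile`, and `loadSections` are hypothetical names, and treating every section in the `__DWARF` segment as a debug section is an assumed criterion, not necessarily the exact check LLD performs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified section record.
struct MiniSection {
  std::string segname;
  std::string name;
};

// Hypothetical, simplified object file: debug sections are kept apart so
// that later passes (relocation handling, rebase emission) never see them.
struct MiniObjFile {
  std::vector<MiniSection> sections;
  std::vector<MiniSection> debugSections;
};

// Partition sections while loading. Assumption: a section is a debug section
// if it lives in the __DWARF segment.
MiniObjFile loadSections(const std::vector<MiniSection> &all) {
  MiniObjFile file;
  for (const MiniSection &sec : all) {
    if (sec.segname == "__DWARF")
      file.debugSections.push_back(sec);
    else
      file.sections.push_back(sec);
  }
  return file;
}
```

With this split, a loop such as the one over `obj->debugSections` in `DwarfObject::create` can still read the debug data, while code that emits output only iterates over the regular sections.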

[lld-macho][NFC] add const to pointer/reference induction variables of range-based for loops

Pointer and reference induction variables of range-based for loops are often const, and code authors are often lax about qualifying them.

Differential Revision: https://reviews.llvm.org/D98317
2021-03-10 05:41:34 +00:00

[lld-macho][NFC] define more strings in section_names:: and segment_names::

As preparation for a subsequent diff that implements builtin section renaming, define more `constexpr` strings in namespaces `lld::macho::segment_names` and `lld::macho::section_names`, and use them to replace string literals.

Differential Revision: https://reviews.llvm.org/D101393
2021-04-27 19:22:44 +00:00