PR27701 implemented curl handle reuse in debuginfod_client objects,
but with an unexpected bug. Server responses returning an error
"latched" because the curl_easy handles for error cases weren't all
systematically removed from the curl multi handle. This prevented
their proper re-addition the next time.
This version of the code simplifies matters by making only the
curl_multi handle long-lived. This turns out to be enough, because it
can maintain a pool of long-lived http/https connections and related
data, and lend them out to short-lived curl_easy handles. This mode
handles errors or hung downloads even better, because the easy handles
don't undergo complex state transitions between reuses.
A new test case confirms this correction via the federating debuginfod
instance (cleaning caches between subtests to make sure http* is being
used and reused).
Signed-off-by: Frank Ch. Eigler <fche@redhat.com>
New function in system.h that returns true if a string has a given
prefix, false otherwise. Use it in place of strncmp.
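The message does not name the new helper. A minimal sketch, assuming it is called `startswith` and takes (string, prefix), shows why it reads better than a bare strncmp with a hand-counted length:

```c
#include <stdbool.h>
#include <string.h>

/* Return true if STR begins with PREFIX, false otherwise.
   Equivalent to strncmp (str, prefix, strlen (prefix)) == 0, but the
   prefix length no longer has to be counted by hand at each call.  */
static inline bool
startswith (const char *str, const char *prefix)
{
  return strncmp (str, prefix, strlen (prefix)) == 0;
}
```

A call such as `strncmp (name, ".debug_", 7) == 0` then becomes `startswith (name, ".debug_")`, which cannot drift out of sync with the literal.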
Signed-off-by: Martin Liška <mliska@suse.cz>
Add debuginfod_config_cache for reading and writing cache
configuration files, and make use of the function within
debuginfod_clean_cache and debuginfod_query_server.
In debuginfod_query_server, create a 000-permission file on failed
queries. Before querying each BUILDID, if a corresponding 000 file is
detected, compare its age (from its stat mtime) with the value in
.cache/cache_miss_s. If the file is younger than that many seconds,
return ENOENT and exit; otherwise unlink the 000 file and proceed to a
new query.
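The negative-cache check described above can be sketched as follows. The helper names `record_negative_cache` and `check_negative_cache` are hypothetical, and cache_miss_s is assumed to be a number of seconds already read from the configuration file:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical: record a failed query as an empty 000-permission file.  */
static int
record_negative_cache (const char *marker)
{
  int fd = open (marker, O_CREAT | O_WRONLY | O_TRUNC, 0000);
  if (fd < 0)
    return -errno;
  close (fd);
  return 0;
}

/* Hypothetical: return -ENOENT if a failure recorded at MARKER is less
   than CACHE_MISS_S seconds old (report it again without contacting a
   server), or 0 if a new query should go ahead.  */
static int
check_negative_cache (const char *marker, time_t cache_miss_s)
{
  struct stat st;
  if (stat (marker, &st) != 0)
    return 0;                        /* No marker: query the server.  */

  if (time (NULL) - st.st_mtime < cache_miss_s)
    return -ENOENT;                  /* Recent failure: don't retry yet.  */

  unlink (marker);                   /* Stale marker: remove and retry.  */
  return 0;
}
```

stat(2) does not need read permission on the file itself, so the 000 mode does not get in the way of checking its mtime.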
tests: add a test in run-debuginfod-find.sh that checks whether the 000
file is created on a failed query; whether querying the same failed
BUILDID again returns without going through the server; and whether,
after cache_miss_s is set to 0, querying the same buildid goes through
the server again.
Signed-off-by: Alice Zhang <alizhang@redhat.com>
With PR25365, we accidentally lost the ability to rmdir client-cache
directories corresponding to buildids. Bring this back, with some
attention to a possible race between a client doing cleanup and
another client doing lookups at the same time.
Signed-off-by: Frank Ch. Eigler <fche@redhat.com>
Whenever we encounter an attribute with DW_FORM_indirect, we need to
read its true form from the DIE data. Then, we can continue normally.
This adds support in the most obvious places: __libdw_find_attr() and
dwarf_getattrs(). There may be more places that need to be updated.
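A minimal sketch of the DW_FORM_indirect step, independent of libdw's internals: `resolve_form` and `read_uleb128` are hypothetical helpers, and only the constant 0x16 comes from the DWARF standard. The real form is read as a ULEB128 from the start of the attribute data, and the value proper follows it:

```c
#include <stddef.h>
#include <stdint.h>

#define DW_FORM_indirect 0x16   /* Value from the DWARF standard.  */

/* Read an unsigned LEB128 value from *P (not past END) and advance *P.  */
static uint64_t
read_uleb128 (const unsigned char **p, const unsigned char *end)
{
  uint64_t result = 0;
  unsigned int shift = 0;
  while (*p < end)
    {
      unsigned char byte = *(*p)++;
      if (shift < 64)
        result |= (uint64_t) (byte & 0x7f) << shift;
      shift += 7;
      if ((byte & 0x80) == 0)
        break;
    }
  return result;
}

/* Given the form from the abbreviation, return the true form and leave
   *DATAP pointing at the attribute value.  The loop also copes with an
   indirect form that itself names DW_FORM_indirect.  */
static unsigned int
resolve_form (unsigned int form, const unsigned char **datap,
              const unsigned char *end)
{
  while (form == DW_FORM_indirect && *datap < end)
    form = (unsigned int) read_uleb128 (datap, end);
  return form;
}
```

After this step, decoding continues exactly as if the abbreviation had named the true form.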
I encountered this when inspecting a file that was processed by our BOLT
tool: https://github.com/facebookincubator/BOLT. This also adds a couple
of test cases using a file generated by that tool.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Client objects now carry long-lived curl handles for outgoing
connections. This makes it more efficient for multiple sequential
queries, because the TCP connections and/or TLS state info are kept
around awhile, avoiding O(100ms) setup latencies. debuginfod is
adjusted to take advantage of this for federation. Other clients
should gradually do this too, perhaps including elfutils itself (in
the libdwfl->debuginfod_client hooks).
A large gdb session with 117 debuginfo downloads was observed to run
twice as fast (45s vs. 1m30s wall-clock time), just by nuking this
extra setup latency. This was tested via a debuginfod intermediary:
it should be even faster once gdb reuses its own debuginfod_client.
Signed-off-by: Frank Ch. Eigler <fche@redhat.com>
In order to assist problem diagnosis / monitoring, use this
gnu-flavoured pthread function to assign purpose names to the various
child threads debuginfod starts. libmicrohttpd already sets this for
its threads.
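The function is not named in this message. Assuming it is glibc's `pthread_setname_np`, a minimal sketch of naming the calling thread looks like this; `set_thread_purpose` is a hypothetical wrapper, and the 16-byte limit (including the NUL) is Linux's:

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <string.h>

/* Give the calling thread a descriptive name, visible in tools such as
   top -H, ps -L or gdb.  Linux limits names to 16 bytes including the
   terminating NUL, so longer names are truncated here instead of being
   rejected with ERANGE.  */
static int
set_thread_purpose (const char *purpose)
{
  char name[16];
  size_t len = strnlen (purpose, sizeof name - 1);
  memcpy (name, purpose, len);
  name[len] = '\0';
  return pthread_setname_np (pthread_self (), name);
}
```

Each worker would call this once at startup, e.g. `set_thread_purpose ("groom")`.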
Signed-off-by: Frank Ch. Eigler <fche@redhat.com>
We were looking at a less-than-ideal metric to check the effects
of grooming on the database. It turns out there is a counter
just for removed files/archives, which will have the same value
regardless of the presence of other test configurations.
Signed-off-by: Frank Ch. Eigler <fche@redhat.com>
To help diagnose timing glitches in debuginfod testing, print more
diagnostics on a metric-timeout failure.
Signed-off-by: Frank Ch. Eigler <fche@redhat.com>
While inspecting some type units I noticed the type offset seemed off.
We were printing the offset as is, but it should include the offset of
the unit. There was actually a testcase for this, run-readelf-types.sh,
but that had the same bug in the expected output. Fixed both.
Signed-off-by: Mark Wielaard <mark@klomp.org>
Commit eb922a1b8f ("tests: use ${CC} instead of 'gcc' in tests")
exports ${CC} into the test environment, but doesn't quote the
value for the assignment. That doesn't work properly if the value
contains whitespace. In a multilib/biarch environment, however, it's
common to set CC="gcc -m32" or similar. That causes tests to print
error messages: "/bin/sh: line 2: -m32: command not found".
Fix that by adding quotes around all make variables (not just $CC)
used in setting up TESTS_ENVIRONMENT.
Signed-off-by: Alexander Miller <alex.miller@gmx.de>
A couple of closely related pieces of work allow earlier warning
about low storage/memory conditions:
- New prometheus metrics to track filesystem freespace, and more
details about some errors.
- Frequent checking of $TMPDIR freespace, to trigger fdcache
emergency flushes.
- Switch to floating point prometheus metrics, to communicate
fractions - and short time intervals - accurately.
- Fix startup-time pthread-creation error handling.
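A freespace check of the kind listed above can be sketched with statvfs(3). The function names and the threshold parameter are hypothetical; the message does not say which threshold debuginfod uses. Returning a fraction as a double also fits the switch to floating point metrics:

```c
#include <stdbool.h>
#include <sys/statvfs.h>

/* Return the fraction (0.0 .. 1.0) of PATH's filesystem still available
   to unprivileged users, or -1.0 if it cannot be determined.  */
static double
filesys_free_fraction (const char *path)
{
  struct statvfs sfs;
  if (statvfs (path, &sfs) != 0 || sfs.f_blocks == 0)
    return -1.0;
  return (double) sfs.f_bavail / (double) sfs.f_blocks;
}

/* Hypothetical policy: request an emergency fdcache flush when the free
   fraction of TMPDIR drops below MIN_FREE (e.g. 0.05 for 5%).  */
static bool
tmpdir_needs_flush (const char *tmpdir, double min_free)
{
  double f = filesys_free_fraction (tmpdir);
  return f >= 0.0 && f < min_free;
}
```

The same fraction could be exported directly as a gauge metric and compared against the flush threshold on each check.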
Testing is smoke-test-level only as it is hard to create
free-space-limited $TMPDIRs. Locally tested against tiny through
medium tmpfs filesystems, with or without sqlite db also there. Shows
a pleasant stream of diagnostics and metrics during shortage but
generally does not fail outright. However, catching an actual
libstdc++- or kernel-level OOM is beyond our ken.
Signed-off-by: Frank Ch. Eigler <fche@redhat.com>
To better support cross-compilation, Gentoo provides a way to
configure a system without a 'gcc' binary, providing only
tool-prefixed tools like 'x86_64-pc-linux-gnu-gcc'. Packages are then
built with ./configure --host=x86_64-pc-linux-gnu.
In https://bugs.gentoo.org/718872 Agostino Sarubbo found a few test
failures in tests that use a hardcoded 'gcc' instead of the expected
${CC}. This change propagates the ${CC} detected at configure time to
the test scripts.
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Move subdirectory parts of the top level .gitignore into the
appropriate subdirectories. This is consistent with the ChangeLog
files: currently one has to update the top level ChangeLog file even
when the top level .gitignore file is changed in a way that affects
only a specific subdirectory.
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
readelf -S now shows 'R' when SHF_GNU_RETAIN is set.
elflint accepts SHF_GNU_RETAIN when set on a section in --gnu mode.
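A sketch of how a readelf-style flags column can gain the 'R' letter. Only a subset of letters is handled, the letter order is illustrative rather than eu-readelf's exact output, and the fallback value covers an `<elf.h>` that predates SHF_GNU_RETAIN:

```c
#include <elf.h>
#include <stddef.h>

#ifndef SHF_GNU_RETAIN
# define SHF_GNU_RETAIN (1 << 21)   /* Fallback for older <elf.h>.  */
#endif

/* Build a flags column for a section into BUF (at least 8 bytes).
   'R' marks a section the linker must keep even under --gc-sections.  */
static const char *
section_flag_letters (unsigned long flags, char *buf)
{
  size_t n = 0;
  if (flags & SHF_WRITE)      buf[n++] = 'W';
  if (flags & SHF_ALLOC)      buf[n++] = 'A';
  if (flags & SHF_EXECINSTR)  buf[n++] = 'X';
  if (flags & SHF_MERGE)      buf[n++] = 'M';
  if (flags & SHF_STRINGS)    buf[n++] = 'S';
  if (flags & SHF_GNU_RETAIN) buf[n++] = 'R';
  buf[n] = '\0';
  return buf;
}
```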
Signed-off-by: Mark Wielaard <mark@klomp.org>
Move the definition of _(Str) macro to lib/eu-config.h which already
provides a definition of N_(Str) macro. Since lib/eu-config.h is
appended to config.h, it is included into every compilation unit
and therefore both macros are now universally available.
Remove all other definitions of N_(Str) and _(Str) macros from other files
to avoid conflicts and redundancies.
The next step is to replace all uses of gettext(Str) with _(Str).
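A sketch of the two macros side by side, assuming the message domain is "elfutils"; the `messages` table and `message_text` helper are hypothetical. N_() only marks a string for xgettext and expands to the string itself, so it works in static initializers, while _() translates at run time:

```c
#include <libintl.h>

/* Assumed domain name; _() translates, N_() only marks for extraction.  */
#define _(Str) dgettext ("elfutils", Str)
#define N_(Str) Str

/* N_() is usable where a function call is not, e.g. static tables.  */
static const char *const messages[] = { N_("no error"), N_("out of memory") };

/* Translate an entry of the N_()-marked table at run time.  */
static const char *
message_text (int idx)
{
  return _(messages[idx]);
}
```

Without a loaded catalog (for instance in the C locale), dgettext returns the original string unchanged.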
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
tests/configure.ac was introduced 15 years ago by commit
d7f8d0caa7. However, the ability to build
tests as a separate project was broken by the same author 4 years later
by commit 22359e2653, if not earlier.
An attempt to run autoreconf in tests would currently fail
with the following automake error:
automake: error: cannot open < config/eu.am: No such file or directory
Apparently, nobody has built tests as a separate project for more than
10 years, so clean up the remains of that unused and broken code.
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Add metrics for tracking sqlite3 error counts and query performance.
The former looks like a new sibling of the "error_count" family, and
is tested by dd-corrupting a live database file then triggering some
debuginfod activity.
error_count{sqlite3="file is not a database"} 1
The latter looks like _count/_sum pairs for each type of sqlite
prepared-statement used in the code, and is grep smoke-tested. They
should assist a sysadmin in tuning db storage. This example shows a
6.4 ms/operation cost:
sqlite3_milliseconds_count{step-done="rpm-file-intern"} 318
sqlite3_milliseconds_sum{reset="rpm-file-intern"} 2033
Signed-off-by: Frank Ch. Eigler <fche@redhat.com>
Improve monitoring of debuginfod instances by tracking thread_busy
status for the threads responding to http requests. While these are
usually short-lived, longer archive-uncompress operations can take
long enough to show up in top/uptime. This should also help with
noticing abusive clients and guide scaling of the service.
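A busy-thread gauge can be sketched with C11 atomics. The names are hypothetical: each handler increments the gauge on entry and decrements it on exit, so a long archive decompression shows up as a sustained non-zero value:

```c
#include <stdatomic.h>

/* Hypothetical gauge: number of request-handling threads currently busy.  */
static atomic_int thread_busy_http = 0;

/* Run WORK (ARG) while counted as busy.  */
static int
handle_request (int (*work) (void *), void *arg)
{
  atomic_fetch_add (&thread_busy_http, 1);
  int rc = work (arg);
  atomic_fetch_sub (&thread_busy_http, 1);
  return rc;
}

/* Example callback that records the gauge value seen while working.  */
static int
observe_gauge (void *arg)
{
  *(int *) arg = atomic_load (&thread_busy_http);
  return 0;
}
```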
Signed-off-by: Frank Ch. Eigler <fche@redhat.com>
We used to try to trigger an error during debuginfod scanning using
a chmod 000 file. But this doesn't always result in an error. Create
a cyclic symlink instead, which always results in a failure to open/read.
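The difference can be sketched directly. A chmod 000 file is still readable by a privileged process (for example when the tests run as root), but a symlink that points at itself can never be resolved, so open(2) fails with ELOOP. The helper name is hypothetical:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Create a self-referential symlink at PATH (an absolute path), try to
   open it, and clean up.  Returns the errno of the failed open, 0 if the
   open unexpectedly succeeded, or -1 if the symlink couldn't be made.  */
static int
open_error_for_cyclic_symlink (const char *path)
{
  unlink (path);
  if (symlink (path, path) != 0)
    return -1;
  int fd = open (path, O_RDONLY);
  int err = fd < 0 ? errno : 0;
  if (fd >= 0)
    close (fd);
  unlink (path);
  return err;
}
```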
Signed-off-by: Mark Wielaard <mark@klomp.org>
Use defined constants for permission values. Also add fallback
definitions for them in system.h, to allow for compatibility with
systems that don't provide these macros.
Include system.h in all tests/ files that required it.
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Signed-off-by: Mark Wielaard <mark@klomp.org>
Added new metrics for scanning that allow estimation of its reading
bandwidth. Improved responsiveness to SIGINT shutdown during the
archive-scanning phase, which previously insisted on completely
processing the current archive. Noted in the systemd service file that
in the worst case, it might still take a long time. Accelerated
traversals by moving regex -I/-X handling to apply to file names only
(as always documented), so directory traversal metrics are accurate
regardless of directory names.
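A sketch of the filtering rule using POSIX regex(3). The struct and function names are hypothetical, and matching against the full path is an assumption. The point is that the -I/-X patterns are only consulted for regular files, so every directory is still traversed:

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

struct name_filter
{
  regex_t include;   /* -I: a file must match to be scanned.  */
  regex_t exclude;   /* -X: a file must not match to be scanned.  */
};

static int
name_filter_init (struct name_filter *f, const char *inc, const char *exc)
{
  if (regcomp (&f->include, inc, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  if (regcomp (&f->exclude, exc, REG_EXTENDED | REG_NOSUB) != 0)
    {
      regfree (&f->include);
      return -1;
    }
  return 0;
}

/* Directories are always descended into; only files are filtered.  */
static bool
should_scan (const struct name_filter *f, const char *path, bool is_dir)
{
  if (is_dir)
    return true;
  return regexec (&f->include, path, 0, NULL, 0) == 0
         && regexec (&f->exclude, path, 0, NULL, 0) != 0;
}

static void
name_filter_free (struct name_filter *f)
{
  regfree (&f->include);
  regfree (&f->exclude);
}
```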
Signed-off-by: Frank Ch. Eigler <fche@redhat.com>
Some combination of old glibc and valgrind creates unsuppressible memory
leak warnings when the process is run under valgrind, is multi-threaded
and uses dlopen. libdw will try to dlopen libdebuginfod by default.
So simply override dlopen and always return NULL to make sure
libdebuginfod is never loaded. The dwfl-proc-attach test doesn't rely
on libdebuginfod anyway.
This was seen on the armbian buildbot which uses valgrind 3.14.0 and
glibc 2.28.
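A minimal sketch of such an override. Because the definition lives in the test executable, calls to dlopen that the dynamic linker resolves through the executable's symbols (including those from libdw) land here instead of in libc, so libdebuginfod is never loaded:

```c
#include <dlfcn.h>
#include <stddef.h>

/* Test-only override: never load anything.  Callers such as libdw treat
   a NULL return as "library not available" and carry on without it.  */
void *
dlopen (const char *filename, int flags)
{
  (void) filename;
  (void) flags;
  return NULL;
}
```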
Signed-off-by: Mark Wielaard <mark@klomp.org>
Debian uses dash as /bin/sh which is pretty strict about syntax. It
didn't like the == in the string comparison in test-wrapper.sh.
Replace it with a single =.
Signed-off-by: Mark Wielaard <mark@klomp.org>
When configuring with --enable-valgrind we were only running valgrind
on tests with a shell wrapper script. This patch makes sure to also run
valgrind on "pure" binary tests. This found one small issue in libasm
where we could be writing some uninitialized padding to an ELF file.
And there were a couple tests that didn't clean up all the resources
they used. Both issues are also fixed with this patch.
Signed-off-by: Mark Wielaard <mark@klomp.org>
... and add new metrics about progress of traversal and groom
processes. Correct one control flow abnormality that could
prematurely end a scanner thread and might have accounted for
the inconsistent test results from the previous patch.
Signed-off-by: Frank Ch. Eigler <fche@redhat.com>
On very large servers, it's desirable to be able to interrupt a rescan
or groom cycle. SIGUSR[12] now do that. (Unfortunately, this is not
practically testable in the testsuite, since these cycles are so fast
on that small dataset.) We also expose more internal progress counts
about the grooming pass, so the administrator can assess a possible
need to interrupt.
Signed-off-by: Frank Ch. Eigler <fche@redhat.com>
debuginfod now knows to handle a case where a buildid search is
satisfiable from more than one source (e.g., archive location), but
some of them are invalid. New exception catching beneath the sqlite
scanning loop ensures all possible matches are scanned in case of
errors.
Signed-off-by: Frank Ch. Eigler <fche@redhat.com>
Run tests/read_unaligned 1 on a big endian and a little endian machine
to generate the le_mem and be_mem arrays. The one byte variants are
kind of impossible to get wrong, but including them makes sure the
other variants are not naturally aligned in memory.
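The property being tested can be sketched with byte-wise readers; these are not elfutils' own read_unaligned helpers. Assembling values byte by byte gives the same answer on big and little endian hosts and never performs a misaligned load. In the test data, a leading one-byte value pushes the wider value off natural alignment, mirroring the layout described above:

```c
#include <stdint.h>

/* Read fixed-byte-order values from possibly unaligned memory.  */
static uint16_t
read_le16 (const unsigned char *p)
{
  return (uint16_t) (p[0] | (p[1] << 8));
}

static uint32_t
read_le32 (const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

static uint32_t
read_be32 (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}
```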
Signed-off-by: Mark Wielaard <mark@klomp.org>
PR 26773 points out that some sleb128 values are decoded incorrectly.
This version of the fix only examines the sleb128 conversion.
Overlong encodings are not handled, and the uleb128 decoders are not
touched. The approach taken here is to do the work in an unsigned
type, and then rely on an implementation-defined cast to convert to
signed.
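The approach described above can be sketched as a standalone decoder; `read_sleb128` is a hypothetical name, not libdw's function. All shifting and sign extension happen on uint64_t, where they are fully defined, and only the final conversion to int64_t is implementation-defined. As in the fix, overlong encodings are not diagnosed:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a signed LEB128 value at *P (not reading past END), advance *P.  */
static int64_t
read_sleb128 (const unsigned char **p, const unsigned char *end)
{
  uint64_t result = 0;
  unsigned int shift = 0;
  unsigned char byte = 0;

  while (*p < end)
    {
      byte = *(*p)++;
      if (shift < 64)
        result |= (uint64_t) (byte & 0x7f) << shift;
      shift += 7;
      if ((byte & 0x80) == 0)
        break;
    }

  /* Sign-extend from bit 6 of the last byte, on the unsigned value.  */
  if (shift < 64 && (byte & 0x40) != 0)
    result |= ~(uint64_t) 0 << shift;

  /* Implementation-defined conversion (two's complement in practice).  */
  return (int64_t) result;
}
```

The 10-byte encoding of INT64_MIN is a good edge case: its last byte's low bit lands in bit 63, which would be undefined behaviour if shifted into a signed type.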
Signed-off-by: Tom Tromey <tom@tromey.com>
GCC11 will warn about a mismatch in the declaration of dwarf_frame_register:
dwarf_frame_register.c:37:61: error: argument 3 of type ‘Dwarf_Op *’
declared as a pointer [-Werror=array-parameter=]
37 | dwarf_frame_register (Dwarf_Frame *fs, int regno, Dwarf_Op *ops_mem,
| ~~~~~~~~~~^~~~~~~
libdw.h:1068:43: note: previously declared as an array ‘Dwarf_Op[3]’
1068 | Dwarf_Op ops_mem[3],
| ~~~~~~~~~^~~~~~~~~~
Fixing that reveals an actual bug in the addrcfi testcase:
addrcfi.c:98:16: error: ‘dwarf_frame_register’ accessing 96 bytes in a
region of size 64 [-Werror=stringop-overflow=]
98 | int result = dwarf_frame_register (stuff->frame, regno, ops_mem, &ops, &nops);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
addrcfi.c:98:16: note: referencing argument 3 of type ‘Dwarf_Op *’
1069 | extern int dwarf_frame_register (Dwarf_Frame *frame, int regno,
| ^~~~~~~~~~~~~~~~~~~~
Fix the declaration, fix the bug and add an extra comment to the description
in libdw.h.
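The shape of the fix can be sketched with a stand-in type. `op_t` and `fill_register_ops` are hypothetical; `op_t` mimics a one-byte-plus-three-words operation record. Once the prototype and the definition agree on the array bound, -Warray-parameter is quiet, and -Wstringop-overflow can check that every caller passes room for three elements:

```c
#include <stdint.h>

/* Stand-in for an operation record: one byte plus three 64-bit words.  */
typedef struct
{
  uint8_t atom;
  uint64_t number, number2, offset;
} op_t;

/* Prototype and definition use the same bound, [3].  */
int fill_register_ops (op_t ops_mem[3], op_t **ops, int *nops);

int
fill_register_ops (op_t ops_mem[3], op_t **ops, int *nops)
{
  /* Stand-in body: may use all three slots of the caller's buffer.  */
  for (int i = 0; i < 3; i++)
    ops_mem[i] = (op_t) { .atom = (uint8_t) (0x70 + i) };
  *ops = ops_mem;
  *nops = 3;
  return 0;
}
```

A caller that declares `op_t ops_mem[2]` would overflow in this case, which is the kind of bug the diagnostic exposed in addrcfi.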
Signed-off-by: Mark Wielaard <mark@klomp.org>
Support for the Tilera TILE-Gx processor has been removed or deprecated
in gcc and binutils already. There are no users and there is no way to
test it.
Signed-off-by: Mark Wielaard <mark@klomp.org>
Add an error_count{} family of metrics for each libc/libarchive/http
exception instance created during operation. Add a family of fdcache*
metrics for tracking fdcache operations and status. Test by
injecting a permission-000 empty nothing.rpm in the testsuite.
Signed-off-by: Frank Ch. Eigler <fche@redhat.com>
Newer kernels might be compressed using ZSTD. Add support to libdwfl
open so we can automatically read ZSTD compressed files and kernel
images. The support is very similar to the bzip2 and lzma support, but
slightly different. With a few more macros it could perhaps have used
the gzip.c USE_INFLATE code path, but I felt that the many macros
didn't really help in understanding the code. So the unzip routine has
a slightly different code path for ZSTD.
https://sourceware.org/bugzilla/show_bug.cgi?id=26632
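Telling ZSTD apart from the other supported formats comes down to the leading magic bytes. This is an illustration using each format's published magic number, not libdwfl's actual detection code; the enum and function names are hypothetical:

```c
#include <stddef.h>
#include <string.h>

enum compression { COMP_NONE, COMP_GZIP, COMP_BZIP2, COMP_XZ, COMP_ZSTD };

/* Identify a compression format from the first LEN bytes of BUF.  */
static enum compression
detect_compression (const unsigned char *buf, size_t len)
{
  static const unsigned char gzip_magic[] = { 0x1f, 0x8b };
  static const unsigned char bzip2_magic[] = { 'B', 'Z', 'h' };
  static const unsigned char xz_magic[] = { 0xfd, '7', 'z', 'X', 'Z', 0x00 };
  static const unsigned char zstd_magic[] = { 0x28, 0xb5, 0x2f, 0xfd };

#define HAS_MAGIC(m) (len >= sizeof (m) && memcmp (buf, m, sizeof (m)) == 0)
  if (HAS_MAGIC (zstd_magic))
    return COMP_ZSTD;
  if (HAS_MAGIC (xz_magic))
    return COMP_XZ;
  if (HAS_MAGIC (bzip2_magic))
    return COMP_BZIP2;
  if (HAS_MAGIC (gzip_magic))
    return COMP_GZIP;
#undef HAS_MAGIC
  return COMP_NONE;
}
```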
Signed-off-by: Mark Wielaard <mark@klomp.org>
DW_CFA_AARCH64_negate_ra_state is used on aarch64 to indicate whether
or not the return address is mangled. It has the same value as
DW_CFA_GNU_window_save, so we have to pass around the e_machine
value of the process or core we are inspecting to know which one to
use.
Note that it isn't actually implemented yet. It needs ARMv8.3 hardware.
If we don't have such hardware it is enough to simply ignore the
DW_CFA_AARCH64_negate_ra_state (and not confuse it with
DW_CFA_GNU_window_save) to get backtraces to work on aarch64.
Add a testcase for eu-readelf --debug-dump=frames to show the value
is correctly recognized. Also don't warn that we cannot find any DWARF
if we are just dumping frames (those will come from .eh_frame if
there is no .debug_frame).
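Disambiguating the shared opcode can be sketched as a lookup that takes e_machine into account. The macro values (both 0x2d) match the DWARF/GNU definitions; the function name is hypothetical:

```c
#include <elf.h>

/* Both names share one opcode; the architecture decides the meaning.  */
#define DW_CFA_GNU_window_save          0x2d
#define DW_CFA_AARCH64_negate_ra_state  0x2d

static const char *
cfa_opcode_name (unsigned int opcode, unsigned int e_machine)
{
  if (opcode == DW_CFA_GNU_window_save)
    return e_machine == EM_AARCH64 ? "AARCH64_negate_ra_state"
                                   : "GNU_window_save";
  return "other";
}
```

Neither instruction takes operands, so a backtracer without PAC support can simply skip DW_CFA_AARCH64_negate_ra_state once it knows it is on aarch64.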
Signed-off-by: Mark Wielaard <mark@klomp.org>
When building with gcc -mbranch-protection= we might get a gnu property
note indicating BTI (Branch Target Identification) and/or PAC (Pointer
Authentication Code) is being used.
Add a small testcase to show eu-readelf -n now properly lists those
bits in the gnu property note.
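Decoding those bits can be sketched as below. The constants are the values used in glibc's `<elf.h>` (fallbacks are provided for older headers). The function name is hypothetical, and the output strings are illustrative rather than eu-readelf's exact format:

```c
#include <elf.h>

/* Property type whose value carries the AArch64 feature bits.  */
#ifndef GNU_PROPERTY_AARCH64_FEATURE_1_AND
# define GNU_PROPERTY_AARCH64_FEATURE_1_AND 0xc0000000
#endif
#ifndef GNU_PROPERTY_AARCH64_FEATURE_1_BTI
# define GNU_PROPERTY_AARCH64_FEATURE_1_BTI (1U << 0)
#endif
#ifndef GNU_PROPERTY_AARCH64_FEATURE_1_PAC
# define GNU_PROPERTY_AARCH64_FEATURE_1_PAC (1U << 1)
#endif

/* Describe the known bits of a FEATURE_1_AND property value.  */
static const char *
aarch64_feature_1_name (unsigned int bits)
{
  switch (bits & (GNU_PROPERTY_AARCH64_FEATURE_1_BTI
                  | GNU_PROPERTY_AARCH64_FEATURE_1_PAC))
    {
    case GNU_PROPERTY_AARCH64_FEATURE_1_BTI:
      return "BTI";
    case GNU_PROPERTY_AARCH64_FEATURE_1_PAC:
      return "PAC";
    case GNU_PROPERTY_AARCH64_FEATURE_1_BTI
         | GNU_PROPERTY_AARCH64_FEATURE_1_PAC:
      return "BTI PAC";
    default:
      return "";
    }
}
```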
Signed-off-by: Mark Wielaard <mark@klomp.org>
Since commit 287a18452 libasm.h defines an opaque Ebl handle.
This is fine, except for (internal) code that also includes libebl.h.
Since C11, having multiple typedefs for the same thing is fine, but we
build using GNU/C99. That also allows multiple identical typedefs,
except with (very) old GCCs.
This only affects internal code, since libebl.h isn't a public header.
For internal code, only add the typedef in libebl.h when libasm.h
hasn't been included. Make sure all code that includes both headers
includes libasm.h first.
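The guard can be sketched with both "headers" inlined into one file. The guard macro name `_LIBASM_H` is an assumption about libasm.h's include guard. Because libasm.h is included first, libebl.h can detect it and skip its own typedef:

```c
/* --- libasm.h (sketch) --- */
#ifndef _LIBASM_H
#define _LIBASM_H 1
typedef struct ebl Ebl;           /* Opaque handle for public users.  */
#endif

/* --- libebl.h (sketch, internal) --- */
#ifndef _LIBASM_H
typedef struct ebl Ebl;           /* Only if libasm.h didn't provide it.  */
#endif
struct ebl { int machine; };      /* Full definition for internal code.  */

static int
ebl_machine (const Ebl *ebl)
{
  return ebl->machine;
}
```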
Signed-off-by: Mark Wielaard <mark@klomp.org>
The public headers should be usable when included as is.
libasm.h wasn't, because it was using gelf.h data structures without
including gelf.h. Include it now in libasm.h.
Add a new testcase run-test-includes.sh to test that all public headers
can be included "standalone".
https://sourceware.org/bugzilla/show_bug.cgi?id=26176
Signed-off-by: Mark Wielaard <mark@klomp.org>
Check the scheme instead of the effective URL so that users may
abbreviate DEBUGINFOD_URL. Add one test for a scheme-free http URL.
Notice that libcurl does not provide complete support for scheme-free
URLs: /path/to/something without FILE:// cannot be recognized in most
circumstances. Therefore, to keep our code structure simple, a
DEBUGINFOD_URL with the "FILE" scheme must be given as a full URI.
Signed-off-by: Alice Zhang <alizhang@redhat.com>
Make it possible to build just the debuginfod client or to create a
dummy libdebuginfod that doesn't link against libcurl. The dummy library
can be used for bootstrapping. For testing purposes you can also build
debuginfod against the dummy libdebuginfod but then the debuginfod
server will not be able to do delegation.
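A dummy client library can be sketched as stubs with no libcurl dependency. The function names and signatures follow libdebuginfod's public API, but the stub behaviour shown (no client object from `debuginfod_begin`, -ENOSYS from lookups) is an assumption:

```c
#include <errno.h>
#include <stddef.h>

typedef struct debuginfod_client debuginfod_client;

/* No client can be created without a network backend.  */
debuginfod_client *
debuginfod_begin (void)
{
  errno = ENOSYS;
  return NULL;
}

/* Every lookup reports "not implemented".  */
int
debuginfod_find_debuginfo (debuginfod_client *client,
                           const unsigned char *build_id, int build_id_len,
                           char **path)
{
  (void) client;
  (void) build_id;
  (void) build_id_len;
  (void) path;
  return -ENOSYS;
}

void
debuginfod_end (debuginfod_client *client)
{
  (void) client;
}
```

Callers that already handle a failed `debuginfod_begin` or a negative return from a find function keep working unchanged, which is what makes such a stub usable for bootstrapping.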
Signed-off-by: Mark Wielaard <mark@klomp.org>