Re: s6-envuidgid: Weird errors with GNU libc's getgrent() and endgrent()

From: Guillermo <gdiazhartusch_at_gmail.com>
Date: Sun, 9 Jun 2019 17:28:49 -0300

Well, this one required GDB to tell what was going on, but I managed
to find a workaround.

> Short version: For recent libc releases, and at least on Gentoo,
> getgrent() and endgrent() seem to magically set errno to EINVAL (I
> think),
> [...]
> s6-envuidgid [...] fails with a strange "invalid argument" error
> whenever it tries to set GIDLIST. s6-setuidgid invokes s6-envuidgid,
> so it also fails.

For those who don't know, in GNU libc, getgrent(3) and endgrent(3) are
implemented by __nss_getent_r() and __nss_endent(), respectively,
which are part of the Name Service Switch (NSS) mechanism. You know,
the one that Laurent's nsss project is a replacement for. My
/etc/nsswitch.conf, which I don't recall having ever modified, says:

group: db files

With this configuration, the first time __nss_getent_r() is called, it
tries to call the implementation of setgrent(3) from each of these
services, "db", implemented by module /lib64/libnss_db.so.2, and
"files", implemented by module /lib64/libnss_files.so.2. The first one
is _nss_db_setgrent(), which tries to open /var/db/group.db, fails
because on my machine that file does not exist, and returns an
'unavailable' status (NSS_STATUS_UNAVAIL). The second one is
_nss_files_setgrent(), which tries to open /etc/group, succeeds, and
returns a 'successful' status (NSS_STATUS_SUCCESS). From then on,
__nss_getent_r() always calls the implementation of getgrent() from
libnss_files.so, named _nss_files_getgrent_r(). Relevant output of an
strace of the test program in my OP:

openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libnss_db.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/var/db/group.db", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/group", O_RDONLY|O_CLOEXEC) = 3
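
For anyone who wants to repeat the observation, a probe along these
lines exercises the same libc calls. It is only a sketch and not the
test program from the original post: the struct and function names are
made up for illustration. It records errno once after the getgrent()
loop ends and once more after endgrent().

```c
#include <errno.h>
#include <grp.h>
#include <stddef.h>

/* Hypothetical probe; names are illustrative, not from the original post. */
struct grent_probe {
    size_t entries;      /* group entries returned by getgrent() */
    int errno_after_get; /* errno once getgrent() returned NULL */
    int errno_after_end; /* errno once endgrent() returned */
};

struct grent_probe probe_group_db(void)
{
    struct grent_probe p = { 0, 0, 0 };

    for (;;) {
        errno = 0;               /* getgrent() only reports errors via errno */
        if (getgrent() == NULL)  /* NULL means end of database or error */
            break;
        p.entries++;
    }
    p.errno_after_get = errno;

    errno = 0;
    endgrent();                  /* returns void, so errno is the only trace */
    p.errno_after_end = errno;

    return p;
}
```

Calling probe_group_db() from a small main() and printing the two errno
fields with strerror() should be enough to compare a "group: db files"
setup against other configurations.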

But it turns out that, with this configuration, __nss_endent() *also*
wants to call the implementation of endgrent(3) from each of these
services. And the one from libnss_db.so, named _nss_db_endgrent(), is
just a wrapper around a munmap(2) system call, via an intermediate
internal_endent() function:

* https://sourceware.org/git/?p=glibc.git;a=blob;f=nss/nss_db/db-open.c;h=8a83d6b9302b39a071d0ddca5ab686e6ecfd6178;hb=56c86f5dd516284558e106d04b92875d5b623b7a

In my case, this results in a 'munmap(NULL, 0)' call that… you
guessed it, fails with EINVAL (remember that _nss_db_setgrent() said
the service was unavailable?). And strace happens to see it:

write(1, "End of file or error (errno = Su"..., 39) = 39
munmap(NULL, 0) = -1 EINVAL (Invalid argument)
close(3) = 0
write(1, "errno = Invalid argument\n", 25) = 25
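
The failing call can be reproduced on its own, without going through
NSS at all. A minimal sketch (the function name is hypothetical): POSIX
specifies that munmap() fails with EINVAL when the length argument is
0, and that matches the strace line above.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Returns the errno left behind by munmap(NULL, 0),
 * or 0 if the call unexpectedly succeeded. */
int errno_from_empty_munmap(void)
{
    errno = 0;
    if (munmap(NULL, 0) == 0)
        return 0;
    return errno;
}
```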

The implementation from libnss_files.so, _nss_files_endgrent(), is
also called, and succeeds, but errno is already set. So, the
workaround? I removed the "db" service for the group database in
/etc/nsswitch.conf:

group: files

With this change, the output of the test program looks exactly like
Casper's and Brett's, strace no longer shows openat() calls for
/lib64/libnss_db.so.2 and /var/db/group.db, and both s6-setuidgid and
s6-envuidgid work again.

I have no idea what changed, why this used to work before my upgrade
of the libc, or why it apparently never failed for anyone else not on
Gentoo.

G.
Received on Sun Jun 09 2019 - 20:28:49 UTC
