From: bugzilla-noreply@freebsd.org
To: ports-bugs@FreeBSD.org
Subject: [Bug 257788] databases/postgres{12|13|14}-{server|client}: severe Kernel TLS issues
Date: Thu, 12 Aug 2021 14:46:52 +0000
List-Id: Ports bug reports
List-Archive: https://lists.freebsd.org/archives/freebsd-ports-bugs

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257788

            Bug ID: 257788
           Summary: databases/postgres{12|13|14}-{server|client}: severe
                    Kernel TLS issues
           Product: Ports & Packages
           Version: Latest
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: Individual Port(s)
          Assignee: ports-bugs@FreeBSD.org
          Reporter: ohartmann@walstatt.org

Since May 2021 we have been facing severe DB issues with a couple of
systems running 14-CURRENT, at this time FreeBSD 14.0-CURRENT #11
main-n248668-aecd31a8a3b: Thu Aug 12 15:15:58 CEST 2021 amd64, dual stack
(IPv4/IPv6) configurations.
The ports databases/postgresql13-{server|client|contrib} have been
recompiled via "portmaster -df postgresql" several times now on two
specific hosts, without success so far.

Before I describe the phenomenon, I should state that we use customized
kernel configurations, that kernel TLS is enabled in the kernel by default,
and that we also played with the kernel OID kern.ipc.tls.enable=0|1; more
on that later.
For the tests described below, kern.ipc.tls.enable is set to 0. Otherwise
an error occurs, see below.

For the record: both systems in question are running on older Intel
IvyBridge hardware (Intel(R) Core(TM) i5-3470 CPU and Intel(R) Xeon(R) CPU
E3-1245 V2). The XEON host also acts as a poudriere package builder, see
below; it seems important to me to mention this here.

The phenomenon is as follows. On the hosts running PostgreSQL 12, 13 or 14
as server, login via "psql -U postgres -d postgres" is always possible via
the local socket, but "psql -U postgres -d postgres -h localhost" (or
replace localhost by 127.0.0.1 or ::1 to exclude any misunderstandings)
fails; after a while the client hits a timeout:

#: psql -U postgres -d postgres -h 192.168.0.223
psql: error: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

Checking via sockstat -4|-6 indicates that postgresql is listening on its
default port 5432 on the machines in question, and IPFW is set up properly
or disabled (simply set to "OPEN") for test purposes.

Configuring the PostgreSQL server's logging to debug does not give anything
useful; the only thing one can see in the log, if logging is set to "info",
is:

root@:~ # 2021-08-12 13:51:05.137 GMT [2132] LOG:  connection received:
host=host1.local.net port=41162

Then - silence! As if the server went deaf.

To make sure that neither a corrupted DB nor a hidden misconfiguration in
pg_hba.conf and/or postgresql.conf causes the problems, we installed
versions 12, 13 and even 14 of the software on both systems (compiled via
classical make). The problem is the same with all versions on those hosts.

To exclude any issues regarding self-compiled postgresql, we also fetched
the postgresql13-server pkg tarball from an official FreeBSD mirror and
installed that one.
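
The TCP symptom described above (the server logs "connection received" and
then nothing more happens) can also be checked without psql. The following
is only a minimal sketch, assuming the server speaks the standard PostgreSQL
wire protocol: it opens a TCP connection, sends the protocol's SSLRequest
packet and reports the single-byte answer. The function names, the result
strings and the default timeout are hypothetical and not part of the report.

```python
import socket
import struct

# SSLRequest from the PostgreSQL frontend/backend protocol:
# Int32 length (8) followed by Int32 request code 80877103.
SSL_REQUEST_CODE = 80877103


def build_ssl_request() -> bytes:
    """Return the 8-byte SSLRequest packet in network byte order."""
    return struct.pack("!II", 8, SSL_REQUEST_CODE)


def probe(host: str, port: int = 5432, timeout: float = 5.0) -> str:
    """Send SSLRequest over TCP and classify the server's reply.

    The returned labels are illustrative only: "ssl-accepted",
    "ssl-refused", "closed", "timeout", "unexpected:<byte>" or
    "connect-failed:<exception name>".
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return f"connect-failed:{exc.__class__.__name__}"
    with sock:
        try:
            sock.sendall(build_ssl_request())
            reply = sock.recv(1)
        except socket.timeout:
            # Connection was accepted, but the server never answered.
            return "timeout"
        except OSError:
            return "closed"
    if reply == b"S":
        return "ssl-accepted"
    if reply == b"N":
        return "ssl-refused"
    if reply == b"":
        return "closed"
    return f"unexpected:{reply!r}"
```

Calling probe("127.0.0.1"), probe("::1") and probe("192.168.0.223") on an
affected host would show whether the server already stalls before the TLS
handshake ("timeout" or "closed") or only after it has agreed to SSL
("ssl-accepted").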
The problem remains, which leaves us with either a broken world or a broken
kernel so far. Recompiling world and kernel with vanilla settings did not
change anything. Using GENERIC as the kernel does not mitigate or resolve
the problem either.

As initially mentioned, the XEON box also acts as a poudriere package host,
building packages for 13-STABLE as well, with the very same make.conf as
the host (and so the non-working db host itself).
From a client running a recent 13-STABLE and equipped with the packages
built on the host in question above, IT IS POSSIBLE to connect to the
PostgreSQL 13 server, as long as kern.ipc.tls.enable is set to 0. If one
sets kern.ipc.tls.enable to 1, the client (running psql 13.3) receives:

psql: error: SSL SYSCALL error: EOF detected

So the PostgreSQL 13.3 server itself on the failing host is serving as
expected, and it seems to be the client that has severe problems.

The problems occurred on all affected systems at almost the same time,
around May 26th this year, when we did our weekly updates of the 14-CURRENT
base system and the portmaster jobs for ports. That might be a hint, since
I do not remember when LLVM 12 was introduced or KTLS was activated.

Also, to exclude any issue with iflib and the i350 NICs on the servers, we
disabled any hardware checksum offloading for vlan and RX/TX, so that in
the end a "naked" interface without any hardware support is used. But that
didn't resolve anything either.

Another test went really sideways. We moved the complete configuration
(base system, kernel, sysctl.conf, postgresql13 configs and databases) to
another, more modern platform (it is a XEON based system; it is not
remotely accessible, so I can't report its hardware specs). On this box,
based on 14-CURRENT and postgresql13 in a jail, the server acts as expected
and local connections as well as remote connections are possible.
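
The kern.ipc.tls.enable observations above could be repeated in a more
systematic way. The sketch below is only an illustration under stated
assumptions: it builds one test case per combination of sysctl value,
target host and libpq SSL mode (via the PGSSLMODE environment variable) and
runs psql for each. Comparing "disable" with "require" is an addition for
illustration, not something the report did; the function names and the
dictionary layout are hypothetical; setting the sysctl needs root, and on
which machine (client or server) it has to be toggled is left open here.

```python
import itertools
import os
import subprocess

KTLS_OID = "kern.ipc.tls.enable"  # sysctl OID named in the report


def build_matrix(hosts, sslmodes=("disable", "require"), ktls_values=(0, 1)):
    """Return one test case per (ktls value, host, sslmode) combination."""
    cases = []
    for ktls, host, mode in itertools.product(ktls_values, hosts, sslmodes):
        cases.append({
            # Command that sets the KTLS switch before the attempt.
            "sysctl": ["sysctl", f"{KTLS_OID}={ktls}"],
            # Same psql invocation as in the report, plus a trivial query.
            "psql": ["psql", "-U", "postgres", "-d", "postgres",
                     "-h", host, "-c", "SELECT 1"],
            # libpq reads the SSL mode from this environment variable.
            "env": {"PGSSLMODE": mode},
        })
    return cases


def run_case(case, timeout=30):
    """Apply the sysctl, run psql, and return (exit code, stderr text).

    A hung connection attempt is reported as (None, "timed out").
    """
    subprocess.run(case["sysctl"], check=True, capture_output=True)
    env = {**os.environ, **case["env"]}
    try:
        proc = subprocess.run(case["psql"], env=env, capture_output=True,
                              text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None, "timed out"
    return proc.returncode, proc.stderr.strip()
```

Iterating over build_matrix(["127.0.0.1", "::1", "192.168.0.223"]) and
printing the result of run_case() for each entry gives a small table that
shows whether the failures follow the sysctl value, the SSL mode, or both.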
This is really weird and leaves me with the preliminary conclusion that
something is really wrong. I'm out of ideas here and floating like a dead
man in the water ...

-- 
You are receiving this mail because:
You are the assignee for the bug.