From: Paul Floyd
To: FreeBSD Hackers
Date: Sat, 28 May 2022 00:13:52 +0200
Subject: Hang ast / pipelk / piperd

Hi

I'm debugging two issues with Valgrind on FreeBSD 13.1 and 14, one on amd64 and one on i386.

The first testcase, on i386, creates 10 threads that each just call pause(). The main thread then calls fork(); the parent calls pause() and the child kills the parent. This hang is reliably reproducible.
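
For reference, here is a minimal C sketch of what the first testcase does, based only on the description above. It is not the actual Valgrind regression test: the signal the child sends (SIGKILL here), the helper names, and the extra wrapper process that lets the caller survive the kill are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NUM_THREADS 10

/* Each worker thread does nothing but block in pause(). */
static void *worker(void *arg)
{
    (void)arg;
    for (;;)
        pause();
    return NULL;
}

/* The testcase itself: start the threads, then fork. Never returns. */
static void testcase_process(void)
{
    pthread_t tids[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; i++)
        if (pthread_create(&tids[i], NULL, worker, NULL) != 0)
            _exit(1);

    pid_t pid = fork();
    if (pid < 0)
        _exit(1);
    if (pid == 0) {
        /* Child: kill the parent (SIGKILL is an assumption), then exit.
           Only async-signal-safe calls are used after fork(). */
        kill(getppid(), SIGKILL);
        _exit(0);
    }

    /* Parent: wait in pause() until the child's signal arrives. */
    for (;;)
        pause();
}

/* Hypothetical harness: runs the testcase in its own process so that the
   caller survives the kill. Returns the terminating signal, or -1. */
int run_testcase1(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        testcase_process();  /* does not return */

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Run natively, run_testcase1() should simply return SIGKILL; the hang only shows up when the testcase runs under Valgrind.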

The second testcase, on amd64, runs a loop of 7 tests, each creating 2 threads. The thread function writes either to a global variable or to various types of TLS, calling nanosleep() to yield between the threads. This hang is intermittent.
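
A rough C sketch of the second testcase's structure follows. Everything concrete in it is an assumption: the target variables, the iteration count, the 1 ms sleep, and the mutex (added so that the global case is not a data race). The original loop runs 7 tests over various kinds of TLS; this sketch only has three targets, one global and two thread-local.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical targets: one shared global and two kinds of TLS. */
static int global_var;
static _Thread_local int tls_static;
_Thread_local int tls_extern_linkage;

typedef int *(*target_fn)(void);

static int *get_global(void) { return &global_var; }
static int *get_tls_static(void) { return &tls_static; }
static int *get_tls_extern(void) { return &tls_extern_linkage; }

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

struct job {
    target_fn target;
    int id;
    int clobbered;  /* times the other thread's value was seen */
};

static void *thread_fn(void *arg)
{
    struct job *job = arg;
    struct timespec ts = { 0, 1000000 };  /* 1 ms, an arbitrary choice */

    for (int i = 0; i < 10; i++) {
        pthread_mutex_lock(&lock);
        *job->target() = job->id;
        pthread_mutex_unlock(&lock);

        nanosleep(&ts, NULL);  /* give the other thread a chance to run */

        pthread_mutex_lock(&lock);
        if (*job->target() != job->id)
            job->clobbered++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* One test: two threads hammer the same target. Returns the total
   clobber count, or -1 if a thread could not be created. */
int run_one_test(target_fn target)
{
    pthread_t t[2];
    struct job jobs[2] = { { target, 1, 0 }, { target, 2, 0 } };
    int created = 0;

    for (; created < 2; created++)
        if (pthread_create(&t[created], NULL, thread_fn, &jobs[created]) != 0)
            break;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    if (created < 2)
        return -1;
    return jobs[0].clobbered + jobs[1].clobbered;
}

/* Loop over all targets. A TLS target must never see the other thread's
   value; the global one may. Returns the number of failed tests. */
int run_all_tests(void)
{
    static const struct { target_fn fn; int is_tls; } tests[] = {
        { get_global, 0 }, { get_tls_static, 1 }, { get_tls_extern, 1 },
    };
    int failures = 0;

    for (size_t i = 0; i < sizeof tests / sizeof tests[0]; i++) {
        int c = run_one_test(tests[i].fn);
        if (c < 0 || (tests[i].is_tls && c != 0))
            failures++;
    }
    return failures;
}
```

The nanosleep() calls are what make the threads interleave, which fits with the observation that changing the sleep times changes whether the hang occurs.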

These details are probably not that relevant.

In both cases Valgrind hangs at 100% CPU usage.

At the point where things seem to go wrong, ktrace shows:


  9340 none-amd64-freebsd GIO   fd 28503 read 1 byte
       "X"
  9340 none-amd64-freebsd RET   read 1
  9340 none-amd64-freebsd CSW   stop user "ast"
  9340 none-amd64-freebsd CSW   resume kernel "pipelk"
  9340 none-amd64-freebsd CSW   stop kernel "piperd"
  9340 none-amd64-freebsd CSW   resume kernel "pipelk"
  9340 none-amd64-freebsd CSW   stop kernel "piperd"
... repeat until killed


That read is on a pipe used as the Valgrind scheduler lock. The scheduler runs single-threaded, and the successful read above means that one thread has acquired the lock and should be able to run.

Instead, it looks like an AST (asynchronous system trap) gets the kernel stuck context switching between the pipe lock ("pipelk") and pipe read ("piperd") wait states. kill -9 is the only way out.
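
To make the locking scheme concrete, here is a simplified C sketch of a pipe used as a lock: the pipe holds a single token byte, a thread acquires the lock by reading it and releases it by writing it back. This only illustrates the idea and is not Valgrind's actual scheduler code; the struct and function names, and 'X' as the only token value, are assumptions. While one thread holds the token, the others block in read(), which as far as I know is where the "piperd" wait state in the trace comes from on FreeBSD.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Hypothetical pipe-based lock: at most one token byte is in the pipe. */
struct pipe_lock {
    int fds[2];  /* fds[0] is the read end, fds[1] the write end */
};

/* Create the pipe and put the token in it, so the lock starts out free. */
int pipe_lock_init(struct pipe_lock *l)
{
    char token = 'X';

    if (pipe(l->fds) != 0)
        return -1;
    return write(l->fds[1], &token, 1) == 1 ? 0 : -1;
}

/* Acquire: take the token, blocking in read() while another thread has it. */
int pipe_lock_acquire(struct pipe_lock *l)
{
    char token;
    ssize_t n;

    do {
        n = read(l->fds[0], &token, 1);
    } while (n < 0 && errno == EINTR);
    return n == 1 ? 0 : -1;
}

/* Release: put the token back, waking one thread blocked in read(). */
int pipe_lock_release(struct pipe_lock *l)
{
    char token = 'X';
    ssize_t n;

    do {
        n = write(l->fds[1], &token, 1);
    } while (n < 0 && errno == EINTR);
    return n == 1 ? 0 : -1;
}

void pipe_lock_destroy(struct pipe_lock *l)
{
    close(l->fds[0]);
    close(l->fds[1]);
}
```

Because only one token byte ever exists, at most one thread can hold the lock at a time; a successful one-byte read like the one in the trace means the reader now owns it.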

This all worked fine from FreeBSD 11.3 through 13.0.


It's quite difficult to trace this within Valgrind. Both hangs seem very sensitive to timing: in both cases, adding nanosleep calls or changing their durations seems to make the hang go away.
Adding debug statements to Valgrind can also change the behaviour (and is also unsafe when not holding the scheduler lock).

Does this look like a kernel bug?

A+
Paul
