From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 277499] panic in doneq0 xpt_done_td xpt_done_process after HDD falling off the bus (Periph destroyed)
Date: Tue, 05 Mar 2024 19:30:50 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 15.0-CURRENT
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Who: imp@FreeBSD.org
X-Bugzilla-Status: New
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: bugs@FreeBSD.org
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Content-Type: text/plain; charset="UTF-8"
Auto-Submitted: auto-generated
List-Id: Bug reports
List-Archive: https://lists.freebsd.org/archives/freebsd-bugs
Sender: owner-freebsd-bugs@freebsd.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277499

--- Comment #3 from Warner Losh ---
commit 6c8ab086fed37a6b44fa84377e48c499f223ae80
Author: Warner Losh
Date:   Sun May 1 10:39:04 2022 -0600

    ada: Retry commands with retries left on CAM_SEL_TIMEOUT

    The AHCI and ATA SIMs will return CAM_SEL_TIMEOUT when an underlying
    device has stopped responding. This is usually seen after a timed-out
    command and can be a transient event. Rather than fail the peripheral
    immediately after seeing this, queue a retry. For transient events,
    this allows drives to continue to provide data, though with some added
    latency, just like we do when we have some other kind of retriable
    error. If the error isn't transient (the drive is truly gone), then
    we'll discover that eventually and fail the transaction and invalidate
    the drive like we do today.

    This helps us avoid a panic at the end of camperiphfree when
    CAM_PERIPH_NEW_DEV_FOUND is set. However, the deferred callback should
    be queued to xpt_async_td instead of being made inline there. This
    issue will be solved in a different patch that does that. PR 263703.

    This also helps us avoid another bug where we can drop all references
    to the device (causing us to go through camperiphfree and destroy the
    path) while we have an I/O pending in the ata_da state machine (usually
    in state ADA_STATE_RAHEAD with ATA_SETFEATURES ATA_SF_ENAB_RCACHE
    command). It's not clear why the reference that we take out to do the
    reprobe isn't effective at blocking this. By retrying this condition,
    though, we avoid this bug (at least more often; I don't have a good
    reproduction test case, I just see this panic a few times a month at
    work on systems that have transient disk errors on AHCI-connected SATA
    SSDs). PR 263704. It's too soon to know how much this helps us avoid
    this bug.

    Sponsored by:           Netflix
    Differential Revision:  https://reviews.freebsd.org/D34977
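
For readers who do not have the CAM sources at hand, the retry policy the commit message describes (retry a select timeout while retries remain, invalidate the drive only once they are exhausted) can be sketched in plain userland C. This is only an illustration under stated assumptions: every `sketch_*` / `SKETCH_*` name below is hypothetical and is not a real CAM identifier, and the code is not the change from D34977.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT the real CAM definitions. */
enum sketch_status {
	SKETCH_REQ_CMP,		/* command completed normally */
	SKETCH_SEL_TIMEOUT	/* device stopped responding */
};

struct sketch_ccb {
	enum sketch_status status;
	int retry_count;	/* retries remaining for this command */
};

struct sketch_periph {
	bool invalidated;	/* set once the drive is treated as gone */
};

/* Hypothetical "requeue this command" result, distinct from errno values. */
#define SKETCH_RESTART	(-1)

/*
 * Decide what to do with a completed command.
 * Returns 0 on success, SKETCH_RESTART if the command should be
 * requeued, or ENXIO once retries are exhausted (the peripheral is
 * marked invalid in that case).
 */
int
sketch_error(struct sketch_periph *periph, struct sketch_ccb *ccb)
{
	if (ccb->status == SKETCH_REQ_CMP)
		return (0);

	/* Possibly transient: consume one retry instead of failing. */
	if (ccb->retry_count > 0) {
		ccb->retry_count--;
		return (SKETCH_RESTART);
	}

	/* Not transient after all: fail the I/O and drop the drive. */
	periph->invalidated = true;
	return (ENXIO);
}
```

With two retries left, a command that keeps timing out is requeued twice and only fails (invalidating the hypothetical peripheral) on the third completion, which matches the "added latency for transient events, eventual failure for dead drives" behaviour described above.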