Re: MFC fixes for uninitialized kernel stack variables in sys/cam or do direct fix for pvscsi driver
- Reply: Zhenlei Huang : "Re: MFC fixes for uninitialized kernel stack variables in sys/cam or do direct fix for pvscsi driver"
- In reply to: Zhenlei Huang : "Re: MFC fixes for uninitialized kernel stack variables in sys/cam or do direct fix for pvscsi driver"
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Wed, 15 Jan 2025 04:37:08 UTC
Great. This looks good. Warner On Tue, Jan 14, 2025, 9:36 PM Zhenlei Huang <zlei@freebsd.org> wrote: > > > On Jan 13, 2025, at 5:06 PM, Zhenlei Huang <zlei@FreeBSD.org> wrote: > > > > On Jan 13, 2025, at 4:22 PM, Zhenlei Huang <zlei@FreeBSD.org> wrote: > > > > On Dec 28, 2024, at 6:03 AM, Warner Losh <imp@bsdimp.com> wrote: > > > > On Mon, Dec 2, 2024 at 2:41 AM Zhenlei Huang <zlei@freebsd.org> wrote: > >> Hi Warner, >> >> Recently I upgraded some ESXi vms from 13.3 to 13.4 and noticed weird >> report for sas speed. >> The boot console has the following, >> >> ``` >> da0 at pvscsi0 bus 0 scbus2 target 0 lun 0 >> da0: <VMware Virtual disk 2.0> Fixed Direct Access SPC-4 SCSI device >> da0: 4294967.295MB/s transfers >> ``` >> But camcontrol report the correct value, >> ``` >> # camcontrol inquiry da0 -R >> pass1: 750.000MB/s transfers, Command Queueing Enabled >> ``` >> >> The `4294967.295MB` is actually 0xffff_ffff or -1 but I do not see any >> logic set those values. >> >> Finally I managed to get the stack trace, >> ``` >> _scsi_announce_periph >> scsi_announce_periph_sbuf >> xpt_announce_periph_sbuf >> dadone_proberc >> xpt_done_process >> xpt_done_td >> fork_exit >> fork_trampoline >> ``` >> >> and noticed that the last param `cts` of `_scsi_announce_periph(struct >> cam_periph *periph, u_int *speed, u_int *freq, struct ccb_trans_settings >> *cts)` >> is from kernel stack and is not properly initialized, latter I found some >> commits related to this, >> >> 076686fe0703 cam: make sure to clear CCBs allocated on the stack >> ec5325dbca62 cam: make sure to clear even more CCBs allocated on the stack >> 0f206cc91279 cam: add missing zeroing of a stack-allocated CCB. >> 616a676a0535 cam: clear stack-allocated CCB in the target layer >> >> I applied them to stable/13, rebuild and reboot, now the speed of da0 is >> reported correctly. I also tried to patch the pvscsi driver with few lines >> and >> it also works as intended. >> >> ``` >> --- a/sys/dev/vmware/pvscsi/pvscsi.c >> +++ b/sys/dev/vmware/pvscsi/pvscsi.c >> @@ -1444,6 +1444,10 @@ pvscsi_action(struct cam_sim *sim, union ccb *ccb) >> cts->proto_specific.scsi.flags = CTS_SCSI_FLAGS_TAG_ENB; >> cts->proto_specific.scsi.valid = CTS_SCSI_VALID_TQ; >> >> + /* Prefer connection speed over sas port speed */ >> + /* cts->xport_specific.sas.bitrate = 0; */ >> + cts->xport_specific.sas.valid = 0; >> + >> ccb_h->status = CAM_REQ_CMP; >> xpt_done(ccb); >> ``` >> >> Things come clear and I know why this weird speed happens, now it is time >> to decide how to fix it. >> >> Fixing the consumer of cam, aka pvscsi driver, is quite simple and >> promising. I did a quick search it appears other consumers set >> `cts->xport_specific.sas.valid` correctly. It does not convince me as I'm >> quite new to cam subsystem. >> > > Yes. sas.valid is set when the sas.bitrate and other data has been set > correctly. Setting it to 0 like this ensures it's ignored. So if you know > the speed, set sas.bitrate to that speed and sas.valid to 1. > > > That is clear. > > > I'm not sure I answered the question right, but if not, please clarify or > point out what I missed and I'll try again. > > >> Which one do you prefer, MFC commits to stable/13, or do direct fix for >> pvscsi driver to stable/13 ? >> > > [[ Sorry for the delay, I missed this all month ]] > > I generally prefer a MFC, unless the code is no longer in -current. > > > The code live in -current and all supported stable branches. > > Even if there's two different fixes for this logical bug, fixing it in > current, then MFCing that (with the current hash) is fine, even if the > stable/13 changes are completely different. > > > The bug does not exist in current and stable/14 but only in stable/13, due > to Edward's work ( commits those zero stack-allocated CCBs ) were not MFCed > into stable/13 branch. > > For stable/13 I guess it matters a bit less than stable/14 since I'll be > merging to it less, but if it's a commit from -current that doesn't need to > be made to -stable because of the new commit on stable, I tend to include > the MFC hash text. > > > Do you mean the `cherry picked from commit` commit message ? > > > Warner > > > I'm preparing and testing the MFCing. Bless me to not make things messed > up :) > > > And the individual fix for pvscsi is posted to > https://reviews.freebsd.org/D48438 . > > > Landed in -current. Will be MFCed to stable/13 after a few days. > > > > Best regards, > Zhenlei > > > > >