couple of nvidia-driver issues
Aaron Plattner
aplattner at nvidia.com
Thu Dec 7 16:05:49 UTC 2017
On 12/07/2017 07:35 AM, Alan Somers wrote:
> On Thu, Dec 7, 2017 at 2:33 AM, Andriy Gapon <avg at freebsd.org> wrote:
>
>
> [cc-ing current@ to raise more awareness]
>
> On 05/12/2017 16:03, Alexey Dokuchaev wrote:
> > On Fri, Nov 24, 2017 at 11:31:51AM +0200, Andriy Gapon wrote:
> >>
> >> I have reported a couple of nvidia-driver issues in the FreeBSD section
> >> of the nVidia developer forum, but no replies so far.
> >>
> >> Well, the first issue is not with the driver, but with a utility that
> >> comes with it, nvidia-smi:
> >>
> >> https://devtalk.nvidia.com/default/topic/1026589/freebsd/nvidia-smi-query-gpu-spins-forever-on-freebsd-head-amd64-/
> >> I wonder if I am the only one affected or if I see the problem because
> >> I am on head or something else.
> >> I am pretty sure that the problem is caused by a programming bug related
> >> to strtok_r.
> >
> > I'll try to reproduce it and report back.
>
> I've done some work with a debugger and it seems that there is code that
> does something like this:
>
>     char *last = NULL;
>
>     while (1) {
>         if (last == NULL)
>             p = strtok_r(str, sep, &last);
>         else
>             p = strtok_r(NULL, sep, &last);
>         if (p == NULL)
>             break;
>         ...
>     }
>
> The problem is that when 'p' points to the last token, 'last' is NULL (in
> the FreeBSD implementation of strtok_r). That means that when we go to the
> next iteration the parsing starts all over again, leading to an endless
> loop. The code is incorrect from the standards point of view, because the
> value of 'last' is completely opaque and should not be used for anything
> else but passing it back to strtok_r.
>
> I used gdb -w to change the logic to:
>
>     char *last = (char *)1;
>
>     while (1) {
>         if (last == (char *)1)
>             p = strtok_r(str, sep, &last);
>         else
>             p = strtok_r(NULL, sep, &last);
>         ...
>     }
>
> Where 1 is used as an "impossible" pointer value which is neither NULL nor
> a valid pointer that can be set by strtok_r. It's not ideal, but binary
> code editing is not as easy as that of source code.
>
> The binary patch is here:
> https://people.freebsd.org/~avg/nvidia-smi.bsdiff
>
> >> The second issue is with the FreeBSD support for the kernel driver:
> >>
> >> https://devtalk.nvidia.com/default/topic/1026645/freebsd/panic-related-to-nvkms_timers-lock-sx-lock-/
> >> I would like to get some feedback on my analysis.
> >> I am testing this patch right now:
> >>
> >> https://people.freebsd.org/~avg/extra-patch-src_nvidia-modeset_nvidia-modeset-freebsd.c
> >
> > Unfortunately, I'm not an expert on kernel locking primitives to give you
> > a proper review, let's see what others have to say.
>
> It's been a while since I posted the patch and there are no comments yet.
> I can only add that I am running an INVARIANTS and WITNESS enabled kernel
> all the time and before the patch I was getting kernel panics every now
> and then.
> Since I started using the patch I haven't had a single nvidia panic yet.
>
> >> Also, what's the best place or who are the best people with whom to
> >> discuss such issues?
> >
> > Yes, this is a problem now: since Christian Zander left nVidia, he
> > could not tell me who'd be their next liaison to talk to from the FreeBSD
> > community. :-(
>
> Oh, I didn't know about Christian's departure.
> So, we are not in a very good position now.
>
>
> How about Aaron Plattner (CC'd)? Aaron, are you still working on
> FreeBSD driver issues?
Thanks for the heads up, Alan. I filed bug 2032249 to track this.
-- Aaron
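
For reference, the tokenizing loop discussed in the quoted message can be
written so that it never looks at the strtok_r save pointer. This is only a
minimal sketch, not the nvidia-smi source: the function name count_tokens and
its parameters are hypothetical, chosen for illustration. The only assumption
about strtok_r is its documented contract: pass the string on the first call,
NULL afterwards, and stop when it returns NULL.

```c
#define _POSIX_C_SOURCE 200809L  /* expose strtok_r under strict C modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count the tokens in buf separated by any character
 * in sep. buf is modified in place, as strtok_r always does.
 */
static int count_tokens(char *buf, const char *sep)
{
    char *last = NULL;  /* opaque strtok_r state; never compared or read */
    char *p;
    int n = 0;

    assert(buf != NULL && sep != NULL);

    /*
     * The first call passes buf, every later call passes NULL. The loop
     * ends on the NULL return value, so the contents of 'last' do not
     * matter to the control flow.
     */
    for (p = strtok_r(buf, sep, &last); p != NULL;
         p = strtok_r(NULL, sep, &last)) {
        n++;  /* a real caller would process token p here */
    }
    return n;
}
```

Because the first call is made once in the loop initializer, no sentinel
value for 'last' is needed, and the loop behaves the same whether an
implementation leaves 'last' NULL after the final token or not.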