Linux kernel compatibility

Jeff Roberson jroberson at jroberson.net
Tue Jan 4 23:10:23 UTC 2011


On Tue, 4 Jan 2011, Julian Elischer wrote:

> On 1/4/11 12:53 PM, Jeff Roberson wrote:
>> On Tue, 4 Jan 2011, Alexander Kabaev wrote:
>> 
>>> On Mon, 3 Jan 2011 19:03:01 -1000 (HST)
>>> Jeff Roberson <jroberson at jroberson.net> wrote:
>>> 
>>>> On Mon, 3 Jan 2011, Alexander Kabaev wrote:
>>>> 
>>>>> On Mon, 3 Jan 2011 10:31:24 -1000 (HST)
>>>>> Jeff Roberson <jroberson at jroberson.net> wrote:
>>>>> 
>>>>>> Hello Folks,
>>>>>> 
>>>>>> Some of you may have seen my infiniband work proceed in svn.  It is
>>>>>> coming to a close soon and I will be integrating it into current.
>>>>>> I have a few patches to the kernel to send for review but I wanted
>>>>>> to bring up the KPI wrapper itself for discussion.
>>>>>> 
>>>>>> The infiniband port has been done by creating a 10,000 line KPI
>>>>>> compatibility layer.  With this layer the vast majority of the
>>>>>> driver code runs unmodified.  The exceptions are anything that
>>>>>> interfaces with skbs and most of the code that deals with network
>>>>>> interfaces.
>>>>>> 
>>>>>> Some examples of things supported by the wrapper:
>>>>>> 
>>>>>> atomics, types, bitops, byte order conversion, character devices,
>>>>>> pci devices, dma, non-device files, idr tables, interrupts,
>>>>>> ioremap, hashes, kobjects, radix trees, lists, modules, notifier
>>>>>> blocks, rbtrees, rwlock, rwsem, semaphore, schedule, spinlocks,
>>>>>> kalloc, wait queues, workqueues, timers, etc.
>>>>>> 
>>>>>> Obviously a complete wrapper is impossible and I only implemented
>>>>>> the features that I needed.  The build is accomplished by pointing
>>>>>> the linux compatible code at sys/ofed/include/ which has a
>>>>>> simulated linux kernel include tree.  There are some config(8)
>>>>>> changes to help this along as well.
>>>>>> 
>>>>>> I have seen that some attempt at similar wrappers has been made
>>>>>> elsewhere. I wonder if instead of making each one tailored to
>>>>>> individual components, which mostly seem to be filesystems so far,
>>>>>> should we put this in a central place under compat somewhere?  Is
>>>>>> this project doomed to be tied to a single consumer by the specific
>>>>>> nature of it?
>>>>>> 
>>>>>> Other comments or concerns?
>>>>>> 
>>>>>> Thanks,
>>>>>> Jeff
>>>>> 
>>>>> 
>>>>> This probably will go against popular opinion here, but having a 10k-line
>>>>> linux emulation layer that _almost_ works in the tree will be an
>>>>> unfortunate event and will do more damage to FreeBSD as a platform
>>>>> than good in the long run. I would rather see this code never hit
>>>>> the main repository.
>>>> 
>>>> I would argue that the layer works very well for infiniband.  Much
>>>> better than almost.  It is only almost complete in that there is no
>>>> need for me to implement features that we're not using.
>>>> 
>>>> I am interested in hearing your other concerns however.
>>>> 
>>>> Thanks,
>>>> Jeff
>>>> 
>>> 
>> 
>> Alexander, let me first start out by saying I have a great deal of respect 
>> for you and I hear your concerns.  I see that this is a somewhat heated 
>> issue and I can really only address the technical points.  The more 
>> existential questions about FreeBSD will have to be left to others.
>> 
>>> The considerations are simple enough. First, we do not have many IB
>>> users of FreeBSD in the wild and those that we have (Isilon) seem to be
>>> perfectly capable of managing the IB stack out of the tree, without
>>> dumping the thousands of lines of the code into the base. If they had
>>> the stack before, but were not willing/capable to provide adequate care
>>> for it in the past, there is no reason to expect things to change with
>>> second stack, which now will rot in our tree instead of theirs.
>> 
>> They provided adequate care for it to keep their product running on old 
>> versions of FreeBSD.  Unfortunately it is a large stack and there are a 
>> great number of people and organizations working on improving and advancing 
>> it on Linux via OFED and having a private stack does not give you the 
>> benefit of their work.  The motivation for making the wrapper layer was 
>> entirely to keep pace with this development and make it less likely that 
>> what is in the tree will rot.
>> 
>>> 
>>> Second, semi-complete Linux compat layer in kernel will have the
>>> same effect as linuxulator in userland - we do have some vendors still
>>> trying to bother with FreeBSD drivers for their hardware now and we
>>> will have none after we provide the possibility to hack their Linux
>>> code to run somewhat stably on top of Linux compat layer. Due to
>>> intentional fluidity of Linux kAPI, our shims will never quite walk and
>>> quack like their original implementation in Linux kernel and combined
>>> result will always be less stable than native Linux drivers in
>>> Linux kernel.
>> 
>> I have heard this argument about the linuxulator and what we're really 
>> talking about is slipping FreeBSD marketshare.  I don't share the view that 
>> the linuxulator furthered this slip but rather my view is that it allows us 
>> to stay relevant in areas where companies can not justify an independent 
>> FreeBSD effort.  Adobe is a good example of this.
>> 
>> Let's talk nuts and bolts about what this thing does.  In the vast majority 
>> of cases it simply shuffles arguments and function names around where there 
>> is a 1:1 correlation between linux api and FreeBSD API.  Think about things 
>> like atomics, callouts, locks, jiffies vs ticks, etc.  In these areas the 
>> systems are trivially different.  In a very small number of areas where 
>> this wasn't the case I did a direct port and noted it with an #ifdef.
>> 
>> This works specifically in the infiniband case because it is its own middle 
>> layer.  You can't write a scsi driver for linux and use it on BSD with 
>> this.  You can't write a network driver even.  But if you do bring in code 
>> from linux you don't have to worry about changing every kmalloc to malloc 
>> and every printk to printf so diffs can be reduced in trivial cases.  I 
>> thought given your work on XFS for FreeBSD that would make sense to you.
>> 
>> Our options are: to leave FreeBSD users without infiniband, which I can 
>> tell you has cost us more market share, as I know of specific cases we have 
>> lost due to it; to maintain our own stack independently, which no one has 
>> the budget for; or to try to integrate with OFED.  Do you see some other 
>> approach?
> As you may know, Alexander and I both work for a company that produces a
> "large" driver that runs on everything from windows to FreeBSD and everything
> in between (linux, osx, esx, aix, solaris).  We have a porting layer
> (Alexander hates it, I know).
> But it's not a freebsd to linux layer, it's a freebsd (or whatever) to
> 'internal' layer (though it started out being the first).
> Looking at what we have and what you have, it seems to me that we could take a
> subset of the basic CS101 methods that are there and make a linux driver
> porter's toolkit.

Yes, the problem in this case is that OFED controls the code and they 
specifically removed a portability layer.  So the portability layer is the 
Linux APIs they are currently using.  Really, in some sense it ends up 
being the same thing.
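
To make the "1:1 correlation" point from the quoted text concrete, here is a 
rough userland sketch of what that kind of shim looks like.  It is not the 
code in sys/ofed/include; the flag value, the struct ib_thing type and 
ib_thing_alloc() are made up for illustration, and malloc(3)/printf(3) stand 
in for the kernel's malloc(9)/printf(9).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the real wrapper headers. */
typedef unsigned int gfp_t;
#define GFP_KERNEL 0x1u            /* illustrative value only */

/* Linux-style kmalloc(size, flags) forwarded to the native allocator. */
static inline void *kmalloc(size_t size, gfp_t flags)
{
    (void)flags;   /* a kernel shim would map this to M_WAITOK or M_NOWAIT */
    return malloc(size);
}

/* Zeroing variant, built on top of the wrapper above. */
static inline void *kzalloc(size_t size, gfp_t flags)
{
    void *p = kmalloc(size, flags);

    if (p != NULL)
        memset(p, 0, size);
    return p;
}

static inline void kfree(const void *p)
{
    free((void *)p);
}

/* printk() becomes a plain rename of the native printf(). */
#define printk(...) printf(__VA_ARGS__)

/* "Unmodified" Linux-flavoured caller; the type is hypothetical. */
struct ib_thing {
    int  port;
    char name[16];
};

static struct ib_thing *ib_thing_alloc(int port)
{
    struct ib_thing *t;

    assert(port >= 0);
    t = kzalloc(sizeof(*t), GFP_KERNEL);
    if (t == NULL)
        return NULL;
    t->port = port;
    printk("allocated thing on port %d\n", port);
    return t;
}
```

The caller keeps its kmalloc/printk spelling, so diffs against the upstream 
source stay small; only the header it is compiled against changes.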

>
> things like linux style linked lists vs our queue macros can be
> addressed easily, but it's just a pain in the neck to do so.
> also linux run queues and such are different, but making a small
> set of toolkits to do so wouldn't be a bad idea.
> some of the more esoteric parts might stay with the ib code, however.
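
For anyone who has not compared the two list styles mentioned above, the 
following is a small userland sketch, assuming a hand-written minimal version 
of the Linux list_head helpers next to the TAILQ macros from <sys/queue.h>.  
The struct req type and the sum_linux()/sum_bsd() helpers are hypothetical and 
exist only to show the same walk written both ways.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Minimal Linux-style intrusive list, written out here for illustration. */
struct list_head {
    struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
#define list_for_each(pos, head) \
    for ((pos) = (head)->next; (pos) != (head); (pos) = (pos)->next)

static inline void INIT_LIST_HEAD(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static inline void list_add_tail(struct list_head *n, struct list_head *head)
{
    assert(n != NULL && head != NULL);
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Hypothetical element that can sit on both kinds of list at once. */
struct req {
    int id;
    struct list_head link;      /* Linux style linkage */
    TAILQ_ENTRY(req) qlink;     /* queue(3) style linkage */
};
TAILQ_HEAD(req_queue, req);

/* Walk the Linux-style list and add up the ids. */
static int sum_linux(struct list_head *head)
{
    struct list_head *pos;
    int sum = 0;

    list_for_each(pos, head)
        sum += list_entry(pos, struct req, link)->id;
    return sum;
}

/* The same walk using the queue(3) macros. */
static int sum_bsd(struct req_queue *q)
{
    struct req *r;
    int sum = 0;

    TAILQ_FOREACH(r, q, qlink)
        sum += r->id;
    return sum;
}
```

The data structures are equivalent in practice; the difference is that the 
Linux version recovers the element with container_of() while the queue(3) 
macros carry the element type in the head and entry declarations.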

After this discussion I'm leaning towards leaving the layer I have in the 
ofed/ directory and leaving it tied to the version of ofed we currently 
have imported.  They actually have a set of scripts to 'backport' their 
stack to different linux versions if we were to standardize on some older 
linux kernel release but I don't think it's worth the effort.

I understand wanting to limit the spread of hybridized linux kernel code. 
It is not my first choice, but compared with the alternative of not 
having some desired feature, I will choose the feature.

Thanks,
Jeff

>
>
>
>> 
>> Thanks,
>> Jeff
>> 
>>> 
>>> 
>>> -- 
>>> Alexander Kabaev
>>> 
>> 
>

