pNFS server Plan B

John Nielsen lists at jnielsen.net
Fri Jul 15 20:10:22 UTC 2016


Sorry for the very delayed reply, but I've been behind on list emails for a while. I've enjoyed reading the whole thread and have a few comments below. I'm interested in a lot of this stuff as both a consumer and an enterprise sysadmin (but I'm not much of a developer). Hopefully I can provide some of the perspective Rick was looking for.

> On Jun 24, 2016, at 2:21 AM, Willem Jan Withagen <wjw at digiware.nl> wrote:
> 
> On 24-6-2016 09:35, Jordan Hubbard wrote:
>> 
>>> On Jun 22, 2016, at 1:56 AM, Willem Jan Withagen <wjw at digiware.nl>
>>> wrote:
>>> 
>>> In the spare time I have left, I'm trying to get a lot of small
>>> fixes into the ceph tree to get it actually compiling, testing, and
>>> running on FreeBSD. But Ceph is a lot of code, and since a lot of
>>> people are working on it, the number of code changes is large.
>> 
>> Hi Willem,
>> 
>> Yes, I read your paper on the porting effort!

Indeed, thank you again. I've been wanting to test your patches but haven't had time; hopefully that will change soon.

>> I also took a look at porting ceph myself, about 2 years ago, and
>> rapidly concluded that it wasn’t a small / trivial effort by any
>> means and would require a strong justification in terms of ceph’s
>> feature set over glusterfs / moose / OpenAFS / RiakCS / etc.   Since
>> that time, there’s been customer interest but nothing truly “strong”
>> per-se.  
> 
> I've been going at it since last November... and all I got in were
> about three batches of FreeBSD-specific commits. A lot of that has to
> do with release windows and code slush, like we know on FreeBSD. But
> even then reviews tend to be slow, and I need to push people to look
> at them. Meanwhile, all kinds of things get pulled and merged into the
> tree that seriously do not work on FreeBSD. Sometimes I see them during
> commit and "negotiate" better compatibility with the author. At other
> times I miss the whole thing and need to rebase to get rid of merge
> conflicts, only to find out the hard way that somebody has made the
> whole peer communication async and has thrown kqueue at it for the
> BSDs. But that doesn't work (yet). So to get my other patches in, I
> first need to fix this. Takes a lot of time...
> 
> That all said, I was in Geneva and a lot of the Ceph people were there,
> including Sage Weil. And I got the feeling they would appreciate a
> larger community. I think they see what ZFS has done with OpenZFS, and
> that a broader community gets you somewhere.

I think too that you're probably wearing them down. :)

> Now one of the things to do to continue, now that I can sort of compile
> and run the first test set, is to set up my own Jenkins stuff, so that
> I can at least test-drive some of the tree automagically and get some
> test coverage of the code on FreeBSD. In my mind (and Sage warned me
> that it will be more or less required) it is the only way to actually
> get a serious foot in the door with the Ceph guys.
> 
>> My attraction to ceph remains centered around at least these
>> 4 things:
>> 
>> 1. Distributed Object store with S3-compatible ReST API 
>> 2. Interoperates with Openstack via Swift compatibility 
>> 3. Block storage (RADOS) - possibly useful for iSCSI and other block
>> storage requirements 
>> 4. Filesystem interface

I will admit I don't have a lot of experience with other things like GlusterFS, but for me Ceph is very compelling for similar reasons:

1. Block storage (RADOS Block Device). This is the top of my list since it makes it easy to run a resilient farm of hypervisors that supports live migration _without_ NFS, iSCSI or anything else. For small deployments (like I have at home), you can run Ceph and the hypervisors on the same hardware and still reboot them one at a time without any storage interruption or having to stop any VMs (just shuffle them around). No NAS/SAN required at all. Another similar use case (which just got easier on Linux at least with the release of rbd-nbd support) is (Docker) containers with persistent data volumes not being tied to any specific host. I would _love_ to see librbd support in bhyve, but obviously a working librbd on FreeBSD is a prerequisite for that. (There's a small RBD sketch after this list.)

2. Distributed object store with S3 and Swift compatibility. A lot of different enterprises need this for a lot of different reasons. I know for a fact that some of the pricey commercial offerings use Ceph under the covers. For shops where budget is more important than commercial support, this is a great option. (A second sketch below shows a stock S3 client pointed at the RADOS Gateway.)

3. Everything else, including but not limited to the native object store (RADOS), the POSIX filesystem (which, as mentioned, is now advertised as production-quality, with experimental support for multiple metadata servers), support for arbitrary topologies, custom CRUSH maps, erasure coding for efficient replication, and so on. (A third sketch below touches the native object store directly.)
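
To make item 1 a bit more concrete, here's a minimal (untested) sketch of what touching an RBD image looks like through the Python bindings (python-rados / python-rbd). The conffile path, pool name and image name are just placeholders; a bhyve backend would go through the equivalent librbd C calls instead.

# Minimal RBD sketch: create a 1 GiB image and do a trivial write/read.
# Assumes a reachable cluster; pool and image names are placeholders.
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')                 # pool name is an assumption
try:
    rbd.RBD().create(ioctx, 'vm-disk0', 1 << 30)  # hypothetical image, 1 GiB
    image = rbd.Image(ioctx, 'vm-disk0')
    try:
        image.write(b'hello from rbd', 0)         # write at offset 0
        print(image.read(0, 14))                  # read the same bytes back
    finally:
        image.close()
finally:
    ioctx.close()
    cluster.shutdown()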
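
For item 2, the nice part is that a plain S3 client just works once you point it at the RADOS Gateway. Another rough sketch, this time with boto3; the endpoint, credentials and bucket name are all made up:

# Stock S3 client (boto3) against a Ceph RADOS Gateway.
# Endpoint, credentials and bucket name are placeholders.
import boto3

s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.net:7480',   # assumed radosgw endpoint
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)

s3.create_bucket(Bucket='demo-bucket')
s3.put_object(Bucket='demo-bucket', Key='hello.txt', Body=b'hello from radosgw')
print(s3.list_objects_v2(Bucket='demo-bucket')['Contents'][0]['Key'])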
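
And for the native object store in item 3, librados itself is about as small an API as you'll find; one last sketch, again with placeholder names:

# Native RADOS access: write and read an object directly, no gateway involved.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('data')                # pool name is an assumption
try:
    ioctx.write_full('greeting', b'hello rados')  # store a whole object
    print(ioctx.read('greeting'))                 # read it back
finally:
    ioctx.close()
    cluster.shutdown()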

I do think Ceph on ZFS would be fantastic (and actually have a Fedora box with a ZFSonLinux-backed OSD). Not sure if BlueStore will be a good thing or not (even ignoring the porting hurdles, which are unfortunate). It would be interesting to compare features and performance of a ZFS OSD and a ZVOL-backed BlueStore OSD.

>> Is there anything we can do to help?  
> 
> I'll get back on that in a separate Email.

With my $work hat on, I'd be interested in a TrueNAS S3 appliance that came with support.

Anyway, glad to see that both pNFS and Ceph on FreeBSD are potentially in the works.

JN


