Netflix's New Peering Appliance Uses FreeBSD

Scott Long scottl at samsco.org
Thu Jun 7 14:27:29 UTC 2012


On Jun 7, 2012, at 3:09 AM, Daniel Kalchev wrote:

> 
> 
> On 06.06.12 03:16, Scott Long wrote:
> 
> [...]
>> Each disk has its own UFS+J filesystem, except for
>> the SSDs that are mirrored together with gmirror.  The SSDs hold the OS image
>> and cache some of the busiest content.  The other disks hold nothing but the
>> audio and video files for our content streams.
> 
> Could you please explain the rationale for using UFS+J on this large storage setup? Your published documentation states that you have reasonable redundancy in the case of multiple disk failures, and I wonder how you handle this with "plain" UFS: things like avoiding hangs and panics when a disk is about to die.
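For readers unfamiliar with the layout being discussed, a setup like the one quoted above could be assembled roughly as follows on FreeBSD. This is a hypothetical sketch based on the standard gmirror and gjournal tools; the device names and mount points are illustrative, not Netflix's actual provisioning:

```shell
# Mirror the two SSDs for the OS image (hypothetical device names)
gmirror label -v -b round-robin ssdmirror /dev/ada0 /dev/ada1
newfs -U /dev/mirror/ssdmirror

# Give each content drive its own standalone journaled UFS filesystem;
# no RAID across them, so a dead drive takes out only its own content
for disk in da0 da1 da2; do
    gjournal label /dev/${disk}
    newfs -J /dev/${disk}.journal
    mkdir -p /content/${disk}
    mount -o async /dev/${disk}.journal /content/${disk}
done
```

With gjournal, `-o async` is safe because the journal preserves filesystem consistency across a crash; that matches the goal of maximizing per-drive throughput for large sequential media files.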

Redundancy happens by allowing the streaming clients to choose among multiple other sources for their stream, and to buffer enough of the stream to make a switchover appear seamless.  That other source might be a peer node on the same network, or a node that is upstream or on a different network.

The point of the caches is to hold as much content as possible, and we've found that it's more effective to maximize capacity and allow drives to fail in place than to significantly reduce capacity with hardware or software RAID.  When a disk starts having problems that affect its ability to deliver data on time, any clients affected by it simply switch to a different source.  When the disk does finally die, it is removed from the available pool and content is reshuffled onto the other drives during the next daily content update.  Once enough disks have failed that the cache is no longer effective, the machine gets replaced.
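The failover behavior described above can be sketched abstractly. This is a hypothetical illustration of the idea (the `Source` model, `rank` ordering, and `pick_source` helper are mine, not Netflix's actual client logic):

```python
from dataclasses import dataclass

@dataclass
class Source:
    """A candidate cache node for a stream (hypothetical model)."""
    name: str
    healthy: bool = True
    rank: int = 0  # lower is preferred: 0 = same-network peer, 1 = upstream, ...

def pick_source(sources):
    """Choose the most-preferred healthy source; failed nodes are skipped."""
    candidates = [s for s in sources if s.healthy]
    if not candidates:
        raise RuntimeError("no healthy sources for this stream")
    return min(candidates, key=lambda s: s.rank)

# The client buffers enough of the stream that a switchover appears seamless:
sources = [Source("peer-a", rank=0), Source("upstream-b", rank=1)]
best = pick_source(sources)          # prefers the same-network peer
sources[0].healthy = False           # the peer's disk starts misbehaving
fallback = pick_source(sources)      # client quietly moves to the upstream node
```

The key design point is that no per-machine RAID is needed: redundancy lives in the client's ability to re-select a source, so each cache can dedicate its full drive capacity to content.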

Scott



More information about the freebsd-stable mailing list