docs/159897: [patch] improve HAST section of Handbook

Warren Block wblock at wonkity.com
Tue Aug 23 00:08:11 UTC 2011


On Mon, 22 Aug 2011, Benjamin Kaduk wrote:

> On Sun, 21 Aug 2011, Warren Block wrote:
>
>> On Sat, 20 Aug 2011, Benjamin Kaduk wrote:
>> 
>>> On Thu, 18 Aug 2011, Warren Block wrote:
>>> 
>>>> -	  <para>File system agnostic, thus allowing to use any file
>>>> +	  <para>File system agnostic, thus allowing use of any file
>>> 
>>> I think "allowing the use" is better here.
>> 
>> "allowing any" might be even better.
>
> I don't think that would be correct usage -- "allowing any file system" to do 
> what?

Allowing any file system versus allowing only file systems made for 
HAST.  Looking at it again, the problem is the word "allowing".  What 
this is really saying is: "File system agnostic, compatible with any 
file system supported by &os;."

>>>> -	<para>In order to fix this situation the administrator has to
>>>> +	<para>The administrator must
>>>> 	  decide which node has more important changes (or merge them
>>>> -	  manually) and let the <acronym>HAST</acronym> perform
>>>> +	  manually) and let <acronym>HAST</acronym> perform
>>>> 	  the full synchronization of the node which has the broken
>>> 
>>> Just "full synchronization", I think.
>> 
>> Changing "of" to "on" ("full synchronization on the node") also helps a 
>> bit.
>
> I think I still prefer "of", but would not object to "on".

The idea is that "synchronization of the node" is ambiguous about 
which node is being changed, whereas "synchronization on the node", er, 
isn't.

> Can you prepare an updated patch with these changes?

Yes.  These changes and those from your earlier post are in the attached 
patch.  Thanks!
-------------- next part --------------
--- en_US.ISO8859-1/books/handbook/disks/chapter.sgml.orig	2011-05-27 04:22:55.000000000 -0600
+++ en_US.ISO8859-1/books/handbook/disks/chapter.sgml	2011-08-22 17:54:28.000000000 -0600
@@ -4038,7 +4038,7 @@
     <sect2>
       <title>Synopsis</title>
 
-      <para>High-availability is one of the main requirements in serious
+      <para>High availability is one of the main requirements in serious
 	business applications and highly-available storage is a key
 	component in such environments.  Highly Available STorage, or
 	<acronym>HAST<remark role="acronym">Highly Available
@@ -4109,7 +4109,7 @@
 	  drives.</para>
 	</listitem>
 	<listitem>
-	  <para>File system agnostic, thus allowing to use any file
+	  <para>File system agnostic, compatible with any file
 	    system supported by &os;.</para>
 	</listitem>
 	<listitem>
@@ -4152,7 +4152,7 @@
 	total.</para>
       </note>
 
-      <para>Since the <acronym>HAST</acronym> works in
+      <para>Since <acronym>HAST</acronym> works in a
 	primary-secondary configuration, it allows only one of the
 	cluster nodes to be active at any given time.  The
 	<literal>primary</literal> node, also called
@@ -4175,7 +4175,7 @@
       </itemizedlist>
 
       <para><acronym>HAST</acronym> operates synchronously on a block
-	level, which makes it transparent for file systems and
+	level, making it transparent to file systems and
 	applications.  <acronym>HAST</acronym> provides regular GEOM
 	providers in <filename class="directory">/dev/hast/</filename>
 	directory for use by other tools or applications, thus there is
@@ -4252,7 +4252,7 @@
 	For stripped-down systems, make sure this module is available.
 	Alternatively, it is possible to build
 	<literal>GEOM_GATE</literal> support into the kernel
-	statically, by adding the following line to the custom kernel
+	statically, by adding this line to the custom kernel
 	configuration file:</para>
 
       <programlisting>options	GEOM_GATE</programlisting>
@@ -4290,10 +4290,10 @@
 	  class="directory">/dev/hast/</filename>) will be called
 	<filename><replaceable>test</replaceable></filename>.</para>
 
-      <para>The configuration of <acronym>HAST</acronym> is being done
+      <para>Configuration of <acronym>HAST</acronym> is done
 	in the <filename>/etc/hast.conf</filename> file.  This file
 	should be the same on both nodes.  The simplest configuration
-	possible is following:</para>
+	possible is:</para>
 
       <programlisting>resource test {
 	on hasta {
@@ -4317,9 +4317,9 @@
 	  alternatively in the local <acronym>DNS</acronym>.</para>
       </tip>
 
-      <para>Now that the configuration exists on both nodes, it is
-	possible to create the <acronym>HAST</acronym> pool.  Run the
-	following commands on both nodes to place the initial metadata
+      <para>Now that the configuration exists on both nodes,
+	the <acronym>HAST</acronym> pool can be created.  Run these
+	commands on both nodes to place the initial metadata
 	onto the local disk, and start the &man.hastd.8; daemon:</para>
 
       <screen>&prompt.root; <userinput>hastctl create test</userinput>
@@ -4334,52 +4334,52 @@
 	  available.</para>
       </note>
 
-      <para>HAST is not responsible for selecting node's role
-	(<literal>primary</literal> or <literal>secondary</literal>).
-	Node's role has to be configured by an administrator or other
-	software like <application>Heartbeat</application> using the
+      <para>A HAST node's role (<literal>primary</literal> or
+        <literal>secondary</literal>) is selected by an administrator
+        or other
+        software like <application>Heartbeat</application> using the
 	&man.hastctl.8; utility.  Move to the primary node
 	(<literal><replaceable>hasta</replaceable></literal>) and
-	issue the following command:</para>
+	issue this command:</para>
 
       <screen>&prompt.root; <userinput>hastctl role primary test</userinput></screen>
 
-      <para>Similarly, run the following command on the secondary node
+      <para>Similarly, run this command on the secondary node
 	(<literal><replaceable>hastb</replaceable></literal>):</para>
 
       <screen>&prompt.root; <userinput>hastctl role secondary test</userinput></screen>
 
       <caution>
-	<para>It may happen that both of the nodes are not able to
-	  communicate with each other and both are configured as
-	  primary nodes; the consequence of this condition is called
-	  <literal>split-brain</literal>.  In order to troubleshoot
+	<para>When the nodes are unable to
+	  communicate with each other, and both are configured as
+	  primary nodes, the condition is called
+	  <literal>split-brain</literal>.  To troubleshoot
 	  this situation, follow the steps described in <xref
 	  linkend="disks-hast-sb">.</para>
       </caution>
 
-      <para>It is possible to verify the result with the
+      <para>Verify the result with the
 	&man.hastctl.8; utility on each node:</para>
 
       <screen>&prompt.root; <userinput>hastctl status test</userinput></screen>
 
-      <para>The important text is the <literal>status</literal> line
-	from its output and it should say <literal>complete</literal>
+      <para>The important text is the <literal>status</literal> line,
+	which should say <literal>complete</literal>
 	on each of the nodes.  If it says <literal>degraded</literal>,
 	something went wrong.  At this point, the synchronization
 	between the nodes has already started.  The synchronization
-	completes when the <command>hastctl status</command> command
+	completes when <command>hastctl status</command>
 	reports 0 bytes of <literal>dirty</literal> extents.</para>
 
 
-      <para>The last step is to create a filesystem on the
+      <para>The next step is to create a filesystem on the
 	<devicename>/dev/hast/<replaceable>test</replaceable></devicename>
-	GEOM provider and mount it.  This has to be done on the
-	<literal>primary</literal> node (as the
+	GEOM provider and mount it.  This must be done on the
+	<literal>primary</literal> node, as
 	<filename>/dev/hast/<replaceable>test</replaceable></filename>
-	appears only on the <literal>primary</literal> node), and
-	it can take a few minutes depending on the size of the hard
-	drive:</para>
+	appears only on the <literal>primary</literal> node.
+	Creating the filesystem can take a few minutes, depending on the
+	size of the hard drive:</para>
 
       <screen>&prompt.root; <userinput>newfs -U /dev/hast/test</userinput>
 &prompt.root; <userinput>mkdir /hast/test</userinput>
@@ -4387,9 +4387,9 @@
 
       <para>Once the <acronym>HAST</acronym> framework is configured
 	properly, the final step is to make sure that
-	<acronym>HAST</acronym> is started during the system boot time
-	automatically.  The following line should be added to the
-	<filename>/etc/rc.conf</filename> file:</para>
+	<acronym>HAST</acronym> is started automatically during the system
+	boot.  Add this line to
+	<filename>/etc/rc.conf</filename>:</para>
 
       <programlisting>hastd_enable="YES"</programlisting>
 
@@ -4397,26 +4397,25 @@
 	<title>Failover Configuration</title>
 
 	<para>The goal of this example is to build a robust storage
-	  system which is resistant from the failures of any given node.
-	  The key task here is to remedy a scenario when a
-	  <literal>primary</literal> node of the cluster fails.  Should
-	  it happen, the <literal>secondary</literal> node is there to
+	  system which is resistant to the failure of any given node.
+	  The scenario is that a
+	  <literal>primary</literal> node of the cluster fails.  If
+	  this happens, the <literal>secondary</literal> node is there to
 	  take over seamlessly, check and mount the file system, and
 	  continue to work without missing a single bit of data.</para>
 
-	<para>In order to accomplish this task, it will be required to
-	  utilize another feature available under &os; which provides
+	<para>To accomplish this task, another &os; feature provides
 	  for automatic failover on the IP layer —
-	  <acronym>CARP</acronym>.  <acronym>CARP</acronym> stands for
-	  Common Address Redundancy Protocol and allows multiple hosts
+	  <acronym>CARP</acronym>.  <acronym>CARP</acronym> (Common Address
+	  Redundancy Protocol) allows multiple hosts
 	  on the same network segment to share an IP address.  Set up
  	  <acronym>CARP</acronym> on both nodes of the cluster according
 	  to the documentation available in <xref linkend="carp">.
-	  After completing this task, each node should have its own
+	  After setup, each node will have its own
 	  <devicename>carp0</devicename> interface with a shared IP
 	  address <replaceable>172.16.0.254</replaceable>.
-	  Obviously, the primary <acronym>HAST</acronym> node of the
-	  cluster has to be the master <acronym>CARP</acronym>
+	  The primary <acronym>HAST</acronym> node of the
+	  cluster must be the master <acronym>CARP</acronym>
 	  node.</para>
 
 	<para>The <acronym>HAST</acronym> pool created in the previous
@@ -4430,17 +4429,17 @@
 
 	<para>In the event of <acronym>CARP</acronym> interfaces going
 	  up or down, the &os; operating system generates a &man.devd.8;
-	  event, which makes it possible to watch for the state changes
+	  event, making it possible to watch for the state changes
 	  on the <acronym>CARP</acronym> interfaces.  A state change on
 	  the <acronym>CARP</acronym> interface is an indication that
-	  one of the nodes failed or came back online.  In such a case,
-	  it is possible to run a particular script which will
-	  automatically handle the failover.</para>
-
-	<para>To be able to catch the state changes on the
-	  <acronym>CARP</acronym> interfaces, the following
-	  configuration has to be added to the
-	  <filename>/etc/devd.conf</filename> file on each node:</para>
+	  one of the nodes failed or came back online.  These state change
+	  events make it possible to run a script which will
+	  automatically handle the HAST failover.</para>
+
+	<para>To be able to catch state changes on the
+	  <acronym>CARP</acronym> interfaces, add this
+	  configuration to
+	  <filename>/etc/devd.conf</filename> on each node:</para>
 
 	<programlisting>notify 30 {
 	match "system" "IFNET";
@@ -4456,12 +4455,12 @@
 	action "/usr/local/sbin/carp-hast-switch slave";
 };</programlisting>
 
-	<para>To put the new configuration into effect, run the
-	  following command on both nodes:</para>
+	<para>Restart &man.devd.8; on both nodes to put the new configuration
+	  into effect:</para>
 
 	<screen>&prompt.root; <userinput>/etc/rc.d/devd restart</userinput></screen>
 
-	<para>In the event that the <devicename>carp0</devicename>
+	<para>When the <devicename>carp0</devicename>
 	  interface goes up or down (i.e. the interface state changes),
 	  the system generates a notification, allowing the &man.devd.8;
 	  subsystem to run an arbitrary script, in this case
@@ -4471,7 +4470,7 @@
 	  &man.devd.8; configuration, please consult the
 	  &man.devd.conf.5; manual page.</para>
 
-	<para>An example of such a script could be following:</para>
+	<para>An example of such a script could be:</para>
 
 <programlisting>#!/bin/sh
 
@@ -4557,13 +4556,13 @@
 	;;
 esac</programlisting>
 
-	<para>In a nutshell, the script does the following when a node
+	<para>In a nutshell, the script takes these actions when a node
 	  becomes <literal>master</literal> /
 	  <literal>primary</literal>:</para>
 
 	<itemizedlist>
 	  <listitem>
-	    <para>Promotes the <acronym>HAST</acronym> pools as
+	    <para>Promotes the <acronym>HAST</acronym> pools to
 	      primary on a given node.</para>
 	  </listitem>
 	  <listitem>
@@ -4571,7 +4570,7 @@
 	      <acronym>HAST</acronym> pool.</para>
 	  </listitem>
 	  <listitem>
-	    <para>Mounts the pools at appropriate place.</para>
+	    <para>Mounts the pools at an appropriate place.</para>
 	  </listitem>
 	</itemizedlist>
 
@@ -4590,15 +4589,15 @@
 
 	<caution>
 	  <para>Keep in mind that this is just an example script which
-	    should serve as a proof of concept solution.  It does not
+	    should serve as a proof of concept.  It does not
 	    handle all the possible scenarios and can be extended or
 	    altered in any way, for example it can start/stop required
-	    services etc.</para>
+	    services, etc.</para>
 	</caution>
 
 	<tip>
-	  <para>For the purpose of this example we used a standard UFS
-	    file system.  In order to reduce the time needed for
+	  <para>For this example, we used a standard UFS
+	    file system.  To reduce the time needed for
 	    recovery, a journal-enabled UFS or ZFS file system can
 	    be used.</para>
 	</tip>
@@ -4615,41 +4614,40 @@
       <sect3>
 	<title>General Troubleshooting Tips</title>
 
-	<para><acronym>HAST</acronym> should be generally working
-	  without any issues, however as with any other software
+	<para><acronym>HAST</acronym> should generally work
+	  without issues.  However, as with any other software
 	  product, there may be times when it does not work as
 	  supposed.  The sources of the problems may be different, but
 	  the rule of thumb is to ensure that the time is synchronized
 	  between all nodes of the cluster.</para>
 
-	<para>The debugging level of the &man.hastd.8; should be
-	  increased when troubleshooting <acronym>HAST</acronym>
-	  problems.  This can be accomplished by starting the
+	<para>When troubleshooting <acronym>HAST</acronym> problems,
+	  the debugging level of &man.hastd.8; should be increased
+	  by starting the
 	  &man.hastd.8; daemon with the <literal>-d</literal>
-	  argument.  Note, that this argument may be specified
+	  argument.  Note that this argument may be specified
 	  multiple times to further increase the debugging level.  A
-	  lot of useful information may be obtained this way.  It
-	  should be also considered to use <literal>-F</literal>
-	  argument, which will start the &man.hastd.8; daemon in
+	  lot of useful information may be obtained this way.  Consider
+	  also using the <literal>-F</literal>
+	  argument, which starts the &man.hastd.8; daemon in the
 	  foreground.</para>
      </sect3>
 
       <sect3 id="disks-hast-sb">
 	<title>Recovering from the Split-brain Condition</title>
 
-	<para>The consequence of a situation when both nodes of the
-	  cluster are not able to communicate with each other and both
-	  are configured as primary nodes is called
-	  <literal>split-brain</literal>.  This is a dangerous
+	<para><literal>Split-brain</literal> is when the nodes of the
+	  cluster are unable to communicate with each other, and both
+	  are configured as primary.  This is a dangerous
 	  condition because it allows both nodes to make incompatible
-	  changes to the data.  This situation has to be handled by
-	  the system administrator manually.</para>
+	  changes to the data.  This problem must be corrected
+	  manually by the system administrator.</para>
 
-	<para>In order to fix this situation the administrator has to
+	<para>The administrator must
 	  decide which node has more important changes (or merge them
-	  manually) and let the <acronym>HAST</acronym> perform
-	  the full synchronization of the node which has the broken
-	  data.  To do this, issue the following commands on the node
+	  manually) and let <acronym>HAST</acronym> perform
+	  full synchronization of the node which has the broken
+	  data.  To do this, issue these commands on the node
 	  which needs to be resynchronized:</para>
 
         <screen>&prompt.root; <userinput>hastctl role init <resource></userinput>
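
The carp-hast-switch script body itself is unchanged by this patch, so it
does not appear in the diff.  Purely as a hypothetical illustration (not
the actual Handbook script), the master/slave actions summarized in the
section could be sketched roughly like this, reusing the resource name
"test" and mount point /hast/test from the example:

#!/bin/sh
# Rough sketch only: promote/demote one HAST resource and (un)mount it.
resource="test"
mountpoint="/hast/test"

case "$1" in
master)
	# Promote the HAST pool to primary on this node.
	hastctl role primary "$resource"

	# Wait for the provider to appear, then check the file system.
	while [ ! -c "/dev/hast/${resource}" ]; do
		sleep 1
	done
	fsck -p -y -t ufs "/dev/hast/${resource}"

	# Mount the pool at the appropriate place.
	mount -t ufs "/dev/hast/${resource}" "$mountpoint"
	;;
slave)
	# On demotion, unmount the file system and become secondary.
	umount "$mountpoint"
	hastctl role secondary "$resource"
	;;
esac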

