PERFORCE change 122803 for review
Ulf Lilleengen
lulf at FreeBSD.org
Tue Jul 3 21:23:07 UTC 2007
http://perforce.freebsd.org/chv.cgi?CH=122803
Change 122803 by lulf at lulf_carrot on 2007/07/03 21:22:13
- Clean up code and remove debug printfs.
- Add initial testplan draft. I also have some testscripts that will be
comitted when they're ready.
- Add my scratchpad TODO.
Affected files ...
.. //depot/projects/soc2007/lulf/TESTPLAN#1 add
.. //depot/projects/soc2007/lulf/TODO#3 edit
.. //depot/projects/soc2007/lulf/gvinum_fixup/sbin/gvinum/gvinum.c#11 edit
.. //depot/projects/soc2007/lulf/gvinum_fixup/sys/geom/vinum/geom_vinum_create.c#2 edit
Differences ...
==== //depot/projects/soc2007/lulf/TODO#3 (text+ko) ====
@@ -8,3 +8,148 @@
6. Make sure other parts is function correctly, and implement what perhaps is
not implemented yet.
7. Run some tests to make sure the new gvinum code-base is good enough.
+
+
+### MY internal scratchpad. This might be outdated, but I'm trying to mark the
+things I've finished with DONE.
+
+1. Find out if we can support adding subdisks to a raid5 plex.
+
+2. Need a way to tell if we're adding a subdisk to a degraded raid5 plex or not.
+This means we only add subdisks to a raid5 plex if the original size is larger
+than the size during attachment. One way is to have a original size of the plex.
+Another way is to allow a subdisk to attach itself to a plex _if_ the plex is
+degraded and all other subdisks are up. The best would probably be to have a
+pled->origsize field to keep the original size of the plex intact. This could
+expand if we later want to expand a raid5 array. DONE.
+
+3. Must make sure that we don't use a subdisks plex if it's detached (same with
+plex and volume). NO
+
+4. Make sure gv_plex_size uses the smallest stripe... makes it possible to have
+unequal sized subdisks, but only parts of it is used... ?NO
+
+5. To support mounted raid5 rebuild... we need to check if both rebuild and
+normal request BIO's can work side by side... As long as it is in the degraded
+state perhaps?
+
+6. To get syncing working... we need a way to read BIOs from one plex and write
+them to another plex. One way of doing this is that we issue a read-bio for one
+plex, and when that bio returns, it will issue a BIO with the data to be written
+to a destination plex. Either, issue all reads at once, or issue one read, and
+let the 'done'-routines handle issueing of the next bios.
+In both ways, we need a way to tell the destination of a write, or where the
+original data came from. DONE
+
+Todo:
+* Need to check if it works to use bio_caller1. Doesn't, its used, but
+ bio_caller2 works... for now
+* Try to make send_parity work for sync as well. NO, created another one.
+* Restructure gv_sync_completed and split it a bit up to create a better
+ abstraction. DONE
+* Test it. DONE
+* Make sure we don't have write between syncing.
+ Krav:
+ - Skal kunne lese av mirror mens mountet.
+ - Skal kunne skrive til mirror mens mountet.
+ - MÃ¥ delaye writes.
+ - Reads som kommer når vi har noe i queuen må også delayes.
+
+ To achieve this, we can - for each BIO that comes down to a volume -
+ check that if one plex is syncing, we put the BIO in a wait-queue that
+ is only processed when NO plexes are in GV_PLEX_SYNCING state.
+
+7. Investigate problem with setting plex state to up even with forcing. Fixed.
+8. Check why panic when using mirror.
+ It's because a BIO is shared between all plexes for a write. Se my
+ comment in the hack in gv_bio_done for more explanation.
+9. Check weird states showing up when doing rebuild/sync etc and prevent from
+config being listed up all the time. Config not listed anymore.
+10. Make sure we can use bio_caller2 in sync.
+11. Change event structure to include non-pointer parameters for flags etc.
+12. Since GV_SD_INITIALIZING is used by gv_sync and gv_rebuild, we should
+perhaps update s->initialized, or create a new state called syncing or
+something. Because gv_list gives a weird output when s->initialized is not used.
+Perhaps have another state.
+
+13. Don't crash when we try rebuild on a up plex with only down/stale subdisks.
+Perhaps it's okay since we're forcing the state anyway and the user literally
+asks for it.
+
+14. When a raid5 plex is created, the subdisks are stale and the plex is down,
+but it shouldn't. When a mirror is created, the second one is stale, but it
+shouldn't.
+
+15. Implement initialization. Done
+
+16. Initially, parity is wrong... so must remember to rebuild, but should this
+be necessary? No, init is used
+
+
+* What may this problem be? Me using bio_caller1? No, we don't have enough bits
+* for our flags :) fixed.
+* Got a recursing lock issue. Fixed.
+
+* Syncing problem. Lukas proposed:
+>
+> Well, that's even easier than the RAID5 rebuild problem: reads are only
+> served from the "good" plex, writes go through to all plexes, as the new
+> data would be written anyway (after a read from the "good" plex). Some
+> locking would be advisable, though.
+
+The flaw with this is that if a write is done on a plex, it can be overwritten
+by a sync write, so it will have to wait for the sync to be done, and then
+overwrite the sync. We could perhaps examine if the offset is greater than
+current synced offset, and then we again have to make sure all plexes are beyond
+that offset.
+
+Suggestion for handling mounted rebuilds:
+Imagine this situation: you have rebuilt the parity up to offset X of
+the RAID5 plex. I/O that requests something from below X can go through
+normally, I/O that requests something beyond X needs to run like in
+degraded mode. The trick is to get the locking right, so that the
+rebuild process and the normal I/O don't interfere when they are around
+the some offset.
+
+* It's already checked that if a plex is syncing and synced > the offset +
+ length, then the request can be served.
+* Otherwise, the request will be served like a degraded or no-parity. The
+ problem is that if NOPARITY or REBUILD is detected (begause s->state is not
+ UP), data will be written without parity, and we'd have to rebuild parity
+ or write the data afterwards.
+
+15. Must test and debug! Check why hanging while rebuilding and copying at the
+same time (initiating copy first)
+
+* Bug #1: When trying to newfs on a degraded volume where the _first_ subdisk is
+ down, i get cg 0: bad magic number. Hmmm....
+ Actually, it's a bug in RAID5 degraded write/read code. We got a data
+ corruption bug. Fixed, we forgot to write correct parity data.
+
+
+* Check if we need to check for REBUILD flag when syncing.
+
+ISSUES:
+* We don't but delayed requests due to rebuild on the global queue, since
+ A) They shouldn't be run by all plexes.
+ B) They were issued before new I/Os, so they should be sendt down before
+ them.
+* Delayed requests due to REBUILD currently use it's own BIO queue. Perhaps we
+ should rework the way plex queues are used.
+
+18. Rewrite rename, move. DONE.
+
+
+19. Concat/mirror/stripe
+ * Right now, we ask the kernel for each drive. Another way could be to
+ tell the kernel to send the whole config, and create parse routines
+ for it.
+
+ * Fix padding in gv_volume for userland. DONE
+
+20. Make sync/rebuild requests also get delayed due to rebuild requests?
+
+20. When to trondheim. Write papers for status report.
+
+21. Updateman-page
+
==== //depot/projects/soc2007/lulf/gvinum_fixup/sbin/gvinum/gvinum.c#11 (text+ko) ====
@@ -375,7 +375,6 @@
gvinum_concat(int argc, char **argv)
{
- printf("Preparing\n");
if (argc < 2) {
warnx("usage:\tconcat [-fv] [-n name] drives\n");
return;
@@ -396,7 +395,6 @@
plexes = subdisks = volumes = 0;
drives = 1;
- printf("Trying to create drive on %s\n", device);
/* Strip away eventual /dev/ in front. */
if (strncmp(device, "/dev/", 5) == 0)
@@ -406,7 +404,6 @@
if (drivename == NULL)
return (NULL);
- fprintf(stderr, "Drivename %s is okay\n", drivename);
req = gctl_get_handle();
gctl_ro_param(req, "class", -1, "VINUM");
gctl_ro_param(req, "verb", -1, "create");
@@ -422,12 +419,10 @@
gctl_ro_param(req, "volumes", sizeof(int), &volumes);
gctl_ro_param(req, "plexes", sizeof(int), &plexes);
gctl_ro_param(req, "subdisks", sizeof(int), &subdisks);
- fprintf(stderr, "Creating drive request sent to kernel...\n");
errstr = gctl_issue(req);
if (errstr != NULL)
warnx("error creating drive: %s", errstr);
gctl_free(req);
- printf("Done creating drive %s on %s\n", drivename, device);
return (drivename);
}
@@ -474,11 +469,8 @@
}
/* Find a free volume name. */
- if (volname == NULL) {
- fprintf(stderr, "Finding name\n");
+ if (volname == NULL)
volname = find_name("gvinumvolume", GV_TYPE_VOL, GV_MAXVOLNAME);
- fprintf(stderr, "Found name %s\n", volname);
- }
/* Then we send a request to actually create the volumes. */
gctl_ro_param(req, "verb", -1, verb);
@@ -523,7 +515,6 @@
comment[0] = '\0';
/* Find a name. Fetch out configuration first. */
- printf("Fetching configuration\n");
req = gctl_get_handle();
gctl_ro_param(req, "class", -1, "VINUM");
gctl_ro_param(req, "verb", -1, "getconfig");
@@ -536,7 +527,6 @@
}
gctl_free(req);
- printf("Got configuration:\n");
printf(buf);
begin = 0;
len = strlen(buf);
@@ -553,7 +543,6 @@
if (buf[i] == '\n' || buf[i] == '\0') {
ptr = buf + begin;
strlcpy(line, ptr, (i - begin) + 1);
- printf("Processing line: .%s.\n", line);
begin = i + 1;
switch (type) {
case GV_TYPE_DRIVE:
@@ -572,9 +561,7 @@
}
if (name == NULL)
continue;
- printf("Found a name: .%s.\n", name);
if (!strcmp(sname, name)) {
- printf("Conflicts, try next\n");
conflict = 1;
/* XXX: Could quit the loop earlier. */
}
@@ -928,45 +915,6 @@
if (errstr)
warnx("%s\n", errstr);
gctl_free(req);
-/* do {
- rv = 0;
- req = gctl_get_handle();
- gctl_ro_param(req, "class", -1, "VINUM");
- gctl_ro_param(req, "verb", -1, "parityop");
- gctl_ro_param(req, "flags", sizeof(int), &flags);
- gctl_ro_param(req, "rebuild", sizeof(int), &rebuild);
- gctl_rw_param(req, "rv", sizeof(int), &rv);
- gctl_rw_param(req, "offset", sizeof(off_t), &offset);
- gctl_ro_param(req, "plex", -1, argv[0]);
- errstr = gctl_issue(req);
- if (errstr) {
- warnx("%s\n", errstr);
- gctl_free(req);
- break;
- }
- gctl_free(req);
- if (flags & GV_FLAG_V) {
- printf("\r%s at %s ... ", msg,
- gv_roughlength(offset, 1));
- }
- if (rv == 1) {
- printf("Parity incorrect at offset 0x%jx\n",
- (intmax_t)offset);
- if (!rebuild)
- break;
- }
- fflush(stdout);
-
- Clear the -f flag.
- flags &= ~GV_FLAG_F;
- } while (rv >= 0);*/
-
-/* if ((rv == 2) && (flags & GV_FLAG_V)) {
- if (rebuild)
- printf("Rebuilt parity on %s\n", argv[0]);
- else
- printf("%s has correct parity\n", argv[0]);
- }*/
}
void
==== //depot/projects/soc2007/lulf/gvinum_fixup/sys/geom/vinum/geom_vinum_create.c#2 (text+ko) ====
@@ -54,18 +54,15 @@
sc = gp->softc;
dcount = 0;
- printf("Starting to create concat device\n");
vol = gctl_get_param(req, "name", NULL);
if (vol == NULL) {
gctl_error(req, "volume's not given");
return;
}
- printf("Got first parameter name: %s\n", vol);
flags = gctl_get_paraml(req, "flags", sizeof(*flags));
drives = gctl_get_paraml(req, "drives", sizeof(*drives));
- printf("Got both flags and drives\n");
if (drives == NULL) {
gctl_error(req, "drives not given");
return;
@@ -85,18 +82,11 @@
p->stripesize = 0;
gv_post_event(sc, GV_EVENT_CREATE_PLEX, p, NULL, 0, 0);
- /* XXX: We'll might have a problem making sure all of these things exists
- * when we come into the event loop. */
-
- printf("We have created the volume and plex. Time to create subdisks\n");
- /* Drives are first (only right now) priority */
+ /* Drives are first (right now) priority */
for (dcount = 0; dcount < *drives; dcount++) {
snprintf(buf, sizeof(buf), "drive%d", dcount);
drive = gctl_get_param(req, buf, NULL);
- printf("We're looking for drive %s\n", drive);
d = gv_find_drive(sc, drive);
- printf("Creating subdisk for drive %s on device %s\n", d->name,
- d->device);
if (d == NULL) {
gctl_error(req, "No such drive '%s'", drive);
continue;
@@ -110,6 +100,7 @@
s->size = -1;
gv_post_event(sc, GV_EVENT_CREATE_SD, s, NULL, 0, 0);
}
+ gv_post_event(sc, GV_EVENT_SAVE_CONFIG, sc, NULL, 0, 0);
}
/*
@@ -130,26 +121,21 @@
dcount = 0;
scount = 0;
pcount = 0;
- printf("Starting to create mirror device\n");
vol = gctl_get_param(req, "name", NULL);
if (vol == NULL) {
gctl_error(req, "volume's not given");
return;
}
- printf("Got first parameter name: %s\n", vol);
flags = gctl_get_paraml(req, "flags", sizeof(*flags));
drives = gctl_get_paraml(req, "drives", sizeof(*drives));
- printf("Got both flags and drives\n");
- /* XXX: Remove when we have support for drivegroups. */
if (drives == NULL) {
gctl_error(req, "drives not given");
return;
}
/* We must have an even number of drives. */
- /* XXX: We need supported for striped mirror as well. */
if (*drives % 2 != 0) {
gctl_error(req, "must have an even number of drives");
return;
@@ -186,10 +172,7 @@
for (dcount = pcount; dcount < *drives; dcount += 2) {
snprintf(buf, sizeof(buf), "drive%d", dcount);
drive = gctl_get_param(req, buf, NULL);
- printf("We're looking for drive %s\n", drive);
d = gv_find_drive(sc, drive);
- printf("Creating subdisk for drive %s on device %s\n",
- d->name, d->device);
if (d == NULL) {
gctl_error(req, "No such drive '%s'", drive);
/* XXX: Should we fail instead? */
@@ -208,6 +191,7 @@
scount++;
}
}
+ gv_post_event(sc, GV_EVENT_SAVE_CONFIG, sc, NULL, 0, 0);
}
/*
@@ -227,7 +211,6 @@
sc = gp->softc;
dcount = 0;
pcount = 0;
- printf("Starting to create striped device\n");
vol = gctl_get_param(req, "name", NULL);
if (vol == NULL) {
gctl_error(req, "volume's not given");
@@ -236,8 +219,6 @@
flags = gctl_get_paraml(req, "flags", sizeof(*flags));
drives = gctl_get_paraml(req, "drives", sizeof(*drives));
- printf("Got both flags and drives\n");
- /* XXX: Remove when we have support for drivegroups. */
if (drives == NULL) {
gctl_error(req, "drives not given");
return;
@@ -268,10 +249,7 @@
for (dcount = 0; dcount < *drives; dcount++) {
snprintf(buf, sizeof(buf), "drive%d", dcount);
drive = gctl_get_param(req, buf, NULL);
- printf("We're looking for drive %s\n", drive);
d = gv_find_drive(sc, drive);
- printf("Creating subdisk for drive %s on device %s\n", d->name,
- d->device);
if (d == NULL) {
gctl_error(req, "No such drive '%s'", drive);
continue;
@@ -285,4 +263,5 @@
s->size = -1;
gv_post_event(sc, GV_EVENT_CREATE_SD, s, NULL, 0, 0);
}
+ gv_post_event(sc, GV_EVENT_SAVE_CONFIG, sc, NULL, 0, 0);
}
More information about the p4-projects
mailing list