damage to pkg's sqlite data base

Scott Bennett bennett at sdf.org
Wed May 13 21:53:40 UTC 2015


     I'm short on time right now, so I'll defer my replies to Andrew Clark
and Chris H until late tonight after I (hope to) have had a chance to try
out Andrew's suggested procedure.  I do have sqlite installed, though (see
far below).

Baptiste Daroussin <bapt at freebsd.org> wrote:
> On Wed, May 13, 2015 at 06:48:12AM -0700, Chris H wrote:
> > On Wed, 13 May 2015 02:20:55 -0500 Scott Bennett <bennett at sdf.org> wrote
> > 
> > > "Chris H" <bsd-lists at bsdforge.com> wrote:
> > > > On Wed, 13 May 2015 00:12:51 -0500 Scott Bennett <bennett at sdf.org> wrote
> > > > > "Chris H" <bsd-lists at bsdforge.com> wrote:
> > > > > > On Tue, 12 May 2015 01:17:46 -0500 Scott Bennett <bennett at sdf.org>
> > > > > wrote >
> > > > > > > For nearly two weeks I've been stymied by an apparently damaged
> > > > > record > > in the sqlite data base used by pkg(8) and pkg-static(8). 
> > > > > Unfortunately, > > it is a record for a port that is depended upon rather
> > > > > heavily, lang/gcc. > > lang/gcc compiled and linked just fine, but any
> > > > > attempt to install the > > result ends up like this.
> > > > > > > 
> > > > > > > ===>  Checking if gcc already installed
> > > > > > > ===>   Registering installation for gcc-4.8.4_3
> > > > > > > Installing gcc-4.8.4_3...
> > > > > > > pkg-static: sqlite error while executing iterator in file
> > > > > > > pkgdb_iterator.c:931: database disk image is malformed pkg-static:
> > > > > sqlite > > error while executing INSERT OR REPLACE INTO files (path,
> > > > > sha256, > > package_id) VALUES (?1, ?2, ?3) in file pkgdb.c:1722:
> > > > > database disk image > > is malformed *** Error code 70
> > > > > > > 
> > > > > > > Stop.
> > > > > > > make: stopped in /usr/ports/lang/gcc
> > > > > > > 
> > > > > > >      I'm not familiar with sqlite commands, nor with the structure of
> > > > > the > > pkg data base, nor any of several other relevant topics.  But I
> > > > > do have 
> > >
> > >      If someone knows sqlite well enough to look at the error messages above
> > > and then give me directions to delete the offending records, then perhaps the
> > > next attempt to install lang/gcc might actually work.  Anyone?
> > > 
> > > > > > > questions:
> > > > > > > 
> > > > > > >     1) how might the damage have happened?
> > > > > > > 
> > > > > > >     2) how might such an event be avoided/prevented in the future?
> > > > > > > 
> > > > > > >     3) most urgently, how can I *fix* it?
> > > > > > You should get a backup made on the nightly cron(8) (periodic(8)).
> > > > > > It'll be located in /var/backups
> > > > > > Off the top of my head I can't remember which of the 2, is which.
> > > > > > But a simple size comparison should tell the tale. :)
> > > > > 
> > > > >      The problem began much farther back than two days ago, but I should
> > > > > be able to restore it from two weeks or more in the past and then find
> > > > > the most recent version that is not corrupted.
> > > > > 
> > > > > > Simply rename your (now) corrupt db, and copy the backup over.
> > > > > 
> > > > >      However, if I do that, then what happens to all the ports that have
> > > > > been updated or added since that version of the data base was backed up?
> > > > > I have run "portmaster -a" (with some additional options) quite a few
> > > > > times since the lang/gcc problem first appeared, so an old local.sqlite
> > > > > will no longer accurately reflect what is currently installed.
> > > > > >
> > > > > > I ran into this problem as well, and this was how I was finally
> > > > > > able to overcome it.
> > > > > >
> > > > > > Hope this helps!
> > > > > >
> > > > > > --Chris
> > > > > >
> > > > > > > 
> > > > > > >     4) I was unable to find any instructions for recreating a pkg
> > > > > data > >     base if the data base gets damaged/destroyed.  Is there a
> > > > > way to > >     do that that I missed?
> > > > > 
> > > > >      There must be a way to do this, right?  I mean, really, it's pretty
> > > > > fundamental that no new data base be put into production without a way to
> > > > > rebuild it.  The FreeBSD developers haven't really broken so ancient and
> > > > > basic a principle, have they?  So what's the trick?  What is the method
> > > > > to rebuild /var/db/pkg/local.sqlite from scratch based upon the currently
> > > > > installed ports/packages?
> > > > I whined about it the first time my DB blew up. It's become
> > > > corrupted several times since on different boxes/versions. *but*
> > > > after the first time, I made it a habit of making a copy of it *before*
> > > > embarking on an upgrade, or install of any ports. Seems it's the only
> > > > way to save yourself. pkg(8), to the best of my knowledge doesn't
> > > > perform a backup prior to any of this. So I *do*.
> > > > Annoying, but it beats attempting to recover from a failed port
> > > > upgrade/install.
> > > 
> > >      Chris, thanks for your responses, depressing though they be.
> > >      I have been using portmaster for ages to update my installed ports
> > > and usually to install them in the first place.  Does your suggestion
> > > mean abandoning portmaster?
> > No. I'm an avid user of portmaster(8). In fact I intend to be it's
> > maintainer. :)
> > >  If it is necessary to back local.sqlite up
> > > before each installation, that would seem to imply that portmaster would
> > > only be useful in getting a list of ports to be updated, after which each
> > > port would have to be updated individually, so that local.sqlite could be
> > > backed up before each one.  I have over 1900 ports installed, so that would
> > > mean just never updating my ports at all, except by rebuilding all of them
> > > upon upgrading to a new major release of FreeBSD.
> > I
> > >      Bapt, oh, Bapt, whatever were you folks thinking?  Did you really
> > > leave us no practical recovery method?
> > In Bapt's humble defense. It's difficult, to impossible for him to fix
> > a problem w/o decent clues. So I wouldn't go pointing any fingers, just
> > yet. :) In other words; w/o a cause, what should he fix? :)
> > What recovery would you suggest? Truth is, if you brushed up on sqlite3
> > you could create a fresh DB, using output from portmaster(8). Troublesome,
> > perhaps. But at least an alternative.

     Chris?  You already responded to my question 4) above, which answers
your question.  Why are you now asking this?  Any data base intended for
production use MUST be accompanied by a method of recreating it before
the data base goes into production.  This has been the case as long as
data bases have been used.  The Wallendas may walk without nets between
skyscraper roofs, but I assure you that a) not all of them arrive at the
intended destinations and b) normal mortals have a different perspective
on that sort of risk.
     Further, if portmaster produces that output *from the corrupted data
base*, then it stands to reason that other cases of corruption that differed
in particulars from this one might prevent portmaster from providing said
output.

> > > 
> Coming very late to the discussion I would like to understand what lead you to a
> broken database, from what I am aware of the only way to end up with a broken db
> is using pkg over nfs without rpc lock properly running (note that the default
>
> nfs client service on freebsd does not start rpc lock...)
>
> pkg work around this case by detecting the filesystem it is running on and
> changing the locking system it uses when running on network filesystem so this
> problem cannot in theory happen anymore even if one forgot to start rpc lockd.
>
> It remains however one case where this can happen: it is one the host is running
> over nfs and it starts a jail, in that case, the jail does not know it is
> running over nfs so pkg cannot know it needs to switch the locking system.
>
> Are you in one of those situation, if not can you describe a a bit more your
> system?

FreeBSD hellas 10.1-STABLE FreeBSD 10.1-STABLE #54 r282368: Sun May  3 15:48:37 CDT 2015     bennett at hellas:/usr/obj/usr/src/sys/hellas  amd64
     I have only one machine running these days, so there is no NFS.  At
present, I have no jails set up.
     Here are data concerning the relevant drive, which is the boot drive.
There are many other file systems on this machine, but they all involve
external devices and are unrelated to this issue, except for /buildwork,
where WRKDIRPREFIX=/buildwork/ports and CCACHE_DIR=/buildwork/ccache.  The
same sort of information for /buildwork is included below.

/dev/mirror/fbsds1a on / (ufs, local)
devfs on /dev (devfs, local, multilabel)
/dev/mirror/fbsds1d on /var (ufs, local)
/dev/mirror/fbsds1e on /usr (ufs, local, soft-updates)
/dev/mirror/fbsds1f on /usr/local (ufs, local)
/dev/mirror/fbsds3a on /usr/home (ufs, local)
/dev/mirror/fbsds3d on /usr/ports (ufs, local, soft-updates)
/dev/ufs/stripework on /buildwork (ufs, local, soft-updates)

 mirror/fbsds1  COMPLETE  ada0s1 (ACTIVE)
mirror/fbsds3a  COMPLETE  ada0s3a (ACTIVE)
mirror/fbsds3d  COMPLETE  ada0s3d (ACTIVE)
                            ufs/stripework     N/A  stripe/5x15g1
stripe/5x15g1      UP  da0p5
                       da2p5
                       da3p5
                       da5p5
                       da6p5

Note that the mirrors listed above have only one device in each.  At one
point a few weeks ago, there were two devices in each for a bit under
eight hours until the brand-new second drive failed.  Unfortunately, it
turns out that the manufacturer does not provide any support for its
internal hard drive products, other than replacement under warranty via
a web page that requires an unsecured web browser to apply for the
replacement.  Grrr.  Until I get that straightened out and have a second
drive, all of the mirrors listed above on the boot drive are unreplicated.
>
> Now by default pkg creates back up of the db daily to allow recovery, there was
> an issue before pkg 1.5 which requires a manual intervention and fixed after pkg
> 1.5 so it is easier to recover.

     One serious problem for recovery is that I don't know when the
corruption actually occurred.  All I can tell you is that it must have
happened no earlier than the previous update to lang/gcc (lang/gcc48).
The error messages first appeared with the update to gcc-4.8.4_3.
>
> sqlite is a pretty solid db I would like to understand how you ended up with a
> corrupted db if not what I described above.

     So would I because I would seriously like it not to happen again.
Long ago, I used to use portupgrade, but I eventually switched to
portmaster for three reasons:  speed (ruby is awfully slow), dougb@
massively reworked portmaster and made it work much more correctly
than portupgrade, and portupgrade's frustrating habit of always deciding
that the package data base had to be rebuilt from nothing, which always
took a long time and failed as often as it succeeded.  Because portmaster
now has to use a data base instead of direct evidence, it seems that that
fourth complaint has partly returned to haunt us.
     sqlite3-3.8.9_1 is what is installed on my system.  I have no idea
which version is incorporated into pkg or pkg-static.


                                  Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *
**********************************************************************


More information about the freebsd-ports mailing list