Re: PortsDB: a program that imports the ports tree into an SQLite database
- In reply to: Yuri : "PortsDB: a program that imports the ports tree into an SQLite database"
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Sun, 15 Jan 2023 09:48:40 UTC
[Freshly updated key - please update your keyring] On 2023-Jan-04 14:30:59 -0800, Yuri <yuri@FreeBSD.org> wrote: >I developed the PortsDB project that imports FreeBSD ports into an >SQLite database: https://github.com/yurivict/freebsd-portsdb >The port is ports-mgmt/portsdb. Thanks. That looks quite useful. >The database can be fully rebuilt in ~20 minutes, after which in can be >quickly (in seconds) updated with new commits. That seems a long time but it seems that most of it is running various "make describe" commands across all ports. Unfortunately, "ports-import" failed for me, reporting: ... actions that will be performed: - import the ports tree into the database /home/peter/ports.sqlite memory Parse error near line 135079: near "net": syntax error _PHASE,CHILD_PKGORIGIN,CHILD_FLAVOR,KIND) VALUES('net/rubygem-net-protocol','' error here ---^ portsdb-import 1042.79s user 2088.03s system 479% cpu 10:53.06 total Looking at the intermediate SQL, the problem is that the "INSERT INTO Port" statement for print/ghostscript7-base gets interleaved with several "INSERT OR IGNORE INTO Depends" statements at 2048 character boundaries. Eliding most of the relevant lines, I get: INSERT INTO Port(...) VALUES ('print/ghostscript7-base',...GS_t4693d8 GS_tekINSERT OR IGNORE INTO Depends(...) VALUES('net/rubygem-net-protocol','',null,'mail/rubygem-net-smtp','','R'); 4696 GS_bmpmono GS_bmpgray ... GS_ljet4d GS_ljetplus GSINSERT OR IGNORE INTO Depends(...) VALUES('graphics/gd','',null,'graphics/libwmf','','L'); _lp1800 GS_lp1900 GS_lp2000 ... GS_vgalib',null); It looks like several other ports (www/apache24, math/octave-forge, www/nginx-full, print/ghostscript8-base, print/ghostscript9-base) also result in "INSERT INTO Port" statements exceeding 2048 characters, at least for me, only 1 port actually failed the race with another process. Looking at the actual code, the problem is that the "echo" in /usr/local/libexec/portsdb/include/functions-sql.sh doesn't do atomic writes of arbitrary length arguments. The actual size isn't clear but when I tried ktrace(1)'ing a script that contained 'echo "very long string" >> somefile' it did a series of 1024 character writes. I can't explain why I've only seen truncation at 2KiB and only in one case but race condition behaviour isn't obvious. My suggestion is that you use something other than sh(1) for writing the intermediate file. Based on a quick check, awk(1) can write at least 5358 characters in a single write(1) call but the write(2) man page doesn't document any atomicity requirements so it seems possible that large writes could be interleaved at the filesystem level in any case. Ideally, this tool would be written with explicit serialisation operations around cases where write(1) operations are being issued by multiple processes/threads. -- Peter Jeremy