Re: Strange network/socket anomalies since about a month
- In reply to: Alexander Leidinger : "Strange network/socket anomalies since about a month"
Date: Mon, 22 Apr 2024 15:35:16 UTC
On Apr 22, 2024, at 3:26 AM, Alexander Leidinger <Alexander@Leidinger.net> wrote:
> Hi,
> 
> I have been seeing a higher failure rate of socket/network-related operations for a while. The failures are transient: running the same thing again immediately may or may not succeed. I cannot reproduce them at will; they just show up sometimes.
> 
> Examples:
> - poudriere runs with the sccache overlay (like ccache, but it also works for Rust) sometimes fail to create the communication socket, and so the build fails. My build script does 3 different poudriere bulk runs one after another, and when the first one fails, the second and third still run. If the first fails due to the sccache issue, the second and third may or may not fail. Sometimes the first fails and the rest are OK. Sometimes all fail, and if I then run one by hand it works (the script does the same as the manual run; it is simply a "for type in A B C; do poudriere bulk -O sccache -j $type -f ${type}.pkglist; done" which I execute from the same shell, and the script doesn't do any env-sanitizing).
> - A webmail interface (inet / local net -> nginx (rev-proxy) -> nginx (webmail service) -> php -> imap) has intermittent issues. Opening the same email again right afterwards normally works. I've also seen transient issues with PGP signing (webmail interface -> gnupg / gpg-agent on the server); simply hitting send again after a failure works fine.
> 
> Gleb, could this be related to the socket stuff you did 2 weeks ago? My world is from 2024-04-17-112537. I have noticed this since at least then, but I'm not sure whether the failures were there before that and I simply didn't notice them. They are surely "new recently"; I hadn't seen that many issues in January. The last two updates of current I did before the last one were on 2024-03-31-120210 and 2024-04-08-112551.
> 
> I could also imagine that some memory related transient failure could cause this, but with >3 GB free I do not expect this. Important here may be that I have https://reviews.freebsd.org/D40575 in my tree, which is memory related, but it's only a metric to quantify memory fragmentation.
> 
> Any ideas how to track this down more easily than running the entire poudriere in ktrace (e.g. a hint/script which dtrace probes to use)?
No answers, I'm afraid, just a "me too."
I have the same problem as you describe when using ports-mgmt/sccache-overlay when building packages with Poudriere.  In my case, I'm using FreeBSD 14-STABLE (stable/14-13952fbca).
I actually stopped using ports-mgmt/sccache-overlay because it got to the point where it didn't work more often than it did.  Then, a few months ago, I decided to start using it again on a whim and it worked reliably for me.  Then, starting a few weeks ago, it has reverted to the behaviour you describe above.  It is not as bad right now as it got when I quit using it.  Now, sometimes it will fail, but it will succeed when re-running a "poudriere bulk" run.
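Since a re-run usually succeeds, the re-running could be automated as a stopgap. What follows is only a minimal sketch: the `retry` helper is a hypothetical name of my own, not part of Poudriere or sccache, and the retry count of 3 is an arbitrary assumption. It merely masks the transient failure and does nothing to track down the cause.

```shell
#!/bin/sh
# retry MAX CMD [ARGS...]
# Hypothetical helper (not part of poudriere or sccache): run CMD up to
# MAX times and stop at the first success. Returns 0 on success, 1 if
# every attempt failed.
retry() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed: $*" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Possible use, mirroring the loop quoted above (A, B, C stand for the
# jail names, as in the original message; 3 attempts is an assumption):
#
# for type in A B C; do
#     retry 3 poudriere bulk -O sccache -j "$type" -f "${type}.pkglist"
# done
```

Each failed attempt is logged to stderr, so the number of retries needed per run is still visible afterwards and can serve as a rough measure of how often the failure occurs.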
I'd love it to go back to when it was working 100% of the time.
Cheers,
Paul.