Analyzing Log files of very large size

John Levine johnl at iecc.com
Sun Jul 11 20:11:40 UTC 2021


It appears that David Christensen <dpchrist at holgerdanske.com> said:
>On 7/11/21 5:13 AM, KK CHN wrote:
>> I need to analyze large log files, around 50 GB, from a SonicWall
>> firewall, for a suspected attack. ...

>But if this project is for an employer or client, I would recommend 
>starting with the commercial-off-the-shelf (COTS) log analysis tool made 
>by the hardware vendor.  Train up on it.  Buy a support contract:
>
>https://www.sonicwall.com/wp-content/uploads/2019/01/sonicwall-analyzer.pdf

This is reasonable advice if you plan to be doing these analyses on a regular
basis, but it's overkill if you only expect to do it once.

I have found that some of the text-processing utilities that come with BSD
are a lot faster than others.  The regex matching in perl is a lot faster
than python's, sometimes by an order of magnitude.  My tool of choice is
mawk, an implementation of the funky but very useful awk language that is
amazingly fast.  grep is OK; sed is too slow for anything other than tiny
jobs.

I'd suggest first dividing the logs into manageable chunks, perhaps using
split or csplit.  It would also make a good first project in mawk: use
patterns to divide the files into chunks that each cover an hour or a day.
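
For instance, here is a sketch of the mawk version.  It assumes the log
lines start with a syslog-style timestamp ("Jul 11 05:13:02 ...") and that
the file is in time order; check both against your actual export format:

    # Write each day's lines to their own chunk file, keyed on the
    # month and day fields at the start of each line.  Assumes the
    # log is chronological, so each chunk file is opened only once.
    mawk '{
        day = $1 "-" $2
        if (day != prev) {
            if (prev != "") close(out)
            out = "chunk-" day ".log"
            prev = day
        }
        print > out
    }' firewall.log

If you don't care where the boundaries fall, splitting by line count
(e.g. "split -l 5000000 firewall.log chunk-") is even simpler.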

Then you can start looking for interesting patterns, perhaps with grep if they
are simple enough, or more likely with some short mawk scripts.
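
As an example of the kind of short script I mean, a first pass might tally
how often each source address shows up.  This sketch assumes the log lines
carry SonicWall-style "src=" fields, so verify the field name against your
own logs before trusting the counts:

    # Count hits per source address and print the twenty busiest.
    # Assumes fields like "src=192.0.2.1"; check your log format.
    mawk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^src=/) {
                addr = $i
                sub(/^src=/, "", addr)
                count[addr]++
            }
    }
    END { for (a in count) print count[a], a }' chunk-*.log |
        sort -rn | head -20

A single pass like that over tens of gigabytes is exactly where mawk's
speed pays off.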

R's,
John

