Analyzing Log files of very large size

Mehmet Erol Sanliturk m.e.sanliturk at gmail.com
Sun Jul 11 20:18:12 UTC 2021


On Sun, Jul 11, 2021 at 5:38 PM Vlad Markov <dvoich at optonline.net> wrote:

> On Sun, 11 Jul 2021 19:43:41 +0530
> KK CHN <kkchn.in at gmail.com> wrote:
>
> > Yes, it is.
> >
> > On Sun, Jul 11, 2021 at 6:02 PM Korolev Sergey <serejk at febras.net> wrote:
> >
> > > Is it a plain text file?
> > >
> > > On 11 Jul 2021, at 22:13, KK CHN <kkchn.in at gmail.com> wrote:
> > >
> > > List,
> > >
> > > I am in a requirement to analyze large log files of sonic wall firewall
> > > around 50 GB. for a suspect attack.
> > >
> > > What tools and solutions need to be deployed for handling this much large
> > > files and pls enlighten me with your expertise and reference materials if
> > > any.
> > >
> > > All are tcp / ip communications, DNS UDP transports ..
> > >
> > > Regards,
> > > Kris
> I used to use split to break up large log files into manageable pieces.
> From there it depends on how you work. At first we used grep then we moved
> on to using perl regex to analyze logs.
>
> Vlad

My idea is as follows. I am trying to use such a feature in a database
management system to track the behavior of the program.
The log generated over a very short time came to 56 gigabytes. During a
backup of the sources, the computer warned me:
"You are trying to backup 56 GigaBytes into a 4.7 GigaBytes DVD."
Assuming a message line is 56 bytes, a file of this size contains about
1 billion records to study.

A file of this size can then be loaded into memory as an AVL tree by
grouping the accessed parts and counting their occurrences, so that the
tree holds one node per distinct part rather than one per log line.
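
As a rough sketch of the counting step, the file can be read one line at a
time so that only the per-part counters stay in memory. The
"accessor,accessed_part,..." layout and the function name are assumptions made
for illustration, not the real firewall log format, and a plain Python
Counter (a hash table) stands in for the AVL tree here.

```python
import io
from collections import Counter

def count_accessed_parts(lines):
    """Count how often each accessed part occurs in an iterable of log lines.

    Assumed (hypothetical) line layout: "accessor,accessed_part,...".
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if len(fields) < 2:
            continue  # skip lines that do not match the assumed layout
        counts[fields[1].strip()] += 1
    return counts

# A tiny in-memory stand-in for the real log. With a real file one would
# pass the object returned by open(path, errors="replace") instead, which
# is also iterated line by line.
sample = io.StringIO(
    "10.0.0.5,dns\n"
    "10.0.0.7,ssh\n"
    "10.0.0.5,ssh\n"
    "malformed line\n"
)
part_counts = count_accessed_parts(sample)
for part, n in part_counts.most_common():
    print(part, n)
```

Memory use grows with the number of distinct parts, not with the number of
lines, which is what makes a 56 gigabyte input manageable.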

In your case, you may generate your log as, perhaps, "accessor, accessed
parts, ...".
Assume that you need to know who is accessing (or attempting to access)
some list of parts.
During AVL tree generation, use the "accessed parts" as keys, and store the
"accessor" values as their leaves, together with any other vital information.
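
A minimal sketch of such a tree, assuming each record has already been
reduced to an (accessor, accessed part) pair. All names below are
hypothetical and the sample records are made up; each node is keyed by the
accessed part and keeps a small table of accessors with their counts. An
in-order walk then returns the parts in sorted order.

```python
class Node:
    def __init__(self, key):
        self.key = key        # accessed part
        self.accessors = {}   # accessor -> number of occurrences
        self.left = None
        self.right = None
        self.height = 1

def height(node):
    return node.height if node else 0

def update(node):
    node.height = 1 + max(height(node.left), height(node.right))

def rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    update(y)
    update(x)
    return x

def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y

def rebalance(node):
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:
        if height(node.left.left) < height(node.left.right):
            node.left = rotate_left(node.left)     # left-right case
        return rotate_right(node)
    if balance < -1:
        if height(node.right.right) < height(node.right.left):
            node.right = rotate_right(node.right)  # right-left case
        return rotate_left(node)
    return node

def insert(node, key, accessor):
    """Record one access and return the (possibly new) subtree root."""
    if node is None:
        node = Node(key)
        node.accessors[accessor] = 1
        return node
    if key < node.key:
        node.left = insert(node.left, key, accessor)
    elif key > node.key:
        node.right = insert(node.right, key, accessor)
    else:
        node.accessors[accessor] = node.accessors.get(accessor, 0) + 1
        return node  # existing key: the tree shape is unchanged
    return rebalance(node)

def in_order(node):
    """Yield (key, accessors) pairs in ascending key order."""
    if node is not None:
        yield from in_order(node.left)
        yield node.key, node.accessors
        yield from in_order(node.right)

# Made-up sample records: (accessor, accessed part).
records = [
    ("10.0.0.5", "dns"), ("10.0.0.7", "ssh"), ("10.0.0.5", "ssh"),
    ("10.0.0.9", "http"), ("10.0.0.5", "ssh"), ("10.0.0.7", "smtp"),
]
root = None
for accessor, part in records:
    root = insert(root, part, accessor)

for part, accessors in in_order(root):
    print(part, accessors)
```

Repeated accesses to the same part only bump a counter, so the tree grows
with the number of distinct parts and stays balanced as new keys arrive.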


From an AVL tree it is very easy to get a list of such accessors in order
and study them in more detail.
Since a small amount of information per key is sufficient, the memory of an
ordinary computer should be enough.
If your memory is not sufficient, you may use an SSD as storage, even one
with 500 megabytes per second write/read speeds.
Be careful about wear on such disks under a very high volume of operations,
writes in particular.
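
If the counters do not fit in memory, one option is an on-disk keyed store.
The sketch below uses SQLite from Python's standard library as a stand-in
(SQLite keeps its tables in B-trees, not AVL trees). The table and column
names are made up for illustration, and a temporary directory stands in for
a path on the SSD.

```python
import os
import sqlite3
import tempfile

# Hypothetical location; in practice this would be a path on the SSD.
db_path = os.path.join(tempfile.mkdtemp(), "access_counts.db")
conn = sqlite3.connect(db_path)
conn.execute(
    "CREATE TABLE IF NOT EXISTS access_counts ("
    " part TEXT NOT NULL,"
    " accessor TEXT NOT NULL,"
    " hits INTEGER NOT NULL DEFAULT 0,"
    " PRIMARY KEY (part, accessor))"
)

def record_batch(conn, batch):
    """Add one batch of (accessor, part) pairs inside a single transaction."""
    with conn:  # commits once at the end of the batch
        for accessor, part in batch:
            conn.execute(
                "INSERT OR IGNORE INTO access_counts (part, accessor)"
                " VALUES (?, ?)",
                (part, accessor),
            )
            conn.execute(
                "UPDATE access_counts SET hits = hits + 1"
                " WHERE part = ? AND accessor = ?",
                (part, accessor),
            )

# Made-up sample records: (accessor, accessed part).
record_batch(conn, [
    ("10.0.0.5", "dns"), ("10.0.0.7", "ssh"),
    ("10.0.0.5", "ssh"), ("10.0.0.5", "ssh"),
])

rows = conn.execute(
    "SELECT part, accessor, hits FROM access_counts ORDER BY part, accessor"
).fetchall()
for row in rows:
    print(row)
conn.close()
```

Committing once per batch instead of once per log line reduces the number
of physical writes, which helps with the wear concern.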



It is very easy to find open source AVL tree software with sufficiently
permissive licenses. I do not know exactly, but I believe that even the
FreeBSD sources contain such code.

Information about AVL trees can be found in data structures books;
books that use C or C++ may be especially useful for you.


https://en.wikipedia.org/wiki/AVL_tree
AVL tree

Please search for the following phrase in Google:

open source repositories about avl software



Mehmet Erol Sanliturk


More information about the freebsd-questions mailing list