Hello there, I'm trying to find a Bayesian-like filter for interpreting raw log files from my server, but my searches have turned up nothing. What I would like is a program that filters out the most common server queries and analyses the most uncommon ones, so that I can work out whether it's a scraper or a hacker probing for a security hole. Thanks
I haven't seen anything Bayesian, but I would recommend looking at Splunk. Check out the demos available on their site, particularly this one. I suspect that once you learn how to use it you will never look back. You can also put all your other log files through it and use the same tools for analysing them.

That said, it would be very interesting to have a Bayesian filter develop a model of what is "normal" on your website so that it can flag abnormal behaviour automatically. We tend to use our usual traffic tracking plus ad-hoc one-line scripts to identify interesting behaviour, but that is time-consuming and inaccurate.

Alternatively, you could try glTail. It is also not quite what you are after, but it shows your traffic patterns in real time, and it is seriously cool.
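If you want to experiment with that Bayesian idea yourself, here is a minimal sketch of what I mean: it learns a unigram model of request-path tokens from the log itself and then surfaces the lines whose tokens are least probable under that model. The log format (Common Log Format), the filename `access.log`, and the tokenisation are all assumptions for illustration, not a finished tool.

```python
#!/usr/bin/env python3
"""Rarity scorer for access logs: learn "normal" from the bulk of the
log, then flag the least probable requests (a naive-Bayes-style sketch)."""
import math
import re
from collections import Counter

# Pulls the request path out of a Common Log Format line (assumption).
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD|PUT|DELETE|OPTIONS) (\S+)')

def tokens(path):
    # Split a path like /foo/bar.php?id=1 into crude lowercase tokens.
    return [t for t in re.split(r"[/?&=.]+", path.strip("/").lower()) if t]

def train(lines):
    # Unigram counts over all request-path tokens: the "normal" model.
    counts = Counter()
    for line in lines:
        m = REQUEST_RE.search(line)
        if m:
            counts.update(tokens(m.group(1)))
    return counts

def rarity(line, counts, total):
    # Average negative log-probability per token, with add-one smoothing
    # so never-before-seen tokens get a large but finite penalty.
    m = REQUEST_RE.search(line)
    if not m:
        return 0.0
    toks = tokens(m.group(1))
    if not toks:
        return 0.0
    return sum(-math.log((counts[t] + 1) / (total + len(counts) + 1))
               for t in toks) / len(toks)

if __name__ == "__main__":
    with open("access.log") as f:   # hypothetical filename
        lines = f.readlines()
    counts = train(lines)
    total = sum(counts.values())
    # Print the 20 least "normal" requests -- candidates for a closer look.
    scored = sorted(((rarity(l, counts, total), l) for l in lines), reverse=True)
    for score, line in scored[:20]:
        print(f"{score:6.2f}  {line.rstrip()}")
```

With add-one smoothing, a request full of tokens the model has never seen (think `../../etc/passwd` probes) scores far above routine page hits, which is roughly the "remove the common, analyse the uncommon" behaviour you described.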