Dr Malloc's Apache Log tricks

Discussion in 'Apache' started by DrMalloc, Sep 27, 2006.

    While viewing stats from a stats-generating package like awstats/webalizer/analytics is the best way of gauging a site's activity, it's also handy to have some command-line tricks for pulling quick info out of your logs. I decided to share some of the commands I've learned/devised over time for doing so, using the GNU toolset (available on any Unix variant such as FreeBSD or Linux). Feel free to add any commands of your own. It's worth noting that these commands assume the 'combined' Apache log format, which is usually the default log setting anyway.
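
    For reference, a line in the 'combined' format looks something like this (made-up sample):
    66.249.66.1 - - [27/Sep/2006:10:15:32 +0100] "GET /index.html HTTP/1.1" 200 5120 "http://www.google.com/search?q=example" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    Splitting on whitespace, $1 is the client IP, $10 is the bytes sent and $11 is the quoted referrer; that's where the field numbers in the commands below come from.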

    Finding the number of unique IPs in a logfile:
    cat <logfile> | awk '/ / {print $1}' | sort | uniq | wc -l
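
    If you also want to see which IPs hit the site hardest, the same pipeline with 'uniq -c' should do the trick (the head -20 is just an arbitrary cutoff):
    cat <logfile> | awk '/ / {print $1}' | sort | uniq -c | sort -rn | head -20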

    Listing all the referring URLs in a logfile:
    cat <logfile> | awk '/ / {print $11}' | sort | uniq | sed -e 's/"//g'

    Count the number of googlebot hits to a site in a day (NOTE: this does depend on the machine having the 'host' command/program; some linux distributions ship with resolveip instead):
    for ip in `cat <logfile> | awk '/ / {print $1}'`; do host $ip; done | grep -c 'googlebot\.com\.$'
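
    Resolving every single hit gets slow on a big logfile, so a rough alternative is to resolve only the unique IPs, save the ones whose reverse lookup ends in googlebot.com as patterns, and then count the log lines that start with those IPs (googlebot-ips.txt is just a scratch file, call it whatever you like):
    cat <logfile> | awk '/ / {print $1}' | sort | uniq | while read ip; do host $ip | grep -q 'googlebot\.com\.$' && echo "^$ip "; done > googlebot-ips.txt
    grep -c -f googlebot-ips.txt <logfile>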

    Output the total amount of data transferred by the requests in the log:
    cat <logfile> | awk '/ / {i += $10;} END { print (i/1000000) "MB"; }'
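
    If a logfile spans more than one day, roughly the same awk can break that total down per day by keying on the date part of the timestamp ($4); the days come out in whatever order awk feels like:
    cat <logfile> | awk '{ split($4, d, ":"); gsub(/\[/, "", d[1]); bytes[d[1]] += $10 } END { for (day in bytes) print day, (bytes[day]/1000000) "MB" }'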

    A quick explanation of the tools used:
    grep / filters lines by a given keyword or pattern (-c counts the matching lines instead of printing them)
    cat / displays the contents of a file
    wc -l / outputs the number of lines in its input
    awk / a rather complex scripting tool, but for the purposes of this post it splits each line into fields (on whitespace by default) and lets you print or add up individual fields
    sed / edits its input with search-and-replace expressions (used above to strip the quote marks)
    sort / sorts lines into alphabetical order
    uniq / takes a sorted list of items and eliminates duplicates (-c prefixes each item with the number of times it appeared)
    host / looks up a hostname in DNS, or does a reverse lookup when given an IP address
     