a simple tool "How to find out" if surfers find relevant pages on your site

Discussion in 'Products & Tools' started by hans, Mar 24, 2004.

  1. #1
    a few months ago - i had some spare time and surfed
    and found a simple perl script
    much different from what i expected and with a nice side effect.

    a few basics - in case there are some newbies reading THIS.

    with simple access_log file analysis tools such as webalizer you get detailed statistics about
    - entry/exit pages
    - referrer
    - countries visitors are originating from
    - traffic details by hits, pages, KB, visits, hosts, ... and much more
    - how many hits for each individual page
    - error statistics
    - ...

    and of course

    a list of the keywords USED when searching in a SE ( search engine ) to find your site

    that's all very nice + helpful

    but

    all these data LACK answers to questions like

    - DID EVERY surfer visiting my site from ANY SE query-result really find the ONE PAGE directly relevant to HIS query

    - does the page HE found FULLY answer HIS query ??

    that was my key problem - i wanted an answer, a solution ...

    and the one who-is-online-script i found was access.pl
    the original is at
    http://mkruse.netexpress.net/
    access.pl was the answer and FULL solution to my quest

    i modified it - adapted it to recognize more individual SE, to give a better SE overview in the browser output - and to show more lines of output
    ( now 777 lines of access_log - that gives me approx 1 hr of access +/- depending on time of day )
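How much time a fixed number of log lines covers depends on traffic, so it can help to measure it. The following is only a rough sketch, not code from access.pl: it assumes the timestamp layout of Apache's common/combined log format, and the names `last_lines` and `covered_minutes` as well as the sample lines are made up for illustration.

```python
from collections import deque
from datetime import datetime
import io
import re

# Timestamp as written in Apache's common/combined log format,
# e.g. [24/Mar/2004:10:15:02 +0100] (an assumption about the server's LogFormat).
TIMESTAMP = re.compile(r"\[([^\]]+)\]")


def last_lines(stream, count=777):
    """Return the last `count` lines of an open log file."""
    return list(deque(stream, maxlen=count))


def covered_minutes(lines):
    """Minutes between the first and the last timestamp found in `lines`."""
    stamps = []
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z"))
    if len(stamps) < 2:
        return 0.0
    return (stamps[-1] - stamps[0]).total_seconds() / 60


# Hypothetical sample data standing in for a real access_log.
sample = io.StringIO(
    '192.0.2.1 - - [24/Mar/2004:10:00:00 +0100] "GET / HTTP/1.0" 200 512 "-" "-"\n'
    '192.0.2.1 - - [24/Mar/2004:10:20:00 +0100] "GET /a.html HTTP/1.0" 200 512 "-" "-"\n'
    '192.0.2.2 - - [24/Mar/2004:10:45:00 +0100] "GET /b.html HTTP/1.0" 200 512 "-" "-"\n'
)
window = last_lines(sample, count=2)
print(len(window), "lines cover", covered_minutes(window), "minutes")
```

With a real log you would pass `open(path_to_access_log)` instead of the sample and raise or lower `count` until the window covers the time span you want to watch.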

    a modified version access.zip is available from my site for download at

    http://www.kriyayoga.com/logfiles/tools.html

    see top of page -- security !

    how does it work ?

    with the command tail -f
    it takes a configurable number of access_log lines
    and displays them DIFFERENTLY - here is how

    - grouped by IP

    - you see ONE IP/surfer as a group - with ALL the pages/files THAT one IP ( surfer ) was visiting

    - at the beginning of each group comes the referrer ( SE are highlighted, with language affiliation for
    most googles and some others as well - these are easy changes i made; you see how when you open the script with a
    Unix compliant text editor, and you can add or modify according to your needs ! )

    - hence you see from WHERE each visitor is coming - what exact query words HE used for that search engine AND
    you see from the FIRST page he loaded IF he found the full answer to HIS query !
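The grouping described above can be sketched in a few lines. This is not the code of access.pl (which is Perl), only an illustration of the idea under stated assumptions: the log is in Apache's combined format, the search engine passes the query in a `q` or `p` parameter, and all function names, IPs and sample lines are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Apache "combined" log format (an assumption about the server's LogFormat).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "[^"]*"'
)


def request_path(request):
    """Strip method and protocol from a request line, keeping stray spaces in the path."""
    parts = request.split(" ")
    return " ".join(parts[1:-1]) if len(parts) >= 3 else request


def search_terms(referrer):
    """Return the query words from a search engine referrer, or '' if there are none.

    Assumes the engine uses a `q` or `p` URL parameter for the query.
    """
    query = parse_qs(urlsplit(referrer).query)
    for key in ("q", "p"):
        if key in query:
            return query[key][0]
    return ""


def group_by_ip(lines):
    """Map each IP to the (path, status, referrer) tuples it requested, in log order."""
    visitors = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            hit = (request_path(match.group("request")),
                   match.group("status"), match.group("referrer"))
            visitors.setdefault(match.group("ip"), []).append(hit)
    return visitors


# Hypothetical sample lines.
sample = [
    '192.0.2.1 - - [24/Mar/2004:10:00:00 +0100] "GET /yoga.html HTTP/1.0" 200 512 '
    '"http://www.google.com/search?q=kriya+yoga&hl=en" "Mozilla/4.0"',
    '192.0.2.2 - - [24/Mar/2004:10:00:05 +0100] "GET / HTTP/1.0" 200 900 "-" "Mozilla/4.0"',
    '192.0.2.1 - - [24/Mar/2004:10:00:09 +0100] "GET /style.css HTTP/1.0" 200 100 '
    '"http://www.example.com/yoga.html" "Mozilla/4.0"',
]

for ip, hits in group_by_ip(sample).items():
    # The referrer of the FIRST request tells where the visitor came from.
    print(ip, "searched for:", search_terms(hits[0][2]) or "(no search referrer)")
    for path, status, _ in hits:
        print("   ", status, path)
```

Reading the first path of each group next to the query words is exactly the check described above: did the landing page match what the surfer asked for.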

    now let's assume he does NOT find the full answer, or lands on a totally wrong page .. worst scenario ..

    why ?

    - wrong or missing title / description or content irrelevant to description or keywords in meta
    - wrong / irrelevant search results from SE

    what to do ?

    change - correct - adapt your pages according to the NEEDS of surfers / SEO !
    provide the fullest possible answer / solution to HIS query and problem in life - whatever service you offer
    YOU are the solution FOR your targeted group on this entire planet !!! who else ?

    less bad scenario ..

    you feel and know he found only part of what he asked for - you see his FULL query, and maybe ONE or several of the terms are missing on THAT page ..

    but they may be available elsewhere ON your site ..

    then add links to all or any missing parts that HE can find on YOUR site !

    adapt YOUR pages to meet the full need of each or as many of YOUR visitors as possible.
    another chapter with the one answer he may have been missing
    another sentence to clarify one point and to make your results full and complete

    some other scenarios ..

    SE are still learning and use different algorithms - sometimes the combination of answers they offer simply is wrong - hence
    there is nothing to be done by you - only by the SE.

    or

    surfer lacks knowledge about HOW to use SE and how to write a logical all inclusive query that really results in what he NEEDS

    or

    surfer lacks knowledge of what he really is looking for !! YES that happens sometimes - some people are just so confused and LOST
    in life that they write queries that make no logical sense at all - or - they write a full sentence just like you would ask a human ..
    but NEVER a computer at this early stage of PC development.

    about these last points you can do little or nothing at all.

    but all the above points show you a DIRECT relationship between ONE individual surfer with his full query and all the pages he surfed
    and hence you KNOW if he found ALL or part or nothing at all ..
    and you can adapt your site - if needed all your pages - to optimize CONTENT

    you also see WHERE exactly your 404 ( page not found ) errors are coming from ..
    and can adapt .. correct - or just KNOW that there is nothing to be done if the case of the URL is wrong or a correct URL
    was mistyped by the surfer ... except that in the latter case you can make a custom 404 page with an ON SITE real text search engine you
    install to give each surfer another chance to find ON your site what he was looking for.
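To see where the 404 requests come from, it is enough to count each missing path together with its referrer. A minimal sketch, assuming Apache's combined log format; `not_found_sources` and the sample lines are invented for illustration and are not part of access.pl.

```python
import re
from collections import Counter

# Prefix of Apache's "combined" log format up to the referrer (an assumption).
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" (?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)


def not_found_sources(lines):
    """Count (path, referrer) pairs for every request answered with 404."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") == "404":
            parts = match.group("request").split(" ")
            # Drop method and protocol; keep the path even if it contains spaces.
            path = " ".join(parts[1:-1]) if len(parts) >= 3 else match.group("request")
            counts[(path, match.group("referrer"))] += 1
    return counts


# Hypothetical sample lines.
sample = [
    '192.0.2.1 - - [24/Mar/2004:10:00:00 +0100] "GET /old.html HTTP/1.0" 404 210 '
    '"http://www.example.com/links.html" "Mozilla/4.0"',
    '192.0.2.2 - - [24/Mar/2004:10:01:00 +0100] "GET /old.html HTTP/1.0" 404 210 '
    '"http://www.example.com/links.html" "Mozilla/4.0"',
    '192.0.2.2 - - [24/Mar/2004:10:02:00 +0100] "GET /Index.html HTTP/1.0" 404 210 "-" "Mozilla/4.0"',
    '192.0.2.3 - - [24/Mar/2004:10:03:00 +0100] "GET / HTTP/1.0" 200 900 "-" "Mozilla/4.0"',
]

for (path, referrer), hits in not_found_sources(sample).most_common():
    print(hits, path, "<-", referrer)
```

A missing page with a referrer on another site points to an outdated link you could ask the other webmaster to fix; a referrer of `-` usually means a typed or pasted URL.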

    re custom missing.html page ( configured in .htaccess )

    every once in a while there is a NEED to "send that page on vacation" such as i do right now for a few months - to allow ALL SE to get a REAL 404 and to remove any and all wrong / outdated / missing pages from THEIR database - i think it's a question
    of courtesy toward all SE to do so every once in a while.

    it has allowed me to reduce the number of 404 errors from some 3% to less than 1% and this remainder is caused by fancy silicon valley software requests asking for weird URLs like

    /MSOffice/cltreq.asp?UL=1&ACT=4&BUILD=4219&STRMVER=4&CAPREQ=0
    or
    /icickm2004/ icickm2004-home.htm ( space IN url --->> after 2004/ and icickm ... - illegal URL !! )
    OR
    /icickm2004/%20icickm2004-home.htm ( space in URL replaced by %20 -- ILLEGAL URL !! )

    the latter TWO samples come from G - IF people COPY and PASTE URL rather than clicking URL !

    NOTHING to be done for such and if you see that ALL your 404 are coming from such human errors
    by surfers - then you can accept whatever % of 404 you have !

    these are all pages that may NEVER ever have been on your site but are requested by some browser software or surfers
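Requests like the copy-and-paste samples above are easy to pick out automatically, because the path contains a space either literally or as `%20`. A small sketch using only Python's standard library; the helper names `has_stray_space` and `without_spaces` are made up, and the "repaired" path is only a guess at what the surfer meant.

```python
from urllib.parse import unquote


def has_stray_space(path):
    """True if the requested path contains a literal or percent-encoded space."""
    return " " in unquote(path)


def without_spaces(path):
    """Candidate for the URL the surfer probably meant: the decoded path minus all whitespace."""
    return "".join(unquote(path).split())


# The three sample requests quoted in the post.
requests = [
    "/MSOffice/cltreq.asp?UL=1&ACT=4&BUILD=4219&STRMVER=4&CAPREQ=0",
    "/icickm2004/ icickm2004-home.htm",
    "/icickm2004/%20icickm2004-home.htm",
]

for path in requests:
    if has_stray_space(path):
        print("pasted URL with space:", repr(path), "->", without_spaces(path))
    else:
        print("other cause:", path)
```

If nearly all remaining 404 lines fall into the first group, that supports the conclusion above that there is nothing left to fix on the site itself.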

    other point ..

    WHY is this access.pl tool in my security paragraph ?

    because of queries like

    -------------------- start log file excerpt from access.pl
    12.38.79.66 - Mozilla/4.06 (Win95; I)
    Date Page Status Referrer
    12/28 02:14 /cgi-bin/Mail.pl 302
    12/28 02:14 /cgi-bin/FormMail.cgi 200
    12/28 02:14 /cgi-bin/formmail.cgi 200
    12/28 02:14 /cgi-bin/FormMail.pl 200
    12/28 02:14 /cgi-bin/mail.pl 302
    12/28 02:14 /cgi-bin/formmail.pl 200
    12/28 02:14 /cgi-bin/Mail.cgi 302
    12/28 02:14 /cgi-bin/mail.cgi
    --------------------- end

    here above we see ( i saw it in real time :) )
    a hacker searching for a nonexistent form mail perl script ( probably the one from Matt's perl archive )
    in an attempt to abuse such a script for HIS spam ..

    other hacker attempts also have been observed a few times

    then you may act instantly by BLOCKING that IP in your .htaccess ( takes just a few seconds ) ..

    or just watch and smile if you know that your site is safe !

    people "sneak" in and search in areas where NO link ever guides them - and you know and observe and may adjust
    permissions or access of such web site areas.
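A probe like the one in the excerpt has a clear pattern: one IP asking for many spellings of a form mail script within the same minute. The sketch below flags such IPs and prints the matching `.htaccess` lines. It is an illustration only: the name list, the threshold of 3 and the function names are assumptions, and `Deny from` is the older Apache access syntax (1.3 to 2.2), so a newer server may need a different directive.

```python
from collections import defaultdict

# Script names probed in the excerpt above, lowercased (the list is an assumption;
# extend it with whatever shows up in your own log).
PROBED_NAMES = {"formmail.pl", "formmail.cgi", "mail.pl", "mail.cgi"}


def probing_ips(hits, threshold=3):
    """Return IPs that requested at least `threshold` different probe paths under /cgi-bin/."""
    seen = defaultdict(set)
    for ip, path in hits:
        name = path.rsplit("/", 1)[-1].lower()
        if path.lower().startswith("/cgi-bin/") and name in PROBED_NAMES:
            seen[ip].add(path)
    return sorted(ip for ip, paths in seen.items() if len(paths) >= threshold)


def deny_lines(ips):
    """Build one .htaccess deny line per IP (older Apache allow/deny syntax)."""
    return ["Deny from " + ip for ip in ips]


# The requests from the log excerpt, plus one harmless hypothetical visitor.
hits = [
    ("12.38.79.66", "/cgi-bin/Mail.pl"),
    ("12.38.79.66", "/cgi-bin/FormMail.cgi"),
    ("12.38.79.66", "/cgi-bin/formmail.cgi"),
    ("12.38.79.66", "/cgi-bin/FormMail.pl"),
    ("12.38.79.66", "/cgi-bin/mail.pl"),
    ("12.38.79.66", "/cgi-bin/formmail.pl"),
    ("12.38.79.66", "/cgi-bin/Mail.cgi"),
    ("12.38.79.66", "/cgi-bin/mail.cgi"),
    ("192.0.2.5", "/index.html"),
]

for line in deny_lines(probing_ips(hits)):
    print(line)
```

The printed lines would go into `.htaccess` by hand; blocking automatically is a bigger step and not something the post describes.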


    one last point to access.pl

    its just ONE file

    you then call THAT full URL of access.pl in your browser and see it in your browser window.
    you may - like i did - use external style sheet to make it more colorful and easier to instantly READ

    change the number of lines

    and only 2 lines really NEED to be adapted
    one with the FULL absolute path to your access_log file on your server
    another one with the full domain name !

    that's it

    it takes just a very few minutes

    you may also see what i see re AOL .. each visit from ONE surfer is split into many page / file requests, each one coming from a different IP -
    which makes it appear in YOUR log as if MANY visitors use AOL
    in reality you may see that ONE AOL surfer may create up to about 20 different page requests originating from different IPs -
    making all believe ...
    :)

    belief is one part of creation - KNOWLEDGE another part
    because YOU know the direct relationship between all different files requested for ONE page and hence you will KNOW ...
    how small .. and how few ..
    :)

    access.pl helped me a lot during the past months
    i actually load it as THE VERY first page of each online session
    it also shows me instantly the nice SE visits by various Bots
    and HOW they crawl or LOOP ( i once directly observed ONE German SE-bot looping some 4-5 THOUSAND times ..
    and because they are friendly - they have their URL - i emailed them
    the next day the problem was FIXED and their bot surfed free of any loop .. )

    it is like becoming a guardian angel for bots
    and to help when needed and/or possible

    or to be a guardian angel of surfers
    who may use a word misspelled or ONE word where YOU write two and you adapt or add that joined word - to assure all spellings
    and misspellings related to YOUR site are found.

    you also see something interesting about google-bots

    how often they get a 304 or 200
    and you see

    it appears obvious that the TTL ( time to live ) of their database is configured to BE short ! - because they visit my site daily with up to a few
    hundred pages crawled AND ... many times get a 200 - meaning they have DUMPED their previously crawled original and WANT a new
    one forced into their cache - because YOU will know that at least a few of those 200 pages called by Googlebot are still the same
    as a few weeks earlier during the previous visit.
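The share of 200 versus 304 answers given to a bot can be counted directly from the log. As background: a 304 is the reply to a conditional request for a page that has not changed, so a 200 on an unchanged page means the bot asked without that condition. The sketch below assumes Apache's combined log format and identifies the bot by a substring of its user agent; `bot_status_counts` and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Apache "combined" log format, capturing status and user agent (an assumption).
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def bot_status_counts(lines, marker="googlebot"):
    """Count HTTP status codes for requests whose user agent contains `marker`."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and marker in match.group("agent").lower():
            counts[match.group("status")] += 1
    return counts


# Hypothetical sample lines.
sample = [
    '192.0.2.10 - - [24/Mar/2004:03:00:00 +0100] "GET / HTTP/1.0" 200 900 "-" '
    '"Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.10 - - [24/Mar/2004:03:00:04 +0100] "GET /a.html HTTP/1.0" 304 - "-" '
    '"Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.10 - - [24/Mar/2004:03:00:09 +0100] "GET /b.html HTTP/1.0" 200 700 "-" '
    '"Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.20 - - [24/Mar/2004:03:01:00 +0100] "GET / HTTP/1.0" 200 900 "-" "Mozilla/4.0"',
]

counts = bot_status_counts(sample)
print("200:", counts["200"], "304:", counts["304"])
```

Comparing these two numbers over several days gives a rough picture of how often the bot re-fetches pages in full, which is the observation the paragraph above is built on.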

    which also somehow MAY ( MAY BE .. ) explain the WHY of the google dance -- why some pages DROP fully out and a day or a few days later are IN
    and top 10 or so again - it may simply be that their goal of FRESHNESS sometimes can NOT be met by their own bot, which fails to RELOAD ( 200 )
    that page BEFORE it is dumped ..

    it also shows HOW time efficient a google bot CAN be and most of the time IS - a new page often is IN their search data base within
    less than 24 hrs after publishing WITHOUT submitting the new page - just by adding it in your navigation menus !!!

    god bless
     
    hans, Mar 24, 2004 IP
  2. expat

    expat Stranger from a far land

    Messages:
    873
    Likes Received:
    18
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Hi Hans,

    nice tool I will have a look at it
    here is one I use, as nothing beats on-line realtime.
    It's an adapted script from MnMDesigns that shows the visits of the last 4 minutes:
    which page they went to, and whether they came direct or from where, in detail. Clickable, so I can immediately check the SE or source and the page they see.

    I run a couple of versions of it for high traffic domains and spend a bit of time checking if searches resolve to the right pages, what are positions in various SE's and more specific what curious query strings users come up with.

    If I have the time and see curious requests coming in I sometimes just ban the IP for 24h.
    Regarding the hoovers and mail slurpers, most of my sites run various poison cgi scripts. The above will try to inject 100+ invalid e-mails if one comes along, as it leads them along a range of domains.

    It's nice when one has a break or is on the phone watching what is going on and has helped a great deal to refine and be more relaxed about the whole thing.
    http://www.mnm-designs.com/main.php

    Cheers
    M
     
    expat, Mar 25, 2004 IP
  3. hans

    hans Well-Known Member

    Messages:
    2,923
    Likes Received:
    126
    Best Answers:
    1
    Trophy Points:
    173
    #3
    yes
    yours may be better for high traffic domains ( last 4 mins )
    i adjusted my number of access_log lines to approx 1 hr
    so i am free to work and have a look every once in a while

    all those tools help to fine tune a domain and hence to feel in peace when taking off
    for a while.

    i also once used a mobile phone to check my domain from anywhere ( nokia 3650 )
    ..

    to make life easier - i just uploaded a very recent view of my last 777 lines to give you an exact idea of how it looks

    http://www.kriyayoga.com/logfiles/access.html

    and

    we see near the top of page
    2 errors 404 created by /index.html%20/%20_top ---> a copy and paste of a URL that included spaces from the search engine result page ..

    i will remove that page after a few days or a week - in the meantime may it help your decision making if you are unsure whether to install and try or leave ..

    peace of mind is important to enjoy life !
    :)

    have fun
     
    hans, Mar 25, 2004 IP
  4. expat

    #4
    Hi Hans,

    looks OK, I will check it out soon, maybe a nice one for some of my clients.

    Regarding on-line: you can adjust this to whatever you like. It comes standard with 5 minutes but it's easy to change to 60 min, how many lines to show etc.
    The nice thing is it uses mysql and cleans up after itself so you don't build a huge db. I run it on a php screen that refreshes every 3 minutes. Oh, and it does some crude but nice predictions and so on.
    Cheers
    M

    PS one of the few free tools that actually install cleanly and work out of the box (well, after adjusting the cfg file).
     
    expat, Mar 25, 2004 IP
  5. hans

    #5
    ok
    access.pl has no db - it reads directly access_log and after process
     
    hans, Mar 25, 2004 IP
  6. iconv

    iconv Well-Known Member

    Messages:
    189
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    108
    #6
    Using it on all my sites now, thanks for posting Hans. Very handy to catch rogue bots and track popular keywords used to find pages.
     
    iconv, Oct 21, 2004 IP
  7. hans

    #7
    happy to see you like it and find it useful
    so am i - again happy to have found someone to adapt it for me to the modified access_log format of my 1and1.com hosting

    i was missing it for the past weeks since my move to 1and1 hosting - now since a few days i have it again :)

    so if anyone else has same hosting as i have now
    there is a version available of access.pl that can handle the additional 2 log format fields

    the very same applies to the well known webalizer for access stats,
    a friend from india - C/C++ developer - has modified webalizer too, to make it work again for the modified access_log format of 1and1 hosting :)
     
    hans, Oct 21, 2004 IP
  8. netprophet

    netprophet Banned

    Messages:
    288
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #8
    gud stuffs...........

    :)
     
    netprophet, Oct 9, 2006 IP
  9. The_Phantom

    The_Phantom Well-Known Member

    Messages:
    470
    Likes Received:
    8
    Best Answers:
    0
    Trophy Points:
    110
    #9
    Error 404: NOT FOUND!
    Your browser cannot find the document corresponding to the URL you typed in.
     
    The_Phantom, Oct 17, 2006 IP
  10. manojcricfan

    manojcricfan Peon

    Messages:
    69
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #10
    Looks interesting but not opening for me.
     
    manojcricfan, Mar 29, 2011 IP
  11. hans

    #11
    sorry guys for the outdated link but that was a truly old thread from 2004
    but
    link corrected now

    currently I have no time to run the script on my servers - I simply have too much traffic and too much work to do otherwise

    but I used the script for years
    original access.pl and the 1and1 version

    the script no longer is maintained but still should work on most servers
    the newer version "whoisonline" = I never had time nor need to test since the original was working perfectly for me

    Good luck
     
    hans, Mar 29, 2011 IP
  12. GlennBridges

    GlennBridges Guest

    Messages:
    20
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #12
    Jeez, that is a boat load of information. Nevertheless very helpful. Cheers!
     
    GlennBridges, Jun 9, 2011 IP
  13. shteca

    shteca Active Member

    Messages:
    145
    Likes Received:
    6
    Best Answers:
    0
    Trophy Points:
    78
    #13
    That's a lot of writing. My eyes can't see so much text without closing.
     
    shteca, Sep 2, 2012 IP