starting to get consistent results :D

Discussion in 'General Business' started by blacknet, Dec 27, 2007.

  1. #1
    so here's how just under 2 hours of my day went today :)

    13:15 - came up with an idea for a web archiving website
    13:24 - registered dayarchive.net and pointed to server
    13:29 - knocked up a quick script to generate the site I wanted
    13:41:19 - ran the script to generate the site
    13:50:32 - site generated, started doing the css/template
    14:22:04 - completed site templates and css
    14:23:39 - regenerated site

    site made ..

    14:29 - added the site to google webmaster tools - and submitted to google + yahoo add url
    14:30 - pinged a few blog ping servers
    14:30:07 - yahoo slurp visits (head call)
    14:30:44 - moreoverbot visits
    14:30:49 - moreoverbot grabs homepage
    14:30:49 - moreoverbot grabs rss
    14:30:54 - googlebot grabs rss
    14:30:55 - googlebot grabs homepage
    14:33:05 - first scraper comes in via google blog search
    14:35:47 - googlebot grabs its first 2nd level page
    14:37:14 - blog pulse live grabs rss + homepage
    14:39:44 - technorati bot grabs rss + homepage
    14:39:54 through 14:45:58 - googlebot grabs 82 pages and indexes them :)
    14:44:11 - another scraper
    14:52:53 - first visitor from google.com with a search for: "Turista movie spoilers" (on front page for this already)
    14:56:59 through 15:06:36 - R6_feedFetcher harvests site (meanwhile....)
    15:00:15 - second real visitor comes in from (refer blocked) but real user
    15:02:56 - third real visitor comes in from google.com search for "lou mcfadden winesburg"
    15:06:00 - second visitor views a second page
    15:07:35 - fourth visitor lands from google.com search for "julia louis-dreyfus bares all"

    it's now another 2 hours later and the site's just hit 38 visits from google and my first $1 in adsense :)

    i guess the lesson is (and this is purely to myself).. I've spent 8 months developing a "perfect" bh system and it's still not finished - in 1 hour I used scripts I'd made months ago to get a site into the top ten on 2/3-word phrases simply by keeping it simple (and weirdly with no seo, just neat free-for-all content). I could have made 500 sites in those 8 months. I guess you just can't (and possibly shouldn't) automate everything - the human touch is what makes it work!

    hope you're all having a good holiday!

    blacknet
     
    blacknet, Dec 27, 2007 IP
  2. Lethal7

    Lethal7 Active Member

    Messages:
    2,262
    Likes Received:
    56
    Best Answers:
    0
    Trophy Points:
    90
    #2
    nice, wats ur url :p
     
    Lethal7, Dec 27, 2007 IP
  3. blacknet

    blacknet Active Member

    Messages:
    709
    Likes Received:
    16
    Best Answers:
    2
    Trophy Points:
    70
    #3
    first two lines mate...
    13:15 - came up with an idea for a web archiving website
    13:24 - registered dayarchive.net and pointed to server

    registered.... ^^^^

    going to add in the archiving in a mo and maybe monetize with something better than adsense! (however that's not my niche)
     
    blacknet, Dec 27, 2007 IP
  4. sudarshannus

    sudarshannus Peon

    Messages:
    1,431
    Likes Received:
    144
    Best Answers:
    0
    Trophy Points:
    0
    #4
    very good idea...Hope u can tell us what script u used?
     
    sudarshannus, Dec 27, 2007 IP
  5. Lethal7

    #5
    lmao, i just made myself look so stupid!

    cant believe i missed that!
     
    Lethal7, Dec 27, 2007 IP
  6. blacknet

    #6
    sure can.. it's nothing off the shelf - made it myself :)

    update:
    traffic was going up and up - added in another level with another 380+ pages and now the big g has started de-indexing me :eek:

    or should i say not re-crawling.. I'm currently trying a few things to see if I can force them to come back..

    update: 00:21 GMT
    not sure which one did it, but g didn't index or re-crawl - however in the last hour both msn and ask/teoma have done full spiders with the initial entrypoint being the new url of the rss feed - yahoo also hit it 7 times over the past 45 minutes..
    the change.. changed the "link"s in the rss to point to more rss feeds rather than actual pages..
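    for anyone wanting the shape of that change, here's a rough sketch (python purely for illustration - the real scripts and urls differ; feed/page urls below are made up):

```python
import xml.etree.ElementTree as ET

def build_feed(items, link_to_feeds=True):
    """Build a minimal RSS 2.0 feed. When link_to_feeds is True, each
    item's <link> points at a further daily feed instead of the HTML page."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "dayarchive"
    ET.SubElement(channel, "link").text = "http://example.com/"
    for it in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = it["title"]
        # the tweak: hand the spider another feed url, not the page url
        target = it["feed_url"] if link_to_feeds else it["page_url"]
        ET.SubElement(item, "link").text = target
    return ET.tostring(rss, encoding="unicode")

# hypothetical item - daily feed url instead of the archive page
xml = build_feed([{"title": "2007-12-27 archive",
                   "feed_url": "http://example.com/2007-12-27.xml",
                   "page_url": "http://example.com/2007-12-27.html"}])
```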
     
    blacknet, Dec 28, 2007 IP
  7. blacknet

    #7
    another update!

    I decided that adding in the extra pages was a bit of overkill - and seemed to be more of a negative, especially with over 1000 generated in 2 days (on a new domain).

    Also discovered that gbot seemed to confuse itself with daily feeds, constantly checking /2007-12-27.xml for updates rather than /rss (so watch out for that in your logs guys)

    Further, I realised that although the rss was blog-like.. the front page wasn't, and was perhaps overkill on links (100+) - that whole link to word count ratio killed it, I assume; the big g evidently agreed, as mass de-indexing occurred (on blog search, and a vague index started on the main serps)
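    that link-to-word ratio idea can be sanity-checked with a few lines - a rough sketch only (the 0.2 threshold is my guess, not anything g has published):

```python
from html.parser import HTMLParser

class LinkDensity(HTMLParser):
    """Count <a> tags versus total words on a page."""
    def __init__(self):
        super().__init__()
        self.links = 0
        self.words = 0
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += 1
    def handle_data(self, data):
        self.words += len(data.split())

def link_ratio(html):
    p = LinkDensity()
    p.feed(html)
    return p.links / max(p.words, 1)

# a page that is mostly links scores high
page = "<p>word word</p>" + '<a href="/x">link</a>' * 10
too_linky = link_ratio(page) > 0.2
```

    running a check like this over the front page before publishing would have flagged the 100+ link problem early.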

    Furthermore! the whole "update once a day" thing was doing me no favours at all.

    So I've changed the whole site!! changes made:

    page subjects are now gathered from multiple sources and checked to find "new trends" on the net - and actually verify them! the process is cron'd every 2 minutes

    content retrieval happens every 2 minutes as well, with pages pre-generated.

    publishing of a single item occurs randomly sometime between 2 and 20 minutes. frontpage, rss and archives are all updated at every "publish".

    the front page now lists the latest 25 in a blog style (as does the main rss) - with a full daily archive available.
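    as a rough sketch of that publish cadence (names and the placeholder rebuild list are illustrative, not my actual scripts):

```python
import random

PUBLISH_MIN, PUBLISH_MAX = 2 * 60, 20 * 60  # 2-20 minutes, as described

def next_publish_delay():
    """Pick a random delay so publishes don't land on a fixed schedule."""
    return random.randint(PUBLISH_MIN, PUBLISH_MAX)

def publish_one(queue):
    """Pop one pre-generated page and rebuild everything that must stay
    fresh on every publish: front page, rss, daily archive."""
    if not queue:
        return None
    item = queue.pop(0)
    rebuilt = ["frontpage", "rss", "archive"]  # placeholder steps
    return item, rebuilt

queue = ["page-a", "page-b"]
published = publish_one(queue)
```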

    so.. how's it working? well, the changes have been live for just under an hour and..

    well the big g, technorati, moreover and spherebot have all hit index and /rss on every publish :)

    on the first publish g went looking at the old rss feed, so I 301'd it to the new rss
    2nd update g went to the correct rss and homepage
    3rd update, big g realised the site was changing and sent in the proper bot to do a full crawl of everything in the rss

    I must stress this is all within 32 minutes of changing the site. and more to the point g now classes it as a real site and not a blog, so blog search = nothing whilst normal serps are http://www.google.com/search?hl=en&q=site:dayarchive.net&sa=N&tab=bw

    you can see the changes by checking these two links:
    new: http://dayarchive.net/
    old: http://dayarchive.net/2007-12-28

    if anybody wants any clarification just reply and I'll clarify whatever you want.

    hope nobody minds, just keeping a report here, if not just for my own benefit [keeps me focussed]
     
    blacknet, Dec 29, 2007 IP
  8. eruct

    eruct Well-Known Member

    Messages:
    1,189
    Likes Received:
    49
    Best Answers:
    0
    Trophy Points:
    108
    #8
    Cool script and nice idea for a site.
    Some noob questions if you don't mind....
    In the header you have a google ad but it doesn't say 'ads by...', is that a glitch? Is that allowed? If so how?
    Also, I'm curious as to where the feeds are coming from.
     
    eruct, Dec 29, 2007 IP
  9. blacknet

    #9
    yeah sure thing:
    google ads are "referral" ads, text link only - much nicer, one thinks! they're also embedded in the text right near "more" to hopefully get some extra clicks in a legal manner.

    the feeds, well they're not actually all feeds - it's a cross reference between a few yahoo api's, google apps, my own db's and some general rss feeds filtered by time. there are 700+ scripts working together to find the data, and 3 to build the site, 2 to display lol.

    edit: the pages displayed aren't actually any feeds, they're pre-generated static pages, which were gen'd when the new "hot term" was found. - cron's great :)
     
    blacknet, Dec 29, 2007 IP
  10. blacknet

    #10
    another update:
    big g was only doing token index hits, so I changed to doing a proper rpc post ping to them; sure enough the big g came and spidered again, hitting the rss, getting the changes and spidering them.

    indexed within 2 minutes of crawl - nice :)


    update: 48 minutes later, and after another 5 updates and 3 g index hits, another crawl and index! something's working :p
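    for the curious, the rpc ping is just the standard weblogUpdates.ping XML-RPC call - a sketch of building the request body (site name/url are placeholders; you'd POST the body to the engine's ping endpoint yourself):

```python
import xmlrpc.client

def ping_payload(site_name, site_url):
    """Build a weblogUpdates.ping XML-RPC request body. Sending it is an
    HTTP POST of this body (Content-Type: text/xml) to the ping endpoint."""
    return xmlrpc.client.dumps((site_name, site_url),
                               methodname="weblogUpdates.ping")

body = ping_payload("dayarchive", "http://example.com/")
```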
     
    blacknet, Dec 29, 2007 IP
  11. blacknet

    #11
    update: now the big g is crawling and indexing every page within 30 seconds of publish :) joy - cracked it?
     
    blacknet, Dec 29, 2007 IP
  12. eruct

    #12
    Cool. Thanks for the answers.

    Does your script have the ability to filter more specific results rather than just the search trends?
     
    eruct, Dec 29, 2007 IP
  13. gnatfish

    gnatfish Peon

    Messages:
    86
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #13
    Thanks for posting all this, it's very interesting!
     
    gnatfish, Dec 29, 2007 IP
  14. tonyrocks

    tonyrocks Active Member

    Messages:
    1,574
    Likes Received:
    50
    Best Answers:
    0
    Trophy Points:
    88
    #14
    Nice work! I bet you get de-listed in a matter of 3 days :)
     
    tonyrocks, Dec 29, 2007 IP
  15. blacknet

    #15
    Well yes, the scripts collectively can do pretty much anything, it's just a matter of putting them together in a way that works for what you want :)

    Bet I don't, and if I do bet the site's back in within 24 hours :) - already been down the delisting route (within 36 hours), counteracted it, and this is three days later!
    edit: that's a bit big headed actually, i hope i don't get de-listed, and will do everything I can to prevent and counter it, end of the day it's out of my total control though *shrugs*

    Update
    well I just left the system to work away by itself over night, and sure enough google's been hitting on every publish, and doing a full update every 2-4 publishes (roughly every 45 minutes), same with moreover, sphere scout, technorati etc.
    google's now got all 169 published pages indexed and is passing traffic through quite frequently. Should be an interesting day today :)

    unique visitors (not inc spiders or myself):
    Fri 12/28/2007 : 169
    Sat 12/29/2007: 77
    Sun 12/30/2007: 233 (so far)

    remember, friday was launch one, which got listed great then delisted over saturday; saturday late on through to sunday is the new method.

    edit update
    on closer inspection I found that big g hadn't actually done a full index for over 2.5 hours, so I got my head to thinking why - the final reasons were
    1: site had been running out of content to publish, thus publishing less frequently
    2: i was using a loop to generate the index page, which was taking 1.8 seconds to generate

    actions taken:
    1: broadened filters to allow an extra 3 phrases per 5 minutes to be checked and verified, this means that there's always something to publish, without overkilling and making it too obvious where db sources are coming from and getting done for dup content
    2: made a script to pre-publish the index page on every site update, thus it's now static html (with a little scripting twist)
    3: also removed some google ads, as they were killing load times in normal browsers
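    point 2 is the old "render at publish time, not request time" trick - a minimal sketch (file layout and the 25-item cap are illustrative):

```python
import tempfile
from pathlib import Path

def render_index(latest_items):
    """Render the front page once, at publish time, instead of looping
    over the db on every request (the 1.8-second problem above)."""
    rows = "\n".join(f"<li>{title}</li>" for title in latest_items)
    return f"<html><body><ul>\n{rows}\n</ul></body></html>"

def publish_index(latest_items, out_dir):
    html = render_index(latest_items[:25])  # latest 25, as on the site
    Path(out_dir, "index.html").write_text(html)  # now served as static html
    return html

html = publish_index(["story one", "story two"], tempfile.mkdtemp())
```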

    results
    big g is back to crawling and indexing roughly every 40 minutes (every third publish)

    traffic's had a major boost as well thanks to a few long tails
     
    blacknet, Dec 30, 2007 IP
  16. blacknet

    #16
    Another Day; Day 4,

    Had moments of paranoia over the past 24, mainly because I noticed that with all the changes to the system I'd managed to gain myself a few repeat pages; coupled with the fact I've been monitoring logs line by line to see exactly what's going on, and the reaction to each change.

    To cut it short, gbot was hitting 3 day old pages that were almost exact duplicates of pages the system had just published about 20 minutes earlier; after g did this 3 times in a row and the site's indexed page count dropped by 3, I feared the worst; reacted and deleted 630 rss feeds and mod_rewrite'd a quick fix to 404 everything prior to the change.

    An hour later and gbot hadn't appeared back - oh hell - checked the serps and the indexed pages count had gone up by 230 (with the old pages) - waited another hour and g was both frequently crawling the new pages and doing a slow cache-forming crawl of the old pages (all getting 404'd).

    I took a gamble and removed the rules, allowing all the old content back, and removed a couple of the duplicates manually; seems to have paid off! Really though, either possible action was a gamble..
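    a simpler guard against the repeat-page problem (hypothetical - not what I actually ran) would be to fingerprint the normalised text before publishing, so near-identical pages collide:

```python
import hashlib

def fingerprint(text):
    """Hash a normalised version of the page text so near-identical
    republished pages produce the same key."""
    normalised = " ".join(text.lower().split())
    return hashlib.md5(normalised.encode()).hexdigest()

seen = set()

def should_publish(text):
    """Skip anything we've effectively published before."""
    fp = fingerprint(text)
    if fp in seen:
        return False
    seen.add(fp)
    return True

first = should_publish("Some   Hot Topic article body")
repeat = should_publish("some hot topic  article body")
```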

    Stats
    Other than that yesterday was all about monitoring and making sure things are going as planned; as this is only a tiny practical test of something far bigger that's been months (years) in the making :)

    Stats Update (unique visitors, spiders and myself removed):
    28/12/2007 169
    29/12/2007 77
    30/12/2007 336
    31/12/2007 184 (so far, its early..)

    pages in g: 327
    latest page in index: 22 minutes ago

    happy new year
     
    blacknet, Dec 31, 2007 IP
  17. isaa

    isaa Well-Known Member

    Messages:
    315
    Likes Received:
    7
    Best Answers:
    0
    Trophy Points:
    110
    #17
    Well done, blacknet! There's a lot to be said about spontaneity, for just going out there and implementing an idea the minute it occurs. Good for you. Wishing you continued success with it.
     
    isaa, Dec 31, 2007 IP
  18. blacknet

    #18
    many thanks Isaa :)

    a quick update, the site just hit the 290 unique mark for the day a few minutes ago, and the system has published 113 articles today with a further 115 articles ready to publish; this would indicate to me that it could be publishing content twice as quickly as it is.

    here's a new year's gamble then, let's double up publish speed and see what happens with the big g.. in fact, perhaps a vague increase gradually rising over two days would be better, and some alternating article posters..?

    I'll let you know what I decide and how it pans out.. 336 uniques to beat!

    ps: slow delisting is happening - which is GREAT!!! as the site is being de-listed as a blog, and listed as a "real" site in the proper serps instead - I've managed to get it to flick between the two twice, so I think I've finally figured out what the "technical" difference between a blog and a "site" is (as far as g is concerned, at this time) - probably already out of date :'(
     
    blacknet, Dec 31, 2007 IP
  19. blacknet

    #19
    New Year Update!

    First Off.. HAPPY NEW YEAR GUYS it's 1AM here in the uk :)

    Stats:
    I'd got my figures wrong previously - I'd been generating reports with the time offset wrong (by an hour) *doh* so here's the daily update and correct figures.
    28/12/2007 177
    29/12/2007 69
    30/12/2007 350
    31/12/2007 413

    that's all uniques with my own ip's and bots removed. only source of traffic is serps - fully automated system.

    well that's it! it's going well.. also managed to get me a nice little domain "usyou.com" for free thanks to register.com of all people (and it has backlinks in yahoo and gblogs + webmaster info in google webmaster tools).

    the best bit i guess is the phrase "us you" has 580,000,000 pages in g, and thankfully that means I don't need to seo for anything, seeing as almost every paragraph of text ever will have the words "us" and "you" in there ;):eek:

    so hopefully by next week we'll be on round 2 :p

    ps: goal of all this. I want at least half of cnet's "tv.com" traffic, if not all of it
     
    blacknet, Dec 31, 2007 IP
  20. blacknet

    #20
    Quick Update..

    site's been left to its own devices, all traffic is organic and coming in nicely.. ads are being clicked! stats are as follows:

    28/12/2007 177
    29/12/2007 69
    30/12/2007 350
    31/12/2007 413
    01/01/2008 399
    02/01/2008 360
    03/01/2008 374
    04/01/2008 494
    05/01/2008 498
    06/01/2008 514
     
    blacknet, Jan 7, 2008 IP