so here's how just under 2 hours of my day went today

13:15 - came up with an idea for a web archiving website
13:24 - registered dayarchive.net and pointed it to the server
13:29 - knocked up a quick script to generate the site I wanted
13:41:19 - ran the script to generate the site
13:50:32 - site generated, started doing the css/templates
14:22:04 - completed the site templates and css
14:23:39 - regenerated the site.. site made
14:29 - added the site to google webmaster tools and submitted to google + yahoo add url
14:30 - pinged a few blog ping servers (quick sketch of that after this post)
14:30:07 - yahoo slurp visits (head call)
14:30:44 - moreoverbot visits
14:30:49 - moreoverbot grabs homepage
14:30:49 - moreoverbot grabs rss
14:30:54 - googlebot grabs rss
14:30:55 - googlebot grabs homepage
14:33:05 - first scraper comes in via google blog search
14:35:47 - googlebot grabs its first 2nd level page
14:37:14 - blogpulse live grabs rss + homepage
14:39:44 - technorati bot grabs rss + homepage
14:39:54 through 14:45:58 - googlebot grabs 82 pages and indexes them
14:44:11 - another scraper
14:52:53 - first visitor from google.com with a search for "Turista movie spoilers" (on the front page for this already)
14:56:59 through 15:06:36 - R6_feedFetcher harvests the site (meanwhile....)
15:00:15 - second real visitor comes in with the referrer blocked, but a real user
15:02:56 - third real visitor comes in from a google.com search for "lou mcfadden winesburg"
15:06:00 - second visitor views a second page
15:07:35 - fourth visitor lands from a google.com search for "julia louis-dreyfus bares all"

it's now another 2 hours later and the site's just hit 38 visits from google and my first $1 in adsense

i guess the lesson is (and this is purely to myself).. I've spent 8 months developing a "perfect" bh system and it's still not finished - in 1 hour I used scripts I'd made months ago to get a site into the top tens on 2/3 word phrases simply by keeping it simple (and weirdly with no seo, just neat free-for-all content) - I could have made 500 sites in those 8 months. I guess you just can't (and possibly shouldn't) automate everything - the human touch is what makes it work!

hope you're all having a good holiday!

blacknet
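for the curious, "pinged a few blog ping servers" is just the standard weblogUpdates XML-RPC call - a minimal sketch in python, where the server list and site details are placeholders rather than the exact ones used here:

```python
# Minimal sketch of the standard weblogUpdates.ping XML-RPC call.
# Server urls and site details are placeholders, not the exact ones used.
import xmlrpc.client

PING_SERVERS = [
    "http://rpc.pingomatic.com/",
    "http://rpc.weblogs.com/RPC2",
]

SITE_NAME = "dayarchive"
SITE_URL = "http://dayarchive.net/"

for server in PING_SERVERS:
    try:
        proxy = xmlrpc.client.ServerProxy(server)
        # weblogUpdates.ping(blog name, blog url) is the basic ping method
        result = proxy.weblogUpdates.ping(SITE_NAME, SITE_URL)
        print(server, result)
    except Exception as exc:
        print(server, "failed:", exc)
```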
first two lines mate...

13:15 - came up with an idea for a web archiving website
13:24 - registered dayarchive.net and pointed it to the server

registered.... ^^^^

going to add in the archiving in a mo and maybe monetize with something better than adsense! (however that's not my niche)
sure can.. it's not - I made it myself

update: traffic was going up and up - added in another level with another 380+ pages and now the big g has started de-indexing me, or should I say not re-crawling.. I'm currently trying a few things to see if I can force them to come back..

update 00:21 GMT: not sure what difference it made, but g didn't index or re-crawl - however in the last hour both msn and ask/teoma have done full spiders, with the initial entry point being the new url of the rss feed - yahoo also hit it 7 times over the past 45 minutes..

the change.. changed the "link"s in the rss to point to more rss feeds rather than actual pages..
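to make that last change concrete, the idea is that each rss item's link points at another (dated) feed url rather than at the article page itself - a rough sketch, with the urls and helper invented purely for illustration:

```python
# Rough sketch of the idea: each rss <item>'s <link> points at another
# (dated) feed url instead of at the actual article page.
# Urls and field names here are illustrative, not the real site's.
from xml.sax.saxutils import escape

def rss_item(title, day):
    feed_url = f"http://dayarchive.net/{day}.xml"   # dated feed, not the page
    return (
        "<item>"
        f"<title>{escape(title)}</title>"
        f"<link>{escape(feed_url)}</link>"
        f'<guid isPermaLink="false">{escape(feed_url)}#{escape(title)}</guid>'
        "</item>"
    )

print(rss_item("example hot term", "2007-12-27"))
```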
another update!

I decided that adding in the extra pages was a bit of overkill - and it seemed to be more of a negative, especially with over 1000 generated in 2 days (on a new domain).

Also discovered that gbot seemed to confuse itself with the daily feeds, constantly checking /2007-12-27.xml for updates rather than /rss (so watch out for that in your logs guys).

Further, I realised that although the rss was blog-like, the front page wasn't, and there was perhaps overkill on links (100+) - that whole link to word count ratio killed it I assume; as did the big g - mass de-indexing occurred (on blog search, and only a vague index started on the main serps).

Furthermore! the whole "update once a day" thing was doing me no favours at all.

So I've changed the whole site!! changes made:

page subjects are now gathered from multiple sources and checked to find "new trends" on the net - and actually verify them! the process is cron'd every 2 minutes
content retrieval happens every 2 minutes as well, with pages pre-generated
publishing of a single item occurs randomly sometime between 2 and 20 minutes (rough sketch of the idea after this post)
frontpage, rss and archives are all updated at every "publish"
the front page now lists the latest 25 in a blog stylee (as does the main rss) - with a full daily archive available

so.. how's it working.. well the changes have been live for just under an hour and:

the big g, technorati, moreover and spherebot have all hit the index and /rss on every publish
on the first publish g went looking at the old rss feed, so I 301'd it to the new rss
2nd update, g went to the correct rss and homepage
3rd update, the big g realised the site was changing and sent in the proper bot to do a full crawl of everything in the rss

I must stress this is all within 32 minutes of changing the site. and more to the point, G now classes it as a real site and not a blog, so blog search = nothing whilst the normal serps are http://www.google.com/search?hl=en&q=site:dayarchive.net&sa=N&tab=bw

you can see the changes by checking these two links:
new: http://dayarchive.net/
old: http://dayarchive.net/2007-12-28

if anybody wants any clarification just reply and I'll clarify whatever you want. hope nobody minds, just keeping a report here, if not just for my own benefit [keeps me focussed]
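the random publish timing is simpler than it sounds - one way to do it (this is an illustrative sketch only, not the actual scripts; the queue/state files and publish() helper are made-up names) is a job cron'd every 2 minutes that only publishes once a randomly chosen delay since the last publish has passed:

```python
# Illustrative sketch of "publish one item at a random interval between
# 2 and 20 minutes", driven by a job cron'd every 2 minutes.
# File names and the publish() helper are made up for the example.
import json, random, time
from pathlib import Path

STATE = Path("publish_state.json")   # remembers last publish + next delay
QUEUE = Path("queue")                # directory of pre-generated pages

def publish(item: Path) -> None:
    # placeholder: the real thing would move the pre-generated page into
    # the live site and regenerate frontpage/rss/archives
    print("publishing", item.name)
    item.unlink()

def run_once() -> None:
    state = json.loads(STATE.read_text()) if STATE.exists() else {"last": 0, "delay": 0}
    now = time.time()
    if now - state["last"] < state["delay"]:
        return  # not time yet
    items = sorted(QUEUE.glob("*.html"))
    if not items:
        return  # nothing ready to publish
    publish(items[0])
    # pick the next random gap: somewhere between 2 and 20 minutes
    state = {"last": now, "delay": random.randint(2 * 60, 20 * 60)}
    STATE.write_text(json.dumps(state))

if __name__ == "__main__":
    run_once()   # crontab: */2 * * * * python publish_tick.py
```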
Cool script and nice idea for a site. Some noob questions if you don't mind.... In the header you have a google ad but it doesn't say 'ads by...', is that a glitch? Is that allowed? If so how? Also, I'm curious as to where the feeds are coming from.
yeah sure thing:

the google ads are "referral" ads, text link only - much nicer, one thinks! they're also embedded in the text right near "more" to hopefully get some extra clicks in a legal manner.

the feeds, well they're not actually all feeds - it's a cross reference between a few yahoo api's, google apps, my own db's and some general rss feeds filtered by time. there are 700+ scripts working together to find the data, 3 to build the site, and 2 to display it lol.

edit: the pages displayed aren't actually any feeds, they're pre-generated static pages, which were gen'd when the new "hot term" is found - cron's great (rough sketch of that idea below)
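for anyone curious how the "pre-generate a static page when a hot term is found" part can hang together, here's a rough sketch - the term source, template and paths are all invented for illustration, nothing like the actual 700-odd scripts:

```python
# Rough illustration of pre-generating a static page the moment a new
# "hot term" shows up. The template and paths are invented for the
# example - the real system cross-references several sources.
import html
from pathlib import Path

SEEN = Path("seen_terms.txt")
QUEUE = Path("queue")           # the publish job picks pages up from here

TEMPLATE = """<html><head><title>{title}</title></head>
<body><h1>{title}</h1>{body}</body></html>"""

def already_seen(term: str) -> bool:
    return SEEN.exists() and term in SEEN.read_text().splitlines()

def pregenerate(term: str, snippets: list[str]) -> None:
    if already_seen(term):
        return
    body = "".join(f"<p>{html.escape(s)}</p>" for s in snippets)
    page = TEMPLATE.format(title=html.escape(term), body=body)
    QUEUE.mkdir(exist_ok=True)
    (QUEUE / f"{term.replace(' ', '-')}.html").write_text(page)
    with SEEN.open("a") as f:
        f.write(term + "\n")

# e.g. called from the cron'd trend checker:
pregenerate("example hot term", ["snippet one...", "snippet two..."])
```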
another update: the big g was only doing token index hits, so I changed to doing a proper rpc post ping to them (sketch below) - sure enough the big g came and spidered again, hitting the rss, getting the changes and spidering them. indexed within 2 minutes of the crawl - nice

update: 48 minutes later, and after another 5 updates and 3 g index hits, another crawl and index! something's working
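by "proper rpc post ping" I mean an actual XML-RPC POST that includes the feed url, rather than a simple get-style ping. A sketch of what that looks like - the endpoint and extendedPing signature below are the commonly used ones from that era, so treat them as assumptions rather than the exact call used here:

```python
# Sketch of a "proper" XML-RPC POST ping, including the feed url via
# weblogUpdates.extendedPing. The endpoint below is the commonly used
# blog search ping url from that era - treat it as an assumption.
import xmlrpc.client

ENDPOINT = "http://blogsearch.google.com/ping/RPC2"

proxy = xmlrpc.client.ServerProxy(ENDPOINT)
result = proxy.weblogUpdates.extendedPing(
    "dayarchive",                     # site name
    "http://dayarchive.net/",         # site url
    "http://dayarchive.net/",         # url of the page that changed
    "http://dayarchive.net/rss",      # feed url
)
print(result)
```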
update: now the big g is crawling and indexing every page within 30 seconds of publish. joy - cracked it?
Cool. Thanks for the answers. Does your script have the ability to filter more specific results rather than just the search trends?
Well yes, the scripts collectively can do pretty much anything, it's just a matter of putting them together in a way that works for what you want.

Bet I don't, and if I do, I bet the site's back in within 24 hours - I've already been down the delisting route (within 36 hours), counteracted it, and this is three days later!

edit: that's a bit big headed actually, I hope I don't get de-listed, and will do everything I can to prevent and counter it; at the end of the day it's out of my total control though *shrugs*

Update

well I just left the system to work away by itself overnight, and sure enough google's been hitting on every publish, and doing a full update every 2-4 publishes (roughly every 45 minutes), same with moreover, sphere scout, technorati etc. google's now got all 169 published pages indexed and is passing traffic through quite frequently. Should be an interesting day today

unique visitors (not inc spiders or myself):
Fri 12/28/2007: 169
Sat 12/29/2007: 77
Sun 12/30/2007: 233 (so far)

remember, friday was launch one, which got listed great then delisted over saturday; saturday late on through to sunday is the new method.

edit update

on closer inspection I found that the big g hadn't actually done a full index for over 2.5 hours, so I got my head to thinking why - the final reasons were:
1: the site had been running out of content to publish, thus publishing less frequently
2: I was using a loop to generate the index page, which was making it take 1.8 seconds to generate

actions taken:
1: broadened the filters to allow an extra 3 phrases per 5 minutes to be checked and verified - this means there's always something to publish, without overkilling it and making it too obvious where the db sources are coming from and getting done for dup content
2: made a script to pre-publish the index page on every site update, so it's now static html (with a little scripting twist) - a rough sketch of the idea is below
3: also removed some google ads, as they were killing load times in normal browsers

results
the big g is back to crawling and indexing every 40 (rough) minutes (every third publish)
traffic's had a major boost as well thanks to a few long tails
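the "pre-publish the index page" fix is just rendering the homepage once at publish time instead of looping over the db on every hit - a minimal sketch, reusing the same made-up queue/published layout as the earlier sketches:

```python
# Minimal sketch of pre-rendering the index page to static html at publish
# time, rather than building it with a loop on every request.
# Paths and the item layout are invented for the example.
import html
from pathlib import Path

PUBLISHED = Path("published")        # one .html file per published item
INDEX = Path("htdocs/index.html")    # served directly as static html

def rebuild_index(limit: int = 25) -> None:
    # newest first, latest 25 in a blog-style list on the front page
    items = sorted(PUBLISHED.glob("*.html"),
                   key=lambda p: p.stat().st_mtime, reverse=True)[:limit]
    links = "\n".join(
        f'<li><a href="/{p.name}">{html.escape(p.stem.replace("-", " "))}</a></li>'
        for p in items
    )
    INDEX.write_text(f"<html><body><ul>\n{links}\n</ul></body></html>")

# called once at the end of every publish, so page load is just a static file
rebuild_index()
```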
Another Day; Day 4

Had moments of paranoia over the past 24, mainly because I noticed that with all the changes to the system I'd managed to gain myself a few repeat pages; coupled with the fact I've been monitoring logs line by line to see exactly what's going on, and the reaction to each change.

To cut it short, gbot was hitting 3 day old pages that were almost exact duplicates of pages the system had just published about 20 minutes earlier. After g did this 3 times in a row, and the site's indexed page count dropped by 3, I feared the worst; reacted and deleted 630 rss feeds and mod_rewrite'd a quick fix to 404 everything prior to the change. An hour later and gbot hadn't appeared back - oh hell - checked the serps and the indexed pages count had gone up by 230 (with the old pages) - waited another hour and g was both frequently crawling the new pages and doing a slow cache-forming crawl of the old pages (all getting 404'd). I took a gamble, removed the rules to allow all the old content back, and removed a couple of the duplicates manually; seems to have paid off! Really though, either possible action was a gamble..

Other than that, yesterday was all about monitoring and making sure things are going as planned; as this is only a tiny practical test of something far bigger that's been months (years) in the making.

Stats Update (unique visitors, spiders and myself removed):
28/12/2007 169
29/12/2007 77
30/12/2007 336
31/12/2007 184 (so far, it's early..)

pages in g: 327
latest page in index: 22 minutes ago

happy new year
Well done, blacknet! There's a lot to be said for spontaneity, for just going out there and implementing an idea the minute it occurs. Good for you. Wishing you continued success with it.
many thanks Isaa

a quick update: the site just hit the 290 unique mark for the day a few minutes ago, and the system has published 113 articles today with a further 115 articles ready to publish. This would indicate to me that it could be producing content twice as quickly as it is.

here's a new year's gamble then: let's double up the publish speed and see what happens with the big g.. in fact, perhaps a vague increase, gradually ramping up over two days, would be better, and some alternating article posters..? I'll let you know what I decide and how it pans out.. 336 uniques to beat!

ps: slow delisting is happening - which is GREAT!!! as the site is being de-listed as a blog, and listed as a "real" site in the proper serps instead - I've managed to get it to flick between the two twice, so I think I've finally figured out what the "technical" difference between a blog and a "site" is (as far as g is concerned, at this time) - probably already out of date :'(
New Year Update!

First Off.. HAPPY NEW YEAR GUYS - it's 1AM here in the uk

Stats: I'd got my figures wrong previously, I'd been generating reports with the time offset wrong (by an hour) *doh* - so here's the daily update and the correct figures:
28/12/2007 177
29/12/2007 69
30/12/2007 350
31/12/2007 413

that's all uniques with my own ip's and bots removed. only source of traffic is serps - fully automated system.

well that's it! it's going well.. also managed to get me a nice little domain "usyou.com" for free thanks to register.com of all people (and it has backlinks in yahoo and gblogs + webmaster info in google webmaster tools). the best bit i guess is the phrase "us you" has 580,000,000 pages in g, and thankfully that means I don't need to seo for anything, seeing as almost every paragraph of text ever written will have the words "us" and "you" in there

so hopefully by next week we'll be on round 2

ps: the goal of all this: I want at least half of cnet's "tv.com" traffic, if not all of it
Quick Update.. the site's been left to its own devices, all traffic is organic and coming in nicely.. ads are being clicked!

stats are as follows:
28/12/2007 177
29/12/2007 69
30/12/2007 350
31/12/2007 413
01/01/2008 399
02/01/2008 360
03/01/2008 374
04/01/2008 494
05/01/2008 498
06/01/2008 514