I made a post about this on my blog a while ago but thought I'd share it with anyone who hadn't seen it... How often do you receive email spam? I'm pretty bloody lucky in that I never get more than 20 or 30 spam emails a day, but they still piss me off. Anyway, I was thinking about what to do in order to stop receiving this crap, and a thought occurred to me... Why not use it to my advantage! Basically, I set up rules in my Gmail account so that any emails containing certain phrases are forwarded to a specific Blogger address (the type you can send new blog posts to). Now, every time I receive a spam email, it gets forwarded to this Blogger account and creates me a new page of content! Try it for yourself... It's free, and a real easy way of upping your page count. Here is an example site that I forward a couple of my Google Alerts to... http://search-engine-news-and-info.blogspot.com/ This is the first time I've posted a live link to it so it hasn't been indexed yet, but it would be interesting to see how much of that content is included in a site: query.... Anyway, now I want to be able to rip out aff IDs from URLs that are posted in the spam I get sent and replace them with my own. I've been learning a bit of regex recently and that seems like a pretty good way to strip the URLs, but I was wondering what you guys would suggest using to forward the emails. Any input would be appreciated, and any script I knock up for this will be freely distributed. Cheers
That's an excellent idea. Not sure on the e-mail / scripting interaction. I've been wondering about it myself but can't for the life of me figure out how to let e-mail protocols interact with http. Copy-paste the e-mail content into a PHP form and hit submit. Do the reg exps and e-mail it on to blogger. That's no problemo. Getting the e-mail INTO PHP fully automatically is what I don't know.
That's a cool idea, I'll give it a go. With the amount of spam I get, it'd probably only take a week to have a 1,000-or-so-page site.
Let me know the Blogger e-mail address and I can forward you mine... A Google search throws up lots more now than back when I first looked at it: http://gvtulder.f2o.org/articles/incoming-mail/ Google search: "e-mail to php script"
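For anyone wondering what "getting the e-mail INTO a script" might look like, here is a minimal sketch. It is in Python rather than PHP, and it assumes the mail server can be set up to pipe each incoming message to a script's standard input. The function name `extract_text` is made up for illustration.

```python
from email import message_from_string


def extract_text(raw: str) -> tuple[str, str]:
    """Return (subject, plain-text body) from one raw e-mail message."""
    msg = message_from_string(raw)
    subject = msg.get("Subject", "")

    # For multipart mail, take the first text/plain part and skip the rest.
    parts = msg.walk() if msg.is_multipart() else [msg]
    for part in parts:
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            return subject, payload.decode(charset, "replace")
    return subject, ""


# In a pipe script this would be called roughly as:
#   subject, body = extract_text(sys.stdin.read())
```

Once the subject and body are plain strings, the regex work and the forward to the blog address can happen in the same script, with no copy-pasting into a form.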
Ahhhh... Nice search. As you know I am a bit of a PHP noob, but this *should* be a pretty simple thing to knock up. The way I see this working is I keep the emails in a folder on a server in text format, then get PHP to parse that folder and use regex to strip out the aff URLs. PHP then forwards it to the email addy I want, and voila - new post... Think that'd work? The biggest issue I can see cropping up is the number of different aff sites and IDs there are out there, and writing enough regex rules to be able to accommodate any new URLs that are found. Basically, if the script fails to parse and strip out the URLs, I'd rather the email was sent to a different account so I can look at it and write regex rules accordingly...
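A rough sketch of that folder-plus-fallback flow, in Python rather than PHP. The two addresses, the `AFF_RE` pattern and the function names are all placeholders invented for illustration. The actual sending step is left out, so this only decides where each saved email should go.

```python
import re
from pathlib import Path

URL_RE = re.compile(r"https?://\S+")
# Hypothetical rule: drop a query string that starts with an affiliate-style parameter.
AFF_RE = re.compile(r"\?(?:affid|aff|ref)=[^\s&]+.*$", re.IGNORECASE)

POST_ADDRESS = "posts@example.invalid"     # placeholder for the blog's mail-in address
REVIEW_ADDRESS = "review@example.invalid"  # placeholder for the manual-review inbox


def clean_body(body: str) -> tuple[str, bool]:
    """Strip known affiliate parameters; report whether every URL was handled."""
    all_handled = True

    def fix(match: re.Match) -> str:
        nonlocal all_handled
        url = match.group(0)
        if "?" not in url:
            return url              # no query string, nothing to strip
        cleaned = AFF_RE.sub("", url)
        if cleaned == url:
            all_handled = False     # a query string no rule covers yet
        return cleaned

    return URL_RE.sub(fix, body), all_handled


def route_folder(folder: Path) -> list[tuple[str, str, str]]:
    """Return (filename, destination, body) for each saved .txt email."""
    results = []
    for path in sorted(folder.glob("*.txt")):
        body = path.read_text(encoding="utf-8", errors="replace")
        cleaned, ok = clean_body(body)
        if ok:
            results.append((path.name, POST_ADDRESS, cleaned))
        else:
            # Send the untouched original for review so a new rule can be written.
            results.append((path.name, REVIEW_ADDRESS, body))
    return results
```

Anything that ends up at the review address shows which pattern is missing, which is the "write regex rules accordingly" loop described above.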
I'd just have it send the e-mail to a script and do it right there and then. No collecting business. Deleting aff IDs but keeping the base URL requires only one regexp (I normally use substr with a couple of positions found).
That's making the assumption that all the links point to the same aff broker, while in fact there could be hundreds of different aff programmes contained within the emails. This is why I may have to end up writing a new rule for each affiliate broker.
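If it does come down to one rule per broker, a lookup table keyed on hostname may be easier to maintain than a pile of separate regexes. This Python sketch assumes each broker carries the affiliate ID in a single query parameter. The hostnames, parameter names and IDs are invented for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical brokers: hostname -> (query parameter holding the ID, our own ID).
BROKER_RULES = {
    "broker-one.example": ("affid", "MY_ID_1"),
    "broker-two.example": ("ref", "MY_ID_2"),
}


def swap_affiliate_id(url: str):
    """Return the URL with our own ID, or None if no rule covers it."""
    parts = urlsplit(url)
    rule = BROKER_RULES.get(parts.hostname or "")
    if rule is None:
        return None                      # unknown broker
    param, my_id = rule
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    if not any(key == param for key, _ in pairs):
        return None                      # known broker, but the expected parameter is missing
    swapped = [(key, my_id if key == param else value) for key, value in pairs]
    return urlunsplit(parts._replace(query=urlencode(swapped)))
```

A `None` result is the signal to route the email somewhere for a manual look. Adding a new broker is then one more line in the table.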
Great idea for a free content stream, however, won't you get nobbled for posting duplicate content? From what I've seen around the web, a lot of sites already have the same spam posted on them.
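Just write a regex to search for http:// and take out everything up to the next /, i.e. something like /http:\/\/([^\/]+)\// (using [^\/]+ rather than .* so the match stops at the first slash instead of running on to the last one on the line). That's not been tested but might work.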
Yes what jlawrence says. You don't have to go for individual regexps. One that just recognizes URLs and deletes everything after the TLD will do.
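As a sketch of that single-regex approach, here it is in Python rather than PHP. It assumes links are plain http:// or https:// URLs with no spaces in them. The name `strip_to_base` is made up for illustration.

```python
import re

# Capture the scheme and host, then swallow the rest of the URL up to the next whitespace.
BASE_URL_RE = re.compile(r"(https?://[^/\s]+)\S*", re.IGNORECASE)


def strip_to_base(text: str) -> str:
    """Replace every URL in the text with just its scheme and host."""
    return BASE_URL_RE.sub(r"\1/", text)
```

For example, `strip_to_base("Buy at http://shop.example/item?aff=123 today")` gives `"Buy at http://shop.example/ today"`. One thing to watch: a URL with no path that is followed directly by punctuation (say, a full stop at the end of a sentence) will have that punctuation treated as part of the host.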
OK... I was thinking of leaving all the other URLs in and only taking out the aff ones... Your way is MUCH easier... I think I'll be doing that. RE: dup content... Not sure... I'll have to wait and see... I'll keep you informed.
I have no problem donating my spam, although I get a LOT of it so it may be too much for you. Of course the only downside is if a personal email ends up in the filter and it's posted for everyone to see.
That's ok... I'll PM you two an address each and we can have a comp to see who can get the highest page count the quickest... That'll be fun at first anyway. Once I work out whether or not this will actually work, I'll get the script working... As a side point, how difficult would it be for me to make sure that duplicate emails aren't sent... Would I need to store a record of it in a DB or something? That could get out of hand if the number of emails is really high.
That's a legitimate issue. Every spam I get usually arrives in up to 10 copies, as they hit every possible email address and I have a catch-all in place.
Any suggestions for possible solutions, guys? Like I say, I thought maybe putting the copy in a DB then cross-referencing would work, but if you're trying to put say 10,000 emails a day into the DB (that number is high but I've seen it in the past) then I don't imagine it being all that long until the DB falls over. Saying that though, you only need 1 copy of each email to be able to distinguish the similarities... I guess HTML formatting should be stripped out too...
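One way to keep the DB small is to store a short fingerprint of each email instead of the email itself. The sketch below is Python with SQLite, not PHP. It assumes duplicates are identical once tags, case and whitespace are normalised. The table and function names are invented for illustration.

```python
import hashlib
import re
import sqlite3
from html import unescape

TAG_RE = re.compile(r"<[^>]+>")
WS_RE = re.compile(r"\s+")


def fingerprint(body: str) -> str:
    """Hash the body after stripping HTML tags, collapsing whitespace and lowercasing."""
    text = unescape(TAG_RE.sub(" ", body))
    text = WS_RE.sub(" ", text).strip().lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the table of fingerprints seen so far."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS seen (digest TEXT PRIMARY KEY)")
    return conn


def is_new(conn: sqlite3.Connection, body: str) -> bool:
    """Record the body's fingerprint; return True only the first time it is seen."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO seen (digest) VALUES (?)", (fingerprint(body),)
    )
    conn.commit()
    return cur.rowcount == 1
```

Each row is a 64-character digest, so even 10,000 emails a day adds very little data. The catch is that this only spots exact copies after normalisation. Spam that varies a few words per recipient would still get through and would need some fuzzier comparison.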