Call me a grumpy git, but the avalanche of threads and posts written in almost unreadable English is becoming a problem. I have to wade through so many to find any decent content that I can imagine it giving a bad impression to newcomers, and it may even cause some to leave. I know I don't spend as much time on here as I did years ago because of the decline in relevance and quality. Can anything be done about this? I fear for the reputation and overall quality of this great forum. It's so obvious that a lot of people post for the sake of posting, and nine times out of ten they have poor English. Sitepoint doesn't have this problem to the same extent; what are they doing right?
We've actually built a ton of internal stuff that stops most of it (I know it doesn't seem like it, but you should see what happens when we turn it off). We are always looking for more options and creative ways to cut it down further, so if you have ideas we haven't thought of, let us know. As for Sitepoint, I suspect they have far fewer people trying to spam. The bigger you are, the more spammers try to spam, and we saw a 10x increase when we became the biggest. It makes sense from a spammer's point of view: if you are going to spend the same amount of time spamming a site, you might as well spam the one that gets the most eyes...
Thanks for your reply, Shawn. I know you and your team are doing tonnes to address these problems, and I do appreciate all the work and your customer service. I'm not sure what more could be done. Maybe, looking at it from an AI/Prolog perspective, there could be software that does lexical analysis, such as checking for [noun] [verb] constructs in English sentences, but this is far from simple. It's not just the outright spam; a lot of non-spam posters simply have a poor standard of English.
What if the minimum character limit for a post were increased? It's 10 characters right now. Try it with 10 or 20 words instead.
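A rough illustration of why a word minimum is stricter than a character minimum (a sketch only; the function names and thresholds are hypothetical, not the forum's actual code):

```python
def passes_char_limit(text: str, min_chars: int = 10) -> bool:
    # Current-style rule: at least `min_chars` characters,
    # ignoring leading/trailing whitespace (an assumption).
    return len(text.strip()) >= min_chars


def passes_word_limit(text: str, min_words: int = 10) -> bool:
    # Proposed rule: at least `min_words` whitespace-separated words.
    return len(text.split()) >= min_words


post = "thanks for sharing"
print(passes_char_limit(post))  # True: 18 characters
print(passes_word_limit(post))  # False: only 3 words
```

A short throwaway reply clears a 10-character bar easily but not a 10-word one, although a word count says nothing about the quality of the English itself.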
It's actually something I was looking at (a full-blown backend grammar checker for new posts), although it doesn't look like they are viable at this point (too many false positives and false negatives). Google's Prediction API looks pretty good, but I haven't had time to really dig into it (it requires "training").
While I agree that a grammar-checking algorithm would be awesome, I imagine it would eat an unreasonable amount of system resources for each post. Instead, why not make it so that if a post gets a certain number of "down votes" or "bad reputation," it gets hidden or deleted? Also, I didn't realize until now that it's the actual owner responding to these posts... Shawn, any chance you want to come party in Vegas? Beers on you? lol
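The vote-threshold idea could look something like this (a minimal sketch, assuming a hypothetical `Post` record and a made-up threshold of 5 net down votes; the forum's real moderation system is not described here):

```python
from dataclasses import dataclass

HIDE_THRESHOLD = 5  # hypothetical: net down votes before a post is hidden


@dataclass
class Post:
    post_id: int
    up_votes: int = 0
    down_votes: int = 0
    hidden: bool = False

    def record_vote(self, is_up: bool) -> None:
        if is_up:
            self.up_votes += 1
        else:
            self.down_votes += 1
        # Recompute on every vote, so later up votes can un-hide a post.
        self.hidden = (self.down_votes - self.up_votes) >= HIDE_THRESHOLD


post = Post(post_id=1)
for _ in range(5):
    post.record_vote(is_up=False)
print(post.hidden)  # True
```

Using net votes rather than raw down votes is one way to keep a popular but contested post from disappearing; hiding instead of deleting also leaves room for a moderator to review it.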
That's awesome, Shawn; maybe in a few years they'll be viable. I like ShrinkCSS's idea about rep. Putting more of the work on us users may be a good option. Coupling it with checking a user's location (India, China, etc.) could help too.
It's not so much about the resources required; rather, the systems I've looked at just aren't that good. Re: Vegas, sure, I'll be right over. Yeah... I have a really big list of possible options that I go through when I have time. Often it's not just a matter of looking at how well something works in theory; I have to build all sorts of stuff just to test it. There is some stuff that I already know I'm going to build and use live; it's just a matter of having the time to build it. But nothing will ever be good enough in my mind, so I'm always looking at additional options and ideas.
That's great, Shawn. I look forward to those future developments. Are you a programmer/developer by trade?