Hey all. Long-time lurker, first-time poster. Apologies for the lack of an intro message; if you really want me to bore you with that, I'll do it retrospectively.

I have something of a predicament, and essentially just want to see whether this job is going to be as laborious as it looks. I've recently taken on the job of tackling a couple of thousand pages of content. I'm not allowed to mess around with the IA too much, but what I do have to do is find relationships between the pages. As it stands there is absolutely heehaw in the metadata that could be used to link them up, so writing something to look at that is out of the question.

What's the best way to tackle this? Without any tools, it'll be a case of broadly identifying unifying themes between site sections, then using site-specific search to note down which pages contain similar info. A fairly laborious task, as you can imagine. Is there a better way to do this? Help! Thanks in advance!
It sounds like what you need is something to index the pages the way a search-engine crawler does. This open-source project on SourceForge might be a starting point: http://sourceforge.net/projects/phpcrawl/ Alternatively, run a Google search on "web page indexing software". That will give you a mixed bag of offerings, both free and paid.
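Once you have the page text crawled, you don't necessarily need metadata to find related pages; content similarity can do a rough first pass for you. Here's a minimal sketch using TF-IDF weighting and cosine similarity in plain Python (no external libraries). The page names and text are made-up placeholders, and a real run would feed in your crawled page bodies instead:

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split on non-letter characters (crude but serviceable).
    return re.findall(r"[a-z']+", text.lower())

def tfidf_vectors(docs):
    # docs: mapping of page id -> raw page text.
    # Returns a sparse TF-IDF vector (dict of term -> weight) per page.
    counts_by_page = {pid: Counter(tokenize(text)) for pid, text in docs.items()}
    n = len(docs)
    df = Counter()  # document frequency per term
    for counts in counts_by_page.values():
        df.update(counts.keys())
    vectors = {}
    for pid, counts in counts_by_page.items():
        total = sum(counts.values())
        vectors[pid] = {
            term: (c / total) * math.log(n / df[term])
            for term, c in counts.items()
        }
    return vectors

def cosine(a, b):
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical pages standing in for crawled content.
pages = {
    "checking-accounts": "how to open a checking account and manage fees",
    "savings-accounts": "how to open a savings account and compare interest",
    "branch-hours": "opening hours and holiday closures for every branch",
}
vecs = tfidf_vectors(pages)
score = cosine(vecs["checking-accounts"], vecs["savings-accounts"])
```

Scoring every page pair this way and sorting by similarity gives you a candidate list of "related pages" to review by hand, which beats eyeballing a couple of thousand pages from scratch. For a site that size you'd likely want a proper library (e.g. scikit-learn's TfidfVectorizer) rather than this stdlib-only version, but the idea is the same.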