Does anyone have suggestions for grabbing data off a site that is horribly coded? For example, one listing has 5 rows, another has 6, and another has 8, but nothing in the markup helps identify the fields. The site is PHP and HTML, but everything is statically coded. So for example we have something like thing1 <br> thing2<BR>. What would be the quickest way to get this into an Excel sheet or a MySQL database? That is my goal. It is on a local network, so processing power will not be an issue.
It depends on how horribly coded it is. I will need to see the code! Can you PM me or show us this monstrous code?
It looks like this: <div id="stuff"> name<BR> item<BR> name<BR> item<BR> item2<BR> name<BR> item<BR> name<BR> item<BR> name<BR> item<BR> </div>
So you would like to grab all of these names and items and organize them better? Maybe grab everything and store it in a database? That can be done fairly easily. You could collect the contents of the div with a regular expression (e.g. '/<div id="stuff">(.*)<\/div>/siU'), then, if what is inside the div really is divided into groups of 2, explode the result into an array and take the values 2 by 2. But I still haven't clearly understood what you need.
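Here is a minimal sketch of that idea, written in Python rather than PHP (re.split stands in for explode). The function name extract_fields is made up for illustration, and the sample markup is the snippet posted above. It also shows why taking values 2 by 2 drifts out of step as soon as a listing has an extra item.

```python
import re

# Sample markup copied from the post above.
html = ('<div id="stuff"> name<BR> item<BR> name<BR> item<BR> item2<BR> '
        'name<BR> item<BR> name<BR> item<BR> name<BR> item<BR> </div>')

def extract_fields(page, div_id="stuff"):
    """Return the text fragments separated by <br> tags inside the given div.

    Hypothetical helper: assumes the div has exactly the attribute id="..."
    and contains no nested divs.
    """
    pattern = r'<div id="' + re.escape(div_id) + r'">(.*?)</div>'
    match = re.search(pattern, page, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        return []
    # Split on <br>, <BR>, <br/> and similar, then drop empty fragments.
    parts = re.split(r'<br\s*/?>', match.group(1), flags=re.IGNORECASE)
    return [p.strip() for p in parts if p.strip()]

fields = extract_fields(html)

# Naive "2 by 2" pairing: correct only while every name has exactly one item.
pairs = list(zip(fields[0::2], fields[1::2]))
```

With the sample above, the third pair comes out as ("item2", "name"), so the pairing is wrong from the extra item onward. Something else is needed to tell names from items.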
Yes, you are correct about storing it in a MySQL database. And no, sometimes the listing changes to 4 or 5 options, and that is not marked in the code. If it were marked in the code like <div class="option1">something</div> <div class="option2"> something 2</div>, I would know how to grab it and store it. But when the values are not enclosed in tags, they don't store correctly. I just really need them to store correctly. That is my goal.
If I understand the problem correctly, there is nothing that differentiates a 'name' from an 'item'. Is there any relation between the values of the names and the items? If so, maybe something can be matched with a regex: the number of letters, a capitalized first letter, etc. Anything to latch on to?
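If some rule like that does exist, the grouping could be sketched as below (in Python). The function group_by_name and the is_name predicate are hypothetical. The predicate is where the site-specific rule would go, and the one used here only matches the literal placeholder text from the sample.

```python
def group_by_name(fields, is_name):
    """Group a flat list of fields into (name, [items]) tuples.

    is_name is a caller-supplied predicate (an assumption: the real rule
    depends on what the site's data looks like). Every field that is not
    a name is attached to the most recent name seen.
    """
    groups = []
    for field in fields:
        if is_name(field):
            groups.append((field, []))
        else:
            if not groups:
                # Items that appear before any name get a None heading.
                groups.append((None, []))
            groups[-1][1].append(field)
    return groups

# Flat list as it would come out of the <br>-separated div.
fields = ["name", "item", "name", "item", "item2", "name", "item"]

# Placeholder rule: treat the literal text "name" as a name.
groups = group_by_name(fields, lambda text: text == "name")
```

This way a listing can have any number of items under a name, and nothing depends on fixed positions.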
But after you regex it, you only have it sorted out, right? Then you still need to write a script to add the values to a database or an Excel file.
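That last step is usually the short part. A rough sketch in Python, with two loud assumptions: sqlite3 stands in for MySQL so the example runs without a server (a MySQL driver would use similar parameterized inserts), and the table name listing and its columns are made up. For Excel, a CSV file is used because Excel opens it directly.

```python
import csv
import io
import sqlite3

# Grouped data in (name, [items]) form; the values are placeholders.
groups = [("name", ["item"]), ("name", ["item", "item2"])]

# Flatten to one row per (name, item) pair.
rows = [(name, item) for name, items in groups for item in items]

# CSV output. An in-memory buffer is used here; for a real file, use
# open("listing.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "item"])
writer.writerows(rows)
csv_text = buffer.getvalue()

# Database output. sqlite3 is a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listing (name TEXT, item TEXT)")
conn.executemany("INSERT INTO listing (name, item) VALUES (?, ?)", rows)
conn.commit()
row_count = conn.execute("SELECT COUNT(*) FROM listing").fetchone()[0]
```

Storing one row per name and item pair means a listing with 4 or 5 options needs no extra columns, just more rows.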