Best option for grabbing data off a big, horribly coded HTM/PHP/HTML site

Discussion in 'PHP' started by xbat, Apr 3, 2013.

  1. #1
    Does anyone have any suggestions for grabbing data off a site that is horribly coded? For example, the site has 5 rows on one listing, 6 on another, and 8 on another, but nothing in the markup helps identify them. The pages are PHP and HTML, but everything is statically coded.

    So, for example, we have something like:

    thing1 <br>
    thing2<BR>


    What would be the best and quickest way to get this into an Excel sheet or a MySQL database? That is my goal. Also, it is on a local network, so processing power will not be an issue.
     
    xbat, Apr 3, 2013 IP
  2. xbat

    xbat Well-Known Member

    Messages:
    326
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    105
    #2
    bump...
     
    xbat, Apr 5, 2013 IP
  3. YoGem

    YoGem Active Member

    Messages:
    676
    Likes Received:
    8
    Best Answers:
    2
    Trophy Points:
    90
    #3
    It depends on how badly it is coded. I would need to see the code! Can you PM me or show us this monstrous code?
     
    YoGem, Apr 5, 2013 IP
  4. xbat

    #4

    It looks like this:
    <div id="stuff">
    name<BR>
    item<BR>
    name<BR>
    item<BR>
    item2<BR>
    name<BR>
    item<BR>
    name<BR>
    item<BR>
    name<BR>
    item<BR>
    </div>
     
    xbat, Apr 6, 2013 IP
  5. YoGem

    #5
    So you would like to grab all these items and names and organize them better? Maybe grab everything and store it in a database?

    This can be done fairly easily. Collect the contents of the div with a regular expression (e.g. '/<div id="stuff">(.*)<\/div>/siU'); then, if what is inside the div really is divided into groups of two, explode the result into an array and take the values two by two.
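    A minimal sketch of that idea, written in Python for illustration (in PHP the same steps would map onto preg_match and a split on the <BR> tags). The sample markup is shortened from the thread, and the variable names are assumptions, not anything from the real site. Note how the naive two-by-two pairing breaks as soon as one listing has an extra line:

```python
import re

# Sample markup shortened from the thread; a real page would be read from a file or URL.
html = """<div id="stuff">
name<BR>
item<BR>
name<BR>
item<BR>
item2<BR>
</div>"""

# Non-greedy, case-insensitive match with "." spanning newlines
# (roughly what the /siU flags do in PHP's PCRE).
match = re.search(r'<div id="stuff">(.*?)</div>', html, re.IGNORECASE | re.DOTALL)
inner = match.group(1) if match else ""

# Split on any spelling of <br> and drop empty fragments.
lines = [part.strip() for part in re.split(r'<br\s*/?>', inner, flags=re.IGNORECASE)]
lines = [part for part in lines if part]

# Naive pairing: only valid if every record really is exactly two lines.
pairs = [tuple(lines[i:i + 2]) for i in range(0, len(lines), 2)]
```

    Here the last element of pairs holds only "item2", which is the symptom of a listing that is not strictly two lines long.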

    But I still haven't understood clearly what you need :)
     
    YoGem, Apr 6, 2013 IP
  6. xbat

    #6
    Yes, you are correct about storing it in a MySQL database. And no, it is not always pairs: sometimes a listing changes to 4 or 5 options, and that is not marked in the code. If it were marked in the code, like <div class="option1">something</div> <div class="option2">something 2</div>, I would know how to grab it and store it. But when the values are not enclosed in tags, it doesn't store correctly. I just need it to store correctly. That is my goal.
     
    xbat, Apr 7, 2013 IP
  7. #7
    If I understand the problem correctly, nothing in the markup differentiates a 'name' from an 'item'.
    Is there any pattern in the text itself that separates names from items? If so, maybe something can be regexed: the number of letters, a capitalized first letter, and so on. Is there anything to latch on to?
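    As a rough sketch of that approach (in Python, for illustration): once the lines are split out, a single test function decides whether a line starts a new record, and everything else is attached to the current record. The capitalized-first-letter rule, the function names, and the sample values are all hypothetical; the real rule depends on what the site's data looks like.

```python
def is_name(line):
    # Hypothetical rule: a name starts with an uppercase letter.
    return line[:1].isupper()

def group_records(lines, is_name):
    """Start a new record at each name; attach following lines as its items."""
    records = []
    for line in lines:
        # The very first line always opens a record, even if the rule misses it.
        if is_name(line) or not records:
            records.append({"name": line, "items": []})
        else:
            records[-1]["items"].append(line)
    return records

# Made-up sample data with a varying number of items per name.
lines = ["Widget", "red", "Gadget", "blue", "large", "Gizmo", "green"]
records = group_records(lines, is_name)
```

    Because the rule lives in one small function, it can be swapped for a regex or a length check without touching the grouping loop.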
     
    kutchbhi, Apr 7, 2013 IP
  8. xbat

    #8
    Right now that's looking like my best option.
     
    xbat, Apr 16, 2013 IP
  9. nickardo

    nickardo Active Member

    Messages:
    81
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    70
    #9
    But after you regex it, you only have it sorted out, right? You still need to write a script to add the results to a database or Excel file.
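    That last step can be small. Below is a sketch in Python, assuming the data has already been grouped into records with a name and a variable-length list of items (a hypothetical structure, not something the site provides). It writes CSV, which Excel opens directly; rows are padded so every row has the same number of columns. Loading into MySQL would follow the same loop with parameterized INSERT statements instead of writerow.

```python
import csv
import io

# Hypothetical records, as they might look after grouping names with their items.
records = [
    {"name": "Widget", "items": ["red"]},
    {"name": "Gadget", "items": ["blue", "large"]},
]

def write_csv(records, handle):
    # Widest record decides how many item columns the sheet needs.
    width = max((len(r["items"]) for r in records), default=0)
    writer = csv.writer(handle)
    writer.writerow(["name"] + [f"item{i + 1}" for i in range(width)])
    for r in records:
        padding = [""] * (width - len(r["items"]))
        writer.writerow([r["name"]] + r["items"] + padding)

# An in-memory buffer keeps the sketch self-contained; a real script would
# pass a file opened with open("out.csv", "w", newline="").
buffer = io.StringIO()
write_csv(records, buffer)
csv_text = buffer.getvalue()
```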
     
    nickardo, Apr 17, 2013 IP