Is this possible? How would you make it?

Discussion in 'Programming' started by yyyk9, May 24, 2008.

  1. #1
    A PHP script.
    It will examine a page and return only the text.

    And tell which category it would belong to.

    * Arts & Entertainment
    * Shopping
    * Sports & Recreation
    * News
    * Business & Industrial
    * Health
    * Home & Garden
    * Culture & Society
    * Technology
    * Travel
    * Reference
    * Games
    * Employment & Recruiting
    * Education
    * Finance
    * Hobbies
    * Law
    * Parenting & Family
    * People & Relationships
    * Real Estate
    * Automotive

    Is it possible to create a PHP script that will do this, and if so, what does it have to do?
    Does it examine keywords to determine which category the page belongs to, and how would it know this? How would you do it if there are hundreds of thousands of keywords?

    Thanks.
     
    yyyk9, May 24, 2008
  2. Phase

    Phase Active Member

    #2
    Even though your post is still somewhat vague, I can tell you that if there are "hundreds of thousands" of entries, you will need some kind of database (the most common being MySQL). As for how, learn how. There are thousands of PHP tutorials, and PHP documents every one of its functions on its website (php.net/functionnamehere).

    Telling you "how" would mean giving you source code. Either learn to do it yourself or look on PHP script listing websites and find something similar.
     
    Phase, May 24, 2008
  3. allaboutgeo

    allaboutgeo Peon

    #3
    In PHP you can use file_get_contents() to get the page source code. Then use a regular expression to parse the HTML and extract the relevant information.
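
    A minimal sketch of that step, assuming the page is plain HTML. The helper name html_to_text() and the example URL are made up for illustration. It uses a regular expression only to drop script and style blocks, then leans on PHP's built-in strip_tags() for the rest, since regular expressions alone tend to break on real-world markup:

```php
<?php
// Hypothetical helper: reduce an HTML document to its visible text.
function html_to_text(string $html): string
{
    // Drop <script> and <style> blocks, whose contents are not page text.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);
    // Put a space before every tag so adjacent elements don't run together.
    $html = str_replace('<', ' <', $html);
    // Remove the remaining tags, then decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Usage (the URL is a placeholder, not a real target):
// $html = file_get_contents('http://example.com/');
// if ($html !== false) {
//     echo html_to_text($html);
// }
```

    file_get_contents() returns false when the fetch fails, so it is worth checking the result before passing it on.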
     
    allaboutgeo, May 24, 2008
  4. yyyk9

    yyyk9 Peon

    #4
    Thanks allaboutgeo.

    I understand the basics. But how would it be possible to automatically sort a page into one of those categories?
    Would you need a giant database of keywords?
     
    yyyk9, May 26, 2008
  5. dmccarthy

    dmccarthy Peon

    #5
    Determining which category a page belongs to would be quite difficult. If you could do it, you would probably be working for Google or the like.
     
    dmccarthy, May 26, 2008
  6. MatthewDP

    MatthewDP Member

    #6
    I'd create a dictionary of keywords for each category. Download the contents of the page, count how many instances of each keyword appear on it, and then assign the page to the category with the most matches.
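
    Roughly, that could look like the sketch below. The keyword lists and the function name categorize() are invented for illustration. A real dictionary would be far larger and would more likely sit in a database than in an array:

```php
<?php
// Hypothetical keyword lists, one per category (three categories shown).
$dictionary = [
    'Sports & Recreation' => ['football', 'league', 'score', 'team'],
    'Finance'             => ['bank', 'loan', 'stock', 'interest'],
    'Travel'              => ['flight', 'hotel', 'airport', 'tour'],
];

// Return the category whose keywords occur most often in $text,
// or null when no keyword matches at all.
function categorize(string $text, array $dictionary): ?string
{
    // Split the lower-cased text into words and tally how often each occurs.
    $words  = str_word_count(strtolower($text), 1);
    $counts = array_count_values($words);

    $best = null;
    $bestScore = 0;
    foreach ($dictionary as $category => $keywords) {
        // Sum the occurrences of every keyword in this category.
        $score = 0;
        foreach ($keywords as $keyword) {
            $score += $counts[$keyword] ?? 0;
        }
        // Keep the highest-scoring category; on a tie the earlier one wins.
        if ($score > $bestScore) {
            $best = $category;
            $bestScore = $score;
        }
    }
    return $best;
}

echo categorize('The team won the league after a late score.', $dictionary), "\n";
```

    This only matches single, exact words. Multi-word phrases, plurals, and keywords shared between categories would need extra handling, and raw counts favor categories with longer keyword lists.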
     
    MatthewDP, May 26, 2008
  7. yyyk9

    yyyk9 Peon

    #7
    Where would you get a lot of relevant keywords? Thanks!
     
    yyyk9, May 26, 2008