YP Scraper for SQL DB?

Discussion in 'Databases' started by indyonline, Jun 26, 2009.

  1. #1
    Does anybody know how to make YP Scraper work with phpLD? It lets me format the output for phpLD and import it, and the import finds the links, but it does not put them in the right category and does not arrange the information where it should be. It puts the address, city, state, and ZIP all in the description.
    Any help would be appreciated. Thanks.
     
    indyonline, Jun 26, 2009
  2. Social.Network

    #2
    I am not familiar with the Yellow Pages Scraper script, but I can try to assist. Does the script generate a phpLD-specific .sql file? If so, can you post a couple of the INSERT statements? Also, which version of phpLD are you using?
     
    Social.Network, Jun 26, 2009
  3. indyonline

    #3
    It's the most recent version of phpLD.

    -- PHPLD Generator Invoked For YellowScraper: numbers taken out
    INSERT INTO PLD_LINK ( `TITLE`, `URL`, `STATUS`, `VALID`, `RECPR_VALID`, `CATEGORY_ID`, `DESCRIPTION` ) VALUES ( '"+ listing.name +"', '#', '2', '1', '1', 'high schools', 'Unknown Street, Unknown City, Unknown State Unknown ZIP, Unkown Telephone Number' );
    INSERT INTO PLD_LINK ( `TITLE`, `URL`, `STATUS`, `VALID`, `RECPR_VALID`, `CATEGORY_ID`, `DESCRIPTION` ) VALUES ( 'Arlington High School', '#', '2', '1', '1', 'high schools', '4825 N Arlington Ave, Indianapolis, IN 46226, (317) 226-2345' );
    INSERT INTO PLD_LINK ( `TITLE`, `URL`, `STATUS`, `VALID`, `RECPR_VALID`, `CATEGORY_ID`, `DESCRIPTION` ) VALUES ( 'Arsenal Technical High School', '#', '2', '1', '1', 'high schools', 'Unknown Street, Unknown City, Unknown State Unknown ZIP, (317) 693-5420' );
    INSERT INTO PLD_LINK ( `TITLE`, `URL`, `STATUS`, `VALID`, `RECPR_VALID`, `CATEGORY_ID`, `DESCRIPTION` ) VALUES ( 'Arsenal Technical High School', '#', '2', '1', '1', 'high schools', '1500 E Michigan St, Indianapolis, IN 46201, (317) 693-5300' );
    INSERT INTO PLD_LINK ( `TITLE`, `URL`, `STATUS`, `VALID`, `RECPR_VALID`, `CATEGORY_ID`, `DESCRIPTION` ) VALUES ( 'ben Davis High School', '#', '2', '1', '1', 'high schools', 'Unknown Street, Unknown City, Unknown State Unknown ZIP, (317) 243-5523' );
    INSERT INTO PLD_LINK ( `TITLE`, `URL`, `STATUS`, `VALID`, `RECPR_VALID`, `CATEGORY_ID`, `DESCRIPTION` ) VALUES ( 'Ben Davis University High School', '#', '2', '1', '1', 'high schools', '1155 S High School Rd, Indianapolis, IN 46241, (317) 227-1300' );
    INSERT INTO PLD_LINK ( `TITLE`, `URL`, `STATUS`, `VALID`, `RECPR_VALID`, `CATEGORY_ID`, `DESCRIPTION` ) VALUES ( 'Broad Ripple High School', '#', '2', '1', '1', 'high schools', '1115 Broad Ripple Ave, Indianapolis, IN 46220, (317) 693-5700' );
    INSERT INTO PLD_LINK ( `TITLE`, `URL`, `STATUS`, `VALID`, `RECPR_VALID`, `CATEGORY_ID`, `DESCRIPTION` ) VALUES ( 'Cathedral High School', '#', '2', '1', '1', 'high schools', '5225 E 56th St, Indianapolis, IN 46226, (317) 542-1481' );
    INSERT INTO PLD_LINK ( `TITLE`, `URL`, `STATUS`, `VALID`, `RECPR_VALID`, `CATEGORY_ID`, `DESCRIPTION` ) VALUES ( 'Covenant Christian High School', '#', '2', '1', '1', 'high schools', '7525 W 21st St, Indianapolis, IN 46214, (317) 390-0202' );
    INSERT INTO PLD_LINK ( `TITLE`, `URL`, `STATUS`, `VALID`, `RECPR_VALID`, `CATEGORY_ID`, `DESCRIPTION` ) VALUES ( 'Decatur Central High School', '#', '2', '1', '1', 'high schools', '5251 Kentucky Ave, Indianapolis, IN 46221, (317) 856-5288' );
    INSERT INTO PLD_LINK ( `TITLE`, `URL`, `STATUS`, `VALID`, `RECPR_VALID`, `CATEGORY_ID`, `DESCRIPTION` ) VALUES ( 'Emmerich Manual High School', '#', '2', '1', '1', 'high schools', '2405 Madison Ave, Indianapolis, IN 46225, (317) 226-2200' );
    INSERT INTO PLD_LINK ( `TITLE`, `URL`, `STATUS`, `VALID`, `RECPR_VALID`, `CATEGORY_ID`, `DESCRIPTION` ) VALUES ( 'Franklin Central High School', '#', '2', '1', '1', 'high schools', '6215 S Franklin Rd, Indianapolis, IN 46259, (317) 862-6646' );
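The first generated statement above still has the literal text '"+ listing.name +"' as its TITLE. That looks like a template expression the generator never filled in, and if imported as-is it would become a link titled with that text. Below is a minimal Python sketch for screening such rows before import. It assumes one INSERT per line in the format shown, with simple single-quoted values ('' for an embedded quote, no backslash escapes). The helper names are hypothetical and are not part of YP Scraper or phpLD.

```python
import re

# One single-quoted SQL string literal; '' inside it is an escaped quote.
# Assumption: the generator never uses backslash escapes.
SQL_LITERAL = re.compile(r"'((?:[^']|'')*)'")


def parse_values(insert_line: str) -> list[str]:
    """Return the string literals from the VALUES ( ... ) part of one INSERT line."""
    _, _, values_part = insert_line.partition("VALUES")
    return [m.group(1).replace("''", "'") for m in SQL_LITERAL.finditer(values_part)]


def is_unfilled_template(values: list[str]) -> bool:
    """Hypothetical check: TITLE (the first value) still holds the generator's template text."""
    return bool(values) and "listing.name" in values[0]


def screen(lines: list[str]) -> tuple[list[str], list[str]]:
    """Split INSERT lines into (keep, skip); skipped rows still carry template text."""
    keep: list[str] = []
    skip: list[str] = []
    for line in lines:
        if not line.lstrip().upper().startswith("INSERT"):
            continue  # e.g. the "-- PHPLD Generator ..." comment line
        if is_unfilled_template(parse_values(line)):
            skip.append(line)
        else:
            keep.append(line)
    return keep, skip
```

Passing it the lines of the generated .sql file returns the statements to keep and the ones to fix or drop. The regex is deliberately simple and is not a general SQL parser.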
     
    indyonline, Jun 26, 2009
  4. Social.Network

    #4
    The address, city, state, ZIP, and phone are enclosed in ONE pair of quotes, so they are treated as ONE value, and that value is inserted into the DESCRIPTION column, as you described. I have a couple more questions, but I will send you a PM.
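Because the whole address arrives as one quoted string, it lands in DESCRIPTION as a single value. If the pieces are needed separately (for example, to reformat the text before it goes into DESCRIPTION), they can be split back out. Below is a minimal Python sketch. It assumes every DESCRIPTION follows the `street, city, state ZIP, phone` pattern seen in the generated rows, and that missing fields use the scraper's "Unknown ..." placeholder text (one row has the misspelling "Unkown"). `split_description` is a hypothetical helper, not part of YP Scraper or phpLD.

```python
from typing import Optional

# The scraper writes "Unknown ..." (and once "Unkown ...") when a field is missing.
PLACEHOLDER_PREFIXES = ("Unknown", "Unkown")


def _clean(value: str) -> Optional[str]:
    """Map the scraper's placeholder text to None."""
    return None if value.startswith(PLACEHOLDER_PREFIXES) else value


def split_description(desc: str) -> dict[str, Optional[str]]:
    """Split '<street>, <city>, <state> <zip>, <phone>' into separate fields.

    Assumes exactly four comma-separated parts, as in the generated rows;
    anything else (e.g. a street containing a comma) raises ValueError.
    """
    parts = [p.strip() for p in desc.split(",")]
    if len(parts) != 4:
        raise ValueError(f"unexpected DESCRIPTION format: {desc!r}")
    street, city, state_zip, phone = parts
    if state_zip.startswith(PLACEHOLDER_PREFIXES):
        state, zip_code = None, None  # "Unknown State Unknown ZIP"
    else:
        state, _, zip_code = state_zip.rpartition(" ")  # "IN 46226" -> "IN", "46226"
    return {
        "street": _clean(street),
        "city": _clean(city),
        "state": state,
        "zip": zip_code,
        "phone": _clean(phone),
    }
```

For the Arlington High School row, this yields street `4825 N Arlington Ave`, city `Indianapolis`, state `IN`, ZIP `46226`, and phone `(317) 226-2345`. Rows that only have placeholders return None for those fields.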
     
    Social.Network, Jun 26, 2009
  5. Social.Network

    #5
    I think the reported behavior is a limitation of phpLD. See the structure of the PLD_LINK table below: the company name is mapped to the TITLE column and everything else is mapped to the DESCRIPTION column. PLD_LINK has no separate address, city, state, ZIP, or phone columns, so DESCRIPTION is the catch-all for the company information. I think this is working as designed, sorry.

    --
    -- Table structure for table `PLD_LINK`
    --

    CREATE TABLE `PLD_LINK` (
    `ID` int(11) NOT NULL auto_increment,
    `TITLE` varchar(255) NOT NULL default '',
    `DESCRIPTION` longtext,
    `URL` varchar(255) NOT NULL default '',
    `CATEGORY_ID` int(11) NOT NULL default '0',
    `RECPR_URL` varchar(255) default NULL,
    `RECPR_REQUIRED` tinyint(4) NOT NULL default '0',
    `STATUS` int(11) NOT NULL default '0',
    `VALID` tinyint(4) NOT NULL default '0',
    `RECPR_VALID` tinyint(4) NOT NULL default '0',
    `OWNER_ID` int(11) default NULL,
    `OWNER_NAME` varchar(255) default NULL,
    `OWNER_EMAIL` varchar(255) default NULL,
    `OWNER_NOTIF` int(11) NOT NULL default '0',
    `DATE_MODIFIED` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
    `DATE_ADDED` timestamp NOT NULL default '0000-00-00 00:00:00',
    `HITS` int(11) NOT NULL default '0',
    `LAST_CHECKED` datetime default NULL,
    `RECPR_LAST_CHECKED` datetime default NULL,
    `PAGERANK` int(11) NOT NULL default '-1',
    `RECPR_PAGERANK` int(11) NOT NULL default '-1',
    `FEATURED_MAIN` int(11) NOT NULL default '0',
    `FEATURED` int(11) NOT NULL default '0',
    `EXPIRY_DATE` datetime default NULL,
    `NOFOLLOW` tinyint(4) NOT NULL default '0',
    `PAYED` int(11) NOT NULL default '-1',
    `LINK_TYPE` int(11) NOT NULL default '0',
    `IPADDRESS` varchar(15) default NULL,
    PRIMARY KEY (`ID`),
    KEY `PLD_LINK_TITLE_IDX` (`TITLE`),
    KEY `PLD_LINK_URL_IDX` (`URL`),
    KEY `PLD_LINK_CATEGORY_ID_IDX` (`CATEGORY_ID`),
    KEY `PLD_LINK_STATUS_CATEGORY_ID_IDX` (`STATUS`,`CATEGORY_ID`),
    KEY `PLD_LINK_HITS_IDX` (`HITS`),
    KEY `PLD_LINK_FEATURED_IDX` (`FEATURED`),
    KEY `PLD_LINK_EXPIRY_DATE_IDX` (`EXPIRY_DATE`),
    FULLTEXT KEY `PLD_LINK_DESCRIPTION_IDX` (`DESCRIPTION`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
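One more detail in this table may explain the category problem. CATEGORY_ID is declared `int(11)`, but the generated statements pass the category name 'high schools'. MySQL does not look a category up by name. Depending on the server's SQL mode, that string is either rejected with an error or stored as 0 with a warning, so the links would not end up in the intended category. Below is a sketch that resolves the name to an ID first, using Python's built-in `sqlite3` as a stand-in for MySQL. It assumes the categories live in a table named PLD_CATEGORY with ID and TITLE columns (check your own phpLD schema). The tables are trimmed to the columns used here, the category ID 7 is made up, and `insert_link` is a hypothetical helper.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the phpLD tables, trimmed to the columns used here."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Assumed layout for the category table; check the real phpLD schema.
        CREATE TABLE PLD_CATEGORY (ID INTEGER PRIMARY KEY, TITLE TEXT NOT NULL);
        CREATE TABLE PLD_LINK (
            ID INTEGER PRIMARY KEY AUTOINCREMENT,
            TITLE TEXT NOT NULL DEFAULT '',
            DESCRIPTION TEXT,
            URL TEXT NOT NULL DEFAULT '',
            CATEGORY_ID INTEGER NOT NULL DEFAULT 0,
            STATUS INTEGER NOT NULL DEFAULT 0
        );
        INSERT INTO PLD_CATEGORY (ID, TITLE) VALUES (7, 'High Schools');  -- hypothetical ID
        """
    )
    return conn


def insert_link(conn: sqlite3.Connection, title: str, url: str,
                category_title: str, description: str, status: int = 2) -> int:
    """Look up the category ID by title (case-insensitively), then insert the link with it."""
    row = conn.execute(
        "SELECT ID FROM PLD_CATEGORY WHERE lower(TITLE) = lower(?)",
        (category_title,),
    ).fetchone()
    if row is None:
        raise LookupError(f"no category titled {category_title!r}")
    # Values are bound by the driver, so quotes in names need no manual escaping.
    conn.execute(
        "INSERT INTO PLD_LINK (TITLE, URL, STATUS, CATEGORY_ID, DESCRIPTION) "
        "VALUES (?, ?, ?, ?, ?)",
        (title, url, status, row[0], description),
    )
    return row[0]
```

In MySQL itself, the same lookup can be written as an `INSERT ... SELECT` that reads the ID from the category table, for example `INSERT INTO PLD_LINK (TITLE, URL, STATUS, CATEGORY_ID) SELECT 'Arlington High School', '#', 2, ID FROM PLD_CATEGORY WHERE TITLE = 'High Schools';`. This inserts nothing if no category matches. If two categories share a title, the lookup would also need to filter on the parent category.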
     
    Social.Network, Jun 27, 2009
  6. LincolnAve

    #6
    This would be a great question to post on the YP Scraper forums!

    Get it fixed at the source.

    You probably signed up for them if you bought it after February 2009.
     
    LincolnAve, Jun 27, 2009