Scraping Google SERPs with ColdFusion

Discussion in 'Programming' started by digga121, Aug 26, 2008.

  1. #1
    Scraping Google SERPs with ColdFusion


    <!---  We only need the domain name without the http://www. --->
    <cfset siteurl = "jasonbartholme.com">
     
    <!---  Regular expression which matches the pattern to determine count --->
    <cfset googleregex = '<font size=-1>Results [\s\S]*? of about <b>([\s\S]*?)</b> for'>
     
    <!--- useragent is required because the page will not "get" properly otherwise --->
    <cfhttp url="http://www.google.com/search?q=site%3A#siteurl#"
    		method="get"
    		resolveurl="false"
    		useragent="#cgi.http_user_agent#">
    </cfhttp>
     
    <!---  Trims the whitespace in the content, and checks for our regex pattern --->
    <cfset sdoc = trim(cfhttp.filecontent)>
    <cfset result = refindnocase(googleregex,sdoc,1,"true")>
     
    <!---  cftry/cfcatch to see if refindnocase() returned a result --->
    <cftry>
    	<cfset resultcount = replace(mid(sdoc,result.pos[2],result.len[2]),',','','ALL')>
    <cfcatch type="any">
    	<cfset resultcount = 0>
    </cfcatch>
    </cftry>
     
    <!---  display our result --->
    Pages indexed: <cfoutput>#resultcount#</cfoutput>
    Original Post: http://www.jasonbartholme.com/scraping-google-serps-with-coldfusion/
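
    To try the extraction step without requesting a live page, the regex can be run against a fixed string. This is only a minimal sketch: sampledoc is a made-up fragment shaped to fit the pattern in the post (it is not real Google output), and the explicit cfif check is swapped in for the cftry/cfcatch above as one possible alternative.

    ```cfml
    <!--- Hypothetical sample markup, invented to match the pattern in the post --->
    <cfset sampledoc = '<font size=-1>Results <b>1</b> - <b>10</b> of about <b>1,234</b> for <b>site:example.com</b></font>'>
    <cfset googleregex = '<font size=-1>Results [\s\S]*? of about <b>([\s\S]*?)</b> for'>

    <!--- With returnsubexpressions set to true, pos/len are arrays; index 2 is the first capture group --->
    <cfset result = refindnocase(googleregex, sampledoc, 1, true)>

    <!--- On no match the arrays hold a single 0, so check the length before reading index 2 --->
    <cfif arraylen(result.pos) gte 2 and result.pos[2] gt 0>
    	<cfset resultcount = replace(mid(sampledoc, result.pos[2], result.len[2]), ',', '', 'ALL')>
    <cfelse>
    	<cfset resultcount = 0>
    </cfif>

    Pages indexed: <cfoutput>#resultcount#</cfoutput>
    ```

    Checking the array length avoids relying on an exception for the no-match case, at the cost of one extra condition.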
     
    digga121, Aug 26, 2008 IP
  2. JasonBartholme

    JasonBartholme Peon

    #2
    Thanks for mentioning my post, digga121 :)
     
    JasonBartholme, Aug 29, 2008 IP
  3. digga121

    digga121 Active Member

    #3
    Hey!

    NP, great posts man!
     
    digga121, Aug 30, 2008 IP