
How to check reciprocal link?

Discussion in 'PHP' started by Alvin, Apr 26, 2007.

  1. #1
    Hello friends,

    I am looking for a function to check reciprocal link...

    can experts guide me with this?
     
    Alvin, Apr 26, 2007 IP
  2. jestep

    jestep Prominent Member

    Messages:
    3,659
    Likes Received:
    215
    Best Answers:
    19
    Trophy Points:
    330
    #2
    Something like:

    
    
    $myUrl = 'http://www.mysite.com';
    $recipUrl = 'http://somesite.com';
    
    // strpos(haystack, needle): search the fetched page for our URL
    $pos = strpos(file_get_contents($recipUrl), $myUrl);
    
    if($pos === false){
    
    //link is not there
    
    } else {
    
    //link is there
    
    }
    
    
    PHP:
     
    jestep, Apr 27, 2007 IP
  3. Alvin

    Alvin Notable Member

    Messages:
    2,076
    Likes Received:
    164
    Best Answers:
    0
    Trophy Points:
    210
    #3
    Thanks Jestep but this doesn't work for me...
     
    Alvin, Apr 27, 2007 IP
  4. Subikar

    Subikar Active Member

    Messages:
    241
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    60
    #4
    $myUrl = 'http://www.mysite.com';
    $recipUrl = 'http://somesite.com';

    $content = file_get_contents($recipUrl); // fetch the whole page, not just the first few bytes
    $pos = strpos($content, $myUrl); // haystack first, needle second

    if($pos === false){

    //link is not there

    } else {

    //link is there

    }


    Try this one; I think it will solve the problem.
     
    Subikar, Apr 28, 2007 IP
  5. Gordaen

    Gordaen Peon

    Messages:
    277
    Likes Received:
    12
    Best Answers:
    0
    Trophy Points:
    0
    #5
    file_get_contents might be restricted to local files only (allow_url_fopen turned off), in which case you would just need to use cURL.
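A minimal cURL-based sketch of the same check (the URLs are placeholders; the cURL calls are the standard PHP extension API):

```php
<?php
// Return true if $html contains $myUrl anywhere in its source.
function contains_link($html, $myUrl) {
    return strpos($html, $myUrl) !== false;
}

// Fetch a remote page with cURL, which works even when
// allow_url_fopen is disabled for file_get_contents().
function fetch_page($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);    // fail fast on dead hosts
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body === false ? '' : $body;
}

// Usage sketch (placeholder URLs):
// $found = contains_link(fetch_page('http://somesite.com'), 'http://www.mysite.com');
```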
     
    Gordaen, Apr 28, 2007 IP
  6. ErectADirectory

    ErectADirectory Guest

    Messages:
    656
    Likes Received:
    65
    Best Answers:
    0
    Trophy Points:
    0
    #6
    This is the method that most sites use to detect reciprocal links. I think it's a terrible way because it's very easy to build a hidden page (not linked in menu or sitemap) and feed that to the script. Your link will be found by file_get_contents() but will never be found by a spider. What good is that really?

    While we could build a spider for this that grabs every link found in a site and test to see if the recip link page exists there, that seems to be a bit of an overkill as many sites have 10's of 1000's of pages. Bad for processor consumption.

    Any suggestions on a better way to check reciprocal links? We could easily write something that assures that the recip link page is 1 click away from the home page but if you get further than that you could really be in a mess because most sites link their site map page from their index.
     
    ErectADirectory, Apr 28, 2007 IP
  7. ruby

    ruby Well-Known Member

    Messages:
    1,854
    Likes Received:
    40
    Best Answers:
    1
    Trophy Points:
    125
    #7
    Well you could check to see if that page was indexed in Google and/or check that its PageRank is greater than 0.
     
    ruby, Apr 28, 2007 IP
    ErectADirectory likes this.
  8. Alvin

    Alvin Notable Member

    Messages:
    2,076
    Likes Received:
    164
    Best Answers:
    0
    Trophy Points:
    210
    #8
    Anyway, I am talking about one particular page, not the whole site! :rolleyes: :rolleyes:



    :confused: :confused:
     
    Alvin, Apr 28, 2007 IP
  9. brealmz

    brealmz Well-Known Member

    Messages:
    335
    Likes Received:
    24
    Best Answers:
    3
    Trophy Points:
    138
    #9
    use file_get_contents

    open your php.ini (create one if it doesn't exist):

    add or edit this line:

    
    allow_url_fopen = on
    
    Code (markup):
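allow_url_fopen is a system-level setting that a running script cannot change, so a sketch that detects it at runtime (standard `ini_get`) before deciding which fetch method to use:

```php
<?php
// allow_url_fopen can only be detected from a script, not changed.
// Check before attempting a remote fetch with file_get_contents().
function can_url_fopen() {
    return (bool) ini_get('allow_url_fopen');
}

if (can_url_fopen()) {
    // file_get_contents('http://...') will work
} else {
    // fall back to the cURL extension, or edit php.ini as above
}
```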
     
    brealmz, Apr 28, 2007 IP
  10. Pat Gael

    Pat Gael Banned

    Messages:
    1,331
    Likes Received:
    68
    Best Answers:
    0
    Trophy Points:
    0
    #10
    There are also freeware programs that do this task without the hassle of setting up a script; just do a search at www.download.com
     
    Pat Gael, Apr 29, 2007 IP
  11. NoamBarz

    NoamBarz Active Member

    Messages:
    242
    Likes Received:
    5
    Best Answers:
    0
    Trophy Points:
    58
    #11
    If the site you are checking for the reciprocal in has a sitemap, you could search the sitemap to see that a link to the reciprocal page exists. I am talking about an HTML sitemap; however, I can think of no reason why you couldn't also check the XML sitemap. If the site doesn't have an XML sitemap, maybe the reciprocal isn't worth anything anyway...
    I haven't checked this method, but I think it should work.
    Hope this helps...
     
    NoamBarz, Apr 29, 2007 IP
    ErectADirectory likes this.
  12. abdussamad

    abdussamad Active Member

    Messages:
    543
    Likes Received:
    17
    Best Answers:
    0
    Trophy Points:
    60
    #12
    One way would be to use the xml or DOM functions of PHP to parse the HTML document and search for your site's url. The code posted above will not work if the link partner should choose to add your site's URL as a comment :D :

    
    <!-- http://lovelysite.com -->
    
    Code (markup):
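A sketch of the DOM approach abdussamad describes, using PHP's built-in DOMDocument: only real `<a href>` attributes are inspected, so a URL hidden inside a comment never matches.

```php
<?php
// True only if $html contains an actual <a> element whose href
// points at $myUrl. A URL inside an HTML comment is not an element,
// so the parser ignores it.
function has_real_link($html, $myUrl) {
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ silences warnings from sloppy real-world markup
    foreach ($doc->getElementsByTagName('a') as $a) {
        if (strpos($a->getAttribute('href'), $myUrl) !== false) {
            return true;
        }
    }
    return false;
}
```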
     
    abdussamad, Apr 29, 2007 IP
    ErectADirectory likes this.
  13. ErectADirectory

    ErectADirectory Guest

    Messages:
    656
    Likes Received:
    65
    Best Answers:
    0
    Trophy Points:
    0
    #13
    Agreed but automated querying is against google's terms & conditions. Not saying it is not possible, I've personally implemented that code, but is this recip checking script worth a potential G ban? I could do it with the others (y!, msn, jeeves, etc) but most require an API so the script would not be very portable as I'm looking to distribute this script.

    Obviously you would want to check the actual page for the existence of your link. What we are discussing is verifying that the one page your link is on can be found by the search engines' spiders. If not, your reciprocal link counts for nothing.

    Great idea Barz, but what would be the correct way to teach a program to find the sitemap page?

    if (strpos($linkLocation, "site") !== false && strpos($linkLocation, "map") !== false)
    PHP:
    What about those people who just use an xml feed for their sitemap? search all xml.php & feed.php & etc also? That is a lot of overhead.

    @jestep - all that script will do is just find if the url exists on the page, not that it is located in a link. Looking for a link would be a bit like this

    $pattern = '/((?<=href=")).*?(?=")/i';
    $urlcontents = file_get_contents($recipUrl);
    preg_match_all($pattern, $urlcontents, $matches);
    PHP:
    Or something similar.

    Personally, I think Google's PR check is the best option but it comes with some dangerous hazards, and being banned is not really an option.

    So I guess I must defer my vote to 1: check the recip page for my link's existence then 2: check the index page for a link to the recip page.

    A side benefit of this is that my link will only be 1 click away from the homepage and will therefore carry a decent amount of weight. This will not make the recip givers happy as these pages are usually deep in their linking structure. This might end up being bad for business.

    I am looking to do advanced reciprocal checking myself for a script I am involved in. I am interested in hearing everyone's opinions on how to best verify that 1: the link exists on the recip page & 2: that the recip page exists in the internal linking of a site.

    I need this all automated so admin input of a sitemap is not an option unless the user supplies it at submission time.
     
    ErectADirectory, Apr 29, 2007 IP
  14. ErectADirectory

    ErectADirectory Guest

    Messages:
    656
    Likes Received:
    65
    Best Answers:
    0
    Trophy Points:
    0
    #14
    Good point, I forgot about having your link surrounded by comment tags. Content inside comment tags doesn't get indexed, so I doubt that search engine spiders actually follow links inside them.

    The code I posted above does not take into account the rel=nofollow tag either, it simply looks for 'href=' --> '"' and grabs what is in between. I am more certain that spiders follow these tags but give little weight to these links in terms of importance & PR. Some, but not much.
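A sketch that handles the rel=nofollow case, using DOMDocument rather than the regex (a judgment call: the regex works too, but attribute order and quoting vary across sites):

```php
<?php
// True if $html contains a link to $myUrl that is NOT rel="nofollow".
function has_followed_link($html, $myUrl) {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    foreach ($doc->getElementsByTagName('a') as $a) {
        if (strpos($a->getAttribute('href'), $myUrl) === false) {
            continue; // not our link
        }
        if (stripos($a->getAttribute('rel'), 'nofollow') === false) {
            return true; // our link, with no nofollow on it
        }
    }
    return false;
}
```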

    This is an important issue and +rep will be given by me to all thoughtful & constructive advice. Thanks in advance for your continued conversation in this matter.
     
    ErectADirectory, Apr 29, 2007 IP
  15. NoamBarz

    NoamBarz Active Member

    Messages:
    242
    Likes Received:
    5
    Best Answers:
    0
    Trophy Points:
    58
    #15
    ErectADirectory,
    Using reciprocal links usually has to do with SEO considerations (although they aren't as effective as one might think). My suggestion about the sitemap was to read a specific file called sitemap.xml. This file is accessed by search engines and they use it when indexing the pages of a site. You can visit Google Sitemaps to see the exact format of that file.

    Site owners that sign up for web directories usually do so for SEO reasons. One of the first rules every SEO learns is that you MUST create an XML sitemap, place it in the root directory of your website, and then submit it to Google Sitemaps. My point is that if someone signs up for a web directory, you can bet that person is thinking in terms of SEO, and you can further bet that they have included a sitemap.xml file in the root directory of their website. If they did not, chances are their website was not assigned a very good PageRank by Google, and as a result the reciprocal link is not as useful (not at all, actually).

    Now, when it comes to searching for the reciprocal within the XML file, life is easy. Google dictates exactly how the file is to be formatted - every URL on the site must look like this:

    <url>
    <loc>http://www.domain.com/</loc>
    <lastmod>2007-01-20</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
    </url>

    Searching for the reciprocal in a file such as this is very easy. You can even use PHP's XML parser functions. The implementation is up to you since different people have different preferences.
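A SimpleXML sketch over that format. Note the snippet above omits the xmlns declaration that real sitemaps carry, and so does this sketch for brevity; with the namespace present you would read the entries via `children('http://www.sitemaps.org/schemas/sitemap/0.9')`.

```php
<?php
// Search a sitemap.xml document (namespace omitted, as in the
// snippet above) for an exact URL.
function sitemap_contains($xml, $url) {
    $map = simplexml_load_string($xml);
    if ($map === false) {
        return false; // not well-formed XML
    }
    foreach ($map->url as $entry) {
        if ((string) $entry->loc === $url) {
            return true;
        }
    }
    return false;
}
```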
     
    NoamBarz, Apr 30, 2007 IP
  16. ErectADirectory

    ErectADirectory Guest

    Messages:
    656
    Likes Received:
    65
    Best Answers:
    0
    Trophy Points:
    0
    #16
    Noam,

    Thanks for your reply. Reciprocal links usually have to do with SEO purposes, but that does not mean we can rely on standards, as sitemaps come in many different formats, shapes and sizes.

    The purpose of a sitemap has many definitions to many different people. While Google is the search engine authority, this does not mean that everyone follows their rules. Furthermore, it does not mean a site that does not have a sitemap.xml file cannot give a quality backlink (even though there is no such thing when speaking of recip links).

    Take the following for example:

    http://www.avivadirectory.com/sitemap.xml
    http://www.alivedirectory.com/sitemap.xml
    http://digitalpoint.com/sitemap.xml
    http://www.php.net/sitemap.xml
    http://sourceforge.net/sitemap.xml

    None of the above 5 files exist and I would gladly take a backlink from any of the sites listed.

    Your logic makes total sense but most webmasters do not. Personally, I use a sitemap page on all of my (recent) sites but I do not follow Google's guidelines as it makes more sense to me to make a site_map.php file. My sitemap is then pumped out of my db and is up to the minute accurate. Search engine spiders can easily crawl my sitemaps (and frequently do), but it is in html format for easy human consumption.

    Googlebot has no problem with this and still uses the file as a starting point on many crawls. The sitemap.xml file is more for use with static sites so that people who are not programmers can upload an xml file so G is aware when they add a new page.
     
    ErectADirectory, Apr 30, 2007 IP
  17. NoamBarz

    NoamBarz Active Member

    Messages:
    242
    Likes Received:
    5
    Best Answers:
    0
    Trophy Points:
    58
    #17
    ErectADirectory,
    I got your point and I guess it was my mistake for leaving something out.
    When the sitemap.xml file becomes too large, major sites simply zip the file. So they have a sitemap.xml.zip file or a sitemap.xml.gz file etc.
    Try the following examples:

    http://www.avivadirectory.com/sitemap.xml.zip
    http://www.alivedirectory.com/sitemap.xml.zip
    http://www.digitalpoint.com/sitemap.xml.zip
    http://www.php.net/sitemap.xml.zip
    http://sourceforge.net/sitemap.xml.zip

    By showing you these examples I am in no way trying to prove anything!
    I'm simply saying that in most cases, people take Google into consideration b/c it truly is the leading SE around + other search engines such as MSN and Yahoo also use the sitemap file.

    Your idea of using a sitemap.php file is a very good idea, by the way. I'm feeling kind of stupid for not thinking of that myself...

    What I would do is search the top directory for any file called sitemap regardless of what kind of file it is. I'm not 100% sure, but I think that in most cases you'll find something and will be able to use it. The overhead shouldn't be too serious either. The thing to keep in mind is that I think this is the logic search engines use, so why not use it ourselves?

    I'm sure that there are a lot of sites that still don't have a sitemap file. But maybe you'll agree that those sites aren't as important in terms of reciprocal links?

    Although I've never had to use it myself, I know that PHP can read zipped files. See http://il.php.net/zip for example.
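For the .gz case specifically, the zlib stream wrapper is enough - no zip extension needed (the file path is a placeholder):

```php
<?php
// Read a gzip-compressed sitemap through the compress.zlib stream
// wrapper. (.zip archives are a different format and need ext/zip.)
function read_gz_sitemap($path) {
    return file_get_contents('compress.zlib://' . $path);
}

// Usage sketch: $xml = read_gz_sitemap('/path/to/sitemap.xml.gz');
```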

    I think I'll try reading one just for practice. Let me know if you'd like me to hand over the code when I'm done. Not that I think you need my help...
     
    NoamBarz, Apr 30, 2007 IP
  18. thrawn

    thrawn Peon

    Messages:
    189
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    0
    #18
    Neither of the DP sitemaps worked, neither .zip nor .gz:

    The requested URL was not found on this server.
     
    thrawn, Apr 30, 2007 IP
  19. ErectADirectory

    ErectADirectory Guest

    Messages:
    656
    Likes Received:
    65
    Best Answers:
    0
    Trophy Points:
    0
    #19
    OK to sum it up here are the best ways to check with their limitations. Best at top (my opinion).

    1. Check for PR
    Pro - This indicates that Google has indexed the page and gives it at least a little respect
    Con - Potential for a Google ban, because it's against their TOS to automate queries by a script.

    2. Check for sitemap
    Pro - All pages on a site should be linked on this page
    Con - Tough to automate. Sitemaps come in all shapes and sizes. Worst of all, there are only loose conventions for what these files are named, so they are tough to find. They can also be zipped.

    3. Check homepage for a link to the recip page
    Pro - 1. Assures the page with the link to your site is within 1 click of the homepage. 2. Easy to program: only requires 2 page scrapes, and very few 404s encountered
    Con - Low amount of recip links, because not many want to give reciprocals out within 1 click of the homepage. Even going 2 levels deep can cause problems if 100(0)'s of links get found

    4. Spider the whole site
    Pro - It will find the link if it exists
    Con - Very slow and a resource hog. Best suited for Perl (yuck)

    5. Only spider the reciprocal page
    Pro - Very fast
    Con - 1. Page may not be found in link structure. 2. Page may not be indexed.

    The worst part of it is that there really is no true guarantee that the link is there unless you hand check it. Consider the use of comment tags around the links and you will realize that you can never 100% automate this task ... but I really want to.

    More suggestions please!!!
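Options 3 and 5 from the list above combine into the two-step check, which can be sketched over already-fetched pages (DOMDocument; all HTML strings and URLs here are placeholders):

```php
<?php
// True if $html has an <a> whose href contains $target.
function page_links_to($html, $target) {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    foreach ($doc->getElementsByTagName('a') as $a) {
        if (strpos($a->getAttribute('href'), $target) !== false) {
            return true;
        }
    }
    return false;
}

// Two-step verification: the homepage links to the recip page
// (so it is one click deep), and the recip page links back to us.
function verify_reciprocal($homeHtml, $recipPageUrl, $recipHtml, $myUrl) {
    return page_links_to($homeHtml, $recipPageUrl)
        && page_links_to($recipHtml, $myUrl);
}
```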
     
    ErectADirectory, May 3, 2007 IP
  20. thrawn

    thrawn Peon

    Messages:
    189
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    0
    #20
    Why not just spider the reciprocal page, which would cut out all of those that don't exist? Then leave it to the webmaster's discretion to either check whether it is a good page, or accept it because it exists?
     
    thrawn, May 16, 2007 IP