I want to build a web crawler that will gather the SEO information of a site, such as the count of backlinks, DA, PA, Alexa rank, page loading speed, etc. Which language would let me do that best? A thorough explanation would be really helpful.
You can grab all the on-page SEO information by scraping the site itself. Backlinks, PR, Alexa rank and the like are a bit more complicated to get on a per-site basis, as you will need to use APIs. For instance, to get the Alexa ranking for this site you would request http://data.alexa.com/data?cli=10&dat=snbamz&url=http://www.digitalpoint.com and parse the relevant data (a quick sketch of that follows below). There are libraries that help with this, such as https://github.com/eyecatchup/SEOstats - SEOstats boasts 50+ methods for collecting SEO data. I'm sure there are paid services too, but I don't bother with them.
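Here's a minimal sketch of that request in Python, parsing the rank out of the XML that endpoint returns. Historically the rank sat in the TEXT attribute of the POPULARITY element; the exact response format (and the endpoint's continued availability) isn't guaranteed, so treat this as illustrative rather than stable.

```python
# Minimal sketch: query the Alexa data endpoint mentioned above and parse
# the rank from the XML response. The rank has historically been in the
# TEXT attribute of the POPULARITY element; the format may change, so
# treat this as illustrative only.
import urllib.request
import xml.etree.ElementTree as ET

def alexa_rank(url):
    endpoint = "http://data.alexa.com/data?cli=10&dat=snbamz&url=" + url
    with urllib.request.urlopen(endpoint) as resp:
        root = ET.fromstring(resp.read())
    popularity = root.find(".//POPULARITY")  # rank lives in its TEXT attribute
    return int(popularity.get("TEXT")) if popularity is not None else None

print(alexa_rank("http://www.digitalpoint.com"))
```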
If I had to do it, I would use available third-party APIs to collect "the SEO information of a site such as count of backlinks, DA, PA..." instead of reinventing the wheel. Identify the most popular service providers in this field (Majestic SEO, Moz, Google), and consider using APIs from more than one top provider so users can cross-validate the rankings. Most of these APIs support the popular programming languages - e.g. https://www.majesticseo.com/plans-pricing says: "Majestic SEO provides a set of 'connectors' that have been designed to ease integration with our API. We currently provide implementations in C#, Java, Perl, PHP, Python and Ruby - all of which ship with working examples."

The Moz API is based on HTTP/HTTPS and REST: you send HTTP(S) requests to a URL and get a JSON response back, which can be done from any programming language (see the first sketch at the end of this answer). They have a free, rate-limited tier for small amounts of data.

For on-page stuff like keyword density, load time and meta tags, you can query the website and its internal pages directly and extract that information programmatically by issuing simple GET requests. This is possible in most languages, such as PHP and Java; the second sketch below shows the idea.

The choice of language depends on what you want to build. If you don't want to use any third-party APIs and would rather develop something like Majestic SEO of your own, I would suggest a Node.js + Java based back end on a distributed architecture, backed by a good data warehouse, since the tool has to deal with a lot of data. For displaying data to users you could use PHP or any web framework. For example: a user sends a query to find stats for a link, the request is passed to background crawlers in Java running in parallel on several servers, and the results are persisted to a DB. Or, if your clients have provided a set of links for which they need live SEO information at any time, the background crawlers can keep fetching data for those links and dumping it to the DB, and every user query becomes a DB lookup. Caching can be added on top for near-real-time responses.

If you want to use the APIs and develop ASAP, you could use PHP + memcache (caching). You could even have continuously running concurrent back-end jobs (Perl/Python/Java) fetching real-time data for your clients' URLs and storing it in the DB/cache; the last sketch below shows that cache-then-fetch lookup.
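First, the REST pattern. The Moz and Majestic APIs each have their own endpoints and authentication schemes, so the endpoint, parameter names and key below are placeholders, not real API details; the point is just the request-then-JSON pattern, here using only Python's standard library.

```python
# Illustrative only: the general request/response pattern for any
# HTTP/REST SEO API (Moz, Majestic, etc.). The endpoint, parameters and
# key below are placeholders; consult the provider's docs for the real
# ones (Moz, for example, requires signed credentials).
import json
import urllib.parse
import urllib.request

API_ENDPOINT = "https://api.example-seo-provider.com/v1/url-metrics"  # placeholder
API_KEY = "your-api-key"  # placeholder credential

def fetch_metrics(target_url):
    query = urllib.parse.urlencode({"target": target_url, "key": API_KEY})
    with urllib.request.urlopen(API_ENDPOINT + "?" + query) as resp:
        return json.loads(resp.read())  # e.g. {"backlinks": ..., "da": ...}

print(fetch_metrics("http://www.digitalpoint.com"))
```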
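Second, a minimal sketch of the direct-GET approach for on-page data: time the fetch as a crude load-time proxy and collect the meta tags with the standard-library HTML parser. A production crawler would add robots.txt handling, politeness delays and error handling on top of this.

```python
# Sketch of the "query the website directly" approach: issue a GET
# request, time it as a rough load-time proxy, and pull out the meta
# tags with the standard-library HTML parser.
import time
import urllib.request
from html.parser import HTMLParser

class MetaParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            self.metas.append(dict(attrs))

def crawl_page(url):
    start = time.time()
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    load_time = time.time() - start  # network fetch only, no rendering
    parser = MetaParser()
    parser.feed(html)
    return {"load_time_sec": round(load_time, 2), "meta_tags": parser.metas}

print(crawl_page("http://www.example.com"))
```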
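Finally, a sketch of the cache-backed lookup mentioned at the end: check the cache first, fall back to a fresh fetch, and store the result with a TTL. It assumes the pymemcache client (any memcache client works the same way) and a hypothetical fetch_fresh_stats() standing in for a crawl, API call or DB query.

```python
# Cache-then-fetch lookup: serve from memcache when possible, otherwise
# do the expensive fetch and cache the result for an hour.
import json
from pymemcache.client.base import Client  # assumed client library

cache = Client(("localhost", 11211))

def fetch_fresh_stats(url):
    # Hypothetical placeholder for the expensive path: a live crawl or a
    # third-party API call like the ones sketched above.
    return {"url": url, "backlinks": 0, "da": 0}

def get_seo_stats(url):
    cached = cache.get(url)
    if cached is not None:
        return json.loads(cached)  # fast path: recently fetched
    stats = fetch_fresh_stats(url)  # slow path: crawl or API call
    cache.set(url, json.dumps(stats), expire=3600)  # keep for an hour
    return stats

print(get_seo_stats("http://www.digitalpoint.com"))
```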