I need a custom web scraper. Let me describe what I am looking for in as much detail as I can.

* Scraper targets: torrent, download, online movie streaming, online TV streaming, and similar types of domains.

* I need an API script that analyzes URLs (from thousands of random domains), extracts the movie/TV title, and then calls TMDB/TVMaze/Mashape/IMDb/other APIs or a database to get the title info.

For example, the scraper API gets a call from me for one of the following URLs:

For Movies -> URL Example: http://www.hdmovieswatch.net/skyfall-2012-full-movie-online/
The scraper needs to extract the title "Skyfall 2012".
Then -> Get the title info from TMDB/imdbapi/IMDb/database/Mashape/other & detect whether it is a Movie or TV.
Then -> Send back via GET/POST the following: Movie (Movie or TV), Movie ID, Title, Year, Language, Cast, Crew, Genres, Keywords, RunTime, Budget, Revenue, Movie Website, Release Information (Theater date, PG-13 etc), Rating (RottenTomatoes, IMDb, Metacritic), Recommended Movie Titles (TMDB/Other), Trailer YouTube URL, Poster URL (Mashape/TMDB), Titles in other Languages, Movie Summary, Time Stamp (when the data was actually stored in our local db).

For TV -> URL Example: http://onwatchseries.to/episode/game_of_thrones_s3_e6.html
Then -> Get the title info from TMDB/TVMaze/db/Mashape/other & detect whether it is a Movie or TV.
Then -> Send back via GET/POST the following: TV (Movie or TV), TV ID (IMDb), Title, Year, Network (HBO/Netflix/Other), Type (Scripted/Reality/Talkshow/Other), Language, Status (whether the show is still running or not), Days (when the show airs, for example "Sunday"), Show country (US, UK, CA etc), Aired TimeZone (America/New_York), Season (in this example s3 = Season 3), Episode (in this example Episode 06), Cast, Crew, Genres, Keywords, RunTime (for the given Episode), TV Website, Release Information (PG-13 etc), Rating (RottenTomatoes, IMDb, Metacritic), Recommended TV Titles, Episode Trailer YouTube URL, Poster URL, Titles in other Languages, TV Summary, Time Stamp (when the data was actually stored in our local db).

* The URL varies from domain to domain, so /skyfall-2012-full-movie-online on torrent site X can be /watch-skyfall-2012-full-movie-online on Y and /watch/free-movie-skyfall-2012-full-online on Z. The best (and easiest) way to handle that, in my opinion: have a list of terms to exclude, such as full, movie, online, watch, free, etc. (I will add the terms manually myself, so no need to worry about that.) The remaining terms are pretty much the Movie or TV title itself. Note that this tool should work with other languages as well, which is why I think excluding terms is the best approach. Also, it should be a waterfall type of scraper that tries the fastest/easiest way to grab the title first (the URL, else an IMDb link inside the page, else the H1, etc.). The tool should send results back as fast as possible once it receives the URL.

* The tool should have and communicate with its own database (MySQL/MongoDB/whatever) and store the results there. Once the tool detects a Movie/TV title from the given URL, it should store it in the database, so that it can first check whether the title already exists in the database and only call the API if it does not. The idea behind it is simple: to avoid tons of repeated calls for the same title to IMDb/TMDB/TVMaze, and to make future lookups for the same content faster.

* In addition, the tool should be able to process Titles as well, not just URLs.

* So the bottom line is that I send calls with a URL/TITLE and I get back all the TITLE details as described.

* All responses must be valid JSON.

* Send me a quote via PM.

Cheers!
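To make the exclusion-list idea concrete, here is a minimal Python sketch of the first waterfall step only (title from the URL path). It is an illustration under assumptions, not a finished design: the function name `extract_title`, the starter `STOP_TERMS` set, the regular expressions for years and `s3_e6`-style markers, and the choice to return the year separately from the title are all hypothetical. The real term list would be maintained by hand as described above.

```python
import re
from urllib.parse import urlparse, unquote

# Hypothetical starter list; the real one would be maintained manually.
STOP_TERMS = {"full", "movie", "online", "watch", "free", "episode"}

# Assumed marker formats: "s3 e6", "s03e06" (after separators become spaces).
EPISODE_RE = re.compile(r"\bs(\d{1,2})\s*e(\d{1,3})\b", re.IGNORECASE)
# Assumed heuristic: any standalone 19xx/20xx number is a release year.
YEAR_RE = re.compile(r"\b((?:19|20)\d{2})\b")


def extract_title(url, stop_terms=STOP_TERMS):
    """Guess title, year, season and episode from the URL path alone."""
    path = unquote(urlparse(url).path)
    # Drop a trailing file extension such as ".html".
    path = re.sub(r"\.[a-z0-9]{2,5}$", "", path, flags=re.IGNORECASE)
    # Treat slashes, underscores, hyphens, dots and plus signs as spaces.
    text = re.sub(r"[/_\-.+]+", " ", path).strip()

    season = episode = year = None
    match = EPISODE_RE.search(text)
    if match:
        season, episode = int(match.group(1)), int(match.group(2))
        text = text[:match.start()] + " " + text[match.end():]

    match = YEAR_RE.search(text)
    if match:
        year = int(match.group(1))
        text = text[:match.start()] + " " + text[match.end():]

    # Whatever is not an excluded term is assumed to be part of the title.
    tokens = [t for t in text.split() if t.casefold() not in stop_terms]
    return {"title": " ".join(tokens), "year": year,
            "season": season, "episode": episode}


print(extract_title("http://www.hdmovieswatch.net/skyfall-2012-full-movie-online/"))
# {'title': 'skyfall', 'year': 2012, 'season': None, 'episode': None}
print(extract_title("http://onwatchseries.to/episode/game_of_thrones_s3_e6.html"))
# {'title': 'game of thrones', 'year': None, 'season': 3, 'episode': 6}
```

The year heuristic will misfire on titles that contain a year-like number, which is one reason the later waterfall steps (IMDb link in the page, H1) are still needed as fallbacks. When this step returns an empty title, the tool would fall through to the next step.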
I can scrape, extract and post anything you need, but you have to show me the process manually. I can do it with ubot+winautomation. If you still need this, add me for discussion: skype: ipowerhost2
This sounds like an awesome job I would love to do. If anyone is looking for this kind of work, please PM me -- scraping and automation is what I do!