Hello, I need some cURL help. I have started a little project: a site for downloading videos from a YouTube-like video sharing site (a small local one).

The videos on that site work like this: there are around ten servers for the videos. A video watching page looks like http://site.com/play:12345678 (12345678 being the unique ID of the video). The download location for the FLV file looks like http://mediaXX.site.com/s/YY/ZZZZZZZZ.flv, where XX is the number of the server this exact video happens to be hosted on (e.g. 01, 02, 03 for media01, media02, media03), YY is the first two characters of the unique video ID, and ZZZZZZZZ is the full video ID.

I have already done the part that parses the video watching URL into http://mediaXX.site.com/s/12/12345678.flv, but the server number remains a problem. The only way to get the exact server ID is to cycle through all ten servers (media01, media02, ...) and check the response each one returns. The cycle stops as soon as it finds a server that returns a video/x-flv file instead of an HTML page: the incorrect servers return a small HTML page with a 404 error, and the correct one returns the FLV file.

I have already figured out how to check the remote files with cURL, but I don't want it to download each file completely, because when it reaches the correct server it would download the full FLV (usually many megabytes) just to check its MIME type. How can I make cURL download only the few bytes needed to check the MIME type? I'm on DreamHost hosting, so fopen and fsockopen are not an option. Any advice?
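For illustration, the URL-building step described above might look like this in PHP (a sketch; the buildFlvUrl helper and the regular expression are mine, and site.com / mediaXX.site.com are the placeholder hosts from the post):

<?php
// Sketch of the URL-building step from the post above.
function buildFlvUrl($playUrl, $serverNum)
{
    // Extract the unique video ID from a URL like http://site.com/play:12345678
    if (!preg_match('~/play:([0-9a-z]+)$~i', $playUrl, $m)) {
        return false;
    }
    $id     = $m[1];                       // full video ID, e.g. 12345678
    $prefix = substr($id, 0, 2);           // first two characters, e.g. 12
    $server = sprintf('%02d', $serverNum); // 1 -> 01, 2 -> 02, ...

    return "http://media{$server}.site.com/s/{$prefix}/{$id}.flv";
}

echo buildFlvUrl('http://site.com/play:12345678', 3);
// prints: http://media03.site.com/s/12/12345678.flv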
Most of these sites use an XML file to feed their Flash players. You need to find out what that XML file is and what its $_GET parameters are.
Okay, I went through the packets, and it seems you need to use cURL to POST these fields to http://www.vbox7.com/play/magare.do:

onLoad=%5Btype%20Function%5D (if that doesn't work, use the decoded form: [type Function])
vid=32e47744 (your video ID)

The response is in effect XML, but in text form. You can debug it with an HTML file like this:

<form method='post' action='http://www.vbox7.com/play/magare.do'>
<input type='hidden' name='onLoad' value='[type Function]'>
<input type='text' name='vid'>
<input type='submit'>
</form>
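A minimal PHP sketch of that POST, for reference (the URL and field names are from the packet capture above; everything else is illustrative):

<?php
// Sketch: POST the onLoad/vid fields to magare.do and capture the response.
$vid = '32e47744'; // your video ID

$ch = curl_init('http://www.vbox7.com/play/magare.do');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array(
    'onLoad' => '[type Function]', // decoded form of %5Btype%20Function%5D
    'vid'    => $vid,
)));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
$response = curl_exec($ch);
curl_close($ch);

echo $response; // the "XML-like" plain-text answer described above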
a "curl header only request" cycling through all servers until the right answer comes back would do the trick. if 404, check the next, if header is 200 OK, use the url
Yes, that was exactly what I was trying to do, but from my experiments it takes quite some time to check the headers, so I assumed cURL downloads the whole remote file in order to check its headers, which is what I am trying to avoid. @Kaizoku: Thanks man, it worked that way. Now I just have to figure out how to do the cURL POST and parse the output.
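If the magare.do response turns out to be URL-encoded key=value text rather than real XML (an assumption; the post above only calls it "like XML, but in text form"), PHP's parse_str() would split it:

<?php
// Assumption: the response looks like key=value&key2=value2.
// parse_str() splits such a string into an associative array.
$response = 'vid=32e47744&server=03'; // illustrative only, not the real fields
parse_str($response, $fields);
print_r($fields); // Array ( [vid] => 32e47744 [server] => 03 )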
Then let every file server occasionally report its "have list" to the main site server. That way you save the time of checking each server to see whether the file is there.
I don't think I understand what you mean. As far as I can tell, you are suggesting that I insert every video ID the users initiate for download into a database together with its server ID, so the script can check whether a video ID is already in the database and get its server ID from there, instead of cycling through the servers first? Something like the sketch below?
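In other words, a sketch like this (the video_cache table, its columns, and the connection details are hypothetical):

<?php
// Sketch of the caching interpretation above: check a (hypothetical)
// video_cache table first, cycle the servers only on a miss.
$pdo = new PDO('mysql:host=localhost;dbname=videos', 'user', 'pass');

function cachedServer(PDO $pdo, $id)
{
    $stmt = $pdo->prepare('SELECT server FROM video_cache WHERE video_id = ?');
    $stmt->execute(array($id));
    return $stmt->fetchColumn(); // server number, or false on a cache miss
}

function rememberServer(PDO $pdo, $id, $server)
{
    $stmt = $pdo->prepare('INSERT INTO video_cache (video_id, server) VALUES (?, ?)');
    $stmt->execute(array($id, $server));
}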
No. When you tell cURL "headers only", it will fetch only the headers, and headers are very quick to fetch. Try it with curl on the command line:

curl --head http://google.com > xy.txt && less xy.txt

HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Sun, 14 Sep 2008 20:56:20 GMT
Expires: Tue, 14 Oct 2008 20:56:20 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
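For completeness, a sketch of the PHP cURL equivalent of that command line (CURLOPT_NOBODY sends a HEAD request, CURLOPT_HEADER includes the headers in the returned string):

<?php
// Sketch: the PHP equivalent of `curl --head http://google.com`.
$ch = curl_init('http://google.com');
curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD: fetch headers only
curl_setopt($ch, CURLOPT_HEADER, true);         // include headers in the output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return instead of printing
$headers = curl_exec($ch);
curl_close($ch);

echo $headers; // raw response headers, e.g. "HTTP/1.1 301 Moved Permanently ..."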
I believe my host (DreamHost) doesn't support that, as it doesn't allow fopen or stream_get_headers (or something like that) on remote locations and only supports cURL for this kind of thing. Anyway, I have solved my issue thanks to Kaizoku (rep comes in a second), and I consider this thread may be closed.