How the heck do you scrape just a video from an HTML site?

BeachMaster Active Member


#1

Hi. Does anyone know how to scrape this URL:

http://media2.ldscdn.org/assets/scr...-1-how-we-got-the-book-of-mormon-360p-eng.mp4

from this HTML page:

https://www.lds.org/media-library/video/book-of-mormon/book-of-mormon-stories?lang=eng

Thanks!

BeachMaster, Aug 21, 2016

ixabhay Active Member


#2

Inspect element > CTRL+F > search for .mp4

ixabhay, Aug 21, 2016


Again3 Peon


#3

Press CTRL+U to view the page source, then press CTRL+F and search for .mp4.
Find the link with the best quality (chapter 1 is available in 1080p). Copy the full URL and change every \/ to /.
Paste the link into your browser and the video will start playing. To save it, press CTRL+S and choose a directory.

I hope this can help you.
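
The manual steps above (search the source for .mp4, pick the best quality, change \/ to /) could be sketched in JavaScript roughly like this. It is only a sketch: it assumes the page embeds its video URLs as JSON-escaped strings and that the resolution shows up in the file name as something like 360p or 1080p. The function names and the sample string are made up for illustration.

```javascript
// Find every .mp4 URL in a page's source text and undo the JSON escaping (\/ -> /).
function findMp4Links(source) {
  // Absolute http(s) URLs ending in .mp4; each slash may be escaped as \/.
  const pattern = /https?:(?:\\?\/){2}[^\s"'<>]+?\.mp4/g;
  const matches = source.match(pattern) || [];
  const unescaped = matches.map((url) => url.replace(/\\\//g, '/'));
  return [...new Set(unescaped)]; // drop duplicates, keep first-seen order
}

// Read the resolution from names like "...-1080p-eng.mp4"; 0 if none is found.
function resolutionOf(url) {
  const match = url.match(/(\d{3,4})p\b/);
  return match ? Number(match[1]) : 0;
}

// Return the link with the highest resolution (undefined for an empty list).
function pickBestQuality(links) {
  return links.reduce(
    (best, url) => (resolutionOf(url) > resolutionOf(best) ? url : best),
    links[0]
  );
}

// Hypothetical page source with two escaped links.
const sampleSource =
  '{"link":"http:\\/\\/cdn.example.com\\/video-360p-eng.mp4","size":1},' +
  '{"link":"http:\\/\\/cdn.example.com\\/video-1080p-eng.mp4","size":2}';

const found = findMp4Links(sampleSource);
console.log(found.join('\n'));
console.log('Best quality:', pickBestQuality(found));
```

In a browser console you could pass `document.documentElement.outerHTML` as the source instead of the sample string.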

Again3, Aug 21, 2016

PoPSiCLe Illustrious Member


#4

He wants to _scrape_ the link, not load it in the browser - he can download the content from the page without searching for it - the video-link he posted IS the download link.

For scraping, the problem is that the download link is from a CDN (Content Delivery Network), so he'd need to scan the original HTML file for those links, and parse each individual link (depending on what he wants to download). You could perhaps use something like cURL (PHP) to do this - fetch the HTML, parse it, and load all external download links, and then use cURL to fetch the content from those links.
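
The fetch, parse, and download flow described above uses cURL in PHP; the same idea could look something like the following in JavaScript. This is a sketch under assumptions, not a tested scraper for the site in question: it assumes Node.js 18 or newer (for the built-in `fetch`), it assumes the links appear in the HTML as quoted strings ending in .mp4, and every function name here is hypothetical.

```javascript
const fs = require('node:fs/promises');
const path = require('node:path');

// Step 1: fetch the HTML of the page.
async function fetchHtml(pageUrl) {
  const response = await fetch(pageUrl);
  if (!response.ok) throw new Error(`Request failed: ${response.status}`);
  return response.text();
}

// Step 2: scan the HTML for quoted strings ending in .mp4 and turn them
// into absolute URLs (relative links are resolved against the page URL).
function extractVideoLinks(html, pageUrl) {
  const links = new Set();
  for (const match of html.matchAll(/["']([^"'\s<>]+?\.mp4)["']/g)) {
    const raw = match[1].replace(/\\\//g, '/'); // undo JSON-escaped slashes
    try {
      links.add(new URL(raw, pageUrl).href);
    } catch {
      // Ignore anything that is not a valid URL.
    }
  }
  return [...links];
}

// Step 3: download one file into targetDir (the whole file is held in memory).
async function downloadFile(fileUrl, targetDir) {
  const response = await fetch(fileUrl);
  if (!response.ok) throw new Error(`Download failed: ${response.status}`);
  const fileName = path.basename(new URL(fileUrl).pathname);
  const target = path.join(targetDir, fileName);
  await fs.writeFile(target, Buffer.from(await response.arrayBuffer()));
  return target;
}

// All three steps together.
async function scrapeVideos(pageUrl, targetDir) {
  const html = await fetchHtml(pageUrl);
  const saved = [];
  for (const link of extractVideoLinks(html, pageUrl)) {
    saved.push(await downloadFile(link, targetDir));
  }
  return saved;
}

// Example call (placeholder URL, not run here):
// scrapeVideos('https://www.example.org/some-page', '.').then(console.log);
```

Resolving each link with `new URL(raw, pageUrl)` is what lets the same code handle links that point at a different host such as a CDN as well as relative links on the page itself.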

PoPSiCLe, Aug 21, 2016

BeachMaster Active Member


#5


Thanks for your response. To be honest, I don't know much about coding, but a developer is helping me and explained how you would do this. He sent me this:

This is the code, with an explanation of every line and why it exists in the code:

loadHTML($content); // Loads the HTML (already fetched into $content) into the DOM document
$xpath = new DOMXPath($dom);
$imgSrc = $xpath->query("/html/body/script[4]"); // This query is used for scraping the link
$fullstring = $imgSrc->item(0)->nodeValue;

// This function is used to extract the required (video) link from the raw data
function get_string_between($string, $start, $end) {
    $string = ' ' . $string;
    $ini = strpos($string, $start);
    if ($ini == 0) return '';
    $ini += strlen($start);
    $len = strpos($string, $end, $ini) - $ini;
    return substr($string, $ini, $len);
}

$parsed = get_string_between($fullstring, '"link":"', '","size');
echo stripslashes($parsed); // Print the extracted link to the browser
echo ''; // This is used to open the extracted link in a new tab
?>

Does any of this make sense to any of you??? Thanks

BeachMaster, Aug 24, 2016

PoPSiCLe Illustrious Member


#6

That would be PHP-code (horribly mangled since you just pasted it in without using code-brackets around it) - that is NOT the full code, though, since it's missing key components like the classes used and the function loading the content to begin with - but yes, it's a start, especially if you have all the components.
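
To make the extraction step in that PHP fragment easier to follow, here is a rough JavaScript sketch of the same idea: take the text between two markers, then remove the escaping backslashes. The markers `"link":"` and `","size` come from the pasted code; the sample string is invented, and `stripSlashes` is only an approximation of PHP's `stripslashes()` for this use case.

```javascript
// Return the text between the first `start` marker and the next `end` marker.
// Returns an empty string when either marker is missing.
function getStringBetween(text, start, end) {
  const from = text.indexOf(start);
  if (from === -1) return '';
  const contentStart = from + start.length;
  const to = text.indexOf(end, contentStart);
  if (to === -1) return '';
  return text.slice(contentStart, to);
}

// Drop the backslash in front of each escaped character (\/ becomes /).
function stripSlashes(text) {
  return text.replace(/\\(.)/g, '$1');
}

// Hypothetical script content of the kind the PHP markers suggest.
const rawScriptText =
  '{"title":"Chapter 1","link":"http:\\/\\/cdn.example.com\\/ch1-360p.mp4","size":123}';

const escapedLink = getStringBetween(rawScriptText, '"link":"', '","size');
console.log(stripSlashes(escapedLink));
```

This only pulls out the first match. If the script holds one entry per quality or per chapter, the search would have to be repeated from the end of each match.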

PoPSiCLe, Aug 24, 2016

NaughtySpider Peon


#7

If you only want to scrape links from a single page, you can run this JavaScript in the browser console:
[].map.call(document.querySelectorAll('a[href*=".mp4"]'), function(a) { return a.getAttribute('href'); }).join('\n');
Code (JavaScript):

NaughtySpider, Sep 11, 2016

BeachMaster Active Member


#8

Thanks. Do you know how to scrape ONLY the video, like in the example in the first post?

BeachMaster, Sep 17, 2016
