Hello everyone. I know very little about JavaScript, but from the research I've done, it seems like the best tool for a project I'm trying to do. I want to create an app that gets the content of a dynamic web page, looks for certain data, processes that data, and shows the user the results of that processing. I can handle the processing and the output; what I don't know is how to get the web page content, and I need the code for doing so. When I research this topic online, I find plenty of material that assumes I already know the language, and plenty that would let me scrape a website without code (useless for creating an app), but nothing that shows or explains the process to me. Can anyone here help?
Typically this would be done with a server-side process written in something like PHP, using curl or a library based on it. The downside is that if the site is "serverless" and uses JavaScript to "hydrate" the page, you won't get the latest version of the page. The downside of doing it in client-side JavaScript is that you'll hit CORS errors, but you *may* get the final version. You should be able to find scripts that do this under the guise of valuing a website or doing an automated SEO review; you can grab those, wade through the code, and reuse the bits that are useful. You probably won't find anything that "explains the process", but we'll be happy to answer any questions. If you can explain a bit more about what you need from the pages you're harvesting, we may be able to give more specific advice, like whether that info is available from an API.
I may be underselling the paucity of my understanding here. Still, if you can be patient with my figuring things out by asking questions, I would appreciate your help. I had in part selected JavaScript because I know Electron can be used to turn JavaScript code into an app. I have no idea how to go about that with PHP, or anything else for that matter. I am loath to explain exactly what I intend in a place where everyone can code better than I can, as I hope to make a little money with it. But I can explain a situation that should be functionally the same: I am trying to scrape a single page. I should imagine that simplifies the process greatly. Let us say it is a page that sells books. If you mouse over the icon of one of the books, a small popup appears with data on the book in what looks like a number of different fields, such as author, publication date, a blurb from the book, and a review of it. I want all the data in the popup so the app can look for certain indications, entered by the user in advance, that the book likely contains the information he or she is looking for and is therefore worth examining more closely.
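For the matching step (separate from the scraping itself), here's a minimal sketch. It assumes the popup data has already been pulled into plain objects; the field names (author, blurb, review) and the sample books are invented for illustration, not taken from any real site.

```javascript
// Hypothetical: once popup data is scraped into objects, matching it
// against user-entered keywords is a simple filter.
function matchesInterests(book, keywords) {
  // Combine the text fields into one lowercase string to search.
  const haystack = [book.author, book.blurb, book.review]
    .join(" ")
    .toLowerCase();
  return keywords.some(kw => haystack.includes(kw.toLowerCase()));
}

const books = [
  { title: "A", author: "Jane Doe", blurb: "A tale of dragons", review: "Great" },
  { title: "B", author: "John Roe", blurb: "A quiet memoir", review: "Fine" }
];

// Keep only the books worth examining more closely.
const interesting = books.filter(b => matchesInterests(b, ["dragons"]));
console.log(interesting.map(b => b.title));
```

Real matching logic could be fancier (multiple required keywords, field-specific rules), but the shape stays the same: scrape into objects first, then filter.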
Most apps, whether they're web apps or mobile apps, use a backend. PHP is super easy to get going with via copy/paste, and you can learn as you go. Because you're getting values from a popup, I'd strongly recommend looking for an equivalent API. Sticking with the book example: I get a daily email from a site called BookBub with Amazon/Kobo etc. freebies in the genres I'm interested in. Now, I don't know how they're doing this, but I'd imagine they have a web server that makes an API call to those booksellers asking for their lowest-priced books that day, with a request size of, say, 1000. Once they have that, they can serve the results to their mail program (i.e. my daily email) and to their website.

There are some good reasons to use an API:
- the site actually wants to give you their information
- APIs don't change very often, and providers will run their old API for a while, giving you time to make the necessary changes
- APIs are normally free
- they can make changes to their website without breaking the code you are using
- you can use tools like Postman to check and test

I get that probably sounds a bit daunting, and yes, you have some upskilling to do, BUT it will be worth it when you have your app working smoothly. FWIW I'm writing a mobile app that will be calling some public APIs, but I'll be storing the result and only calling the API again after x hours. There are loads of APIs out there and you'll be amazed at what you can do.
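The "store the result and only call again after x hours" idea above can be sketched in a few lines. Everything here is an assumption for illustration: fetchFromApi stands in for whatever real API call gets made, and the injected clock exists only so the behaviour can be demonstrated without waiting hours.

```javascript
// Cache an API result and only re-fetch once it is older than ttlMs.
// fetchFromApi is a placeholder, not a real bookseller API.
function makeCachedFetcher(fetchFromApi, ttlMs, now = () => Date.now()) {
  let cached = null;
  let fetchedAt = -Infinity;
  return async function get() {
    if (cached === null || now() - fetchedAt >= ttlMs) {
      cached = await fetchFromApi(); // only hit the API when the cache is stale
      fetchedAt = now();
    }
    return cached;
  };
}

// Demonstration with a fake clock and a counter standing in for the API:
let apiCalls = 0;
let fakeTime = 0;
const getBooks = makeCachedFetcher(async () => ++apiCalls, 60 * 60 * 1000, () => fakeTime);
```

Repeated calls within the TTL return the cached value without touching the API; only once the cache goes stale does the next call re-fetch.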
OK, I'll look for an API that can do it, but I rather doubt there is one in this particular case. How would I go about finding one? I do know some PHP. But I have no idea how to turn code into an .exe file for installation, which I gather is what would be needed to sell it as an app.
As for there probably not being an API for this, well, a simple web search shows I don't know what I'm talking about, there. There seems to be one, and it may incorporate some of the data I want, but I'm not yet sure. Nor would I know how to use this to get what I want.
OK, you don't need to turn it into an .exe - that doesn't happen anymore. There will be tutorials on how to package up a web app - most will be free with in-app purchases. Keep asking questions, we can help you set up your API calls so that you import the info you need. I get that you don't want to expose your market but we can dance around the edges and still point you in the right direction.
I am not sure if I understand the idea, but off the top of my head, I can suggest a few approaches:
- Selenium WebDriver. Thanks to this tool, you can open a webpage in Chrome and extract all its data. You can move the mouse over any element and extract anything you need.
- Chromium. This is an open-source web browser. It can do the same as Chrome, but you can build it into your application.

I hope that is helpful for you.
Sorry, I haven't used those tools myself. But I know my co-workers used them to create an application to test a website, and they were very successful. Also, one of my friends needed to buy a train ticket quite often, and he used to open a website and monitor it all day, pressing F5 and checking whether the tickets he needed were available. Then he created an application based on Selenium WebDriver, and now that application monitors the website all day and purchases a ticket for him. So I know that these two approaches work well, but I cannot explain how to work with them.
Hey! I'm new here. I know it's belated, but I wanted to help you in some way. But now I see that I myself learned a lot of new things.
Hello again, all. The API turned out to be no good; it frequently takes several minutes to return the requested data. So I need to revisit my scraper idea. I tried making a very basic one, to start getting familiar with the subject. In this test, I want to get the HTML of Google's home page. It doesn't work. Can anyone tell me why?
No, I wouldn't think so, this is for an app I hope to develop, which would have users needing to get the latest data from the API and use it immediately.
You could still be hitting the API every minute on your server; if the goal is to improve performance, it's still an option. Plus you'd know exactly how many API calls you'll make, and that may stop the API costs from ballooning.
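The "hit the API on your server" idea can be sketched as below: one timer refreshes an in-memory cache, so users read the cached copy instantly and the number of API calls per day is fixed and predictable. pollApi is a placeholder for the real (slow) API call; the interval and state shape are assumptions.

```javascript
// Poll an API on a fixed interval and keep the latest result in memory.
function startPolling(pollApi, intervalMs) {
  const state = { latest: null, lastUpdated: null };
  const refresh = async () => {
    try {
      state.latest = await pollApi();
      state.lastUpdated = new Date();
    } catch (err) {
      // Keep serving the last good result if one poll fails.
      console.error("poll failed:", err);
    }
  };
  refresh(); // prime the cache immediately
  const timer = setInterval(refresh, intervalMs);
  return { state, stop: () => clearInterval(timer) };
}
```

At a one-minute interval that is a known 1,440 calls per day, which is what keeps the API costs predictable regardless of how many users the app has.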
There's also the problem that there's no way to call the API without a specific name being searched for; I can't just download the whole API database. So this does not really seem to be a workable solution for this job. Does anyone know why my JavaScript isn't working?
Right, that makes sense. I ran your code as a comparison in a JSFiddle (https://jsfiddle.net/ezhnkj32/1/) and it worked just fine. You are probably getting CORS errors if you're not hitting a known API endpoint; see my example. What's interesting is the speed difference between the three methods. I'd have expected yours to be the fastest, but it was the slowest.

```javascript
console.log("lets go");

async function getWithThen(myRequest) {
  const startTime = new Date().getTime();
  console.log(startTime);
  fetch(myRequest)
    .then(response => response.json())
    .then(json => {
      console.log('here');
      document.getElementById("output1").value = JSON.stringify(json);
      document.getElementById("time1").value = new Date().getTime() - startTime;
    });
}

async function getWithAwait(myRequest) {
  const startTime = new Date().getTime();
  console.log(startTime);
  const response = await fetch(myRequest);
  const json = await response.json();
  console.log(json);
  console.log('here2');
  document.getElementById("output2").value = JSON.stringify(json);
  document.getElementById("time2").value = new Date().getTime() - startTime;
}

async function getWithXhttp() {
  const startTime = new Date().getTime();
  var xhttp = new XMLHttpRequest();
  xhttp.open(
    "GET",
    "https://run.mocky.io/v3/7a7a924f-72dd-4cd7-aefa-12be3608e839",
    true
  );
  xhttp.send();
  xhttp.onreadystatechange = function() {
    if (this.readyState == 4 && this.status == 200) {
      document.getElementById("output3").value = this.responseText;
      document.getElementById("time3").value = new Date().getTime() - startTime;
    }
  };
}

const myRequest = new Request("https://run.mocky.io/v3/7a7a924f-72dd-4cd7-aefa-12be3608e839", {});

getWithThen(myRequest);
getWithAwait(myRequest);
getWithXhttp();
```
OK, so if I understand you correctly, the code is fine; it's just that Google was denying me access as a security measure. Yeah? So, trying the site you used in your tests, I get a response. But I'm not looking for an API output, I'm looking for the code of the page itself, like what you get if you right-click and hit Inspect in Chrome. The data I need is found in div titles, I believe. So how do I get that?
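For illustration only, assuming the data really does live in div title attributes: once you have the page's HTML as a string, you could pull those attributes out like this. A real parser (DOMParser and querySelectorAll in the browser, or a library like cheerio in Node) is far more robust than a regex, and the sample markup here is invented.

```javascript
// Crude sketch: extract the title attribute of every <div> in an HTML
// string. Fine for a quick experiment; use a proper HTML parser for real.
function extractDivTitles(html) {
  const titles = [];
  const re = /<div\b[^>]*\btitle="([^"]*)"/g;
  let m;
  while ((m = re.exec(html)) !== null) {
    titles.push(m[1]); // the captured attribute value
  }
  return titles;
}

// Invented markup in the shape the bookseller example describes:
const sample =
  '<div class="book" title="Author: Jane Doe; Published: 1999">x</div>' +
  '<div class="book" title="Author: John Roe; Published: 2004">y</div>';

console.log(extractDivTitles(sample));
```

In a browser you would skip the regex entirely: `new DOMParser().parseFromString(html, "text/html")` gives you a document you can query with `querySelectorAll("div[title]")`.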
Also: I am not surprised my method is slower; it is older. But my target customers, if I ever get this developed, will be from different parts of the world, and many may be on older tech with older software. If I understand correctly, the fetch method isn't supported on that older stuff.
So that puts you back to processing on your server and outputting the info as JSON:
- the user downloads less onto their device
- your server will be faster
- less reliance on the user's browser capabilities
- it bypasses the CORS issues

```php
<?php
$code = file_get_contents($url);
$output = [];

$doc = new DOMDocument();
$doc->loadHTML($code);

$divs = $doc->getElementsByTagName('div');
foreach ($divs as $div) {
    // ... whatever you need to do ...
    // save the result in $output
}

echo json_encode($output);
```