Web Scraping - Advice Please

ExoPaul Peon

Messages:

2

Likes Received:

0

Best Answers:

0

Trophy Points:

1

#1

Hi there,

Please bear in mind that I am not a web developer.

My developer is running a web scraper to scrape specific content from websites. It works pretty well, except that it only scrapes public-facing content, not content behind a login, even when we are logged in to an account.

Can anyone give some tips or advice, or point me in the direction of a web scraper that can be customised to scrape the content we want AND get past the login requirement?

Creating an account with the relevant site first is not a problem; the question is how to then scrape the data we need. How is the bypassing done?

Again, I am not a developer, so anything that helps, such as an example, a code snippet or a product that can be looked at, would be very useful for me to pass on to him.

Thanks guys, and thank you for allowing me to post.

ExoPaul, Sep 21, 2021
sarahk iTamer Staff

Messages:

28,500

Likes Received:

4,460

Best Answers:

123

Trophy Points:

665

#2

You want advice on how to steal intellectual property?
You're at the WRONG forum, buddy.

sarahk, Sep 21, 2021
sarahk iTamer Staff

Messages:

28,500

Likes Received:

4,460

Best Answers:

123

Trophy Points:

665

#3

Ok, so this user objected to my conclusion that he was stealing intellectual property and likened his scraping to a search engine's indexing.

His developer will have used curl and any number of the open-source packages that exist, and have existed for decades.

That the developer has encountered a site where the login process is so complex that a standard curl login won't work suggests that the site owner has gone to some effort to prevent scrapers. I presume the developer has checked to see if there's an API. When you contacted the site owner, did they offer a solution? After all, you're not doing anything wrong, so contacting the site owner won't raise any problems.
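For reference, a "standard" login of the kind curl handles is usually just a form POST plus a cookie jar that keeps the session cookie for later requests. Below is a minimal sketch of that idea in Python using only the standard library. The login URL and the form field names (`username`, `password`) are assumptions for illustration; a real site may use different names, a CSRF token, or JavaScript, in which case this simple approach will not be enough.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical value: the real login URL depends on the site.
LOGIN_URL = "https://example.com/login"


def make_session():
    """Return an opener that keeps cookies between requests, like curl's cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(username, password, url=LOGIN_URL):
    """Build the form POST that a plain HTML login form typically expects.

    The field names "username" and "password" are assumptions.
    """
    form = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(url, data=form.encode("utf-8"), method="POST")


if __name__ == "__main__":
    opener, jar = make_session()
    request = build_login_request("my_user", "my_password")
    print(request.get_method(), request.full_url)
    # opener.open(request) would send the login; later opener.open(page_url)
    # calls would reuse whatever session cookie the site set.
```

Nothing is sent over the network until `opener.open(...)` is called, so the request can be inspected first.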

sarahk, Sep 21, 2021
MrAEL Peon

Messages:

2

Likes Received:

0

Best Answers:

0

Trophy Points:

1

#4

Hi,
To scrape links or resources that require authorization, you need access (an account, a token, ...). When you sign in to the website, you receive some additional information (cookies).
You then only need to get these cookies from your browser (export them) and add them to your scraper tool (as a Cookie header).
Your scraper can then access the links as an authenticated user.
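The cookie approach described above could look roughly like this in Python, using only the standard library. The cookie names (`sessionid`, `csrftoken`), their values, and the URL are made up for illustration; the real ones are whatever the browser shows for the site after signing in with your own account.

```python
import urllib.request


def cookie_header(cookies):
    """Join exported name/value pairs into a single Cookie header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


def authenticated_request(url, cookies):
    """Build a GET request that carries the browser session's cookies."""
    return urllib.request.Request(url, headers={"Cookie": cookie_header(cookies)})


if __name__ == "__main__":
    # Hypothetical cookies copied from the browser after logging in.
    exported = {"sessionid": "abc123", "csrftoken": "xyz"}
    request = authenticated_request("https://example.com/members", exported)
    print(request.get_header("Cookie"))
    # urllib.request.urlopen(request) would then fetch the page as the logged-in user.
```

Session cookies expire, so the exported values usually need refreshing from the browser from time to time.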

MrAEL, Sep 29, 2021
