automatically import website data

Discussion in 'Programming' started by fundmakesell, Oct 12, 2019.

  1. #1
    I want to create a business directory website for local theaters, community centers, etc. Is there a way my website can automatically import their show times?
     
    fundmakesell, Oct 12, 2019 IP
  2. ElscottHavoc

    ElscottHavoc Well-Known Member

    Messages:
    76
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    103
    #2
    If you can access the content, a crawler could be programmed to extract the show times from it. I'm not sure how to program a scraper myself, but I'd imagine someone in the marketplace has the coding know-how to build one.
     
    ElscottHavoc, Oct 16, 2019 IP
  3. fundmakesell

    fundmakesell Peon

    Messages:
    3
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    1
    #3
    If I have over 200 websites to sync, what is the best way to do this?
     
    fundmakesell, Oct 16, 2019 IP
  4. sarahk

    sarahk iTamer Staff

    Messages:
    25,341
    Likes Received:
    3,482
    Best Answers:
    101
    Trophy Points:
    665
    #4
    If your site has credibility (because of who you are, your connections, or how long it's been up), you may be able to get them to update your site directly. You'd just have to create the appropriate forms.

    I'm guessing that's not the case.

    You'd start with a list of those sites and the URLs of their listings pages.
    Then I'd pick up the phone and ask them if they have an RSS feed or similar for their session times. This is your BEST first approach. You want their cooperation. You want them to see you as an ally. You want them to pick up the phone and let you know when they change their format.
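
    If a venue does offer an RSS feed, reading it needs very little code. Here is a minimal sketch using only the Python standard library. The sample feed, its URLs and the choice of fields are made up for illustration; a real feed may put the session times somewhere else entirely.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real venue's feed will have its own fields.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Theatre Sessions</title>
    <item>
      <title>Joker</title>
      <link>https://theatre.example/sessions/1</link>
      <pubDate>Thu, 17 Oct 2019 18:30:00 +1300</pubDate>
    </item>
    <item>
      <title>Joker</title>
      <link>https://theatre.example/sessions/2</link>
      <pubDate>Thu, 17 Oct 2019 21:00:00 +1300</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict per <item> with its title, link and pubDate text."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_RSS):
        print(entry["title"], entry["published"])
```

    In practice you would download each feed URL from your list and pass the text to parse_rss_items.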

    If they don't, then you need to find another source of their info.

    I just looked at my local cinema's site as an example. To get session times for a movie I have to click through quite a few steps, so it's unlikely that any single call will have all the info you need.

    Looking through the requests in the browser's "Network" tab gives me this: https://www.eventcinemas.co.nz/Cinemas/GetSessions?cinemaIds=509. I'd need to wade through all the info to see how complete it is.

    I can see that they're screening Joker and the session times:
    [Two screenshots: upload_2019-10-17_17-9-15.png, upload_2019-10-17_17-10-17.png]

    From that I can build a script specific to that cinema and save it alongside that particular cinema's entry.
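
    A cinema-specific script for a JSON source might look roughly like this. It is only a sketch: the actual response format was not checked here, so the payload shape and every key name ("Movies", "Title", "Sessions", "StartTime") are assumptions to be replaced once you have looked at the real data.

```python
import json

# Hypothetical payload shape. The real response was not inspected here,
# so every key below is an assumption to be replaced after looking at it.
SAMPLE_JSON = """
{
  "Movies": [
    {"Title": "Joker",
     "Sessions": [{"StartTime": "2019-10-17T18:30:00"},
                  {"StartTime": "2019-10-17T21:00:00"}]}
  ]
}
"""


def extract_sessions(json_text):
    """Flatten the assumed payload into (title, start_time) tuples."""
    data = json.loads(json_text)
    rows = []
    for movie in data.get("Movies", []):
        for session in movie.get("Sessions", []):
            rows.append((movie.get("Title", ""), session.get("StartTime", "")))
    return rows


if __name__ == "__main__":
    for title, start in extract_sessions(SAMPLE_JSON):
        print(title, start)
```

    Using .get() with defaults means a missing key produces empty results instead of a crash, which makes it easier to notice when the source changes.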

    If I'm lucky, other cinemas will be using the same management software and I can reuse my script. If not, I might have to resort to actual screen scraping, which is never a good idea. I just looked at our local events centre, which has shows, orchestras, film festivals etc., and their data wasn't easy to grab.
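
    One way to get that reuse is to store a platform name against each venue and look up the parser by that name. This is a sketch only; the venue names, the platform labels and both text formats are invented for illustration.

```python
def parse_pipe_lines(text):
    """Hypothetical parser A: one 'Title|HH:MM' pair per line."""
    rows = []
    for line in text.splitlines():
        if "|" in line:
            title, time = line.split("|", 1)
            rows.append((title.strip(), time.strip()))
    return rows


def parse_at_lines(text):
    """Hypothetical parser B: one 'HH:MM @ Title' pair per line."""
    rows = []
    for line in text.splitlines():
        if " @ " in line:
            time, title = line.split(" @ ", 1)
            rows.append((title.strip(), time.strip()))
    return rows


# One parser per management platform, shared by every venue that uses it.
PARSERS = {"platform_a": parse_pipe_lines, "platform_b": parse_at_lines}

# Hypothetical venue records; in a real site these would live in the database.
VENUES = [
    {"name": "Cinema One", "platform": "platform_a"},
    {"name": "Cinema Two", "platform": "platform_a"},
    {"name": "Events Centre", "platform": "platform_b"},
]


def parse_for_venue(venue, raw_text):
    """Pick the parser registered for the venue's platform and run it."""
    parser = PARSERS[venue["platform"]]
    return parser(raw_text)
```

    With 200 sites, the hope is that the number of parsers stays much smaller than the number of venues.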

    If a cinema/theatre takes exception to your site, expect to get legal warnings, or they might just f*** with you and change the format slightly every week so that your scraper breaks.
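
    Because a format change usually shows up as empty or odd-looking results rather than an error, it can help to sanity-check whatever a scraper returns before publishing it. A rough sketch, assuming the scraper yields (title, time) pairs with times like "18:30" (that format is an assumption):

```python
import re

# Assumed time format: H:MM or HH:MM. Adjust to whatever the source uses.
TIME_PATTERN = re.compile(r"^\d{1,2}:\d{2}$")


def check_sessions(rows, min_rows=1):
    """Return a list of problems found in (title, time) rows; empty means OK."""
    problems = []
    if len(rows) < min_rows:
        problems.append("expected at least %d rows, got %d" % (min_rows, len(rows)))
    for title, time in rows:
        if not title:
            problems.append("missing title for session at %r" % time)
        if not TIME_PATTERN.match(time):
            problems.append("unrecognised time %r for %r" % (time, title))
    return problems
```

    If check_sessions returns anything, keep showing the previous data for that venue and flag the scraper for a manual look.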
     
    sarahk, Oct 16, 2019 IP
    JEET likes this.
  5. fundmakesell

    #5
    Thanks Sarah. Very impressive.
     
    fundmakesell, Oct 21, 2019 IP
  6. JEET

    JEET Well-Known Member

    Messages:
    2,409
    Likes Received:
    132
    Best Answers:
    3
    Trophy Points:
    185
    #6
    Like @sarahk said, see if these websites have an RSS feed of some kind, or a notification service that emails you the show timings.

    If there is an email notification service, all you need to do is subscribe to their notifications, then code up a script that reads that mailbox, parses the email content to get the show timings, and saves them to your website.
    You could even do this manually, though that will be tough with 200 websites sending notifications.
    It'd be best to create a separate email address just for this purpose.
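
    The parsing step for such a notification could be sketched like this with Python's standard email module. The sample message, the addresses and the "Title - HH:MM" line format are all invented; every venue's emails will need their own pattern. Fetching the messages from the mailbox (for example over IMAP) is left out.

```python
import re
from email import message_from_string
from email.policy import default

# Hypothetical notification; real emails will be laid out differently.
SAMPLE_EMAIL = """\
From: boxoffice@theatre.example
To: listings@directory.example
Subject: Sessions for this week

Joker - 18:30
Joker - 21:00
"""

# Assumed line format: "<title> - <HH:MM>".
LINE_PATTERN = re.compile(r"^(?P<title>.+?) - (?P<time>\d{1,2}:\d{2})\s*$")


def parse_notification(raw_email):
    """Return the sender and the (title, time) pairs found in the plain-text body."""
    msg = message_from_string(raw_email, policy=default)
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    rows = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            rows.append((match.group("title"), match.group("time")))
    return {"sender": str(msg["From"]), "sessions": rows}
```

    Keeping the sender in the result lets you map each email back to the venue it came from.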

    If this is not an option, then build a crawler, identify yourself as a bot (or as a user) through your User-Agent string, and fetch the HTML of the website.
    Parse the content and save it to your own website.
    However, as already pointed out, the theatre websites can do a lot to stop this crawler.
    They can block the IP of your server itself so your crawler cannot read anything from their servers.
    The numbers are on your side though, since you are scanning 200 websites.
    It's very unlikely that all 200 will block you. Even if only 100 websites do not block you, you will still have a lot of data to fill your website and provide a valuable resource for others.
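
    Identifying the crawler comes down to sending a descriptive User-Agent header, and a polite crawler would also check the site's robots.txt first. A minimal standard-library sketch; the bot name and the URLs are placeholders, not real services.

```python
import urllib.request
import urllib.robotparser

# Placeholder identity; use your own bot name and a page describing it.
USER_AGENT = "ExampleDirectoryBot/0.1 (+https://directory.example/bot)"


def build_request(url):
    """Create a request that announces the crawler through its User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def is_allowed(robots_txt, url):
    """Check a robots.txt body (already downloaded) against the given URL."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(USER_AGENT, url)


def fetch_html(url):
    """Download a page as text. Performs a real network request."""
    with urllib.request.urlopen(build_request(url), timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

    Only call fetch_html for URLs where is_allowed returns True.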

    Make sure that your crawler does not hit their websites too often. Once or twice a day should be OK, especially if you disguise yourself as a human using a browser.
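
    To hold the crawler to once or twice a day, you could record when each site was last fetched and skip any site fetched too recently. A small sketch, with the 12-hour interval chosen only as an example:

```python
from datetime import datetime, timedelta

# Example interval: at most two fetches per site per day.
MIN_INTERVAL = timedelta(hours=12)


def due_for_crawl(last_fetched, now, min_interval=MIN_INTERVAL):
    """True if the site was never fetched or the interval has passed."""
    if last_fetched is None:
        return True
    return now - last_fetched >= min_interval


def select_due(sites, now):
    """Return the URLs from a {url: last_fetched} mapping that are due."""
    return [url for url, last in sites.items() if due_for_crawl(last, now)]
```

    Run select_due from a scheduled job and fetch only the URLs it returns, updating each timestamp after a successful fetch.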
     
    JEET, Oct 28, 2019 IP
  7. sarahk

    #7
    Only hide your crawler if the relationship with the theatres is hostile.

    You want to be seen as a professional so follow the rules and identify yourself properly.
     
    sarahk, Oct 28, 2019 IP
    JEET likes this.
  8. JEET

    #8
    @sarahk
    That is true. If he can get the info via proper channels, then that would be best.

    He would only need the crawler if he cannot get that info through proper channels and scraping is the only way left.
     
    JEET, Oct 28, 2019 IP