If a Google bot starts scanning my site, does it scan all the files and folders in my public_html/mywebsite/ directory, or only the files that are linked from my website?
First, crawlers only fetch pages served over HTTP from your site's URLs; they cannot browse your server's directory tree, so files that are never served or linked are effectively invisible to them. Second, the vast majority of the time Google discovers pages by following internal and external links to them. Google can occasionally find a page that isn't linked from anywhere (for example, through a sitemap or a URL submitted elsewhere), but that is uncommon. Proper use of robots.txt can tell crawlers not to fetch the pages you don't want crawled.
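As a minimal sketch, a robots.txt file placed at the root of your site (e.g. https://example.com/robots.txt) might look like this; the directory names here are hypothetical examples, not paths from your site:

```
# Applies to all crawlers
User-agent: *
# Block crawling of these directories
Disallow: /private/
Disallow: /tmp/
# Everything else may be crawled
Allow: /
```

Note that robots.txt only asks well-behaved crawlers not to fetch those URLs; a page blocked this way can still appear in the index if other sites link to it, so for pages that must never be indexed a `noindex` directive on the page itself is the more reliable tool.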