Digital Point Forums
  #1  
Apr 13th 2008, 6:56 am
usasportstraining
Google Spider able to Crawl more now!

Google's Googlebot is now able to crawl forms, and they say they can now "scan" Flash and JavaScript.

That's big news!!

All those links that previously went unfound because of JavaScript could be crawlable now.

Quote:
Crawling through HTML forms
Friday, April 11, 2008 at 10:50 AM
Written by Jayant Madhavan and Alon Halevy, Crawling and Indexing Team

Google is constantly trying new ideas to improve our coverage of the web. We already do some pretty smart things like scanning JavaScript and Flash to discover links to new web pages, and today, we would like to talk about another new technology we've started experimenting with recently.

In the past few months we have been exploring some HTML forms to try to discover new web pages and URLs that we otherwise couldn't find and index for users who search on Google. Specifically, when we encounter a <FORM> element on a high-quality site, we might choose to do a small number of queries using the form. For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML. Having chosen the values for each input, we generate and then try to crawl URLs that correspond to a possible query a user may have made. If we ascertain that the web page resulting from our query is valid, interesting, and includes content not in our index, we may include it in our index much as we would include any other web page.

Needless to say, this experiment follows good Internet citizenry practices. Only a small number of particularly useful sites receive this treatment, and our crawl agent, the ever-friendly Googlebot, always adheres to robots.txt, nofollow, and noindex directives. That means that if a search form is forbidden in robots.txt, we won't crawl any of the URLs that a form would generate. Similarly, we only retrieve GET forms and avoid forms that require any kind of user information. For example, we omit any forms that have a password input or that use terms commonly associated with personal information such as logins, userids, contacts, etc. We are also mindful of the impact we can have on web sites and limit ourselves to a very small number of fetches for a given site.

The web pages we discover in our enhanced crawl do not come at the expense of regular web pages that are already part of the crawl, so this change doesn't reduce PageRank for your other pages. As such it should only increase the exposure of your site in Google. This change also does not affect the crawling, ranking, or selection of other web pages in any significant way.

This experiment is part of Google's broader effort to increase its coverage of the web. In fact, HTML forms have long been thought to be the gateway to large volumes of data beyond the normal scope of search engines. The terms Deep Web, Hidden Web, or Invisible Web have been used collectively to refer to such content that has so far been invisible to search engine users. By crawling using HTML forms (and abiding by robots.txt), we are able to lead search engine users to documents that would otherwise not be easily found in search engines, and provide webmasters and users alike with a better and more comprehensive search experience.
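The quoted post describes the form crawl only in prose, so here is a rough sketch of what it could look like. It is not Google's implementation, which is not public. The `FormCollector` class, the `candidate_urls` function, the `site_words` list, and the `limit` cap are all hypothetical names made up for illustration. The sketch keeps only GET forms, skips any form with a password input, fills text boxes with words taken from the site, tries each select option, and drops URLs that robots.txt forbids. Check boxes and radio buttons are left out to keep it short.

```python
from html.parser import HTMLParser
from itertools import product
from urllib.parse import urlencode, urljoin
from urllib.robotparser import RobotFileParser


class FormCollector(HTMLParser):
    """Collect each <form> with its method, action, and fillable fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._form = None    # form currently being parsed
        self._select = None  # name of the <select> currently being parsed

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self._form = {
                "method": (a.get("method") or "get").lower(),
                "action": a.get("action") or "",
                "fields": {},
                "has_password": False,
            }
            self.forms.append(self._form)
        elif self._form is None:
            return  # ignore inputs that sit outside any form
        elif tag == "input":
            kind = (a.get("type") or "text").lower()
            if kind == "password":
                self._form["has_password"] = True
            elif kind == "text" and a.get("name"):
                # None marks a text box, filled from site_words later.
                self._form["fields"][a["name"]] = None
        elif tag == "select" and a.get("name"):
            self._select = a["name"]
            self._form["fields"][self._select] = []
        elif tag == "option" and self._select and a.get("value"):
            self._form["fields"][self._select].append(a["value"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._form = None
        elif tag == "select":
            self._select = None


def candidate_urls(page_url, html, site_words, robots_txt,
                   agent="Googlebot", limit=5):
    """Return at most `limit` query URLs that the page's GET forms could produce."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    collector = FormCollector()
    collector.feed(html)
    urls = []
    for form in collector.forms:
        # Assumption taken from the quoted post: GET forms only, no password forms.
        if form["method"] != "get" or form["has_password"] or not form["fields"]:
            continue
        names = list(form["fields"])
        # Text boxes (and selects without options) fall back to words from the site.
        choices = [form["fields"][name] or site_words for name in names]
        for combo in product(*choices):
            query = urlencode(dict(zip(names, combo)))
            url = urljoin(page_url, form["action"]) + "?" + query
            if robots.can_fetch(agent, url):
                urls.append(url)
            if len(urls) >= limit:
                return urls
    return urls
```

Called on a page that has a GET search form and a POST login form, this would return URLs only for the search form, and nothing at all if robots.txt disallows the form's action path. The small `limit` mirrors the post's remark about keeping the number of fetches per site very low.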
  #2  
Apr 13th 2008, 7:01 am
kodut
Big news if it's real. A new generation of search will begin.
  #3  
Apr 13th 2008, 7:03 am
usasportstraining
Quote:
Originally Posted by kodut
Big news if it's real. A new generation of search will begin.
It's straight from Google's blog. I'd say it's as real as it gets.
  #4  
Apr 13th 2008, 7:20 am
TechEvangelist
I guess that explains why I have been seeing search URLs showing up in Google's index for some of my sites. It looked like someone was posting the URLs from the search results pages in the sites, but I could never find the source of the links.

This is a double-edged sword. Search results pages typically are not optimized. I don't see any other type of form that they would care to crawl.

What is the benefit of this? I have not seen any of these pages show up on Google's search pages and I don't think users want to land on internal search results pages. Am I missing something here?
  #5  
Apr 13th 2008, 7:26 am
rainborick
Google has been scanning JavaScript and Flash files for URLs for 3-4 years now. They look for complete URLs in the form of "http://www.example.com/". They added the ability to extract text from Flash files over 2 years ago. What is truly new is that they are experimenting with following <form>s to see if they resolve to crawlable pages, such as a site's custom search form. Note that the <form> has to use the "GET" method so that there is a URL to put in the index.
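As a minimal sketch of the "complete URLs" point above: a crawler that does not run scripts can still scan the script text for quoted absolute URLs. The regular expression and the `complete_urls` function below are assumptions made for illustration, not a description of what Googlebot actually does.

```python
import re

# Absolute http(s) URLs that appear inside single- or double-quoted strings.
URL_IN_STRING = re.compile(r"""["'](https?://[^"'\s<>]+)["']""")


def complete_urls(script_text):
    """Return the absolute URLs found in script text, in order, without repeats."""
    found = []
    for url in URL_IN_STRING.findall(script_text):
        if url not in found:
            found.append(url)
    return found
```

A relative path such as "/page.html", or a URL assembled from string pieces at run time, would not match. That fits the point that only complete URLs get picked up this way.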
  #6  
Apr 13th 2008, 7:37 am
kks_krishna
Good news. They need more links.
  #7  
Apr 13th 2008, 7:43 am
usasportstraining
Quote:
Originally Posted by rainborick
Google has been scanning JavaScript and Flash files for URLs for 3-4 years now. They look for complete URLs in the form of "http://www.example.com/". They added the ability to extract text from Flash files over 2 years ago. What is truly new is that they are experimenting with following <form>s to see if they resolve to crawlable pages, such as a site's custom search form. Note that the <form> has to use the "GET" method so that there is a URL to put in the index.
That explains a few things about what I've been seeing for reported backlinks.

Are you sure they've been scanning JavaScript for 3 or 4 years, though? I've been hearing that they could not, up until now. It seems to me it's a fairly new development. Not as new as reading forms, but new just the same.
  #8  
Apr 13th 2008, 8:41 am
wilhb81
Wow, this will be great news for us, especially for those sites that are full of Flash.
  #9  
Apr 13th 2008, 8:43 am
Divisive Cottonwood
Why is this great news? I don't understand. Can somebody give me an example where this will make a difference to a website?
  #10  
Apr 13th 2008, 8:48 am
Legendary11
More scanning and page indexing means more pages on the net, so more chance of getting visitors?
  #11  
Apr 13th 2008, 9:05 am
usasportstraining
Quote:
Originally Posted by Divisive Cottonwood
Why is this great news? I don't understand. Can somebody give me an example where this will make a difference to a website?
More ways for Google to crawl and gather data means more ways for us to benefit.

If they can find links in JavaScript and Flash, then it gives us more avenues to have backlinks.

There may be some benefits for them crawling forms as well.
  #12  
Apr 13th 2008, 9:12 am
mhmdkhamis
Good news.

Thanks, man.
  #13  
Apr 13th 2008, 9:17 am
rupertValentino
Thanks for spreading the news. Ciao.
  #14  
Apr 13th 2008, 9:19 am
godsofchaos
That's huge news! Now widgets like Criteo and blogsphere will get a run for their money.
  #15  
Apr 13th 2008, 9:25 am
Proximity
Great news, let's hope it is beneficial.
  #16  
Apr 13th 2008, 9:38 am
ancientcity
wow, this is interesting
  #17  
Apr 13th 2008, 9:39 am
pioneer1
Google is going to deep scan web pages: forms, buttons, check boxes, etc.
  #18  
Apr 13th 2008, 9:44 am
elias_sorensen
Cool news ;-)
  #19  
Apr 13th 2008, 10:13 am
usasportstraining
Quote:
Originally Posted by godsofchaos
That's huge news! Now widgets like Criteo and blogsphere will get a run for their money.
Exactly!

That's my hope.
  #20  
Apr 13th 2008, 10:16 am
arjun316
Great news. I hope they crawl my website more and put it at the top of the search results.
All times are GMT -8.