11-12-2004, 07:28 PM
ovi (Guest)
Microsoft Crawling Google Results

Today I was searching for some information on the web and found this article. Maybe it will be interesting for some of you:


I was questioned today by a developer who was watching a particular IP address scan his site. The IP was 65.54.188.86 and is registered to Microsoft Corp., located at One Microsoft Way, Redmond, Washington 98052. This visitor was not sending the header information a crawler normally sends to the web server, such as an HTTP robot name, other identifying info, or even a browser name.
The behavior it demonstrated made it look like a crawler, especially since it was spidering urls that were no longer in existence (search engine spiders crawl site segments at regular intervals and often come back when an initial crawl left urls uncrawled) and doing so at the rate of 1 page every 3 - 5 seconds. The visitor started their visit at 7:37 am and was still on the site at 12:00 pm.
Correction: the data was there after all. Here's the crawler info: msnbot/0.3 (search.msn.com/msnbot.htm)
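The signals described above (a steady request rate of one page every few seconds, and many requests for urls that no longer exist) can be checked against a server log. Here is a minimal sketch, assuming the log has already been reduced to (timestamp, HTTP status) pairs for a single IP. The function names, the thresholds and the sample entries are made up for illustration and are not taken from the developer's actual logs.

```python
from datetime import datetime
from statistics import mean

def summarize_visitor(requests):
    """Summarize one IP's requests, given (timestamp, status_code) tuples."""
    times = sorted(ts for ts, _ in requests)
    # Seconds between consecutive requests.
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    not_found = sum(1 for _, status in requests if status == 404)
    return {
        "requests": len(requests),
        "mean_gap_seconds": mean(gaps) if gaps else None,
        "share_404": not_found / len(requests) if requests else 0.0,
    }

def looks_like_crawler(summary, max_gap=5.0, min_requests=3):
    """Hypothetical rule of thumb: several requests at a steady, short interval."""
    gap = summary["mean_gap_seconds"]
    return summary["requests"] >= min_requests and gap is not None and gap <= max_gap

# Invented sample entries, not real log data.
sample = [
    (datetime(2004, 11, 11, 7, 37, 0), 404),
    (datetime(2004, 11, 11, 7, 37, 4), 404),
    (datetime(2004, 11, 11, 7, 37, 8), 200),
    (datetime(2004, 11, 11, 7, 37, 12), 404),
]
summary = summarize_visitor(sample)
print(summary, looks_like_crawler(summary))
```

A high share of 404s on its own proves nothing, but together with a machine-like request interval it is a reasonable hint that the visitor is working from an old list of urls.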
Here's the kicker
So now you're saying, so what, big deal. But this really is a big deal. It's a big deal not only because the urls this visitor was requesting don't exist any longer, but because the only place these urls can be found is in Google's search results using site:www.sitename.com. A similar query on MSN Search doesn't show the urls at all, even on the beta version of their new Microsoft search engine. But within just hours of the visitor's exit from the site, the same search at Microsoft's new search engine showed all of the urls in question fully indexed in its results.
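The reasoning here boils down to a set comparison: which requested urls were listed by Google, but were neither linked on the site any more nor already known to MSN? A rough sketch of that check follows, assuming you have collected the four url lists by hand. The function name and all sample paths are hypothetical.

```python
def urls_only_reachable_via_reference(requested, reference_listed,
                                      own_listed_before, still_linked):
    """Return requested urls that only the reference engine could have supplied."""
    candidates = set(requested) & set(reference_listed)
    # Drop anything the crawler could have found on the site or in its own index.
    return sorted(candidates - set(own_listed_before) - set(still_linked))

# Hypothetical paths for illustration only.
requested = ["/old-page-1.html", "/old-page-2.html", "/index.html"]
google_site_query = ["/old-page-1.html", "/old-page-2.html", "/index.html"]
msn_before_visit = ["/index.html"]
linked_on_site = ["/index.html"]

suspicious = urls_only_reachable_via_reference(
    requested, google_site_query, msn_before_visit, linked_on_site
)
print(suspicious)
```

If the returned list is non-empty, those urls are the ones that support the theory. Old inbound links from other sites would be another possible source, and this sketch does not rule them out.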
My Theory On This Mysterious Microsoft Crawler
The old MSN required a fee to be crawled by its spider. But a few months back MSN dropped the fee and said they were going to begin crawling the entire web, and doing it without charge. However, that's no easy task. So I believe MSN is using the results from Google and possibly even Yahoo to get all of the pages they've indexed on sites that have a relatively low page count in the current MSN search engine.
First off, that's the fastest way to get the relevant pages from a web site. Sure, they could just go to the site directly and start crawling, but in doing so they're going to get tons of duplicate urls and urls that seem different but point to the same content. Crawling Google's results will reduce the bandwidth needed to some extent but will not completely take care of the duplicate content issue their spider will encounter.
Secondly, crawling Google's results can act as a qualitative measure for their new search engine. By creating a baseline number of pages per site when the new Microsoft Search is launched and running a comparison at a regular interval for the next 6 months, they'll be able to determine internally whether their engine is finding and indexing the same links, and as many links, as Google. Call it competitive analysis or whatever you want.
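To make the baseline idea concrete, here is a small sketch of how such a comparison could be computed for one site. It assumes a fixed baseline list of urls taken from the reference engine and a few later snapshots of what the new engine has indexed. Nothing here is known to be what Microsoft actually does, and the names and numbers are invented.

```python
def coverage(own_index, reference_index):
    """Fraction of the reference engine's urls that the own engine also has."""
    reference = set(reference_index)
    if not reference:
        return None  # no baseline to compare against
    return len(set(own_index) & reference) / len(reference)

def coverage_over_time(snapshots, reference_index):
    """Map each snapshot label to its coverage of the baseline."""
    return {label: coverage(urls, reference_index)
            for label, urls in snapshots.items()}

# Invented example: baseline of 4 urls, two snapshots of the new engine.
baseline = ["/a.html", "/b.html", "/c.html", "/d.html"]
snapshots = {
    "launch": ["/a.html"],
    "month_6": ["/a.html", "/b.html", "/c.html", "/extra.html"],
}
report = coverage_over_time(snapshots, baseline)
print(report)
```

Urls that appear only in the new engine (like /extra.html here) do not raise the score, because the score only measures how much of the baseline has been matched.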
So Microsoft's Screen Scraping?
Obviously my conclusion should be taken with a grain of salt, but it's a definite possibility. Microsoft very well could be screen scraping Google (or maybe even using their API, LOL) and crawling the urls it finds. It makes sense as a business case, but I wonder if there are any legal issues there. I doubt it. It's like putting garbage out on the curb: once it's out there it's fair game. But I bet Google's lawyers would have more to say than that on the case.

I found this article here:
webpronews.com/insiderreports/searchinsider/wpn-49-20041111MicrosoftCrawlingGoogleResultsForNewSearchEngine.html
All times are GMT. The time now is 09:36 PM.