This is a great list! I had something similar to number 11 happen in the past. I just want to share the issue in case anyone ever has the same problem.
Google crawled a new site I had created but had not yet linked to from anywhere. They were able to crawl it because my host had directory listing enabled by default in Apache. I turned it off, of course, and I recommend anyone running Apache do the same. You can do it by adding this to a .htaccess file:
Options -Indexes
Google ended up indexing about 10 URLs like this: sitename.com/?id=3 etc. The only way I found to get Google to drop them from its index was to add a Disallow rule to my robots.txt for each URL. Deleting the URLs in Google's webmaster tools alone simply did not work.
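For anyone who wants to try the same thing, here is a rough sketch of what such a robots.txt could look like. Only the ?id=3 URL comes from my case above; the commented-out prefix rule is an assumption about how you might cover the rest in one line, so adjust it to whatever URLs actually show up for your site.

```
# robots.txt at the site root (sitename.com/robots.txt)
User-agent: *

# One Disallow line per unwanted URL, as described above
Disallow: /?id=3

# Alternative (assumption, not what I originally did): rules match by prefix,
# so a single line like this would cover every /?id=... URL at once
# Disallow: /?id=
```

Keep in mind that rules match by prefix, so "Disallow: /?id=3" would also block a URL such as /?id=30.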