09-13-2007, 06:37 PM
Boss Cart consultant
1152 posts this year. Platinum VIP!Trusted Member - This user is a Master!
Last months UKWW Tokens: 12
Join Date: Feb 2007
Location: Veszprém, Hungary
Posts: 1,594
Thanks: 4
Thanked 27 Times in 13 Posts
Nominated 0 Times in 0 Posts
TOTW/F/M Award(s): 0
Protect yourself against Google Proxy hack
Quote:
Originally Posted by Bagi Zoltán
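After four days of testing, I dare to share a defence against the Google proxy hack. It is based on reverse-forward DNS validation: a reverse lookup turns the visitor's IP address into a hostname, and a forward lookup checks that the hostname resolves back to the same IP.
All you need to do is upload this PHP file to your server as reversedns.php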
PHP Code:
<?php
// Get the user agent (empty string if the header is missing).
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
// Check the user agent to see if it's identifying itself as a search engine bot.
// Yahoo's crawler identifies itself as "Yahoo! Slurp", so match on 'slurp'.
if (stristr($ua, 'msnbot') || stristr($ua, 'googlebot') || stristr($ua, 'slurp')) {
    // The user agent is purporting to be MSN's bot, Google's bot or Yahoo! Slurp.
    // If the user agent string is spoofed, we won't find a crawler domain in the host name.
    // Get the IP address requesting the page.
    $ip = $_SERVER['REMOTE_ADDR'];
    // Reverse DNS lookup the IP address to get a hostname.
    $hostname = (string) gethostbyaddr($ip);
    // Check for '.googlebot.com', '.search.live.com' and '.crawl.yahoo.net' at the end of the hostname.
    if (!preg_match('/\.googlebot\.com$/i', $hostname)
        && !preg_match('/\.search\.live\.com$/i', $hostname)
        && !preg_match('/\.crawl\.yahoo\.net$/i', $hostname)) {
        // The host name does not belong to googlebot.com, live.com or yahoo.net.
        // Remember the UA already said it is MSNBot, Googlebot or Slurp.
        $block = TRUE;
        header("HTTP/1.0 403 Forbidden");
        exit;
    } else {
        // Now we have a hit that half-passes the check. One last go:
        // Forward DNS lookup the hostname to get an IP address.
        $real_ip = gethostbyname($hostname);
        if ($ip != $real_ip) {
            // The hostname does not resolve back to the requesting IP.
            $block = TRUE;
            header("HTTP/1.0 403 Forbidden");
            exit;
        } else {
            // Real bot.
            $block = FALSE;
        }
    }
}
?>
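It validates Googlebot, MSNBot and Yahoo! Slurp requests. A request that only claims to be one of these bots (which is how a spider's request looks when it arrives through a proxy link, because it comes from the proxy's IP address) gets a 403, so the proxy copy of your site cannot be cached as duplicate content. The original script didn't include Yahoo's robot, but I completed the PHP with it.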
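The same reverse-then-forward check can also be written as a function, which makes it easier to try out without real DNS traffic. This is only a minimal sketch, assuming PHP 7.1 or later: the function name isVerifiedCrawler and its injectable $reverse and $forward parameters are made up for illustration and are not part of the original script. It uses gethostbynamel() for the forward step so that a hostname with several addresses still passes.

```php
<?php
/**
 * Forward-confirmed reverse DNS check (illustrative sketch).
 * $reverse and $forward are hypothetical injection points so the logic can
 * be exercised with fake lookups; by default PHP's own resolvers are used.
 */
function isVerifiedCrawler(string $ip, array $suffixes, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // Step 1: reverse lookup. gethostbyaddr() returns the IP unchanged
    // when the lookup fails and false on malformed input.
    $hostname = $reverse($ip);
    if (!is_string($hostname) || $hostname === $ip) {
        return false;
    }
    $hostname = strtolower($hostname);

    // Step 2: the hostname must end with one of the trusted suffixes.
    $trusted = false;
    foreach ($suffixes as $suffix) {
        if (substr($hostname, -strlen($suffix)) === $suffix) {
            $trusted = true;
            break;
        }
    }
    if (!$trusted) {
        return false;
    }

    // Step 3: forward lookup. The hostname must resolve back to the same IP.
    $addresses = $forward($hostname);
    return is_array($addresses) && in_array($ip, $addresses, true);
}
```

In reversedns.php it could be called as isVerifiedCrawler($_SERVER['REMOTE_ADDR'], ['.googlebot.com', '.search.live.com', '.crawl.yahoo.net']) once the user agent claims to be a bot, sending the 403 when it returns false.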
If your site is written in PHP, you can add the robot validation with a simple include like this
PHP Code:
<?php include("reversedns.php"); ?>
If your site is static HTML, upload the same PHP script and add these lines to your .htaccess file.
Code:
AddType application/x-httpd-php .html .htm .txt
php_value auto_prepend_file "/data/www/htdocs/users/sziget-conseils/i-connector/reversedns.php"
Of course, you need to substitute my path with yours. If you don't know your path, upload a php_info.php file to your root with the following content
PHP Code:
<?php phpinfo(); ?>
Open the web address of the uploaded php_info.php file in your browser and you can find your path on the _SERVER["DOCUMENT_ROOT"] line.
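If a full phpinfo() dump is more than you want to look through, a smaller script can print just the .htaccess line you need. This is only a sketch, assuming PHP 7 or later and that reversedns.php sits directly in the document root. The function name prependDirective is made up for illustration.

```php
<?php
// Build the .htaccess line for auto_prepend_file from a document root.
// prependDirective() is a hypothetical helper, not part of the original script.
function prependDirective(string $documentRoot, string $file = 'reversedns.php'): string
{
    // Drop any trailing slash so the path has exactly one separator.
    $path = rtrim($documentRoot, '/\\') . '/' . $file;
    return 'php_value auto_prepend_file "' . $path . '"';
}

// When requested through the web server, print the line for this site.
if (PHP_SAPI !== 'cli' && isset($_SERVER['DOCUMENT_ROOT'])) {
    header('Content-Type: text/plain');
    echo prependDirective($_SERVER['DOCUMENT_ROOT']), "\n";
}
```

Upload it to your root, open it in the browser, and copy the printed line into your .htaccess file.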
So protect yourself!
As sources for this solution I visited these URLs:
Proxy Server URLs Can Hijack Your Google Ranking - how to defend? and Yahoo! Search Blog: Yahoo! Search Crawler, Slurp, has a new Address and Signature Card
How can you test it? It is quite simple.
Quote:
Enter about:config as an address in the address bar of Firefox, the location where you normally enter a URL (link). I recommend noting the original user agent value first, which you can get by entering just about: in the address bar.
Now press the right mouse button to get the context menu and select "String" from the menu entry "New". Enter the preference name "general.useragent.override", without the quotes. Next, enter the new user agent, for instance "Googlebot/2.1 (+http://www.googlebot.com/bot.html)", without the quotes.
source
Visit your site with this browser setting and you will see what the spider sees when it comes through a proxy link: a 403 Forbidden response.
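The same test can be run from a script instead of the browser. The following is a rough sketch, assuming PHP 7.1 or later with allow_url_fopen enabled. The function names statusCode and fetchAsGooglebot are made up for illustration. It requests a page while claiming to be Googlebot and returns the HTTP status code.

```php
<?php
// Extract the numeric code from a status line such as "HTTP/1.0 403 Forbidden".
// Returns null when the line is not an HTTP status line.
function statusCode(string $statusLine): ?int
{
    return preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $statusLine, $m) ? (int) $m[1] : null;
}

// Request $url with a spoofed Googlebot user agent and return the status
// code of the first response, or null if the request failed.
function fetchAsGooglebot(string $url): ?int
{
    $context = stream_context_create(['http' => [
        'user_agent'    => 'Googlebot/2.1 (+http://www.googlebot.com/bot.html)',
        'ignore_errors' => true, // keep 4xx/5xx responses instead of failing
    ]]);
    $headers = @get_headers($url, 0, $context);
    return $headers ? statusCode($headers[0]) : null;
}
```

Calling fetchAsGooglebot() with the address of one of your own pages should return 403 once reversedns.php is in place, because the request comes from your machine and not from a Google address.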

09-13-2007, 07:04 PM
Facilitator
5301 posts this year. Platinum VIP!Trusted Member - This user is a Master!
Last months UKWW Tokens: 274
Join Date: Jun 2003
Location: London, England.
Posts: 11,769
Thanks: 3
Thanked 22 Times in 15 Posts
Nominated 0 Times in 0 Posts
TOTW/F/M Award(s): 0
Excellent post and research, Bagi. Well done, rep added.

09-13-2007, 07:13 PM
Boss Cart consultant
Thank you Temi, I have just posted it on my Hungarian blog. I guess the local guys will be happy as well.

09-13-2007, 07:18 PM
Super Moderator
1988 posts this year. Platinum VIP!Trusted Member - This user is a Master!
Last months UKWW Tokens: 107
Join Date: Mar 2007
Location: zeshaan.info
Posts: 3,701
Thanks: 1
Thanked 7 Times in 6 Posts
Nominated 0 Times in 0 Posts
TOTW/F/M Award(s): 0
Great job Bagi - rep added
If I wanted to implement this at The Web Directory, would I just need to copy the code above, paste it into a new file called reversedns.php and then change the details within the code to match my site?

09-13-2007, 07:23 PM
Boss Cart consultant
I suggest you upload reversedns.php and add this
PHP Code:
<?php include("reversedns.php"); ?>
PHP include to the very top of the index.php and show_cat.php files. This is how I implemented it.

09-13-2007, 07:25 PM
Super Moderator
Thanks Bagi
Which file would I need to add <?php include("reversedns.php"); ?> to?

09-13-2007, 07:26 PM
Super Moderator
Bagi, I think you have done a superb job. You have taken the time to test and then share this with us so I am going to give you another rep.
Good job

09-13-2007, 07:31 PM
Boss Cart consultant
Thanks Imran. On my SEO directory the Yahoo! Slurp validation isn't in the code, since the script has some error with that. I used this reversedns.php file
PHP Code:
<?php
// Get the user agent (empty string if the header is missing).
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
// Check the user agent to see if it's identifying itself as a search engine bot.
if (stristr($ua, 'msnbot') || stristr($ua, 'googlebot')) {
    // The user agent is purporting to be MSN's bot or Google's bot.
    // If the user agent string is spoofed, we won't find googlebot.com in the host name.
    // Get the IP address requesting the page.
    $ip = $_SERVER['REMOTE_ADDR'];
    // Reverse DNS lookup the IP address to get a hostname.
    $hostname = (string) gethostbyaddr($ip);
    // Check for '.googlebot.com' and '.search.live.com' at the end of the hostname.
    if (!preg_match('/\.googlebot\.com$/i', $hostname)
        && !preg_match('/\.search\.live\.com$/i', $hostname)) {
        // The host name does not belong to either live.com or googlebot.com.
        // Remember the UA already said it is either MSNBot or Googlebot.
        $block = TRUE;
        header("HTTP/1.0 403 Forbidden");
        exit;
    } else {
        // Now we have a hit that half-passes the check. One last go:
        // Forward DNS lookup the hostname to get an IP address.
        $real_ip = gethostbyname($hostname);
        if ($ip != $real_ip) {
            $block = TRUE;
            header("HTTP/1.0 403 Forbidden");
            exit;
        } else {
            // Real bot.
            $block = FALSE;
        }
    }
}
?>
which validates only Googlebot and MSNBot.
Imran, don't forget that I'm not the person who developed this solution, I'm only sharing it.

09-13-2007, 07:51 PM
Facilitator
Bagi,
So I can sum up the above process like this (for a PHP-based site):
1. Save the above code as reversedns.php
2. Call the file with an include like this: <?php include("reversedns.php"); ?>
3. That's it. The file will validate the 3 major search engines and still let the real ones cache your site.
Am I correct?
Thanks

09-13-2007, 07:53 PM
Facilitator
I will mention it in my blog as well and link to yours as the source (I know you are only sharing it, but you have enhanced it).