Know about 'Google hacking'
Google hacking is the use of a search engine, such as Google, to locate a security vulnerability on the Internet. There are generally two types of vulnerabilities to be found on the Web: software vulnerabilities and misconfigurations.
Although there are some sophisticated intruders who target a specific system and try to discover vulnerabilities that will allow them access, the vast majority of intruders start out with a specific software vulnerability or common user misconfiguration that they already know how to exploit, and simply try to find or scan for systems that have this vulnerability. Google is of limited use to the first attacker, but invaluable to the second.
When an attacker knows the sort of vulnerability he wants to exploit but has no specific target, he employs a scanner. A scanner is a program that automates the process of examining a massive quantity of systems for a security flaw. The earliest computer-related scanner, for example, was the war dialer: a program that would dial long lists of phone numbers and record which ones responded with a modem handshake.
Today there are scanners that automatically query IP addresses to see what ports they have open, determine what operating system they're probably running, or determine the geographic location of the system. One of the most popular IP scanners is NMap, a free, open-source utility for network exploration and security auditing. When using NMap, the user specifies a range of hosts and the specific services to scan for on each one. The program then returns a list of the available (and presumably vulnerable) systems.
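Conceptually, the connect phase of such a port scan is simple. The following is a minimal sketch in Python (not NMap itself, and the host and port list are placeholders): it attempts a TCP connection to each port and reports which ones accept. To keep the demonstration self-contained, it scans a listening socket that it opens itself.

```python
import socket

def scan(host, ports, timeout=0.5):
    """Attempt a TCP connection to each port; return those that accept."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            if s.connect_ex((host, port)) == 0:  # 0 means the connect succeeded
                open_ports.append(port)
    return open_ports

# Open a listener on an ephemeral loopback port, so the demonstration
# does not depend on what else happens to be running on this machine.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
result = scan("127.0.0.1", [port])
print(result == [port])
listener.close()
```

Real scanners add timing tricks, half-open (SYN) scans, and service fingerprinting on top of this basic connect loop, but the core idea is the same.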
With a little creativity, Google can be made to operate in much the same way as NMap, even though they use different protocols. As an example, let's pretend we are intruders and we know there's an exploit that will allow us to steal credit card information from any online store that uses SHOP.TAX scripts, and that www.secure.com uses SHOP.TAX. When we try our exploit, it turns out that they've already patched the vulnerability. What do we do now? We turn to Google and enter the following search string: inurl:shop.tax
Note that the above search employs advanced operators to produce a list of all sites that have "shop.tax" somewhere in their URL, essentially a list of potentially vulnerable targets. Just as with NMap, all that's left to do is try our exploit against each site on the list.
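At heart, inurl: matching is just a case-insensitive substring test against the URLs Google has indexed. The following toy sketch makes that concrete; the URL list stands in for Google's index, and every address in it is made up for illustration:

```python
# Hypothetical URLs standing in for Google's index of crawled pages.
indexed_urls = [
    "http://store.example.com/cgi-bin/shop.tax?cart=1",
    "http://www.example.org/about.html",
    "http://buy.example.net/SHOP.TAX",
]

# Roughly what a query like  inurl:shop.tax  does: keep every page
# whose URL contains the string, regardless of case.
matches = [url for url in indexed_urls if "shop.tax" in url.lower()]
print(matches)
```

Running this keeps the first and third entries: the pages whose URLs contain "shop.tax" in any case, i.e., the list of potentially vulnerable targets.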
There are countless variations on this scheme, including some rather clever ways to find particular versions of server programs.
Sometimes administrators misconfigure their sites so badly that it's not even necessary to use a "third party" exploit in order to gain access to a system. Google indexes the Web very aggressively, and unless a file is placed in a password-protected or otherwise access-restricted area of a Web site, there is a good chance that it will be searchable in Google. This includes password files, credit reports, medical records, etc. In cases where the files are not adequately protected from Google, the search engine has basically already performed the exploit for the attacker.
In this way, Google can also be used as a proxy for exploits. A proxy is an intermediary system that an attacker can use to disguise his or her identity. For example, if you were to gain remote access to Bill Gates' computer and cause it to run attacks on treasury.gov, it would appear to the Feds that Bill Gates was hacking them. His computer would be acting as a proxy. Google can be used in a similar way.
The search engine has already gathered this information and will hand it over freely, without so much as a peep to the vulnerable site. Things get even more interesting when you consider the Google cache function. If you have never used this feature, try this:
Do a Google search for "SearchTechTarget.com." Click on the first result and read a few of the headlines. Now click back to return to your search. This time, click the "Cached" link to the right of the URL of the page you just visited. Notice anything unusual? You're probably looking at the headlines from yesterday or the day before. Why, you ask? It's because whenever Google indexes a page, it saves a copy of the entire thing to its server.
This can be used for a lot more than reading old news. The intruder can now use Google to scan for sensitive files without alerting potential targets -- and even when a target is found, the intruder can access its files from the Google cache without ever making contact with the target's server. The only server with any logs of the attack would be Google's, and it's unlikely they will realize an attack has taken place.
An even more elaborate trick involves crafting a special URL that would not normally be indexed by Google, perhaps one involving a buffer overflow or SQL injection. This URL is then submitted to Google as a new Web page. Google automatically accesses it, stores the resulting data in its searchable cache, and the rest is a recipe for disaster.
How can you prevent Google hacking?
Make sure you are comfortable with sharing everything in your public Web folder with the whole world, because Google will share it, whether you like it or not. Also, in order to prevent attackers from easily figuring out what server software you are running, change the default error messages and other identifiers. Often, when a "404 Not Found" error is detected, servers will return a page that says something like:
Not Found
The requested URL /cgi-bin/xxxxxx was not found on this server.
Apache/1.3.27 Server at your web site Port 80
The only information that the legitimate user really needs is a message that says "Page Not Found." Restricting the other information will prevent your page from turning up in an attacker's search for a specific flavor of server.
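As a sketch of the idea, here is a minimal Python HTTP server, using only the standard library and written for illustration rather than production use. It answers every request with the terse message above and replaces the default version-bearing Server header, which plays the same fingerprinting role as the "Apache/1.3.27" banner in the error page shown earlier:

```python
import http.server
import threading
import urllib.error
import urllib.request

class QuietHandler(http.server.BaseHTTPRequestHandler):
    # Replace the default "BaseHTTP/x.y Python/x.y" Server header,
    # which would otherwise identify the software and its version.
    server_version = "webserver"
    sys_version = ""

    def do_GET(self):
        # The same terse answer for every path: no software name, no version.
        body = b"Page Not Found"
        self.send_response(404)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence per-request console logging for this demo

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), QuietHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Request a missing page and inspect what an outsider would see.
status = banner = body = None
try:
    urllib.request.urlopen(
        "http://127.0.0.1:%d/cgi-bin/xxxxxx" % server.server_address[1])
except urllib.error.HTTPError as err:
    status, banner = err.code, err.headers.get("Server", "")
    body = err.read().decode()
server.shutdown()
print(status, repr(banner), body)
```

On a real Apache server the equivalent hardening is done with configuration directives rather than code, but the goal is identical: the error response reveals nothing a search for a specific server version could match on.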
Google periodically purges its cache, but until then your sensitive files are still being offered to the public. If you realize that the search engine has cached files that you want kept from view, you can go to http://www.google.com/remove.html and follow the instructions on how to remove your page, or parts of your page, from their database.