Updated: May 27, 2019
Normally, when you want to find something on the Internet, what do you do?
You use a search engine like Google and search for it.
When you make a search, Google shows you all the matching results, and you choose which website to visit from them.
How does Google know these websites or webpages?
The primary job of any search engine is to gather information about different websites by crawling through them using software bots called spiders or crawlers.
When a website is developed and ready for deployment, the admin submits it to Google Search (or to any number of other search engines), asking Google to include the website in its search results as well.
This is the first time when Google gets to know the web address of that particular website. It then crawls through the submitted website.
Crawling is nothing but following all the publicly available links (or pages) on the website. Google crawls through a website by using several techniques, like making use of a sitemap (which contains all the links of that website) that is submitted by the website admin, and by following links that are mentioned in other pages.
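For reference, a sitemap is usually just a simple XML file listing the site's URLs. A minimal sketch following the sitemaps.org protocol (the domain and dates below are placeholders) might look like:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- one <url> entry per page the admin wants crawled -->
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2019-05-01</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/about</loc>
  </url>
</urlset>
```

The admin typically submits this file to Google through Search Console, giving the crawler a complete list of pages to visit.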
While crawling through all the links of the website, Google analyzes these pages and adds information about them to its index. This is called indexing.
When someone makes a search, Google uses ranking algorithms to analyze the query, then fetches relevant data from its index, ranks it, and displays it to the user.
Why is this potentially dangerous?
Any website on the Internet has many important or critical files that are not meant to be read or accessed by anyone except the developer or the admin of that website.
Just like how you save files on your computer in directories or paths, websites do the same. All the files of a website (be it critical admin files, or just public files) are saved in their respective directories on a web server.
However, all the critical files that contain important information, like the website's database password, cryptographic keys, etc., are saved in particular directories, and access to these directories is restricted to the public. This is obvious. We don't want people on the Internet to look up our passwords and keys, right?
So, what if you don't restrict access to critical directories because you were lazy or just careless?
Then, yeah, it's pretty easy for literally any individual on the Internet to browse through these critical files on your website.
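As an illustration of what "restricting access" means in practice, here is one common way to do it on an Apache web server (a sketch using Apache 2.4 syntax, not a complete hardening guide; the file name is just an example):

```apache
# In the site's Apache config or .htaccess:
# deny all web access to the WordPress configuration file
<Files "wp-config.php">
    Require all denied
</Files>
```

Other servers (nginx, IIS, etc.) have equivalent deny rules; the point is that sensitive files should never be reachable over plain HTTP requests.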
But there's one catch: how can someone know the directories on your web server? Just like you create folders and directories on your personal computer, you can do the same on your web server and save whatever files you wish. So it's hard for anyone to guess the names of the folders and files on your web server.
So, does this stop them from reading the critical files of your website? I don't think so.
This is where search engines can be a danger to your website. Search engines index all the directories (or links) of your website, unless you specifically tell them not to index a page by making use of a nofollow tag, a robots.txt file, etc.
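For example, a robots.txt file placed at the site root asks crawlers to stay out of certain paths (the paths below are illustrative):

```
User-agent: *
Disallow: /admin/
Disallow: /backups/
```

Keep in mind that robots.txt is only a request honored by well-behaved crawlers, not access control; in fact, it publicly advertises the paths you consider sensitive, so it is no substitute for actually restricting access on the server.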
If search engines indexed your critical directories and files, it means that they can potentially be shown as search results to users (although not at the top of the results, as the ranking algorithms are in play).
Using Google Dorks to search for juicy information
Dorks are keywords that can be used to filter the search results and see only what we want to see.
We can make use of dorks to tell the search engine to narrow its results down to exactly what we specify, instead of relying on its usual ranking.
For example, when you are googling something like this,
site:vives.org inurl:wp-config.php intext:DB_PASSWORD
site, inurl, and intext are the dorks here.
site is used to tell Google to show results from only the mentioned site.
inurl is used to tell Google to show only the results that contain the mentioned string in their URL.
intext is used to tell Google to show only the results that contain the mentioned text in their page.
By making a Google search like the one above, you can get the wp-config.php file of the website (if it exists in Google's search index), which contains WordPress configuration details like the database name, password, etc.
So what we just did is make use of dorks to extract critical and private information from a website that is not protecting its sensitive files and folders (directories, as a whole).
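For context, the database section of a WordPress wp-config.php looks roughly like this (the values below are obviously placeholders), which is why an indexed copy of this file is so damaging:

```php
<?php
// Standard WordPress database settings (placeholder values)
define( 'DB_NAME', 'example_db' );
define( 'DB_USER', 'example_user' );
define( 'DB_PASSWORD', 'example_password' );
define( 'DB_HOST', 'localhost' );
```

The intext:DB_PASSWORD part of the dork above matches exactly this kind of line.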
Some more Google dorks
intitle - Used to list results that have specific text in their title.
link - Used to list web pages that link to a particular website.
info - Used to list results based on the information that Google has about web pages.
maps - Used to tell Google to look on Google Maps and list results.
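As an illustration, here is a classic, widely documented example that combines intitle with quoted text to find open directory listings (pages whose title is an auto-generated file index):

intitle:"index of" "parent directory"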
Using Google Dorks for hacking
Google dorks for fetching credentials (usernames, passwords):
intitle:settings.py intext:EMAIL_HOST_PASSWORD -git -stackoverflow
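For context, that dork targets Django projects: EMAIL_HOST_PASSWORD is a standard Django setting, and a leaked settings.py typically contains lines like these (all values below are placeholders, not real credentials):

```python
# Typical Django email settings from a project's settings.py
# (placeholder values only)
EMAIL_HOST = 'smtp.example.com'
EMAIL_PORT = 587
EMAIL_HOST_USER = 'admin@example.com'
EMAIL_HOST_PASSWORD = 'example_password'  # the value the dork above hunts for
```

The -git and -stackoverflow terms in the dork simply exclude results from code-hosting and Q&A sites, which are full of intentionally shared sample settings files.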
Google dorks to fetch login portals of websites:
intitle:"admin console" inurl:login site:"*.edu"|site:"*.gov"|site:"*.net" -site:*.com -help -guide -documentation -release -notes -configure -support -price -cant
Google dorks to view webcams:
intitle:"Live View / - AXIS"
intitle:"live view" intitle:axis
intitle:"WJ-NT104 Main Page"
To know more Google dorks and their applications in hacking, check out the Google Hacking Database (GHDB) on Exploit-DB.