OSINT (Open Source Intelligence)

Open Source Intelligence is the collection and analysis of data gathered from open sources to produce actionable intelligence.

Google-Dorks-Cheat-Sheet

A list of useful Google Dorks queries and explanations. See also the Google Hacking Database (GHDB).

1. Cache

A cache stores copies of data so that future requests for that data can be served faster. Cached versions of web pages can be used to view a page's contents when the live version cannot be reached, has been altered, or has been taken down. The cache: operator shows the version of the web page that Google holds in its cache.

cache:website address
cache:https://www.parador.es/es paradores

Other cache-related tools include a cache checker and the Wayback Machine.

2. Intext and Allintext

To find specific text within a webpage, use the intext operator. intext searches for a single keyword in the page body, whereas allintext requires all of the listed keywords to appear. Only pages containing that word (or those words) are shown.

intext:usernames
allintext:"usernames" "passwords"

3. Filetype

Shows only results that are documents of the given type. For example, you can apply a filter to retrieve only PDF files.

filetype:pdf
filetype:log

4. Intitle and Allintitle

These operators filter results by HTML page title. intitle matches a keyword in the title, and allintitle requires every listed keyword to appear in the title.

intitle:"kernel module"
allintitle:"kernel module" "objdump"

5. Inurl and Allinurl

The inurl operator filters results by URL text, returning only pages whose URL contains the keyword. allinurl requires every listed keyword to appear in the URL.

inurl:dump
allinurl:dump physical memory linux

Cache/Archive

Search the latest cached results.

cache:example.com


Country & Language

To get search results for a specific country and language, set the gl and hl URL parameters.

# gl=us: United States
# hl=en: English
https://www.google.com/search?q=apple&gl=us&hl=en
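
The same URL can be assembled programmatically. The following is a minimal sketch using only Python's standard library; the function name google_search_url is an illustrative assumption and not part of any Google API.

```python
from urllib.parse import urlencode


def google_search_url(query: str, country: str = "us", language: str = "en") -> str:
    """Build a Google search URL with the gl (country) and hl (language) parameters."""
    params = {"q": query, "gl": country, "hl": language}
    # urlencode percent-encodes special characters such as ':' and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode(params)


print(google_search_url("apple"))
# prints https://www.google.com/search?q=apple&gl=us&hl=en
```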


Directory Listing

Search for websites that allow directory listings. If listing is enabled, we can browse and retrieve every file in the directory.

intext:"Index of /admin"
intext:"Index of /wp-admin"
site:example.com intext:"Index of /admin"


File Types

Specify the file type, e.g. pdf.

filetype:pdf
filetype:pdf "email address"


Sensitive Information

site:github.com "DB_USER"
site:github.com "DB_PASSWORD"

# Filter by datetime
"DB_USER" after:2022-01-01 before:2023-01-01


Subdomains

site:*.google.com

# -site: Exclude specific domain
site:*.example.com -site:www.example.com

# Specify file extension
site:*.google.com ext:php
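
Subdomain dorks like these are usually refined step by step, excluding each host already found. A small sketch of that composition follows; the helper name subdomain_dork and its parameters are assumptions made for illustration.

```python
from typing import Optional, Sequence


def subdomain_dork(domain: str, exclude: Sequence[str] = (), ext: Optional[str] = None) -> str:
    """Compose a subdomain-enumeration dork for the given domain."""
    parts = [f"site:*.{domain}"]
    # Each already-known host is removed from the results with -site:
    parts += [f"-site:{host}" for host in exclude]
    if ext:
        # Optionally restrict results to one file extension.
        parts.append(f"ext:{ext}")
    return " ".join(parts)


print(subdomain_dork("example.com", exclude=["www.example.com"]))
print(subdomain_dork("google.com", ext="php"))
```

As new subdomains turn up in the results, adding them to exclude and rerunning the query helps surface the remaining ones.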


Title

Search for keywords contained in the page title.

intitle:pentesting


URL

Search for URLs containing a specific keyword, e.g. a TLD (com, eu, io, etc.).

inurl:edu
inurl:edu "login"

Shodan Dorks

Shodan is a search engine that lets us find various types of servers using filters. This section shows ways to search for specific information.

- [systemweakness](https://systemweakness.com/how-to-find-open-elasticsearch-databases-using-shodan-fb9314af604a)
product:elastic port:9200 country:us
product:postgresql port:5432 country:jp

Search 'users' column

product:elastic port:9200 users
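
Queries of this shape are plain strings of filter:value pairs plus optional free-text terms, so they are easy to generate. A minimal sketch follows; shodan_query is a hypothetical helper, not part of Shodan's own tooling.

```python
def shodan_query(*terms: str, **filters: object) -> str:
    """Join filter:value pairs and free-text terms into one Shodan query string."""
    parts = []
    for name, value in filters.items():
        text = str(value)
        # Values containing spaces are quoted so they stay attached to their filter.
        if " " in text:
            text = f'"{text}"'
        parts.append(f"{name}:{text}")
    parts.extend(terms)
    return " ".join(parts)


print(shodan_query(product="elastic", port=9200, country="us"))
print(shodan_query("users", product="elastic", port=9200))
```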

OSINT

Awesome Repos

User Research

Usernames

People

Phone Numbers

Birthdays

"Name of target" intext:"happy birthday"

Resumes

"Name of target" resume filetype:pdf

Social Media

Twitter

Facebook

Instagram

Snapchat Maps

Web Archives

Vulnerabilities (CVE)

Malware

Person Investigation

Accounts in Social Media & Other Platforms

The target person probably uses some social media, so first check whether an account exists on each platform.

Google Dorking

Assume the target person is named John Smith.

<social_media> john smith
<social_media> jsmith
<social_media> j.smith

# add the year of birth
<social_media> john1999

For example,

facebook jsmith
reddit jsmith
twitter jsmith
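
The naming patterns above can be generated rather than typed by hand. This sketch covers only the four patterns listed in this section; the function names are assumptions chosen for illustration.

```python
from typing import List, Optional


def username_candidates(first: str, last: str, birth_year: Optional[int] = None) -> List[str]:
    """Return the username guesses described above for one person."""
    first, last = first.lower(), last.lower()
    candidates = [
        f"{first} {last}",      # john smith
        f"{first[0]}{last}",    # jsmith
        f"{first[0]}.{last}",   # j.smith
    ]
    if birth_year is not None:
        candidates.append(f"{first}{birth_year}")  # john1999
    return candidates


def platform_queries(platforms: List[str], candidates: List[str]) -> List[str]:
    """Prefix every candidate with every platform name to form search queries."""
    return [f"{platform} {name}" for platform in platforms for name in candidates]


names = username_candidates("John", "Smith", birth_year=1999)
for query in platform_queries(["facebook", "reddit", "twitter"], names):
    print(query)
```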

IOSINT (Image OSINT) for Account Pictures

If the person uses a picture as a profile image or in posts, we can investigate that image using IOSINT.
We might also get hints from details visible in the picture.

Older Account Pages

Using the Wayback Machine, we can gather older information about the target person on each platform.

In the Wayback Machine, search for the following URLs.

Reddit

http://old.reddit.com/user/<username>
https://www.reddit.com/user/<username>

Twitter

https://twitter.com/<username>
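
Instead of pasting each URL into the web interface, the Wayback Machine availability endpoint (https://archive.org/wayback/available) can be queried with the profile URL. The sketch below only builds the request URLs and performs no network access; the PROFILE_URLS table and function name are assumptions for illustration.

```python
from typing import Dict
from urllib.parse import urlencode

# Profile URL templates taken from the list above.
PROFILE_URLS = {
    "reddit_old": "http://old.reddit.com/user/{username}",
    "reddit": "https://www.reddit.com/user/{username}",
    "twitter": "https://twitter.com/{username}",
}


def availability_urls(username: str) -> Dict[str, str]:
    """Map each platform to an availability-API URL for the user's profile page."""
    result = {}
    for platform, template in PROFILE_URLS.items():
        profile = template.format(username=username)
        # The profile URL is percent-encoded into the 'url' query parameter.
        result[platform] = "https://archive.org/wayback/available?" + urlencode({"url": profile})
    return result


for platform, url in availability_urls("jsmith").items():
    print(platform, url)
```

Fetching one of these URLs should return JSON whose archived_snapshots field describes the closest capture, if any exists.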

IMINT (Image Intelligence) and GEOINT (Geospatial Intelligence)

IMINT and GEOINT are types of OSINT that reveal information, such as where a photo was taken, by analyzing images.

Basic Investigation

open example.jpg
exiftool example.jpg
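
If the image carries GPS tags, exiftool by default prints them in degrees, minutes and seconds, which most map services do not accept directly. A sketch of the conversion to decimal degrees follows, assuming the default text form such as 35 deg 41' 22.20" N; the sample coordinate is made up for illustration.

```python
import re

# Matches coordinates in the form: 35 deg 41' 22.20" N
DMS_PATTERN = re.compile(
    r"""(?P<deg>\d+(?:\.\d+)?)\s*deg\s*
        (?P<min>\d+(?:\.\d+)?)'\s*
        (?P<sec>\d+(?:\.\d+)?)"\s*
        (?P<ref>[NSEW])""",
    re.VERBOSE,
)


def dms_to_decimal(text: str) -> float:
    """Convert a degrees/minutes/seconds coordinate string to decimal degrees."""
    match = DMS_PATTERN.search(text)
    if match is None:
        raise ValueError(f"not a DMS coordinate: {text!r}")
    degrees = float(match["deg"]) + float(match["min"]) / 60 + float(match["sec"]) / 3600
    # Southern and western hemispheres are negative in decimal notation.
    return -degrees if match["ref"] in "SW" else degrees


sample = "GPS Latitude : 35 deg 41' 22.20\" N"  # hypothetical exiftool output line
print(round(dms_to_decimal(sample), 4))
```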

Gather Information From Search Engine

Search for any keywords found in the image, such as text on signs or buildings.

Upload the image to each search engine to run a reverse image search.

Video (mp4) Geolocation

FFmpeg can extract frames from a video as still images. The command below saves one frame per second.

# -i: input file
# %06d: zero-padded six-digit frame number e.g. img_000001.png, img_000002.png, etc.
# -hide_banner: hide unnecessary text.
# -r: frame rate (e.g. 1 frame per second)
ffmpeg -i example.mp4 -r 1 img_%06d.png -hide_banner
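
When several videos need processing, the same command can be driven from a script. This is a minimal sketch assuming ffmpeg is installed and on the PATH; both function names are illustrative.

```python
import subprocess
from typing import List


def frame_extraction_command(video: str, fps: int = 1, pattern: str = "img_%06d.png") -> List[str]:
    """Build the ffmpeg argument list that saves `fps` frames per second as images."""
    return ["ffmpeg", "-hide_banner", "-i", video, "-r", str(fps), pattern]


def extract_frames(video: str, fps: int = 1) -> None:
    """Run ffmpeg; check=True raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(frame_extraction_command(video, fps), check=True)


print(" ".join(frame_extraction_command("example.mp4")))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.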

Find Leaked API Keys

Finding leaked API keys is an important part of penetration testing and bug bounty work. A leaked API key puts sensitive information at risk of being stolen, so immediate action must be taken.

Awesome Resources

Google Dorks

Google Dorks are useful for finding leaked API keys and tokens.
Note: the examples here are simple and may be of limited use. See the Awesome Resources section for more thorough material.

Common APIs

Try changing the site domain and the extensions e.g. js, py, go.

GitHub repositories

site:github.com ext:php "api-key"
site:github.com ext:php "api_key"
site:github.com ext:php "api-token"
site:github.com ext:php "api_token"
site:github.com ext:php "access-token"
site:github.com ext:php "access_token"
site:github.com ext:php "x-api-key"
site:github.com ext:php "x_api_key"
site:github.com ext:php "x-api-token"
site:github.com ext:php "x_api_token"
site:github.com ext:php "x-access-token"
site:github.com ext:php "x_access_token"

GitLab repositories

site:gitlab.com ext:php "api-key"
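
The lists above follow one pattern: an optional x prefix, a key name, and either a hyphen or an underscore as separator. The sketch below regenerates them for any site and extension; api_key_dorks is a hypothetical helper name.

```python
from itertools import product
from typing import List


def api_key_dorks(site: str = "github.com", ext: str = "php") -> List[str]:
    """Generate the API key/token dorks for one site and file extension."""
    prefixes = ["", "x"]
    names = ["api key", "api token", "access token"]
    separators = ["-", "_"]
    queries = []
    for prefix, name, sep in product(prefixes, names, separators):
        # Join the words with the separator, e.g. x + api + key -> x-api-key
        words = ([prefix] if prefix else []) + name.split()
        queries.append(f'site:{site} ext:{ext} "{sep.join(words)}"')
    return queries


for query in api_key_dorks("gitlab.com", "py"):
    print(query)
```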

GitHub Dorks

- [github-dorks](https://github.com/techgaun/github-dorks)

AWS

site:github.com ext:py "ap-northeast-1.amazonaws.com" "x-api-key"

Google APIs

site:github.com ext:py "googleapis.com" "?key="

Hugging Face

site:github.com ext:py "https://api-inference.huggingface.co/models" "Authorization: Bearer"

OpenAI

site:github.com ext:py "https://api.openai.com/v1/models" "Authorization: Bearer"

Common Credentials

path:.env
path:.env passwd
path:.env password
path:.env secret

path:*.env api
path:*.env passwd
path:*.env password
path:*.env secret

path:config.* auth
path:config.* password
path:config.* passwd
path:config.* token
path:config.json password

"example.com" password
"example.com" passwd
"example.com" credential
"example.com" creds

Web Conf

path:.htpasswd

WordPress

path:wp-config.php

Databases

path:.pgpass
path:my.cnf
path:redis.conf
path:mongod.conf

Git

path:.git-credentials

Bash

path:.bash_history
path:.bash_profile
path:.bashrc
path:.profile

path:.bashrc password
path:.bash_history root

path:etc/passwd
path:etc/shadow

path:password.*

SSH

path:id_rsa
path:private_key
path:.ssh/id_rsa

Docker

path:docker.conf
path:docker.service

Backup Files

path:*.bak
path:backup
path:backups

Archive.org

For JSON format:

http://web.archive.org/cdx/search/cdx?url=example.com*&output=json

For TXT format:

http://web.archive.org/cdx/search/cdx?url=example.com*&output=txt

To narrow the time frame of the captures, append the from and to parameters. Timestamps use the format below and may be truncated, for example to just the year.

yyyyMMddhhmmss

Example:
http://web.archive.org/cdx/search/cdx?url=example.com*&output=txt&from=2010&to=2018

You can also set the limit parameter to cap the number of results returned. Example:

http://web.archive.org/cdx/search/cdx?url=example.com*&output=txt&limit=999999
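
These CDX URLs can be built and their JSON output parsed with a few lines of code. The sketch below assumes the JSON response is a list of rows whose first row holds the field names; the function names and the sample response are illustrative, and no network request is made.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import urlencode


def cdx_url(target: str, output: str = "json", start: Optional[str] = None,
            end: Optional[str] = None, limit: Optional[int] = None) -> str:
    """Build a CDX search URL for every capture under the target prefix."""
    params = {"url": f"{target}*", "output": output}
    if start:
        params["from"] = start
    if end:
        params["to"] = end
    if limit is not None:
        params["limit"] = limit
    # safe="*" keeps the trailing wildcard unescaped in the query string.
    return "http://web.archive.org/cdx/search/cdx?" + urlencode(params, safe="*")


def cdx_rows_to_dicts(body: str) -> List[Dict[str, str]]:
    """Turn a JSON CDX response into one dict per capture."""
    rows = json.loads(body)
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


print(cdx_url("example.com", output="txt", start="2010", end="2018"))

# Hypothetical, shortened response used only to show the parsing step.
sample = '[["timestamp", "original"], ["20100101000000", "http://example.com/"]]'
print(cdx_rows_to_dicts(sample))
```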