OSINT Techniques: Open Source Intelligence Gathering Guide

Open Source Intelligence (OSINT) is the collection and analysis of publicly available information. In penetration testing, OSINT is used during the reconnaissance phase to gather information about targets before active scanning. This guide covers essential OSINT techniques and tools.

What is OSINT?

OSINT involves gathering information from publicly available sources such as websites, social media, public records, and technical databases. This information helps build a profile of the target and identify potential attack vectors.

Passive vs Active Reconnaissance

  • Passive: No direct interaction with target (OSINT)
  • Active: Direct interaction with target systems (scanning)

OSINT is passive reconnaissance – you gather information without touching the target’s systems.

Domain and Website Intelligence

WHOIS Lookup

# Command line
whois example.com

# Information gathered:
# - Registrar details
# - Registration/expiration dates
# - Name servers
# - Registrant contact (if not private)

DNS Enumeration

# Query common record types
# (many servers refuse or minimize ANY queries, see RFC 8482, so ask for each type)
dig +short a example.com
dig +short ns example.com
dig +short mx example.com
dig +short txt example.com

# Subdomain enumeration tools
subfinder -d example.com
amass enum -passive -d example.com
assetfinder example.com

# Certificate transparency logs
# crt.sh, censys.io
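
The enumeration tools above each print one hostname per line, and their results overlap heavily. A minimal sketch of merging them into one clean list follows. The helper name normalize_hosts and the file names in the usage comment are hypothetical, and the sketch assumes plain hostname-per-line input.

```shell
#!/bin/sh
# normalize_hosts DOMAIN: read hostnames on stdin (one per line), lowercase
# them, drop a leading "*." wildcard label and any trailing dot, keep only
# names at or under DOMAIN, and print the unique sorted result.
normalize_hosts() {
    domain=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    tr '[:upper:]' '[:lower:]' |
        sed -e 's/^\*\.//' -e 's/\.$//' |
        awk -v d="$domain" '
            $0 == d { print; next }
            # Keep only names ending in ".DOMAIN" (literal suffix match).
            length($0) > length(d) &&
                substr($0, length($0) - length(d)) == "." d { print }
        ' |
        sort -u
}

# Usage (tool output saved to files beforehand; file names are illustrative):
#   cat subfinder.txt amass.txt assetfinder.txt | normalize_hosts example.com
```

The suffix check is a literal string comparison rather than a regular expression, so a look-alike such as notexample.com is not mistaken for a subdomain.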

Web Archive

The Wayback Machine (web.archive.org) stores historical snapshots of websites. Use it to:

  • Find old versions of websites
  • Discover removed content
  • Identify old technologies
  • Find exposed files that were later removed

# Wayback URLs tool
waybackurls example.com | sort -u
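
Archived URL lists can run to many thousands of lines, so a filtering step usually helps. The sketch below keeps only URLs whose path ends in an extension that often points at leftover files. The function name interesting_urls and the extension list are assumptions for illustration, not part of waybackurls.

```shell
#!/bin/sh
# interesting_urls: read URLs on stdin and print the unique ones whose path
# ends in an extension that often indicates leftover or sensitive files.
# The extension list is illustrative; adjust it per engagement.
interesting_urls() {
    # Match the extension at the end of the path, before any query or fragment.
    grep -Ei '\.(sql|bak|old|zip|tar|gz|env|conf|config|log|json|xml)([?#]|$)' |
        sort -u
}

# Usage:
#   waybackurls example.com | interesting_urls
```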

Technology Detection

# Wappalyzer (browser extension)
# BuiltWith (online tool)

# Command line
whatweb example.com
webanalyze -host example.com

Search Engine Dorking

Google dorks use advanced search operators to find specific information:

# Site-specific search
site:example.com

# File types
site:example.com filetype:pdf
site:example.com filetype:doc
site:example.com filetype:xls
site:example.com filetype:sql
site:example.com filetype:log

# Directory listings
site:example.com intitle:"index of"

# Login pages
site:example.com inurl:login
site:example.com inurl:admin

# Sensitive files
site:example.com ext:conf
site:example.com ext:env
site:example.com ext:bak

# Error messages
site:example.com "sql syntax"
site:example.com "mysql_fetch"
site:example.com "warning: include"

# Exposed credentials
site:example.com "password" filetype:txt
site:example.com "api_key"

# Combine operators (PDFs on any host except www)
site:example.com filetype:pdf -site:www.example.com
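
Typing each dork by hand for every target gets tedious. As a rough sketch, the queries can be generated for a given domain and pasted into the search engine one at a time. The function name build_dorks is hypothetical, and it only uses operators already listed above.

```shell
#!/bin/sh
# build_dorks DOMAIN: print one search query per line for DOMAIN.
build_dorks() {
    domain=$1
    for ext in pdf doc xls sql log; do
        printf 'site:%s filetype:%s\n' "$domain" "$ext"
    done
    for ext in conf env bak; do
        printf 'site:%s ext:%s\n' "$domain" "$ext"
    done
    printf 'site:%s intitle:"index of"\n' "$domain"
    for word in login admin; do
        printf 'site:%s inurl:%s\n' "$domain" "$word"
    done
}

# Usage:
#   build_dorks example.com > dorks.txt
```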

Email OSINT

Finding Email Addresses

# Tools
theHarvester -d example.com -b all
# hunter.io (web-based)
# phonebook.cz (web-based)

# Email format discovery
# LinkedIn profiles often reveal naming conventions
# first.last@example.com
# flast@example.com
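
Once employee names are known, candidate addresses can be generated in a few common corporate formats and checked against a confirmed address. This is a sketch only: the function name email_candidates is hypothetical, and the format list is an assumption about typical conventions, not a statement about any particular company.

```shell
#!/bin/sh
# email_candidates FIRST LAST DOMAIN: print candidate addresses for one person
# in several common formats. Which format a company really uses is unknown
# until it is confirmed against a known address.
email_candidates() {
    first=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    last=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    domain=$3
    f=$(printf '%s' "$first" | cut -c1)              # first initial
    printf '%s.%s@%s\n' "$first" "$last" "$domain"   # first.last
    printf '%s%s@%s\n'  "$f" "$last" "$domain"       # flast
    printf '%s%s@%s\n'  "$first" "$last" "$domain"   # firstlast
    printf '%s@%s\n'    "$first" "$domain"           # first
    printf '%s_%s@%s\n' "$first" "$last" "$domain"   # first_last
}

# Usage:
#   email_candidates Jane Doe example.com
```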

Email Verification

# Check if an address exists (caution: verification contacts the target's mail server, so it is not passive and can be detected)
# emailhippo.com
# verify-email.org

# Check for breaches
# haveibeenpwned.com

Social Media Intelligence

LinkedIn

  • Employee names and job titles
  • Technologies mentioned in job postings
  • Company structure and departments
  • Former employees

Twitter/X

# Search operators
from:username
to:username
@username
#hashtag
since:2024-01-01
until:2024-12-31

GitHub

# Search for secrets
org:company password
org:company api_key
org:company secret
org:company token

# Tools
trufflehog
gitleaks
gitrob

Infrastructure OSINT

IP Address Information

# Geolocation
geoiplookup 8.8.8.8

# ASN lookup
whois -h whois.cymru.com 8.8.8.8

# Online tools
# shodan.io
# censys.io
# ipinfo.io
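
The Team Cymru lookup above returns a small pipe-delimited table, which is awkward to feed into other tools. The sketch below converts it to tab-separated records. It assumes the default three-column "AS | IP | AS Name" layout; verbose output has more columns and would need a different parser. The function name parse_cymru is hypothetical.

```shell
#!/bin/sh
# parse_cymru: read Team Cymru whois output on stdin (assumed to be the
# default three-column "AS | IP | AS Name" layout) and print tab-separated
# "IP<TAB>ASN<TAB>AS name" records, skipping the header row.
parse_cymru() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        NF >= 3 && trim($1) != "AS" {
            printf "%s\tAS%s\t%s\n", trim($2), trim($1), trim($3)
        }
    '
}

# Usage:
#   whois -h whois.cymru.com 8.8.8.8 | parse_cymru
```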

Shodan

Shodan indexes internet-connected devices and services:

# Search examples
hostname:example.com
org:"Company Name"
net:203.0.113.0/24
port:22
country:US

# Find specific services
apache
nginx
product:MySQL

# Command line
shodan host 8.8.8.8
shodan search "hostname:example.com"

Certificate Transparency

# Find subdomains via SSL certificates
# crt.sh
curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r '.[].name_value' | sort -u
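
Certificate transparency results often include hosts that were never meant to be public. A simple triage step is to flag names that hint at non-production or administrative systems. The function name flag_hosts and the keyword list below are assumptions for illustration; a real engagement would tune the list.

```shell
#!/bin/sh
# flag_hosts: read hostnames on stdin and print the unique ones containing a
# label (or hyphen-separated part of a label) that hints at a non-production
# or administrative system. The keyword list is an assumption, not a standard.
flag_hosts() {
    grep -Ei '(^|[.-])(dev|test|staging|stage|uat|vpn|admin|internal)([.-]|$)' |
        sort -u
}

# Usage (hosts.txt holds one hostname per line):
#   flag_hosts < hosts.txt
```

Keywords must sit at a label or hyphen boundary, so a name like contest.example.com is not flagged for containing "test".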

Document Metadata

Documents often contain metadata revealing usernames, software versions, and internal paths:

# Extract metadata
exiftool document.pdf
exiftool document.docx

# Common metadata:
# - Author names (usernames)
# - Software versions
# - Creation/modification dates
# - Internal file paths

# Metagoofil (automated)
metagoofil -d example.com -t pdf,doc,xls -o output/
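
When many documents have been collected, the person-related fields can be pulled out in one pass. The sketch below reads exiftool's default "Tag Name : value" output and prints the unique values of a few tags. The function name metadata_people and the tag list are assumptions; Creator in particular often holds a software name rather than a person.

```shell
#!/bin/sh
# metadata_people: read default exiftool output ("Tag Name : value" lines) on
# stdin and print the unique values of tags that commonly hold a person or
# account name. The tag list is illustrative; documents vary widely.
metadata_people() {
    awk -F' : ' '
        {
            tag = $1
            sub(/[ \t]+$/, "", tag)            # drop the column padding
        }
        tag == "Author" || tag == "Creator" || tag == "Last Modified By" {
            # Re-join the value in case it contains " : " itself.
            value = $2
            for (i = 3; i <= NF; i++) value = value " : " $i
            if (value != "") print value
        }
    ' | sort -u
}

# Usage:
#   exiftool output/*.pdf | metadata_people
```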

People Search

  • LinkedIn for professional profiles
  • Social media platforms
  • Public records databases
  • namechk.com for username searches
  • whatsmyname.app for username OSINT

OSINT Frameworks and Tools

theHarvester

# Available sources differ between versions; list them with: theHarvester -h
theHarvester -d example.com -b google,bing,linkedin,twitter

Recon-ng

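recon-ng
workspaces create example
db insert domains
# Enter example.com at the prompt
# Recon-ng v5 installs modules from the marketplace before they can be loaded
marketplace install recon/domains-hosts/hackertarget
modules load recon/domains-hosts/hackertarget
run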

SpiderFoot

Automated OSINT collection with web interface:

spiderfoot -l 127.0.0.1:5001

OSINT Workflow

  1. Define scope: What are you looking for?
  2. Identify sources: Where might this information exist?
  3. Collect data: Use tools and manual searching
  4. Analyze findings: Look for patterns and connections
  5. Document everything: Keep detailed notes
  6. Verify information: Cross-reference multiple sources
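
Steps 5 and 6 can be supported with a very small findings log. The sketch below appends timestamped records to a tab-separated file and lists notes reported by at least two distinct sources. The function names log_finding and corroborated and the three-column layout are illustrative choices, not a standard format.

```shell
#!/bin/sh
# log_finding FILE SOURCE NOTE: append one tab-separated record
# (UTC timestamp, source, note) to FILE so every finding stays traceable.
log_finding() {
    printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$2" "$3" >> "$1"
}

# corroborated FILE: print notes recorded from at least two distinct sources.
corroborated() {
    awk -F'\t' '
        !seen[$3 FS $2]++ { count[$3]++ }
        END { for (note in count) if (count[note] >= 2) print note }
    ' "$1" | sort
}

# Usage:
#   log_finding findings.tsv crt.sh 'dev.example.com exists'
#   log_finding findings.tsv subfinder 'dev.example.com exists'
#   corroborated findings.tsv
```

Repeating the same note from the same source does not count as corroboration, since only distinct source and note pairs are tallied.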

Summary

OSINT is a critical first step in any penetration test. The information gathered helps identify potential vulnerabilities, craft targeted attacks, and understand the target’s attack surface. Always document your findings and remember that OSINT should remain passive – do not access systems without authorization.

Written by

Window Events
