aws_extractor

command module v1.2.0
Published: Mar 19, 2026 License: MIT Imports: 13 Imported by: 0

README
AWS S3 Bucket Extractor

A high-performance concurrent web crawler written in Go that extracts AWS S3 bucket references from websites. Perfect for security research, bug bounty hunting, and cloud asset discovery.


Features

  • Fast Concurrent Crawling - Multi-threaded URL processing
  • 11 Regex Patterns - Detects various S3 bucket URL formats
  • Bucket Validation - Filters false positives automatically
  • Real-time Saving - Buckets saved immediately when discovered
  • Multiple Output Formats - TXT, JSON, and CSV support
  • Colored Terminal Output - Easy-to-read colored logs
  • Retry Logic - Automatic retry for failed requests
  • Customizable - Configurable timeout, workers, and User-Agent

Installation

Option 1: Install with go install
go install github.com/OctaYus/aws_extractor@latest

This will install the binary to your $GOPATH/bin directory (usually ~/go/bin).

Make sure $GOPATH/bin is in your PATH:

export PATH=$PATH:$(go env GOPATH)/bin
Option 2: Build from Source
# Clone the repository
git clone https://github.com/OctaYus/aws_extractor.git
cd aws_extractor

# Build the binary
go build -o aws_extractor

# Optional: Install to your system
sudo mv aws_extractor /usr/local/bin/
Option 3: Download Pre-built Binary

Download the latest release from the Releases page.

Quick Start

# Basic usage
aws_extractor -u urls.txt

# Save results to JSON
aws_extractor -u urls.txt -o results.json -f json

# Verbose mode with 10 workers
aws_extractor -u urls.txt -v -w 10

# Custom timeout and debug mode
aws_extractor -u urls.txt -t 60 -debug

Usage

Usage of aws_extractor:
  -u string
        Path to file containing URLs (one per line) (required)
  -o string
        Output file path
  -f string
        Output format (txt, json, csv) (default "txt")
  -w int
        Number of concurrent workers (default 5)
  -t int
        Request timeout in seconds (default 30)
  -v    Verbose output (show progress per URL)
  -user-agent string
        Custom User-Agent string
  -debug
        Enable debug logging

URL File Format

Create a text file with one URL per line:

# My target URLs
https://example.com
https://github.com
amazon.com          # Will become https://amazon.com

# Lines starting with # or // are ignored
// Another comment
google.com

# Blank lines are ignored

Output Files

When you specify an output file with -o, two files are created:

  1. Main Results File (results.txt / results.json / results.csv)

    • Complete crawl results with all URLs and their status
  2. Buckets File (results_buckets.txt)

    • Real-time list of discovered buckets
    • Format: bucket-name | source-url
    • Updated immediately as buckets are found

Examples

Basic Scan
aws_extractor -u targets.txt
Production Scan with JSON Output
aws_extractor -u targets.txt -o scan_results.json -f json
High-Speed Scan (10 workers, 60s timeout)
aws_extractor -u targets.txt -w 10 -t 60 -v
Debug Mode (See All Pattern Matches)
aws_extractor -u targets.txt -debug
Custom User-Agent
aws_extractor -u targets.txt -user-agent "MyScanner/1.0"

Detected S3 Bucket Formats

The tool detects buckets in various formats:

  1. Path-style URLs:

    • https://s3.amazonaws.com/bucket-name/file.jpg
    • https://s3-us-west-2.amazonaws.com/bucket-name/
  2. Virtual-hosted-style URLs:

    • https://bucket-name.s3.amazonaws.com/
    • https://bucket-name.s3-us-west-2.amazonaws.com/
    • https://bucket-name.s3.us-west-2.amazonaws.com/
  3. S3 URI:

    • s3://bucket-name/path/to/file
  4. ARN format:

    • arn:aws:s3:::bucket-name
  5. Configuration files (JSON/JS):

    • "bucket": "bucket-name"
    • Bucket: "bucket-name"
    • bucketName: "bucket-name"
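To illustrate how pattern-based extraction works, here is a minimal Go sketch covering formats 1-4 above. These regexes are illustrative only; the tool's actual 11 patterns may differ, and `extractBuckets` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"regexp"
)

// A few illustrative patterns; each captures the bucket name.
// S3 bucket names are lowercase, 3-63 chars, letters/digits/dots/hyphens.
var bucketPatterns = []*regexp.Regexp{
	// Virtual-hosted-style: https://bucket.s3.amazonaws.com/ (and regional variants)
	regexp.MustCompile(`https?://([a-z0-9][a-z0-9.-]{1,61}[a-z0-9])\.s3[.-][a-z0-9.-]*amazonaws\.com`),
	// Path-style: https://s3.amazonaws.com/bucket/... or https://s3-us-west-2.amazonaws.com/bucket/
	regexp.MustCompile(`https?://s3[.-][a-z0-9.-]*amazonaws\.com/([a-z0-9][a-z0-9.-]{1,61}[a-z0-9])`),
	// S3 URI: s3://bucket/path
	regexp.MustCompile(`s3://([a-z0-9][a-z0-9.-]{1,61}[a-z0-9])`),
	// ARN: arn:aws:s3:::bucket
	regexp.MustCompile(`arn:aws:s3:::([a-z0-9][a-z0-9.-]{1,61}[a-z0-9])`),
}

// extractBuckets returns the unique bucket names matched in body,
// in discovery order.
func extractBuckets(body string) []string {
	seen := map[string]bool{}
	var out []string
	for _, re := range bucketPatterns {
		for _, m := range re.FindAllStringSubmatch(body, -1) {
			if !seen[m[1]] {
				seen[m[1]] = true
				out = append(out, m[1])
			}
		}
	}
	return out
}

func main() {
	html := `<a href="https://assets-bucket.s3.amazonaws.com/logo.png">
	backups at s3://backup-bucket/db and arn:aws:s3:::logs-bucket`
	fmt.Println(extractBuckets(html))
}
```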

Output Formats

TXT Format (-f txt)
AWS S3 Bucket Crawler Results
================================================================================

URL: https://example.com
Status: 200
Buckets found (2):
  - my-bucket-name
  - assets-bucket
--------------------------------------------------------------------------------
JSON Format (-f json)
[
  {
    "url": "https://example.com",
    "status": 200,
    "buckets": ["my-bucket-name", "assets-bucket"],
    "error": ""
  }
]
CSV Format (-f csv)
URL,Status,Buckets Found,Bucket Names,Error
https://example.com,200,2,"my-bucket-name, assets-bucket",

Terminal Output

When a bucket is discovered, you'll see:

────────────────────────────────────────────────────────────────────────────────
[BUCKET DISCOVERED]
  Name:   my-production-bucket
  Source: https://example.com
────────────────────────────────────────────────────────────────────────────────

Summary at the end:

================================================================================
[+] CRAWL SUMMARY
[+] Total URLs crawled: 25
[+] Successful crawls: 23
[-] Failed crawls: 2
[+] Unique buckets found: 5
[+] All unique buckets:
    - bucket-one
    - bucket-two
    - bucket-three
    - bucket-four
    - bucket-five
================================================================================

Performance Tips

  1. Increase Workers for faster scanning:
   aws_extractor -u urls.txt -w 20
  2. Adjust Timeout for slow sites:
   aws_extractor -u urls.txt -t 60
  3. Use Verbose Mode to monitor progress:
   aws_extractor -u urls.txt -v
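Conceptually, the -w flag sizes a pool of goroutines that share one URL queue, which is why more workers means more URLs in flight at once. A minimal sketch of that pattern (illustrative only; `crawlAll` and the stand-in `fetch` callback are not the tool's actual code):

```go
package main

import (
	"fmt"
	"sync"
)

// crawlAll fans `workers` goroutines out over a shared URL channel,
// mirroring what the -w flag controls. fetch is a stand-in for the
// real HTTP request + bucket extraction step.
func crawlAll(urls []string, workers int, fetch func(string) string) []string {
	jobs := make(chan string)
	out := make(chan string)
	var wg sync.WaitGroup

	// Each worker pulls URLs until the queue is drained.
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				out <- fetch(u)
			}
		}()
	}

	// Feed the queue, then close the output once all workers finish.
	go func() {
		for _, u := range urls {
			jobs <- u
		}
		close(jobs)
		wg.Wait()
		close(out)
	}()

	var results []string
	for r := range out {
		results = append(results, r)
	}
	return results
}

func main() {
	urls := []string{"https://a.com", "https://b.com", "https://c.com"}
	results := crawlAll(urls, 2, func(u string) string { return u + " done" })
	fmt.Println(len(results))
}
```

Raising the worker count trades politeness for speed: each extra goroutine is another concurrent request against the targets, which is also why the rate-limiting fix below is simply to lower -w.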

Important:

  • Only scan websites you have permission to scan
  • Respect robots.txt and terms of service
  • Don't access buckets that don't belong to you
  • Use for security research and bug bounties only
  • Be aware of legal implications in your jurisdiction

Troubleshooting

"command not found: aws_extractor"

Make sure $GOPATH/bin is in your PATH:

export PATH=$PATH:$(go env GOPATH)/bin

Add this to your ~/.bashrc or ~/.zshrc to make it permanent.

Getting Rate Limited

Reduce the number of workers:

aws_extractor -u urls.txt -w 3
Timeouts

Increase the timeout value:

aws_extractor -u urls.txt -t 60
No Buckets Found

Enable debug mode to see pattern matching details:

aws_extractor -u urls.txt -debug

Building for Different Platforms

# Linux
GOOS=linux GOARCH=amd64 go build -o aws_extractor-linux

# Windows
GOOS=windows GOARCH=amd64 go build -o aws_extractor.exe

# macOS (Intel)
GOOS=darwin GOARCH=amd64 go build -o aws_extractor-mac-intel

# macOS (Apple Silicon)
GOOS=darwin GOARCH=arm64 go build -o aws_extractor-mac-arm

Project Structure

aws_extractor/
├── main.go              # Main application code
├── go.mod               # Go module dependencies
├── go.sum               # Dependency checksums
├── README.md            # This file
├── LICENSE              # License file
└── examples/
    └── urls.txt         # Example URL file

Dependencies

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -am 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Roadmap

  • Add support for custom regex patterns
  • Implement rate limiting
  • Add proxy support
  • Check bucket accessibility
  • Add GitHub Actions for releases
  • Docker container support

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Inspired by security research tools and bug bounty workflows
  • Built with Go for maximum performance and concurrency

Author

OctaYus

Support

If you find this tool useful, please consider:

  • Starring the repository
  • Reporting bugs via Issues
  • Suggesting features via Issues

Happy Hunting!
