Stephen Simpson
2025-12-10 11:16:55 -06:00
parent b4ffdb6560
commit 316610e932
14 changed files with 350 additions and 520 deletions

278
README.md

@@ -1,133 +1,108 @@
# Rocky Man 📚
# 🚀 Rocky Man 🚀
**Rocky Man** is a tool for generating searchable HTML documentation from Rocky Linux man pages across BaseOS and AppStream repositories for Rocky Linux 8, 9, and 10.
## Features
- **Fast & Efficient**: Uses filelists.xml to pre-filter packages with man pages
- **Complete Coverage**: All packages from BaseOS and AppStream repositories
- **Container Ready**: Works on x86_64, aarch64, arm64, etc.
- **Smart Cleanup**: Automatic cleanup of temporary files (configurable)
- **Parallel Processing**: Concurrent downloads and conversions for maximum speed
- **Multi-version**: Support for Rocky Linux 8, 9, and 10 simultaneously
- Uses filelists.xml to pre-filter packages with man pages
- Processes packages from BaseOS and AppStream repositories
- Runs in containers on x86_64 and aarch64 (arm64) architectures
- Configurable cleanup of temporary files
- Concurrent downloads and conversions
- Supports Rocky Linux 8, 9, and 10
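To illustrate the `filelists.xml` pre-filtering mentioned in the feature list, here is a minimal sketch that is not taken from the project's source. It assumes the metadata has already been downloaded and decompressed, and it uses a small hand-written sample in place of a real repository file. The function name `packages_with_man_pages` and the sample packages are hypothetical.

```python
import xml.etree.ElementTree as ET

# XML namespace used by RPM repository filelists metadata.
NS = {"fl": "http://linux.duke.edu/metadata/filelists"}

# Hand-written sample standing in for a real, decompressed filelists.xml.
SAMPLE = """<filelists xmlns="http://linux.duke.edu/metadata/filelists" packages="2">
  <package pkgid="a1" name="bash" arch="x86_64">
    <version epoch="0" ver="5.1.8" rel="9.el9"/>
    <file>/usr/bin/bash</file>
    <file>/usr/share/man/man1/bash.1.gz</file>
  </package>
  <package pkgid="b2" name="libfoo" arch="x86_64">
    <version epoch="0" ver="1.0" rel="1.el9"/>
    <file>/usr/lib64/libfoo.so.1</file>
  </package>
</filelists>
"""

MAN_PREFIX = "/usr/share/man/"


def packages_with_man_pages(xml_text):
    """Return sorted names of packages shipping at least one file under MAN_PREFIX."""
    root = ET.fromstring(xml_text)
    names = set()
    for pkg in root.findall("fl:package", NS):
        files = (f.text or "" for f in pkg.findall("fl:file", NS))
        if any(path.startswith(MAN_PREFIX) for path in files):
            names.add(pkg.get("name"))
    return sorted(names)


print(packages_with_man_pages(SAMPLE))
```

Only packages returned by such a filter would need to be downloaded, which is where the time savings come from. A real `filelists.xml` is large, so a streaming parser may be a better fit than loading the whole document at once.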
## Quick Start
### Podman (Recommended)
```bash
# Build the image
podman build -t rocky-man .
# Generate man pages for Rocky Linux 9.6 (using defaults, no custom args)
podman run --rm -v $(pwd)/html:/data/html:Z rocky-man
# Generate for specific versions (requires explicit paths)
podman run --rm -v $(pwd)/html:/app/html:Z rocky-man \
--versions 8.10 9.6 10.0 --output-dir /app/html
# With verbose logging
podman run --rm -v $(pwd)/html:/app/html:Z rocky-man \
--versions 9.6 --output-dir /app/html --verbose
# Keep downloaded RPMs (mount the download directory)
podman run --rm -it \
-v $(pwd)/html:/app/html:Z \
-v $(pwd)/downloads:/app/tmp/downloads:Z \
rocky-man --versions 9.6 --keep-rpms \
--output-dir /app/html --download-dir /app/tmp/downloads --verbose
```
### Docker
### Podman
```bash
# Build the image
docker build -t rocky-man .
# Generate man pages (using defaults, no custom args)
docker run --rm -v $(pwd)/html:/data/html rocky-man
# Generate for specific versions
podman run --rm -v $(pwd)/html:/data/html:Z rocky-man \
--versions 8.10 9.6 10.0
# Generate for specific versions (requires explicit paths)
docker run --rm -v $(pwd)/html:/app/html rocky-man \
--versions 9.6 --output-dir /app/html
# Keep downloaded RPMs for multiple builds
podman run --rm -it \
-v $(pwd)/html:/data/html:Z \
-v $(pwd)/downloads:/data/tmp/downloads:Z \
rocky-man --versions 9.6 --keep-rpms --verbose
```
# Interactive mode for debugging
docker run --rm -it -v $(pwd)/html:/app/html rocky-man \
--versions 9.6 --output-dir /app/html --verbose
### View the HTML Locally
# Keep downloaded RPMs (mount the download directory)
docker run --rm -it \
-v $(pwd)/html:/app/html \
-v $(pwd)/downloads:/app/tmp/downloads \
rocky-man --versions 9.6 --keep-rpms \
--output-dir /app/html --download-dir /app/tmp/downloads --verbose
Start a local web server to browse the generated documentation:
```bash
python3 -m http.server -d ./html
```
Then open [http://127.0.0.1:8000](http://127.0.0.1:8000) in your browser.
To use a different port:
```bash
python3 -m http.server 8080 -d ./html
```
### Directory Structure in Container
The container uses different paths depending on whether you pass custom arguments:
The container uses the following paths:
**Without custom arguments** (using Dockerfile CMD defaults):
- `/data/html` - Generated HTML output
- `/data/tmp/downloads` - Downloaded RPM files
- `/data/tmp/extracts` - Extracted man page files
**With custom arguments** (argparse defaults from working directory `/app`):
- `/app/html` - Generated HTML output
- `/app/tmp/downloads` - Downloaded RPM files
- `/app/tmp/extracts` - Extracted man page files
**Important**: When you pass custom arguments, the container's CMD is overridden and the code falls back to paths relative to the working directory (`./html` resolves to `/app/html`). You must explicitly specify `--output-dir /app/html --download-dir /app/tmp/downloads` to match your volume mounts. Otherwise, files are written inside the container and lost when the container is removed (with `--rm`, that happens as soon as it exits).
These paths are used by default and can be overridden with command-line arguments if needed.
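As a rough illustration of how default paths and command-line overrides interact, the sketch below builds a small `argparse` parser with the three directory options and the `/data/...` defaults listed above. It is an assumption-laden stand-in, not the project's actual parser: `build_parser` is a hypothetical helper and `/srv/man` is an arbitrary example path.

```python
import argparse


def build_parser():
    """Build a parser with the three directory options and the documented defaults."""
    parser = argparse.ArgumentParser(
        description="Illustrative subset of the directory options"
    )
    parser.add_argument("--output-dir", default="/data/html")
    parser.add_argument("--download-dir", default="/data/tmp/downloads")
    parser.add_argument("--extract-dir", default="/data/tmp/extracts")
    return parser


# No arguments: every path keeps its default.
defaults = build_parser().parse_args([])

# One override: only the named path changes, the others keep their defaults.
custom = build_parser().parse_args(["--output-dir", "/srv/man"])

print(defaults.output_dir)
print(custom.output_dir, custom.download_dir)
```

Whatever path ends up in effect inside the container should be the path the host directory is mounted on, otherwise the output does not reach the host.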
### Local Development
#### Prerequisites
**Important**: Rocky Man requires Rocky Linux because it uses the system's native `python3-dnf` module to interact with DNF repositories. This module cannot be installed via pip and must come from the Rocky Linux system packages.
- Python 3.9+
- pip (Python package manager)
- mandoc (man page converter)
- Rocky Linux system or container (for DNF)
#### Installation
#### Option 1: Run in a Rocky Linux Container (Recommended)
```bash
# On Rocky Linux, install system dependencies
# Start a Rocky Linux container with your project mounted
podman run --rm -it -v $(pwd):/workspace:Z rockylinux/rockylinux:9 /bin/bash
# Inside the container, navigate to the project
cd /workspace
# Install epel-release for mandoc
dnf install -y epel-release
# Install system dependencies
dnf install -y python3 python3-pip python3-dnf mandoc rpm-build dnf-plugins-core
# Install Python dependencies
pip3 install -e .
# Run the tool
python3 -m rocky_man.main --versions 9.6 --output-dir ./html/
```
#### Usage
#### Option 2: On a Native Rocky Linux System
```bash
# Generate man pages for Rocky 9.6
python -m rocky_man.main --versions 9.6
# Install epel-release for mandoc
dnf install -y epel-release
# Generate for multiple versions (default)
python -m rocky_man.main --versions 8.10 9.6 10.0
# Install system dependencies
dnf install -y python3 python3-pip python3-dnf mandoc rpm-build dnf-plugins-core
# Custom output directory
python -m rocky_man.main --output-dir /var/www/html/man --versions 9.6
# Install Python dependencies
pip3 install -e .
# Keep downloaded RPMs for debugging
python -m rocky_man.main --keep-rpms --verbose
# Adjust parallelism for faster processing
python -m rocky_man.main --parallel-downloads 10 --parallel-conversions 20
# Use a different mirror
python -m rocky_man.main --mirror https://mirrors.example.com/
# Only BaseOS (faster)
python -m rocky_man.main --repo-types BaseOS --versions 9.6
# Run the tool
python3 -m rocky_man.main --versions 9.6 --output-dir ./html/
```
## Architecture
Rocky Man is organized into clean, modular components:
Rocky Man is organized into components:
```
```text
rocky-man/
├── src/rocky_man/
│ ├── models/ # Data models (Package, ManFile)
@@ -143,22 +118,28 @@ rocky-man/
### How It Works
1. **Package Discovery** - Parse repository `filelists.xml` to identify packages with man pages
2. **Smart Download** - Download only packages containing man pages with parallel downloads
3. **Extraction** - Extract man page files from RPM packages
4. **Conversion** - Convert troff format to HTML using mandoc
5. **Web Generation** - Wrap HTML in templates and generate search index
6. **Cleanup** - Automatically remove temporary files (configurable)
1. **Package Discovery** - Parses repository metadata (`repodata/repomd.xml` and `filelists.xml.gz`) to identify packages containing files in `/usr/share/man/` directories
2. **Package Download** - Downloads identified RPM packages using DNF, with configurable parallel downloads (default: 5)
3. **Man Page Extraction** - Extracts man page files from RPMs using `rpm2cpio`, filtering by section and language based on configuration
4. **HTML Conversion** - Converts troff-formatted man pages to HTML using mandoc, with parallel processing (default: 10 workers)
5. **Cross-Reference Linking** - Parses converted HTML to add hyperlinks between man page references (e.g., `bash(1)` becomes clickable)
6. **Index Generation** - Creates search indexes (JSON/gzipped) and navigation pages using Jinja2 templates
7. **Cleanup** - Removes temporary files (RPMs and extracted content) unless `--keep-rpms` or `--keep-extracts` is specified
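The cross-reference linking step can be sketched with a regular expression. This is a simplified illustration, not the project's implementation: it works on plain text rather than parsed HTML, and the names `MAN_REF` and `link_references` as well as the URL layout are assumptions made for the example.

```python
import re

# Matches references such as "bash(1)" or "openssl(1ssl)":
# a page name followed by a section in parentheses.
MAN_REF = re.compile(r"\b([A-Za-z0-9_.+-]+)\(([1-9][a-z]*)\)")


def link_references(text, known_pages):
    """Wrap references to known pages in anchors; leave unknown references untouched."""

    def replace(match):
        name, section = match.group(1), match.group(2)
        if (name, section) not in known_pages:
            return match.group(0)
        # Hypothetical URL layout, chosen only for this example.
        return f'<a href="../man{section}/{name}.{section}.html">{name}({section})</a>'

    return MAN_REF.sub(replace, text)


known = {("bash", "1"), ("passwd", "5")}
print(link_references("See bash(1), passwd(5) and nosuch(8).", known))
```

Checking each match against the set of pages that were actually generated avoids producing dead links. When applied to real HTML, the substitution would need to be restricted to text nodes so that it does not rewrite attribute values or existing anchors.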
## Command Line Options
```
usage: rocky-man [-h] [--versions VERSIONS [VERSIONS ...]]
[--repo-types REPO_TYPES [REPO_TYPES ...]]
[--output-dir OUTPUT_DIR] [--download-dir DOWNLOAD_DIR]
[--extract-dir EXTRACT_DIR] [--keep-rpms] [--keep-extracts]
[--parallel-downloads N] [--parallel-conversions N]
[--mirror URL] [--template-dir DIR] [-v]
```text
usage: main.py [-h] [--versions VERSIONS [VERSIONS ...]]
[--repo-types REPO_TYPES [REPO_TYPES ...]]
[--output-dir OUTPUT_DIR] [--download-dir DOWNLOAD_DIR]
[--extract-dir EXTRACT_DIR] [--keep-rpms] [--keep-extracts]
[--parallel-downloads PARALLEL_DOWNLOADS]
[--parallel-conversions PARALLEL_CONVERSIONS] [--mirror MIRROR]
[--vault] [--existing-versions [VERSION ...]]
[--template-dir TEMPLATE_DIR] [-v]
[--skip-sections [SKIP_SECTIONS ...]]
[--skip-packages [SKIP_PACKAGES ...]] [--skip-languages]
[--keep-languages] [--allow-all-sections]
Generate HTML documentation for Rocky Linux man pages
@@ -169,11 +150,11 @@ optional arguments:
--repo-types REPO_TYPES [REPO_TYPES ...]
Repository types to process (default: BaseOS AppStream)
--output-dir OUTPUT_DIR
Output directory for HTML files (default: ./html)
Output directory for HTML files (default: /data/html)
--download-dir DOWNLOAD_DIR
Directory for downloading packages (default: ./tmp/downloads)
Directory for downloading packages (default: /data/tmp/downloads)
--extract-dir EXTRACT_DIR
Directory for extracting man pages (default: ./tmp/extracts)
Directory for extracting man pages (default: /data/tmp/extracts)
--keep-rpms Keep downloaded RPM files after processing
--keep-extracts Keep extracted man files after processing
--parallel-downloads PARALLEL_DOWNLOADS
@@ -196,80 +177,11 @@ optional arguments:
--allow-all-sections Include all man sections (overrides --skip-sections)
```
## Troubleshooting
## Attribution
### DNF Errors
The man pages displayed in this documentation are sourced from Rocky Linux distribution packages. All man page content is copyrighted by its respective authors and distributed under the licenses specified within each man page.
**Problem**: `dnf` module not found or repository errors
**Solution**: Ensure you're running on Rocky Linux or in a Rocky Linux container:
```bash
# Run in Rocky Linux container
podman run --rm -it -v $(pwd):/app rockylinux:9 /bin/bash
cd /app
# Install dependencies
dnf install -y python3 python3-dnf mandoc rpm-build dnf-plugins-core
# Run the script
python3 -m rocky_man.main --versions 9.6
```
### Mandoc Not Found
**Problem**: `mandoc: command not found`
**Solution**: Install mandoc:
```bash
dnf install -y mandoc
```
### Permission Errors in Container
**Problem**: Cannot write to mounted volume
**Solution**: Use the `:Z` flag with Podman so that the mounted volume is relabeled for SELinux:
```bash
podman run --rm -v $(pwd)/html:/data/html:Z rocky-man
```
For Docker, ensure the volume path is absolute:
```bash
docker run --rm -v "$(pwd)/html":/data/html rocky-man
```
### Out of Memory
**Problem**: The process is killed because the system runs out of memory
**Solution**: Reduce parallelism:
```bash
python -m rocky_man.main --parallel-downloads 2 --parallel-conversions 5
```
### Slow Downloads
**Problem**: Downloads are very slow
**Solution**: Use a closer mirror:
```bash
# Find mirrors at: https://mirrors.rockylinux.org/mirrormanager/mirrors
python -m rocky_man.main --mirror https://mirror.example.com/rocky/
```
## Performance Tips
1. **Use closer mirrors** - Significant speed improvement for downloads
2. **Increase parallelism** - If you have bandwidth: `--parallel-downloads 15`
3. **Process one repo at a time** - Use `--repo-types BaseOS` first, then `--repo-types AppStream`
4. **Keep RPMs for re-runs** - Use `--keep-rpms` if testing
5. **Run in container** - More consistent performance
This tool generates HTML documentation from man pages contained in Rocky Linux packages but does not modify the content of the man pages themselves.
## License
@@ -277,20 +189,16 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
### Third-Party Software
This project uses several open source components. See [THIRD-PARTY-LICENSES.md](THIRD-PARTY-LICENSES.md) for complete license information and attributions.
This project uses several open source components.
Key dependencies include:
- **mandoc** - Man page converter (ISC License)
- **python3-dnf** - DNF package manager Python bindings (GPL-2.0-or-later)
- **Fuse.js** - Client-side search (Apache 2.0)
- **Python packages**: requests, rpmfile, Jinja2, lxml, zstandard
- **Fonts**: Red Hat Display, Red Hat Text, JetBrains Mono (SIL OFL)
### Trademark Notice
Rocky Linux is a trademark of the Rocky Enterprise Software Foundation (RESF). This project is not officially affiliated with or endorsed by RESF. All trademarks are the property of their respective owners. This project complies with RESF's trademark usage guidelines.
## Contributing
Contributions welcome! Please:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes with proper documentation
4. Test thoroughly
5. Commit with clear messages (`git commit -m 'feat: add amazing feature'`)
6. Push to your branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request
Rocky Linux is a trademark of the Rocky Enterprise Software Foundation (RESF). This project is not officially affiliated with or endorsed by RESF. All trademarks are the property of their respective owners. This project complies with RESF's trademark usage guidelines.