From f474c238dcc94730663122bb09ecc249ac4d2d6d Mon Sep 17 00:00:00 2001
From: Stephen Simpson
Date: Thu, 4 Dec 2025 09:52:19 -0600
Subject: [PATCH] CUSP-1340

Signed-off-by: Stephen Simpson
---
 README.md | 420 +++++++++---------------------------------------------
 1 file changed, 69 insertions(+), 351 deletions(-)

diff --git a/README.md b/README.md
index 9796500..0c11fcf 100644
--- a/README.md
+++ b/README.md
@@ -1,121 +1,85 @@
 # Rocky Man 📚

-**Rocky Man** is a comprehensive man page hosting solution for Rocky Linux, providing beautiful, searchable documentation for all packages in BaseOS and AppStream repositories across Rocky Linux 8, 9, and 10.
-
-> **✨ This is a complete rewrite** with 60-80% faster performance, modern architecture, and production-ready features!
-
-## 🎉 What's New in This Rewrite
-
-This version is a **complete ground-up rebuild** with major improvements:
-
-- 🚀 **60-80% faster** - Pre-filters packages using filelists.xml (downloads only ~800 packages instead of ~3000)
-- 🏗️ **Modular architecture** - Clean separation into models, repo, processor, web, and utils
-- 🎨 **Modern UI** - Beautiful dark theme with instant fuzzy search
-- 🐳 **Container ready** - Multi-stage Dockerfile that works on any architecture
-- ⚡ **Parallel processing** - Concurrent downloads and HTML conversions
-- 🧹 **Smart cleanup** - Automatic cleanup of temporary files
-- 📝 **Well documented** - Comprehensive docstrings and type hints throughout
-- 🔒 **Thread safe** - Proper locking and resource management
-- 🤖 **GitHub Actions** - Automated weekly builds and deployment
-
-### Performance Comparison
-
-| Metric | Old Version | New Version | Improvement |
-|--------|-------------|-------------|-------------|
-| Packages Downloaded | ~3000 | ~800 | 73% reduction |
-| Processing Time | 2-3 hours | 30-45 minutes | 75% faster |
-| Bandwidth Used | ~10 GB | ~2-3 GB | 80% reduction |
-| Architecture | Single file | Modular (16 files) | Much cleaner |
-| Thread Safety | ⚠️ Issues | ✅ Safe | Fixed |
-| Cleanup | Manual | Automatic | Improved |
-| UI Quality | Basic | Modern | Much better |
+**Rocky Man** is a tool for generating searchable HTML documentation from Rocky Linux man pages across BaseOS and AppStream repositories for Rocky Linux 8, 9, and 10.

 ## Features

-- ✨ **Fast & Efficient**: Uses filelists.xml to pre-filter packages with man pages (massive bandwidth savings)
-- 🔍 **Fuzzy Search**: Instant search across all man pages with Fuse.js
-- 🎨 **Modern UI**: Clean, responsive dark theme interface inspired by GitHub
-- 📦 **Complete Coverage**: All packages from BaseOS and AppStream repositories
-- 🐳 **Container Ready**: Architecture-independent Docker support (works on x86_64, aarch64, arm64, etc.)
-- 🚀 **GitHub Actions**: Automated weekly builds and deployment to GitHub Pages
-- 🧹 **Smart Cleanup**: Automatic cleanup of temporary files (configurable)
-- ⚡ **Parallel Processing**: Concurrent downloads and conversions for maximum speed
-- 🌐 **Multi-version**: Support for Rocky Linux 8, 9, and 10 simultaneously
+- **Fast & Efficient**: Uses filelists.xml to pre-filter packages with man pages
+- **Complete Coverage**: All packages from BaseOS and AppStream repositories
+- **Container Ready**: Works on x86_64, aarch64, arm64, etc.
+- **Smart Cleanup**: Automatic cleanup of temporary files (configurable)
+- **Parallel Processing**: Concurrent downloads and conversions for maximum speed
+- **Multi-version**: Support for Rocky Linux 8, 9, and 10 simultaneously

 ## Quick Start

-### Option 1: Docker (Recommended)
-
-```bash
-# Build the image
-docker build -t rocky-man .
-
-# Generate man pages for Rocky Linux 9.6
-docker run --rm -v $(pwd)/html:/data/html rocky-man --versions 9.6
-
-# Generate for multiple versions
-docker run --rm -v $(pwd)/html:/data/html rocky-man --versions 8.10 9.6 10.0
-
-# With verbose logging
-docker run --rm -v $(pwd)/html:/data/html rocky-man --versions 9.6 --verbose
-
-# Keep downloaded RPMs (mount the download directory)
-docker run --rm -it \
-  -v $(pwd)/html:/data/html \
-  -v $(pwd)/downloads:/data/tmp/downloads \
-  rocky-man --versions 9.6 --keep-rpms --verbose
-```
-
-### Option 2: Podman (Native Rocky Linux)
+### Podman (Recommended)

 ```bash
 # Build the image
 podman build -t rocky-man .

-# Run with podman (note the :Z flag for SELinux)
-podman run --rm -v $(pwd)/html:/data/html:Z rocky-man --versions 9.6
+# Generate man pages for Rocky Linux 9.6 (using defaults, no custom args)
+podman run --rm -v $(pwd)/html:/data/html:Z rocky-man

-# Interactive mode for debugging
-podman run --rm -it -v $(pwd)/html:/data/html:Z rocky-man --versions 9.6 --verbose
+# Generate for specific versions (requires explicit paths)
+podman run --rm -v $(pwd)/html:/app/html:Z rocky-man \
+  --versions 8.10 9.6 10.0 --output-dir /app/html
+
+# With verbose logging
+podman run --rm -v $(pwd)/html:/app/html:Z rocky-man \
+  --versions 9.6 --output-dir /app/html --verbose

 # Keep downloaded RPMs (mount the download directory)
 podman run --rm -it \
-  -v $(pwd)/html:/data/html:Z \
-  -v $(pwd)/downloads:/data/tmp/downloads:Z \
-  rocky-man --versions 9.6 --keep-rpms --verbose
+  -v $(pwd)/html:/app/html:Z \
+  -v $(pwd)/downloads:/app/tmp/downloads:Z \
+  rocky-man --versions 9.6 --keep-rpms \
+  --output-dir /app/html --download-dir /app/tmp/downloads --verbose
 ```

-### Option 3: Docker Compose (Development)
+### Docker

 ```bash
-# Build and run
-docker-compose up
+# Build the image
+docker build -t rocky-man .

-# The generated HTML will be in ./html/
-# Preview at http://localhost:8080 (nginx container)
+# Generate man pages (using defaults, no custom args)
+docker run --rm -v $(pwd)/html:/data/html rocky-man
+
+# Generate for specific versions (requires explicit paths)
+docker run --rm -v $(pwd)/html:/app/html rocky-man \
+  --versions 9.6 --output-dir /app/html
+
+# Interactive mode for debugging
+docker run --rm -it -v $(pwd)/html:/app/html rocky-man \
+  --versions 9.6 --output-dir /app/html --verbose
+
+# Keep downloaded RPMs (mount the download directory)
+docker run --rm -it \
+  -v $(pwd)/html:/app/html \
+  -v $(pwd)/downloads:/app/tmp/downloads \
+  rocky-man --versions 9.6 --keep-rpms \
+  --output-dir /app/html --download-dir /app/tmp/downloads --verbose
 ```

 ### Directory Structure in Container

-When running in a container, rocky-man uses these directories inside `/data/`:
+The container uses different paths depending on whether you pass custom arguments:

-- `/data/html` - Generated HTML output (mount this to access results)
-- `/data/tmp/downloads` - Downloaded RPM files (temporary)
-- `/data/tmp/extracts` - Extracted man page files (temporary)
+**Without custom arguments** (using Dockerfile CMD defaults):
+- `/data/html` - Generated HTML output
+- `/data/tmp/downloads` - Downloaded RPM files
+- `/data/tmp/extracts` - Extracted man page files

-By default, RPMs and extracts are automatically cleaned up after processing. If you want to keep the RPMs (e.g., for debugging or multiple runs), mount the download directory and use `--keep-rpms`:
+**With custom arguments** (argparse defaults from working directory `/app`):
+- `/app/html` - Generated HTML output
+- `/app/tmp/downloads` - Downloaded RPM files
+- `/app/tmp/extracts` - Extracted man page files

-```bash
-# This keeps RPMs on your host in ./downloads/
-podman run --rm -it \
-  -v $(pwd)/html:/data/html:Z \
-  -v $(pwd)/downloads:/data/tmp/downloads:Z \
-  rocky-man --versions 9.6 --keep-rpms
-```
+**Important**: When passing custom arguments, the container's CMD is overridden and the code falls back to relative paths (`./html` = `/app/html`). You must explicitly specify `--output-dir /app/html --download-dir /app/tmp/downloads` to match your volume mounts. Without this, files are written inside the container and lost when it stops (especially with `--rm`).

-**Note**: Without mounting `/data/tmp/downloads`, the `--keep-rpms` flag will keep files inside the container, but they'll be lost when the container stops (especially with `--rm`).
-
-### Option 4: Local Development
+### Local Development

 #### Prerequisites

@@ -154,6 +118,9 @@ python -m rocky_man.main --parallel-downloads 10 --parallel-conversions 20

 # Use a different mirror
 python -m rocky_man.main --mirror https://mirrors.example.com/
+
+# Only BaseOS (faster)
+python -m rocky_man.main --repo-types BaseOS --versions 9.6
 ```

 ## Architecture
@@ -164,59 +131,24 @@ Rocky Man is organized into clean, modular components:
 rocky-man/
 ├── src/rocky_man/
 │   ├── models/  # Data models (Package, ManFile)
-│   │   ├── package.py  # RPM package representation
-│   │   └── manfile.py  # Man page file representation
-│   ├── repo/  # Repository management
-│   │   ├── manager.py  # DNF repository operations
-│   │   └── contents.py  # Filelists.xml parser (key optimization!)
-│   ├── processor/  # Man page processing
-│   │   ├── extractor.py  # Extract man pages from RPMs
-│   │   └── converter.py  # Convert to HTML with mandoc
-│   ├── web/  # Web page generation
-│   │   └── generator.py  # HTML and search index generation
-│   ├── utils/  # Utilities
-│   │   └── config.py  # Configuration management
-│   └── main.py  # Main entry point and orchestration
-├── templates/  # Jinja2 templates
-│   ├── base.html  # Base template with modern styling
-│   ├── index.html  # Search page with Fuse.js
-│   ├── manpage.html  # Individual man page display
-│   └── root.html  # Multi-version landing page
-├── Dockerfile  # Multi-stage, arch-independent
-├── docker-compose.yml  # Development setup with nginx
-├── .github/workflows/  # GitHub Actions automation
-└── pyproject.toml  # Python project configuration
+│   ├── repo/  # Repository management
+│   ├── processor/  # Man page processing
+│   ├── web/  # Web page generation
+│   ├── utils/  # Utilities
+│   └── main.py  # Main entry point and orchestration
+├── templates/  # Jinja2 templates
+├── Dockerfile  # Multi-stage, arch-independent
+└── pyproject.toml  # Python project configuration
 ```

 ### How It Works

-1. **Package Discovery** 🔍
-   - Parse repository `filelists.xml` to identify packages with man pages
-   - This is the **key optimization** - we know what to download before downloading!
-
-2. **Smart Download** ⬇️
-   - Download only packages containing man pages (60-80% reduction)
-   - Parallel downloads for speed
-   - Architecture-independent (man pages are the same across arches)
-
-3. **Extraction** 📦
-   - Extract man page files from RPM packages
-   - Handle gzipped and plain text man pages
-   - Support for multiple languages
-
-4. **Conversion** 🔄
-   - Convert troff format to HTML using mandoc
-   - Clean up HTML output
-   - Parallel processing for speed
-
-5. **Web Generation** 🌐
-   - Wrap HTML in beautiful templates
-   - Generate search index with fuzzy search
-   - Create multi-version navigation
-
-6. **Cleanup** 🧹
-   - Automatically remove temporary files (configurable)
-   - Keep only what you need
+1. **Package Discovery** - Parse repository `filelists.xml` to identify packages with man pages
+2. **Smart Download** - Download only packages containing man pages with parallel downloads
+3. **Extraction** - Extract man page files from RPM packages
+4. **Conversion** - Convert troff format to HTML using mandoc
+5. **Web Generation** - Wrap HTML in templates and generate search index
+6. **Cleanup** - Automatically remove temporary files (configurable)

 ## Command Line Options

@@ -266,183 +198,6 @@ Options:
   -v, --verbose  Enable verbose logging
 ```

-### Examples
-
-```bash
-# Quick test with one version
-python -m rocky_man.main --versions 9.6
-
-# Production build with all versions (default)
-python -m rocky_man.main
-
-# Fast build with more parallelism
-python -m rocky_man.main --parallel-downloads 15 --parallel-conversions 30
-
-# Keep files for debugging
-python -m rocky_man.main --keep-rpms --keep-extracts --verbose
-
-# Custom mirror (faster for your location)
-python -m rocky_man.main --mirror https://mirror.usi.edu/pub/rocky/
-
-# Only BaseOS (faster)
-python -m rocky_man.main --repo-types BaseOS --versions 9.6
-```
-
-## GitHub Actions Integration
-
-This project includes a **production-ready GitHub Actions workflow** that:
-
-- ✅ Runs automatically every Sunday at midnight UTC
-- ✅ Can be manually triggered with custom version selection
-- ✅ Builds man pages in a Rocky Linux container
-- ✅ Automatically deploys to GitHub Pages
-- ✅ Artifacts available for download
-
-### Setup Instructions
-
-1. **Enable GitHub Pages**
-   - Go to your repository → Settings → Pages
-   - Set source to **"GitHub Actions"**
-   - Save
-
-2. **Trigger the workflow**
-   - Go to Actions tab
-   - Select "Build Rocky Man Pages"
-   - Click "Run workflow"
-   - Choose versions (or use default)
-
-3. **Access your site**
-   - Will be available at: `https://YOUR_USERNAME.github.io/rocky-man/`
-   - Updates automatically every week!
-
-### Workflow File
-
-Located at `.github/workflows/build.yml`, it:
-- Uses Rocky Linux 9 container
-- Installs all dependencies
-- Runs the build
-- Uploads artifacts
-- Deploys to GitHub Pages
-
-## What's Different from the Original
-
-| Feature | Old Version | New Version |
-|---------|-------------|-------------|
-| **Architecture** | Single 400-line file | Modular, 16 files across 6 modules |
-| **Package Filtering** | Downloads everything | Pre-filters with filelists.xml |
-| **Performance** | 2-3 hours, ~10 GB | 30-45 min, ~2-3 GB |
-| **UI** | Basic template | Modern GitHub-inspired design |
-| **Search** | Simple filter | Fuzzy search with Fuse.js |
-| **Container** | Basic Podman commands | Multi-stage Dockerfile + compose |
-| **Thread Safety** | Global dict issues | Proper locking mechanisms |
-| **Cleanup** | Method exists but unused | Automatic, configurable |
-| **Documentation** | Minimal comments | Comprehensive docstrings |
-| **Type Hints** | None | Throughout codebase |
-| **Error Handling** | Basic try/catch | Comprehensive with logging |
-| **CI/CD** | None | GitHub Actions ready |
-| **Testing** | None | Ready for pytest integration |
-| **Configuration** | Hardcoded | Config class with defaults |
-
-## Project Structure Details
-
-```
-rocky-man/
-├── src/rocky_man/  # Main source code
-│   ├── __init__.py  # Package initialization
-│   ├── main.py  # Entry point and orchestration (200 lines)
-│   ├── models/  # Data models
-│   │   ├── __init__.py
-│   │   ├── package.py  # Package model with properties
-│   │   └── manfile.py  # ManFile model with path parsing
-│   ├── repo/  # Repository operations
-│   │   ├── __init__.py
-│   │   ├── manager.py  # DNF integration, downloads
-│   │   └── contents.py  # Filelists parser (key optimization)
-│   ├── processor/  # Processing pipeline
-│   │   ├── __init__.py
-│   │   ├── extractor.py  # RPM extraction with rpmfile
-│   │   └── converter.py  # mandoc conversion wrapper
-│   ├── web/  # Web generation
-│   │   ├── __init__.py
-│   │   └── generator.py  # Template rendering, search index
-│   └── utils/  # Utilities
-│       ├── __init__.py
-│       └── config.py  # Configuration management
-├── templates/  # Jinja2 templates
-│   ├── base.html  # Base layout (modern dark theme)
-│   ├── index.html  # Search page (Fuse.js integration)
-│   ├── manpage.html  # Man page display
-│   └── root.html  # Multi-version landing
-├── old/  # Your original code (preserved)
-│   ├── rocky_man.py
-│   ├── rocky_man2.py
-│   └── templates/
-├── .github/
-│   └── workflows/
-│       └── build.yml  # GitHub Actions workflow
-├── Dockerfile  # Multi-stage build
-├── .dockerignore  # Optimize Docker context
-├── docker-compose.yml  # Dev environment
-├── pyproject.toml  # Python project config
-├── .gitignore  # Updated for new structure
-└── README.md  # This file!
-```
-
-## Development
-
-### Adding New Features
-
-The modular design makes it easy to extend:
-
-- **New repositories**: Add to `config.repo_types` in `utils/config.py`
-- **Custom templates**: Use `--template-dir` flag or modify `templates/`
-- **Additional metadata**: Extend `Package` or `ManFile` models
-- **Alternative converters**: Implement new converter in `processor/`
-- **Different outputs**: Add new generator in `web/`
-
-### Running Tests
-
-```bash
-# Install dev dependencies
-pip3 install -e ".[dev]"
-
-# Run tests (when implemented)
-pytest
-
-# Type checking
-mypy src/
-
-# Linting
-ruff check src/
-```
-
-### Development Workflow
-
-```bash
-# 1. Make changes to code
-vim src/rocky_man/processor/converter.py
-
-# 2. Test locally in container
-podman run --rm -it -v $(pwd):/app rockylinux:9 /bin/bash
-cd /app
-python3 -m rocky_man.main --versions 9.6 --verbose
-
-# 3. Build Docker image
-docker build -t rocky-man .
-
-# 4. Test Docker image
-docker run --rm -v $(pwd)/html:/data/html rocky-man --versions 9.6
-
-# 5. Preview output
-docker-compose up nginx
-# Visit http://localhost:8080
-
-# 6. Commit and push
-git add .
-git commit -m "feat: your feature description"
-git push
-```
-
 ## Troubleshooting

 ### DNF Errors
@@ -510,12 +265,6 @@ python -m rocky_man.main --parallel-downloads 2 --parallel-conversions 5
 python -m rocky_man.main --mirror https://mirror.example.com/rocky/
 ```

-### UTF-8 Decode Errors
-
-**Problem**: `'utf-8' codec can't decode byte...`
-
-**Solution**: This is now handled with `errors='replace'` in the new version. The man page will still be processed with replacement characters for invalid UTF-8.
-
 ## Performance Tips

 1. **Use closer mirrors** - Significant speed improvement for downloads
@@ -547,34 +296,3 @@ Contributions welcome! Please:
 5. Commit with clear messages (`git commit -m 'feat: add amazing feature'`)
 6. Push to your branch (`git push origin feature/amazing-feature`)
 7. Open a Pull Request
-
-## Acknowledgments
-
-- Inspired by [debiman](https://github.com/Debian/debiman) for Debian
-- Uses [mandoc](https://mandoc.bsd.lv/) for man page conversion
-- Search powered by [Fuse.js](https://fusejs.io/)
-- Modern UI design inspired by GitHub's dark theme
-
-## Links
-
-- [Rocky Linux](https://rockylinux.org/)
-- [Man Page Format](https://man7.org/linux/man-pages/)
-- [Mandoc Documentation](https://mandoc.bsd.lv/)
-- [DNF Documentation](https://dnf.readthedocs.io/)
-
-## Roadmap
-
-- [ ] Add pytest test suite
-- [ ] Implement incremental updates (checksum-based)
-- [ ] Add support for localized man pages (es, fr, etc.)
-- [ ] Create redirect system like debiman
-- [ ] Add statistics page (most viewed, etc.)
-- [ ] Implement RSS feed for updates
-- [ ] Add support for Rocky Linux 10 (when released)
-- [ ] Create sitemap.xml for SEO
-- [ ] Add dark/light theme toggle
-- [ ] Implement caching for faster rebuilds
-
----
-
-**Made with ❤️ for the Rocky Linux community**