add-feedback-improvements #5

Open
admin wants to merge 10 commits from add-feedback-improvements into main
Showing only changes of commit f474c238dc

420
README.md

@@ -1,121 +1,85 @@
# Rocky Man 📚
**Rocky Man** is a comprehensive man page hosting solution for Rocky Linux, providing beautiful, searchable documentation for all packages in BaseOS and AppStream repositories across Rocky Linux 8, 9, and 10.
> **✨ This is a complete rewrite** with 60-80% faster performance, modern architecture, and production-ready features!
## 🎉 What's New in This Rewrite
This version is a **complete ground-up rebuild** with major improvements:
- 🚀 **60-80% faster** - Pre-filters packages using filelists.xml (downloads only ~800 packages instead of ~3000)
- 🏗️ **Modular architecture** - Clean separation into models, repo, processor, web, and utils
- 🎨 **Modern UI** - Beautiful dark theme with instant fuzzy search
- 🐳 **Container ready** - Multi-stage Dockerfile that works on any architecture
- **Parallel processing** - Concurrent downloads and HTML conversions
- 🧹 **Smart cleanup** - Automatic cleanup of temporary files
- 📝 **Well documented** - Comprehensive docstrings and type hints throughout
- 🔒 **Thread safe** - Proper locking and resource management
- 🤖 **GitHub Actions** - Automated weekly builds and deployment
### Performance Comparison
| Metric | Old Version | New Version | Improvement |
|--------|-------------|-------------|-------------|
| Packages Downloaded | ~3000 | ~800 | 73% reduction |
| Processing Time | 2-3 hours | 30-45 minutes | 75% faster |
| Bandwidth Used | ~10 GB | ~2-3 GB | 70-80% reduction |
| Architecture | Single file | Modular (16 files) | Much cleaner |
| Thread Safety | ⚠️ Issues | ✅ Safe | Fixed |
| Cleanup | Manual | Automatic | Improved |
| UI Quality | Basic | Modern | Much better |
**Rocky Man** is a tool for generating searchable HTML documentation from Rocky Linux man pages across BaseOS and AppStream repositories for Rocky Linux 8, 9, and 10.
## Features
- **Fast & Efficient**: Uses filelists.xml to pre-filter packages with man pages (massive bandwidth savings)
- 🔍 **Fuzzy Search**: Instant search across all man pages with Fuse.js
- 🎨 **Modern UI**: Clean, responsive dark theme interface inspired by GitHub
- 📦 **Complete Coverage**: All packages from BaseOS and AppStream repositories
- 🐳 **Container Ready**: Architecture-independent Docker support (works on x86_64, aarch64/arm64, etc.)
- 🚀 **GitHub Actions**: Automated weekly builds and deployment to GitHub Pages
- 🧹 **Smart Cleanup**: Automatic cleanup of temporary files (configurable)
- **Parallel Processing**: Concurrent downloads and conversions for maximum speed
- 🌐 **Multi-version**: Support for Rocky Linux 8, 9, and 10 simultaneously
- **Fast & Efficient**: Uses filelists.xml to pre-filter packages with man pages
- **Complete Coverage**: All packages from BaseOS and AppStream repositories
- **Container Ready**: Works on x86_64, aarch64/arm64, etc.
- **Smart Cleanup**: Automatic cleanup of temporary files (configurable)
- **Parallel Processing**: Concurrent downloads and conversions for maximum speed
- **Multi-version**: Support for Rocky Linux 8, 9, and 10 simultaneously
## Quick Start
### Option 1: Docker (Recommended)
```bash
# Build the image
docker build -t rocky-man .
# Generate man pages for Rocky Linux 9.6
docker run --rm -v $(pwd)/html:/data/html rocky-man --versions 9.6
# Generate for multiple versions
docker run --rm -v $(pwd)/html:/data/html rocky-man --versions 8.10 9.6 10.0
# With verbose logging
docker run --rm -v $(pwd)/html:/data/html rocky-man --versions 9.6 --verbose
# Keep downloaded RPMs (mount the download directory)
docker run --rm -it \
-v $(pwd)/html:/data/html \
-v $(pwd)/downloads:/data/tmp/downloads \
rocky-man --versions 9.6 --keep-rpms --verbose
```
### Option 2: Podman (Native Rocky Linux)
### Podman (Recommended)
```bash
# Build the image
podman build -t rocky-man .
# Run with podman (note the :Z flag for SELinux)
podman run --rm -v $(pwd)/html:/data/html:Z rocky-man --versions 9.6
# Generate man pages for Rocky Linux 9.6 (using defaults, no custom args)
podman run --rm -v $(pwd)/html:/data/html:Z rocky-man
# Interactive mode for debugging
podman run --rm -it -v $(pwd)/html:/data/html:Z rocky-man --versions 9.6 --verbose
# Generate for specific versions (requires explicit paths)
podman run --rm -v $(pwd)/html:/app/html:Z rocky-man \
--versions 8.10 9.6 10.0 --output-dir /app/html
# With verbose logging
podman run --rm -v $(pwd)/html:/app/html:Z rocky-man \
--versions 9.6 --output-dir /app/html --verbose
# Keep downloaded RPMs (mount the download directory)
podman run --rm -it \
-v $(pwd)/html:/data/html:Z \
-v $(pwd)/downloads:/data/tmp/downloads:Z \
rocky-man --versions 9.6 --keep-rpms --verbose
-v $(pwd)/html:/app/html:Z \
-v $(pwd)/downloads:/app/tmp/downloads:Z \
rocky-man --versions 9.6 --keep-rpms \
--output-dir /app/html --download-dir /app/tmp/downloads --verbose
```
### Option 3: Docker Compose (Development)
### Docker
```bash
# Build and run
docker-compose up
# Build the image
docker build -t rocky-man .
# The generated HTML will be in ./html/
# Preview at http://localhost:8080 (nginx container)
# Generate man pages (using defaults, no custom args)
docker run --rm -v $(pwd)/html:/data/html rocky-man
# Generate for specific versions (requires explicit paths)
docker run --rm -v $(pwd)/html:/app/html rocky-man \
--versions 9.6 --output-dir /app/html
# Interactive mode for debugging
docker run --rm -it -v $(pwd)/html:/app/html rocky-man \
--versions 9.6 --output-dir /app/html --verbose
# Keep downloaded RPMs (mount the download directory)
docker run --rm -it \
-v $(pwd)/html:/app/html \
-v $(pwd)/downloads:/app/tmp/downloads \
rocky-man --versions 9.6 --keep-rpms \
--output-dir /app/html --download-dir /app/tmp/downloads --verbose
```
### Directory Structure in Container
When running in a container, rocky-man uses these directories inside `/data/`:
The container uses different paths depending on whether you pass custom arguments:
- `/data/html` - Generated HTML output (mount this to access results)
- `/data/tmp/downloads` - Downloaded RPM files (temporary)
- `/data/tmp/extracts` - Extracted man page files (temporary)
**Without custom arguments** (using Dockerfile CMD defaults):
- `/data/html` - Generated HTML output
- `/data/tmp/downloads` - Downloaded RPM files
- `/data/tmp/extracts` - Extracted man page files
By default, RPMs and extracts are automatically cleaned up after processing. If you want to keep the RPMs (e.g., for debugging or multiple runs), mount the download directory and use `--keep-rpms`:
**With custom arguments** (argparse defaults from working directory `/app`):
- `/app/html` - Generated HTML output
- `/app/tmp/downloads` - Downloaded RPM files
- `/app/tmp/extracts` - Extracted man page files
```bash
# This keeps RPMs on your host in ./downloads/
podman run --rm -it \
-v $(pwd)/html:/data/html:Z \
-v $(pwd)/downloads:/data/tmp/downloads:Z \
rocky-man --versions 9.6 --keep-rpms
```
**Important**: Passing any custom arguments overrides the container's CMD, so the code falls back to its argparse defaults, which are relative paths resolved against the working directory `/app` (`./html` becomes `/app/html`). You must explicitly specify `--output-dir /app/html --download-dir /app/tmp/downloads` to match your volume mounts. Without this, files are written inside the container and lost when the container is removed (which happens immediately on exit with `--rm`).
**Note**: Without mounting `/data/tmp/downloads`, the `--keep-rpms` flag keeps the files inside the container only, so they are lost when the container is removed (which happens immediately on exit with `--rm`).
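The path fallback described above can be illustrated with a minimal sketch. It assumes the tool defines its directory options with argparse and relative defaults; the flag names come from this README, but `build_parser`, `effective_path`, and the exact default values are hypothetical and not taken from the project source.

```python
import argparse
from pathlib import PurePosixPath


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: flag names are from the README, defaults are assumed.
    parser = argparse.ArgumentParser(prog="rocky-man")
    parser.add_argument("--versions", nargs="+")
    parser.add_argument("--output-dir", default="./html")
    parser.add_argument("--download-dir", default="./tmp/downloads")
    return parser


def effective_path(value: str, workdir: str = "/app") -> str:
    """Resolve a possibly relative path against the container working directory."""
    path = PurePosixPath(value)
    return str(path if path.is_absolute() else PurePosixPath(workdir) / path)


# Custom arguments without explicit directories: the relative defaults apply.
args = build_parser().parse_args(["--versions", "9.6"])
print(effective_path(args.output_dir))    # /app/html
print(effective_path(args.download_dir))  # /app/tmp/downloads

# Explicit directories: output lands wherever the volume is mounted.
args = build_parser().parse_args(["--versions", "9.6", "--output-dir", "/data/html"])
print(effective_path(args.output_dir))    # /data/html
```

If the output path printed for your invocation is not one of the paths you mounted with `-v`, the generated files stay inside the container.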
### Option 4: Local Development
### Local Development
#### Prerequisites
@@ -154,6 +118,9 @@ python -m rocky_man.main --parallel-downloads 10 --parallel-conversions 20
# Use a different mirror
python -m rocky_man.main --mirror https://mirrors.example.com/
# Only BaseOS (faster)
python -m rocky_man.main --repo-types BaseOS --versions 9.6
```
## Architecture
@@ -164,59 +131,24 @@ Rocky Man is organized into clean, modular components:
rocky-man/
├── src/rocky_man/
│ ├── models/ # Data models (Package, ManFile)
│ │ ├── package.py # RPM package representation
│ │ └── manfile.py # Man page file representation
│ ├── repo/ # Repository management
│ │ ├── manager.py # DNF repository operations
│ │ └── contents.py # Filelists.xml parser (key optimization!)
│ ├── processor/ # Man page processing
│ │ ├── extractor.py # Extract man pages from RPMs
│ │ └── converter.py # Convert to HTML with mandoc
│ ├── web/ # Web page generation
│ │ └── generator.py # HTML and search index generation
│ ├── utils/ # Utilities
│ │ └── config.py # Configuration management
│ └── main.py # Main entry point and orchestration
├── templates/ # Jinja2 templates
│ ├── base.html # Base template with modern styling
│ ├── index.html # Search page with Fuse.js
│ ├── manpage.html # Individual man page display
│ └── root.html # Multi-version landing page
├── Dockerfile # Multi-stage, arch-independent
├── docker-compose.yml # Development setup with nginx
├── .github/workflows/ # GitHub Actions automation
└── pyproject.toml # Python project configuration
│ ├── repo/ # Repository management
│ ├── processor/ # Man page processing
│ ├── web/ # Web page generation
│ ├── utils/ # Utilities
│ └── main.py # Main entry point and orchestration
├── templates/ # Jinja2 templates
├── Dockerfile # Multi-stage, arch-independent
└── pyproject.toml # Python project configuration
```
### How It Works
1. **Package Discovery** 🔍
- Parse repository `filelists.xml` to identify packages with man pages
- This is the **key optimization** - we know what to download before downloading!
2. **Smart Download** ⬇️
- Download only packages containing man pages (60-80% reduction)
- Parallel downloads for speed
- Architecture-independent (man pages are the same across arches)
3. **Extraction** 📦
- Extract man page files from RPM packages
- Handle gzipped and plain text man pages
- Support for multiple languages
4. **Conversion** 🔄
- Convert troff format to HTML using mandoc
- Clean up HTML output
- Parallel processing for speed
5. **Web Generation** 🌐
- Wrap HTML in beautiful templates
- Generate search index with fuzzy search
- Create multi-version navigation
6. **Cleanup** 🧹
- Automatically remove temporary files (configurable)
- Keep only what you need
1. **Package Discovery** - Parse repository `filelists.xml` to identify packages with man pages
2. **Smart Download** - Download only packages containing man pages with parallel downloads
3. **Extraction** - Extract man page files from RPM packages
4. **Conversion** - Convert troff format to HTML using mandoc
5. **Web Generation** - Wrap HTML in templates and generate search index
6. **Cleanup** - Automatically remove temporary files (configurable)
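
The package discovery step can be sketched as follows. This is an illustration under stated assumptions, not the project's actual parser in `repo/contents.py`: `packages_with_man_pages`, `MAN_PREFIX`, and the inline sample are hypothetical, and a real `filelists.xml` is much larger and normally shipped compressed, so it would need decompressing and ideally streaming.

```python
import xml.etree.ElementTree as ET

# XML namespace used by RPM repository filelists metadata.
FILELISTS_NS = "{http://linux.duke.edu/metadata/filelists}"

# Assumed location of man pages inside packages.
MAN_PREFIX = "/usr/share/man/"

# Tiny hand-written sample standing in for a real filelists.xml.
SAMPLE_FILELISTS = """<filelists xmlns="http://linux.duke.edu/metadata/filelists" packages="2">
  <package pkgid="abc" name="bash" arch="x86_64">
    <version epoch="0" ver="5.1.8" rel="9.el9"/>
    <file>/usr/bin/bash</file>
    <file>/usr/share/man/man1/bash.1.gz</file>
  </package>
  <package pkgid="def" name="libfoo" arch="x86_64">
    <version epoch="0" ver="1.0" rel="1.el9"/>
    <file>/usr/lib64/libfoo.so.1</file>
  </package>
</filelists>
"""


def packages_with_man_pages(filelists_xml: str) -> set[str]:
    """Return the names of packages that ship at least one file under MAN_PREFIX."""
    root = ET.fromstring(filelists_xml)
    names = set()
    for package in root.iter(f"{FILELISTS_NS}package"):
        paths = (f.text or "" for f in package.iter(f"{FILELISTS_NS}file"))
        if any(path.startswith(MAN_PREFIX) for path in paths):
            names.add(package.get("name"))
    return names


# Only these packages would need to be downloaded.
print(packages_with_man_pages(SAMPLE_FILELISTS))  # {'bash'}
```

Because the file list is known before any RPM is fetched, packages such as `libfoo` in the sample are never downloaded at all.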
## Command Line Options
@@ -266,183 +198,6 @@ Options:
-v, --verbose Enable verbose logging
```
### Examples
```bash
# Quick test with one version
python -m rocky_man.main --versions 9.6
# Production build with all versions (default)
python -m rocky_man.main
# Fast build with more parallelism
python -m rocky_man.main --parallel-downloads 15 --parallel-conversions 30
# Keep files for debugging
python -m rocky_man.main --keep-rpms --keep-extracts --verbose
# Custom mirror (faster for your location)
python -m rocky_man.main --mirror https://mirror.usi.edu/pub/rocky/
# Only BaseOS (faster)
python -m rocky_man.main --repo-types BaseOS --versions 9.6
```
## GitHub Actions Integration
This project includes a **production-ready GitHub Actions workflow** that:
- ✅ Runs automatically every Sunday at midnight UTC
- ✅ Can be manually triggered with custom version selection
- ✅ Builds man pages in a Rocky Linux container
- ✅ Automatically deploys to GitHub Pages
- ✅ Artifacts available for download
### Setup Instructions
1. **Enable GitHub Pages**
- Go to your repository → Settings → Pages
- Set source to **"GitHub Actions"**
- Save
2. **Trigger the workflow**
- Go to Actions tab
- Select "Build Rocky Man Pages"
- Click "Run workflow"
- Choose versions (or use default)
3. **Access your site**
- Will be available at: `https://YOUR_USERNAME.github.io/rocky-man/`
- Updates automatically every week!
### Workflow File
Located at `.github/workflows/build.yml`, it:
- Uses Rocky Linux 9 container
- Installs all dependencies
- Runs the build
- Uploads artifacts
- Deploys to GitHub Pages
## What's Different from the Original
| Feature | Old Version | New Version |
|---------|-------------|-------------|
| **Architecture** | Single 400-line file | Modular, 16 files across 6 modules |
| **Package Filtering** | Downloads everything | Pre-filters with filelists.xml |
| **Performance** | 2-3 hours, ~10 GB | 30-45 min, ~2-3 GB |
| **UI** | Basic template | Modern GitHub-inspired design |
| **Search** | Simple filter | Fuzzy search with Fuse.js |
| **Container** | Basic Podman commands | Multi-stage Dockerfile + compose |
| **Thread Safety** | Global dict issues | Proper locking mechanisms |
| **Cleanup** | Method exists but unused | Automatic, configurable |
| **Documentation** | Minimal comments | Comprehensive docstrings |
| **Type Hints** | None | Throughout codebase |
| **Error Handling** | Basic try/except | Comprehensive with logging |
| **CI/CD** | None | GitHub Actions ready |
| **Testing** | None | Ready for pytest integration |
| **Configuration** | Hardcoded | Config class with defaults |
## Project Structure Details
```
rocky-man/
├── src/rocky_man/ # Main source code
│ ├── __init__.py # Package initialization
│ ├── main.py # Entry point and orchestration (200 lines)
│ ├── models/ # Data models
│ │ ├── __init__.py
│ │ ├── package.py # Package model with properties
│ │ └── manfile.py # ManFile model with path parsing
│ ├── repo/ # Repository operations
│ │ ├── __init__.py
│ │ ├── manager.py # DNF integration, downloads
│ │ └── contents.py # Filelists parser (key optimization)
│ ├── processor/ # Processing pipeline
│ │ ├── __init__.py
│ │ ├── extractor.py # RPM extraction with rpmfile
│ │ └── converter.py # mandoc conversion wrapper
│ ├── web/ # Web generation
│ │ ├── __init__.py
│ │ └── generator.py # Template rendering, search index
│ └── utils/ # Utilities
│ ├── __init__.py
│ └── config.py # Configuration management
├── templates/ # Jinja2 templates
│ ├── base.html # Base layout (modern dark theme)
│ ├── index.html # Search page (Fuse.js integration)
│ ├── manpage.html # Man page display
│ └── root.html # Multi-version landing
├── old/ # Your original code (preserved)
│ ├── rocky_man.py
│ ├── rocky_man2.py
│ └── templates/
├── .github/
│ └── workflows/
│ └── build.yml # GitHub Actions workflow
├── Dockerfile # Multi-stage build
├── .dockerignore # Optimize Docker context
├── docker-compose.yml # Dev environment
├── pyproject.toml # Python project config
├── .gitignore # Updated for new structure
└── README.md # This file!
```
## Development
### Adding New Features
The modular design makes it easy to extend:
- **New repositories**: Add to `config.repo_types` in `utils/config.py`
- **Custom templates**: Use `--template-dir` flag or modify `templates/`
- **Additional metadata**: Extend `Package` or `ManFile` models
- **Alternative converters**: Implement new converter in `processor/`
- **Different outputs**: Add new generator in `web/`
### Running Tests
```bash
# Install dev dependencies
pip3 install -e ".[dev]"
# Run tests (when implemented)
pytest
# Type checking
mypy src/
# Linting
ruff check src/
```
### Development Workflow
```bash
# 1. Make changes to code
vim src/rocky_man/processor/converter.py
# 2. Test locally in container
podman run --rm -it -v $(pwd):/app rockylinux:9 /bin/bash
cd /app
python3 -m rocky_man.main --versions 9.6 --verbose
# 3. Build Docker image
docker build -t rocky-man .
# 4. Test Docker image
docker run --rm -v $(pwd)/html:/data/html rocky-man --versions 9.6
# 5. Preview output
docker-compose up nginx
# Visit http://localhost:8080
# 6. Commit and push
git add .
git commit -m "feat: your feature description"
git push
```
## Troubleshooting
### DNF Errors
@@ -510,12 +265,6 @@ python -m rocky_man.main --parallel-downloads 2 --parallel-conversions 5
python -m rocky_man.main --mirror https://mirror.example.com/rocky/
```
### UTF-8 Decode Errors
**Problem**: `'utf-8' codec can't decode byte...`
**Solution**: The new version decodes with `errors='replace'`, so the man page is still processed and each invalid UTF-8 byte sequence is replaced with the Unicode replacement character (U+FFFD).
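
As a small illustration of what `errors='replace'` does, the sketch below decodes bytes that are valid Latin-1 but invalid UTF-8. The helper name `decode_man_page` is hypothetical and only demonstrates the standard `bytes.decode` behaviour, not the project's own code.

```python
def decode_man_page(raw: bytes) -> str:
    """Decode man page bytes as UTF-8, substituting U+FFFD for invalid sequences."""
    return raw.decode("utf-8", errors="replace")


# 0xE9 is "é" in Latin-1 but is not a valid UTF-8 sequence on its own.
latin1_bytes = b"caf\xe9 au lait"

# Strict decoding raises the error shown in the problem description.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print(f"strict decoding failed: {exc}")

# Lenient decoding keeps the rest of the text and marks the bad byte.
text = decode_man_page(latin1_bytes)
print(text == "caf\ufffd au lait")  # True
```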
## Performance Tips
1. **Use closer mirrors** - Significant speed improvement for downloads
@@ -547,34 +296,3 @@ Contributions welcome! Please:
5. Commit with clear messages (`git commit -m 'feat: add amazing feature'`)
6. Push to your branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request
## Acknowledgments
- Inspired by [debiman](https://github.com/Debian/debiman) for Debian
- Uses [mandoc](https://mandoc.bsd.lv/) for man page conversion
- Search powered by [Fuse.js](https://fusejs.io/)
- Modern UI design inspired by GitHub's dark theme
## Links
- [Rocky Linux](https://rockylinux.org/)
- [Man Page Format](https://man7.org/linux/man-pages/)
- [Mandoc Documentation](https://mandoc.bsd.lv/)
- [DNF Documentation](https://dnf.readthedocs.io/)
## Roadmap
- [ ] Add pytest test suite
- [ ] Implement incremental updates (checksum-based)
- [ ] Add support for localized man pages (es, fr, etc.)
- [ ] Create redirect system like debiman
- [ ] Add statistics page (most viewed, etc.)
- [ ] Implement RSS feed for updates
- [ ] Add support for Rocky Linux 10 (when released)
- [ ] Create sitemap.xml for SEO
- [ ] Add dark/light theme toggle
- [ ] Implement caching for faster rebuilds
---
**Made with ❤️ for the Rocky Linux community**