resf-testing-repo/docs/BOOTSTRAP-APPROACH.md
Stephen Simpson ec04f0bec5 Implement Ansible roles for Rocky Linux Testing Framework
- Added `bootstrap_sparrowdo` role for bootstrapping Sparrowdo on a VM.
- Introduced `cleanup_vm` role for cleaning up VMs and disk images.
- Created `download_image` role to download and cache QCOW2 images.
- Developed `golden_image` role for creating and customizing golden images.
- Implemented `provision_vm` role for provisioning VMs as linked clones.
- Added `run_test` role for executing tests with Sparrowdo.
- Created playbooks for building golden images, running single tests, and running test suites.
- Enhanced documentation with usage examples, configuration details, and troubleshooting tips.
- Added support for multiple cloud providers (AWS, Azure) in the test execution workflow.

Signed-off-by: Stephen Simpson <ssimpson89@users.noreply.github.com>
2025-12-29 16:02:39 -06:00

# Bootstrap Approach for Sparrowdo
## Overview
This framework bootstraps Sparrowdo **ONCE** in the golden image, not on every test VM. Each test VM then skips the 5-10 minute bootstrap, which saves significant time and compute.
## How It Works
### Traditional Approach (Slow) ❌
```
For each test:
1. Provision VM from golden image
2. Bootstrap Sparrowdo (5-10 minutes)
3. Run test
4. Destroy VM
```
**Problem**: If you run 70 tests, you bootstrap 70 times: 70 × 5-10 minutes ≈ 6-12 hours wasted!
### Our Approach (Fast) ✅
```
Once per golden image:
1. Create golden image with Raku/zef installed
2. Boot temporary VM from golden image
3. Bootstrap Sparrowdo (5-10 minutes)
4. Shut down VM (changes saved to golden image)

For each test:
1. Provision VM from bootstrapped golden image
2. Run test immediately (no bootstrap needed!)
3. Destroy VM
```
**Benefit**: Bootstrap once, test 70 times = 5-10 minutes of bootstrap time in total!
## Implementation Details
### Step 1: Golden Image Preparation (Offline)
The `docs/default-prep.sh` script runs inside `virt-customize` (offline mode):
```bash
# Install Raku and zef package manager
dnf install -y rakudo rakudo-zef
# Create rocky user with sudo access
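useradd -m rocky
# Overwrite rather than append so re-running the script does not duplicate the rule
echo "rocky ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/rocky
chmod 0440 /etc/sudoers.d/rocky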
```
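**Why offline?** `virt-customize` edits the image directly, so this step is fast and does not require booting a VM. The `dnf install` step still needs access to package repositories.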
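For illustration, the offline step might be invoked roughly as follows. This is a sketch, not the contents of `setup_base.sh`: the helper name `customize_cmd` is hypothetical, and the assumption that the public key is injected with `--ssh-inject` is mine. The helper only prints the command line so it can be reviewed before being run.

```bash
# Sketch only: print a virt-customize command that applies a prep script
# offline. customize_cmd is a hypothetical helper, not part of the repo.
customize_cmd() {
    local image="$1" prep_script="$2" pubkey="$3"
    # --run executes the script inside the image; --ssh-inject adds the key
    # for the rocky user (created by the prep script); --selinux-relabel
    # fixes file labels after the offline changes.
    echo "virt-customize -a $image --run $prep_script" \
         "--ssh-inject rocky:file:$pubkey --selinux-relabel"
}

customize_cmd /var/lib/libvirt/images/golden-rocky9.qcow2 \
    docs/default-prep.sh ~/.ssh/id_rsa.pub
```

Paths containing spaces would need extra quoting before the printed line is executed.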
### Step 2: Bootstrap Sparrowdo (Online - Once)
The `scripts/bootstrap_golden.sh` script:
1. **Boots temporary VM** from golden image
2. **Runs sparrowdo --bootstrap** via SSH
3. **Shuts down VM cleanly** (changes persist in golden image)
4. **Cleans up VM definition**
```bash
./scripts/bootstrap_golden.sh /path/to/golden.qcow2 ~/.ssh/id_rsa
```
**What bootstrap installs:**
- Sparrowdo Raku module
- All Sparrowdo dependencies
- Testing utilities
- Configuration files
### Step 3: Test Execution (Fast)
Each test VM is a **linked clone** of the bootstrapped golden image:
```bash
# Provision takes < 1 second (copy-on-write)
VM_IP=$(./scripts/provision_vm.sh test-vm golden.qcow2)
# Run test immediately (no bootstrap!)
sparrowdo --host "$VM_IP" --ssh_user rocky --no_sudo --sparrowfile test.raku
```
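The provisioning step relies on qcow2 backing files. `provision_vm.sh` itself is not shown here, so the following is only a sketch of how a linked clone is typically created with `qemu-img`. The function name `make_linked_clone` and the `DRY_RUN` variable are assumptions for illustration.

```bash
# Sketch only: create a copy-on-write overlay whose backing file is the
# golden image. make_linked_clone and DRY_RUN are hypothetical names.
make_linked_clone() {
    local golden="$1" overlay="$2"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it
        echo "qemu-img create -f qcow2 -F qcow2 -b $golden $overlay"
    else
        qemu-img create -f qcow2 -F qcow2 -b "$golden" "$overlay"
    fi
}

# Preview the command without touching any image:
DRY_RUN=1 make_linked_clone /var/lib/libvirt/images/golden-rocky9.qcow2 \
    /var/lib/libvirt/images/test-vm-1.qcow2
```

The overlay stores only the blocks that differ from the golden image, which is why creating it is nearly instant.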
## Manual Workflow
### First Time Setup
```bash
# 1. Create base golden image
./scripts/setup_base.sh \
/var/lib/libvirt/images/Rocky-9-GenericCloud-Base.qcow2 \
docs/default-prep.sh \
/var/lib/libvirt/images/golden-rocky9.qcow2 \
~/.ssh/id_rsa.pub
# 2. Bootstrap the golden image (one time, 5-10 minutes)
./scripts/bootstrap_golden.sh \
/var/lib/libvirt/images/golden-rocky9.qcow2 \
~/.ssh/id_rsa
```
### Running Tests
```bash
# No bootstrap needed! Just run tests directly
./scripts/provision_vm.sh test-vm-1 /var/lib/libvirt/images/golden-rocky9.qcow2
# ... run sparrowdo tests ...
./scripts/cleanup_vm.sh test-vm-1
```
## Workflow
Bootstrap is handled automatically in the build process:
1. **Prepare Golden Image**
   - `setup_base.sh` → `golden.qcow2` (with Raku)
2. **Bootstrap Golden Image**
   - `bootstrap_golden.sh` → `golden.qcow2` (with Sparrowdo)
3. **Run Tests in Parallel**
   - provision → run test → cleanup
   - provision → run test → cleanup
   - provision → run test → cleanup
   - (No bootstrap in any test!)
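The parallel stage could be sketched with plain shell job control. This is a minimal sketch under stated assumptions: `run_one_test` and `run_suite` are hypothetical names, and the real provision, test, and cleanup calls are left as comments so the example runs without a hypervisor.

```bash
# Sketch only: run several tests concurrently, each in its own VM.
# run_one_test is a stand-in; a real version would call the repo's
# provision/cleanup scripts and sparrowdo, as in the commented lines.
run_one_test() {
    local name="$1"
    # VM_IP=$(./scripts/provision_vm.sh "$name" golden.qcow2)
    # sparrowdo --host "$VM_IP" --ssh_user rocky --no_sudo --sparrowfile test.raku
    # ./scripts/cleanup_vm.sh "$name"
    echo "finished $name"
}

run_suite() {
    local pids="" failed=0 name pid
    for name in "$@"; do
        run_one_test "$name" &        # start each test in the background
        pids="$pids $!"
    done
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))   # count non-zero exits
    done
    echo "failures: $failed"
    [ "$failed" -eq 0 ]
}

run_suite test-a test-b test-c
```

This version starts every test at once. A large suite would likely need a cap on concurrent VMs.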
## Time Savings Example
### 70 Tests with Bootstrap-Per-Test
```
Bootstrap time: 7 minutes per test
Total bootstrap time: 70 × 7 = 490 minutes (8.2 hours)
Test time: 70 × 2 minutes = 140 minutes (2.3 hours)
TOTAL: 630 minutes (10.5 hours)
```
### 70 Tests with Bootstrap-Once
```
Bootstrap time: 7 minutes once
Total bootstrap time: 1 × 7 = 7 minutes
Test time: 70 × 2 minutes = 140 minutes (2.3 hours)
TOTAL: 147 minutes (2.5 hours)
```
**Savings: about 8 hours!** (483 of 630 minutes, a 77% reduction in total time)
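The same arithmetic can be reproduced for other suite sizes with a few shell helpers. This is a minimal sketch that assumes tests run sequentially, as in the example above. The function names are illustrative and not part of the repository's scripts.

```bash
# Illustrative helpers (not part of the repo): total minutes for N
# sequential tests, given bootstrap and per-test durations in minutes.

# Bootstrap repeated for every test: N × (bootstrap + test)
total_minutes_per_test() {
    local tests="$1" bootstrap_min="$2" test_min="$3"
    echo $(( tests * (bootstrap_min + test_min) ))
}

# Bootstrap done once in the golden image: bootstrap + N × test
total_minutes_once() {
    local tests="$1" bootstrap_min="$2" test_min="$3"
    echo $(( bootstrap_min + tests * test_min ))
}

# Percentage saved, rounded to the nearest whole number
percent_saved() {
    local before="$1" after="$2"
    echo $(( ((before - after) * 100 + before / 2) / before ))
}

before=$(total_minutes_per_test 70 7 2)
after=$(total_minutes_once 70 7 2)
echo "saved: $(( before - after )) minutes ($(percent_saved "$before" "$after")%)"
```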
## Disk Space Considerations
### Without Linked Clones
```
Base image: 2 GB
Golden image: 2 GB
Test VMs: 70 × 2 GB = 140 GB
TOTAL: 144 GB
```
### With Linked Clones (Our Approach)
```
Base image: 2 GB (cached, reused)
Golden image: 2.5 GB (with Sparrowdo)
Test VMs: 70 × ~100 MB = 7 GB (only diffs stored)
TOTAL: 11.5 GB
```
**Savings: 132.5 GB!** (92% reduction in disk usage)
## Updating the Golden Image
If you need to update Sparrowdo or dependencies:
```bash
# Option 1: Rebuild from scratch
rm -f /var/lib/libvirt/images/golden-rocky9.qcow2
./scripts/setup_base.sh ... # Create fresh
./scripts/bootstrap_golden.sh ... # Bootstrap fresh
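# Option 2: Update existing golden image
# Boot a VM from the golden image, update packages, shut down
VM_IP=$(./scripts/provision_vm.sh update-vm golden-rocky9.qcow2)
ssh rocky@"$VM_IP" 'zef upgrade Sparrowdo'
ssh rocky@"$VM_IP" 'sudo shutdown -h now'
# Changes reach the golden image only if the VM disk is the golden image itself.
# If update-vm is a linked clone, merge its overlay back afterwards, e.g. with
# `qemu-img commit <overlay.qcow2>` (the overlay path depends on provision_vm.sh).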
```
## Troubleshooting
### Bootstrap Script Hangs
```bash
# Check if VM is running
virsh -c qemu:///system list | grep bootstrap
# Connect to VM console
virsh -c qemu:///system console bootstrap-golden-XXXXX
# Check bootstrap logs on VM
ssh rocky@VM_IP 'cat ~/.zef/*log'
```
### Test Fails with "Sparrowdo not found"
```bash
# Verify Sparrowdo is in golden image
ssh rocky@GOLDEN_VM_IP 'which sparrowdo'
ssh rocky@GOLDEN_VM_IP 'sparrowdo --version'
# If missing, re-run bootstrap
./scripts/bootstrap_golden.sh /path/to/golden.qcow2 ~/.ssh/id_rsa
```
### Bootstrap Fails on First Try
```bash
# Common issue: Network not ready during bootstrap
# Solution: Increase wait time in bootstrap script
# Or manually retry bootstrap command
sparrowdo --host VM_IP --ssh_user rocky --bootstrap --color
```
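One way to make the manual retry less error-prone is a small retry wrapper. This is a sketch only: `retry` is a hypothetical helper, not something the bootstrap script is known to contain, and the attempt count and delay in the example are arbitrary.

```bash
# Sketch only: retry a command a fixed number of times with a pause between
# attempts. retry is a hypothetical helper, not part of the repo's scripts.
retry() {
    local attempts="$1" delay="$2" n=1
    shift 2
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "failed after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example: wait up to about a minute for SSH, then bootstrap.
# retry 12 5 ssh -o ConnectTimeout=5 rocky@"$VM_IP" true
# retry 2 10 sparrowdo --host "$VM_IP" --ssh_user rocky --bootstrap --color
```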
## Best Practices
1. **Cache base images** - Reuse downloaded QCOW2 files
2. **Bootstrap once per Rocky version** - Create `golden-rocky8.qcow2`, `golden-rocky9.qcow2`, etc.
3. **Version your golden images** - Use timestamps: `golden-rocky9-20250125.qcow2`
4. **Test golden images** - Always verify bootstrap succeeded before running full test suite
5. **Update periodically** - Rebuild golden images monthly to get security updates
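The timestamped naming from practice 3 can be generated rather than typed by hand. A minimal sketch, assuming the `golden-<release>-<YYYYMMDD>.qcow2` pattern shown above. `golden_image_name` is a hypothetical helper.

```bash
# Sketch only: build a dated golden image filename.
# golden_image_name is hypothetical; the optional second argument
# overrides the date stamp (YYYYMMDD), which defaults to today.
golden_image_name() {
    local release="$1" stamp="${2:-$(date +%Y%m%d)}"
    echo "golden-${release}-${stamp}.qcow2"
}

golden_image_name rocky9 20250125   # prints golden-rocky9-20250125.qcow2
golden_image_name rocky8            # uses today's date
```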
## Security Considerations
### SSH Keys
- Golden image contains injected SSH public key
- Anyone with private key can SSH to any VM from this golden image
- **Recommendation**: Use dedicated testing SSH keys, not personal keys
### Passwords
- Rocky user password: `rockypass` (change in prep script if needed)
- Root password: `rockytesting`
- **Recommendation**: Disable password auth, use keys only
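Disabling password authentication could be done in the prep script along these lines. This is a sketch under assumptions: `disable_password_auth` is a hypothetical helper, it relies on GNU `sed`, and it edits only the file it is given. Drop-in files under `/etc/ssh/sshd_config.d/` may also set this directive and would need the same treatment.

```bash
# Sketch only: force key-only SSH logins by setting PasswordAuthentication no.
# disable_password_auth is a hypothetical helper; it edits the file passed as
# its argument, so it can be tried on a copy first.
disable_password_auth() {
    local conf="$1"
    if grep -Eq '^[#[:space:]]*PasswordAuthentication[[:space:]]' "$conf"; then
        # Rewrite the existing (possibly commented-out) directive
        sed -i -E 's/^[#[:space:]]*PasswordAuthentication[[:space:]].*/PasswordAuthentication no/' "$conf"
    else
        # No directive present: append one
        echo 'PasswordAuthentication no' >> "$conf"
    fi
}

# Example (run as root inside the image):
# disable_password_auth /etc/ssh/sshd_config
```

On a running VM, `sshd` has to be reloaded before the change takes effect.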
### Sudo Access
- Rocky user has NOPASSWD sudo (required for bootstrap)
- **Recommendation**: Only use these VMs in isolated test networks
## Advanced: Pre-cached Test Dependencies
You can extend the bootstrap to pre-install common test dependencies:
```bash
# In bootstrap_golden.sh, after sparrowdo --bootstrap:
ssh rocky@$VM_IP 'zef install Test::Class'
ssh rocky@$VM_IP 'zef install JSON::Fast'
ssh rocky@$VM_IP 'sudo dnf install -y postgresql-server'
```
This makes tests even faster by eliminating package install time during tests.
## Monitoring Bootstrap Success
The bootstrap script outputs:
```
[1/4] Provisioning temporary VM...
[2/4] Waiting for SSH to be ready...
[3/4] Running Sparrowdo bootstrap...
[4/4] Shutting down VM to save changes...
```
If any step fails, the golden image is NOT bootstrapped. Check logs and retry.
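A script can make the failing step explicit by wrapping each stage. The following is a sketch, not the actual structure of `bootstrap_golden.sh`. `run_step` is a hypothetical helper, and the labels mirror the output shown above.

```bash
# Sketch only: print a step label, run the step, and report which step
# failed. run_step is hypothetical, not taken from bootstrap_golden.sh.
run_step() {
    local label="$1"
    shift
    echo "$label"
    if ! "$@"; then
        echo "FAILED: $label" >&2
        return 1
    fi
}

# Example: abort at the first failure so a half-bootstrapped image is not used.
# run_step "[2/4] Waiting for SSH to be ready..." ssh rocky@"$VM_IP" true || exit 1
# run_step "[3/4] Running Sparrowdo bootstrap..." \
#     sparrowdo --host "$VM_IP" --ssh_user rocky --bootstrap || exit 1
```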
## Integration with Automation
### Nightly Golden Image Rebuild
```bash
# Cron job to rebuild golden images nightly
0 2 * * * cd /path/to/repo && ./scripts/rebuild-golden.sh
```
### Pre-commit Hook to Validate Tests
```bash
#!/bin/sh
# .git/hooks/pre-commit
# Provisions a temp VM, runs one test, then destroys the VM
./scripts/validate-tests.sh
```
### Scheduled Builds
Use your automation tool (cron, systemd timers, etc.) to rebuild golden images periodically:
```bash
# Weekly golden image rebuild
0 2 * * 0 ansible-playbook /path/to/repo/ansible/playbooks/build-golden-image.yml
```
## Conclusion
Bootstrapping the golden image once provides:
- **Roughly 4x faster test runs** (630 vs 147 minutes in the 70-test example; no per-test bootstrap)
- **About 92% less disk usage** (linked clones vs full copies)
- **Simpler test scripts** (no bootstrap logic needed)
- **Better reliability** (bootstrap failures affect one build, not all tests)
This approach is essential for running large test suites efficiently!