Add update_readme_coverage.py script that calculates per-platform
package coverage using check_missing_packages.py logic and updates
README.md with shields.io badges and a summary table. Integrate
the script into both the daily stats workflow and the build workflow.
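
A minimal sketch of the coverage and badge step, assuming coverage is the share of expected packages that already have a wheel. The function names and colour thresholds are hypothetical and are not taken from update_readme_coverage.py; only the shields.io static badge URL shape (/badge/<label>-<message>-<color>) is standard.

```python
from urllib.parse import quote


def coverage_percent(available: int, expected: int) -> float:
    """Return the percentage of expected packages that have a wheel."""
    if expected == 0:
        return 0.0
    return 100.0 * available / expected


def badge_color(percent: float) -> str:
    # Hypothetical thresholds, chosen only for illustration.
    if percent >= 90:
        return "brightgreen"
    if percent >= 50:
        return "yellow"
    return "red"


def badge_url(label: str, percent: float) -> str:
    """Build a shields.io static badge URL for one platform."""

    def escape(text: str) -> str:
        # shields.io treats '-' and '_' as separators, so literal ones are
        # doubled; everything else is percent-encoded.
        return quote(text.replace("-", "--").replace("_", "__"), safe="")

    message = f"{percent:.1f}%"
    color = badge_color(percent)
    return f"https://img.shields.io/badge/{escape(label)}-{escape(message)}-{color}"
```

For example, badge_url("Linux x86_64", coverage_percent(3, 4)) yields a yellow badge reading "75.0%".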
Add GitHub Issue Forms template for requesting pre-built wheels
with fields for Flash Attention, Python, PyTorch, CUDA versions
and platform selection. Enable blank issues via config.yml.
gcc-toolset-14 (GCC 14) in manylinux_2_28 containers is incompatible
with CUDA 12.6 nvcc, causing compilation errors in type_traits headers.
Use gcc-toolset-13 instead and export CC/CXX to GITHUB_ENV to ensure
subsequent steps use the correct compiler.
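
A rough sketch of the export step, written in Python for illustration (the workflow itself most likely does this in shell). GitHub Actions reads NAME=value lines appended to the file named by GITHUB_ENV and applies them to later steps. The /opt/rh/gcc-toolset-13 prefix is an assumption about where the toolset is installed in the container.

```python
from pathlib import PurePosixPath

# Assumed install prefix of gcc-toolset-13 inside the manylinux_2_28 image.
TOOLSET_BIN = PurePosixPath("/opt/rh/gcc-toolset-13/root/usr/bin")


def export_compiler(env_file: str, toolset_bin: PurePosixPath = TOOLSET_BIN) -> dict[str, str]:
    """Append CC and CXX to the GITHUB_ENV file so later steps inherit them."""
    variables = {
        "CC": str(toolset_bin / "gcc"),
        "CXX": str(toolset_bin / "g++"),
    }
    with open(env_file, "a", encoding="utf-8") as handle:
        for name, value in variables.items():
            handle.write(f"{name}={value}\n")
    return variables


# Inside a workflow step this would be called as:
#   export_compiler(os.environ["GITHUB_ENV"])
```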
- Rename docs/ to doc/ (contains packages.md, release_history.md, etc.)
- Rename pages/ to docs/ (contains search page index.html)
- Update all references in README.md, workflows, and Python scripts
GitHub Pages only supports / (the repository root) or /docs as the source directory when publishing from a branch.
Apply the same fix to _build_windows.yml and _build_windows_code_build.yml
to ensure gh CLI uses the correct repository context when uploading
release assets.
Remove working-directory from Upload Release Asset step to ensure
gh CLI uses the correct repository (flash-attention-prebuild-wheels)
instead of the cloned flash-attention repository.
- Replace manual GITHUB_REF string manipulation with github.ref_name in _build_windows.yml, _build_windows_code_build.yml, and _build_windows_self_host.yml.
- Add _build_windows_self_host.yml for self-hosted Windows wheel builds.
- Integrate self-hosted Windows build job into main build.yml workflow.
- Update create_matrix.py to include and enable Windows self-hosted build matrix.
- Implement comprehensive cleanup steps in the self-hosted runner workflow to ensure a clean state for subsequent runs.
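
For reference, the manual GITHUB_REF handling that github.ref_name replaces amounts to stripping the ref prefix. A minimal Python sketch of that behaviour (the helper name is hypothetical; the workflows use the built-in context value instead):

```python
def short_ref_name(github_ref: str) -> str:
    """Strip the refs/heads/, refs/tags/ or refs/pull/ prefix from a full ref."""
    for prefix in ("refs/heads/", "refs/tags/", "refs/pull/"):
        if github_ref.startswith(prefix):
            return github_ref[len(prefix):]
    # Anything else is returned unchanged.
    return github_ref
```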
- Fix wheel path issue caused by build_windows.ps1 changing into the
flash-attention directory (later steps prefixed the directory again,
producing a doubled path like flash-attention/flash-attention/dist)
- Add working-directory to Install Test and Upload steps for explicit
directory control
- Add log grouping (::group::) in build_windows.ps1 for collapsible
logs in GitHub Actions
- Suppress verbose output with pip -q, git clone -q, and
NINJA_STATUS=""
- Add pwsh and vswhere to prerequisites list
- Increase timeout to 2160 minutes (36 hours) for long builds
- Improve CUDA cleanup using proper Windows uninstaller
- Update README platform table and manylinux compatibility note
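
The doubled wheel path can be reproduced with plain path joins. This is only an illustration of the failure mode, with a hypothetical /work workspace root; it is not code from build_windows.ps1.

```python
from pathlib import PurePosixPath


def resolve(cwd: str, relative: str) -> str:
    """Join a relative path onto the directory a step is currently in."""
    return str(PurePosixPath(cwd) / relative)


workspace = "/work"  # hypothetical workspace root

# The build script has already changed into flash-attention, so prefixing the
# directory name again doubles it.
doubled = resolve(f"{workspace}/flash-attention", "flash-attention/dist")

# Pinning the directory explicitly (as working-directory does) and using a
# path relative to it avoids the duplication.
fixed = resolve(f"{workspace}/flash-attention", "dist")
```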
- Add .github/workflows/test-windows-self-hosted.yml for Windows self-hosted runner testing.
- Update README.md with comprehensive self-hosted runner setup guides for Linux, ARM64, and Windows.
- Update self-hosted-runner/compose.yml to enable both x86_64 and ARM64 runner services.
- Add a note about manylinux_2_28 and update the sponsor list in README.md.
- Merge _build_manylinux_self_host.yml into _build_linux_self_host.yml
- Add automatic package manager detection (apt-get/dnf) for both x86_64 and ARM64
- Add environment check steps to all self-hosted workflows
- Update build.yml to use unified workflow with container-image parameter
- Remove duplicate build_wheels_manylinux_self_hosted job
- Update test workflows to use consolidated workflow
- Remove manylinux_self_hosted matrix configuration
- Set manylinux container image for ARM64 test workflow
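
The package manager detection in the unified workflow can be sketched as below, assuming it simply checks which tool is on PATH. The workflow most likely does this in shell; the function names here are hypothetical.

```python
import shutil
from typing import Callable, Optional


def detect_package_manager(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the first of apt-get or dnf that is found on PATH."""
    for candidate in ("apt-get", "dnf"):
        if which(candidate):
            return candidate
    raise RuntimeError("neither apt-get nor dnf was found on PATH")


def install_command(manager: str, packages: list[str]) -> list[str]:
    """Build a non-interactive install command for the detected manager."""
    return [manager, "install", "-y", *packages]
```

Passing the lookup function as a parameter keeps the detection testable on machines that have neither tool installed.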
- Create .github/actions/build-and-upload composite action for shared build logic
- Consolidate Python/uv setup, CUDA setup, wheel building, testing, and upload steps
- Update _build_linux.yml to use new composite action (reduced from 136 to 69 lines)
- Update _build_linux_arm_self_host.yml to use composite action with cleanup enabled
- Update _build_linux_self_host.yml for both container and no-container jobs
- Update _build_manylinux_self_host.yml to use composite action
- Reduce code duplication by more than 200 lines across all workflow files
- Update .venv/bin to .venv/Scripts for Windows compatibility
- Use PowerShell syntax for path assignment in _build_windows_code_build.yml
- Ensure correct GitHub PATH environment variable setup on Windows
- Add step ID to auditwheel_repair for proper output tracking
- Update wheel path references to use correct step outputs
- Add patchelf dependency for ARM Linux builds
- Add explicit shell specification for manylinux workflow
- Remove unnecessary sudo calls from dnf commands in self-hosted manylinux environment
- Commands are already executed with root privileges in the CI runner
- Change wheel output to use full path instead of basename for better flexibility
- Add patchelf to build dependencies for wheel repair operations
- Enable auditwheel repair step with proper exclusions for CUDA/torch libraries
- Separate wheel path variable to track both standard and manylinux versions
- Rename manylinux workflow file to reflect its dedicated purpose
- Update workflow references to use renamed manylinux workflow
- Use fromjson() for runner parameter parsing in ARM build workflow
- Update test workflows to use container environment consistently
- Support both manylinux and standard wheel uploads to releases
- Replace WHEEL_NAME variables with WHEEL_PATH to store full file paths
instead of just filenames, eliminating redundant directory concatenation
- Update all workflow references to use WHEEL_PATH directly in install
and upload commands
- Simplify manylinux wheel handling by storing full paths in output variables
- Reduce tested versions in Linux self-hosted matrix to focus on Python 3.14
- Reduce tested versions in ARM64 self-hosted matrix to focus on Python 3.14
and flash-attn 2.8.3
- Apply formatting standardization to Windows CodeBuild matrix
- Replace actions/setup-python with uv-based Python installation
- Consolidate build dependencies into Python installation step
- Enable auditwheel repair and manylinux wheel generation across all Linux builds
- Add patchelf as build dependency for glibc compatibility
- Update Python version to 3.14 in build and test workflows
- Convert runs-on array syntax to single runner value with fromjson()
- Update runner parameter defaults to JSON string format
- Add runner labels to build workflow calls
- Update test workflow runner configurations
- Remove unused test-utils.yml workflow
- Add arm64 label to ARM64 self-hosted runner job
- Add x64 label to x86_64 self-hosted runner jobs (both container and non-container)
- Ensures runner selection by both custom runner name and architecture type
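
A small model of how the JSON-encoded runner parameter and the added labels interact, assuming the parameter holds either a single label or a list of labels. A job with several runs-on labels is only picked up by a runner that carries all of them. The runner name "my-runner" is hypothetical.

```python
import json


def parse_runs_on(raw: str) -> list[str]:
    """Decode a JSON runner parameter into a list of required labels."""
    value = json.loads(raw)
    return [value] if isinstance(value, str) else list(value)


def runner_matches(runner_labels: set[str], runs_on: list[str]) -> bool:
    """A runner is eligible only if it has every requested label."""
    return set(runs_on) <= runner_labels


required = parse_runs_on('["my-runner", "arm64"]')
arm_runner = {"self-hosted", "my-runner", "arm64"}
x64_runner = {"self-hosted", "my-runner", "x64"}
```

Here runner_matches(arm_runner, required) is true while runner_matches(x64_runner, required) is false, which is the effect of selecting by both runner name and architecture.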
- Move auditwheel repair after initial release upload with continue-on-error to allow pipeline continuation
- Add manylinux platform normalization support in normalize_platform_name()
- Expand self-hosted build matrix to include Python 3.14 and Flash Attention 2.8.3
- Improve wheel upload flow by separating regular and manylinux wheel handling
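
One possible shape for the manylinux normalization, assuming the goal is to map manylinux platform tags onto the plain linux tag of the same architecture. The helper below is a hypothetical stand-in and does not reproduce the actual body of normalize_platform_name().

```python
import re

# Matches both PEP 600 tags (manylinux_2_28_x86_64) and legacy aliases
# (manylinux2014_aarch64), capturing the architecture suffix.
_MANYLINUX = re.compile(r"^manylinux(?:_\d+_\d+|\d+)_(?P<arch>.+)$")


def normalize_platform_tag(tag: str) -> str:
    """Map a manylinux platform tag to linux_<arch>; leave other tags as-is."""
    match = _MANYLINUX.match(tag)
    if match:
        return f"linux_{match.group('arch')}"
    return tag
```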
- Rename workflow file from _build_linux_self_host.yml to _build_linux_arm_self_host.yml for clarity
- Update workflow_call inputs to use flexible container image configuration
- Remove use-container flag and replace with container-image parameter
- Consolidate ubuntu image handling with --platform linux/arm64 option
- Update test-arm-self-hosted.yml to reference the renamed workflow