=============================================== Moving LLVM Projects to GitHub with Sub-Modules =============================================== Introduction ============ This is a proposal to move our current revision control system from our own hosted Subversion to GitHub. Below are the financial and technical arguments as to why we need such a move and how will people (and validation infrastructure) continue to work with a Git-based LLVM. There will be a survey pointing at this document when we'll know the community's reaction and, if we collectively decide to move, the time-frames. Be sure to make your views count. Essentially, the proposal is divided in the following parts: * Outline of the reasons to move to Git and GitHub * Description on what the work flow will look like (compared to SVN) * Remaining issues and potential problems * The proposed migration plan Why Git, and Why GitHub? ======================== Why move at all? ---------------- The strongest reason for the move, and why this discussion started in the first place, is that we currently host our own Subversion server and Git mirror in a voluntary basis. The LLVM Foundation sponsors the server and provides limited support, but there is only so much it can do. The volunteers are not Sysadmins themselves, but compiler engineers that happen to know a thing or two about hosting servers. We also don't have 24/7 support, and we sometimes wake up to see that continuous integration is broken because the SVN server is either down or unresponsive. With time and money, the foundation and volunteers could improve our services, implement more functionality and provide around the clock support, so that we can have a first class infrastructure with which to work. But the cost is not small, both in money and time invested. On the other hand, there are multiple services out there (GitHub, GitLab, BitBucket among others) that offer that same service (24/7 stability, disk space, Git server, code browsing, forking facilities, etc) for the very affordable price of *free*. Why Git? -------- Most new coders nowadays start with Git. A lot of them have never used SVN, CVS or anything else. Websites like GitHub have changed the landscape of open source contributions, reducing the cost of first contribution and fostering collaboration. Git is also the version control most LLVM developers use. Despite the sources being stored in an SVN server, most people develop using the Git-SVN integration, and that shows that Git is not only more powerful than SVN, but people have resorted to using a bridge because its features are now indispensable to their internal and external workflows. In essence, Git allows you to: * Commit, squash, merge, fork locally without any penalty to the server * Add as many branches as necessary to allow for multiple threads of development * Collaborate with peers directly, even without access to the Internet * Have multiple trees without multiplying disk space. In addition, because Git seems to be replacing every project's version control system, there are many more tools that can use Git's enhanced feature set, so new tooling is much more likely to support Git first (if not only), than any other version control system. Why GitHub? ----------- GitHub, like GitLab and BitBucket, provide free code hosting for open source projects. Essentially, they will completely replace *all* the infrastructure that we have today that serves code repository, mirroring, user control, etc. They also have a dedicated team to monitor, migrate, improve and distribute the contents of the repositories depending on region and load. A level of quality that we'd never have without spending money that would be better spent elsewhere, for example development meetings, sponsoring disadvantaged people to work on compilers and foster diversity and equality in our community. GitHub has the added benefit that we already have a presence there. Many developers use it already, and the mirror from our current repository is already set up. Furthermore, GitHub has an *SVN view* (https://github.com/blog/626-announcing-svn-support) where people that still have/want to use SVN infrastructure and tooling can slowly migrate or even stay working as if it was an SVN repository (including read-write access). So, any of the three solutions solve the cost and maintenance problem, but GitHub has two additional features that would be beneficial to the migration plan as well as the community already settled there. What will the new workflow look like ==================================== In order to move version control, we need to make sure that we get all the benefits with the least amount of problems. That's why the migration plan will be slow, one step at a time, and we'll try to make it look as close as possible to the current style without impacting the new features we want. Each LLVM project will continue to be hosted as separate GitHub repository under a single GitHub organisation. Users can continue to choose to use either SVN or Git to access the repositories to suit their current workflow. In addition, we'll create a repository that will mimic our current *linear history* repository. The most accepted proposal, then, was to have an umbrella project that will contain *sub-modules* (https://git-scm.com/book/en/v2/Git-Tools-Submodules) of all the LLVM projects and nothing else. This repository can be checked out on its own, in order to have *all* LLVM projects in a single check-out, as many people have suggested, but it can also only hold the references to the other projects, and be used for the sole purpose of understanding the *sequence* in which commits were added by using the ``git rev-list --count hash`` or ``git describe hash`` commands. One example of such a repository is Takumi's llvm-project-submodule (https://github.com/chapuni/llvm-project-submodule), which when checked out, will have the references to all sub-modules but not check them out, so one will need to *init* the module manually. This will allow the *exact* same behaviour as checking out individual SVN repositories, as it will keep the correct linear history. There is no need to additional tags, flags and properties, or external services controlling the history, since both SVN and *git rev-list* can already do that on their own. We will need additional server hooks to avoid non-fast-forwards commits (ex. merges, forced pushes, etc) in order to keep the linearity of the history. The three types hooks to be implemented are: * Status Checks: By placing status checks on a protected branch, we can guarantee that the history is kept linear and sane at all times, on all repositories. See: https://help.github.com/articles/about-required-status-checks/ * Umbrella updates: By using GitHub web hooks, we can update a small web-service inside LLVM's own infrastructure to update the umbrella project remotely. The maintenance of this service will be lower than the current SVN maintenance and the scope of its failures will be less severe. See: https://developer.github.com/webhooks/ * Commits email update: By adding an email web hook, we can make every push show in the lists, allowing us to retain history and do post-commit reviews. See: https://help.github.com/articles/managing-notifications-for-pushes-to-a-repository/ Access will be transferred one-to-one to GitHub accounts for everyone that already has commit access to our current repository. Those who don't have accounts will have to create one in order to continue contributing to the project. In the future, people only need to provide their GitHub accounts to be granted access. In a nutshell: * The projects' repositories will remain identical, with a new address (GitHub). * They'll continue to have SVN access (Read-Write), but will also gain Git RW access. * The linear history can still be accessed in the (RO) submodule meta project. * Individual projects' history will be local (ie. not interlaced with the other projects, as the current SVN repos are), and we need the umbrella project (using submodules) to have the same view as we had in SVN. Additionally, each repository will have the following server hooks: * Pre-commit hooks to stop people from applying non-fast-forward merges * Webhook to update the umbrella project (via buildbot or web services) * Email hook to each commits list (llvm-commit, cfe-commit, etc) Essentially, we're adding Git RW access in addition to the already existing structure, with all the additional benefits of it being in GitHub. Example of a working version: * Repository: https://github.com/llvm-beanz/llvm-submodules * Update bot: http://beanz-bot.com:8180/jenkins/job/submodule-update/ What will *not* be changed -------------------------- This is a change of version control system, not the whole infrastructure. There are plans to replace our current tools (review, bugs, documents), but they're all orthogonal to this proposal. We'll also be keeping the buildbots (and migrating them to use Git) as well as LNT, and any other system that currently provides value upstream. Any discussion regarding those tools are out of scope in this proposal. Remaining questions and problems ================================ 1. How much the SVN view emulates and how much it'll break tools/CI? For this one, we'll need people that will have problems in that area to tell us what's wrong and how to help them fix it. We also recommend people and companies to migrate to Git, for its many other additional benefits. 2. Which tools will need changing? LNT may break, since it relies on SVN's history. We can continue to use LNT with the SVN-View, but it would be best to move it to Git once and for all. The LLVMLab bisect tool will also be affected and will need adjusting. As with LNT, it should be fine to use GitHub's SVN view, but changing it to work on Git will be required in the long term. Phabricator will also need to change its configuration to point at the GitHub repositories, but since it already works with Git, this will be a trivial change. Migration Plan ============== If we decide to move, we'll have to set a date for the process to begin. As usual, we should be announcing big changes in one release to happen in the next one. But since this won't impact external users (if they rely on our source release tarballs), we don't necessarily have to. We will have to make sure all the *problems* reported are solved before the final push. But we can start all non-binding processes (like mirroring to GitHub and testing the SVN interface in it) before any hard decision. Here's a proposed plan: STEP #1 : Pre Move 0. Update docs to mention the move, so people are aware the it's going on. 1. Register an official GitHub project with the LLVM foundation. 2. Setup another (read-only) mirror of llvm.org/git at this GitHub project, adding all necessary hooks to avoid broken history (merge, dates, pushes), as well as a webhook to update the umbrella project (see below). 3. Make sure we have an llvm-project (with submodules) setup in the official account, with all necessary hooks (history, update, merges). 4. Make sure bisecting with llvm-project works. 5. Make sure no one has any other blocker. STEP #2 : Git Move 6. Update the buildbots to pick up updates and commits from the official git repository. 7. Update Phabricator to pick up commits from the official git repository. 8. Tell people living downstream to pick up commits from the official git repository. 9. Give things time to settle. We could play some games like disabling the SVN repository for a few hours on purpose so that people can test that their infrastructure has really become independent of the SVN repository. Until this point nothing has changed for developers, it will just boil down to a lot of work for buildbot and other infrastructure owners. Once all dependencies are cleared, and all problems have been solved: STEP #3: Write Access Move 10. Collect peoples GitHub account information, adding them to the project. 11. Switch SVN repository to read-only and allow pushes to the GitHub repository. 12. Mirror Git to SVN. STEP #4 : Post Move 13. Archive the SVN repository, if GitHub's SVN is good enough. 14. Review and update *all* LLVM documentation. 15. Review website links pointing to viewvc/klaus/phab etc. to point to GitHub instead.