use prettiery to auto format md files (#398)

2026-07-01 21:34:01 -04:00 · 2021-06-05 13:01:58 +08:00
parent db6371400e
commit 2ddc7174af
9 changed files with 107 additions and 101 deletions
@@ -27,7 +27,6 @@ env:
  ARCHERY_DOCKER_PASSWORD: ${{ secrets.DOCKERHUB_TOKEN }}

 jobs:
-
  lint:
    name: Lint C++, Python, R, Rust, Docker, RAT
    runs-on: ubuntu-latest
@@ -41,3 +40,16 @@ jobs:
        run: pip install -e dev/archery[docker]
      - name: Lint
        run: archery lint --rat
+  prettier:
+    name: Use prettier to check formatting of documents
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - uses: actions/setup-node@v2
+        with:
+          node-version: "14"
+      - name: Prettier check
+        run: |
+          # if you encounter error, try rerun the command below with --write instead of --check
+          # and commit the changes
+          npx prettier@2.3.0 --check {arrow,arrow-flight,dev,integration-testing,parquet}/**/*.md README.md CODE_OF_CONDUCT.md CONTRIBUTING.md
@@ -19,6 +19,6 @@

 # Code of Conduct

-* [Code of Conduct for The Apache Software Foundation][1]
+- [Code of Conduct for The Apache Software Foundation][1]

-[1]: https://www.apache.org/foundation/policies/conduct.html
+[1]: https://www.apache.org/foundation/policies/conduct.html
@@ -21,15 +21,15 @@

 ## Did you find a bug?

-The Arrow project uses JIRA as a bug tracker.  To report a bug, you'll have
+The Arrow project uses JIRA as a bug tracker. To report a bug, you'll have
 to first create an account on the
-[Apache Foundation JIRA](https://issues.apache.org/jira/).  The JIRA server
-hosts bugs and issues for multiple Apache projects.  The JIRA project name
+[Apache Foundation JIRA](https://issues.apache.org/jira/). The JIRA server
+hosts bugs and issues for multiple Apache projects. The JIRA project name
 for Arrow is "ARROW".

 To be assigned to an issue, ask an Arrow JIRA admin to go to
 [Arrow Roles](https://issues.apache.org/jira/plugins/servlet/project-config/ARROW/roles),
-click "Add users to a role," and add you to the "Contributor" role.  Most
+click "Add users to a role," and add you to the "Contributor" role. Most
 committers are authorized to do this; if you're a committer and aren't
 able to load that project admin page, have someone else add you to the
 necessary role.
@@ -39,15 +39,15 @@ Before you create a new bug entry, we recommend you first
 among existing Arrow issues.

 When you create a new JIRA entry, please don't forget to fill the "Component"
-field.  Arrow has many subcomponents and this helps triaging and filtering
-tremendously.  Also, we conventionally prefix the issue title with the component
+field. Arrow has many subcomponents and this helps triaging and filtering
+tremendously. Also, we conventionally prefix the issue title with the component
 name in brackets, such as "[C++] Crash in Array::Frobnicate()", so as to make
 lists more easy to navigate, and we'd be grateful if you did the same.

 ## Did you write a patch that fixes a bug or brings an improvement?

-First create a JIRA entry as described above.  Then, submit your changes
-as a GitHub Pull Request.  We'll ask you to prefix the pull request title
+First create a JIRA entry as described above. Then, submit your changes
+as a GitHub Pull Request. We'll ask you to prefix the pull request title
 with the JIRA issue number and the component name in brackets.
 (for example: "ARROW-2345: [C++] Fix crash in Array::Frobnicate()").
 Respecting this convention makes it easier for us to process the backlog
@@ -55,13 +55,13 @@ of submitted Pull Requests.

 ### Minor Fixes

-Any functionality change should have a JIRA opened.  For minor changes that
-affect documentation, you do not need to open up a JIRA.  Instead you can
+Any functionality change should have a JIRA opened. For minor changes that
+affect documentation, you do not need to open up a JIRA. Instead you can
 prefix the title of your PR with "MINOR: " if meets the following guidelines:

-*  Grammar, usage and spelling fixes that affect no more than 2 files
-*  Documentation updates affecting no more than 2 files and not more
-   than 500 words.
+- Grammar, usage and spelling fixes that affect no more than 2 files
+- Documentation updates affecting no more than 2 files and not more
+  than 500 words.

 ## Do you want to propose a significant new feature or an important refactoring?

@@ -25,13 +25,13 @@ Welcome to the implementation of Arrow, the popular in-memory columnar format, i

 This part of the Arrow project is divided in 4 main components:

-| Crate     | Description | Documentation |
-|-----------|-------------|---------------|
-|Arrow        | Core functionality (memory layout, arrays, low level computations) | [(README)](arrow/README.md) |
-|Parquet      | Parquet support | [(README)](parquet/README.md) |
-|Arrow-flight | Arrow data between processes | [(README)](arrow-flight/README.md) |
-|DataFusion   | In-memory query engine with SQL support | [(README)](https://github.com/apache/arrow-datafusion/blob/master/README.md) |
-|Ballista     | Distributed query execution | [(README)](https://github.com/apache/arrow-datafusion/blob/master/ballista/README.md) |
+| Crate        | Description                                                        | Documentation                                                                         |
+| ------------ | ------------------------------------------------------------------ | ------------------------------------------------------------------------------------- |
+| Arrow        | Core functionality (memory layout, arrays, low level computations) | [(README)](arrow/README.md)                                                           |
+| Parquet      | Parquet support                                                    | [(README)](parquet/README.md)                                                         |
+| Arrow-flight | Arrow data between processes                                       | [(README)](arrow-flight/README.md)                                                    |
+| DataFusion   | In-memory query engine with SQL support                            | [(README)](https://github.com/apache/arrow-datafusion/blob/master/README.md)          |
+| Ballista     | Distributed query execution                                        | [(README)](https://github.com/apache/arrow-datafusion/blob/master/ballista/README.md) |

 Independently, they support a vast array of functionality for in-memory computations.

@@ -39,15 +39,15 @@ Together, they allow users to write an SQL query or a `DataFrame` (using the `da

 Generally speaking, the `arrow` crate offers functionality to develop code that uses Arrow arrays, and `datafusion` offers most operations typically found in SQL, with the notable exceptions of:

-* `join`
-* `window` functions
+- `join`
+- `window` functions

 There are too many features to enumerate here, but some notable mentions:

-* `Arrow` implements all formats in the specification except certain dictionaries
-* `Arrow` supports SIMD operations to some of its vertical operations
-* `DataFusion` supports `async` execution
-* `DataFusion` supports user-defined functions, aggregates, and whole execution nodes
+- `Arrow` implements all formats in the specification except certain dictionaries
+- `Arrow` supports SIMD operations to some of its vertical operations
+- `DataFusion` supports `async` execution
+- `DataFusion` supports user-defined functions, aggregates, and whole execution nodes

 You can find more details about each crate in their respective READMEs.

@@ -118,7 +118,6 @@ export ARROW_TEST_DATA=$(cd ../testing/data; pwd)

 From here on, this is a pure Rust project and `cargo` can be used to run tests, benchmarks, docs and examples as usual.

-
 ### Running the tests

 Run tests using the Rust standard `cargo test` command:
@@ -156,9 +155,10 @@ If you use Visual Studio Code with the `rust-analyzer` plugin, you can enable `c
 One of the concerns with `clippy` is that it often produces a lot of false positives, or that some recommendations may hurt readability. We do not have a policy of which lints are ignored, but if you disagree with a `clippy` lint, you may disable the lint and briefly justify it.

 Search for `allow(clippy::` in the codebase to identify lints that are ignored/allowed. We currently prefer ignoring lints on the lowest unit possible.
-* If you are introducing a line that returns a lint warning or error, you may disable the lint on that line.
-* If you have several lints on a function or module, you may disable the lint on the function or module.
-* If a lint is pervasive across multiple modules, you may disable it at the crate level.
+
+- If you are introducing a line that returns a lint warning or error, you may disable the lint on that line.
+- If you have several lints on a function or module, you may disable the lint on the function or module.
+- If a lint is pervasive across multiple modules, you may disable it at the crate level.

 ## Git Pre-Commit Hook

@@ -79,12 +79,12 @@ The above script will run the `flatc` compiler and perform some adjustments to t

 Arrow uses the following features:

-* `simd` - Arrow uses the [packed_simd](https://crates.io/crates/packed_simd) crate to optimize many of the
- implementations in the [compute](https://github.com/apache/arrow/tree/master/rust/arrow/src/compute)
- module using SIMD intrinsics. These optimizations are turned *off* by default.
- If the `simd` feature is enabled, an unstable version of Rust is required (we test with `nightly-2021-03-24`)
-* `flight` which contains useful functions to convert between the Flight wire format and Arrow data
-* `prettyprint` which is a utility for printing record batches
+- `simd` - Arrow uses the [packed_simd](https://crates.io/crates/packed_simd) crate to optimize many of the
+  implementations in the [compute](https://github.com/apache/arrow/tree/master/rust/arrow/src/compute)
+  module using SIMD intrinsics. These optimizations are turned _off_ by default.
+  If the `simd` feature is enabled, an unstable version of Rust is required (we test with `nightly-2021-03-24`)
+- `flight` which contains useful functions to convert between the Flight wire format and Arrow data
+- `prettyprint` which is a utility for printing record batches

 Other than `simd` all the other features are enabled by default. Disabling `prettyprint` might be necessary in order to
 compile Arrow to the `wasm32-unknown-unknown` WASM target.
@@ -99,12 +99,12 @@ This crate only accepts the usage of `unsafe` code upon careful consideration, a

 Generally, `unsafe` should only be used when a `safe` counterpart is not available and there is no `safe` way to achieve additional performance in that area. The following is a summary of the current components of the crate that require `unsafe`:

-* alloc, dealloc and realloc of buffers along cache lines
-* Interpreting bytes as certain rust types, for access, representation and compute
-* Foreign interfaces (C data interface)
-* Inter-process communication (IPC)
-* SIMD
-* Performance (e.g. omit bounds checks, use of pointers to avoid bound checks)
+- alloc, dealloc and realloc of buffers along cache lines
+- Interpreting bytes as certain rust types, for access, representation and compute
+- Foreign interfaces (C data interface)
+- Inter-process communication (IPC)
+- SIMD
+- Performance (e.g. omit bounds checks, use of pointers to avoid bound checks)

 #### cache-line aligned memory management

@@ -147,13 +147,13 @@ Usage of `unsafe` for performance reasons is justified only when all other alter

 ### Considerations when introducing `unsafe`

-Usage of `unsafe` in this crate *must*:
+Usage of `unsafe` in this crate _must_:

-* not expose a public API as `safe` when there are necessary invariants for that API to be defined behavior.
-* have code documentation for why `safe` is not used / possible
-* have code documentation about which invariant the user needs to enforce to ensure [soundness](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#soundness-of-code--of-a-library), or which
-* invariant is being preserved.
-* if applicable, use `debug_assert`s to relevant invariants (e.g. bound checks)
+- not expose a public API as `safe` when there are necessary invariants for that API to be defined behavior.
+- have code documentation for why `safe` is not used / possible
+- have code documentation about which invariant the user needs to enforce to ensure [soundness](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#soundness-of-code--of-a-library), or which
+- invariant is being preserved.
+- if applicable, use `debug_assert`s to relevant invariants (e.g. bound checks)

 Example of code documentation:

@@ -2,9 +2,9 @@ Revision: {revision}

 Submitted crossbow builds: [{repo} @ {branch}](https://github.com/{repo}/branches/all?query={branch})

-|Task|Status|
-|----|------|
-|docker-cpp-cmake32|[![CircleCI](https://img.shields.io/circleci/build/github/{repo}/{branch}-circle-docker-cpp-cmake32.svg)](https://circleci.com/gh/{repo}/tree/{branch}-circle-docker-cpp-cmake32)|
-|wheel-osx-cp36m|[![TravisCI](https://img.shields.io/travis/{repo}/{branch}-travis-wheel-osx-cp36m.svg)](https://travis-ci.com/{repo}/branches)|
-|wheel-osx-cp37m|[![TravisCI](https://img.shields.io/travis/{repo}/{branch}-travis-wheel-osx-cp37m.svg)](https://travis-ci.com/{repo}/branches)|
-|wheel-win-cp36m|[![Appveyor](https://img.shields.io/appveyor/ci/{repo}/{branch}-appveyor-wheel-win-cp36m.svg)](https://ci.appveyor.com/project/{repo}/history)|
+| Task               | Status                                                                                                                                                                            |
+| ------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| docker-cpp-cmake32 | [![CircleCI](https://img.shields.io/circleci/build/github/{repo}/{branch}-circle-docker-cpp-cmake32.svg)](https://circleci.com/gh/{repo}/tree/{branch}-circle-docker-cpp-cmake32) |
+| wheel-osx-cp36m    | [![TravisCI](https://img.shields.io/travis/{repo}/{branch}-travis-wheel-osx-cp36m.svg)](https://travis-ci.com/{repo}/branches)                                                    |
+| wheel-osx-cp37m    | [![TravisCI](https://img.shields.io/travis/{repo}/{branch}-travis-wheel-osx-cp37m.svg)](https://travis-ci.com/{repo}/branches)                                                    |
+| wheel-win-cp36m    | [![Appveyor](https://img.shields.io/appveyor/ci/{repo}/{branch}-appveyor-wheel-win-cp36m.svg)](https://ci.appveyor.com/project/{repo}/history)                                    |
@@ -22,15 +22,17 @@
 ## Branching

 We would maintain two branches: `active_release` and `master`.
-* All new PRs are created and merged against `master`
-* All versions are created from the `active_release` branch
-* Once merged to master, changes are "cherry-picked"  (via a hopefully soon to be automated process), to the `active_release` branch based on the judgement of the original PR author and maintainers.

-* We do not merge breaking api changes, as defined in [Rust RFC 1105](https://github.com/rust-lang/rfcs/blob/master/text/1105-api-evolution.md) to the `active_release`
+- All new PRs are created and merged against `master`
+- All versions are created from the `active_release` branch
+- Once merged to master, changes are "cherry-picked" (via a hopefully soon to be automated process), to the `active_release` branch based on the judgement of the original PR author and maintainers.
+
+- We do not merge breaking api changes, as defined in [Rust RFC 1105](https://github.com/rust-lang/rfcs/blob/master/text/1105-api-evolution.md) to the `active_release`

 Please see the [original proposal](https://docs.google.com/document/d/1tMQ67iu8XyGGZuj--h9WQYB9inCk6c2sL_4xMTwENGc/edit?ts=60961758) document the rational of this change.

 ## Release Branching
+
 We aim to release every other week from the `active_release` branch.

 Every other Monday, a maintainer proposes a minor (e.g. `4.1.0` to `4.2.0`) or patch (e.g `4.1.0` to `4.1.1`) release, depending on changes to the `active_release` in the previous 2 weeks, following the process beloe.
@@ -44,6 +46,7 @@ Apache Arrow in general does synchronized major releases every three months. The
 This directory contains the scripts used to manage an Apache Arrow Release.

 # Process Overview
+
 As part of the Apache governance model, official releases consist of
 signed source tarballs approved by the PMC.

@@ -52,7 +55,6 @@ crates.io, the Rust ecosystem's package manager.

 ## Branching

-
 # Release Preparation

 # Change Log
@@ -65,16 +67,13 @@ The CHANGELOG is created automatically using
 This script creates a changelog using github issues and the
 labels associated with them.

-
-
-
 # Mechanics of creating a release

 ## Prepare the release branch and tags

 First, ensure that `active_release` contains the content of the desired release. For minor and patch releases, no additional steps are needed.

-To prepare for *a major release*, change `active release` to point at the latest `master` with commands such as:
+To prepare for _a major release_, change `active release` to point at the latest `master` with commands such as:

 ```
 git checkout active_release
@@ -111,7 +110,6 @@ Note that when reviewing the change log, rather than editing the
 `CHANGELOG.md`, it is preferred to update the issues and their labels
 (e.g. add `invalid` label to exclude them from release notes)

-
 ## Prepare release candidate tarball

 (Note you need to be a committer to run these scripts as they upload to the apache svn distribution servers)
@@ -135,7 +133,7 @@ Pick numbers in sequential order, with `0` for `rc1`, `1` for `rc1`, etc.

 ### Create, sign, and upload tarball

-Run the  `create-tarball.sh` with the  `<version>` tag and `<rc>` and you found in previous steps:
+Run the `create-tarball.sh` with the `<version>` tag and `<rc>` and you found in previous steps:

 ```shell
 ./dev/release/create-tarball.sh 4.1.0 2
@@ -144,12 +142,11 @@ Run the  `create-tarball.sh` with the  `<version>` tag and `<rc>` and you found
 This script

 1. creates and uploads a release candidate tarball to the [arrow
-dev](https://dist.apache.org/repos/dist/dev/arrow) location on the
-apache distribution svn server
+   dev](https://dist.apache.org/repos/dist/dev/arrow) location on the
+   apache distribution svn server

 2. provide you an email template to
-send to dev@arrow.apache.org for release voting.
-
+   send to dev@arrow.apache.org for release voting.

 ### Vote on Release Candidate tarball

@@ -185,7 +182,6 @@ The vote will be open for at least 72 hours.

 For the release to become "official" it needs at least three PMC members to vote +1 on it.

-
 #### Verifying Release Candidates

 There is a script in this repository which can be used to help `dev/release/verify-release-candidate.sh` assist the verification process. Run it like:
@@ -194,12 +190,10 @@ There is a script in this repository which can be used to help `dev/release/veri
 ./dev/release/verify-release-candidate.sh 4.1.0 2
 ```

-
 #### If the release is not approved

 If the release is not approved, fix whatever the problem is and try again with the next RC number

-
 ### If the release is approved,

 Move tarball to the release location in SVN, e.g. https://dist.apache.org/repos/dist/release/arrow/arrow-4.1.0/, using the `release-tarball.sh` script:
@@ -225,7 +219,7 @@ of the [arrow crate](https://crates.io/crates/arrow).
 Download and unpack the official release tarball

 Verify that the Cargo.toml in the tarball contains the correct version
-(e.g.  `version = "0.11.0"`) and then publish the crate with the
+(e.g. `version = "0.11.0"`) and then publish the crate with the
 following commands

 ```shell
@@ -247,8 +241,6 @@ Step 3a: If CI passes, merge cherry-pick PR

 Step 3b: If CI doesn't pass or some other changes are needed, the PR should be reviewed / approved as normal prior to merge

-
-
 For example, to backport `b2de5446cc1e45a0559fb39039d0545df1ac0d26` to active_release use the folliwing

 ```shell
@@ -258,6 +250,7 @@ ARROW_GITHUB_API_TOKEN=$ARROW_GITHUB_API_TOKEN CHECKOUT_ROOT=/tmp/arrow-rs CHERR
 ```

 ## Rationale for creating PRs:
+
 1. PRs are a natural place to run the CI tests to make sure there are no logical conflicts
 2. PRs offer a place for the original author / committers to comment and say it should/should not be backported.
 3. PRs offer a way to make cleanups / fixups and approve (if needed) for non cherry pick PRs
@@ -23,8 +23,8 @@ See [Integration.rst](../../docs/source/format/Integration.rst) for an overview

 This crate contains the following binaries, which are invoked by Archery during integration testing with other Arrow implementations.

-| Binary | Purpose |
-|--------|---------|
-| arrow-file-to-stream | Converts an Arrow file to an Arrow stream |
-| arrow-stream-to-file | Converts an Arrow stream to an Arrow file |
-| arrow-json-integration-test | Converts between Arrow and JSON formats |
+| Binary                      | Purpose                                   |
+| --------------------------- | ----------------------------------------- |
+| arrow-file-to-stream        | Converts an Arrow file to an Arrow stream |
+| arrow-stream-to-file        | Converts an Arrow stream to an Arrow file |
+| arrow-json-integration-test | Converts between Arrow and JSON formats   |
@@ -76,23 +76,23 @@ version is available. Then simply update version of `parquet-format` crate in Ca

 ## Features

- [X] All encodings supported
- [X] All compression codecs supported
- [X] Read support
-  - [X] Primitive column value readers
-  - [X] Row record reader
-  - [X] Arrow record reader
+- [x] All encodings supported
+- [x] All compression codecs supported
+- [x] Read support
+  - [x] Primitive column value readers
+  - [x] Row record reader
+  - [x] Arrow record reader
 - [ ] Statistics support
- [X] Write support
-  - [X] Primitive column value writers
+- [x] Write support
+  - [x] Primitive column value writers
  - [ ] Row record writer
-  - [X] Arrow record writer
+  - [x] Arrow record writer
 - [ ] Predicate pushdown
- [X] Parquet format 2.6.0 support
+- [x] Parquet format 2.6.0 support

 ## Requirements

-Parquet requires LLVM.  Our windows CI image includes LLVM but to build the libraries locally windows
+Parquet requires LLVM. Our windows CI image includes LLVM but to build the libraries locally windows
 users will have to install LLVM. Follow [this](https://github.com/appveyor/ci/issues/2651) link for info.

 ## Build
@@ -109,18 +109,19 @@ Run `cargo test` for unit tests. To also run tests related to the binaries, use
 ## Binaries

 The following binaries are provided (use `cargo install --features cli` to install them):
+
 - **parquet-schema** for printing Parquet file schema and metadata.
-`Usage: parquet-schema <file-path>`, where `file-path` is the path to a Parquet file. Use `-v/--verbose` flag
-to print full metadata or schema only (when not specified only schema will be printed).
+  `Usage: parquet-schema <file-path>`, where `file-path` is the path to a Parquet file. Use `-v/--verbose` flag
+  to print full metadata or schema only (when not specified only schema will be printed).

 - **parquet-read** for reading records from a Parquet file.
-`Usage: parquet-read <file-path> [num-records]`, where `file-path` is the path to a Parquet file,
-and `num-records` is the number of records to read from a file (when not specified all records will
-be printed). Use `-j/--json` to print records in JSON lines format.
+  `Usage: parquet-read <file-path> [num-records]`, where `file-path` is the path to a Parquet file,
+  and `num-records` is the number of records to read from a file (when not specified all records will
+  be printed). Use `-j/--json` to print records in JSON lines format.

 - **parquet-rowcount** for reporting the number of records in one or more Parquet files.
-`Usage: parquet-rowcount <file-paths>...`, where `<file-paths>...` is a space separated list of one or more
-files to read.
+  `Usage: parquet-rowcount <file-paths>...`, where `<file-paths>...` is a space separated list of one or more
+  files to read.

 If you see `Library not loaded` error, please make sure `LD_LIBRARY_PATH` is set properly: