Extends avo to support most AVX-512 instruction sets.
The instruction type is extended to support suffixes. The K family of opmask
registers is added to the register package, and the operand package is updated
to support the new operand types. Move instruction deduction in `Load` and
`Store` is extended to support KMOV* and VMOV* forms.
Internal code generation packages were overhauled. Instruction database loading
required various messy changes to account for the additional complexities of the
AVX-512 instruction sets. The internal/api package was added to introduce a
separation between instruction forms in the database, and the functions avo
provides to create them. This was required since with instruction suffixes there
is no longer a one-to-one mapping between instruction constructors and opcodes.
AVX-512 bloated generated source code size substantially, initially increasing
compilation and CI test times to an unacceptable level. Two changes were made to
address this:
1. Instruction constructors in the `x86` package moved to an optab-based
approach. This compiles substantially faster than the verbose code
generation we had before.
2. The most verbose code-generated tests are moved under build tags and
limited to a stress test mode. Stress test builds are run on
schedule but not in regular CI.
An example of AVX-512 accelerated 16-lane MD5 is provided to demonstrate and
test the new functionality.
Updates #20#163#229
Co-authored-by: Vaughn Iverson <vsivsi@yahoo.com>
Adds a test for function signature memory layout by generating functions with
random signatures. This confirms that the size and offsets computed by
`gotypes` agree with `asmdecl`.
Updates #191#195
This fixes a bug in argument size calculation in the case where the function
has no return values. Previously it was padding the argument struct to max
alignment, but this only happens if there are return values following.
Updates #191
As part of fixing failing third-party tests, this PR significantly
rearchitects their specification and execution.
Third-party tests are now specified in a much more flexible format allowing
more customization on a per-package level. In addition, third-party tests are
now used to auto-generate a Github Actions workflow to perform the tests in
parallel. This not only gives faster feedback on PRs, but will also allow us
to more quickly narrow down on which packages are failing. An additional
workflow also confirms that local execution of third-party tests is consistent
with the Github Actions version. This workflow only runs when tests/thirdparty
itself is changed.
Update github.com/klauspost/compress version.
Add github.com/klauspost/reedsolomon and github.com/minio/md5-simd.
Add `-mod=mod` to commands due to golang/go#44129.
Currently `avo` uses `BP` as a standard general-purpose register. However, `BP` is used for the frame pointer and should be callee-save. Under some circumstances, the Go assembler will do this automatically, but not always. At the moment `avo` can produce code that clobbers the `BP` register. Since Go 1.16 this code will also fail a new `go vet` check.
This PR provides a (currently sub-optimal) fix for the issue. It introduces an `EnsureBasePointerCalleeSaved` pass which will check if the base pointer is written to by a function, and if so will artificially ensure that the function has a non-zero frame size. This will trigger the Go assembler to automatically save and restore the BP register.
In addition, we update the `asmdecl` tool to `asmvet`, which includes the `framepointer` vet check.
Updates #156
Adds @klauspost's S2 implementation to the third party test suite.
The full klauspost/compress tests are slow but we only care about the S2 sub-package. Therefore this PR also adds the option to only run a subset of the package tests, controlled by a "test" parameter in the JSON configuration.
Closes#130
Adds a regression test based on klauspost/compress#186. This necessitated some related changes:
* Mark "RET" as a terminal instruction
* printer refactor to maintain compatibility with asmfmt
* Tweaks to other regression tests to ensure they are run correctly in CI
Updates #100#65#8
Issue #100 demonstrated that register allocation for aliased registers is
fundamentally broken. The root of the issue is that currently accesses to the
same virtual register with different masks are treated as different registers.
This PR takes a different approach:
* Liveness analysis is masked: we now properly consider which parts of a register are live
* Register allocation produces a mapping from virtual to physical ID, and aliasing is applied later
In addition, a new pass ZeroExtend32BitOutputs accounts for the fact that 32-bit writes in 64-bit mode should actually be treated as 64-bit writes (the result is zero-extended).
Closes#100
Go added the TOPFRAME attribute in https://golang.org/cl/169726/. This diff adds the new attribute to avo, and also updates handling of the REFLECTMETHOD attribute.
Adds support for a `CancellingInputs` instruction flag, to indicate cases like `XORQ R10, R10` where the instruction actually does not depend on the value of `R10` at all.
Closes#89
In jump-table-like constructs, the natural way of writing the code can sometimes produce redundant jumps or labels. Therefore some basic cleanup steps have been proposed. This diff adds two transforms:
1. Remove unconditional jumps to a label immediately following.
2. Remove labels with no references at all.
Fixes#75
In some cases natural use of abstraction in avo programs can lead to redundant move instructions, specifically self-moves such as MOVQ AX, AX. This does not produce incorrect code but it is incorrect and inelegant.
This diff introduces a PruneSelfMoves pass that removes such unnecessary instructions.
Closes#76
The Go assembler merges MOVD/MOVQ instruction forms. The logic in the
avo instruction loader was discarding the MOVD forms. This diff should
merge them correctly.
Updates #50
Run asmfmt suring linting and confirm git repository isn't dirty.
This introduces a developer tools dependency on asmfmt, but not a
runtime dependency.
Updates #8