Supporting extra instructions not included in the Opcodes database is
currently a challenge. Short of migrating to an entirely different source
(such as #23), the options are either to patch the XML data file or to append
additional instructions at the loading phase.
An example of patching the XML was shown in the as-yet unlanded PR #234. This
shows the XML patching approach is unwieldy and requires more information than
we actually need (for example instruction form encodings).
In #335 we discussed the alternative of adding extra instructions during
loading. This has the advantage of using avo's simpler internal data
structure.
This PR prepares for using that approach by adding an `internal/opcodesextra`
package, intended to contain manually curated lists of extra instructions to
add to the instruction database during loading. At the moment, the only
instruction added here is the `MOVLQZX` instruction that's already handled
this way.
Updates #335#234#23
Extends avo to support most AVX-512 instruction sets.
The instruction type is extended to support suffixes. The K family of opmask
registers is added to the register package, and the operand package is updated
to support the new operand types. Move instruction deduction in `Load` and
`Store` is extended to support KMOV* and VMOV* forms.
Internal code generation packages were overhauled. Instruction database loading
required various messy changes to account for the additional complexities of the
AVX-512 instruction sets. The internal/api package was added to introduce a
separation between instruction forms in the database, and the functions avo
provides to create them. This was required since with instruction suffixes there
is no longer a one-to-one mapping between instruction constructors and opcodes.
AVX-512 bloated generated source code size substantially, initially increasing
compilation and CI test times to an unacceptable level. Two changes were made to
address this:
1. Instruction constructors in the `x86` package moved to an optab-based
approach. This compiles substantially faster than the verbose code
generation we had before.
2. The most verbose code-generated tests are moved under build tags and
limited to a stress test mode. Stress test builds are run on
schedule but not in regular CI.
An example of AVX-512 accelerated 16-lane MD5 is provided to demonstrate and
test the new functionality.
Updates #20#163#229
Co-authored-by: Vaughn Iverson <vsivsi@yahoo.com>
Previously aliases were stored in a map which was causing
non-deterministic code generation (see recent build failures). This diff
changes to a slice to avoid this problem.
Updates #50