High-level Golang x86 Assembly Generator
avo aims to make high-performance Go assembly easier to write, review and maintain. It's a Go package that presents a familiar assembly-like interface, together with features to simplify development without sacrificing performance:
- avo programs are Go programs: use control structures for assembly generation
- Register allocation: write your kernels with virtual registers and avo assigns physical registers for you
- Automatic parameter load/stores: ensure memory offsets are always correct even for complex data structures
- Generation of stub files to interface with your Go package
Inspired by the PeachPy and asmjit projects.
Note: APIs subject to change while avo is still in an experimental phase. You can use it to build real things but we suggest you pin a version with your package manager of choice.
Install
Install avo with go get:
$ go get -u github.com/mmcloughlin/avo
Quick Start
avo assembly generators are pure Go programs. Let's get started with a function that adds two uint64 values.
// +build ignore
package main
import (
. "github.com/mmcloughlin/avo/build"
)
func main() {
TEXT("Add", "func(x, y uint64) uint64")
Doc("Add adds x and y.")
x := Load(Param("x"), GP64())
y := Load(Param("y"), GP64())
ADDQ(x, y)
Store(y, ReturnIndex(0))
RET()
Generate()
}
You can go run this code to see the assembly output. To integrate this into the rest of your Go package we recommend a go:generate line to produce the assembly and the corresponding Go stub file.
//go:generate go run asm.go -out add.s -stubs stub.go
After running go generate the add.s file will contain the Go assembly.
// Code generated by command: go run asm.go -out add.s -stubs stub.go. DO NOT EDIT.
// func Add(x uint64, y uint64) uint64
TEXT ·Add(SB), $0-24
MOVQ x(FP), AX
MOVQ y+8(FP), CX
ADDQ AX, CX
MOVQ CX, ret+16(FP)
RET
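The `$0-24` frame size and the `+8` and `+16` offsets in this output follow from how the two arguments and the result are laid out in memory. As a rough check of that arithmetic, the sketch below models the argument frame as a Go struct. `addFrame` and `frameOffsets` are hypothetical names used only for illustration; the struct is an analogy for the stack-based layout seen by Go assembly functions, not something avo generates.

```go
package main

import (
	"fmt"
	"unsafe"
)

// addFrame is a hypothetical model of the argument frame for
// func Add(x, y uint64) uint64. It is an illustration only.
type addFrame struct {
	x   uint64 // first argument
	y   uint64 // second argument
	ret uint64 // return value
}

// frameOffsets returns the byte offsets of x, y and ret within the
// modelled frame, followed by the total frame size in bytes.
func frameOffsets() (x, y, ret, size uintptr) {
	var f addFrame
	return unsafe.Offsetof(f.x), unsafe.Offsetof(f.y), unsafe.Offsetof(f.ret), unsafe.Sizeof(f)
}

func main() {
	x, y, ret, size := frameOffsets()
	fmt.Println(x, y, ret, size) // prints "0 8 16 24"
}
```

The three offsets match `x(FP)`, `y+8(FP)` and `ret+16(FP)`, and the size matches the 24 in `$0-24`.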
The same call will produce the stub file stub.go which will enable the function to be called from your Go code.
// Code generated by command: go run asm.go -out add.s -stubs stub.go. DO NOT EDIT.
package add
// Add adds x and y.
func Add(x uint64, y uint64) uint64
See the examples/add directory for the complete working example.
Examples
See examples for the full suite of examples.
Slice Sum
Sum a slice of uint64s:
func main() {
TEXT("Sum", "func(xs []uint64) uint64")
Doc("Sum returns the sum of the elements in xs.")
ptr := Load(Param("xs").Base(), GP64())
n := Load(Param("xs").Len(), GP64())
s := GP64()
XORQ(s, s)
Label("loop")
CMPQ(n, Imm(0))
JE(LabelRef("done"))
ADDQ(Mem{Base: ptr}, s)
ADDQ(Imm(8), ptr)
DECQ(n)
JMP(LabelRef("loop"))
Label("done")
Store(s, ReturnIndex(0))
RET()
Generate()
}
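When testing a generated kernel like this, it can help to keep a pure-Go reference next to it. The sketch below mirrors the loop structure of the generator one instruction at a time. `sumRef` is a hypothetical name, and the index `i` stands in for the pointer that the assembly advances by 8 bytes per element.

```go
package main

import "fmt"

// sumRef is a hypothetical pure-Go reference for the generated Sum.
// Each statement is annotated with the instruction it models.
func sumRef(xs []uint64) uint64 {
	var s uint64 // XORQ(s, s): zero the accumulator
	i := 0       // models ptr, the address of the current element
	n := len(xs) // remaining element count
	for n != 0 { // CMPQ(n, Imm(0)); JE done
		s += xs[i] // ADDQ(Mem{Base: ptr}, s)
		i++        // ADDQ(Imm(8), ptr)
		n--        // DECQ(n)
	}
	return s
}

func main() {
	fmt.Println(sumRef([]uint64{1, 2, 3})) // prints 6
}
```

Like `ADDQ`, the Go addition wraps around on overflow, so the reference and the kernel should agree even for inputs whose sum exceeds 64 bits.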
Parameter Load/Store
avo provides deconstruction of complex data types into components. For example, load the length of a string argument with:
TEXT("StringLen", "func(s string) int")
strlen := Load(Param("s").Len(), GP64())
Index an array:
TEXT("ArrayThree", "func(a [7]uint64) uint64")
a3 := Load(Param("a").Index(3), GP64())
Access a struct field (provided you have loaded your package with the Package function):
TEXT("FieldFloat64", "func(s Struct) float64")
f64 := Load(Param("s").Field("Float64"), XMM())
Component accesses can be arbitrarily nested:
TEXT("FieldArrayTwoBTwo", "func(s Struct) byte")
b2 := Load(Param("s").Field("Array").Index(2).Field("B").Index(2), GP8())
Very similar techniques apply to writing return values. See examples/args and examples/returns for more.
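To see the kind of offset arithmetic a nested component access involves, the sketch below computes the byte offset of `s.Array[2].B[2]` by hand and cross-checks it against real addresses. The `Elem` and `Struct` definitions are hypothetical stand-ins, since the struct used in the snippets above is not shown here, and `nestedOffset` and `measuredOffset` are illustrative names rather than avo functions.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Elem and Struct are hypothetical types chosen so that the access
// s.Array[2].B[2] is valid. They are assumptions for this sketch.
type Elem struct {
	A uint32
	B [4]byte
}

type Struct struct {
	Float64 float64
	Array   [3]Elem
}

// nestedOffset computes the offset of s.Array[2].B[2] from the start
// of s by adding up one term per component access.
func nestedOffset() uintptr {
	var s Struct
	off := unsafe.Offsetof(s.Array)         // Field("Array")
	off += 2 * unsafe.Sizeof(s.Array[0])    // Index(2)
	off += unsafe.Offsetof(s.Array[0].B)    // Field("B")
	off += 2 * unsafe.Sizeof(s.Array[0].B[0]) // Index(2)
	return off
}

// measuredOffset derives the same offset from actual addresses.
func measuredOffset() uintptr {
	var s Struct
	return uintptr(unsafe.Pointer(&s.Array[2].B[2])) - uintptr(unsafe.Pointer(&s))
}

func main() {
	fmt.Println(nestedOffset(), measuredOffset()) // prints "30 30"
}
```

In a real function the offset of the parameter itself within the argument frame would be added on top of this value.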
SHA-1
SHA-1 is an excellent example of how powerful this kind of technique can be. The following is a (hopefully) clearly structured implementation of SHA-1 in avo, which ultimately generates a 1000+ line impenetrable assembly file.
func main() {
TEXT("block", "func(h *[5]uint32, m []byte)")
Doc("block SHA-1 hashes the 64-byte message m into the running state h.")
h := Mem{Base: Load(Param("h"), GP64())}
m := Mem{Base: Load(Param("m").Base(), GP64())}
// Store message values on the stack.
w := AllocLocal(64)
W := func(r int) Mem { return w.Offset((r % 16) * 4) }
// Load initial hash.
h0, h1, h2, h3, h4 := GP32(), GP32(), GP32(), GP32(), GP32()
MOVL(h.Offset(0), h0)
MOVL(h.Offset(4), h1)
MOVL(h.Offset(8), h2)
MOVL(h.Offset(12), h3)
MOVL(h.Offset(16), h4)
// Initialize registers.
a, b, c, d, e := GP32(), GP32(), GP32(), GP32(), GP32()
MOVL(h0, a)
MOVL(h1, b)
MOVL(h2, c)
MOVL(h3, d)
MOVL(h4, e)
// Generate round updates.
quarter := []struct {
F func(Register, Register, Register) Register
K uint32
}{
{choose, 0x5a827999},
{xor, 0x6ed9eba1},
{majority, 0x8f1bbcdc},
{xor, 0xca62c1d6},
}
for r := 0; r < 80; r++ {
q := quarter[r/20]
// Load message value.
u := GP32()
if r < 16 {
MOVL(m.Offset(4*r), u)
BSWAPL(u)
} else {
MOVL(W(r-3), u)
XORL(W(r-8), u)
XORL(W(r-14), u)
XORL(W(r-16), u)
ROLL(U8(1), u)
}
MOVL(u, W(r))
// Compute the next state register.
t := GP32()
MOVL(a, t)
ROLL(U8(5), t)
ADDL(q.F(b, c, d), t)
ADDL(e, t)
ADDL(U32(q.K), t)
ADDL(u, t)
// Update registers.
ROLL(Imm(30), b)
a, b, c, d, e = t, a, b, c, d
}
// Final add.
ADDL(a, h0)
ADDL(b, h1)
ADDL(c, h2)
ADDL(d, h3)
ADDL(e, h4)
// Store results back.
MOVL(h0, h.Offset(0))
MOVL(h1, h.Offset(4))
MOVL(h2, h.Offset(8))
MOVL(h3, h.Offset(12))
MOVL(h4, h.Offset(16))
RET()
Generate()
}
This relies on bitwise functions that are defined as subroutines. For example, here is bitwise choose; the others are similar.
func choose(b, c, d Register) Register {
r := GP32()
MOVL(d, r)
XORL(c, r)
ANDL(b, r)
XORL(d, r)
return r
}
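The XOR/AND/XOR sequence in `choose` is a common shortcut for the bit-select function `(b AND c) OR (NOT b AND d)`. The sketch below models both forms in plain Go to show that they agree. `chooseXor` and `chooseDef` are hypothetical names for this illustration.

```go
package main

import "fmt"

// chooseXor models the instruction sequence used by choose.
func chooseXor(b, c, d uint32) uint32 {
	r := d // MOVL(d, r)
	r ^= c // XORL(c, r)
	r &= b // ANDL(b, r)
	r ^= d // XORL(d, r)
	return r
}

// chooseDef is the direct definition: each bit of b selects the
// corresponding bit of c (if set) or d (if clear).
func chooseDef(b, c, d uint32) uint32 {
	return (b & c) | (^b & d)
}

func main() {
	b, c, d := uint32(0xefcdab89), uint32(0x98badcfe), uint32(0x10325476)
	fmt.Println(chooseXor(b, c, d) == chooseDef(b, c, d)) // prints true
}
```

Where a bit of `b` is set, `(c ^ d) ^ d` leaves `c`; where it is clear, the AND zeroes the term and the final XOR leaves `d`. The XOR form needs only one temporary register.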
See the complete code at examples/sha1.
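For testing, or just for following the round structure, a pure-Go model of the same computation can be useful. The sketch below follows the generator's shape: a 16-word circular message schedule in place of the 64-byte stack buffer, and the same table of round functions and constants. `blockRef` and `abcBlock` are hypothetical names, and this is an illustrative reference, not the code in examples/sha1.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// blockRef is a hypothetical pure-Go model of the generated block
// function: it hashes one 64-byte block m into the state h.
func blockRef(h *[5]uint32, m []byte) {
	var w [16]uint32 // circular schedule, like the AllocLocal(64) buffer

	quarter := []struct {
		F func(b, c, d uint32) uint32
		K uint32
	}{
		{func(b, c, d uint32) uint32 { return ((c ^ d) & b) ^ d }, 0x5a827999},
		{func(b, c, d uint32) uint32 { return b ^ c ^ d }, 0x6ed9eba1},
		{func(b, c, d uint32) uint32 { return (b & c) | (b & d) | (c & d) }, 0x8f1bbcdc},
		{func(b, c, d uint32) uint32 { return b ^ c ^ d }, 0xca62c1d6},
	}

	a, b, c, d, e := h[0], h[1], h[2], h[3], h[4]
	for r := 0; r < 80; r++ {
		q := quarter[r/20]

		// Load or derive the message word for this round.
		var u uint32
		if r < 16 {
			u = binary.BigEndian.Uint32(m[4*r:]) // MOVL then BSWAPL
		} else {
			u = w[(r-3)%16] ^ w[(r-8)%16] ^ w[(r-14)%16] ^ w[(r-16)%16]
			u = bits.RotateLeft32(u, 1)
		}
		w[r%16] = u // reuses the slot that held word r-16

		// Compute the next state register and rotate the others.
		t := bits.RotateLeft32(a, 5) + q.F(b, c, d) + e + q.K + u
		a, b, c, d, e = t, a, bits.RotateLeft32(b, 30), c, d
	}

	h[0] += a
	h[1] += b
	h[2] += c
	h[3] += d
	h[4] += e
}

// abcBlock returns the single padded block for the message "abc".
func abcBlock() []byte {
	m := make([]byte, 64)
	copy(m, "abc")
	m[3] = 0x80                            // padding marker bit
	binary.BigEndian.PutUint64(m[56:], 24) // message length in bits
	return m
}

func main() {
	h := [5]uint32{0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0}
	blockRef(&h, abcBlock())
	fmt.Printf("%08x%08x%08x%08x%08x\n", h[0], h[1], h[2], h[3], h[4])
}
```

Word `r` only ever depends on words `r-3` through `r-16`, which is why indexing the buffer by `r % 16` is enough: each new word overwrites the oldest one still in the buffer.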
Real Examples
- fnv1a: FNV-1a hash function.
- dot: Vector dot product.
- geohash: Integer geohash encoding.
- stadtx: StadtX hash port from dgryski/go-stadtx.
Contributing
Contributions to avo are welcome:
- Feedback from using avo in a real project is incredibly valuable.
- Submit bug reports to the issues page.
- Pull requests accepted. Take a look at outstanding issues for ideas (especially the "good first issue" label).
License
avo is available under the BSD 3-Clause License.