Manifold Labs

Bittensor In Go

Posted 3/18/2024

Introduction

By rewriting our backend in Go, we were able to shave multiple seconds off of our request times on Sybil. Rewriting a dendrite without the Bittensor library is not easy, and at times felt impossible. It could be made much simpler with standardization, language-agnostic patterns, and clear documentation that uses easy-to-understand, non-esoteric naming conventions.

Bittensor is a powerful platform, but it limits its own growth by locking applications into a single stack. What we have done at Manifold Labs has, as far as we know, never been done before, and it is a big step for the community: it demystifies the internals of Bittensor and makes it possible for more people to build their own Bittensor interfaces in more and more languages.

What the hell is a synapse?

The terminology in the Bittensor community, to the dismay of anyone outside of it, mimics that of the brain. There are three main parts of this system that we will focus on: dendrites, axons, and synapses.

Dendrites are applications that send requests to axons, axons are applications that receive requests, and synapses are the requests themselves. In the case of our subnet (4), our dendrites (senders) send inference requests (synapses) to axons (the applications that run the inference, also called miners). The language used in the Bittensor community can seem very complex, but in reality these are simple, standard concepts.

For the rest of this article, I will use standard naming conventions rather than the needlessly confusing and complex ones.

What is the problem?

Bittensor itself is a Python library that makes it possible to send requests and to build applications that receive them. Our motivations for rewriting our client were:

  1. There is a lot of bloat in the Bittensor library for parts of the ecosystem that our search API does not use, including blocking calls to third-party APIs that we can skip entirely. This was the largest factor overall.
  2. Python as a language is very heavy. Since we self-host our search API we would like to keep this as small as possible to take advantage of as many resources as we can before we need to scale.
  3. Since we are a search platform, every millisecond counts. Moving to a compiled language will help decrease the latency when processing requests.
  4. As someone new to the ecosystem, this was also a good chance for me to better understand the internals of Bittensor and the ecosystem as a whole.

Problems in Bittensor

The most difficult part of this task was reverse engineering the mechanisms by which senders and receivers currently communicate. There is no standard, language-agnostic format for either side, which traps almost anyone who wants to build on Bittensor into using Python. Not only is there no standard, there is a complete disregard for anything that could be considered standard, safe, or reasonable.

Example 1: Nonce Implementation

Nonce is shorthand for "number only used once", and it is used to protect against replay attacks. Since receivers in the Bittensor network are decentralized, requiring domain names (needed for standard TLS certificates) for every receiver would be self-defeating. This forces us to depend on plain HTTP, which is prone to many more attacks than HTTPS. One of these is the replay attack: a malicious agent intercepts a message being sent to a miner and sends it again, "replaying" the request.

A nonce is only used once, so a second request with the same nonce must fail. To accomplish this, the server holds a dictionary mapping sender identifiers to the last nonce seen, and rejects any request whose nonce is not strictly greater than the previous one.

# bittensor/axon.py
endpoint_key = f"{synapse.dendrite.hotkey}:{synapse.dendrite.uuid}"

# Check the nonce from the endpoint key.
if (
    endpoint_key in self.nonces.keys()
    and self.nonces[endpoint_key] is not None
    and synapse.dendrite.nonce is not None
    and synapse.dendrite.nonce <= self.nonces[endpoint_key]
):
    raise Exception("Nonce is too small")

The problem here is that nonces are held only in memory. If the server restarts, no previous nonce is remembered, and a malicious user can freely replay a duplicate request.

To solve this, receivers should both keep the last nonce in memory and require nonces to be UNIX timestamps within a pre-determined delta of the current time. A delta of 4 seconds was chosen because miners generally take a few seconds to restart, and a request sent from a dendrite should be able to reach an axon and begin verification within 4 seconds, including network latency. This way, if an attacker replays a message after the receiver restarts, the replayed nonce's timestamp will be too far behind the current time and will be rejected.
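
A minimal sketch of this timestamp-based check in Go, assuming nonces are Unix timestamps in nanoseconds (the exact resolution is an assumption here; the important part is the delta check) and using only the standard library (time, errors):

const allowedDelta = 4 * time.Second

// lastNonce maps a sender identifier to the last nonce seen from that sender.
var lastNonce = map[string]uint64{}

func verifyNonce(endpointKey string, nonce uint64) error {
	// Reject anything older than the allowed delta. This holds even after a
	// restart, when the lastNonce map is empty.
	oldest := uint64(time.Now().Add(-allowedDelta).UnixNano())
	if nonce < oldest {
		return errors.New("nonce is too old")
	}
	// Reject replays of a nonce we have already seen from this sender.
	if prev, ok := lastNonce[endpointKey]; ok && nonce <= prev {
		return errors.New("nonce is too small")
	}
	lastNonce[endpointKey] = nonce
	return nil
}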

The PR for this fix can be found here.

Example 2: Signature

The message signature is used to verify that the sender has access to the private key associated with their account, proving that they are allowed to send the request.

# bittensor/dendrite.py
message = f"{synapse.dendrite.nonce}.{synapse.dendrite.hotkey}.{synapse.axon.hotkey}.{synapse.dendrite.uuid}.{synapse.body_hash}"
synapse.dendrite.signature = f"0x{self.keypair.sign(message).hex()}"
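
Re-creating that message on the Go side is straightforward once the field values are known. A minimal sketch, where the variable names stand in for the corresponding request fields and signMessage is the helper shown further down:

// Recreate the exact string the Python dendrite signs:
// "{nonce}.{sender hotkey}.{receiver hotkey}.{uuid}.{body hash}"
message := fmt.Sprintf("%d.%s.%s.%s.%s", nonce, senderHotkey, receiverHotkey, uuid, bodyHash)
signature := signMessage(message, publicKeyHex, privateKeyHex)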

Signing works well enough, but the problem arises when re-creating the body_hash outside of Python.

# bittensor/synapse.py
def body_hash(self) -> str:
    ...
    hashes = []
    # Getting the fields of the instance
    instance_fields = self.dict()

    for field, value in instance_fields.items():
        # If the field is required in the subclass schema, hash and add it.
        if field in self.required_hash_fields:
            hashes.append(bittensor.utils.hash(str(value)))

    # Hash and return the hashes that have been concatenated
    return bittensor.utils.hash("".join(hashes))

The str function in Python produces a representation that is specific to how Python prints its objects. In our case, we pass a list of sources (URLs) as part of the body hash, since these are part of what makes a request unique.

Strings in Python are represented with either single or double quotes. When building the string representation of a list of strings, Python uses the following pseudo-algorithm to decide which quote character to use:

surrounding = '
if input contains '
    surrounding = "
if input contains "
    surrounding = '

Python will also escape any instances of the surrounding character inside the string. Repeat this process for every string in the list, join the results with commas, surround the whole thing with square brackets, and you have the string representation of a list of strings. The problem is that recreating this process in a language other than Python is in no way obvious or intuitive.

The resulting Go code for converting a list of strings to its Python string representation:

func formatListToPythonString(list []string) string {
	strList := "["
	for i, element := range list {
		// Escape the string the way Go would, then strip Go's surrounding
		// double quotes so we are left with only the escaped contents.
		element = strconv.Quote(element)
		element = strings.TrimPrefix(element, "\"")
		element = strings.TrimSuffix(element, "\"")
		// Python defaults to single quotes; it only switches to double quotes
		// when the string contains a single quote but no double quote.
		separator := "'"
		if strings.ContainsRune(element, '\'') && !strings.ContainsRune(element, '"') {
			separator = "\""
		} else {
			// Otherwise Python escapes single quotes and leaves double quotes as-is.
			element = strings.ReplaceAll(element, "'", "\\'")
			element = strings.ReplaceAll(element, "\\\"", "\"")
		}
		if i != 0 {
			strList += ", "
		}
		strList += separator + element + separator
	}
	strList += "]"
	return strList
}
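
With the list formatting in place, the body hash itself can be re-created by hashing each required field's Python string representation and then hashing the concatenation, mirroring the Python snippet above. The sketch below assumes bittensor.utils.hash is a hex-encoded SHA3-256 digest (worth verifying against the library version you target) and that the required hash fields for our synapse are the sources list and the query string; it uses golang.org/x/crypto/sha3.

// sha3Hex stands in for bittensor.utils.hash: a hex-encoded SHA3-256 digest.
func sha3Hex(s string) string {
	sum := sha3.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

// bodyHash re-creates synapse.body_hash: hash each required field's Python
// string representation, concatenate the hashes, then hash the result.
// The choice of fields here is illustrative.
func bodyHash(sources []string, query string) string {
	hashes := []string{
		sha3Hex(formatListToPythonString(sources)),
		sha3Hex(query),
	}
	return sha3Hex(strings.Join(hashes, ""))
}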

The other hard part of recreating this signature in a different language is the actual signing algorithm, sr25519. In Bittensor the Python code is deceptively simple, as shown above. In reality, Bittensor signs the message through a custom-made Rust binding that has to be re-implemented in Go. There is no documentation on how this signing process works or how it is implemented in general; everything must be reverse-engineered.

func signMessage(message string, public string, private string) string {
	// Decode the hex-encoded public key into a fixed-size byte array.
	var pubk [32]byte
	data, err := hex.DecodeString(public)
	if err != nil {
		log.Fatalf("Failed to decode public key: %s", err)
	}
	copy(pubk[:], data)

	// Decode the hex-encoded private (secret) key the same way.
	var prik [32]byte
	data, err = hex.DecodeString(private)
	if err != nil {
		log.Fatalf("Failed to decode private key: %s", err)
	}
	copy(prik[:], data)

	// Build the sr25519 keys and sign the message under the "substrate"
	// signing context, matching what the Rust binding does.
	msg := []byte(message)
	priv := schnorrkel.SecretKey{}
	priv.Decode(prik)
	pub := schnorrkel.PublicKey{}
	pub.Decode(pubk)
	signingCtx := []byte("substrate")
	signingTranscript := schnorrkel.NewSigningContext(signingCtx, msg)
	sig, _ := priv.Sign(signingTranscript)
	sigEncode := sig.Encode()
	out := hex.EncodeToString(sigEncode[:])
	return "0x" + out
}

NOTE: The signature was one of the most challenging parts of reverse engineering a dendrite. From digging into the sr25519 Rust bindings to rebuilding the body_hash and the string representation of a list of strings, nothing is standard or documented.

Example 3: Dangerous Headers

When building a request, the sender self-reports the "size" of its headers and the total size of the request. Firstly, these values have very little correlation to the actual size of the headers: both calls return the in-memory size of the internal Python objects. Secondly, I will let the reader work out why letting a request self-report its own size is problematic.

# bittensor/synapse.py
headers["header_size"] = str(sys.getsizeof(headers))
headers["total_size"] = str(self.get_total_size())

Example 4: Unused / Unset Fields

Many fields in a request are left blank yet still required on the receiver side. In our Go implementation, for example, only three fields need to be filled for a request to work.

Axon: DendriteOrAxon{
    StatusCode:    nil,
    StatusMessage: nil,
    ProcessTime:   nil,
    Version:       nil,
    Nonce:         nil,
    Uuid:          nil,
    Signature:     nil,
    Ip:            miner.Ip,
    Port:          &port,
    Hotkey:        miner.Hotkey,
},

Building A Request

Once we figured out what needed to be in a request to a receiver, we needed to build the request itself. This is semi-trivial: patch the Bittensor library with print statements so that the headers and body of the request are printed right before the sender sends them out. Once the actual JSON and headers are visible, most of the fields are self-explanatory and trivial to implement, other than those outlined above.

Once you have the shape of the body and all of the data required to fill it, sending and streaming a request becomes fairly simple.
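
As a rough sketch, building and sending the request in Go looks something like the following. The bt_header_* names and the synapseName path are placeholders; the authoritative header names, URL path, and body shape come straight from the output of the patched dendrite.

// Marshal the JSON body captured from the print statements.
body, err := json.Marshal(requestBody)
if err != nil {
	log.Fatalf("Failed to marshal body: %s", err)
}

req, err := http.NewRequest("POST", fmt.Sprintf("http://%s:%d/%s", miner.Ip, port, synapseName), bytes.NewReader(body))
if err != nil {
	log.Fatalf("Failed to build request: %s", err)
}
req.Header.Set("Content-Type", "application/json")

// Fill in the fields the receiver actually verifies: hotkeys, nonce, uuid,
// signature, and the computed body hash. Header names are placeholders.
req.Header.Set("bt_header_dendrite_hotkey", senderHotkey)
req.Header.Set("bt_header_dendrite_nonce", fmt.Sprint(nonce))
req.Header.Set("bt_header_dendrite_uuid", uuid)
req.Header.Set("bt_header_dendrite_signature", signature)
req.Header.Set("bt_header_axon_hotkey", miner.Hotkey)
req.Header.Set("computed_body_hash", bodyHash(sources, query))

res, err := http.DefaultClient.Do(req)
if err != nil {
	log.Fatalf("Failed to send request: %s", err)
}
defer res.Body.Close()

The response body can then be streamed token by token: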

reader := bufio.NewReader(res.Body)
finished = false
for {
    // Read word by word
    token, err := reader.ReadString(' ')

    // Parse out end tokens
    if strings.Contains(token, "<s>") || strings.Contains(token, "</s>") || strings.Contains(token, "<im_end>") {
        finished = true
        token = strings.ReplaceAll(token, "<s>", "")
        token = strings.ReplaceAll(token, "</s>", "")
        token = strings.ReplaceAll(token, "<im_end>", "")
    }

    // Keep a full copy of the response
    ans += token

    // end early if we find an error
    if err != nil && err != io.EOF {
        break
    }

    // send the token back to the client
    sendToken(token)

    // If we finished reading, break
    if err == io.EOF {
        break
    }
}

Conclusion

Rewriting our backend in Go not only cut our request times but also uncovered opportunities to improve Bittensor's usability and scalability. By addressing these challenges and sharing our insights, we aim to foster a more accessible and inclusive ecosystem for Bittensor developers across many languages.

