How to shard your TON smart contract and why — studying the anatomy of TON's Jettons
by Tal Kol
Aug 16, 2022
TON's ultimate mission is to bring blockchain to mass adoption. How many people in the world have actually used blockchain so far? Ethereum's statistics mention several millions, so taking a figure of 50 million global users for blockchain so far is probably being generous. How do we increase this number to 1 billion?
The current version of Ethereum processes about 1 million transactions per day at peak capacity. Back in 2016, Telegram, a messaging app with mass adoption nearing the numbers we're aiming for, was delivering 15 billion messages per day. This massive amount of data drove the architectural choices behind TON blockchain. A 10,000x increase in scalability is not normally achieved by mere protocol improvements; this feat requires a radical change in approach.
The concept of sharding
Sharding is a mature concept originating in database design. It involves splitting and distributing one logical data set across multiple databases that share nothing and can be deployed across multiple servers. Simply put, sharding allows *horizontal scalability* - splitting data to distinct, independent pieces that can be processed in parallel. This is a key concept in the world's transition from data to big data. When data sets are becoming too big to be handled by traditional means, there's no other way to scale other than to break them down into smaller pieces.

TON is not the first to apply sharding to blockchain. Ethereum 2.0 supports a fixed number of 64 shards. TON's approach is radical not because the number of shards is larger, but because of two unique conceptual changes:

  • The number of shards is not fixed - TON supports adding more and more shards as needed with an upper bound of 2^60 (per workchain). This number is practically limitless, enough for every person in the world to be allotted 100 million shards, and still have spare.
  • The number of shards is elastic - TON supports automatic splitting of shardchains in two when the load is high, and then merging them back together when the load is low. This is the only way to deal with dynamic scaling requirements that are impossible to predict in advance.
You can read more about these novel ideas in the TON whitepaper.
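
As a quick sanity check of the "100 million shards per person" figure, here is a back-of-the-envelope calculation. The world population of 8 billion is a rounded assumption of mine, not a number taken from the whitepaper.

```python
# Upper bound on the number of shards per workchain, as quoted above.
MAX_SHARDS = 2 ** 60

# Assumption: a rounded world population of 8 billion people.
WORLD_POPULATION = 8_000_000_000
SHARDS_PER_PERSON = 100_000_000

allotted = WORLD_POPULATION * SHARDS_PER_PERSON
spare = MAX_SHARDS - allotted

print(f"max shards:      {MAX_SHARDS:,}")
print(f"allotted shards: {allotted:,}")
print(f"spare shards:    {spare:,}")
```

Under that assumption, roughly 8 x 10^17 shards are handed out and about 3.5 x 10^17 remain unused.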

Trying to change the world in a fundamental way rarely comes without a price. To make use of this radical approach, TON smart contract developers must design their contracts differently. If you have prior smart contract experience, for example in Solidity, TON's architecture will feel alien. I recommend reading a previous post of mine, Six unique aspects of TON Blockchain that will surprise Solidity developers, to ease the transition.
Sharding TON smart contracts
The basic atomic unit in TON blockchain is a smart contract instance. A smart contract instance has an address, code and data cells (persistent state). We refer to this unit as atomic because a smart contract always has atomic synchronous access to all of its persistent state.

Communication between smart contract instances on TON is neither atomic nor synchronous. Think of smart contracts on TON like microservices. Each microservice has atomic synchronous access only to its local data. Communication between two microservices involves sending asynchronous messages over the network.
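
To make the microservice analogy concrete, here is a minimal Python sketch. The `Contract` and `Network` classes are hypothetical stand-ins invented for illustration and do not correspond to any TON API. The point is only that a handler reads and writes local state synchronously, while sending a message merely enqueues it for later delivery.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Contract:
    """Hypothetical stand-in for a contract instance: an address plus local state."""
    address: str
    counter: int = 0  # local persistent state, accessed synchronously

    def receive(self, network: "Network", sender: str, body: dict) -> None:
        # Everything inside one handler is atomic with respect to local state.
        self.counter += body["increment"]


@dataclass
class Network:
    """Hypothetical message bus: delivery happens later, not at send time."""
    contracts: dict = field(default_factory=dict)
    queue: deque = field(default_factory=deque)

    def deploy(self, contract: Contract) -> None:
        self.contracts[contract.address] = contract

    def send(self, sender: str, dest: str, body: dict) -> None:
        self.queue.append((sender, dest, body))  # fire and forget

    def deliver_next(self) -> None:
        sender, dest, body = self.queue.popleft()
        self.contracts[dest].receive(self, sender, body)


network = Network()
a, b = Contract("A"), Contract("B")
network.deploy(a)
network.deploy(b)

network.send("A", "B", {"increment": 5})
before = b.counter  # the message is still in flight, so B is unchanged
network.deliver_next()
after = b.counter
print(before, after)
```

Between `send` and `deliver_next`, contract A has no way to observe or rely on B's state, which is exactly the constraint TON developers have to design around.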

As every system architect knows, larger systems require a shift from monolithic architectures to microservices. This distributed approach takes some effort to adopt, but unlocks several desirable benefits. The modern systems paradigm relies on an orchestrator like Kubernetes to take a group of containerized microservices and launch new instances automatically on demand (autoscale) as well as partition them efficiently across machines.

I like the Kubernetes analogy because this is exactly what TON does. As load on a specific shardchain increases, it will be split in two. Since smart contract instances are atomic, they're never broken in half. This means that some smart contract instances that once lived on the same shardchain may one day find themselves residing on different ones!
To sum it up, TON's virtual machine (TVM) is applying the concept of distributed microservices to Ethereum EVM's monolith.
Designing smart contracts for sharding
A common question by novice systems architects is "how big should my microservices be?" - or in other words, "when is a microservice too monolithic and should be split in two?"

There is no one answer to this question; it's an art. The idea is to assist Kubernetes in doing its thing. The smaller the microservices are, the easier it is for Kubernetes to optimize the system by creating new instances and moving them around on demand. But the smaller they are, the harder it is for the developer to implement complex flows, as more actions become asynchronous.

I've discovered that the same reasoning works for TON contract sharding. The idea is to let TON auto-sharding do its thing - split state data into multiple smart contract instances so that when load increases, they can be broken down into smaller pieces and moved to different shardchains efficiently. But, if you shard too aggressively, you will have to deal with too much complexity due to increased asynchronicity.
A practical example - TON's Jetton contract
This post has been quite theoretical so far. I want to take a sharp turn to practicalities. Let's analyze a real world example to understand this architecture. The example we'll use is TON's Jetton smart contract. A Jetton is a smart contract implementing a fungible token (very similar to TON coin itself). This is TON's version of Ethereum's popular ERC20 token standard.

Implementing a token is pretty simple. We will need one basic action - transfer - which allows an owner to transfer some token amount to a different owner. We will also need a *mint* action - the ability to add new tokens to circulation, and its opposite *burn* - which removes tokens from circulation. What about persistent state? We also need to store the balances of all users. On Ethereum, this would normally require a map where the key is a user's wallet address and the value is the balance amount.

As the architects of this smart contract on TON, we have to decide if and how this smart contract needs to be broken down to multiple smaller instances to support auto-sharding effectively. What would happen if our Jetton has 1 billion users? Will our architecture hold up in this case?
Distributing Jetton to multiple smart contracts
Let's try to apply the reasoning outlined above to find the "correct" amount of sharding for Jetton. I realize that this is a little too theoretical. Luckily, there's a very practical test that I've found to work quite well:
If you ever find yourself designing a smart contract with an unbounded data structure, there's a good chance you're supposed to break this contract into multiple instances.
Unbounded data structures are arrays or maps that can grow indefinitely. Under Ethereum, our smart contract would require a map that holds all user balances. This map can grow indefinitely because the number of holders of our token is unbounded. New accounts can be created practically indefinitely and because the numerical precision is so high, it is possible to transfer minuscule amounts of the token to all of these accounts.

Let's apply our practical rule. If we were to hold all balances in a single smart contract on TON, we would have an unbounded data structure. This means we have an excellent candidate for sharding!

So how do we shard? This is quite straightforward. If we don't want all balances to be listed in a single smart contract instance, what if we split the list so that every balance is held in its own dedicated smart contract instance?
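
The difference between the two layouts can be sketched in a few lines of Python. The names `monolith_balances` and `BalanceContract` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass

# Monolithic layout (Ethereum style): one contract holding one unbounded map.
monolith_balances = {"alison": 100, "becky": 50}  # grows with every new holder


@dataclass
class BalanceContract:
    """Sharded layout: one small instance per holder, with bounded state."""
    owner: str
    balance: int = 0


# Splitting the map: every entry becomes its own contract instance.
instances = {
    owner: BalanceContract(owner, amount)
    for owner, amount in monolith_balances.items()
}
print(instances)
```

Each `BalanceContract` has a fixed-size state no matter how many holders exist, so any instance can be moved to a different shardchain independently of the others.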
The Jetton architecture
Let's assume that our Jetton instance is for a token called Shiba-Inu or SHIB for short. We have two users who are holding some SHIB - Alison and Becky. We already said that the balance of each user is held in its own contract instance, which means we have 2 instances (the "children"). It turns out that we also want another instance to hold global shared information about SHIB (the "parent").

This brings us to the following architecture:
I promised to be practical. Let's start reading the actual Jetton code! The TON core team has an official implementation of the Jetton standard which you can find here. Open it up so you can get familiar with the code.

You can see in the code two main FunC smart contracts:

  • jetton-minter.fc - This is the parent, which holds global shared information about the token, such as its name and symbol. There's just a single instance of the parent. I'm not entirely sure why the core team chose the name jetton-minter; I would have preferred the name jetton-parent. It is true that this contract is in charge of minting, but even if minting is disabled, you still need it, which is somewhat confusing.
  • jetton-wallet.fc - This is the child, which holds the token balance for a single user. There are multiple instances of this contract, one for each user address. The core team chose the name jetton-wallet for this contract; I would have preferred the name jetton-child.

If our token is held by 1,000,000 different users, there will be exactly 1,000,001 contract instances deployed. This is where the magic of auto-sharding happens. By default, all contract instances will be found on a single shardchain. But, if these users start issuing a large number of transactions and this single shardchain is under high load, TON will automatically split it into smaller shardchains. Theoretically, the system can keep splitting and splitting until every contract instance is found on a dedicated shard. That's the secret that enables TON to scale to billions of users.
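
Here is a simplified sketch of how contract instances could end up spread across shards as splitting progresses. It assumes that a shard is identified by a binary prefix of the account id and that every split adds one more bit to that prefix. The account ids are faked with SHA-256 of a made-up name, and all shards are split uniformly to the same depth, whereas the real network splits individual shardchains based on their load.

```python
import hashlib
from collections import defaultdict


def account_id(name: str) -> int:
    """Hypothetical 256-bit account id derived from a made-up holder name."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")


def shard_of(acc: int, depth: int) -> str:
    """Return the shard prefix: the top `depth` bits of the account id."""
    if depth == 0:
        return ""  # a single shard covers every account
    return format(acc >> (256 - depth), f"0{depth}b")


def partition(accounts, depth):
    """Group accounts by the shard they fall into at the given split depth."""
    shards = defaultdict(list)
    for acc in accounts:
        shards[shard_of(acc, depth)].append(acc)
    return dict(shards)


accounts = [account_id(f"holder-{i}") for i in range(1000)]
for depth in (0, 1, 2, 3):
    shards = partition(accounts, depth)
    sizes = {prefix: len(members) for prefix, members in sorted(shards.items())}
    print(depth, sizes)
```

Because an account's prefix at depth d+1 always extends its prefix at depth d, a split never needs to reshuffle contracts between unrelated shards; each shard simply divides its own members in two.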
Various Jetton user stories
Now that we understand the basic architecture, let's look at several different scenarios. For example, let's explore what happens when one user transfers tokens to another.

Under TON, the entities participating are always smart contract instances. Which contracts will play a part?
You've already met the first three. The source code for these is found in the Jetton repo. What about the three contracts on the right? Our user stories will involve three different users. Alison and Becky are holders of SHIB. The user Admin is the creator that deployed SHIB. Admin has a special role because it is the only user that can mint new SHIB into circulation (that's how new SHIB tokens are born). This is a trusted role that should normally be revoked (changed to the zero address) once the token starts trading in order to keep the total possible supply capped.

Users on TON are also represented by smart contracts. These are wallet smart contracts that are normally deployed for users by wallet apps such as TonKeeper. If you're not familiar with how wallet contracts work on TON, please read my previous post How TON wallets work and how to access them from JavaScript. Alison, Becky and Admin each hold their TON coin balance in these wallets. These wallets are not specifically related to the Jetton code. Here is an example implementation for such a wallet contract from the core TON repo.
User story 1: Alison has SHIB and sends some to Becky
Our user stories will always start with one of our users (Alison in this case) that decides to perform some action with the SHIB Jetton. In this case, Alison has decided to send some tokens to Becky. Alison will open her wallet app of choice (TonKeeper for example) and approve the action. Once this happens, the wallet app will send a signed transaction to Alison's wallet contract.

The transaction contains a message intended for some destination contract. Messages are how smart contracts communicate on TON. Messages are encoded as a bag of cells, which is in essence a packed binary format. One of the key fields of the message is a 32-bit integer called op which describes the operation type of this message.

  1. In our example, since Alison wants to send some tokens, she sends a message with op type transfer to the smart contract instance holding her SHIB balance. This message is encoded on the transaction she sends to her wallet contract. Once her wallet contract verifies the signature on the transaction [code], it will forward Alison's message to the destination she requested [code].
  2. Once the transfer message reaches its destination [code], the contract holding Alison's SHIB balance, this contract will process the message and alter its persistent state (reduce Alison's SHIB balance by the sent amount [code]). If the contract needs to contact other contracts, it may send additional messages. In our case, the contract will send a message with op type internal transfer to the contract holding Becky's SHIB balance [code].
  3. Once the internal transfer message reaches its destination [code], this contract will now process the message and alter its persistent state (increase Becky's SHIB balance by the sent amount [code]). This contract will normally send one last message with op type excesses back to refund any remaining gas back to Alison's wallet contract and let it know the transfer is complete [code].
This is the flow of messages:
Messages on TON are asynchronous. We don't know exactly when they will be handled. There's a chance all messages will be handled in a single block and there's a chance each message will be handled on a different block. This means that the transfer may take some time to process. Even if the first transaction has been successfully confirmed, the transfer may still fail.
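
The three steps of the transfer can be simulated with a message queue. This is a toy Python model under stated assumptions, not the FunC implementation: `JettonWallet`, `UserWallet`, and the string addresses are hypothetical, signatures and gas are omitted, and the sender passes the recipient's child address directly instead of deriving it.

```python
from collections import deque

queue = deque()  # in-flight messages: (sender, destination, op, payload)
handled = []     # ops in the order they were processed


class JettonWallet:
    """Hypothetical child contract holding one owner's SHIB balance."""

    def __init__(self, address, owner, balance=0):
        self.address = address
        self.owner = owner
        self.balance = balance

    def receive(self, sender, op, payload):
        if op == "transfer":
            if sender != self.owner or payload["amount"] > self.balance:
                raise ValueError("transfer rejected")
            self.balance -= payload["amount"]
            # Simplification: the caller passes the recipient's child address
            # directly; the real contract derives it from the owner address.
            queue.append((self.address, payload["to_child"], "internal_transfer",
                          {"amount": payload["amount"], "response": payload["response"]}))
        elif op == "internal_transfer":
            self.balance += payload["amount"]
            queue.append((self.address, payload["response"], "excesses", {}))


class UserWallet:
    """Hypothetical user wallet contract; it only receives excesses here."""

    def __init__(self, address):
        self.address = address

    def receive(self, sender, op, payload):
        pass  # nothing to update in this sketch


contracts = {
    "alison": UserWallet("alison"),
    "alison-shib": JettonWallet("alison-shib", owner="alison", balance=100),
    "becky-shib": JettonWallet("becky-shib", owner="becky"),
}

# Step 1: Alison's wallet contract forwards her transfer message.
queue.append(("alison", "alison-shib", "transfer",
              {"amount": 30, "to_child": "becky-shib", "response": "alison"}))

# Each loop iteration could land in a different block on a real network.
while queue:
    sender, dest, op, payload = queue.popleft()
    contracts[dest].receive(sender, op, payload)
    handled.append(op)

print(handled)
print(contracts["alison-shib"].balance, contracts["becky-shib"].balance)
```

Note that after the first iteration Alison's balance is already reduced while Becky's is still unchanged, which is the intermediate state that makes asynchronous flows harder to reason about.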
User story 2: Alison has SHIB and sends some to Becky AND notifies Becky about it
What if the SHIB recipient Becky is more than just a person? Suppose it's an online store contract that should do something when it gets paid, for example change a DNS record to point to a new owner. It would be nice if we could trigger this smart contract with a dedicated message.

Fortunately, the transfer message supports this behavior. It allows the original sender to specify some notification payload that will be forwarded to the recipient SHIB wallet owner.

  1. The flow in this case is nearly identical, except in the last step. Before sending the message with the op type excesses, the contract holding Becky's SHIB balance will first send a message of op type transfer notification to the owner of Becky's SHIB wallet - Becky's wallet contract [code]. This story would make more sense if you rename "Becky" to an online store like "DNS-Superstore". In that case, the contract "DNS-Superstore" will receive this notification because it is the owner of the SHIB wallet for "DNS-Superstore". This contract, upon receipt of the message, will implement the behavior for changing DNS record ownership according to data provided in the message.
This is the flow of messages:
How can you know what other features are supported by the transfer message? Messages are normally encoded in a language called TL-B. As best practice, the creator of the contract should normally publish the TL-B specification for all messages their contract handles. Here is the relevant TL-B spec [code]:
transfer query_id:uint64 amount:(VarUInteger 16) destination:MsgAddress
           response_destination:MsgAddress custom_payload:(Maybe ^Cell)
           forward_ton_amount:(VarUInteger 16) forward_payload:(Either Cell ^Cell)
           = InternalMsgBody;
  • amount is the number of SHIB tokens to transfer
  • destination is Becky's wallet contract address
  • response_destination is the address for the recipient of excesses (normally Alison's wallet contract)
  • forward_payload is the notification payload for the "DNS-Superstore" use-case
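
To give a feel for what a TL-B field type means at the bit level, here is a small Python sketch of how `VarUInteger 16` (the type of `amount` and `forward_ton_amount`) is laid out, as I understand the TL-B definition: a 4-bit length giving the number of bytes, followed by that many bytes holding the value in big-endian order. The function name `encode_var_uint16` is my own, and the result is returned as a string of `0` and `1` characters purely for readability.

```python
def encode_var_uint16(value: int) -> str:
    """Encode `value` as VarUInteger 16, returned as a string of bits.

    Layout: a 4-bit length (number of bytes, 0..15) followed by that many
    big-endian bytes holding the value.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    byte_len = (value.bit_length() + 7) // 8
    if byte_len > 15:
        raise ValueError("value does not fit in VarUInteger 16")
    length_bits = format(byte_len, "04b")
    value_bits = format(value, f"0{byte_len * 8}b") if byte_len else ""
    return length_bits + value_bits


for amount in (0, 1, 255, 256):
    print(amount, encode_var_uint16(amount))
```

Small amounts take very few bits (zero needs only the 4-bit length), which is why this variable-length type is used for token and coin amounts instead of a fixed-width integer.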
User story 3: Alison has some SHIB and burns it
In this user story, Alison decides to burn some of the SHIB she has. Burning SHIB will remove it from circulation and reduce an interesting metric of a token called total supply. Users care about the total supply of the token because that helps to calculate the token's market cap.

Where is the total supply stored? As you've probably guessed, since this persistent state data is globally shared, it would make sense to store it under our parent jetton-minter.

  1. To initiate the burn, Alison sends a message with op type burn to the smart contract instance holding her SHIB balance. This message is encoded on the transaction she sends to her wallet contract like before, which forwards it to its destination after verifying the signature.
  2. Once the burn message reaches its destination [code], the contract holding Alison's SHIB balance, this contract will process the message and alter its persistent state (reduce Alison's SHIB balance by the burned amount [code]). The contract will then send a message with op type burn notification to the parent minter contract [code].
  3. Once the burn notification message reaches its destination [code], this contract will now process the message and alter its persistent state (reduce the total supply by the burned amount [code]). This contract will normally send one last message with op type excesses back to refund any remaining gas back to Alison's wallet contract and let it know the burn is complete [code].

This is the flow of messages:
The parent minter contract allows users to query the total supply of the token using a Getter method [code].
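
The burn flow and the total supply getter can be modeled the same way. This is a toy Python sketch with hypothetical class names (`Minter`, `Child`, `Sink`); it skips signature checks, gas, and the parent's verification that the notification comes from a genuine child.

```python
from collections import deque

queue = deque()  # in-flight messages: (sender, destination, op, payload)


class Minter:
    """Hypothetical parent contract tracking the global total supply."""

    def __init__(self, total_supply):
        self.total_supply = total_supply

    def receive(self, sender, op, payload):
        if op == "burn_notification":
            # The real contract also verifies that sender is a genuine child.
            self.total_supply -= payload["amount"]
            queue.append(("minter", payload["response"], "excesses", {}))

    def get_total_supply(self):
        """Read-only getter; calling it does not require sending a message."""
        return self.total_supply


class Child:
    """Hypothetical child contract holding one owner's balance."""

    def __init__(self, address, owner, balance):
        self.address, self.owner, self.balance = address, owner, balance

    def receive(self, sender, op, payload):
        if op == "burn":
            if sender != self.owner or payload["amount"] > self.balance:
                raise ValueError("burn rejected")
            self.balance -= payload["amount"]
            queue.append((self.address, "minter", "burn_notification",
                          {"amount": payload["amount"], "response": sender}))


class Sink:
    """Hypothetical user wallet contract; it only absorbs excesses here."""

    def receive(self, sender, op, payload):
        pass


contracts = {
    "minter": Minter(total_supply=150),
    "alison-shib": Child("alison-shib", owner="alison", balance=100),
    "alison": Sink(),
}

queue.append(("alison", "alison-shib", "burn", {"amount": 40}))
while queue:
    sender, dest, op, payload = queue.popleft()
    contracts[dest].receive(sender, op, payload)

print(contracts["alison-shib"].balance, contracts["minter"].get_total_supply())
```

The child's balance and the parent's total supply are updated in two separate steps, so for a short while the sum of all balances and the reported total supply disagree.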
User story 4: Admin mints SHIB for Becky
When all contracts are initially deployed, the total SHIB supply is zero and nobody has any tokens. How are tokens created? The action of creating new tokens is called minting. It can only be performed by a special admin role - our Admin user. The Admin user may also transfer the admin privilege to any other address. As best practice, before a token starts trading, the admin privilege should be transferred to the zero address to guarantee that nobody can mint new tokens and inflate the total supply.

  1. To initiate a mint, Admin sends a message with op type mint to the parent jetton-minter. This message is encoded on the transaction Admin sends to its wallet contract, which forwards it to its destination after verifying the signature.
  2. Once the mint message reaches its destination [code], the parent minter contract, this contract will process the message and verify the message indeed originated from Admin [code]. Then, the contract will alter its persistent state (increase the total supply by the minted amount [code]). The contract will send a message with op type internal transfer to the contract holding Becky's SHIB balance [code].
  3. Once the internal transfer message reaches its destination [code], this contract will now process the message and alter its persistent state (increase Becky's SHIB balance by the minted amount [code]). This contract will normally send one last message with op type excesses back to refund any remaining gas back to Admin's wallet contract and let it know the mint is complete [code].
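
The admin check, and the effect of revoking the admin role, can be sketched as follows. This is a hypothetical synchronous Python model: the `Minter` class, its method names, and the `ZERO_ADDRESS` placeholder are assumptions for illustration, and the outgoing internal transfer is only recorded in an `outbox` list instead of being delivered.

```python
ZERO_ADDRESS = "0" * 64  # placeholder standing in for TON's zero address


class Minter:
    """Hypothetical parent contract with an admin-only mint action."""

    def __init__(self, admin):
        self.admin = admin
        self.total_supply = 0
        self.outbox = []  # internal_transfer messages waiting to be delivered

    def mint(self, sender, to_child, amount):
        if sender != self.admin:
            raise PermissionError("only the admin may mint")
        self.total_supply += amount
        self.outbox.append((to_child, "internal_transfer", amount))

    def change_admin(self, sender, new_admin):
        if sender != self.admin:
            raise PermissionError("only the admin may change the admin")
        self.admin = new_admin


minter = Minter(admin="admin")
minter.mint("admin", "becky-shib", 1000)

# Revoke the role before trading starts: nobody controls the zero address.
minter.change_admin("admin", ZERO_ADDRESS)

try:
    minter.mint("admin", "becky-shib", 1)
    mint_blocked = False
except PermissionError:
    mint_blocked = True

print(minter.total_supply, mint_blocked)
```

Once the admin is the zero address, no sender can ever pass the check again, so the total supply can only go down through burns.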

This is the flow of messages:

The last step of this flow is almost identical to the transfer flow. Similar to the transfer flow, it is also possible to notify the SHIB recipient with a dedicated message so they can handle the payment - remember our "DNS-Superstore" example? I won't add another full user story for this case, but here is the flow of messages just in case:
Who deploys the child contracts?
Let's recall the SHIB contract architecture - one deployed instance of the jetton-minter parent and one deployed instance per SHIB holder of jetton-wallet contract:
The parent minter contract is naturally deployed by SHIB's creator, probably the Admin user. But what about the child contracts, who deploys them? The design is efficient - a child contract is deployed only when its owner receives SHIB for the first time. This may sound a bit tricky, because the recipient is not necessarily aware that they were sent any SHIB.

If you recall the transfer user story above, receiving SHIB is triggered by the internal transfer message. If the recipient child contract of this message has never been deployed, the sender of this message will have to deploy the child! You can see this happening in the code here. The state_init section of the message is actually responsible for the deployment. You can see it calculated here from the initial code cell of the child (the compiled TVM bytecode for this contract implementation) and its initial data cell.

Since the sender of the internal transfer message is never sure if the recipient is deployed or not, it *always* includes the deployment part. TON is clever enough to ignore the deployment element if the contract has previously been deployed.
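
The "always attach the deployment part" behavior can be illustrated with a small Python sketch. The `contracts` dictionary and the `deliver` function are hypothetical stand-ins for the network, and the `state_init` here is just a plain dictionary rather than real code and data cells.

```python
contracts = {}  # address -> persistent state of deployed contracts


def deliver(dest, state_init, amount):
    """Deliver an internal transfer, deploying the destination if needed.

    Returns True if this message caused the deployment.
    """
    deployed_now = False
    if dest not in contracts:
        # First message ever sent to this address: deploy from state_init.
        contracts[dest] = dict(state_init)
        deployed_now = True
    # If the contract already exists, the attached state_init is ignored.
    contracts[dest]["balance"] += amount
    return deployed_now


becky_init = {"owner": "becky", "balance": 0}

first = deliver("becky-shib", becky_init, 10)   # deploys, then credits 10
second = deliver("becky-shib", becky_init, 5)   # already deployed, credits 5

print(first, second, contracts["becky-shib"]["balance"])
```

The second delivery does not reset the balance to the initial value, which is what makes it safe for the sender to include the deployment part every time.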
Authenticating the messages between parent and children
In the user stories above, we saw that a full flow is distributed over multiple messages. The message internal transfer, for example, causes its recipient to increase their SHIB balance (as you recall, it was sent at the end of the transfer flow). What would happen if an attacker tries to forge this message and send it to a contract holding their own SHIB balance? If we're not careful, such forgery would result in the attacker's ability to generate new tokens for themselves from thin air!

To secure the contract against this forgery, we would need to authenticate that these critical messages that change balances indeed originate from a valid sender. You can see the validation code here - the contract will only handle a message if it was sent by the minter parent (labeled jetton_master_address for some reason) or by one of the valid children.

This leads to a very interesting question - how can we tell if some random address is a valid child jetton address? Wait a minute, earlier when Alison wanted to send a message to Becky, how did she know Becky's contract address in the first place?

This, again, is a beautiful system design - the addresses for smart contracts on TON are derived from the initial code cell of the contract (the compiled TVM bytecode of its implementation) and the initial data cell of the contract (its initial persistent state upon construction). If we know these two values, we can calculate the address of a contract even before it is deployed. This calculation is deterministic and immutable.

The Jetton code contains a utility function that calculates a child's address - meaning the address of the contract holding Alison's SHIB balance - based on Alison's address. You can see this function here. As you can see, it indeed depends on the initial [code cell] of the child and its initial [data cell].

It's a little tricky to understand why this mechanism is secure. What prevents an attacker from somehow deploying their malicious contract in the address of one of the legal children? To land on a legal address, the malicious contract would have to have the initial code cell of the official children - this has already limited the attacker's ability to add malicious code to this implementation. In addition, the initial data cell guarantees that the child will only obey the correct minter parent since the initial data cell contains its address.
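
Here is a minimal Python sketch of the idea of deriving an address from initial code and initial data, and using it to authenticate a sender. It is an analogy only: SHA-256 over a simple concatenation stands in for TON's real hash of the StateInit cell, `WALLET_CODE` is placeholder bytes, and the function names are my own rather than the ones used in the Jetton code.

```python
import hashlib

WALLET_CODE = b"<compiled jetton-wallet bytecode>"  # placeholder bytes


def child_address(owner: str, minter: str, code: bytes = WALLET_CODE) -> str:
    """Derive a child address from its initial code and initial data.

    Assumption: SHA-256 over a simple concatenation stands in for TON's
    real hash of the StateInit cell.
    """
    initial_data = f"balance=0|owner={owner}|minter={minter}".encode()
    code_hash = hashlib.sha256(code).digest()
    data_hash = hashlib.sha256(initial_data).digest()
    return hashlib.sha256(code_hash + data_hash).hexdigest()


def is_valid_sender(sender_address: str, claimed_owner: str, minter: str) -> bool:
    """Accept a message only if it comes from the genuine child of claimed_owner."""
    return sender_address == child_address(claimed_owner, minter)


alison_child = child_address("alison", "shib-minter")

# An attacker changing the code or pointing at another parent lands elsewhere.
evil_code_child = child_address("alison", "shib-minter", code=b"<malicious code>")
other_parent_child = child_address("alison", "fake-minter")

print(is_valid_sender(alison_child, "alison", "shib-minter"))
print(is_valid_sender(evil_code_child, "alison", "shib-minter"))
print(is_valid_sender(other_parent_child, "alison", "shib-minter"))
```

Anyone can recompute a child's address from public inputs, yet nobody can occupy that address with different code or a different parent, because either change alters the hash.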
Handling partial changes when things go wrong
As you recall, the transfer flow is distributed over multiple asynchronous messages. The first message reduces the sender's SHIB balance and the second message increases the recipient's SHIB balance. This makes sense in a happy flow when everything goes well, but what happens if the second message somehow fails?

Most smart contract machines, like Ethereum's EVM, process transactions in a fully atomic and synchronous manner - so if one of the later stages fails, the entire transaction reverts and all state changes caused by this transaction are reverted as well. This mechanism is indeed very easy to comprehend. Unfortunately, since messages in TON are neither atomic nor synchronous, we don't get this automatic revert out of the box.

So what can we do? We will need to handle the revert flow ourselves. This is an example of what makes smart contract development on TON a little more difficult.

When the handling of a message on TON fails due to any thrown exception, if this message's bounce flag is set, the system will automatically send the failed message back to the sender with the bounced flag set. You can read the spec for this message bouncing mechanism here.

Let's return to the example above - failure of the second message in the transfer flow. This message fails after the SHIB sender has reduced their SHIB balance by the sent amount. To keep the system consistent, we would somehow need to undo this reduction on failure. How would that work? Assuming the second message was sent with the bounce flag set, we can undo the sender's reduction when a bounced second message is received. You can see the official Jetton code handling bounced messages here and undoing the reduction here.
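
The bounce-and-undo pattern can be simulated in Python. This is a toy model under stated assumptions: the `Wallet` class, the dictionary-shaped messages, and the `run` loop are hypothetical, and a `broken` flag is used to force the recipient's handler to fail. The real bounced message also carries less information than the copy used here.

```python
from collections import deque

queue = deque()  # in-flight messages, each a plain dictionary


class Wallet:
    """Hypothetical child contract that can undo a failed transfer."""

    def __init__(self, address, balance=0, broken=False):
        self.address, self.balance, self.broken = address, balance, broken

    def receive(self, msg):
        if msg["bounced"]:
            # Our internal_transfer failed at the recipient: undo the reduction.
            self.balance += msg["amount"]
        elif msg["op"] == "transfer":
            self.balance -= msg["amount"]
            queue.append({"src": self.address, "dest": msg["to"],
                          "op": "internal_transfer", "amount": msg["amount"],
                          "bounce": True, "bounced": False})
        elif msg["op"] == "internal_transfer":
            if self.broken:
                raise RuntimeError("handler failed")
            self.balance += msg["amount"]


def run(contracts):
    """Process the queue; failed bounceable messages return to their sender."""
    while queue:
        msg = queue.popleft()
        try:
            contracts[msg["dest"]].receive(msg)
        except Exception:
            if msg["bounce"] and not msg["bounced"]:
                queue.append({**msg, "src": msg["dest"], "dest": msg["src"],
                              "bounce": False, "bounced": True})


contracts = {
    "alison-shib": Wallet("alison-shib", balance=100),
    "becky-shib": Wallet("becky-shib", broken=True),  # forced to fail
}

queue.append({"src": "alison", "dest": "alison-shib", "op": "transfer",
              "amount": 30, "to": "becky-shib", "bounce": True, "bounced": False})
run(contracts)

print(contracts["alison-shib"].balance, contracts["becky-shib"].balance)
```

If the `internal_transfer` message were sent with `bounce` set to false, the 30 tokens would simply vanish, which is why simulating a failure at every step of a flow is worth the effort.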

Do this carefully! When designing complex message flows on TON, pull out a whiteboard and draw the different message flow diagrams like I did in this post. My favorite tool for this job is the wonderful and open source Excalidraw. Then, start simulating a potential failure and message bounce on every step of the flow to make sure your code handles the undo correctly.

Happy coding!

Tal is a founder of Orbs Network (https://orbs.com). He's a passionate blockchain developer, open source advocate and a contributor to the TON ecosystem. He is also one of the main developers for TONcoin Fund (https://www.toncoin.fund). For Tal's work on TON, follow on GitHub (https://github.com/ton-defi-org). For Tal's personal work, follow on GitHub (https://github.com/talkol) and Twitter (https://twitter.com/koltal). If you found any mistakes in this post, please let Tal know on Telegram (https://t.me/talkol).