Chainlet Orbits: Topological Address Embedding for Blockchain ACM KDD 2025 Conference

Poupak Azad1, Baris Coskunuzer2, Murat Kantarcioglu2, Cuneyt Gurcan Akcora3

1University of Manitoba, 2University of Texas Dallas, 3University of Central Florida

Code Dataset arXiv

Abstract

The rise of cryptocurrencies like Bitcoin has not only increased trade volumes but also broadened the use of graph machine learning techniques, such as address embeddings, to analyze transactions and decipher user patterns. Traditional detection methods rely on simple heuristics and extensive data gathering, while more advanced Graph Neural Networks encounter challenges such as scalability, poor interpretability, and label scarcity in massive blockchain transaction networks.

To overcome the computational and interpretability limitations of existing techniques, we introduce a topological approach, Chainlet Orbits, which embeds blockchain addresses by leveraging their topological characteristics in temporal transactions. This work builds on the earlier chainlet concept introduced to analyze transaction patterns in blockchain graphs. We employ our innovative address embeddings to investigate financial behavior and e-crime in the Bitcoin, Litecoin, and Ethereum networks, focusing on distinctive substructures that arise from user behavior. In node classification experiments, our model demonstrates exceptional performance when compared to GNN-based approaches. Furthermore, our approach embeds all daily nodes of the largest blockchain transaction network, Bitcoin, and creates explainable machine learning models in less than 17 minutes, which takes days for GNN-based approaches.

Chainlets and Orbits

The temporal position of an address node within a directed transaction graph determines its structural role, referred to as an orbit, which can be leveraged in supervised learning tasks. Our primary objective is to establish these roles and utilize them in graph machine learning models. Orbits apply to UTXO blockchains naturally because a one-time-use-only transaction appears in the heterogeneous graph as a node, which allows ordering address nodes. For this reason, we use UTXO networks to define orbits.

Directed Bitcoin Network Graph
By analyzing transaction patterns within the directed Bitcoin network, we identify specific address behaviors (circular nodes) in transactions (rectangular nodes), assigning structural roles as orbits. An e-crime address (red, striped) demonstrates orbits 9, 13, and 14, typical for ransomware transactions.

We model the UTXO network with a graph \( G = (V, E, B) \) from a set of transactions \( TX \) and input and output addresses in \( TX \).

On i\( G \), \( V \) is a set of nodes, and \( E \subseteq V \times V \) is a set of edges. \( B = \{ \textbf{Address}, \textbf{Transaction} \} \) represents the set of node types. A node \( u \in V \) has a node type \( \phi(u) \in B \).

Fori each edge \( e_{uv} \in E \) between adjacent nodes \( u \) and \( v \), we have \( \phi(u) \neq \phi(v) \). In other words, there is no edge between the same node types (Transaction → Transaction or Address → Address), and an edge \( e \in E \) represents a coin transfer between an address node and a transaction node.

Chainlets. Chainlets have been introduced to define position-invariant versions of transaction patterns. The key idea of the chainlet approach is to offer a lossless summary of all transactions by assigning two indices \( i, o \) in the \([0, n]\) range to each transaction with respect to its input and output addresses. A Bitcoin subgraph \( G' = (V', E', B) \) is a subgraph of \( G \), if \( V' \subseteq V \) and \( E' \subseteq E \). If \( G' = (V', E', B) \) is a subgraph of \( G \) and \( E' \) contains all edges \( e_{uv} \in E \) such that \( (u, v) \in V' \), then \( G' \) is called an induced subgraph of \( G \). Two graphs \( G' = (V', E', B) \) and \( G'' = (V'', E'', B) \) are called isomorphic if there exists a bijection \( h: V' \to V'' \) such that all node pairs \( u, v \in V' \) are adjacent in \( G' \) if and only if \( h(u) \) and \( h(v) \) are adjacent in \( G'' \).

\( k \)-chainlet. Let \( k \)-chainlet \( C_k \) in \( G \) be the subgraph of \( G \) containing \( k \) subsequent transaction nodes, all output nodes of these \( k \) transaction nodes, and all input nodes of the first transaction.

A 1-chainlet lists a transaction node with its input and output addresses. Figure 2 shows a 2-chainlet; transactions \( t_1 \) and \( t_2 \) share the common address \( a_1 \). A \( k \)-chainlet where \( k > 1 \) has a natural ordering of transactions because transactions are ordered in blocks, which, in turn, are ordered in a blockchain. This natural order allows us to assign roles to an address depending on where the address appears in a 2-chainlet. In the next section, we will count address roles and list them as address orbits.

Chainlet Classes. Let \( A_k(G) \) be the set of all \( k \)-chainlets in \( G \). For a given \( k \)-chainlet \( C_k \in A_k(G) \), the equivalence class of \( C_k \), \( E(C_k) \), is the set of all isomorphic copies (occurrences) of \( C_k \) in \( A_k \).

Having defined equivalence classes of chainlets, we next define an equivalence class of address nodes for a fixed chainlet class.

Orbit of an Address Node. Let \( C_k \) be a \( k \)-chainlet in \( G \), and \( u \) be an address node in \( C_k \). We define the orbit of \( u \) with respect to \( C_k \) as the collection of the images \( \phi(u) \) where \( \phi \) is an isomorphism from \( C_k \) to another \( k \)-chainlet \( C'_k \) in \( G \), i.e.,

\( O_u = \{ \phi(u) \} \).

We use the notation \( o \) to refer to orbits, e.g., \( o_{15} \) for the orbit 15.

Active and Passive Orbits

We define an orbit as active or passive based on the role of the associated address within a 2-chainlet.

Active Orbit. Let \( C \) be a 2-chainlet. An active orbit with respect to \( C \) is an orbit where its addresses appear as the output of the first and the input of the second transaction in the corresponding occurrence of \( C \). When \( C \) is understood, we denote the set of active orbits as \( O_{\text{act}} \).

Active orbits indicate behavior created by the address owner, as the address in this orbit initiates a new transaction and sends coins to other addresses.

Passive Orbit. Let \( C \) be a 2-chainlet. A passive orbit with respect to \( C \) is an orbit where it appears as the output of the first or the second transaction but not as an input of the second transaction in the corresponding occurrence of \( C \). When \( C \) is understood, we denote the set of passive orbits as \( O_{\text{pass}} \).

Unlike active orbits, passive orbits for a chainlet class \( E(C) \) receive coins without shaping any occurrences of \( C \), as the address in this orbit only receives coins from another address without initiating a new transaction. It is worth noting that an address can belong to an active orbit for one chainlet class while belonging to a passive orbit for a different chainlet class. This means that an address may exhibit both initiating and receiving behavior in different 2-chainlet types. The below figure provides sample active and passive orbit addresses in 2-chainlets.

Active and Passive Orbits Example
Sample active and passive orbit addresses in 2-chainlets. Active orbits (striped) show addresses initiating transactions, while passive orbits (checkered) only receive coins.

The distinction between active and passive orbits is significant for analyzing specific behaviors within the transaction graph. Active orbits help train classifiers to identify spending patterns, such as paying for ransom, as they reflect behavior initiated by the address owner. Conversely, passive orbits are useful for examining receiving behaviors.

Experimental Setup

Datasets. We extract data from the Bitcoin, Litecoin, and Ethereum transaction networks.

Bitcoin Transaction Network. We have downloaded and analyzed the complete Bitcoin transaction graph from its inception in 2009 by utilizing the Bitcoin Core Wallet and the Bitcoin-ETL library.

BitcoinHeist Addresses. Our ransomware dataset is a union of datasets from three widely adopted studies: Montreal, Princeton, and Padua. The combined dataset contains addresses from 27 ransomware families. A total of 19,930 unique ransomware addresses appear 62,641 times since 2009.

Bitcoin Darknet Addresses. We downloaded the Grams dataset from the Darknet Market Archives. This dataset spans from June 9, 2014, to July 12, 2015. We identified 7,557 unique addresses associated with transactions from darknet marketplaces, totaling 1,288,100 occurrences.

Ethereum Transaction Network. The complete Ethereum transaction graph was downloaded and analyzed from its inception in 2015 using the Geth Wallet and the Ethereum-ETL library.

Ethereum DeFi Addresses. Our DeFi dataset consists of lending, asset, derivative, and decentralized exchange addresses. The combined dataset contains 1,407 addresses, appearing a total of 108 million times since 2015.

Litecoin Transaction Network. We downloaded and analyzed the complete Litecoin transaction graph from its inception in 2011 using the Litecoin Core Wallet and RPC calls.

Hardware. We ran experiments on a single Dell PowerEdge R630, featuring an Intel Xeon E5-2650 v3 Processor (10 cores, 2.30 GHz, 20MB Cache), and 192GB of RAM (DDR4-2133MHz).

Preprocessing. We did not apply any normalization or preprocessing. Extracting orbits is computationally efficient, enabling the entire transaction network to be used without removing any edges or nodes.

We made the full address orbits, which are 27 GB in size, publicly available at Huggingface Blockchain Orbits. Classifier data is also provided there.

Computational Complexity. Extracting a 2-chainlet requires three searches on the adjacency matrix. The first step involves finding all transaction nodes, costing \( O(V) \). The second step, which searches for address nodes shared between transaction pairs, has a complexity of \( O(V^3) \). However, because 57.40% of Bitcoin transactions have fewer than two input and output addresses, the complexity is effectively reduced to \( k \times O(V^2) \), where \( k \approx 2 \).

As shown in the below figure, on the majority of days, orbit extraction is completed within 1,000 seconds (less than 17 minutes).

Orbit Discovery Time
Orbit discovery time costs from the daily Bitcoin transaction graph. Each data point denotes a specific day.

Experimental Results

We apply orbit discovery to the Bitcoin, Litecoin, and Ethereum blockchains. As ground truth address labels are predominantly available for the Bitcoin network, we primarily focus on the Bitcoin transaction network and detail our findings in this section. Results for Ethereum and Litecoin are also discussed in the appendices.

Heuristics

We employ co-spending and transition heuristics with all historical ransomware addresses. This leads to the discovery of 40 unique addresses associated with well-known ransomware families such as CryptoLocker, CryptoWall, and CryptoTorLocker. These low numbers suggest that researchers had previously expanded these datasets using similar heuristics.

Orbits on the BitcoinHeist Benchmark

We use orbits of Bitcoin addresses in a binary (white vs. ransomware) node classification task on the BitcoinHeist dataset. The dataset includes:

Transactions with amounts less than 0.3 BTC were excluded, as ransomware payments are typically larger. For classification, we use a Random Forest model with 300 trees and an 80/20 train/test split.

Results. The following table compares the AUC scores for address classification. The results demonstrate that orbits provide robust performance compared to BitcoinHeist features alone, especially in larger datasets.

Sample BitcoinHeist Orbit BitcoinHeist + Orbit
50K 0.807 0.857 0.908
100K 0.808 0.856 0.908
500K 0.798 0.849 0.903
1M 0.791 0.839 0.895

GNNs on the BitcoinHeist Benchmark

We could not apply existing node classification approaches to the Bitcoin transaction network because the network is too large and contains as few as four e-crime addresses per day across over ten thousand days. Instead, we reformulated the node classification task as the classification of the subgraph of the two-hop neighborhood of an address.

Formally, we define the neighborhood graph of an address as the combination of the induced subgraphs of:

  1. 2-chainlets where the second transaction outputs coins to the address, and
  2. 2-chainlets where the first transaction outputs coins to the address.
An example of a neighborhood graph is shown in Figure 1.

Advantages of the Neighborhood Approach:

GNN Models. We evaluate the orbit-based classifier against four GNNs on the BitcoinHeist data:

We report the average accuracy of five runs, along with their standard deviation, using grid search parameters from the benchmark study by Errica et al.

Despite the sparsity of the Bitcoin network and challenges of data imbalance, GNNs provide strong results. Table 2 summarizes the AUC scores for address classification:

Sample Size GCN GIN DGCNN
20K 0.76 ± 0.002 0.90 ± 0.02 0.79 ± 0.09
10K 0.70 ± 0.004 0.88 ± 0.01 0.75 ± 0.09
5K 0.70 ± 0.002 0.86 ± 0.02 0.78 ± 0.09

Comparison of Orbits and GIN. The GIN model demonstrates strong performance, achieving an AUC of 0.90 for smaller datasets (up to 20K addresses). However, GNN models are constrained to smaller datasets due to computational limitations. In contrast, orbit-based models are scalable and can incorporate temporal patterns by analyzing entire daily transaction networks.

Orbits on the Large Bitcoin Network

We evaluate the generalization of e-crime classification with a larger Bitcoin dataset, sampling 1% of the daily white addresses (3.16 million addresses). The labeled data includes:

We employ a one-versus-rest Random Forest classifier with an 80/20 train/test split. Figure 5 shows consistent AUC performance exceeding 0.8, demonstrating the strength of orbit-based classifiers.

Active and Passive Orbits

We analyze the role of orbit types (active vs. passive) in detecting e-crime payments. The following table summarizes the macro-averaged AUC performance:

Class Active Orbits AUC Passive Orbits AUC
White 0.731 0.870
Ransomware (RS) 0.725 0.736
Darknet Market (DM) 0.727 0.876

Passive orbits yield higher AUC values compared to active orbits. This result suggests that studying payer behaviors (e.g., victims in ransomware and buyers in darknet markets) is particularly effective for identifying e-crime addresses.

Orbit-based Classification Performance
AUC scores for orbit-based e-crime address classification. White and darknet classes are undersampled. Orbit performance remains robust as sample sizes increase.

Adversarial Patterns of E-Crime

We analyze adversarial patterns within e-crime addresses, focusing on ransomware and darknet markets. Two primary patterns stand out:

Ransomware Address Patterns

Ransomware addresses primarily fall into two distinct orbit patterns:

We observed that ransomware operators often reuse addresses to minimize transaction costs, which creates observable orbit patterns (e.g., Orbits 9 and 12). However, these patterns rarely connect to coin-mixing transactions, as shown in Figure 6.

Coin-Mixing Transactions

In coin-mixing transactions, orbits form more complex structures. We identify the following:

Our analysis indicates that e-crime coins are first consolidated in specific addresses (Orbit 30 and 31) and subsequently mixed through intermediary steps. This makes identifying and tracking these addresses particularly challenging.

Adversarial Patterns of E-Crime
Adversarial patterns in ransomware and coin-mixing transactions. Darknet market sellers consolidate payments (Orbit 30, 31) before coin-mixing.

Address Behavior Insights

By analyzing orbits, we extract critical address behaviors:

These insights highlight the utility of orbits in identifying adversarial address patterns and tracking e-crime behavior within blockchain networks.

Account Network and Chainlets

We extend our definition to account-based blockchains by using the temporal order. Unlike Bitcoin’s UTXO model, Ethereum uses an account-based model where transactions involve two addresses: one input address and one output address. Therefore, the concept of chainlets needs to be adapted to this model where 2-chainlets are not naturally defined. We define chainlet orbits for Ethereum by focusing on the temporal transaction patterns within the network.

Specifically, we look at the \( k = 3 \)-hop neighborhoods of nodes (accounts) in the Ethereum transaction graph to capture local substructures. This choice allows us to both apply the vanilla orbit definitions to the account network without modification and also standardize orbit roles across UTXO and account-based blockchains.

In order to apply the UTXO orbit models to the Ethereum network, we employ ordinary nodes as anchors to serve a role similar to that of a UTXO transaction node. An anchor node is an address involved in a specific transaction pattern within the 3-hop neighborhood.

Anchor 1. An anchor 1 is an address node that is part of an orbit initiating at least one transaction such that its output is used in two subsequent transactions.

Anchor 2. An anchor 2 node is an address node that is part of an orbit receiving coins from at least one other node and forwarding to at least one other address.

Ethereum Anchor Example
Orbit example in Ethereum network. Anchors in the figure act as the transaction nodes in Orbits 9, 10, and 11.

Once anchors 1 and 2 are defined, we extract 2-chainlets and compute orbits of address nodes. By analyzing these orbit patterns, we effectively capture the dynamic transaction behaviors characteristic of account-based blockchain networks such as Ethereum.

Limitations. We note that account-based transactions inherently involve simpler, direct transactions between two parties (one sender and one receiver). This simplicity restricts the complexity of the transaction graph and subsequently limits the depth of analysis that can be performed using orbits. Without the natural formation of complex chainlets as in UTXO models, it’s challenging to capture rich, multi-step transaction patterns.

Ethereum Results

We train an address classifier on Ethereum for four types of DeFi protocols: lending, assets, derivatives, and decentralized exchanges, as well as a class for ordinary (“normal”) addresses. The classification problem is significant because new protocols are introduced on the Ethereum blockchain daily, and existing protocols create new addresses. Users and blockchain data analytics companies scour offline resources to find the entity behind an address with the goal of tracking investment and portfolio decisions, detecting transaction anomalies, and identifying smart contract hacking attempts. Automated address classification alleviates the need for much manual work. Hence, address type classification has appeared as an important and challenging task in Graph Machine Learning.

It is important to note that Ethereum addresses, whether they belong to smart contracts or externally owned accounts, share the same format: a 42-character hexadecimal string beginning with "0x." This uniformity means that, when looking at the address string alone, there is no inherent way to differentiate between an address that represents a smart contract and one that represents an externally owned account.

The distinction between these two types of addresses is critical in understanding their behavior and capabilities on the Ethereum network. Smart contracts are autonomous programs with code execution capabilities, while externally owned accounts are controlled by private keys and are responsible for initiating transactions. Without additional context or metadata, simply examining the address string does not reveal this information.

Experimental Setup: We selected the daily graph of September 20, 2021, because it contained the highest number of protocol addresses in ether transactions. However, even for this day, the data imbalance is significant: only 89 protocol addresses appeared in over 1 million ether transactions. To address this, we applied a down-sampling strategy for the “normal” addresses. Due to the limited dataset of protocol addresses, we were constrained to use a binary classification scheme distinguishing between protocol and non-protocol addresses.

Results: We used a random forest algorithm with 300 trees on orbits, where the most important orbits identified by the Gini importance measure were orbits 0, 1, 2, 8, and 29. These orbits are all passive orbits used to receive coin payments from users.

The orbit model achieved an accuracy of 0.97, and the following figure shows an Area Under the Curve (AUC) of 0.86. Adding more ordinary nodes initially improves the classifier's performance but eventually deteriorates it. These results indicate that orbits are highly informative for Ethereum address classification, but additional labeled data, specifically unique addresses, is essential for further validation.

Ethereum Address Classification Performance
Ethereum address classification performance. The AUC achieves a peak value of 0.86 for protocol addresses.

Conclusion

We have proposed a topological address embedding model, called chainlet orbits, that captures a node’s structural role in a blockchain graph. This approach results in compact and effective embeddings that can be easily integrated into node classification models, outperforming GNNs that can be trained for a small subset of the addresses only. Additionally, the orbit patterns can be visualized and used with interpretable models.

Our approach is applicable to graphs of both UTXO and account-based blockchains. The orbit discovery process is computationally cheap and scales to massive graphs easily; our approach embeds all daily nodes of the Bitcoin transaction network in less than 17 minutes.

Future Work: Future work will focus on multi-modal graphs where meta attributes of transactions can be incorporated as node features. Another promising area of research is developing self-supervised ways to learn node and edge functions from data to define new orbits.



For additional details and experimental results, please refer to our main paper. Thank you!



BibTeX

@article{azad2023chainlet,
title={Chainlet Orbits: Topological Address Embedding for Blockchain},
author={Azad, Poupak and Coskunuzer, Baris and Kantarcioglu, Murat and Akcora, Cuneyt Gurcan},
journal={SIGKDD International Conference on Knowledge Discovery and Data Mining},
year={2025}
}