Abstract
The rise of cryptocurrencies like Bitcoin has not only increased trade volumes but also broadened the use of graph machine learning techniques, such as address embeddings, to analyze transactions and decipher user patterns. Traditional detection methods rely on simple heuristics and extensive data gathering, while more advanced Graph Neural Networks encounter challenges such as scalability, poor interpretability, and label scarcity in massive blockchain transaction networks.
To overcome the computational and interpretability limitations of existing techniques, we introduce a topological approach, Chainlet Orbits, which embeds blockchain addresses by leveraging their topological characteristics in temporal transactions. This work builds on the earlier chainlet concept introduced to analyze transaction patterns in blockchain graphs. We employ our innovative address embeddings to investigate financial behavior and e-crime in the Bitcoin, Litecoin, and Ethereum networks, focusing on distinctive substructures that arise from user behavior. In node classification experiments, our model demonstrates exceptional performance when compared to GNN-based approaches. Furthermore, our approach embeds all daily nodes of the largest blockchain transaction network, Bitcoin, and creates explainable machine learning models in less than 17 minutes, which takes days for GNN-based approaches.
Chainlets and Orbits
The temporal position of an address node within a directed transaction graph determines its structural role, referred to as an orbit, which can be leveraged in supervised learning tasks. Our primary objective is to establish these roles and utilize them in graph machine learning models. Orbits apply to UTXO blockchains naturally because a one-time-use-only transaction appears in the heterogeneous graph as a node, which allows ordering address nodes. For this reason, we use UTXO networks to define orbits.

We model the UTXO network with a graph \( G = (V, E, B) \) from a set of transactions \( TX \) and input and output addresses in \( TX \).
On i\( G \), \( V \) is a set of nodes, and \( E \subseteq V \times V \) is a set of edges. \( B = \{ \textbf{Address}, \textbf{Transaction} \} \) represents the set of node types. A node \( u \in V \) has a node type \( \phi(u) \in B \).
Fori each edge \( e_{uv} \in E \) between adjacent nodes \( u \) and \( v \), we have \( \phi(u) \neq \phi(v) \). In other words, there is no edge between the same node types (Transaction → Transaction or Address → Address), and an edge \( e \in E \) represents a coin transfer between an address node and a transaction node.
Chainlets. Chainlets have been introduced to define position-invariant versions of transaction patterns. The key idea of the chainlet approach is to offer a lossless summary of all transactions by assigning two indices \( i, o \) in the \([0, n]\) range to each transaction with respect to its input and output addresses. A Bitcoin subgraph \( G' = (V', E', B) \) is a subgraph of \( G \), if \( V' \subseteq V \) and \( E' \subseteq E \). If \( G' = (V', E', B) \) is a subgraph of \( G \) and \( E' \) contains all edges \( e_{uv} \in E \) such that \( (u, v) \in V' \), then \( G' \) is called an induced subgraph of \( G \). Two graphs \( G' = (V', E', B) \) and \( G'' = (V'', E'', B) \) are called isomorphic if there exists a bijection \( h: V' \to V'' \) such that all node pairs \( u, v \in V' \) are adjacent in \( G' \) if and only if \( h(u) \) and \( h(v) \) are adjacent in \( G'' \).
\( k \)-chainlet. Let \( k \)-chainlet \( C_k \) in \( G \) be the subgraph of \( G \) containing \( k \) subsequent transaction nodes, all output nodes of these \( k \) transaction nodes, and all input nodes of the first transaction.
A 1-chainlet lists a transaction node with its input and output addresses. Figure 2 shows a 2-chainlet; transactions \( t_1 \) and \( t_2 \) share the common address \( a_1 \). A \( k \)-chainlet where \( k > 1 \) has a natural ordering of transactions because transactions are ordered in blocks, which, in turn, are ordered in a blockchain. This natural order allows us to assign roles to an address depending on where the address appears in a 2-chainlet. In the next section, we will count address roles and list them as address orbits.
Chainlet Classes. Let \( A_k(G) \) be the set of all \( k \)-chainlets in \( G \). For a given \( k \)-chainlet \( C_k \in A_k(G) \), the equivalence class of \( C_k \), \( E(C_k) \), is the set of all isomorphic copies (occurrences) of \( C_k \) in \( A_k \).
Having defined equivalence classes of chainlets, we next define an equivalence class of address nodes for a fixed chainlet class.
Orbit of an Address Node. Let \( C_k \) be a \( k \)-chainlet in \( G \), and \( u \) be an address node in \( C_k \). We define the orbit of \( u \) with respect to \( C_k \) as the collection of the images \( \phi(u) \) where \( \phi \) is an isomorphism from \( C_k \) to another \( k \)-chainlet \( C'_k \) in \( G \), i.e.,
\( O_u = \{ \phi(u) \} \).
We use the notation \( o \) to refer to orbits, e.g., \( o_{15} \) for the orbit 15.
Active and Passive Orbits
We define an orbit as active or passive based on the role of the associated address within a 2-chainlet.
Active Orbit. Let \( C \) be a 2-chainlet. An active orbit with respect to \( C \) is an orbit where its addresses appear as the output of the first and the input of the second transaction in the corresponding occurrence of \( C \). When \( C \) is understood, we denote the set of active orbits as \( O_{\text{act}} \).
Active orbits indicate behavior created by the address owner, as the address in this orbit initiates a new transaction and sends coins to other addresses.
Passive Orbit. Let \( C \) be a 2-chainlet. A passive orbit with respect to \( C \) is an orbit where it appears as the output of the first or the second transaction but not as an input of the second transaction in the corresponding occurrence of \( C \). When \( C \) is understood, we denote the set of passive orbits as \( O_{\text{pass}} \).
Unlike active orbits, passive orbits for a chainlet class \( E(C) \) receive coins without shaping any occurrences of \( C \), as the address in this orbit only receives coins from another address without initiating a new transaction. It is worth noting that an address can belong to an active orbit for one chainlet class while belonging to a passive orbit for a different chainlet class. This means that an address may exhibit both initiating and receiving behavior in different 2-chainlet types. The below figure provides sample active and passive orbit addresses in 2-chainlets.

The distinction between active and passive orbits is significant for analyzing specific behaviors within the transaction graph. Active orbits help train classifiers to identify spending patterns, such as paying for ransom, as they reflect behavior initiated by the address owner. Conversely, passive orbits are useful for examining receiving behaviors.
Experimental Setup
Datasets. We extract data from the Bitcoin, Litecoin, and Ethereum transaction networks.
Bitcoin Transaction Network. We have downloaded and analyzed the complete Bitcoin transaction graph from its inception in 2009 by utilizing the Bitcoin Core Wallet and the Bitcoin-ETL library.
BitcoinHeist Addresses. Our ransomware dataset is a union of datasets from three widely adopted studies: Montreal, Princeton, and Padua. The combined dataset contains addresses from 27 ransomware families. A total of 19,930 unique ransomware addresses appear 62,641 times since 2009.
Bitcoin Darknet Addresses. We downloaded the Grams dataset from the Darknet Market Archives. This dataset spans from June 9, 2014, to July 12, 2015. We identified 7,557 unique addresses associated with transactions from darknet marketplaces, totaling 1,288,100 occurrences.
Ethereum Transaction Network. The complete Ethereum transaction graph was downloaded and analyzed from its inception in 2015 using the Geth Wallet and the Ethereum-ETL library.
Ethereum DeFi Addresses. Our DeFi dataset consists of lending, asset, derivative, and decentralized exchange addresses. The combined dataset contains 1,407 addresses, appearing a total of 108 million times since 2015.
Litecoin Transaction Network. We downloaded and analyzed the complete Litecoin transaction graph from its inception in 2011 using the Litecoin Core Wallet and RPC calls.
Hardware. We ran experiments on a single Dell PowerEdge R630, featuring an Intel Xeon E5-2650 v3 Processor (10 cores, 2.30 GHz, 20MB Cache), and 192GB of RAM (DDR4-2133MHz).
Preprocessing. We did not apply any normalization or preprocessing. Extracting orbits is computationally efficient, enabling the entire transaction network to be used without removing any edges or nodes.
We made the full address orbits, which are 27 GB in size, publicly available at Huggingface Blockchain Orbits. Classifier data is also provided there.
Computational Complexity. Extracting a 2-chainlet requires three searches on the adjacency matrix. The first step involves finding all transaction nodes, costing \( O(V) \). The second step, which searches for address nodes shared between transaction pairs, has a complexity of \( O(V^3) \). However, because 57.40% of Bitcoin transactions have fewer than two input and output addresses, the complexity is effectively reduced to \( k \times O(V^2) \), where \( k \approx 2 \).
As shown in the below figure, on the majority of days, orbit extraction is completed within 1,000 seconds (less than 17 minutes).

Experimental Results
We apply orbit discovery to the Bitcoin, Litecoin, and Ethereum blockchains. As ground truth address labels are predominantly available for the Bitcoin network, we primarily focus on the Bitcoin transaction network and detail our findings in this section. Results for Ethereum and Litecoin are also discussed in the appendices.
Heuristics
We employ co-spending and transition heuristics with all historical ransomware addresses. This leads to the discovery of 40 unique addresses associated with well-known ransomware families such as CryptoLocker, CryptoWall, and CryptoTorLocker. These low numbers suggest that researchers had previously expanded these datasets using similar heuristics.
Orbits on the BitcoinHeist Benchmark
We use orbits of Bitcoin addresses in a binary (white vs. ransomware) node classification task on the BitcoinHeist dataset. The dataset includes:
- 2,916,697 ransomware and white addresses
- 1,048,576 transactions from January 2009 to December 2018
Transactions with amounts less than 0.3 BTC were excluded, as ransomware payments are typically larger. For classification, we use a Random Forest model with 300 trees and an 80/20 train/test split.
Results. The following table compares the AUC scores for address classification. The results demonstrate that orbits provide robust performance compared to BitcoinHeist features alone, especially in larger datasets.
Sample | BitcoinHeist | Orbit | BitcoinHeist + Orbit |
---|---|---|---|
50K | 0.807 | 0.857 | 0.908 |
100K | 0.808 | 0.856 | 0.908 |
500K | 0.798 | 0.849 | 0.903 |
1M | 0.791 | 0.839 | 0.895 |
GNNs on the BitcoinHeist Benchmark
We could not apply existing node classification approaches to the Bitcoin transaction network because the network is too large and contains as few as four e-crime addresses per day across over ten thousand days. Instead, we reformulated the node classification task as the classification of the subgraph of the two-hop neighborhood of an address.
Formally, we define the neighborhood graph of an address as the combination of the induced subgraphs of:
- 2-chainlets where the second transaction outputs coins to the address, and
- 2-chainlets where the first transaction outputs coins to the address.
Advantages of the Neighborhood Approach:
- It enables an inductive learner, where an address from any day can be classified using its neighborhood graph from that day.
- It allows control over the number of addresses classified, mitigating the computational costs of graph neural networks (GNNs).
GNN Models. We evaluate the orbit-based classifier against four GNNs on the BitcoinHeist data:
- DGCNN
- GIN
- GCN
- GraphSage
We report the average accuracy of five runs, along with their standard deviation, using grid search parameters from the benchmark study by Errica et al.
Despite the sparsity of the Bitcoin network and challenges of data imbalance, GNNs provide strong results. Table 2 summarizes the AUC scores for address classification:
Sample Size | GCN | GIN | DGCNN |
---|---|---|---|
20K | 0.76 ± 0.002 | 0.90 ± 0.02 | 0.79 ± 0.09 |
10K | 0.70 ± 0.004 | 0.88 ± 0.01 | 0.75 ± 0.09 |
5K | 0.70 ± 0.002 | 0.86 ± 0.02 | 0.78 ± 0.09 |
Comparison of Orbits and GIN. The GIN model demonstrates strong performance, achieving an AUC of 0.90 for smaller datasets (up to 20K addresses). However, GNN models are constrained to smaller datasets due to computational limitations. In contrast, orbit-based models are scalable and can incorporate temporal patterns by analyzing entire daily transaction networks.
Orbits on the Large Bitcoin Network
We evaluate the generalization of e-crime classification with a larger Bitcoin dataset, sampling 1% of the daily white addresses (3.16 million addresses). The labeled data includes:
- 1.28 million darknet market addresses
- 64,000 ransomware addresses
We employ a one-versus-rest Random Forest classifier with an 80/20 train/test split. Figure 5 shows consistent AUC performance exceeding 0.8, demonstrating the strength of orbit-based classifiers.
Active and Passive Orbits
We analyze the role of orbit types (active vs. passive) in detecting e-crime payments. The following table summarizes the macro-averaged AUC performance:
Class | Active Orbits AUC | Passive Orbits AUC |
---|---|---|
White | 0.731 | 0.870 |
Ransomware (RS) | 0.725 | 0.736 |
Darknet Market (DM) | 0.727 | 0.876 |
Passive orbits yield higher AUC values compared to active orbits. This result suggests that studying payer behaviors (e.g., victims in ransomware and buyers in darknet markets) is particularly effective for identifying e-crime addresses.

Adversarial Patterns of E-Crime
We analyze adversarial patterns within e-crime addresses, focusing on ransomware and darknet markets. Two primary patterns stand out:
Ransomware Address Patterns
Ransomware addresses primarily fall into two distinct orbit patterns:
- Immediate Forwarding: Payments received are immediately forwarded to another address in a subsequent transaction.
- Escrow Behavior: Payments remain in the receiving address without further transfers, mimicking historical escrow practices in darknet markets.
We observed that ransomware operators often reuse addresses to minimize transaction costs, which creates observable orbit patterns (e.g., Orbits 9 and 12). However, these patterns rarely connect to coin-mixing transactions, as shown in Figure 6.
Coin-Mixing Transactions
In coin-mixing transactions, orbits form more complex structures. We identify the following:
- Multi-Step Patterns: Addresses combine coins from multiple transactions before forwarding them to new addresses.
- Intermediary Addresses: Specific addresses act as intermediaries, breaking the link between input and output transactions.
Our analysis indicates that e-crime coins are first consolidated in specific addresses (Orbit 30 and 31) and subsequently mixed through intermediary steps. This makes identifying and tracking these addresses particularly challenging.

Address Behavior Insights
By analyzing orbits, we extract critical address behaviors:
- Ransomware payments are typically larger and exhibit forwarding behavior (Orbits 9, 12).
- Coin-mixing involves consolidation and multi-step transfers, which stand out in orbit-based analysis.
- White addresses rarely form the same complex structures, making anomalies easier to detect.
These insights highlight the utility of orbits in identifying adversarial address patterns and tracking e-crime behavior within blockchain networks.
Account Network and Chainlets
We extend our definition to account-based blockchains by using the temporal order. Unlike Bitcoin’s UTXO model, Ethereum uses an account-based model where transactions involve two addresses: one input address and one output address. Therefore, the concept of chainlets needs to be adapted to this model where 2-chainlets are not naturally defined. We define chainlet orbits for Ethereum by focusing on the temporal transaction patterns within the network.
Specifically, we look at the \( k = 3 \)-hop neighborhoods of nodes (accounts) in the Ethereum transaction graph to capture local substructures. This choice allows us to both apply the vanilla orbit definitions to the account network without modification and also standardize orbit roles across UTXO and account-based blockchains.
In order to apply the UTXO orbit models to the Ethereum network, we employ ordinary nodes as anchors to serve a role similar to that of a UTXO transaction node. An anchor node is an address involved in a specific transaction pattern within the 3-hop neighborhood.
Anchor 1. An anchor 1 is an address node that is part of an orbit initiating at least one transaction such that its output is used in two subsequent transactions.
Anchor 2. An anchor 2 node is an address node that is part of an orbit receiving coins from at least one other node and forwarding to at least one other address.

Once anchors 1 and 2 are defined, we extract 2-chainlets and compute orbits of address nodes. By analyzing these orbit patterns, we effectively capture the dynamic transaction behaviors characteristic of account-based blockchain networks such as Ethereum.
Limitations. We note that account-based transactions inherently involve simpler, direct transactions between two parties (one sender and one receiver). This simplicity restricts the complexity of the transaction graph and subsequently limits the depth of analysis that can be performed using orbits. Without the natural formation of complex chainlets as in UTXO models, it’s challenging to capture rich, multi-step transaction patterns.
Ethereum Results
We train an address classifier on Ethereum for four types of DeFi protocols: lending, assets, derivatives, and decentralized exchanges, as well as a class for ordinary (“normal”) addresses. The classification problem is significant because new protocols are introduced on the Ethereum blockchain daily, and existing protocols create new addresses. Users and blockchain data analytics companies scour offline resources to find the entity behind an address with the goal of tracking investment and portfolio decisions, detecting transaction anomalies, and identifying smart contract hacking attempts. Automated address classification alleviates the need for much manual work. Hence, address type classification has appeared as an important and challenging task in Graph Machine Learning.
It is important to note that Ethereum addresses, whether they belong to smart contracts or externally owned accounts, share the same format: a 42-character hexadecimal string beginning with "0x." This uniformity means that, when looking at the address string alone, there is no inherent way to differentiate between an address that represents a smart contract and one that represents an externally owned account.
The distinction between these two types of addresses is critical in understanding their behavior and capabilities on the Ethereum network. Smart contracts are autonomous programs with code execution capabilities, while externally owned accounts are controlled by private keys and are responsible for initiating transactions. Without additional context or metadata, simply examining the address string does not reveal this information.
Experimental Setup: We selected the daily graph of September 20, 2021, because it contained the highest number of protocol addresses in ether transactions. However, even for this day, the data imbalance is significant: only 89 protocol addresses appeared in over 1 million ether transactions. To address this, we applied a down-sampling strategy for the “normal” addresses. Due to the limited dataset of protocol addresses, we were constrained to use a binary classification scheme distinguishing between protocol and non-protocol addresses.
Results: We used a random forest algorithm with 300 trees on orbits, where the most important orbits identified by the Gini importance measure were orbits 0, 1, 2, 8, and 29. These orbits are all passive orbits used to receive coin payments from users.
The orbit model achieved an accuracy of 0.97, and the following figure shows an Area Under the Curve (AUC) of 0.86. Adding more ordinary nodes initially improves the classifier's performance but eventually deteriorates it. These results indicate that orbits are highly informative for Ethereum address classification, but additional labeled data, specifically unique addresses, is essential for further validation.

Conclusion
We have proposed a topological address embedding model, called chainlet orbits, that captures a node’s structural role in a blockchain graph. This approach results in compact and effective embeddings that can be easily integrated into node classification models, outperforming GNNs that can be trained for a small subset of the addresses only. Additionally, the orbit patterns can be visualized and used with interpretable models.
Our approach is applicable to graphs of both UTXO and account-based blockchains. The orbit discovery process is computationally cheap and scales to massive graphs easily; our approach embeds all daily nodes of the Bitcoin transaction network in less than 17 minutes.
Future Work: Future work will focus on multi-modal graphs where meta attributes of transactions can be incorporated as node features. Another promising area of research is developing self-supervised ways to learn node and edge functions from data to define new orbits.
For additional details and experimental results, please refer to our main paper. Thank you!