The Node Operator's Guide to the Lightning Galaxy, Part 2: Node Scoring and Pathfinding

by Bryan Vu

In the first post in this series, we discussed some of the requirements for becoming a routing node operator as well as some of the basic mechanics of setting up a routing node. In this post, we’ll first discuss channel connectivity and how the concept of “node scoring” can help routing nodes identify other “good” peers to open channels to and maintain channels with. The second topic is a high-level discussion of “pathfinding” and how individual payments are routed, along with suggestions for how a routing node operator can configure their node to become more attractive to Lightning routing algorithms.

Finding and attracting “good” peers


A node operator seeking to contribute to the Lightning Network and also seeking to earn fees should keep in mind that the purpose of the routing node network is to serve end users, merchants and service providers. Our expectation is that incentives will evolve such that those routing nodes that provide high-quality, fast and reliable routing will be rewarded with higher transaction forwarding volumes and greater fee revenue over time. One of the primary incentive systems that we’ve begun to implement in lnd is called node scoring, with related systems for making channel requests and deciding whether channel requests should be accepted. Note that the development of automated node scoring is still in the early phases, so at this point, routing node operators will have to perform much of this work manually. By explaining some of the high-level ideas and learning from the experience of the community, we hope to improve our thinking in this area as we further develop these tools.

Routing node scoring and channel selection


As discussed previously, lnd employs a system called “Autopilot,” which allows end-user nodes (such as the Lightning App) to select routing peers with which to open channels. With the release of our Lightning App at the beginning of the year, we started experimenting with “node scoring,” which is the process of gathering data about nodes and channels to identify those that seem more likely to be reliable routing nodes. We’re still very much in the early experimentation phase, but some of the factors that could potentially be used to calculate node scores include:

channel age - In general, a node with older channels is more likely to be a reliable router, particularly if those channels connect to other high-scoring nodes.

uptime - A node’s uptime can be estimated by observing the use of channel disable flags. Obviously, a good routing node must maintain high uptime.

channel sizes, node capitalization - All things being equal, a node with larger channels and more capital allocated is more likely to be able to route successfully than one with less bandwidth.

number of channels - Generally, a node with more connectivity would seem to be a better routing node choice. However, this is only the case if these channels are high-quality, reliable routing channels.

neighboring node scores - Those nodes that are connected to other high-scoring nodes are more likely to be reliable routers themselves. Ideally, this would be a self-reinforcing system in which “good” routing nodes that want to score highly will be selective about the nodes they allow to connect to them and remain connected to them with public channels (private channels aren’t taken into account in node scoring). If a node has poor routing peers in its neighborhood, its score is likely to be negatively impacted. (See the “channel acceptance policies” section below.)

proximity to common payment destinations - A node that is closer to useful destinations is likely to be more useful for routing, since it may reduce the number of hops (and hopefully the fees required) to execute payments.

fee rates - Nodes with reasonable fees are preferable for obvious reasons.

The current “node scoring” we’ve implemented (code-named “Bos Scores”) was designed for end-user nodes opening private channels, such as our App, rather than for routing nodes that are seeking to optimize their channel mix for routing volume and fee revenue. However, we’re in the process of extending the node scoring concept to create “routing node scores” as well. These routing node scores will use heuristics that are more relevant for opening public channels to other routing nodes. This system is very much in early development, but one component that has been merged into lnd allows a routing node operator to provide a ranked list of nodes for Autopilot to attempt connections with. We encourage routing node operators to use this feature to experiment with different heuristics for their own nodes. Feedback in this area is certainly welcome and will help inform further development in lnd.
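
To illustrate the mechanical shape of such a heuristic, here is a minimal sketch (in Python) that reads a graph snapshot exported with lncli describegraph and produces a naive ranked list of candidate peers. The weights and the scoring formula are arbitrary assumptions for illustration only and are not lnd’s actual scoring; a real heuristic would also consider channel age, neighbor scores, uptime history, and fee rates.

    # score_nodes.py - toy node ranking from a graph snapshot.
    # Produce the snapshot with: lncli describegraph > graph.json
    # The weights below are illustrative assumptions, not lnd's scoring.
    import json
    from collections import defaultdict

    with open("graph.json") as f:
        graph = json.load(f)

    capacity = defaultdict(int)   # total public capacity per node, in satoshis
    channels = defaultdict(int)   # public channel count per node
    disabled = defaultdict(int)   # channels flagged disabled (a rough uptime proxy)

    for edge in graph.get("edges", []):
        cap = int(edge.get("capacity", 0))
        for pub, policy in (("node1_pub", "node1_policy"), ("node2_pub", "node2_policy")):
            node = edge[pub]
            capacity[node] += cap
            channels[node] += 1
            if (edge.get(policy) or {}).get("disabled"):
                disabled[node] += 1

    def score(node):
        # Arbitrary blend: favor capital and connectivity, penalize disabled channels.
        uptime_proxy = 1.0 - disabled[node] / channels[node]
        return 0.5 * (capacity[node] / 1e8) + 0.1 * channels[node] + 1.0 * uptime_proxy

    for node in sorted(channels, key=score, reverse=True)[:20]:
        print(f"{node}  score={score(node):.3f}")

A ranked list like this is the kind of input the Autopilot hook described above is intended to accept.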

Channel acceptance policies


As the Lightning routing network evolves, we envision that routing nodes will become more selective about the channel requests they accept. The most basic reason for being selective about inbound channel requests is that a node operator should want capital to be actively deployed, facilitating transactions and earning fees. Allowing channels from low-quality peers that have low levels of activity can result in sub-optimal capital allocation and sub-optimal fee revenue.

To specify which types of nodes will be allowed to open channels to a node, in lnd v0.8-beta we’ve added a “channel acceptance policy” feature, which, combined with node scoring as described above, works as a sort of “reverse autopilot.” Instead of using node scores to determine which peers to extend channels to, the channel acceptance policy uses node scores to determine which peers to accept channel requests from.

Related to this, our current thinking for routing node scores is for them to be somewhat transitive, taking into account the quality and scores of routing peers that a node is publicly connected to (this will generally not apply to end users, merchants, service providers, etc. who use private channels). This transitivity will allow those nodes that have higher scores to attract more channel requests and more capital from other routing nodes, creating a further incentive to operate a reliable, well-capitalized node. Score transitivity will also give nodes an incentive to set their channel acceptance policies to be selective about which peers to accept public connections from.

This feature can require nodes making channel requests to meet certain node score thresholds, channel size thresholds, and other criteria before a channel open request will be accepted. Our further hope is that by creating incentives and controls for reliable routing nodes to connect together, end-user clients (e.g. mobile and desktop apps) will more easily be able to find well-connected sets of routing nodes, which should provide a faster, cheaper, and more reliable routing experience.
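
As a concrete (if simplified) example, the decision logic behind such a policy might look like the sketch below. The thresholds and the external peer score lookup are hypothetical placeholders; in practice this kind of logic would be wired into lnd’s ChannelAcceptor streaming RPC (discussed further below) rather than run standalone.

    # Toy channel-acceptance policy. Thresholds and the peer score source are
    # hypothetical; real deployments would plug this into the ChannelAcceptor RPC.
    MIN_CHANNEL_SAT = 1_000_000   # example: reject channels smaller than 0.01 BTC
    MIN_PEER_SCORE = 0.5          # example: reject peers below this externally computed score

    def accept_channel(node_pubkey, funding_amt_sat, peer_scores):
        """Return True if an inbound channel open request should be accepted."""
        if funding_amt_sat < MIN_CHANNEL_SAT:
            return False
        # Unknown peers default to a score of 0 and are rejected.
        return peer_scores.get(node_pubkey, 0.0) >= MIN_PEER_SCORE

    # A large enough channel from a high-scoring peer is accepted;
    # the same request from an unknown peer is not.
    scores = {"03aaa...": 0.9}
    print(accept_channel("03aaa...", 2_000_000, scores))  # True
    print(accept_channel("02bbb...", 2_000_000, scores))  # False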

Note that for the early node operators who are running and joining the network today, the process of evaluating inbound channel requests has been very manual and generally requires reactively closing channels after they’re opened. Before lnd v0.8-beta, lnd would accept any channel request above the configured minimum channel size, and that is still the default as of lnd v0.8-beta. In the near-term, we would recommend that node operators evaluate their public channels and close channels with peers that have poor uptime, poor connectivity, poorly balanced channels, etc. This will begin to accomplish what we expect more automated routing node scoring and channel acceptance policies in lnd will eventually provide.

Possible routing node lifecycle


The following section is an attempt to illustrate how concepts like node scoring and channel acceptance might interact in practice, as we follow the process of a routing node operator joining the network, optimizing their node, and then growing their node to process a higher and higher volume of Lightning payments.

Initial bootstrapping - a new routing node operator decides to join the Lightning Network. The “routing node autopilot” begins to make requests to other routing nodes based on their node scores, looking to create an initial set of channels that has some diversity across the network. Because this new node has a relatively low node score, most of the successful connections will also likely be made to other relatively new and small (in terms of BTC capitalization) routing nodes. (A side note is that this system also makes it more difficult to launch nuisance attacks on the network, since larger and more well-established routing nodes will be unlikely to accept channel requests from unknown or newer nodes.)

Initial inbound connections - initially, the new routing node will have outbound channels (initiated by the node), but in order to serve a larger number of users, the node will need to attract end users and other routing node operators to connect “inbound” channels. As a node maintains uptime and channel reliability over time, its node scores should improve, attracting inbound channels and capital from end users, merchants, and service providers (private channels) and other routing nodes (public channels). Early in its life, a routing node will likely only attract relatively small channels, but over time, as a node’s score continues to improve, it should hopefully attract larger, higher-volume channels.

Balancing funds flows - As a routing node becomes established in the network and begins routing traffic both outbound and inbound, patterns in the funds flows will likely emerge. (lndmon is one tool that can be used to monitor these patterns over time.) As mentioned above, ideally, a node will have a somewhat diverse set of users so that while some end users are spending from their channels, others are refilling their channels. Similarly, some connected merchants may be receiving payments while others are “cashing out” to exchanges or paying employees or suppliers. The more a routing node’s users are offsetting each other’s payment flows, the less often the node operator will have to rebalance channels using services like Lightning Loop (more detail below). In the case that some of a node’s channels have flows that are highly correlated, the node operator may choose to close some of those channels or adjust fees in order to pursue more naturally balanced flows.

During this initial establishment phase, as the routing node gets up-to-speed, it’s likely to have more channels opening and closing as the node operator experiments to find a profitable and balanced place in the network.

Pruning and managing channel acceptance - As our routing node protagonist becomes even more well-established, they may have more inbound channel interest than can be reliably served with the amount of capital the node operator has available. At this point, the node operator may choose to become more stringent about the nodes that will be allowed to connect (particularly with public channels). Channels that have poor uptime and low utilization should likely be disconnected or “pruned” and replaced with more reliable, higher-volume peers. Note that channel acceptance policies can be updated via the ChannelAcceptor streaming RPC, so they can be adjusted in real time without needing to restart lnd.

Adding capital - As a routing node builds its “reputation” and increases its node scores, an operator may decide to add capital to the node in the hope of earning more BTC. By this time, if the node has been faithfully routing and building volume, there will likely be a larger set of other higher-volume nodes willing to accept connections from it.

As the routing node operator adds capital, another thing to keep in mind is increasing security commensurate with the additional funds stored. Future posts in this series will provide more detail on some of these security considerations.

Moving up in the routing network - Over time, the process of adding capital, accepting higher-quality inbound connections, increasing node scores, routing more transactions and earning more fees can continue. As a node grows in volume, however, more capital, more time and more skill will likely be required. We hope that over time, as a thriving routing ecosystem emerges, there will be room for a large diversity of routing network participants, including hobbyists, professionals, and companies.

Pathfinding: being found and being added to paths


While the node scoring heuristics in lnd are designed to facilitate the process of channel creation, the “pathfinding” system in lnd is designed to find the most efficient route for each individual payment. In lnd, this subsystem is called Mission Control, and as of lnd v0.8-beta it now takes into account previous pathfinding successes and failures when evaluating routes for new payments. Importantly, because source routing is used in the Lightning Network, routing nodes are not directly involved in pathfinding for the vast majority of payments. However, routing node operators should be aware of the pathfinding process so as to be able to create, configure, and maintain channels in a way that makes them more attractive to pathfinding algorithms.

At a high level, an lnd node maintains a copy of the public channel graph of the Lightning Network. This copy of the graph includes channel parameters such as size, node IDs, fee rates, and timelocks. Using this channel graph, lnd employs a modified version of Dijkstra’s algorithm to find a set of paths between the source of a payment and the desired destination that meet general requirements for fee rates, timelocks, and other factors. lnd’s Mission Control will then attempt to send the payment to the destination while recording whether the payment failed or succeeded.

Initially, all of the channels in the graph are equally weighted by Mission Control, but over time and after each attempted payment, Mission Control learns more about which channels are reliable and which are not. Starting in lnd v0.8-beta, if a payment succeeds, the channels involved in that payment will be favored in subsequent payment attempts. Likewise, if a payment fails, the channels involved in the failed payment will be less likely to be chosen for future attempts. (To query the current state of Mission Control, the set of commands listed under lncli router -h can be used.)
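
The feedback loop can be illustrated with a toy example (this is not lnd’s actual Mission Control implementation, which is more sophisticated, e.g. amount-aware and time-decaying): keep a per-channel success estimate and nudge it toward each observed outcome.

    # Toy illustration of success/failure weighting in pathfinding.
    # Not lnd's Mission Control; it only shows the feedback loop described above.
    class ToyMissionControl:
        def __init__(self, prior=0.6, step=0.2):
            self.prob = {}        # channel_id -> estimated forwarding success probability
            self.prior = prior    # estimate used for channels with no history
            self.step = step      # how strongly each result moves the estimate

        def estimate(self, channel_id):
            return self.prob.get(channel_id, self.prior)

        def report(self, channel_id, success):
            current = self.estimate(channel_id)
            observed = 1.0 if success else 0.0
            self.prob[channel_id] = current + self.step * (observed - current)

    mc = ToyMissionControl()
    mc.report("chan-123", success=False)
    mc.report("chan-123", success=False)
    print(mc.estimate("chan-123"))  # below the prior, so this channel is now avoided
    print(mc.estimate("chan-456"))  # no history, keeps the prior estimate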

For routing node operators, the key thing to note is that being involved in more successful payments may increase routing traffic, and being involved in failed payments will generally reduce routing volume. Note that because of the way pathfinding in lnd works, payment success and failure impact pathfinding attractiveness for all of a node’s channels, not just the specific channels involved in a particular payment. This provides a further incentive for routing node operators to prefer high-quality peers and channels. Some of the factors involved with success and failure are discussed in more detail below.

Channel balancing


In our earlier post on routing, we discussed the general concept of channel balance. For a routing node to be reliable, channel balance should be managed so that payments can be reliably sent in either direction. Currently, a routing node operator has to manually check channels to ensure that a channel hasn’t become too unbalanced to send payments in both directions (use lncli listchannels or lncli fwdinghistory). If a channel has become unbalanced, the routing node operator can rebalance it by sending Lightning funds through the channel, or by using a service like Lightning Loop. More specifically, “Loop Out” allows a node to gain more capacity for inbound payments, while “Loop In” refills channels, increasing capacity for outbound payments. In some scenarios, a node operator can also close the channel and open a new channel to the same destination, though one of the goals of Lightning Loop is to be significantly cheaper than the cost of closing and reopening channels.
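
A quick way to spot lopsided channels is to compare the local and remote balances reported by lncli listchannels. The sketch below calls lncli via a subprocess and flags channels where either side is nearly depleted; the 50,000 sat threshold is an arbitrary example, and the JSON field names follow recent lnd releases.

    # Flag channels that can barely send (low local balance) or barely
    # receive (low remote balance). The threshold is an arbitrary example.
    import json
    import subprocess

    MIN_SIDE_SAT = 50_000  # warn if either side of a channel falls below this

    out = subprocess.run(["lncli", "listchannels"],
                         capture_output=True, text=True, check=True)
    for chan in json.loads(out.stdout)["channels"]:
        local = int(chan["local_balance"])
        remote = int(chan["remote_balance"])
        if local < MIN_SIDE_SAT:
            print(f"{chan['chan_id']}: low outbound capacity ({local} sats) - consider Loop In")
        elif remote < MIN_SIDE_SAT:
            print(f"{chan['chan_id']}: low inbound capacity ({remote} sats) - consider Loop Out")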

Ideally, a routing node will curate a large enough and diverse enough set of inbound and outbound peers such that flows will stay relatively balanced over time, making the need for active rebalancing fairly infrequent. In addition, larger channel sizes will give nodes additional “buffer” to protect against channel unbalancing, saving costs associated with rebalancing.

Fee setting


Currently, the default fees for payment forwarding in lnd specify a “base fee” of one satoshi (1,000 millisatoshis) as well as a “proportional fee” of 0.0001% of each payment. These values are intentionally set extremely low so that over time, routing node operators can experiment with different fee levels to get an understanding of what fee rates the market will bear as the network grows. Eventually, we believe that those nodes that are reliably operated, well-capitalized, and well-connected will be able to command significantly higher fees, but our plan is to let the market develop and converge on those values.

Once a routing node has gained consistent volume, we recommend experimenting with incrementally increasing fees, perhaps by 0.01% or so every few weeks, and determining whether routing volume appears to be impacted. Another parameter that can be experimented with in the context of fees is htlc_minimum_msat, which specifies the smallest payment that can be forwarded through a channel. By increasing this value and experimenting with the relationship between base fees and proportional fees, more profitable combinations could potentially be found. To raise fees or to change htlc_minimum_msat values, lncli updatechanpolicy can be used. Note that to set default fees or HTLC size requirements for new channels, the settings for those parameters should be added to lnd.conf.
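
As a concrete example of this kind of incremental experiment, the sketch below bumps the proportional fee on a single channel by invoking lncli updatechanpolicy from Python. The flag names follow recent lnd releases (check lncli updatechanpolicy --help on your version), the channel point is a placeholder, and the fee values are arbitrary examples; note that the fee rate is expressed as a fraction, so the 0.0001% default corresponds to 0.000001.

    # Raise the proportional fee on one channel from 0.0001% to 0.0002%.
    # Flag names follow recent lnd releases; values are arbitrary examples.
    import subprocess

    chan_point = "<funding_txid>:<output_index>"  # placeholder: the channel to update

    subprocess.run([
        "lncli", "updatechanpolicy",
        "--base_fee_msat", "1000",     # keep the default 1 sat base fee
        "--fee_rate", "0.000002",      # proportional fee as a fraction (0.0002%)
        "--time_lock_delta", "40",     # default CLTV delta (see the Timelocks section below)
        "--chan_point", chan_point,
    ], check=True)

Omitting the channel point applies the new policy to all of a node’s channels, so it’s usually safer to experiment on one channel at a time.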

Also useful in fee setting are lncli feereport and lncli fwdinghistory, which track fees earned over time. Finally, we’re working to add fee-related panes to lndmon.

Timelocks


Payments in the Lightning Network (represented as HTLCs) have timelocks associated with them that allow the nodes along the payment path to reclaim funds in adversarial situations, such as a node going offline before the HTLC is settled or a node broadcasting an invalid transaction state. The default time_lock_delta (in updatechanpolicy, see above) is 40 blocks, or approximately seven hours. In order to potentially become more attractive for pathfinding, a node operator who is more confident about being able to get a penalty or commitment transaction confirmed more quickly than the default number of blocks can reduce their timelock delta value. In addition, as general Bitcoin on-chain fees change, a node operator can adjust timelocks accordingly (increase CLTV if on-chain fees go up, decrease if on-chain fees go down.)

Changing CLTV delta values can be accomplished by using lncli updatechanpolicy. The network is still quite early in determining where these values may eventually fall, but we encourage active experimentation.

Channel updates


A further implication of the use of source routing in Lightning is that end-user nodes must synchronize and maintain enough of the channel graph to successfully route payments. Because of this, the frequency of updates to channel parameters (such as fees or timelocks set via updatechanpolicy) should be limited so as to make it more likely that end-user nodes will have up-to-date information. (Full synchronization isn’t strictly required, since if the payer attempts to route using stale channel data, up-to-date information will be returned and the attempt will be retried.) Relatedly, and tying back to the uptime and node scoring concepts discussed above, peers and channels that have poor uptime will also trigger channel updates, potentially increasing routing failures and lowering pathfinding scores. This is another reason for routing node operators to avoid unreliable peers.

Some Lightning implementations employ specific limits on the number of updates that will be “gossiped” to the rest of the network in a given time period as well, such as one update per day. A preference for low update frequency may eventually also be incorporated into node scoring algorithms.

Conclusion


In this post, we’ve touched on a few of the key concepts involved in finding peers and being found by pathfinding systems. Clearly, there’s a large amount of building and experimenting to be done in these areas, and we’ve included both thoughts about the future and some steps that current routing node operators can take to improve their routing performance. Even though some of these concepts aren’t immediately applicable, we think that by providing information about what we’re building toward, the early routing node community can further experiment and help us validate, invalidate, and improve as we add more routing features to lnd. In the next post in this “Routing Node Guide” series, we’ll be diving a bit deeper into routing node security, with a discussion of the risks involved in running a routing node as well as ways to secure nodes against those risks. As always, if you have questions you’d like to have answered in this series, please feel free to send suggestions to @bvu on Twitter. For other questions, please join the conversation in our Slack community.