R package for efficient calculation of many-to-many pairwise distances on dual-weighted directed graphs, and for aggregation of flows throughout networks.

Three aspects. First, while other packages exist for calculating distances on directed graphs, notably `igraph`

, even this otherwise fabulous package does not (readily) permit analysis of *dual-weighted* graphs. Dual-weighted graphs have two sets of weights for each edge, so routing can be evaluated with one set of weights, while distances can be calculated with the other. A canonical example is a street network, where *weighted distances* are assigned depending on mode of transport (for example, weighted distances for pedestrians on multi-lane vehicular roads are longer than equivalent distances along isolated walking paths), yet the desired output remains direct, unweighted distances. Accurate calculation of distances on street networks requires a dual-weighted representation. In **R**, `dodgr`

is currently the only package that offers this functionality (without excessive data wrangling).

Second, while `igraph`

and almost all other routing packages are primarily designed for one-to-one routing, `dodgr`

is specifically designed for many-to-many routing, and will generally outperform equivalent packages in large routing tasks.

Third, `dodgr`

goes beyond the functionality of comparable packages through including routines to aggregate flows throughout a network, through specifying origins, destinations, and flow densities between each pair of points. Alternatively, flows can be aggregated according to a network dispersal model from a set of origin points and associated densities, and a user-specified dispersal model.

You can install `dodgr`

with:

```
install.packages("dodgr") # current CRAN version
# install.packages("remotes")
remotes::install_github("ATFutures/dodgr") # Development version
```

Then load with

```
library (dodgr)
packageVersion ("dodgr")
#> [1] '0.1.3.3'
```

`dodgr`

networksTo illustrate functionality, the package includes an example data set containing the Open Street Map network for Hampi, India (a primarily pedestrian village in the middle of a large World Heritage zone). These data are in Simple Features (`sf`

) format, as a collection of `LINESTRING`

objects. `dodgr`

represents networks as a simple rectangular graph, with each row representing an edge segment between two points or vertices. `sf`

-format objects can be converted to equivalent `dodgr`

representations with the `weight_streetnet()`

function:

```
class (hampi)
#> [1] "sf" "data.frame"
dim (hampi)
#> [1] 191 15
graph <- weight_streetnet (hampi, wt_profile = "foot")
class (graph)
#> [1] "data.frame" "dodgr_streetnet"
dim (graph)
#> [1] 5845 13
```

The `sf`

-format network contained 191 `LINESTRING`

objects, with the `weight_streetnet()`

function decomposing these into 5,845 distinct edges, indicating that the `sf`

representation had around 31 edges or segments in each `LINESTRING`

object. The `dodgr`

network then looks like this:

`head (graph)`

geom_num | edge_id | from_id | from_lon | from_lat | to_id | to_lon | to_lat | d | d_weighted | highway | way_id | component |
---|---|---|---|---|---|---|---|---|---|---|---|---|

1 | 1 | 339318500 | 76.47489 | 15.34169 | 339318502 | 76.47612 | 15.34173 | 0.1324422 | 0.1471580 | service | 28565950 | 1 |

1 | 2 | 339318502 | 76.47612 | 15.34173 | 339318500 | 76.47489 | 15.34169 | 0.1324422 | 0.1471580 | service | 28565950 | 1 |

1 | 3 | 339318502 | 76.47612 | 15.34173 | 2398958028 | 76.47621 | 15.34174 | 0.0088887 | 0.0098763 | service | 28565950 | 1 |

1 | 4 | 2398958028 | 76.47621 | 15.34174 | 339318502 | 76.47612 | 15.34173 | 0.0088887 | 0.0098763 | service | 28565950 | 1 |

1 | 5 | 2398958028 | 76.47621 | 15.34174 | 1427116077 | 76.47628 | 15.34179 | 0.0093265 | 0.0103628 | service | 28565950 | 1 |

1 | 6 | 1427116077 | 76.47628 | 15.34179 | 2398958028 | 76.47621 | 15.34174 | 0.0093265 | 0.0103628 | service | 28565950 | 1 |

The `geom_num`

column maps directly onto the sequence of `LINESTRING`

objects within the `sf`

-formatted data. The `highway`

column is taken directly from Open Street Map, and denotes the kind of “highway” represented by each edge. The `component`

column is an integer value describing which of the connected components of the network each edge belongs to (with `1`

always being the largest component; `2`

the second largest; and so on).

Note that the `d_weighted`

values are often greater than the geometric distances, `d`

. In the example shown, `service`

highways are not ideal for pedestrians, and so weighted distances are slightly greater than actual distances. Compare this with:

`head (graph [graph$highway == "path", ])`

geom_num | edge_id | from_id | from_lon | from_lat | to_id | to_lon | to_lat | d | d_weighted | highway | way_id | component | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

47 | 2 | 47 | 338905220 | 76.47398 | 15.31224 | 338907543 | 76.47405 | 15.31241 | 0.0197040 | 0.0197040 | path | 30643853 | 1 |

48 | 2 | 48 | 338907543 | 76.47405 | 15.31241 | 338905220 | 76.47398 | 15.31224 | 0.0197040 | 0.0197040 | path | 30643853 | 1 |

49 | 2 | 49 | 338907543 | 76.47405 | 15.31241 | 2398957585 | 76.47410 | 15.31259 | 0.0213917 | 0.0213917 | path | 30643853 | 1 |

50 | 2 | 50 | 2398957585 | 76.47410 | 15.31259 | 338907543 | 76.47405 | 15.31241 | 0.0213917 | 0.0213917 | path | 30643853 | 1 |

51 | 2 | 51 | 2398957585 | 76.47410 | 15.31259 | 338907597 | 76.47413 | 15.31279 | 0.0221521 | 0.0221521 | path | 30643853 | 1 |

52 | 2 | 52 | 338907597 | 76.47413 | 15.31279 | 2398957585 | 76.47410 | 15.31259 | 0.0221521 | 0.0221521 | path | 30643853 | 1 |

A `"path"`

offers ideal walking conditions, and so weighted distances are equal to actual distances.

The many-to-many nature of `dodgr`

means that the function to calculate distances, `dodgr_distances()`

, accepts two vectors or matrices of routing points as inputs (describing origins and destinations), and returns a corresponding matrix of pairwise distances. If an input graph has columns for both distances and weighted distances, the latter are used to determine the effectively shortest routes through a network, while actual distances returned are sums of actual distances along those routes.

Routing points can, for example, be randomly selected from the vertices of a graph. The vertices can in turn be extracted with the `dodgr_vertices()`

function:

```
v <- dodgr_vertices (graph)
head (v)
```

id | x | y | component | n | |
---|---|---|---|---|---|

1 | 339318500 | 76.47489 | 15.34169 | 1 | 0 |

2 | 339318502 | 76.47612 | 15.34173 | 1 | 1 |

4 | 2398958028 | 76.47621 | 15.34174 | 1 | 2 |

6 | 1427116077 | 76.47628 | 15.34179 | 1 | 3 |

8 | 339318503 | 76.47641 | 15.34190 | 1 | 4 |

10 | 2398958034 | 76.47650 | 15.34199 | 1 | 5 |

For OSM data extracted with the `osmdata`

package (or, equivalently, via the `dodgr::dodgr_streetnet()`

function), each object (vertices, ways, and high-level relations between these objects) is assigned a unique identifying number. These are retained both in `osmdata`

and `dodgr`

, as the `way_id`

column in the above `graph`

, and as the `id`

column in the vertices. Random vertices may be generated in this case through selecting `id`

values:

```
from <- sample (v$id, size = 20)
to <- sample (v$id, size = 50)
d <- dodgr_dists (graph = graph, from = from, to = to)
dim (d)
#> [1] 20 50
```

Alternatively, the points may be specified as matrices of geographic coordinates:

```
from_x <- min (graph$from_lon) + runif (20) * diff (range (graph$from_lon))
from_y <- min (graph$from_lat) + runif (20) * diff (range (graph$from_lat))
to_x <- min (graph$from_lon) + runif (50) * diff (range (graph$from_lon))
to_y <- min (graph$from_lat) + runif (50) * diff (range (graph$from_lat))
d <- dodgr_dists (graph = graph, from = cbind (from_x, from_y), to = cbind (to_x, to_y))
```

In this case, the random points will be mapped on to the nearest points on the street network. This may, of course, map some points onto minor, disconnected components of the graph. This can be controlled either by reducing the graph to it’s largest connected component only:

or by explicitly using the `match_points_to_graph()`

function with the option `connected = TRUE`

:

```
from <- match_points_to_graph (v, cbind (from_x, from_y), connected = TRUE)
to <- match_points_to_graph (v, cbind (to_x, to_y), connected = TRUE)
```

This function returns an index into the result of `dodgr_vertices`

, and so points to use for routing must then be extracted as follows:

```
from <- v$id [from] # or from <- v [from, c ("x", "y")]
to <- v$id [to]
d <- dodgr_dists (graph = graph, from = from, to = to)
```

Flow aggregation refers to the procedure of routing along multiple ways according to specified densities of flow between defined origin and destination points, and aggregating flows along each edge of the network. The procedure is functionally similar to the above procedure for distances, with the addition of a matrix specifying pairwise flow densities between the input set of origin (`from`

) and destination (`to`

) points. The following example illustrates use with a random “flow matrix”:

```
flows <- array (runif (length (from) * length (to)), dim = c (length (from), length (to)))
length (from); length (to); dim (flows)
#> [1] 20
#> [1] 50
#> [1] 20 50
f <- dodgr_flows_aggregate (graph = graph, from = from, to = to, flows = flows)
```

The result is simply the input `graph`

with an additional column quantifying the aggregate flows along each edge:

`head (f)`

geom_num | edge_id | from_id | from_lon | from_lat | to_id | to_lon | to_lat | d | d_weighted | highway | way_id | component | flow |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

1 | 1 | 339318500 | 76.47489 | 15.34169 | 339318502 | 76.47612 | 15.34173 | 0.1324422 | 0.1471580 | service | 28565950 | 1 | 0 |

1 | 2 | 339318502 | 76.47612 | 15.34173 | 339318500 | 76.47489 | 15.34169 | 0.1324422 | 0.1471580 | service | 28565950 | 1 | 0 |

1 | 3 | 339318502 | 76.47612 | 15.34173 | 2398958028 | 76.47621 | 15.34174 | 0.0088887 | 0.0098763 | service | 28565950 | 1 | 0 |

1 | 4 | 2398958028 | 76.47621 | 15.34174 | 339318502 | 76.47612 | 15.34173 | 0.0088887 | 0.0098763 | service | 28565950 | 1 | 0 |

1 | 5 | 2398958028 | 76.47621 | 15.34174 | 1427116077 | 76.47628 | 15.34179 | 0.0093265 | 0.0103628 | service | 28565950 | 1 | 0 |

1 | 6 | 1427116077 | 76.47628 | 15.34179 | 2398958028 | 76.47621 | 15.34174 | 0.0093265 | 0.0103628 | service | 28565950 | 1 | 0 |

An additional flow aggregation function can be applied in cases where only densities at origin points are known, and movement throughout a graph is dispersive:

`f <- dodgr_flows_disperse (graph = graph, from = from, dens = runif (length (from)))`

For more detail, see the main package vignette, along with a second vignette detailing benchmark timings, showing that under many circumstances, `dodgr`

performs considerably faster than equivalent routines from the `igraph`

package.