Notes for an Orbis-esque Hackathon

On July 18-20, I will join several other “interested digital humanists with an inclination for coding” gathering at the University of Vienna to consider what a generic version of ORBIS: The Stanford Geospatial Network Model of the Roman Empire (ORBIS:Rome here) could or should consist of, and to begin creating it. Several people have expressed interest in beginning a “Silk Roads Orbis” as a first example. I will participate in this effort and I’ve written up these notes to clarify my own thoughts. As the front-end developer for ORBIS:Rome v1 and a witness to v2 development, I may have some helpful insights 🤷🏻‍♂️.

The ORBIS:Rome development team discussed this concept several years ago, referring to it as “Orbis-in-a-Box.” In a 2015 article, ORBIS:Rome’s principal developer, Elijah Meeks, wrote, “the sophistication of ORBIS, which is among the most complex pieces of geospatial information visualization on the web, makes it difficult to replicate” [2]. But the idea that some useful principles and methods could be drawn from that work led Elijah to develop some generic JavaScript-based route-finding functionality, using a D3 mapping layout he had developed (d3-carto-map) and a JavaScript implementation of the Dijkstra algorithm. He published two examples, demonstrating one-to-one (“simple”) and one-to-many (“network flooding”) cases.

Simulated routes over a network vs. “modeling travels”

The idea of Orbis-in-a-Box has been one guiding thought for this meeting, but the announcement for it is titled “Modeling Travels in History.” ORBIS:Rome was all about route simulation, whereas “modeling travels” suggests particular journey events. The time costs associated with network segments should be derived in part from historical accounts of actual travel, but in my view these are two very different undertakings.

Apart from ORBIS-like route simulation, I am keenly interested in modeling events of geographic movement (see the Linked Traces pilot application). In the conceptual model and schema I developed for that with Rainer Simon and Lex Berman, historical movement data falls in three categories: journeys, named historical routes, and flows. Journeys and flows are eventive data; historical routes are courses of travel taken routinely over time by unspecified travelers (sets of segments).

Breaking Down ORBIS:Rome for OiB

The following outlines some system functions and related components.

Functionality

The three principal functional categories in ORBIS:Rome are:

Route finding: one-to-one least cost path
Given two places (a.k.a. sites, nodes), calculate the least-cost path for a simulated journey between them across a multi-modal network of roads, paths, rivers, and maritime routes. Journey cost is the sum of weights for its network segments (edges). Segment weights are derived from distance and some combination of “friction” estimates drawn from modern topography and historical sources. Results include duration, distance, and, depending on available data, a derived monetary cost of transporting commodities or people along the route. Additional parameter choices for ORBIS:Rome include season of travel, transport mode (vehicle, animal, foot), and network modes (road, river, open sea, coastal).
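As an illustration of the one-to-one case, here is a minimal sketch of least-cost routing with Dijkstra's algorithm over a weighted edge list. It is not ORBIS:Rome code: the node names, the weights (imagined travel days), and the function name least_cost_path are all made up for the example, and edges are treated as directed.

```python
import heapq


def least_cost_path(edges, source, target):
    """Return (total_cost, [nodes]) for the cheapest path, or (inf, []) if none.

    `edges` is an iterable of (from_node, to_node, weight) tuples, directed.
    """
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))

    best = {source: 0.0}   # cheapest known cost to each node
    previous = {}          # back-pointers for rebuilding the path
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))

    if target not in best:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]


# Toy network; weights are invented travel days, not ORBIS values.
EDGES = [
    ("A", "B", 4.0),  # road
    ("B", "C", 3.0),  # road
    ("A", "C", 9.0),  # direct but slow road
    ("A", "D", 1.0),  # transfer to a port
    ("D", "C", 5.0),  # coastal sailing
]

print(least_cost_path(EDGES, "A", "C"))  # (6.0, ['A', 'D', 'C'])
```

Season, transport mode, and network mode would enter such a sketch only through which edges are passed in and which weight each one carries.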

Cartograms, contour maps, and regions: one-to-many paths
The k-dijkstra algorithm allows very rapid calculation of costs between one node and many others. Results – ‘fields’ of point values in a large area – can be used to create several geospatial analytic products. Cartograms substitute time or other cost for distance, and distort maps accordingly. Contour maps show bands of roughly equal values for some value associated with point locations; in this context, time or expense for travel between a given point and all others in an area can produce isochrone and isodapane maps, respectively. Given several “start” points, one can also compute regions with clustering and Voronoi algorithms.
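A rough sketch of the one-to-many idea: a single Dijkstra pass from one start node yields a cost for every reachable node, and those costs can then be binned into bands of the kind an isochrone map would draw. This uses ordinary single-source Dijkstra as a stand-in for the k-dijkstra routine named above; the function names and the toy data are assumptions for illustration only.

```python
import heapq


def costs_from(edges, source):
    """Cheapest cost from `source` to every reachable node (directed edges)."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))

    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best[node]:
            continue  # stale queue entry
        for neighbor, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return best


def isochrone_bands(costs, band_width):
    """Group nodes into bands: band 0 is [0, width), band 1 is [width, 2*width), ..."""
    bands = {}
    for node, cost in costs.items():
        bands.setdefault(int(cost // band_width), []).append(node)
    return {band: sorted(nodes) for band, nodes in sorted(bands.items())}


# Toy network; weights are invented travel days.
EDGES = [
    ("A", "B", 4.0),
    ("B", "C", 3.0),
    ("A", "C", 9.0),
    ("A", "D", 1.0),
    ("D", "C", 5.0),
]

COSTS = costs_from(EDGES, "A")
print(isochrone_bands(COSTS, band_width=3))  # {0: ['A', 'D'], 1: ['B'], 2: ['C']}
```

Drawing actual contours would additionally require interpolating these point values over space, which is outside this sketch.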

Application Programming Interface (API)
The internal API for ORBIS:Rome has not been documented for external access, but for OiB it could be. The number of allowed parameters may determine its usability. You can see which queries ORBIS:Rome generates for particular UI choices by using the “Network” panel in a browser’s developer tools. For example, calculating a route between Constantinople and Jerusalem with various parameter values looks like this:

http://orbis.stanford.edu/new_route.php?v=foot&m=7&s=50129&t=50213 \
&tr=0&ts=0&p=0&ml=road,coastal,upstream,downstream,overseas, \
ferry,self,ferry,transferc,transferf,transfero,transferr \
&el=999,99999

It returns a complex JSON object that front-end code parses for display.
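Since the API is undocumented, a first step toward documenting it could be simply pulling such a query apart. The sketch below uses only Python's standard library to split the example URL into its parameters; it deliberately does not interpret what the short parameter names mean, and the helper name route_params is my own.

```python
from urllib.parse import parse_qs, urlsplit

# The example query from the text, with the line continuations joined.
URL = (
    "http://orbis.stanford.edu/new_route.php?v=foot&m=7&s=50129&t=50213"
    "&tr=0&ts=0&p=0&ml=road,coastal,upstream,downstream,overseas,"
    "ferry,self,ferry,transferc,transferf,transfero,transferr"
    "&el=999,99999"
)


def route_params(url):
    """Return the query parameters as a dict; comma lists become Python lists."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    params = {key: values[0] for key, values in query.items()}
    for key in ("ml", "el"):  # the two parameters that carry comma-separated lists
        if key in params:
            params[key] = params[key].split(",")
    return params


PARAMS = route_params(URL)
for name, value in PARAMS.items():
    print(name, "=", value)
```

A table of these names with observed values for different UI choices would be a reasonable seed for external API documentation.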

Components

Network Data

  • Places (nodes). Point data for settlements and other related network nodes (e.g. coastal promontories).
  • Segments (edges). LineString geometry for paths; in ORBIS:Rome, principal segment (path) types include road, river, ferry, open sea, coastal. These are further subtyped to differentiate associated weights, which have been pre-calculated: e.g. river segments include upstream, downstream, fastup, and fastdown subtypes. There are also logical transfer segments to allow associating costs with switches between modes (e.g. road to ferry). The main edge table has 53,539 rows, for 7740 distinct node pairs. Arriving at plausible segment weights demands intensive historical research and/or geospatial analysis.
  • Segment weights. Segment costs can vary according to season and direction (upstream/uphill vs. downstream/downhill). We may wish to factor elevation change into weights. For ORBIS:Rome, historical sources indicating travel times between places were relatively sparse compared to what may be available for later periods.
  • Segment restrictions. In ORBIS:Rome, a number of segments are “restricted” – omitted from some calculations – in various circumstances, including “no go” in certain months, or not feasible for some transport modes.
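To make the relationship between segments, weights, and restrictions concrete, here is one possible in-memory representation. It is a sketch under stated assumptions, not the ORBIS:Rome schema: the class name Segment, its field names, and the idea of a per-month multiplier are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    source: str
    target: str
    kind: str                     # e.g. "road", "upstream", "open sea"
    base_days: float              # nominal traversal time
    month_factor: dict = field(default_factory=dict)  # month (1-12) -> multiplier
    closed_months: frozenset = frozenset()            # "no go" months
    modes: frozenset = frozenset({"foot", "animal", "vehicle"})

    def days(self, month):
        """Traversal time in the given month, applying any seasonal multiplier."""
        return self.base_days * self.month_factor.get(month, 1.0)


def usable_edges(segments, month, mode, kinds):
    """Filter out restricted segments and return (source, target, weight) tuples."""
    return [
        (s.source, s.target, s.days(month))
        for s in segments
        if s.kind in kinds and month not in s.closed_months and mode in s.modes
    ]


# Invented example data.
SEGMENTS = [
    Segment("A", "B", "road", 4.0, month_factor={1: 1.5}),
    Segment("B", "C", "road", 3.0, modes=frozenset({"foot", "animal"})),
    Segment("A", "C", "open sea", 2.0, closed_months=frozenset({12, 1, 2})),
]

print(usable_edges(SEGMENTS, month=1, mode="vehicle", kinds={"road", "open sea"}))
# [('A', 'B', 6.0)]
```

The filtered tuples are exactly what a route-finding function needs, so restrictions never have to be known to the routing code itself.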

Some data considerations

Temporality. Networks change over time, sometimes significantly. So do vehicle types and capabilities. Travel across the broadly defined terrestrial and maritime Silk Roads occurred over roughly 1500 years. The ORBIS:Rome network “broadly reflects conditions around 200 CE.” How should a generic OiB allow implementations to account for network change over time? With temporal attributes of nodes and edges? With discrete snapshot data subsets? Either?
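The two options need not be exclusive: if edges carry temporal attributes, discrete snapshots can be derived from them on demand. A minimal sketch, assuming each edge has hypothetical "start" and "end" years (negative for BCE) bounding its validity:

```python
# Invented edges with validity intervals in years (negative = BCE).
EDGES = [
    {"source": "A", "target": "B", "days": 4.0, "start": -100, "end": 300},
    {"source": "B", "target": "C", "days": 3.0, "start": 150, "end": 900},
    {"source": "A", "target": "C", "days": 9.0, "start": 600, "end": 1400},
]


def snapshot(edges, year):
    """Return the edges whose validity interval contains `year`."""
    return [e for e in edges if e["start"] <= year <= e["end"]]


def snapshots(edges, years):
    """Build one network snapshot per requested year."""
    return {year: snapshot(edges, year) for year in years}


for year, subset in snapshots(EDGES, [200, 700]).items():
    print(year, [(e["source"], e["target"]) for e in subset])
# 200 [('A', 'B'), ('B', 'C')]
# 700 [('B', 'C'), ('A', 'C')]
```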

Granularity. Data density for particular regions within a given OiB scope may be more or less sparse. This variation can affect results significantly. What should be best practice recommendations for this?

Resolution. Vector MultiLineString data can be created at various resolutions. The computed distance of a given path between two places will reflect how finely articulated the relevant segment linestrings are. Wide variation in resolution within one dataset can produce misleading results.
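The effect is easy to demonstrate. The sketch below measures the same notional path at two resolutions using the haversine great-circle formula; the coordinates are invented, and the spherical-earth radius of 6371 km is the usual approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation


def haversine_km(a, b):
    """Great-circle distance between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def line_length_km(coords):
    """Sum of great-circle distances along a LineString's vertices."""
    return sum(haversine_km(p, q) for p, q in zip(coords, coords[1:]))


# The same notional route, digitized with and without its intermediate bend.
DETAILED = [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0)]
SIMPLE = [DETAILED[0], DETAILED[-1]]

print(round(line_length_km(SIMPLE), 1), "km at low resolution")
print(round(line_length_km(DETAILED), 1), "km at higher resolution")
```

If segment weights are derived from length, two regions digitized at different resolutions will not be comparable without some normalization.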

Paths versus simple edges. Edge data for ORBIS:Rome is “geographically embedded.” That is, it reflects approximate courses of roads, rivers, etc. It would also be possible to use simple edges that are “partially embedded,” with geometry fixed only at start and end, and to assign weights reflecting distance (among other things), but not representing actual paths. Should this be an option?

Software package design

How will users implement Orbis-in-a-Box for their particular study region and period? What “user stories” or “use scenarios” should drive development?

Ideally (?) a user could a) clone an OiB repository, b) install relatively few dependencies on their system, c) make some edits to a configuration file for project-specific settings, including data source, d) run a local HTTP server on a specified port, and e) navigate to a GUI with a number of features that allow them to query and visualize their dataset. Once the application looks right and is giving proper results, it could be deployed to a public web server.
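For step (c), a project configuration might be as small as a JSON file merged over defaults. Everything here is hypothetical, since no OiB configuration format exists yet: the key names ("title", "data_path", "port", "modes") and the function load_config are placeholders for discussion.

```python
import json

# Hypothetical defaults; none of these keys are defined by any existing OiB spec.
DEFAULTS = {
    "title": "Untitled OiB project",
    "data_path": None,
    "port": 8000,
    "modes": ["foot"],
}


def load_config(text):
    """Merge project settings (a JSON string) over the defaults and validate them."""
    config = {**DEFAULTS, **json.loads(text)}
    if not config["data_path"]:
        raise ValueError("config must set 'data_path'")
    if not 1 <= int(config["port"]) <= 65535:
        raise ValueError("'port' must be between 1 and 65535")
    return config


EXAMPLE = '{"title": "Silk Roads Orbis", "data_path": "data/network.json", "port": 8080}'
CONFIG = load_config(EXAMPLE)
print(CONFIG["title"], "on port", CONFIG["port"])
```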

What are the features for a v1 of OiB? for a v2?

  • TBD


[1] Short for ORBIS: The Stanford Geospatial Network Model of the Roman Empire (http://orbis.stanford.edu)
[2] Cf. “Creating an Application like ORBIS” in https://onlinelibrary.wiley.com/doi/full/10.1002/bult.2015.1720410206

Orbis-in-a-Box

In recent days, a conversation has been renewed about the prospects for an “Orbis-in-a-Box” platform (OIB) for simulating historical movement across multi-modal transport networks. The idea holds great interest for me, and this post is a hasty “my two cents.”

Context

ORBIS: The Geospatial Network Model of the Roman Empire (ORBIS:Rome) was initially launched in May 2012 after something less than one year of intense development. A major upgrade to the site was completed in 2015 by its lead developer, Elijah Meeks. At initial launch, the number of visitors to the site wildly exceeded the project team’s expectations, and six years later there are still on average 8-9,000 distinct user sessions per month. Although there are peaks and lulls in traffic, that number remains remarkably consistent. Walter Scheidel was the project’s principal investigator, and several of his students made substantial contributions. My own role was developing the front end for Version 1 and serving as something like a ‘geographer sounding board.’ Complete credits are found in the “About” section of the site.

In the Fall of 2012, Walter Scheidel hosted a mini-conference at Stanford to discuss the results of the project and speculate about next steps to be taken, if any. At that meeting, and in many settings since, all of the project team have heard inquiries along the lines of, “how can I make an Orbis of _____?” I think it’s fair to say Walter, Elijah and I all thought that was a commendable goal, but there wasn’t the time or funding context for it.

Not that I didn’t try: in 2015 I led the authoring of an NSF proposal titled “The Orbis Initiative,” submitted by Stanford Libraries, that would have produced, effectively, a generic OIB platform. Although the proposal received high marks from reviewers, it fell short and we didn’t re-submit. In 2016-17, I turned my attention to another aspect of historical geographic movement: modeling journeys, named routes, and flows, that is, specific and aggregated events _occurring on_ the roads, rivers, and sea lanes of multi-modal transport networks. Lex Berman, Rainer Simon and I developed a temporal extension to GeoJSON (GeoJSON-T, now GeoJSON-LDT) and I built a pilot web app, Linked Places, to test it out against several kinds of data.

I still firmly believe an OIB platform would be used by many (?) historical scholars and be a valuable contribution. Apparently others feel the same way. As Maxim Romanov has recently suggested, maybe we can collectively take some steps in that direction, absent (for the moment) a big funding source.

The Use Scenario, aka User Story

We don’t have to make one up: the al-Ṯurayyā Project, led by Masoumeh Seydi (U Leipzig) and Maxim (U Vienna), could readily become a first user of OIB. But to state it in more generic terms:

A team of scholars, in the course of researching a particular region, period and themes, has developed a set of historical network data, and wishes to simulate movement along that network to better understand related events and historical processes of the study area. The data consists of named places (nodes) and route segments (edges, typically unnamed). Segments have been assigned costs associated with traversing them by various modes (e.g. vehicle types), possibly with seasonal variations. The costs are best estimates drawn from primary and secondary sources.

The team arranges their data in the format specified by the new OIB platform, downloads the OIB software from GitHub, and stands up an instance in their local development environment. They fill in several parameters in a configuration file specific to their project, including project title and data path, fire up a local web server, and navigate to a new graphical interface to their network. After making numerous adjustments to configuration parameters, and possibly some customizations to code, they deploy their OIB instance to a cloud server, route a domain name to it, and tweet out an invitation for people to use it.

Simple, no?

From Here to There

Some big obvious questions arise from this scenario, including: What functionality must the web interface have, and how generic can it usefully be? That is, what questions are being asked? What data will be required, and how readily can it be developed?

Regarding data, ORBIS:Rome required modeling maritime movement across the Mediterranean, part of the North Atlantic, and the Black Sea. This was enabled by Elijah’s inspired creation of a sea mesh. Assuming this method stands up as something to replicate, other parts of the world would need other meshes. Travel across larger bodies of water was heavily constrained by seasonal trade winds, as I’m learning from reading “Pathfinders: A Global History of Exploration,” so global meshes may be unnecessary.

Regarding functionality: in ORBIS:Rome, seasonal segment costs in terms of effort and denarii are “baked in.” Should OIB permit adjustment of these values by users (not only authors/publishers)?

How to Begin?

A few possibilities:

  • Launch an OIB Working Group of Pelagios Commons? The deadline for this year’s mini-grants is Wednesday(!)
  • Collectively decide upon a useful first phase of effort that can be realistically accomplished given time and money constraints.
  • Get that started at a hackathon, pre-work for which might include: surveys of existing ORBIS:Rome code (or not!)
    • The algorithms and data models of ORBIS:Rome are not well documented (costs extra, no time!); a pseudo-code representation of them might be a useful first step. Its PHP code could be ported to a more modern language/platform, and undoubtedly refactored. Fresh eyes from other developers would certainly lead to improvements.
    • A survey of the existing functions, followed by a group assessment of how generic they are, as well as priority v. effort ranking. The same for graphical elements and widgets.


Linked Paths

Fig. 1 – Linked Paths sandbox

Linked Paths is a sandbox web application for experiments in representing historical geographic movement: journeys, named routes (and route systems), and flows. The term path (synonymous with course) refers to the spatial-temporal setting for any of these. Linked Paths displays several exemplar datasets formatted as GeoJSON-T, my proposed temporal extension to the venerable GeoJSON.

The site features and functions bear some explanation, as they’re not all immediately apparent.

Historical geographic movement data:
Journeys, Routes, and Flows

Last fall, Lex Berman, Rainer Simon and I came up with draft conceptual and logical models of historical geographic movement, which are described in some depth in blog posts here and here. Briefly, we posit three classes of movement we wish to model.

Journeys

Fig. 2 – Seven journey, flow, and route datasets

Journeys are events—individual occurrences of one or more persons moving between two or more places over some period of time. Journeys are often typed according to purpose (pilgrimage, expedition, migration, march, Grand Tour, etc.) or mode of travel (voyage, flight). Spatial data for journeys always includes two or more places (i.e. an itinerary), normally ordered temporally. The actual paths traveled between places may be known, unknown, estimated, or ignored. Similar variation in completeness holds for temporal attributes as well: we might know the year(s) or decade(s) the journey took place, dates for some or all departures and arrivals, durations of segments, or simply sequence. Linked Paths depicts two pilgrimages from the 4th and 7th centuries, and a recent 5-month journey of my own I called “Roundabout.”

Named routes and route systems (hRoutes)

Routes are the named courses of multiple journeys known to have occurred over a period of time (notably, for trade and religious pilgrimage); they are differentiated from the physical media for those journeys (roads, rivers, etc.). That is, a route may comprise segments of multiple roads and rivers. Exemplar route data in Linked Paths are for Old World Trade Routes, Ming Dynasty Courier Routes, and the pilgrimage route described on the Vicarello Beakers. Other well-known route systems include the Silk Road, the Pilgrimage Routes to Santiago de Compostela, the Incense Route, and the Amber Routes.

Flows

Flows are aggregated data about journey events; that is, the movement of something at some magnitude over some period of time. The Incanto Trade flow example in Linked Paths aggregates data about the number of ships involved in 840 individual commercial voyages outward from Venice between the 13th and 15th centuries.

A map and data-dependent temporal visualization

Fig. 3 – Four types of temporal visualizations

Linked Paths consumes data in GeoJSON-T format and renders it on the fly to a web map and one of four kinds of temporal visualization depending on the nature of the data:

  1. a timeline of events (journeys)
  2. a timeline depicting a relevant period and its immediate context, drawn from PeriodO collections (where period is the only temporal information known)
  3. a histogram indicating the number of segments valid for a period (time-indexed trade routes)
  4. a histogram indicating magnitude of flows per period

The color for journey segments is scaled: earlier=lighter, later=darker

Linked Data

Fig. 4 – Place popup links to external gazetteer, segment search for connections

Place dialog popups include links to gazetteer APIs, including Pleiades, GeoNames, and the temporal gazetteer (TGAZ) of Harvard’s China Historical GIS.

Period timelines for Courier, Vicarello, and Bordeaux datasets are drawn dynamically from the PeriodO API, rendering the relevant period and adjacent neighbors from a collection.

Search

Query a union index of selected fields in all Place records from the 7 individual project gazetteers. Results are grouped by dataset, and leverage name variant data within Place records. For example, Dubrovnik and Ragusa are known to refer to the same place.

The “Find connections” link in place popups (Fig. 4) identifies segments associated with a given place across all 7 datasets.

GeoJSON-T

The GeoJSON-T format is a work-in-progress. Code and preliminary documentation are available at its GitHub repository.

Briefly, GeoJSON-T:

  • Permits adding an optional “when” object to Features in one of two locations
    • as a sibling to “geometry” in a Feature
    • as a sibling to “coordinates” in each member of a GeometryCollection
  • Leverages GeometryCollections for changing geometries over time (similarly to the HistoGraph project) and permits “properties” in GeometryCollection members
  • Will be processed by existing GeoJSON-compatible software, simply ignoring “when” objects and processing geometry and properties found in the standard places

A Case for GeoJSON-T

GeoJSON has become a popular standard format for representing geographic features in web mapping applications. It is supported by the key JavaScript libraries Leaflet, Mapbox, OpenLayers, and D3, and to some extent by desktop GIS software (QGIS, ArcMap). GitHub renders valid GeoJSON as simple maps, and web-based utility applications like geojson.io and GeoJSONLint help users create, edit and validate it.

As the name suggests, GeoJSON-T adds time to GeoJSON. Geographic features, defined broadly[1], include events we want to map and analyze (e.g. births, deaths, battles, journeys, publication). For many analyses and mapping tasks, the temporal attributes of geographic features are as important as their geometry. Furthermore, many non-eventive geographic features–settlements, polities, buildings, monuments, earthworks, archaeological finds and so on–have essential temporal attributes.

Figure 1 – Xuanzang’s 7c pilgrimage in Linked Places demo

It is hardly controversial that a great many natural and fictional phenomena have a relevant spatial and temporal coverage (cf. Dublin Core), or setting.[2] Shouldn’t the de facto standard for geographic feature data account for time?

It could be (and has been) argued that time can be added to a GeoJSON feature as a member of its “properties” element, organized however one sees fit. That is certainly true, and many have done so. At issue is whether there should be a simple, accepted standard location and format for temporal information within a GeoJSON Feature. If there were, a) new software, or new versions of existing software, could parse those temporal elements and render them to timeline visualizations[3], and b) data from multiple projects could be linked and analyzed by means of period assertions or computed “temporal topology” (e.g. Allen’s interval algebra[4]: equals, overlaps, before, after, starts, finishes, meets).
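To show what computed “temporal topology” could mean in practice, here is a small sketch that classifies a pair of intervals into one of the thirteen Allen relations. It assumes each interval is a (start, end) pair of comparable values with start < end, such as years; the function name and the example intervals are illustrative only.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a to interval b.

    Each interval is a (start, end) pair with start < end.
    """
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met-by"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Only partial overlaps remain at this point.
    return "overlaps" if s1 < s2 else "overlapped-by"


print(allen_relation((100, 200), (150, 300)))  # overlaps
print(allen_relation((100, 200), (200, 300)))  # meets
print(allen_relation((120, 130), (100, 200)))  # during
```

Fuzzy bounds such as Latest Start and Earliest End would need a richer comparison than this crisp one.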

How would this work?

The first conceptual step is a simple matter: wherever a “geometry” element is required in GeoJSON, an optional adjacent (sibling) “when” element is allowed. Existing software supporting GeoJSON would simply ignore these and function normally. New software, or new versions of existing software, would parse them and offer visualization and analytic functionality. In the Linked Places demo prototype, I render “when” elements to a timeline using the venerable if outdated Simile Timeline library, linked to the “geometry” elements rendered traditionally to a Leaflet map (Figure 1).

Developing a standard

It’s well and good to say, “wherever there’s a ‘geometry’ allow an optional ‘when’,” but the devil is in the details. What is required and allowed in that “when”? I’m not experienced at ringleading standards development; what I’ve done for starters is create a provisional standard for discussion, then made the aforementioned demo app as proof-of-concept. The “when” looks like this:

"when": {
  "timespans": [["-323-01-01","","","-101-12-31",
     "Hellenistic period"]],
  "duration": "?",
  "periods": [{
    "name": "Hellenistic Period",
    "period_uri": "http://n2t.net/ark:/99152/p0mn2ndq6bv"
  }],
  "follows": "<feature or geometry id>"
}

An explanation of each element:

When

Optional. A sibling of “geometry” in a Feature (a), or of “coordinates” in a member of a GeometryCollection (b)

(a)

{
"type": "FeatureCollection",
"features": [
 {
 "type": "Feature",
 "id": "",
 "properties": {},
 "geometry": {},
 "when": {}
 }
]
}

(b)

"geometry": {
 "type": "GeometryCollection",
 "geometries": [
   {
    "type": "LineString",
    "coordinates": [[93.867,40.35],[108.9423,34.26]],
    "when": {}
   }
 ]
}
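A consumer therefore has two places to look. The following sketch gathers every “when” object from a Feature, covering both placement (a) and placement (b); the function name collect_whens and the sample feature are my own, and software that does not know about “when” can simply read “geometry” and “properties” as usual.

```python
def collect_whens(feature):
    """Return all "when" objects on a Feature: the Feature-level one first,
    then those on GeometryCollection members, in document order."""
    whens = []
    if "when" in feature:
        whens.append(feature["when"])
    geometry = feature.get("geometry") or {}
    if geometry.get("type") == "GeometryCollection":
        for member in geometry.get("geometries", []):
            if "when" in member:
                whens.append(member["when"])
    return whens


# Sample feature using placement (b); the timespan values are invented.
FEATURE = {
    "type": "Feature",
    "id": "route-1",
    "properties": {},
    "geometry": {
        "type": "GeometryCollection",
        "geometries": [
            {
                "type": "LineString",
                "coordinates": [[93.867, 40.35], [108.9423, 34.26]],
                "when": {"timespans": [["0629", "", "", "0645", ""]]},
            }
        ],
    },
}

print(collect_whens(FEATURE))
```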

Timespans

Required. An array of one or more 5-part arrays, the positions of which are Start, Latest Start, Earliest End, End, Label. Of these, only Start is required. The first 4 positions accept any ISO-8601 temporal expression, with the ‘accepted convention’ of a minus sign for BCE years. Label is an optional short text string that would (presumably) appear alongside a visual representation of the timespan.
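A possible reading of one 5-part timespan, as a sketch: empty positions become None, Start is enforced, and the year is pulled out of a date string that may carry a leading minus sign. The key names and helper functions are assumptions, and the pattern handles only plain year, year-month, and year-month-day strings, not the whole of ISO-8601.

```python
import re

FIELDS = ("start", "latest_start", "earliest_end", "end", "label")
_DATE = re.compile(r"^(-?\d+)(?:-(\d{2}))?(?:-(\d{2}))?$")


def parse_timespan(timespan):
    """Map a 5-part timespan array onto named fields; blanks become None."""
    padded = list(timespan) + [""] * (len(FIELDS) - len(timespan))
    parsed = {name: (value.strip() or None) for name, value in zip(FIELDS, padded)}
    if parsed["start"] is None:
        raise ValueError("a timespan requires at least a Start value")
    return parsed


def iso_year(text):
    """Extract the (possibly negative) year from YYYY, YYYY-MM or YYYY-MM-DD."""
    match = _DATE.match(text.strip())
    if not match:
        raise ValueError(f"unsupported date expression: {text!r}")
    return int(match.group(1))


SPAN = parse_timespan(["-323-01-01", "", "", "-101-12-31", "Hellenistic period"])
print(SPAN["label"], iso_year(SPAN["start"]), iso_year(SPAN["end"]))
# Hellenistic period -323 -101
```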

Duration

Required. A null value indicates the phenomenon occurred (or was valid) throughout the feature’s Timespans. If it occurred only for some part of it/them, enter an integer followed by a single-letter code for the increment (d=days; m=months; y=years), or a “?” for an unknown duration. For example, a weeklong festival at some unknown time within a one-year timespan would be indicated as "duration": "7d"; a birth as (perhaps) "duration": "1d".

I anticipate timeline visualizations will find this distinction essential; a birth, for example, does not occur throughout a year.
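The three cases (null, a counted increment, or “?”) are simple to tell apart in code. A minimal sketch, with the function name and the returned dictionary shape chosen only for illustration:

```python
import re

_UNITS = {"d": "days", "m": "months", "y": "years"}


def parse_duration(value):
    """Interpret a GeoJSON-T "duration" value as described in the text."""
    if value is None:
        return {"kind": "throughout"}  # valid for the whole of the timespan(s)
    if value == "?":
        return {"kind": "unknown"}
    match = re.fullmatch(r"(\d+)([dmy])", value)
    if not match:
        raise ValueError(f"unrecognized duration: {value!r}")
    return {
        "kind": "partial",
        "amount": int(match.group(1)),
        "unit": _UNITS[match.group(2)],
    }


print(parse_duration("7d"))  # {'kind': 'partial', 'amount': 7, 'unit': 'days'}
print(parse_duration(None))  # {'kind': 'throughout'}
```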

Periods

Optional. An array of Period objects defined in an external period gazetteer (e.g. PeriodO), each with a “name” and a “period_uri” that can be dereferenced dynamically.

Follows

Optional. If the Feature or GeometryCollection member is in a meaningful sequence, enter the internal identifier of the element it follows here. Software indicating order or directionality visually or in lists will make use of these values if present.
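Software could recover the sequence by chaining these references. The sketch below assumes, purely for illustration, that each GeometryCollection member keeps its identifier in properties["id"] and its predecessor in when["follows"]; the draft spec does not fix where the identifier lives, so adjust the two small accessor functions as needed.

```python
def member_id(member):
    """Assumed location of a member's internal identifier."""
    return member["properties"]["id"]


def follows_id(member):
    """The identifier this member follows, or None for the first in sequence."""
    return member.get("when", {}).get("follows") or None


def order_by_follows(members):
    """Return members in sequence, starting from the one that follows nothing."""
    heads = [m for m in members if follows_id(m) is None]
    if len(heads) != 1:
        raise ValueError("expected exactly one member without 'follows'")
    successor = {follows_id(m): m for m in members if follows_id(m) is not None}
    ordered = [heads[0]]
    while member_id(ordered[-1]) in successor:
        ordered.append(successor[member_id(ordered[-1])])
    if len(ordered) != len(members):
        raise ValueError("'follows' chain is broken or branches")
    return ordered


# Invented members, deliberately out of order.
MEMBERS = [
    {"properties": {"id": "s3"}, "when": {"follows": "s2"}},
    {"properties": {"id": "s1"}, "when": {}},
    {"properties": {"id": "s2"}, "when": {"follows": "s1"}},
]

print([member_id(m) for m in order_by_follows(MEMBERS)])  # ['s1', 's2', 's3']
```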

Next Steps

I’d like to move the development of GeoJSON-T into a more formal process, but perhaps that should follow more informal discussion. A more detailed explanation of GeoJSON-T and its implementation for data about historical movement — journeys, flows and named routes — appears in the Topotime GitHub repo.

Please let me know your views on how we might proceed, by twitter (@kgeographer) or as a GitHub issue or preferably both. In the meantime, I will continue converting exemplar datasets into the provisional format outlined here, and developing software and utility scripts to manage, display, and even analyze it.

[1] A GIScience-ish definition for geographic features: “Phenomena on or near the earth surface for which location and other spatial attributes are integral for understanding and analysis.”

[2] An ontology design pattern for Setting was proposed in Grossner, K., Janowicz, K. and Keßler, C. (2016). The Place of Linked Data for Historical Gazetteers. In R. Mostern, H. Southall, and M.L. Berman (Eds.). Placing Names: Enriching and Integrating Gazetteers. Bloomington: Indiana University Press.

[3] As I have begun demonstrating with Linked Places work (http://topotime.org/linkedplaces)

[4] https://en.wikipedia.org/wiki/Allen's_interval_algebra

Linking Linked Places

Screenshot from demo web map/timeline app

NOTE: This project has been subsequently renamed, now titled “Linked Paths.”

A little context

The tag line for the Pelagios Commons web site is, “Linking the Places of our Past,” and that project is indeed facilitating the linking of historical place attestations published in digital gazetteers. From my perspective (and many others’), the initiative is going great, bravo!

There are other ways that places are or have been linked, and I’ve been plugging away at facilitating the representation and analysis of those connections in a couple of ways. The first was The Orbis Initiative, an ambitious and sadly unsuccessful NSF grant proposal to develop software and systems for extracting information about roads, rivers, canals, railways, and footpaths–and the places connected by them–from the million or so high-quality scans of historical maps. That data is of the physical channels (a.k.a. media, ways) used for the movement of people and goods across the earth surface. Although the grant wasn’t awarded, I’m happy to say a manageably-sized portion of the work it described was taken up by the CIDR team at Stanford University Libraries, just as I was leaving (amicably) in September. I expect fantastic results!

Since that work on geographic networks is in such good hands, I’ve begun to focus on the other side of that coin, the movement over such networks: individual journeys, named historical routes and route systems, and flows. I’m calling the project Linked Places (GitHub repository), and a mini-grant from Pelagios Commons has helped to jump-start it. It’s part of my larger DH/GIScience research frame, Topotime, which has a broad goal of joining Place and Period in data stores and software for historical research and education.

Enough context; this blog post is intended to describe the status of the Linked Places work products.

Linked Places Phase Two Status

I’ve described the goals of Linked Places and its early results in two blog posts on Pelagios Commons earlier this year (July and October respectively). In Phase One, Lex Berman and Rainer Simon joined me in clarifying a conceptual model for what we wanted to do, refining a provisional spec for a GeoJSON temporal extension (GeoJSON-T), then adapting the GeoJSON-T format for representing route data. We agreed on the term route for an overarching class encompassing journeys, flows, and historical routes and route systems (hRoutes). The conceptual model was then “expressed” in the GeoJSON-T form (Figures 1 and 2).

In Phase Two, I holed up in beautiful Ascoli Piceno to a) convert five exemplar data sets to a generic CSV form, b) write Python scripts to transform that CSV to GeoJSON-T and to populate an ElasticSearch index, and c) build a demo web map application that consumes GeoJSON-T data and puts it through some paces. That app, which mashes up a Leaflet/Mapbox map with a Simile Timeline, is not designed as such–it’s been thrown together for discussion about what real apps might be interesting. I will be presenting this now completed Phase Two work at the Linked Pasts workshop in Madrid, 15-16 December 2016.

Linked Places Work Products

GeoJSON-T

GeoJSON-T simply adds an optional “when” element to native GeoJSON. That “when” is typically placed at the same level as a “geometry” element (the “where”), which can appear in a couple of places: as a top-level attribute of a Feature (Figure 1), or, in the case of routes data, as a member of a GeometryCollection (Figure 2). The GeoJSON GeometryCollection is a relatively infrequently used construct, but is essential to how we represent journeys and hRoutes. There is some more explanation on the Github wiki.

Figure 1. Generic GeoJSON-T Feature, with “when” member in a FeatureCollection (simplified gazetteer record)


Figure 2. Route feature (featureType Journey); segments are geometries in GeometryCollection


Scripts

I’ve made the assumption that a large proportion of historical route data will be developed in spreadsheet or CSV format natively. Attributes and coding terminology will of course be distinct for every project that develops data. There’s nothing to stop anyone from creating GeoJSON-T route data from scratch, by whatever means, but if a researcher can rearrange their CSV data in a standard form, it can be converted and ingested automatically for use in the existing demo or future GeoJSON-T compatible applications.

At present, one would need to create two CSV files, one for places and one for route segments. The core fields, which are required but in some cases can have null values, are:

PLACES:

['collection', 'place_id', 'toponym', 'gazetteer_uri', 'gazetteer_label', 'lng', 'lat']

ROUTE SEGMENTS:

['collection', 'route_id', 'segment_id', 'source', 'target', 'label', 'geometry', 'timespan', 'duration', 'follows']

Following these, data files can have any number of further attributes/columns, which will appear in various ways within any given app. A complete accounting of these fields, and further details about data preparation and the Python conversion/ingestion scripts (csvToGeoJSON-T.py and elastic.py) will appear on the GitHub repository wiki soon. If you are anxious to play with this stuff before then (or afterwards), get in touch with me directly.
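Until that documentation appears, the general shape of the conversion can be sketched as follows. This is not the csvToGeoJSON-T.py script: it only uses the segment column names listed above, and the cell formats are assumptions made for the example (a “geometry” cell holding "lon lat, lon lat" pairs and a “timespan” cell holding "start/end").

```python
import csv
import io

# Invented sample data; the cell formats are assumptions, not the project's spec.
SEGMENTS_CSV = '''collection,route_id,segment_id,source,target,label,geometry,timespan,duration,follows
demo,r1,s1,p1,p2,segment 1,"93.867 40.35, 108.9423 34.26",0629/0645,,
'''


def segments_to_feature(rows):
    """Build one GeoJSON-T route Feature from the segment rows of a single route."""
    geometries = []
    for row in rows:
        coords = [[float(n) for n in pair.split()] for pair in row["geometry"].split(",")]
        start, _, end = row["timespan"].partition("/")
        geometries.append({
            "type": "LineString",
            "coordinates": coords,
            "properties": {
                "segment_id": row["segment_id"],
                "source": row["source"],
                "target": row["target"],
                "label": row["label"],
            },
            "when": {
                "timespans": [[start, "", "", end, row["label"]]],
                "duration": row["duration"] or None,
                "follows": row["follows"] or None,
            },
        })
    return {
        "type": "Feature",
        "id": rows[0]["route_id"],
        "properties": {"collection": rows[0]["collection"]},
        "geometry": {"type": "GeometryCollection", "geometries": geometries},
    }


ROWS = list(csv.DictReader(io.StringIO(SEGMENTS_CSV)))
FEATURE = segments_to_feature(ROWS)
print(FEATURE["geometry"]["geometries"][0]["coordinates"])
# [[93.867, 40.35], [108.9423, 34.26]]
```

Any further columns in a project's CSV could be copied into each segment's "properties" in the same loop.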

Linked Places Demo App

The GeoJSON-T format and its implementation for route data allows for some interesting display and analysis possibilities. The app so far only explores the visualization side. I’m planning to follow up this work with at least two “real” applications that do more: one for data exploration and discovery across a large distributed corpus/repository, and a second that allows manipulation and analysis of a given network of geographic movement (e.g. commodity flows like Incanto Trade, or route systems like the Ming Courier Routes). I’ve identified a few other exemplar datasets and welcome inquiries for collaboration.

Features

Load one or more datasets; view linked gazetteer records for places; events or optionally “fuzzy” periods rendered on timeline

Linked Places screenshot 01

Search for a Place; identify all members of its “conflation_of” set and all route segments associated with it, from multiple datasets

Linked Places screenshot 02

Rudimentary timeline visualization (Simile Timeline); timeline and map features are linked

Linked Places screenshot 03

Load places and segments for flows and hRoute systems (nodes and links/edges) into D3 force-directed graph; download GeoJSON-T

screen capture, D3 graph visualization

View linked Place gazetteer data (Pleiades, TGAZ, Geonames)

lp-features_06

View linked Period gazetteer data (from Perio.do)

lp-features_05

Summary

The results of this work (a conceptual model for routes covering journeys, flows and historical routes/route systems; the GeoJSON-T extension; its implementation for route data and reliance on CSV input; and last but not least the map/timeline mashup) are all provisional and experimental. The models have been tweaked (‘refined’) as requirements come to light, and that should continue for at least a little while longer. I welcome comments here, on Twitter (@kgeographer), via the project GitHub repo, or by email: karl[dot]geog[at]gmail[dot]com.

The Orbis Initiative: A Pelagios for Networks? [Take 2]

NOTE: This is a “refresh” of the earlier post of the same title, edited to reflect some new terminology (indicated by red) and to replace the conceptual model figure.

data-triptych

A small sampling of historical network datasets

I believe there would be widespread interest in a global, collaboratively developed system, organized similarly to Pelagios, aimed at creating and linking data records for attested historical journeys (e.g. itineraries, and flows of people, commodities, information, correspondence) and ways (roads, rivers, canals, sea currents). In this provisional semantics, a journey is evidence of some person(s) or thing(s) moving from here to there (then there, etc.), at a known, approximate or estimated time and/or in a particular sequence, as attested in some source. A way is the physical medium for journeys.

Both journeys and ways can be represented as two or more places and one or more segments (nodes and edges in network parlance). Place nodes are necessarily “geographically embedded” and typically represented by feature centroids. The geometry of ways between nodes for various types of journeys may be known, estimated, or in the case of some flow data, of no concern.
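To make the nodes-and-edges reading concrete, here is a minimal Python sketch of one possible in-memory representation. The class names, field names and sample coordinates are hypothetical and are not drawn from any Orbis Initiative schema; the only points carried over from the text are that places have a location, segments connect two places, and segment geometry is optional.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Place:
    place_id: str
    lng: float  # centroid longitude
    lat: float  # centroid latitude


@dataclass
class Segment:
    source: str  # place_id of one end
    target: str  # place_id of the other end
    geometry: Optional[list] = None  # known or estimated course; None if of no concern


@dataclass
class Network:
    places: dict = field(default_factory=dict)
    segments: list = field(default_factory=list)

    def add_place(self, place):
        self.places[place.place_id] = place

    def add_segment(self, segment):
        # Every edge must connect two geographically embedded nodes.
        for end in (segment.source, segment.target):
            if end not in self.places:
                raise KeyError(f"unknown place: {end}")
        self.segments.append(segment)

    def neighbors(self, place_id):
        """Place ids one segment away, treating segments as undirected."""
        found = set()
        for s in self.segments:
            if s.source == place_id:
                found.add(s.target)
            elif s.target == place_id:
                found.add(s.source)
        return found


# Made-up sample data, for illustration only.
net = Network()
for pid, lng, lat in [("p1", 10.0, 45.0), ("p2", 12.5, 44.0), ("p3", 14.0, 42.0)]:
    net.add_place(Place(pid, lng, lat))
net.add_segment(Segment("p1", "p2", geometry=[[10.0, 45.0], [11.0, 44.8], [12.5, 44.0]]))
net.add_segment(Segment("p2", "p3"))  # e.g. a flow with no recorded geometry
```

Keeping geometry optional on the segment, rather than requiring it, is what lets the same structure hold a surveyed road and a commodity flow side by side.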

Historical gazetteers in the Pelagios ecosystem represent only named places. Most are point-like features (e.g. settlements, sites); increasingly, polygonal features are included as well (e.g. regions, administrative areas). But what of historical movement—journeys between named places along ways? The simple data models used for the Pelagios interchange format and for most gazetteers do not accommodate journeys and ways.

Not surprisingly, the first early geographic document geo-parsed in Pelagios’ Recogito tool describes an itinerary: “Itinerarium Burdigalense: the Itinerarium Burdigalense (or Bordeaux Itinerary) […] a travel document that records a Pilgrim route between the cities of Bordeaux and Jerusalem.” Although we know each attested place was part of a traveled route, by virtue of its association with a text having “itinerarium” in its title, those relationships are not recorded formally in gazetteers, and therefore not readily discoverable and analyzable as routes and components of networks.

The Orbis Initiative

In February 2015 I submitted a proposal to the National Science Foundation for a fairly large grant ($1.6m over 3 years) to develop the Orbis Initiative. Although reviews were quite positive, it was not funded. The project was designed to facilitate the creation, archiving, discovery, linking, and analysis of historical geospatial network data for “everywhere and every when” [1-page summary]. The project name was borrowed from an interactive scholarly web application I helped build, originally published by Stanford University Libraries in 2012 and significantly upgraded in 2014, ORBIS: The Stanford Geospatial Network Model of the Roman Empire (hereafter, ORBIS: Rome).

Whereas ORBIS: Rome is a model of travel and transport for a particular region and period, aimed at answering the research questions of one Classical scholar, Walter Scheidel, and built by Scheidel and Elijah Meeks, the Orbis Initiative would instead be a system for creating, storing, and linking geospatial network data spanning potentially all places and periods: a distributed repository along with a set of relatively simple interactive web-based tools to facilitate its use. The design and proposed development of the Orbis Initiative are a response to researchers who have expressed a desire to build ORBIS: Rome-like applications for their own areas and periods of study. Importantly, the intent is not to expand the ORBIS: Rome network transport model, but to provide a generic data infrastructure and tools to facilitate development of other models and modeling approaches.

I remain convinced this would be a worthwhile undertaking. Two opportunities have since emerged to begin some of the work described in the grant proposal, at a much smaller initial scale; I’ll discuss one of them here.

A Community of Interest?

Writing the Orbis Initiative grant entailed recruiting collaborators with varied exemplar datasets being developed for ongoing research. Several of those projects are concerned with processes of cultural diffusion and commercial activity, separately and in concert, in East and Central Asia and between Asia and Europe over extended periods. Their aggregated temporal extent is 7th century BCE to 16th century CE. Researchers in those groups, and now a few others, have indicated an immediate pragmatic interest in exposing and linking their data for common benefit. Meetings to discuss next steps have begun.

Something Like Pelagios

An Orbis Initiative would replicate several aspects of the Pelagios Project, which has gained terrific momentum in developing online resources, methods and software for linking historical gazetteers. I believe Pelagios’ success is due in large part to its “ground-up” nature—the fact it answers some immediate requirements of a distinct community of interest for the Classical Mediterranean. Its spatial and temporal extents and software tool development scope are growing organically, expanding upon smallish proofs-of-concept that people find useful. Tools developed so far facilitate data creation (Recogito) and data discovery (Peripleo). The Pelagios approach offers a stark contrast with some “build it and they will come” data repository projects attempted in recent years.

In the same vein, a pragmatic start to an Orbis Initiative could be seeded by meeting the requirements of the above-mentioned community of interest to link (and in a sense gather) their historical geospatial network data: connections by road, river, canal, and sea route between the places attested in Pelagios-compatible gazetteers.

A Conceptual Model

So, networks of journeys and flows are different in kind from place locations as commonly understood, and as such require a different, somewhat more elaborate data model. Furthermore, while all spatial data may include temporal attributes, some network data—itineraries for example—are inherently temporal; in fact they are events. Flows are essentially aggregated movement events.

In my experience a helpful first step in data modeling is to create a conceptual model of the entities and relations of what is being represented (an ontology design pattern, if you will). This is typically a collaborative undertaking, and the resulting visualization provides a basis for the data schemas to follow, be they relational or graph. I’ve taken a second stab at such a model, borrowing a bit from a recently published trajectory pattern (Hu et al. 2013); input is invited and essential.

journey-way-concepts_construction

Data Format

The GeoJSON data format is in common use and provides a good starting point for a standardized representation of trajectories and paths. Much data is initially gathered in spreadsheets, but by and large, if it is to be mapped or analyzed spatially, it makes its way into human-readable GeoJSON or the binary shapefile format. GeoJSON represents geographic Features in a FeatureCollection, and spatial attributes are represented in a required Geometry object, but time is not accounted for natively. Although temporal attributes of a Feature can be recorded as one or more of a Feature’s Properties, there is no norm or best practice for this, and mapping software that consumes GeoJSON does not typically look for or make use of temporal attributes.

This can potentially be remedied by an extension to GeoJSON, such as the Topotime format I’ve been developing. Topotime data is valid GeoJSON, but it includes a new, optional When object, and leverages the sparingly used GeometryCollection object that is found in the GeoJSON specification.
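The general idea can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Topotime format itself: the placement of the “when” member on the Feature, the start and end keys inside it, the sample coordinates and years, and both helper functions are hypothetical. The one point taken from the GeoJSON specification (RFC 7946) is that extra, non-standard members are permitted, so a Feature carrying one is still GeoJSON.

```python
import json

# Hypothetical journey segment. The "when" member and its keys are
# illustrative assumptions, not the published Topotime vocabulary.
feature = {
    "type": "Feature",
    "geometry": {
        "type": "GeometryCollection",
        "geometries": [
            {"type": "LineString",
             "coordinates": [[10.0, 45.0], [12.5, 44.0]]},
        ],
    },
    "properties": {"label": "A to B"},
    "when": {"start": 1200, "end": 1250},
}

# Members that RFC 7946 defines for a Feature object.
STANDARD_MEMBERS = {"type", "geometry", "properties", "id", "bbox"}


def as_plain_geojson(feat):
    """What a time-unaware consumer effectively uses: standard members only."""
    return {k: v for k, v in feat.items() if k in STANDARD_MEMBERS}


def overlaps(feat, start, end):
    """True if the feature's assumed when-interval intersects [start, end].

    Features without a "when" member never match a temporal query here;
    that is a design choice of this sketch.
    """
    when = feat.get("when")
    if when is None:
        return False
    return when["start"] <= end and start <= when["end"]


plain = as_plain_geojson(feature)
round_tripped = json.loads(json.dumps(feature))
```

Time-aware software could call something like overlaps to filter features for a timeline, while existing mapping software would simply ignore the extra member and draw the geometry.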

One of my tasks at hand—which I welcome collaborative input on—is testing the efficacy of the Topotime model for the several types of historical geospatial network data found in the wild. I’ve begun posting some sample data to the Orbis Initiative GitHub repo.

The Basics of Topotime

Topotime was initially conceived as a means for representing historical temporal data that is vague and otherwise uncertain, for visualization in browser timeline software and for the analysis of probabilistic relationships between and amongst events and periods.

The goals of the Topotime project have recently both broadened and simplified considerably—it is now aimed at extending the GeoJSON format to account for time (including some of the difficult historical cases), without breaking GeoJSON. That is, Topotime data would be recognized as GeoJSON by any software that supports GeoJSON. The work-in-progress described on the Topotime repo is now a little behind samples I’m pushing to the Orbis Initiative repo (kgeographer/oi).

I am working through varied and more complex data examples. When a suitable data format is settled, I’ll write some basic software that accesses Topotime’s unique attributes to browse and search several exemplar datasets.

The following is a snippet to give a sense of it:

topotime-snippet

Next Steps

This effort does not have institutional support at this time, but if enough people feel it’s worth pursuing, we should seek it. UPDATE: A small group of colleagues and I will be submitting grant proposals soon.

As mentioned above, a small group representing several active research projects focused on Asian maritime and land routes will be meeting soon to assess whether Topotime or something like it is appropriate for a “Pelagios for Networks.” We will make our results public for discussion, through this blog and the Pelagios Linked Past SIG forum. More later…and comments are welcome.

The Orbis Initiative: a Pelagios for Networks?

data-triptych

A small sampling of historical network datasets

I believe there would be widespread interest in a global, collaboratively developed system, organized similarly to Pelagios, aimed at creating and linking data records for attested historical trajectories (e.g. itineraries, routes, commercial flows, correspondence) and paths (roads, rivers, canals, sea currents). In this provisional semantics, a trajectory is evidence of some person(s) or thing(s) moving from here to there (then there, etc.), at a known, approximate or estimated time and/or in a particular sequence, as attested in some source. A path is the physical medium for trajectories.

Both paths and trajectories can be represented as two or more places and one or more segments (nodes and edges in network parlance). Place nodes are necessarily “geographically embedded” and typically represented by feature centroids. The geometry of paths between nodes for various types of trajectories may be known, estimated, or in the case of some flow data, of no concern.

Historical gazetteers in the Pelagios ecosystem represent only named places. Most are point-like features (e.g. settlements, sites); increasingly, polygonal features are included as well (e.g. regions, administrative areas). But what of historical movement—trajectories between named places along paths? The simple data models used for the Pelagios interchange format and for most gazetteers do not accommodate trajectories and paths.

Not surprisingly, the first early geographic document geo-parsed in Pelagios’ Recogito tool describes an itinerary: “Itinerarium Burdigalense: the Itinerarium Burdigalense (or Bordeaux Itinerary) […] a travel document that records a Pilgrim route between the cities of Bordeaux and Jerusalem.” Although we know each attested place was part of a traveled route, by virtue of its association with a text having “itinerarium” in its title, those relationships are not recorded formally in gazetteers, and therefore not readily discoverable and analyzable as routes and components of networks.

The Orbis Initiative

In February 2015 I submitted a proposal to the National Science Foundation for a fairly large grant ($1.6m over 3 years) to develop the Orbis Initiative. Although reviews were quite positive, it was not funded. The project was designed to facilitate the creation, archiving, discovery, linking, and analysis of historical geospatial network data for “everywhere and every when.” The project name was borrowed from an interactive scholarly web application I helped build, originally published by Stanford University Libraries in 2012 and significantly upgraded in 2014, ORBIS: The Stanford Geospatial Network Model of the Roman Empire (hereafter, ORBIS: Rome).

Whereas ORBIS: Rome is an authored model of travel and transport for a particular region and period aimed at answering the research questions of one Classical scholar (Walter Scheidel), the Orbis Initiative is instead a system for creating, storing, and linking geospatial network data spanning potentially all places and periods: a distributed repository along with a set of relatively simple interactive web-based tools to facilitate its use. The design and proposed development of the Orbis Initiative are a response to researchers who have expressed a desire to build ORBIS: Rome-like applications for their own areas and periods of study. Importantly, the intent is not to expand the ORBIS: Rome network transport model, but to provide a generic data infrastructure and tools to facilitate development of other models and modeling approaches.

I remain convinced this would be a worthwhile undertaking. Two opportunities have since emerged to begin some of the work described in the grant proposal, at a much smaller initial scale; I’ll discuss one of them here.

A Community of Interest?

Writing the Orbis Initiative grant entailed recruiting collaborators with varied exemplar datasets being developed for ongoing research. Several of those projects are concerned with processes of cultural diffusion and commercial activity, separately and in concert, in East and Central Asia and between Asia and Europe over extended periods. Their aggregated temporal extent is 7th century BCE to 16th century CE. Researchers in those groups, and now a few others, have indicated an immediate pragmatic interest in exposing and linking their data for common benefit. Meetings to discuss next steps have begun.

Something Like Pelagios

An Orbis Initiative would replicate several aspects of the Pelagios Project, which has gained terrific momentum in developing online resources, methods and software for linking historical gazetteers. I believe Pelagios’ success is due in large part to its “ground-up” nature—the fact it answers some immediate requirements of a distinct community of interest for the Classical Mediterranean. Its spatial and temporal extents and software tool development scope are growing organically, expanding upon smallish proofs-of-concept that people find useful. Tools developed so far facilitate data creation (Recogito) and data discovery (Peripleo). The Pelagios approach offers a stark contrast with some “build it and they will come” data repository projects attempted in recent years.

In the same vein, a pragmatic start to an Orbis Initiative could be seeded by meeting the requirements of the above-mentioned community of interest to link (and in a sense gather) their historical geospatial network data: connections by road, river, canal, and sea route between the places attested in Pelagios-compatible gazetteers.

A Conceptual Model

So, networks of trajectories are different in kind from place locations as commonly understood, and as such require a different, somewhat more elaborate data model. Furthermore, while all spatial data may include temporal attributes, some network data—itineraries for example—are inherently temporal; in fact they are events. Flows are essentially aggregated movement events.

In my experience a helpful first step in data modeling is to create a conceptual model of the entities and relations of what is being represented (an ontology design pattern, if you will). This is typically a collaborative undertaking, and the resulting visualization provides a basis for the data schemas to follow, be they relational or graph. I’ve taken a first stab at such a model, using a recently published trajectory pattern (Hu et al. 2013) as a point of departure; input is invited and essential.

path-trajectory-concepts_v2

Data Format

The GeoJSON data format is in common use and provides a good starting point for a standardized representation of trajectories and paths. Much data is initially gathered in spreadsheets, but by and large, if it is to be mapped or analyzed spatially, it makes its way into human-readable GeoJSON or the binary shapefile format. GeoJSON represents geographic Features in a FeatureCollection, and spatial attributes are represented in a required Geometry object, but time is not accounted for natively. Although temporal attributes of a Feature can be recorded as one or more of a Feature’s Properties, there is no norm or best practice for this, and mapping software that consumes GeoJSON does not typically look for or make use of temporal attributes.

This can potentially be remedied by an extension to GeoJSON, such as the Topotime format I’ve been developing. Topotime data is valid GeoJSON, but it includes a new, optional When object, and leverages the sparingly used GeometryCollection object that is found in the GeoJSON specification.

One of my tasks at hand—which I welcome collaborative input on—is testing the efficacy of the Topotime model for the several types of historical geospatial network data found in the wild.

The Basics of Topotime

Topotime was initially conceived as a means for representing historical temporal data that is vague and otherwise uncertain, for visualization in browser timeline software and for the analysis of probabilistic relationships between and amongst events and periods.

The goals of the Topotime project have recently both broadened and simplified considerably—now essentially aimed at extending the GeoJSON format to account for time (including some of the difficult historical cases), without breaking GeoJSON. That is, Topotime data would be recognized as GeoJSON by any software that supports GeoJSON.

Work-in-progress is described, with a few toy examples, at https://github.com/kgeographer/topotime. I’m planning to work through varied and more complex data examples soon, then write some basic software that accesses Topotime’s unique attributes.

The following is a snippet to give a sense of it:

topotime_smal-example

Next Steps

This effort does not have institutional support at this time, but if enough people feel it’s worth pursuing, we should seek it.

As mentioned earlier, a small group representing several active research projects focused on Asian maritime and land routes will be meeting soon to assess whether Topotime or something like it is appropriate for a “Pelagios for Networks.” We will make our results public for discussion, possibly through the Pelagios SIG infrastructure. More later… and comments are welcome.