Of Twitter and Mastodon

I joined Twitter almost ten years ago (@kgeographer) in order to connect with members of my professional communities, Digital Humanities and Geographic Information Science. It has been occasionally annoying but extremely beneficial, and over 1200 folks have chosen to follow my feed.

In 2015, when a certain politician became a presumptive nominee for US President, another community became equally important to me, the #NeverTrump crowd, because I view that utterly disgraceful individual as a mortal threat to the “noble experiment” that is the United States of America. Over the years I’ve been a frequent critic of the US and many of its past policies and actions, but I learned that I am much more patriotic than I thought.

Twitter has become essential to me, and I am hard pressed to imagine doing without it. Because a growing number of my professional colleagues and friends are moving from Twitter to Mastodon servers, I have established the account @kgeographer@mast.o. In the coming weeks I plan to gradually follow there the hundreds of colleagues and friends who have made this switch or, like me, are transitioning to it. Let’s see how this goes.

But I will remain on Twitter, and reserve my politically oriented tweets for that forum. Any Mastodon followers I gain will see the stuff I post related to #DigitalGeoHumanities, #Place, #WorldHistoricalGazetteer, travel photos, etc. They will be spared my outraged responses to what’s happening in my (current) country.

GeoJSON-T: adding time to GeoJSON

GeoJSON-T is a proposed extension to the GeoJSON data standard (spec) widely used “for encoding a variety of geographic data structures,” principally in web maps. GeoJSON-T is described in the README file of its GitHub repository, but there is not yet a versioned specification. Now would be a good time to make refinements and write one. To join that discussion, see Issue #3 in the GeoJSON-T GitHub repository.

GeoJSON-T was initially developed in 2017 [1], motivated particularly by requirements of historical researchers. Many geographic features of interest have a temporal scope, and a standard way of describing geographic and temporal extents together would benefit creators of web applications meant to view and analyze such data [2]. A pilot map+time application, now called Linked Paths, was developed at that time to test and demonstrate its implementation.

The temporal attributes of “typical” geographic features such as countries, regions, provinces, cities, buildings, and monuments are important for understanding change over time. But we also routinely map and analyze spatial patterns in many other kinds of historical phenomena: events, and “event-like” features including conflict, births and deaths, finds of archaeological objects, as well as many kinds of geographic movement, including journeys and flows.

Just use “properties”?

The temporal attributes of a GeoJSON Feature can be represented among its “properties,” and web map applications are routinely built to parse temporal data there and use it to render dynamic change. For example:

{ "type": "Feature",
  "geometry": {"type":"Point", "coordinates": [0.0,0.0]},
  "properties": {
    "name": "Null Island",
    "start": "1492",
    "end": "1500",
    "prop1": "some value", ...
  }
}
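
Here, for instance, is a minimal Python sketch of how an app might filter on such ad hoc properties; the start/end keys are the ones used in the example above, and the whole thing is illustrative only:

# The Feature from the example above, as a Python dict.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
    "properties": {"name": "Null Island", "start": "1492", "end": "1500"},
}

def visible_in(feature, year):
    """True if the feature's ad hoc start/end properties span the given year."""
    props = feature["properties"]
    return int(props["start"]) <= year <= int(props["end"])

print(visible_in(feature, 1495))  # True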

But there are several shortcomings inherent in this approach, including:

1) A given feature might change location or shape over time; this can currently only be handled by creating multiple Features for the same thing/place, each with different time properties. Better to put all of a place’s geometries in a GeometryCollection and let each have its own “when.”

2) Likewise, other properties of a feature might change over time, for example name and type (e.g. villa to town to city). These too can be temporally scoped with their own “when.”

3) The labels used for temporal property keys (‘start’ vs. ‘begin’ vs. ‘year’ etc.) vary per dataset, so data from one project can’t be linked to or combined with data from another unless there is prior coordination. Without a standard vocabulary and structure, there can’t be a generic, open-source “time map” library for rendering maps with various temporal visualizations [2].

4) There are no conventions for expressing the various types of uncertainty found in historical data, including vagueness, imprecision, and unknown values.

5) Movement features such as journeys, lifepaths, and flows can contain multiple nodes and edges, each with distinctive associated timestamps or intervals. In some cases, sequence is known but dates are not.

Just add “when”

What GeoJSON-T does is specify vocabulary and structure for a “when” object, which can be added in several locations outside of the “properties” element required by GeoJSON (a “foreign member,” in the vocabulary of the GeoJSON spec):

1) At the level of a Feature, applying to all geometries within it (simple example here; for all options see the draft specification):

{ "type": "Feature", 
  "geometry": {"type":"Point", "coordinates": [0.0,0.0]}, 
  "properties": { "name": "Null Island", "prop1": "some value", ... },
  "when" {
    "timespans":[{"start": "1492", "end": "1500", }]
  }
 }

2) At the level of an individual geometry within a GeometryCollection, e.g.

{ "type": "Feature", 
  "geometry": {
    "type":"GeometryCollection", 
    "geometries": [
      {"type":"Point", 
       "coordinates": [0.0,0.0],
       "when": {"timespans":[{"start": "1492", "end": "1495", }]}
      },
      {"type":"Point", 
       "coordinates": [1.0,2.0], 
       "when": {"timespans":[{"start": "1498", "end": "1500", }]} 
      }]          
    },
    "properties": { "name": "Null Island", "prop1": "some value" }
}

3) At the level of a FeatureCollection, applying to all of its Features.

A “when” object can optionally include one or more timespans, and/or one or more named periods from a time period gazetteer. These and other optional properties are described in the repo README. Some proposed changes are listed in Issue #3 in the repo, which is open for comment.
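
By way of illustration, a hedged sketch of a fuller “when” object combining a timespan with a named period. The structure follows the README patterns, but the PeriodO-style URI is a placeholder, not a real period identifier:

"when": {
  "timespans": [{"start": "1492", "end": "1500"}],
  "periods": [
    {"name": "Some Named Period", "uri": "http://n2t.net/ark:/99152/p0_placeholder"}
  ]
}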

Existing GeoJSON-compatible apps and libraries will simply ignore “when” objects, wherever they might be. Support for “when” would be included in any new software and libraries supporting the GeoJSON-T extension.

Current and planned adoption

In recent months, GeoJSON-T has received increasing attention, a couple of test implementations, and plans for more. These have surfaced some issues, and the draft format needs closer examination before it can become a versioned specification.

Linked Places format
The GeoJSON-T patterns for time were adopted for Linked Places format (LPF), developed in 2018 for contributions to the World Historical Gazetteer and the Pelagios project’s Recogito platforms. LPF is not only valid GeoJSON but also valid JSON-LD, a serialization of RDF, and it standardizes the representation of several additional dimensions of Place beyond geometry and when: names, types, relations, descriptions, depictions, and links to corresponding external place records.

Linked Places format is therefore a specialized superset of GeoJSON-T, intended specifically for historical gazetteer platforms.

WebMaps-T
This early-stage project led by the British Library [3] seeks to develop a standard, customizable library for map+time visualizations in web applications, and would render data formatted as GeoJSON-T.

IIIF Maps Community Group
The International Image Interoperability Framework (IIIF) now has a working group dedicated to “defining best practice in associating geographical information with IIIF materials.” The Maps group is “explor(ing) creating a JSON schema [meeting] the needs of the IIIF community” specifically for map images, and is at least informally evaluating GeoJSON-T toward that end.


Notes

[1] In 2016, supported by a small Resource Development Grant from the Pelagios project, I met with colleagues Lex Berman and Rainer Simon to outline a GeoJSON format extension that could handle features representing historical geographic movement. GeoJSON-T and the Linked Paths pilot were products of that work. That project, titled “Linking Linked Places,” is documented in this blog post.

[2] The Timemap.js library developed by Nick Rabinowitz in 2008 joined the SIMILE Timeline library with several web mapping libraries of that era. It has fallen into disuse due to outdated dependency issues.

[3] An initiating hackathon for WebMaps-T took place in London in 2019, hosted at the British Library by Gethin Rees and Adi Keinan-Schoonbaert, with funding from a Pelagios Working Group small grant. Work to refine its early products continues, albeit slowed by lack of further funding. WebMaps-T is intended to have a modular structure permitting several types of temporal visualizations, including multiple timeline styles and histograms.

linkedplaces.draw (part 2)

In Part 1 of this posting, I introduced the linkedplaces.draw (LPDraw) tool and briefly explained its motivation. This pilot software project was put together fairly quickly to support historical mapathons we are undertaking at the University of Pittsburgh’s World History Center (WHC). These will produce historical place data that can be contributed to the World Historical Gazetteer platform. Figure 1 illustrates most of the tool’s functionality. Figure 2 below shows some screens in the “Dashboard” section of the app.

We are testing this alpha draft version now, and it will undoubtedly undergo changes and additions in the weeks to come. In time, I plan to package the tool so that anyone handy with Django and PostgreSQL can stand up their own instance. But all this is simply a demonstrator for a “proper” crowd-sourcing app, which will require a) design, b) funding, and c) developer(s) to realize. Please get in touch with ideas about that.

Fig. 1 – linkedplaces.draw (LPDraw) main screen

One piece of a puzzle

The WHC workflow that the LPDraw tool will play a part in goes like this:

  1. Identify one or several maps, data from which will be useful for some research domain—typically of a particular region or region/period combination. These could be historical maps downloadable from an online resource like David Rumsey Map Collection, or via Old Maps Online. Or they could be paper maps, possibly from a print historical atlas.
  2. Georeference each map image, resulting in a GeoTIFF file for each.
  3. Create an xyz map tileset for each GeoTIFF. We are using MapTiler software for this. Note tileset details like the minimum and maximum zoom and bounds.
  4. Upload the tileset(s) to a web-accessible location.
  5. Register as a user in the LPDraw app.
  6. Create a project record and map records for each individual map in the project.
  7. Identify, for the project as a whole or for each map, the feature types to be digitized, and a timespan representing temporal coverage of the project and/or each individual map.
    • Feature types will be presented as options in the interface, and timespans will be automatically added to each digitized feature (if that option is checked in the map record)
    • Feature types are not by default restricted to a particular geometry; points, lines, and polygons are all options
  8. Assign other registered users as collaborators able to create and edit features for the project.
  9. In the “draw” screen, choose a project and map from dropdown menus. The map loads; digitize features as desired.
    • Use the opacity control to view the underlying map, as an assist to proper placement
    • Enter a name or names in the popup, according to the LP-TSV format convention, separating variants with a ‘;’ (e.g. ‘Dubrovnik;Ragusa’; see the sketch after this list)
  10. Download options available so far are Linked Places format (GeoJSON-compatible) and TSV.
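
For illustration only, a minimal LP-TSV-style snippet with hypothetical column headers (tab-separated; see the LP-TSV documentation for the actual required columns). Note the semicolon-separated variants:

id        title       variants          lon     lat
lpd_001   Dubrovnik   Ragusa;Ragusium   18.09   42.65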

Fig. 2 – linkedplaces.draw “Dashboard” screens (preliminary)

linkedplaces.draw (part 1)

In Part 2, I describe the functionality in linkedplaces.draw to date. At some point, a collaboratively authored functional spec for a ‘proper’ crowd-sourcing tool will come together on GitHub.

I have been building a pilot tool for digitizing features from georeferenced historical maps (and maps of history such as found in historical atlases), tentatively named linkedplaces.draw. Its immediate intended use is for what we at the University of Pittsburgh World History Center have been calling “historical mapathons.” We are planning to facilitate, stage and encourage such events, in which individuals and small groups can virtually gather to harvest temporally scoped features from old maps, for their immediate use and for preparing contributions to World Historical Gazetteer (WHG) [1]. We have begun testing it by digitizing features for settlements, archaeological sites, regions, dynasties, and ethnic groups from the highly regarded “An Historical Atlas of Central Asia” (Bregel 2005).

Fig. 1 – linkedplaces.draw app (June 2020 alpha)

Maps v. Texts as Gazetteer Sources

Old maps represent a largely untapped storehouse of information about geographies of the past. Digitizing features from maps that can be geo-referenced (a.k.a. geo-rectified, or warped) without too much distortion provides an estimated geometry that is invaluable. The most immediate use scenario driving development of WHG is mapping place references in historical texts and tabular datasets. But to make a digital map you need geometry, however approximate. Discovering suitable coordinates for lists of place names drawn from historical texts is far and away the most difficult and time-consuming task in this scenario. If your source texts concern a particular region and period, it makes sense to build a gazetteer for that region and period using maps made at the time–and then to contribute the data to WHG so no one ever has to do it again! Certainly, the feasibility of deriving useful geometry in this way is reduced the further back in history one goes.

Maps made prior to the 18th century–beautiful, instructive, and useful as they may be–normally don’t have sufficient geodetic accuracy for this purpose (Fig. 2b). Obtaining geometry for place references made in earlier periods will require a different approach, e.g. capturing topological relations like containment. One can also digitize features from maps of history, as we are doing with the Bregel atlas mentioned above.

Fig. 2 – a) S. America 1812, b) “Aphrica” (& Arabia) 1600

The GB1900 Proof-of-Concept

Although hundreds of thousands of old maps have been scanned and made available for viewing and download by map libraries around the world [2], the names and estimated coordinates of features on them have not been transcribed in any quantity. The recent GB1900 project provided a successful proof-of-concept (cf. “The GB1900 project–from the horse’s mouth“). Crowd-sourcing map transcription software was custom-built for the public at large to work with a single Ordnance Survey map, and over a period of months millions of names and geometries were digitized. Preliminary results can be viewed at http://geo.nls.uk/maps/gb1900/; analytical products are sure to follow.

Unfortunately, although the code for the GB1900 crowd-sourcing software is available, it is not re-usable; at least I and others have been unable to revive it. Hence, linkedplaces.draw, which will hopefully serve as a demonstrator that could be used in finding funds to build a sturdy open-source platform that can be used by groups of any size–including “the crowd”–to do this valuable work.

Related Tasks and Software

Several existing free software packages and web sites give users the capability to perform, in some combination, these tasks related to old-map feature digitization: a) georeference a map image and save the rectified result as a GeoTIFF; b) create web map tilesets from a GeoTIFF file; and c) display single images or tilesets as overlays on modern web base maps. Typically the viewers for these provide an opacity control, allowing comparison of old and modern geography. This is all great, but what is missing is the capability to draw or trace features from the rectified images.

Coda (for the moment)

For years, computer scientists and others have explored the possibility of automated feature extraction. There are a few such efforts under way right now. I wish them godspeed, and do believe machine methods will ultimately be able to extract a list of names from some relatively recent map series having especially clear cartography, but also that they will never handle maps like Figure 2a, and will never successfully extract the estimated geometry of even point features. Yes I know, never say never. In the meantime…


[1] The World Historical Gazetteer project is building a web platform for developing, publishing, and aggregating data about places drawn from historical sources by members of the broad community of interest studying the past within and across numerous disciplines. A Version 1 launch is planned for June/July 2020. See the About pages and Tutorials at http://dev.whgazetteer.org for details.

[2] The extraordinary David Rumsey Map Collection has many extended features and direct hi-res downloads; Old Maps Online is a “gateway to historical maps in libraries around the world.”

Asked Twitter historians about ancient epidemiology and…

…got a ton of valuable information, references and links recently. A colleague I hadn’t seen in a while reached out to me on behalf of a student of his at Lafayette College, due to my involvement with the ORBIS project. The student is “interested in modelling the spread of disease in the ancient Roman world” and he “suggested that she investigate building cost surfaces with ORBIS and see if that could be used as an input in some sort of epidemiological model.” He asked me:

“…are you aware of any scholars…using ORBIS while investigating epidemiology in the ancient world?”

I tweeted an inquiry and within minutes, a stream of answers came back. I’ve pulled together most of them in quick and dirty fashion here, and made this post for quick reference by my friend and his student…and whomever. Also to note the generosity of people in this community I’ve had the enormous pleasure to be somewhat connected with.

From Monica Green (https://asu.academia.edu/MonicaHGreen) @monicaMedHist

The field of historical epidemiology is in the middle of a paradigm shift. Retrieval of pathogen #aDNA (for the ancient world, that includes thus far #YersiniaPestis & #HBV & 2 malaria parasites) is the new gold standard for identifying historical disease confidently. Beyond that

… we can make plausible inferences from paleopathological indicators (e.g., bone lesions, intestinal parasite eggs) about the types of disease that were present. A major project is underway at Harvard to research the ancient Mediterranean:  At Princeton (archaeoscience.org)

… a new database has been launched that focuses on plague in late antiquity:  I can’t say anything specific about the mapping components of either project yet. Deciding when & where specific diseases are found in specific populations is next challenge. (climatechangeandhistory.princeton.edu/justinianic-pl…)

You’ll need to communicate w/ the folks at Princeton about what their plans are. The Keller et al. study shared all their data on sites where they checked for #YersiniaPestis but couldn’t retrieve any. (Doesn’t mean there was no plague, only that DNA capture wasn’t successful.)

For an overview of what’s going on in #aDNA pathogen research, see this: annualreviews.org/doi/abs/10.114…, and this: nature.com/articles/s4157…. I’d also recommend your student dig deep into the Supplemental Data of Keller et al. 2019: pnas.org/content/116/25…. It’s a trove of information.

I would add that if your student has not yet discovered the network analysis work of @Byzanzforscher (e.g., academia.edu/40238623/Small…), it would be very good to look at. Again, applying this work to disease history is only as good as our disease data, which is mostly inferential.

From Johannes Preiser-Kapeller (http://oeaw.academia.edu/JohannesPreiserKapeller) @Byzanzforscher

I dealt with some aspects of using the Orbis Network for epidemic Diffusion here: . Of interest is also the work of Marek Vlach, who presented a paper on modelling the Diffusion of the Antonine Plague in Brno last year: (arxiv.org/abs/1809.08937) (arup-cas.academia.edu/MarekVlach/Pap…)

He gave a paper last year at the historical Network conference in Brno on the Antonine Plague, but as far as I see, it is still unpublished. But it was really impressive, maybe one can contact him directly, if interested.

reply from Monica Green

Thanks very much for this, Johannes. Obviously, there will be need to talk about these issues sooner rather than later, as dialogue grows btw palaeogenetics & the other historical fields. We’re trying to launch a #BlackDeath Digital Archive (#BDDA) which aims to move beyond

… the centuries of plague histories that keep parroting the same data & go back to original sources. The project just won the @CARMENmedieval Prize. In the meantime, on the issue of data quality there’s this must-read from our contributor @joris_roosen: (wwwnc.cdc.gov/eid/article/24…)

BTW, Johannes, just to confirm: this () is the study by Vlach you were referring to, right? It only mentions the #AntoninePlague once, but yes, I can see where the possibilities of this might be leading. (academia.edu/38083198/Demog…)

From Ryan Horne (https://rmhorne.org/) @RyanMHorne

Not so much in the ancient world, but we did have a @WHCPitt intern do some geospatial modeling of Ebola. A good primer for the networked aspect of epidemiology: (cs.cornell.edu/home/kleinber/…)

Using some @BigAncientMed code w/@Gephi is a good place to start

I have used an effective combination of @Gephi / @cytoscape with network / routes data from #ORBIS / @AWMC_UNC, #pgRouting, and @BigAncientMed / custom code to combine force-directed graphs w/ spatial data to study this type of thing. I can send an article I am finishing up for #Classics@ about this very thing (great timing!) that is a fairly high-level overview about social-spatial networks, and can send more code / bibliography about the component parts if you want.


Notes for an Orbis-esque Hackathon

On July 18-20, I will join several other “interested digital humanists with an inclination for coding” gathering at the University of Vienna to consider what a generic version of ORBIS: The Stanford Geospatial Network Model of the Roman Empire (ORBIS:Rome [1] here) could or should consist of, and to begin creating it. Several people have expressed interest in beginning a “Silk Roads Orbis” as a first example. I will participate in this effort and I’ve written up these notes to clarify my own thoughts. As the front-end developer for ORBIS:Rome v1 and a witness to v2 development, I may have some helpful insights 🤷🏻‍♂️.

The ORBIS:Rome development team discussed this concept several years ago, referring to it as “Orbis-in-a-Box.” In a 2015 article, ORBIS:Rome’s principal developer, Elijah Meeks, wrote, “the sophistication of ORBIS, which is among the most complex pieces of geospatial information visualization on the web, makes it difficult to replicate” [2]. But the idea that some useful principles and methods could be drawn from that work led Elijah to develop some generic JavaScript-based route-finding functionality, using a D3 mapping layout he had developed (d3-carto-map) and a JavaScript implementation of the Dijkstra algorithm. He published two examples, demonstrating one-to-one (“simple”) and one-to-many (“network flooding”) cases.

Simulated routes over a network vs. “modeling travels”

The idea of Orbis-in-a-Box has been one guiding thought for this meeting, but the announcement for it is titled “Modeling Travels in History.” ORBIS:Rome was all about route simulation, whereas “modeling travels” suggests particular journey events. The time costs associated with network segments should be derived in part from historical accounts of actual travel, but in my view these are two very different undertakings.

Apart from ORBIS-like route simulation, I am keenly interested in modeling events of geographic movement (see the Linked Traces pilot application). In the conceptual model and schema I developed for that with Rainer Simon and Lex Berman, historical movement data falls into three categories: journeys, named historical routes, and flows. Journeys and flows are eventive data; historical routes are courses of travel taken routinely over time by unspecified travelers (sets of segments).

Breaking Down ORBIS:Rome for OiB

The following outlines some system functions and related components.

Functionality

The three principal functional categories in ORBIS:Rome are:

Route finding: one-to-one least cost path
Given two places (a.k.a. sites, nodes), calculate the least-cost path for a simulated journey between them across a multi-modal network of roads, paths, rivers, and maritime routes. Journey cost is the sum of weights for its network segments (edges). Segment weights are derived from distance and some combination of “friction” estimates drawn from modern topography and historical sources. Results include duration, distance, and, depending on available data, a derived monetary cost of transporting commodities or people along the route. Additional parameter choices for ORBIS:Rome include season of travel, transport mode (vehicle, animal, foot), and network modes (road, river, open sea, coastal).
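
Since ORBIS:Rome’s own implementation is PHP and, as noted below, not well documented, here is a minimal Python sketch of the core technique only: Dijkstra’s algorithm over a weighted, multimodal edge list. The places, modes, and day-costs are all hypothetical.

import heapq

# Hypothetical multimodal edges: (from_node, to_node, mode, cost_in_days).
# Real weights would be pre-calculated from distance plus "friction" estimates.
EDGES = [
    ("Roma", "Ostia", "road", 0.5),
    ("Ostia", "Carthago", "open_sea", 3.0),
    ("Roma", "Capua", "road", 4.0),
    ("Capua", "Carthago", "coastal", 6.0),
]

def least_cost_path(edges, source, target, allowed_modes):
    """One-to-one Dijkstra over the edges whose mode is allowed."""
    graph = {}
    for a, b, mode, cost in edges:
        if mode in allowed_modes:
            graph.setdefault(a, []).append((b, cost))
    queue = [(0.0, source, [source])]  # (accumulated cost, node, path so far)
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return None  # no path under these mode restrictions

print(least_cost_path(EDGES, "Roma", "Carthago", {"road", "open_sea", "coastal"}))
# (3.5, ['Roma', 'Ostia', 'Carthago'])

Season, direction, and transfer costs would enter as additional edge attributes folded into the weights.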

Cartograms, contour maps, and regions: one-to-many paths
The k-dijkstra algorithm allows very rapid calculation of costs between one node and many others. Results – ‘fields’ of point values over a large area – can be used to create several geospatial analytic products. Cartograms substitute time or other cost for distance and distort maps accordingly. Contour maps show bands of roughly equal values for some quantity associated with point locations; in this context, time or expense of travel between a given point and all others in an area can produce isochrone and isodapane maps, respectively. Given several “start” points, one can also compute regions with clustering and Voronoi algorithms.
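
The one-to-many case is the same algorithm run to exhaustion. A companion sketch, again entirely hypothetical, returns the cost from one source to every reachable node; binning that field into, say, 2-day bands gives the raw material for an isochrone map:

import heapq

# The same hypothetical network as above, pre-built as an adjacency dict.
GRAPH = {
    "Roma":  [("Ostia", 0.5), ("Capua", 4.0)],
    "Ostia": [("Carthago", 3.0)],
    "Capua": [("Carthago", 6.0)],
}

def cost_field(graph, source):
    """Single-source Dijkstra: least cost from source to every reachable node."""
    costs = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > costs[node]:
            continue  # stale queue entry
        for nxt, w in graph.get(node, []):
            if cost + w < costs.get(nxt, float("inf")):
                costs[nxt] = cost + w
                heapq.heappush(queue, (cost + w, nxt))
    return costs

field = cost_field(GRAPH, "Roma")                   # {'Roma': 0.0, 'Ostia': 0.5, ...}
bands = {n: int(c // 2) for n, c in field.items()}  # 2-day isochrone bands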

Application Programming Interface (API)
The internal API for ORBIS:Rome has not been documented for external access, but for OiB it could be. The number of allowed parameters may determine its usability. You can see what queries ORBIS:Rome generates, depending on UI choices, using the “Network” tab in the browser’s developer tools. For example, calculating a route between Constantinople and Jerusalem with various parameter values looks like this:

http://orbis.stanford.edu/new_route.php?v=foot&m=7&s=50129&t=50213 \
&tr=0&ts=0&p=0&ml=road,coastal,upstream,downstream,overseas, \
ferry,self,ferry,transferc,transferf,transfero,transferr \
&el=999,99999

It returns a complex JSON object that front-end code parses for display.
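
For illustration, the same query issued programmatically: a hedged Python sketch that assumes the endpoint shown above is still live and returns JSON, with parameter values copied verbatim from the example.

import requests

# Parameter values copied from the example query above (Constantinople -> Jerusalem).
params = {
    "v": "foot", "m": 7, "s": 50129, "t": 50213,
    "tr": 0, "ts": 0, "p": 0,
    "ml": ("road,coastal,upstream,downstream,overseas,"
           "ferry,self,ferry,transferc,transferf,transfero,transferr"),
    "el": "999,99999",
}
resp = requests.get("http://orbis.stanford.edu/new_route.php", params=params)
route = resp.json()  # the complex JSON object the front end parses for display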

Components

Network Data

  • Places (nodes). Point data for settlements and other related network nodes (e.g. coastal promontories).
  • Segments (edges). LineString geometry for paths; in ORBIS:Rome, principal segment (path) types include road, river, ferry, open sea, coastal. These are further subtyped to differentiate associated weights, which have been pre-calculated: e.g. river segments include upstream, downstream, fastup, and fastdown subtypes. There are also logical transfer segments to allow associating costs with switches between modes (e.g. road to ferry). The main edge table has 53,539 rows, for 7740 distinct node pairs. Arriving at plausible segment weights demands intensive historical research and/or geospatial analysis.
  • Segment weights. Segment costs can vary according to season and direction (upstream/uphill vs. downstream/downhill). We may wish to factor elevation change into weights. For ORBIS:Rome, historical sources indicating travel times between places were relatively sparse compared to what may be available for later periods.
  • Segment restrictions. In ORBIS:Rome, a number of segments are “restricted” – omitted from some calculations – in various circumstances, including “no go” in certain months, or not feasible for some transport modes.
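
To make the shape of this data concrete, here is a segment record sketched as a Python dataclass. The field names are illustrative guesses based on the descriptions above, not ORBIS:Rome’s actual schema:

from dataclasses import dataclass, field

@dataclass
class Segment:
    """One hypothetical row of a multimodal edge table."""
    source: int                 # node id
    target: int                 # node id
    mode: str                   # e.g. "road", "river", "coastal", "transfer"
    subtype: str                # e.g. "upstream", "fastdown" for river segments
    length_km: float
    weight_days: float          # pre-calculated seasonal/directional cost
    restricted_months: list = field(default_factory=list)  # e.g. [11, 12, 1, 2]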

Some data considerations

Temporality. Networks change over time, sometimes significantly. So do vehicle types and capabilities. Travel across the broadly defined terrestrial and maritime Silk Roads occurred over roughly 1500 years. The ORBIS:Rome network “broadly reflects conditions around 200 CE.” How should a generic OiB allow implementations to account for network change over time? With temporal attributes of nodes and edges? With discrete snapshot data subsets? Either?

Granularity. Data for particular regions within a given OiB scope may be more or less sparse, and this variation can affect results significantly. What should best-practice recommendations for handling it be?

Resolution. Vector MultiLineString data can be created at various resolutions. The distance of a given path between two places will reflect how articulated the relevant segment linestrings are. Great variation in resolution can produce misleading results.

Paths versus simple edges. Edge data for ORBIS:Rome is “geographically embedded.” That is, it reflects approximate courses of roads, rivers, etc. It would also be possible to use simple edges that are “partially embedded,” with geometry fixed only at start and end, and to assign weights reflecting distance (among other things), but not representing actual paths. Should this be an option?

Software package design

How will users implement Orbis-in-a-Box for their particular study region and period? What “user stories” or “use scenarios” should drive development?

Ideally (?) a user could a) clone an OiB repository, b) install relatively few dependencies on their system, c) make some edits to a configuration file for project-specific settings, including data source, d) run a local HTTP server on a specified port, and e) navigate to a GUI with a number of features that allow them to query and visualize their dataset. Once the application looks right and is giving proper results, it could be deployed to a public web server.

What are the features for a v1 of OiB? for a v2?

  • TBD

[1] Short for ORBIS: The Stanford Geospatial Network Model of the Roman Empire (http://orbis.stanford.edu)
[2] Cf. “Creating an Application like ORBIS” in https://onlinelibrary.wiley.com/doi/full/10.1002/bult.2015.1720410206

Orbis-in-a-Box

In recent days, a conversation has been renewed about the prospects for an “Orbis-in-a-Box” platform (OIB) for simulating historical movement across multi-modal transport networks. The idea holds great interest for me, and this post is a hasty “my two cents.”

Context

ORBIS: The Stanford Geospatial Network Model of the Roman Empire (ORBIS:Rome) was initially launched in May 2012 after something less than one year of intense development. A major upgrade to the site was completed in 2015 by its lead developer, Elijah Meeks. At initial launch, the number of visitors to the site wildly exceeded the project team’s expectations, and six years later there are still on average 8-9,000 distinct user sessions per month. Although there are peaks and lulls in traffic, that number remains remarkably consistent. Walter Scheidel was the project’s principal investigator, and several of his students made substantial contributions. My own role was developing the front end for Version 1 and serving as something like a ‘geographer sounding board.’ Complete credits are found in the “About” section of the site.

In the Fall of 2012, Walter Scheidel hosted a mini-conference at Stanford to discuss the results of the project and speculate about next steps to be taken, if any. At that meeting, and in many settings since, all of the project team have heard inquiries along the lines of, “how can I make an Orbis of _____?” I think it’s fair to say Walter, Elijah and I all thought that was a commendable goal, but there wasn’t the time or funding context for it.

Not that I didn’t try: in 2015 I led the authoring of an NSF proposal titled “The Orbis Initiative,” submitted by Stanford Libraries, that would have produced, effectively, a generic OIB platform. Although the proposal received high marks from reviewers, it fell short and we didn’t re-submit. In 2016-17, I turned my attention to another aspect of historical geographic movement: modeling journeys, named routes, and flows — specific and aggregated events _occurring on_ the roads, rivers, and sea lanes of multi-modal transport networks. Lex Berman, Rainer Simon and I developed a temporal extension to GeoJSON (GeoJSON-T, now GeoJSON-LDT) and I built a pilot web app, Linked Places, to test it out against several kinds of data.

I still firmly believe an OIB platform would be used by many (?) historical scholars and be a valuable contribution. Apparently others feel the same way. As Maxim Romanov has recently suggested, maybe we can collectively take some steps in that direction, absent (for the moment) a big funding source.

The Use Scenario, aka User Story

We don’t have to make one up — the al-Ṯurayyā Project, led by Masoumeh Seydi (U Leipzig) and Maxim (U Vienna) could readily become a first user of OIB. But to state it in more generic terms:

A team of scholars, in the course of researching a particular region, period and themes, has developed a set of historical network data, and wishes to simulate movement along that network to better understand related events and historical processes of the study area. The data consists of named places (nodes) and route segments (edges, typically unnamed). Segments have been assigned costs associated with traversing them by various modes (e.g. vehicle types), possibly with seasonal variations. The costs are best estimates drawn from primary and secondary sources.

The team arranges their data in the format specified by the new OIB platform, downloads the OIB software from GitHub, and stands up an instance in their local development environment. They fill in several parameters in a configuration file specific to their project, including project title and data path, fire up a local web server, and navigate to a new graphical interface to their network. After making numerous adjustments to configuration parameters, and possibly some customizations to code, they deploy their OIB instance to a cloud server, route a domain name to it, and tweet out an invitation for people to use it.
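
For instance, the project-specific configuration might be as small as the following sketch. Every key here is invented for illustration; an actual OIB format would be designed collectively:

{
  "project_title": "al-Thurayya Routes",
  "data": {
    "nodes": "data/places.geojson",
    "edges": "data/segments.geojson"
  },
  "modes": ["foot", "camel", "riverboat"],
  "seasons": ["summer", "winter"],
  "port": 8000
}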

Simple, no?

From Here to There

Some big obvious questions arise from this scenario, including: What functionality must the web interface have, and how generic can it usefully be? That is, what questions are being asked? What data will be required, and how readily can it be developed?

Regarding data, ORBIS:Rome required modeling maritime movement across the Mediterranean, part of the North Atlantic, and the Black Sea. This was enabled by Elijah’s inspired creation of a sea mesh. Assuming this method stands up as something to replicate, other parts of the world would need other meshes. Travel across larger bodies of water was heavily constrained by seasonal trade winds, as I’m learning reading “Pathfinders: A Global History of Exploration,” so global meshes may be unnecessary.

Regarding functionality: in ORBIS:Rome, seasonal segment costs in terms of effort and denarii are “baked in” — should OIB permit adjustment of these values by users (not only authors/publishers)?

How to Begin?

A few possibilities:

  • Launch an OIB Working Group of Pelagios Commons? The deadline for this year’s mini-grants is Wednesday(!)
  • Collectively decide upon a useful first phase of effort that can be realistically accomplished given time and money constraints.
  • Get that started at a hackathon, pre-work for which might include: surveys of existing ORBIS:Rome code (or not!)
    • The algorithms and data models of ORBIS:Rome are not well documented (costs extra, no time!); a pseudo-code representation of them might be a useful starting step. Its PHP code could be ported to a more modern language/platform, and undoubtedly refactored. Fresh eyes from other developers would certainly lead to improvements.
    • A survey of the existing functions, followed by a group assessment of how generic they are, as well as priority v. effort ranking. The same for graphical elements and widgets.


User Stories for the World-Historical Gazetteer

My work designing and developing the World-Historical Gazetteer (WHGaz [1]) is under way. This NEH‑funded 3‑year project is based at the University of Pittsburgh World History Center and directed by Professor Ruth Mostern. David Ruvolo is Project Manager, and Ryan Horne will contribute in his new post-doc role at the Center. I’m very pleased to serve as Technical Director, working from Denver.

The project actually comprises more than a gazetteer.  An official description of the project’s goals and components is forthcoming; in the meantime, its deliverables include:

A gazetteer, defined in the proposal as “a broad-but-shallow work of common reference consisting of some tens of thousands of place names referring to places that have existed throughout the world during the last five hundred years.”

Interfaces to the gazetteer, including

  • a public API;
  • a public web site providing graphical means for data discovery, download, and visualization, and serving as a communication venue for the community of interest;
  • a web-based administrative interface for adding and editing data

An “ecosystem”, described as “a growing and open-ended collection of affiliated spatially aware world historical projects,” seeded by two pilot studies concerning the Atlantic World and the Asian Maritime World.

Models, formats, vocabularies. The conceptual and logical data models, data formats (e.g. GeoJSON-T), and controlled vocabularies (e.g. place types) developed for the project will be aligned with solid existing resources and published alongside data

Documentation. Software developed for the project will be maintained in a public GitHub repository. Additional documentation will be produced in the form of research reports published on the website and scholarly articles appearing in relevant journals.

What, for whom, and why

One of our first steps is developing “user stories” for the project, an element of the Agile development method that is a simple and effective way of capturing high-level requirements from users’ perspectives. I polled developers of some of our cognate projects (Pelagios, PeriodO, Pleiades) and added ideas stemming from their experiences to my own in creating the following preliminary list. If you can think of others that aren’t accounted for, please add them in a comment or email me. In my own streamlined version of Agile (Agile-lite?), user stories lead more or less directly to schematic representations of features supporting functions, then to coding. Evidence of streamlining is found in the detail already in place under items 18 and 19 (thanks, Ryan Shaw).

The next appearance of the features suggested by these stories will be in ordered lists of GitHub “issues” – coming soon.

Users
user: anyone of the following
researcher: academic or journalistic
editor: of WHGaz data
developer: anyone building software interfaces to WHGaz services
hobbyist: amateur historians, genealogists, general public
teacher: at any level

User stories

  1. As a {user}, I want to {view results of searches for place names in a map+time-visualization application} in order to {discover WHGaz contents}
  2. As a {user}, I want to {discover resources related to a search result} in order to {learn more about the place and available scholarship about it}
  3. As a {user}, I want to {learn about the WHGaz project: its motivations, participants, methods, work products, timeline} in order to {determine its quality and relevance to my purposes; see where my tax dollars are going}
  4. As a {user}, I want to {suggest additions to the WHGaz} in order to {make the resource more complete/useful}
  5. As a {researcher} I want to {publish my specialist gazetteer data for ingest by centralized index(es)} in order to {make my data discoverable by place and optionally, by period}
  6. As a {researcher} I want to {search a geographic area (i.e. space rather than place)} in order to {find sources relating to places in this area}
  7. As a {researcher} I want to {find historical source documents, incl. by keyword search} in order to {identify which places they refer to}
  8. As a {researcher} I want to {compare historical sources} in order to {see how they might be related to one another through common references to place}
  9. As a {researcher} I want to {compare the geographical relationships (and names) represented in ancient texts with historical and modern representations}
  10. As a {researcher/developer} I want {different options for re-using data (from data downloads to APIs and embeddable widgets)} in order to {enrich my own work/online publication}
  11. As a {researcher/developer} I want to {locate individual or multiple authority record identifiers for toponyms tagged in source material} in order to {find related research data}
  12. As a {researcher/developer}, I want to {retrieve WHGaz data in any quantity (filtered set, complete dump) according to multiple search parameters, using web form(s) or a RESTful query} in order to {re-use the data for any purposes, according to WHGaz license terms}
  13. As a {researcher/developer}, I want to {learn how to construct API queries} in order to {incorporate WHGaz data in my analyses/software}
  14. As a {researcher/hobbyist} I want to {embed a WHGaz map in a WordPress blog}
  15. As a {researcher/hobbyist} I want to {display places and movements (!) presented in specific texts} in order to {understand the spatial-temporal context of a text}
  16. As a {teacher} I want {quick lookup tools linked to authoritative information} in order to {use the data in teaching}
  17. As an {editor}, I want to {add and edit place records} in order to {make the WHGaz resource more complete/accurate/useful}
  18. As a {developer} I want to {query WHGaz programmatically, returning GeoJSON/GeoJSON-T features in JSON lines format, each having 1) a “properties” object including (a) an identifier, (b) one preferred label and one or more alternate labels (w/optional language tags), (c) name and URLs of the gazetteers to which it belongs; 2) a geometry object; and 3) a “when” object describing temporal extent} in order to {use external gazetteer data in my (PeriodO) client interface} (see the sketch after this list)
    • Allow querying by:
      • providing text to be matched against feature labels
      • specifying a rectangular bounding box (option to include all intersecting features or only those contained within it)
  19. As a {developer} I want to {query WHGaz as above via a GUI, with option to filter results by gazetteer} in order to {browse and/or download records}
    • Entering text into the text input should display a list of matching feature labels, in sections titled by gazetteer name
    • Hovering results list should display/highlight feature on map; zoom to feature (?)
    • Selecting a particular result from the list should raise popup with info about it
    • The map display needs to support custom tile sets including the Ancient World Mapping Center’s.
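
To make story 18 concrete, here is one hypothetical line of the JSON lines output it describes; every value is invented for illustration:

{"type": "Feature", "properties": {"id": "whg:12345", "prefLabel": "Dubrovnik", "altLabels": [{"label": "Ragusa", "lang": "it"}], "gazetteers": [{"name": "Example Gazetteer", "url": "http://example.org/gaz"}]}, "geometry": {"type": "Point", "coordinates": [18.09, 42.65]}, "when": {"timespans": [{"start": "1358", "end": "1808"}]}}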

[1] WHGaz is an unofficial short form used in this post; official naming will undoubtedly ensue

Linked Paths

Fig. 1 – Linked Paths sandbox

Linked Paths is a sandbox web application for experiments in representing historical geographic movement: journeys, named routes (and route systems), and flows. The term path (synonymous with course) refers to the spatial-temporal setting for any of these. Linked Paths displays several exemplar datasets formatted as GeoJSON-T, my proposed temporal extension to the venerable GeoJSON.

The site features and functions bear some explanation, as they’re not all immediately apparent.

Historical geographic movement data:
Journeys, Routes, and Flows

Last fall, Lex Berman, Rainer Simon and I came up with draft conceptual and logical models of historical geographic movement, which are described in some depth in blog posts here and here. Briefly, we posit three classes of movement we wish to model.

Journeys

Fig. 2 – Seven journey, flow, and route datasets

Journeys are events—individual occurrences of one or more persons moving between two or more places over some period of time. Journeys are often typed according to purpose (pilgrimage, expedition, migration, march, Grand Tour, etc.) or mode of travel (voyage, flight). Spatial data for journeys always includes two or more places (i.e. an itinerary), normally ordered temporally. The actual paths traveled between places may be known, unknown, estimated, or ignored. Similar variation in completeness holds for temporal attributes as well: we might know the year(s) or decade(s) the journey took place, dates for some or all departures and arrivals, durations of segments, or simply sequence. Linked Paths depicts two pilgrimages from the 4th and 7th centuries, and a recent 5-month journey of my own I called “Roundabout.”

Named routes and route systems (hRoutes)

Routes are the named courses of multiple journeys known to have occurred over a period of time (notably, for trade and religious pilgrimage); they are differentiated from the physical media for those journeys (roads, rivers, etc.). That is, a route may comprise segments of multiple roads and rivers. Exemplar route data in Linked Paths are for Old World Trade Routes, Ming Dynasty Courier Routes, and the pilgrimage route described on the Vicarello Beakers. Other well-known route systems include the Silk Road, the Pilgrimage Routes to Santiago de Compostela, the Incense Route, and the Amber Routes.

Flows

Flows are aggregated data about journey events; that is, the movement of something at some magnitude over some period of time. The Incanto Trade flow example in Linked Paths aggregates data about the number of ships involved in 840 individual commercial voyages outward from Venice between the 13th and 15th centuries.

A map and data-dependent temporal visualization

Fig. 3 – Four types of temporal visualizations

Linked Paths consumes data in GeoJSON-T format and renders it on the fly to a web map and one of four kinds of temporal visualization depending on the nature of the data:

  1. a timeline of events (journeys)
  2. a timeline depicting a relevant period and its immediate context, drawn from PeriodO collections (where period is the only temporal information known)
  3. a histogram indicating the number of segments valid for a period (time-indexed trade routes)
  4. a histogram indicating magnitude of flows per period

The color for journey segments is scaled: earlier = lighter, later = darker.

Linked Data

Fig. 4 – Place popup links to external gazetteer, segment search for connections

Place dialog popups include links to gazetteer APIs, including Pleiades, GeoNames, and the temporal gazetteer (TGAZ) of Harvard’s China Historical GIS.

Period timelines for Courier, Vicarello, and Bordeaux datasets are drawn dynamically from the PeriodO API, rendering the relevant period and adjacent neighbors from a collection.

Search

Search queries a union index of selected fields in all Place records from the 7 individual project gazetteers. Results are grouped by dataset and leverage name-variant data within Place records; for example, Dubrovnik and Ragusa are known to refer to the same place.

The “Find connections” link in place popups (Fig. 4) identifies segments associated with a given place across all 7 datasets.

GeoJSON-T

The GeoJSON-T format is a work in progress. Code and preliminary documentation are available at its GitHub repository.

Briefly, GeoJSON-T:

  • Permits adding an optional “when” object to Features in one of two locations
    • as a sibling to “geometry” in a Feature
    • as a sibling to “coordinates” in each member of a GeometryCollection
  • Leverages GeometryCollections for changing geometries over time (similarly to the HistoGraph project) and permits “properties” in GeometryCollection members
  • Is processed by existing GeoJSON-compatible software, which simply ignores “when” objects and handles geometry and properties found in the standard places