How gaming helped us tile the world

Not everyone’s a gamer. But gaming influences a surprising number of data scientists – and the reason’s clear when you think about it: the way virtual environments model the real world, the way articles and characters interact and transform, and how relationships are defined within a virtual environment.

Replace “articles and characters” with “data”, and you’ve described a fair chunk of data science. That’s why gaming was one of the themes in Tilo CDO Stefan Berkner’s recent Unstructured@Tilo talk (on YouTube here). His favourite games teach players how to write assembly code, build cities, and modularise hardware in silico (Stationeers, Cities Skylines, and Factorio) – not exactly the sort of games popular on Twitch. But their principles – rules to follow, limiting parameters, pitfalls and problems – can be applied to a range of problems in the real world. One is geomatching – overlaying a geographic map onto unstructured data to make more sense of it.

If a dataset contains J Smith, John Smith, and J R Smith, you can’t assume they’re a single person: the name’s too common to make that judgement. But if all three identities spend 9AM-6PM in the same small section of a city, and 8PM-7AM in another small section of that same city? It’s likely all these J’s are the same John Smith, and he commutes between home and work.
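To make the commuting Mr Smith concrete, here is a minimal Python sketch of the idea. Everything in it is an assumption for illustration: the record IDs, the tile names, and the `period`, `location_profiles` and `candidate_groups` helpers are hypothetical, and a real system would score similarity rather than demand identical profiles.

```python
from collections import defaultdict

# Hypothetical observations: (record_id, name, hour_of_day, tile_id).
# The tile IDs stand in for "a small section of a city".
observations = [
    ("r1", "J Smith", 10, "tile-work"),
    ("r1", "J Smith", 22, "tile-home"),
    ("r2", "John Smith", 14, "tile-work"),
    ("r2", "John Smith", 23, "tile-home"),
    ("r3", "J R Smith", 11, "tile-work"),
    ("r3", "J R Smith", 6, "tile-home"),
    ("r4", "J Smith", 12, "tile-other"),
    ("r4", "J Smith", 21, "tile-elsewhere"),
]

def period(hour):
    """Bucket an hour into 'day' (9AM-6PM), 'night' (8PM-7AM), or None."""
    if 9 <= hour < 18:
        return "day"
    if hour >= 20 or hour < 7:
        return "night"
    return None

def location_profiles(obs):
    """Map each record ID to the set of (period, tile) pairs it was seen in."""
    profiles = defaultdict(set)
    for record_id, _name, hour, tile in obs:
        p = period(hour)
        if p is not None:
            profiles[record_id].add((p, tile))
    return profiles

def candidate_groups(obs):
    """Group record IDs whose day/night location profiles are identical."""
    groups = defaultdict(list)
    for record_id, profile in location_profiles(obs).items():
        groups[frozenset(profile)].append(record_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]

print(candidate_groups(observations))  # → [['r1', 'r2', 'r3']]
```

The fourth "J Smith" shares a name with the others but not a daily pattern, so it stays separate. The name alone was never enough; the shared tiles are what tip the balance.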

So geomatching can recognise that multiple customer identities are in reality a single person. It does this through data matching and record linkage, a process known as “entity resolution”. This approach is the idea behind TiloRes. But as with all approaches to geomatching, there are nuances. In this article we’ll explore how it works.

Covering ground: more than one way to mark a map

The first rule of geomatching is choosing how you divide up a landscape, whether that’s a city map, a nation, or even the whole Earth. It’s complicated by little things like the Earth not being flat. (There are even those who’d argue with this.)

Why radius is rong, er, wrong

For many, the simplest approach is to draw a radius around a feature – a city block, postal district, or national border. The trouble is that this approach scales poorly.

If you want to know how many records are within a kilometre of Potsdamer Platz, a radius will tell you. But what about the relationships within that area, such as our multiple Mr Smiths? And you can’t cover a plane with circles; you’d need to overlap, redraw, and rescale constantly. So that’s radius dead and buried.
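For reference, a radius query looks something like the sketch below. It uses the standard haversine formula on a spherical Earth; the coordinates are approximate and the `within_radius` helper is a hypothetical name, not part of any particular product.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, treating the Earth as a sphere

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(centre, points, radius_km):
    """Return the points lying within radius_km of centre (a full scan)."""
    return [p for p in points if haversine_km(*centre, *p) <= radius_km]

# Approximate coordinates, for illustration only.
potsdamer_platz = (52.5096, 13.3759)
records = [
    (52.5163, 13.3777),  # near the Brandenburg Gate
    (52.5219, 13.4132),  # near Alexanderplatz
]

nearby = within_radius(potsdamer_platz, records, radius_km=1.0)
print(len(nearby), "record(s) within 1 km")
```

Note what the query gives back: a list of points and nothing else. Every record has to be measured against every new centre, and no record ends up with a reusable "address" that a second query could share.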

Latitude and longitude: not an all-rounder

A better approach is a latitude and longitude grid. It’s familiar to everyone, and lets the Earth be mapped onto a flat plane. But there’s trouble here too, as a glance at the most famous world map demonstrates.

In the Mercator projection – the one most of us know – the frozen emptiness of Greenland looms larger than South America; in reality it’s just one-eighth its size. Brazil, meanwhile, doesn’t look that huge compared to its giant northern neighbours, but in fact its land area is greater than that of the USA’s Lower 48.

The reason: lines of longitude meet at the poles, meaning the grid cells they carve out get narrower the further north and south you go. So to draw them as even squares on a sheet of paper, projections like Mercator stretch them to absurd sizes. (Alaska isn’t as big as you thought, either.) These narrower grid cells near the poles create computational complexity, with each “square” needing what amounts to a custom lookup, and much slower response times.

So conventional grid squares aren’t the answer. Your algorithm may rule that multiple identities sharing a grid square in equatorial Nairobi are the same entity, but the same won’t be the case in the frozen north of Finland, where the grid squares look like stacked pizza boxes seen from the side. Which means your findings won’t be consistent.
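A few lines of Python show how quickly a latitude/longitude cell shrinks. This is a rough sketch that assumes a spherical Earth and measures the cell’s width at a single latitude; `cell_size_km` is a name made up for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, treating the Earth as a sphere

def cell_size_km(latitude_deg, cell_deg=1.0):
    """Return (width, height) in km of a cell_deg x cell_deg grid cell."""
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360
    height = cell_deg * km_per_degree
    # East-west distance per degree shrinks with the cosine of the latitude.
    width = cell_deg * km_per_degree * math.cos(math.radians(latitude_deg))
    return width, height

for place, lat in [("Nairobi", -1.29), ("northern Finland", 69.0)]:
    width, height = cell_size_km(lat)
    print(f"{place}: {width:.0f} km wide x {height:.0f} km tall")
```

A one-degree cell is close to square near Nairobi, at about 111 km a side, but only around 40 km wide at 69° north. A matching rule tuned on one simply isn’t looking at the same amount of ground on the other.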

A hex on it: the Goldberg variation

What we need is a shape that can consistently “tile” the Earth’s surface – or any part of it you want. And there’s a solution: Goldberg polyhedra. Their faces are almost all hexagons (plus exactly twelve pentagons), so they fit together nicely, even on the surface of a sphere like Earth. It’s an approach with much real-world applicability; Uber’s H3 mapping model uses a similar hexagonal grid. But even this approach has problems.

Source: https://en.wikipedia.org/wiki/User:Tomruen

First, each hexagonal “tile” is harder to address with any simple co-ordinate system, since the tiles don’t line up horizontally or vertically. Choose one tile at random: what do you call the hexagon below, but slightly to the right? Hexagon H-B-R? And what if you shift your view to another hexagon: does your target hexagon’s name change? Some of these features are useful – and there are mathematical methods for doing it – but addressability in this model is non-trivial.
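One of those mathematical methods is "axial" coordinates, where each hexagon gets a (q, r) pair along two slanted axes. The sketch below covers a flat hexagonal grid only. It is not how H3 or any spherical grid indexes its cells, and the function names are chosen for this example.

```python
# Axial coordinates (q, r) on a flat hexagonal grid.
# The six offsets that lead from any hexagon to its neighbours.
DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def neighbours(cell):
    """Return the six hexagons adjacent to cell."""
    q, r = cell
    return [(q + dq, r + dr) for dq, dr in DIRECTIONS]

def hex_distance(a, b):
    """Number of steps between two hexagons in axial coordinates."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2

origin = (0, 0)
print(neighbours(origin))
print(hex_distance(origin, (2, -1)))  # → 2
```

It works, but notice that neither axis is "north" or "east" any more. And once the grid wraps around a sphere, with twelve pentagons in the mix, the bookkeeping gets harder still.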

For the free man: Dyson Sphere Program

What we need is a best-of-all-worlds model – and there’s one in a game that involves many worlds: Dyson Sphere Program. It tasks players with building a vast energy-collecting sphere in space. Its tiling model still uses equally-spaced lines of latitude, creating “belts” around the sphere, but divides each belt into smaller and smaller numbers of tiles the closer they get to the poles: 200 at the “equator”, but just four at the extremes of north and south.

This means each tile can be very close to a square, which means easy addressability: the simplicity of a normal grid. While the tiles don’t line up precisely from one belt to the next, it’s not too complicated to solve this problem. (Far simpler than with Goldberg polyhedra.) And because each square is roughly the same size, even near the poles, rules derived from any dataset using it will have applicability worldwide: no fudge factors for different sizes and shapes of tile.
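Here is one way a belt-based tiling along these lines could be sketched in Python. It is a guess at the general principle, not the game’s actual algorithm or TiloRes’s implementation: the cosine rule for tiles per belt, the belt height, the spherical Earth, and all function names are assumptions. Only the figures of 200 tiles at the equator and four at the poles come from the description above.

```python
import math

EQUATOR_TILES = 200                        # tiles per belt at the equator
MIN_TILES = 4                              # tiles per belt at the poles
BELT_DEG = 360 / EQUATOR_TILES             # belt height that makes equatorial tiles square
N_BELTS = round(180 / BELT_DEG)            # 100 belts from pole to pole
KM_PER_DEG = 2 * math.pi * 6371.0 / 360    # assumes a spherical Earth

def belt_mid_latitude(belt):
    """Latitude in degrees at the middle of a belt (belt 0 is the south pole)."""
    return -90 + (belt + 0.5) * BELT_DEG

def tiles_in_belt(belt):
    """Fewer tiles per belt towards the poles, in proportion to cos(latitude)."""
    scaled = EQUATOR_TILES * math.cos(math.radians(belt_mid_latitude(belt)))
    return max(MIN_TILES, round(scaled))

def tile_address(lat, lon):
    """Return (belt, column) for a point given in degrees."""
    belt = min(int((lat + 90) / BELT_DEG), N_BELTS - 1)
    n = tiles_in_belt(belt)
    column = min(int(((lon + 180) % 360) / 360 * n), n - 1)
    return belt, column

def tile_size_km(belt):
    """Approximate (width, height) of a tile in the given belt."""
    height = BELT_DEG * KM_PER_DEG
    width = (360 / tiles_in_belt(belt)) * KM_PER_DEG * math.cos(
        math.radians(belt_mid_latitude(belt)))
    return width, height

for place, lat, lon in [("Nairobi", -1.29, 36.82), ("northern Finland", 69.0, 27.0)]:
    belt, column = tile_address(lat, lon)
    width, height = tile_size_km(belt)
    print(f"{place}: tile ({belt}, {column}), {width:.0f} x {height:.0f} km")
```

An address is just two integers, as in an ordinary grid, yet the tiles in Nairobi and northern Finland come out at close to the same width and height. In this sketch the match is loosest in the last few belts before the poles, where the tile count is small and rounding matters more.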

Success at scale

Finally, the Dyson game’s tiling is subdivisible. Each square is made up of smaller squares, typically a 5 x 5 grid, but it can be of any number (and hence any level of detail). This creates a useful logic for geomatching – letting you resolve an entity in whatever level of zoom you require, whether it’s a city-state or a single building.
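The subdivision logic can be sketched as a path of cell numbers, one per zoom level. This assumes a position inside a tile is expressed as fractions `u` and `v` between 0 and 1, and `subdivide` is a hypothetical helper written for this article.

```python
def subdivide(u, v, levels, n=5):
    """Return the path of sub-cell indices for a point inside a tile.

    u and v are the point's fractional position within the tile (0 <= u, v < 1).
    Each level splits the current cell into an n x n grid and records which
    sub-cell (numbered row * n + col) contains the point.
    """
    path = []
    for _ in range(levels):
        u, v = u * n, v * n
        col, row = min(int(u), n - 1), min(int(v), n - 1)
        path.append(row * n + col)
        u, v = u - col, v - row  # position within the chosen sub-cell
    return tuple(path)

a = subdivide(0.41, 0.41, levels=2)
b = subdivide(0.59, 0.59, levels=2)
print(a, b)
print("same cell at level 1:", a[:1] == b[:1])
print("same cell at level 2:", a[:2] == b[:2])
```

Two points that share a path prefix share a cell at that level of zoom, so "are these two records in the same place?" becomes a prefix comparison at whatever precision the use case needs.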

That’s why TiloRes “tiles the world” – our world, not an exoplanet! – with Dyson Sphere-style squares. It makes the maths simpler, and therefore it’s much less computationally expensive to scale. In fact, TiloRes has unlimited scalability.

Use Cases for TiloRes

Of course, TiloRes uses this tiling for geomatching, not building orbital energy grids in distant galaxies. (Yet.) But even in the down-to-Earth world of data science, this enables countless use cases for marketers – because it makes more sense of your data. Yes, there are a million multiple-instance Mr Smith problems to resolve – and that, in itself, is incredibly useful to almost any company of scale. But geomatching has applications wherever it’s useful to match, locate, and track an identified entity over time.

Resolving different identities into a single entity

Entity resolution can pin down Mr Smith. But it can go further: attach his other identities from different websites and organisations, and combine them into a single customer profile. From Big Data comes Big Insights.

Within constraints of national laws and privacy legislation, marketers can build a deeper and more informative understanding of each individual on their database. Do people who buy Product A also like golf, or travel by plane regularly, or buy cryptocurrencies? By producing special editions of products, or producing content that intersects with a prospect’s other interests, marketers can appeal to new demographic and psychographic groups, even define entirely new customer segments that expand their markets. Geomatching is a driver of profitable growth.

Communicate the reliability of data

Geomatching can also add what much data lacks: nuance. You’ve used data matching to link six records into a single entity – but with what certainty? TiloRes isn’t all-or-nothing; it can be configured to the customer's needs.

Banks and other financial institutions may need a very high score, or degree of certainty. As might governments. But consumer goods marketers? A summer festival? Less so. And of course it’s a living database, with low costs of computation: as new data comes in, a match can be broken and remade without a sharp increase in resource needs.
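To illustrate how a configurable certainty threshold changes the outcome, here is a small sketch. The scores, record names and `resolve` function are invented for the example and are not the TiloRes API; the grouping uses a plain union-find, chosen here only because it is compact.

```python
def resolve(record_ids, scored_pairs, threshold):
    """Group records whose pairwise match score meets the threshold."""
    parent = {r: r for r in record_ids}

    def find(x):
        # Follow parent links to the group's representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for r in record_ids:
        groups.setdefault(find(r), []).append(r)
    return sorted(sorted(g) for g in groups.values())

records = ["a", "b", "c", "d"]
scores = [("a", "b", 0.97), ("b", "c", 0.80), ("c", "d", 0.40)]  # hypothetical scores

print(resolve(records, scores, threshold=0.95))  # → [['a', 'b'], ['c'], ['d']]
print(resolve(records, scores, threshold=0.75))  # → [['a', 'b', 'c'], ['d']]
```

The same data yields a cautious grouping for the bank and a more generous one for the festival. Re-running the grouping when a score changes is also how a match gets "broken and remade" in this toy version.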

Detect suspicious transactions in real time

Again with banks, entity resolution can reduce actual fraud and question out-of-pattern transactions that suggest it. The technology to do it has been around for years. But TiloRes makes it easier thanks to the low computational requirement. Any data matching and record linking in TiloRes happens in under 150ms, and it can work on datasets of any size, thanks to fundamental features like the simple grid model and scalable squares to zoom into.

Of course, it can also work the other way. Merchants lose large amounts of money not to fraud, but to “declines” – where a bank wasn’t certain the transaction was suspicious but refused to complete it anyway. A surer picture of who the customer is, where he is, and what he’s doing can raise the level of confidence and enable an unusual but legitimate transaction to go through smoothly.

Maintain quality even with quantity

As your masses of data grow, the quality of each individual record tends to decay; that’s the nature of big datasets. (Especially when the data is unstructured and doesn’t lend itself to easy administration.) But because the TiloRes model scales without pain, it can work on larger and larger datasets without losing quality or taking more resources.

In fact, the tendency with TiloRes is for each record to increase in quality and usefulness, because with extra data being linked to it, the depth and detail of that record grows. In brief, your Big Dataset is doing what it should – becoming a more valuable business asset as it grows.

Stay in sync with the law

In Europe, GDPR is making its presence felt in a big way. Even giant Google’s analytics application has been ruled non-compliant by data protection authorities in several countries, with more expected to follow. And with privacy laws evolving to include the “right to be forgotten”, even dormant data can lead to fines and sanctions if undeleted.

This is another use case for entity resolution. When you can link records with geomatching in real time, a customer requesting deletion can be totally deleted. Keeping you totally compliant.

CONCLUSION: geomatching gamifies the real world. And that’s a good thing.

It’s a cool buzzword, gamification. But it’s really not so strange that real-world usefulness like geomatching can draw its concepts from videogames. Look into the history of any technological field, and you’ll often find its initial ideas in the realm of the imagination.

For many, today’s web has its roots in William Gibson’s idea of cyberspace. The nascent “Metaverse” has its origins in Neal Stephenson’s novel “Snow Crash”. Quantum teleportation isn’t exactly teleportation, but everyone understands what the sci-fi word means for that very science-y concept. And it goes way back: pioneers of the modern submarine were inspired by Jules Verne, humanoid robots by the writer Karel Čapek, and everything from mobile phones to medical scanners draws design influences from 1960s Star Trek.

And to bring this article full circle: hundreds of open-world RPGs today, from Skyrim to Assassin’s Creed, owe a debt to Gary Gygax and Dave Arneson’s tabletop game Dungeons & Dragons … which itself draws its ideas from JRR Tolkien. And what do you find on the first page of his Lord of the Rings trilogy?

A map.

So: it may be inspired by gaming, but geomatching with TiloRes is serious business, with applications everywhere. Ready to learn more? Check out TiloRes.
