Feb 13, 2021

Proposal: Distributed Linked Data Repository

A worldwide repository of linked data

Linked data is structured data that is interlinked with other data, so that it becomes more useful through semantic queries.

In order to realize the potential of linked data for spreading knowledge worldwide, the following are required:

Widespread deployment of an incentivized content production and distribution system
Availability of powerful, accessible user interfaces that people of all backgrounds can use to work with the network

The Problem

Knowledge Repository Centralization

Wikipedia and Stack Overflow, to name two examples, are widely used centralized knowledge repositories. Each has its own identity system, content moderation, and hosting platform, all controlled by a single organization. Despite their clear utility in making knowledge accessible, their centralized nature makes them vulnerable to attack or censorship. Additionally, central hosting concentrates the cost burden on a single organization, which must continuously secure funding to remain operational (in Wikipedia's case, by seeking donations).

Lack of Information Structure

Take Wikipedia as an example: its data is semi-structured. Viewed as a knowledge graph where entities are Wikipedia entries, there are links between entities via hyperlinks, but those links do not have machine-readable semantic meaning. Entries do have properties, such as “birth date” for a person or “release date” for a movie, but the schema for these properties is loosely defined.
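
To make the contrast concrete, here is a minimal sketch in Python of the difference between a plain hyperlink and a link that carries machine-readable meaning. The Triple type, the helper function, and the predicate names (links_to, directed_by, release_date) are illustrative assumptions, not an actual Wikipedia or schema.org vocabulary.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement in a knowledge graph: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


# A plain hyperlink only says that two entries are related somehow.
untyped = [Triple("Inception", "links_to", "Christopher Nolan")]

# Typed links say how the entries are related, so a program can query them.
typed = [
    Triple("Inception", "directed_by", "Christopher Nolan"),
    Triple("Inception", "release_date", "2010-07-16"),
    Triple("Interstellar", "directed_by", "Christopher Nolan"),
]


def subjects_with(triples, predicate, obj):
    """Return, in sorted order, every subject whose predicate points at obj."""
    return sorted(
        t.subject for t in triples if t.predicate == predicate and t.obj == obj
    )


films = subjects_with(typed, "directed_by", "Christopher Nolan")
```

With typed links, the question “which films did this person direct?” becomes a simple lookup. With only untyped hyperlinks, the same question cannot be answered without a human reading the page.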

The Solution

A solution that delivers structured information to the world must have the following characteristics.

Cryptographically-provable author identity and content signing

Any large body of information will contain false or contradictory information. In debates, there are multiple points of view. In a content system that anyone in the world can contribute to, establishing trust requires a way to track and prove author identity and the authenticity of content. A solid identity and authenticity system will allow readers to filter information down to trusted sources, and will prevent pollution of the knowledge store through impersonation of trusted individuals.
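
As a rough illustration of what content signing involves, the sketch below implements a Lamport one-time signature in Python. This scheme is used here only because it needs nothing beyond the standard library; it is not a recommendation, and each key pair can safely sign only one message. A real deployment would more likely rely on an established scheme such as Ed25519 through a vetted library. All function names are hypothetical.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def generate_keypair():
    """Create a one-time key pair: 256 pairs of secrets and their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def _bits(content: bytes):
    """Yield the 256 bits of the SHA-256 digest of content."""
    digest = _h(content)
    for i in range(256):
        yield (digest[i // 8] >> (7 - i % 8)) & 1


def sign(content: bytes, private):
    """Reveal one secret per digest bit. A key must sign only one message."""
    return [private[i][bit] for i, bit in enumerate(_bits(content))]


def verify(content: bytes, signature, public) -> bool:
    """Check that each revealed secret hashes to the published value for its bit."""
    if len(signature) != 256:
        return False
    return all(
        _h(sig) == public[i][bit]
        for i, (sig, bit) in enumerate(zip(signature, _bits(content)))
    )


# The author publishes public_key once; readers use it to check content.
private_key, public_key = generate_keypair()
article = b"Water boils at 100 degrees Celsius at sea level."
signature = sign(article, private_key)
```

Anyone holding the author's public key can call verify to confirm that the content is unchanged and was signed by the holder of the matching private key. An altered article, or a signature made with a different key, fails the check.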

Incentivized information production and storage

The creation and curation of content requires individuals to expend time and energy. In order for information to proliferate, the production and maintenance of that information must be incentivized. Likewise, providing hardware to host network content requires time and energy, and must also be incentivized. One current effort in this space worth looking at is LBRY.

Traceable revision of content

Maintenance of the knowledge repository (or knowledge graph) will require a process to suggest an alteration of content, perform the alteration, and commit the alteration. This will require an interactive annotation and revision feature on top of the knowledge base.
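
One possible shape for the suggest, perform, and commit cycle is a chain of revisions in which each revision records the hash of its parent. The Python sketch below is only an assumption about how this could work; the Revision and Entry classes and their method names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Revision:
    author: str
    content: str
    note: str
    parent: Optional[str]  # hash of the previous revision, None for the first

    def digest(self) -> str:
        """Hash every field, including the parent hash, so that changing
        any earlier revision changes every hash after it."""
        payload = json.dumps(
            {"author": self.author, "content": self.content,
             "note": self.note, "parent": self.parent},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Entry:
    """An entry whose committed revisions form a hash-linked chain."""

    def __init__(self):
        self.revisions = {}  # digest -> Revision
        self.head = None     # digest of the latest committed revision

    def suggest(self, author, content, note):
        """Build a proposed revision on top of the current head, uncommitted."""
        return Revision(author, content, note, self.head)

    def commit(self, proposal):
        """Accept a proposal only if it was made against the current head."""
        if proposal.parent != self.head:
            raise ValueError("proposal is based on an outdated revision")
        self.head = proposal.digest()
        self.revisions[self.head] = proposal
        return self.head

    def history(self):
        """Walk parent links from the head back to the first revision."""
        out, cursor = [], self.head
        while cursor is not None:
            rev = self.revisions[cursor]
            out.append(rev)
            cursor = rev.parent
        return out


entry = Entry()
entry.commit(entry.suggest("alice", "The Moon orbits Earth.", "initial text"))
stale = entry.suggest("carol", "The Moon orbits the Sun.", "competing edit")
entry.commit(entry.suggest("bob", "The Moon orbits the Earth.", "grammar"))
notes = [rev.note for rev in entry.history()]
```

Here the proposal held in stale was written before the second commit, so committing it now raises an error rather than silently overwriting the newer text. The history can be walked from the latest revision back to the first, newest first.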

Flexible schemas and schema mappings

Formalized vocabularies such as those at schema.org will be useful tools to help search and federate content in the graph. However, because the shape of information constantly changes, schemas will proliferate. This will make it necessary to easily add schema modifications, as well as map among schemas, to avoid fragmentation or rigidity of the graph’s semantic structure.
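
A schema mapping can be as simple as a table of property renames. The Python sketch below assumes two hypothetical source schemas (film_db and movie_wiki) and translates their records into a shared vocabulary whose property names (name, datePublished, director) are borrowed from schema.org. Properties with no known mapping are kept under a namespaced key rather than dropped, so no information is lost when a schema is only partially mapped.

```python
# Hypothetical mappings from two source schemas to one shared vocabulary.
MAPPINGS = {
    "film_db": {"title": "name", "released": "datePublished", "maker": "director"},
    "movie_wiki": {"label": "name", "release_date": "datePublished"},
}


def translate(record: dict, source_schema: str) -> dict:
    """Rename known properties; keep unknown ones under a namespaced key."""
    mapping = MAPPINGS[source_schema]
    out = {}
    for key, value in record.items():
        target = mapping.get(key, f"{source_schema}:{key}")
        out[target] = value
    return out


a = translate({"title": "Inception", "released": "2010-07-16"}, "film_db")
b = translate({"label": "Inception", "runtime_min": 148}, "movie_wiki")
```

After translation, both records expose the film's title under the same name property and can be searched together, while the unmapped runtime_min property survives as movie_wiki:runtime_min until someone adds a mapping for it.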

A collection of intuitive UIs

To spur adoption of the project, intuitive user interfaces (UIs) must be developed. This will likely be a collection of UIs rather than a single UI, because the information in the repository will be consumed in a variety of ways by people with a variety of preferences.

Language translation and “projection” of knowledge into a variety of forms

To maximize usability of information worldwide, it will be necessary for the information to be consumable in a variety of languages. More generally, as a given piece of knowledge becomes more abstract, and as the tooling for consuming that knowledge becomes more powerful, we can think of language translation as part of the larger concept of “knowledge projection”. For example, different aspects of a given piece of knowledge may be expressible graphically, as prose, or as a set of mathematical equations.

The ability to “project” knowledge into any desired form is a long-term goal, and one that can likely be improved ad infinitum. The aim of this project is to start simple, but we should not lose sight of this long-term vision of making information as readily consumable as possible.

Let’s build it!

In my opinion, there is room for a community project, with the goal of producing free software to realize this vision of a decentralized, accessible, and structured store of human knowledge.