Knowledge Graph: 2. RDF & 3. RDFS

2023-12-13 05:52:25

2. Resource Description Framework (RDF)

Graph
Directed vs. undirected
Labeled vs. unlabeled
Homogeneous vs. heterogeneous nodes
Cyclic vs. acyclic
Knowledge Graphs are Graphs
Directed, labeled graphs with heterogeneous node types (and edge types); they need not be cycle-free
Node types (“classes”) and edge types (“properties”) are also referred to as the “schema” of the graph (aka “ontology”), e.g., an edge of type “author” links a publication to a person

Metadata on the Web
Goal: more effective rating and ranking of web contents
Metadata on the Web: Dublin Core

2.1 What is RDF?

RDF = Resource Description Framework
Description of arbitrary things

A knowledge graph consists of multiple sentences

We usually think of knowledge graphs as densely connected graphs (Objects of one statement become subjects of another)
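The idea that objects of one statement become subjects of another can be sketched with plain Python sets. This is only an illustration; the resource names and predicates below are made up and do not come from the lecture.

```python
# A tiny knowledge graph as a set of (subject, predicate, object) statements.
# All identifiers are hypothetical examples.
graph = {
    (":Munich", ":locatedIn", ":Bavaria"),
    (":Bavaria", ":partOf", ":Germany"),
    (":Germany", ":hasCapital", ":Berlin"),
}

subjects = {s for s, _, _ in graph}
objects = {o for _, _, o in graph}

# Nodes that are the object of one statement and the subject of another
# are what turns separate statements into a connected graph.
connecting_nodes = subjects & objects
print(sorted(connecting_nodes))
```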

2.2 Basic Building Blocks of RDF

2.2.1 Resources

  • denote things
  • are identified by a URI
  • can have one or multiple types
  • A resource can be a subject itself

Types: All resources (but not literals) can have a type. Types can be arbitrarily defined. The predefined predicate rdf:type defines the type of a resource.

2.2.2 Literals

  • are values like strings or integers
  • A literal is an atomic value that can only be an object, not a subject or predicate (graph view: literals can only have ingoing edges) [this is the difference from resources]
  • can have a datatype or a language tag (but not both)

Datatypes for Literals
(Almost) all XML Schema datatypes may be used. Exceptions: XML-specific types, the underspecified type “duration”, and sequence types.
There are no default datatypes (not even “string”!)

Language Tags for Literals
Literals may be given in different natural languages, and the language can be marked with a tag: “München”@de, “Munich”@en
Knowledge Graphs can be multilingual!

Example:
:Munich :hasName "München"@de .
:Munich :hasName "Munich"@en .
:Munich :hasPopulation "1356594"^^xsd:integer .
:Munich :hasFoundingYear "1158-01-01"^^xsd:date .

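The following are three different literals (this holds for RDF 1.0; in RDF 1.1 a plain literal without a language tag is shorthand for an xsd:string literal, so the first and third denote the same value):
– "München"
– "München"@de
– "München"^^xsd:string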
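The rule that a literal can have a datatype or a language tag, but not both, can be sketched as a small data structure. This is a minimal illustration, not an RDF library API; the class name `Literal` and its fields are assumptions, and equality follows the "three different literals" view used in these notes.

```python
from dataclasses import dataclass
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal model: lexical form plus optional datatype or language."""
    lexical: str
    datatype: Optional[str] = None
    language: Optional[str] = None

    def __post_init__(self):
        # A literal may carry a datatype or a language tag, never both.
        if self.datatype is not None and self.language is not None:
            raise ValueError("a literal may have a datatype or a language tag, not both")


plain = Literal("München")
tagged = Literal("München", language="de")
typed = Literal("München", datatype=XSD + "string")

# Compared field by field, the three literals are pairwise distinct.
print(len({plain, tagged, typed}))
```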

2.2.3 Properties (Predicates)

  • Link resources to other resources and to literals

2.3 Triple Notation

Triples consist of a subject, predicate, and object.
An RDF document is an unordered set of triples.

Example:
Literal with language tag:
<http://www.dws.informatik.uni-mannheim.de/teaching/semantic-web>
<http://purl.org/dc/elements/1.1/subject>
"Semantic Web"@en .

Typed literal:
<http://www.dws.informatik.uni-mannheim.de/teaching/semantic-web>
<http://www.uni-mannheim.de/mhb/creditpoints>
"6"^^<http://www.w3.org/2001/XMLSchema#integer> .
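The triple notation above can be produced mechanically. The following is a rough sketch of a serializer for single triples; the function name `to_ntriples` and its parameters are assumptions, escaping of special characters is omitted, and plain literals without a tag or datatype are not covered.

```python
def to_ntriples(subject, predicate, obj, datatype=None, language=None):
    """Render one triple as a line in triple notation.

    obj is treated as a literal if a datatype or language is given,
    otherwise as a URI. No character escaping is performed.
    """
    if datatype is not None and language is not None:
        raise ValueError("a literal may have a datatype or a language tag, not both")
    if language is not None:
        rendered = f'"{obj}"@{language}'
    elif datatype is not None:
        rendered = f'"{obj}"^^<{datatype}>'
    else:
        rendered = f"<{obj}>"
    return f"<{subject}> <{predicate}> {rendered} ."


course = "http://www.dws.informatik.uni-mannheim.de/teaching/semantic-web"
line = to_ntriples(course, "http://purl.org/dc/elements/1.1/subject",
                   "Semantic Web", language="en")
print(line)
```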

2.4 Turtle Notation

A simplified triple notation
Turtle (Terse RDF Triple Language) is a more readable way of writing RDF; it uses a concise syntax to describe RDF data.
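Much of Turtle's readability comes from abbreviating long URIs with prefixes. The sketch below shows only that one idea; the helper `abbreviate` and the prefix table are assumptions for illustration, and a real Turtle writer does considerably more (for example, checking that the local part is a valid name).

```python
# Prefix table: prefix label -> namespace URI.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def abbreviate(uri, prefixes=PREFIXES):
    """Return a prefixed name if a known namespace matches, else <uri>."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return f"<{uri}>"


print(abbreviate("http://purl.org/dc/elements/1.1/subject"))
print(abbreviate("http://www.uni-mannheim.de/mhb/creditpoints"))
```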

2.5 Notation RDF/XML

Encodes RDF in XML
Suitable for machine processing (plenty of XML tools)

2.6 JSON-LD Notation

JSON-LD: Standard for serializing RDF in JSON
JSON-LD in HTML
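As a rough illustration, a single statement with a language-tagged literal can be written as a JSON-LD document using Python's standard json module. The keywords `@context`, `@id`, `@value`, and `@language` are JSON-LD keywords; the choice of statement and the variable names are assumptions. The last lines show one common way of embedding JSON-LD in an HTML page, inside a script element.

```python
import json

# One statement: the course page has the dc:subject "Semantic Web" (English).
doc = {
    "@context": {"dc": "http://purl.org/dc/elements/1.1/"},
    "@id": "http://www.dws.informatik.uni-mannheim.de/teaching/semantic-web",
    "dc:subject": {"@value": "Semantic Web", "@language": "en"},
}

serialized = json.dumps(doc, indent=2)

# Embedding in HTML: JSON-LD is typically placed in a script element.
html_snippet = '<script type="application/ld+json">\n' + serialized + "\n</script>"
print(html_snippet)
```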

2.7 Blank Nodes

Blank Nodes in Turtle

Application of Blank Nodes: n-ary Predicates
RDF predicates always connect a subject and an object, i.e., in the sense of predicate logic, they are binary predicates.
Sometimes, n-ary predicates are needed
– has_ingredient(Recipe, Sugar, 100g)
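One way to express the ternary has_ingredient relation with binary predicates is to route it through a blank node that bundles the ingredient and the amount. The sketch below assumes hypothetical predicate names (`:hasIngredient`, `:ingredient`, `:amount`) and a simple counter for blank node labels.

```python
import itertools

_counter = itertools.count(1)


def fresh_blank_node():
    """Return a new blank node label such as _:b1 (labels are local to this graph)."""
    return f"_:b{next(_counter)}"


def ingredient_triples(recipe, ingredient, amount):
    """Split has_ingredient(recipe, ingredient, amount) into three binary statements."""
    node = fresh_blank_node()
    return [
        (recipe, ":hasIngredient", node),
        (node, ":ingredient", ingredient),
        (node, ":amount", amount),
    ]


triples = ingredient_triples(":Recipe", ":Sugar", '"100g"')
for s, p, o in triples:
    print(s, p, o, ".")
```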

2.8 Semantic Principles of RDF

AAA principle: anybody can say anything about anything, also used for RDF knowledge graphs
Non-unique name assumption: there is not just one name for each thing. The fact that two things have different names does not mean that they are different!
Open World Assumption: there may be more information on a resource than what we have
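The practical consequence of the last two principles is that a missing statement is unknown rather than false, and different names alone do not prove that two things differ. A toy sketch, with made-up data and function names; the `declared_different` argument stands in for an explicit statement of difference, which goes beyond what these notes cover for RDF and RDFS:

```python
graph = {(":Munich", ":locatedIn", ":Germany")}


def ask(graph, triple):
    """Open-world lookup: an absent triple is unknown, not false."""
    return "yes" if triple in graph else "unknown"


def may_be_same(a, b, declared_different=frozenset()):
    """Different names prove nothing; only an explicit statement rules out identity."""
    return frozenset((a, b)) not in declared_different


print(ask(graph, (":Munich", ":locatedIn", ":Germany")))
print(ask(graph, (":Munich", ":locatedIn", ":Europe")))
print(may_be_same(":Munich", ":Muenchen"))
```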

2.9 RDF and HTML

The Semantic Web uses RDF
The “classic” Web uses HTML

Explicit reference to an RDF version: an agent stumbling on the HTML page can download the RDF data file

Content Negotiation

MIME: Multipurpose Internet Mail Extensions
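Content negotiation means the server picks a representation based on the MIME types the client lists in its Accept header. The following is a simplified sketch of that selection step only: the file names are hypothetical, wildcards such as `*/*` are not matched and simply fall back to the default, and real servers handle more cases.

```python
# Available representations of one resource: MIME type -> file (hypothetical names).
REPRESENTATIONS = {
    "text/html": "page.html",
    "application/rdf+xml": "data.rdf",
    "text/turtle": "data.ttl",
}


def negotiate(accept_header, available=REPRESENTATIONS, default="text/html"):
    """Pick the available MIME type with the highest q-value in the Accept header."""
    candidates = []
    for position, part in enumerate(accept_header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        mime, q = fields[0], 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        if q > 0:
            # Sort by descending q, then by order of appearance.
            candidates.append((-q, position, mime))
    for _, _, mime in sorted(candidates):
        if mime in available:
            return mime, available[mime]
    return default, available[default]


print(negotiate("application/rdf+xml, text/html;q=0.5"))
print(negotiate("text/html"))
```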

Link to RDF Document vs. Content Negotiation
Link to RDF Document
– Can be done with a simple HTML editor
– No special server configuration needed
Content Negotiation
– Requires particular server setup
– One URI can be used for different representations

Both cases require: two different representations + “double bookkeeping”
→ Potential source of inconsistencies!

2.10 RDF in Attributes (RDFa)

Idea of RDFa:
Why not encode HTML and RDF in one document? The essential information only has to be encoded once
RDFa combines XHTML with RDF
RDFa Language Constructs

2.11 Microdata

Alternative to RDFa
Adding structured information to web pages
– By marking up contents
– Arbitrary vocabularies are possible
– Introduced with HTML5

RDFa vs. Microdata
Commonalities
– Arbitrary classes/predicates are possible
– Although Microdata is mainly used with schema.org
Differences
– Microdata is slightly less expressive
– No URIs, only blank nodes
– No cycles in the resulting RDF graph
– No reification (see later)

2.12 RDFa, MicroFormats, and Microdata

MicroFormats: fixed vocabularies for persons, addresses, etc.
WebDataCommons: Large-Scale Extraction of RDFa, MicroFormats, Microdata, JSON-LD from the Web

2.13 RDF Tools

Storage: relational databases, graph databases
Validation: validating parsers checking consistency
Visualization: mostly graph based visualization
Reasoning: inference over graphs
Programming: APIs

2.14 Metadata for RDF

Dublin Core was designed as Metadata for the Web. Knowledge graphs may have metadata as well.
Most prominently: provenance
– Where does the data come from?
– Who created it?
– When was it created?
– What was the process creating it?

2.15 Reification

In RDF: Statements about statements
“Peter says that Rome is the capital of Spain.”
Implementation:
RDF statements are considered resources themselves, so they can be the subject or object of other statements.

Implementing Reification as Standard RDF

Encoding Reification in Turtle
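In standard RDF, a reified statement is described with the vocabulary terms rdf:Statement, rdf:subject, rdf:predicate, and rdf:object. The sketch below builds those triples for the example sentence and prints them in a Turtle-like form; the identifiers `:Statement1`, `:Peter`, `:says`, and `:isCapitalOf` are made up, and prefix declarations are left out.

```python
def reify(statement_id, subject, predicate, obj):
    """Describe the statement (subject, predicate, obj) as a resource."""
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    ]


triples = reify(":Statement1", ":Rome", ":isCapitalOf", ":Spain")
# The statement about the statement: Peter says it.
triples.append((":Peter", ":says", ":Statement1"))

for s, p, o in triples:
    print(f"{s} {p} {o} .")
```

Note that the reified statement itself is not asserted: the graph records that Peter says it, not that Rome is the capital of Spain.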

Wrap Up

3. RDF Schema (RDFS)

RDF vs. XML

Schemas and ontologies bring semantics to knowledge graphs

3.1 Semantics

How do Semantics Work?

3.1.1 Lexical semantics

The meaning of a word is defined by its relations to other words, i.e., semantics are defined by establishing relations between words.

3.1.2 Extensional semantics

Meaning of a word is defined by the set of its instances.

3.1.3 Intensional semantics

e.g., feature-based semantics. Meaning of a word is defined by features of the instances

Intensional vs. Extensional Semantics
1. Intensionally different things can have the same extension. Classic example: morning star and evening star, both have the same extension (i.e., Venus)
2. The extension can change over time without the intension changing
3. Intension may also change over time
4. Extension may also be empty
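The distinction can be made concrete by treating an intension as a test on features and an extension as the set of things that pass it. This is only a toy sketch: the universe, the feature flags, and the function names are made up for illustration.

```python
# Toy universe with made-up feature flags.
universe = [
    {"name": "Venus", "brightest_at_dawn": True, "brightest_at_dusk": True},
    {"name": "Sirius", "brightest_at_dawn": False, "brightest_at_dusk": False},
]


def is_morning_star(body):
    return body["brightest_at_dawn"]


def is_evening_star(body):
    return body["brightest_at_dusk"]


def is_unicorn_star(body):
    # An intension whose extension is empty.
    return False


def extension(intension, universe):
    """The extension is the set of instances satisfying the intension."""
    return {body["name"] for body in universe if intension(body)}


# Two different intensions, one shared extension.
print(extension(is_morning_star, universe) == extension(is_evening_star, universe))
print(extension(is_unicorn_star, universe))
```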

3.1.4 Prototype semantics

Meaning of a word is defined by proximity to a prototypical instance.
Intensional and extensional semantics are based on boolean logics, but prototype semantics is a more fuzzy variant.

Semantics define the meaning of words
That is what we do with ontologies — using methods from lexical, intensional, and extensional semantics

3.2 Ontology

An ontology is an explicit specification of a conceptualization.
Ontologies encode the knowledge about a domain
They form a common vocabulary and describe the semantics of its terms
In computer science (used with an article: “an ontology” or “the ontology”):
– a formalized description of a domain
– a shared vocabulary
– a logical theory

Essential Properties of Ontologies

  • Explicit: Meaning is not “hidden” between the lines
  • Formal: e.g., using logic or rule languages
  • Shared: An ontology just for one person does not make much sense
  • Partial: There will (probably) never be a full ontology of everything in the world

3.3 Encoding Simple Ontologies: RDFS

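RDFS is an extension built on top of RDF for defining schemas, that is, the kinds of resources in RDF data and the relationships between them. RDFS introduces the concepts of class and property, allowing users to define the types of resources and the meaning of properties. RDFS also provides some basic inference rules over resources and properties; for example, if resource A is an instance of class B, and B is a subclass of class C, then A is also an instance of C. RDFS is used to model hierarchies among resources and basic constraints, in order to enrich the semantics of RDF data.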

3.3.1 Most important element: classes

Classes form hierarchies
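Class:
A class defines a set of resources that share similar characteristics or properties. Classes are used to group or categorize resources so that data can be better organized and described. A class is a kind of metadata that describes the concepts and entity types in a data model. For example, you can define a class “Person” to represent the concept of all people. Classes can form hierarchies, in which one class is a subclass of another; such a hierarchy relates more specific classes to more abstract ones. For example, “Student” can be a subclass of “Person”. Classes can also have instances, which are the concrete members of the class. For example, “Alice” and “Bob” can be instances of “Person”.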
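A class hierarchy can be sketched as a mapping from each class to its direct superclasses; an instance then belongs to its direct class and to everything above it. The class `:Agent` and the helper names below are assumptions added for illustration.

```python
# Direct subclass relations: class -> set of direct superclasses.
subclass_of = {
    ":Student": {":Person"},
    ":Person": {":Agent"},  # hypothetical extra level
}

# Direct types of instances.
instance_of = {":Alice": {":Student"}, ":Bob": {":Person"}}


def superclasses(cls, subclass_of):
    """All direct and indirect superclasses of cls."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def types_of(instance, instance_of, subclass_of):
    """Direct types of an instance plus all their superclasses."""
    result = set(instance_of.get(instance, set()))
    for cls in list(result):
        result |= superclasses(cls, subclass_of)
    return result


print(sorted(types_of(":Alice", instance_of, subclass_of)))
```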

3.3.2 Properties in RDF Schema

resemble two-valued predicates in predicate logic
Properties also form hierarchies
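Property:
Properties describe the characteristics of resources or the relationships between them. They play a key role in the RDF data model, where they connect the subject and the object. A property can express a characteristic of a resource; for example, “hasName” can give the name of a resource. A property can also express a relationship between resources; for example, “isFriendOf” can express a friendship between two resources. The type property (rdf:type) associates a resource with a class, indicating the type of the resource.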

Domains and Ranges of Properties
In general, properties exist independently from classes.
Defining the domain and range of a property
:capitalOf rdfs:domain :City . [Domain defines the type of the instance that is in the subject position]
:capitalOf rdfs:range :Country . [Range defines the type of the instance which is in the object position]
Domain and range are inherited by subproperties
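In RDFS, domain and range are used to infer types rather than to reject data. A minimal sketch of that inference, using the :capitalOf definitions above and one made-up fact:

```python
domain = {":capitalOf": ":City"}
range_ = {":capitalOf": ":Country"}

# Hypothetical instance data.
facts = {(":Madrid", ":capitalOf", ":Spain")}


def infer_types(facts, domain, range_):
    """Derive rdf:type statements from the domain and range of each predicate."""
    inferred = set()
    for s, p, o in facts:
        if p in domain:
            inferred.add((s, "rdf:type", domain[p]))  # subject gets the domain type
        if p in range_:
            inferred.add((o, "rdf:type", range_[p]))  # object gets the range type
    return inferred


for triple in sorted(infer_types(facts, domain, range_)):
    print(triple)
```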

Predefined Properties
rdf:type, rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain, rdfs:range, rdfs:label, rdfs:comment, rdfs:seeAlso (links to other resources), rdfs:isDefinedBy (link to the defining schema)

URIs vs. Labels
A URI is only a unique identifier, and it does not need to be interpretable.
Labels are made for human interpretation, and can come in different languages
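A small sketch of how an application might pick a human-readable label for a URI, falling back to another language and finally to the URI itself. The label table and the function `label_for` are assumptions for illustration.

```python
# rdfs:label values per resource, keyed by language tag (example data).
labels = {":Munich": {"de": "München", "en": "Munich"}}


def label_for(uri, lang, labels, fallback_lang="en"):
    """Return the label in the requested language, else a fallback, else the URI."""
    by_lang = labels.get(uri, {})
    return by_lang.get(lang) or by_lang.get(fallback_lang) or uri


print(label_for(":Munich", "de", labels))
print(label_for(":Munich", "fr", labels))
print(label_for(":Unknown", "de", labels))
```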

3.4 RDF Schema and RDF

Every RDF Schema document is also an RDF document. → All properties of RDF also hold for RDFS (Non-unique Name Assumption and Open World Assumption)!

3.5 Reasoning with RDF

T-Box: Definition of the terminology
A-Box: Definition of the assertions
RDF Schema allows for deductive reasoning on RDF (given facts and rules, derive new facts)
The corresponding tools are called reasoners

Opposite of deduction: induction
deriving models from facts, e.g., data mining and machine learning

Interpretation and Entailment
Entailment: The set of all consequences of a graph
Mapping a graph to an entailment is called interpretation
Simplest Interpretation

This interpretation creates all statements explicitly contained in the graph. But the implicit statements are the interesting ones!

Interpretation using Deduction Rules
RDF interpretation can be done using RDFS deduction rules. Those create an entailment using existing resources, literals, and properties, creating additional triples of the form <s,p,o>.

Deduction rules are an interpretation function
Simple reasoning algorithm (a.k.a. forward chaining)

Forward Chaining
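Forward chaining applies the deduction rules to the graph over and over until no new triple appears. The sketch below implements a small selection of RDFS-style rules (subclass transitivity, type inheritance, subproperty, domain, range); it is a naive illustration, not an optimized reasoner, and the example data is made up.

```python
TYPE, SUBCLASS, SUBPROP = "rdf:type", "rdfs:subClassOf", "rdfs:subPropertyOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"


def apply_rules(graph):
    """Apply each rule once to every pair of triples and return what is derived."""
    derived = set()
    for s, p, o in graph:
        for s2, p2, o2 in graph:
            if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                derived.add((s, SUBCLASS, o2))  # subclass transitivity
            if p == TYPE and p2 == SUBCLASS and o == s2:
                derived.add((s, TYPE, o2))      # instances inherit superclasses
            if p2 == SUBPROP and p == s2:
                derived.add((s, o2, o))         # statement also holds for super-property
            if p2 == DOMAIN and p == s2:
                derived.add((s, TYPE, o2))      # subject gets the domain type
            if p2 == RANGE and p == s2:
                derived.add((o, TYPE, o2))      # object gets the range type
    return derived


def forward_chain(graph):
    """Repeat rule application until a fixpoint is reached."""
    closure = set(graph)
    while True:
        new = apply_rules(closure) - closure
        if not new:
            return closure
        closure |= new


graph = {
    (":capitalOf", SUBPROP, ":locatedIn"),
    (":capitalOf", DOMAIN, ":City"),
    (":capitalOf", RANGE, ":Country"),
    (":City", SUBCLASS, ":Place"),
    (":Madrid", ":capitalOf", ":Spain"),
}

closure = forward_chain(graph)
for triple in sorted(closure - graph):
    print(triple)
```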

Multiple Domains/Ranges
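In RDFS semantics, several rdfs:domain (or rdfs:range) statements for the same property all apply at once, so a subject is inferred to be an instance of every listed class. A short sketch with made-up names:

```python
# A property with two domains (hypothetical schema).
domains = {":teaches": [":Person", ":Employee"]}


def types_from_domains(subject, predicate, domains):
    """Every domain of the predicate becomes a type of the subject."""
    return {(subject, "rdf:type", cls) for cls in domains.get(predicate, [])}


print(sorted(types_from_domains(":Alice", ":teaches", domains)))
```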

3.6 What we Cannot Express

There is no negation in RDF and RDFS.
Therefore, we cannot produce any contradictions
The missing negation fits the AAA principle and the Open World Assumption well.
Any new knowledge will always fit with the knowledge that is already there. This principle is called “monotonicity”.
RDF Schema is not very powerful but free of contradiction.
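Monotonicity can be checked on a toy example: everything derived from a graph is still derived after more triples are added. The sketch below uses only the type-inheritance rule along rdfs:subClassOf, and the data is made up.

```python
def closure(graph):
    """Propagate rdf:type along rdfs:subClassOf until nothing changes."""
    result = set(graph)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(result):
            if p != "rdf:type":
                continue
            for s2, p2, o2 in list(result):
                if p2 == "rdfs:subClassOf" and s2 == o and (s, "rdf:type", o2) not in result:
                    result.add((s, "rdf:type", o2))
                    changed = True
    return result


g = {
    (":Alice", "rdf:type", ":Student"),
    (":Student", "rdfs:subClassOf", ":Person"),
}
extra = {(":Person", "rdfs:subClassOf", ":Agent")}  # newly added knowledge

before = closure(g)
after = closure(g | extra)

# Adding triples never invalidates an earlier conclusion.
print(before <= after)
```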

Source: https://blog.csdn.net/weixin_45012798/article/details/134895050