Knowledge Graph: 9. Data Quality and Linking

2023-12-13 04:08:29

9. Data Quality and Linking

9.1 How good is Linked Open Data in practice?

Linked Open Data Best Practices
Provide dereferenceable URIs
Set RDF links pointing at other data sources

Use terms from widely deployed vocabularies

Linked Open Vocabularies (LOV) project
– analyzes the usage of vocabularies

Make proprietary vocabulary terms dereferenceable

Map proprietary vocabulary terms to other vocabularies

Provide provenance metadata

Provide licensing metadata

Provide data-set-level metadata

Refer to additional access methods

More Indicators

9.2 Quality

Linked Data Conformance vs. Quality
Conformance: following standards and best practices; the technical dimension, which can be evaluated automatically

Quality: how complete/correct/… the data is; the content dimension, which is hard to evaluate automatically

Quality of Knowledge Graphs

Issues with Automatic Evaluation

Example: Crowd Evaluation of DBpedia

The quality of Linked Open Data is far from perfect, in terms of both conformance and content
Improving the quality is an active field of research
– Survey from 2017: more than 40 approaches
– Since then: a lot of work on KG embeddings

9.3 Links

Previously on Knowledge Graphs

  • Integrate data from different sources
  • Make connections between entities in those sources
  • Facilitate cross data source queries
  • Overcome data silos

Why do we need Links?

How do we Create the Links?

There is a great deal of data, and many publishers interlink their own datasets with other datasets.

9.3.1 Tool Support

A plethora of names
Mostly used for the schema level:

  • Ontology matching/alignment/mapping
  • Schema matching/mapping

Mostly used for the instance level:

  • Instance matching/alignment
  • Interlinking
  • Link discovery

9.3.2 Automating Interlinking

Automating Interlinking

Summary and Takeaways

Basic Interlinking Techniques

Sources for Interlinking Signals

Simple String Based Metrics

  • String equality
    e.g. foo:University_of_Mannheim, bar:University_of_Mannheim
  • Common prefixes
    e.g. foo:United_States, bar:United_States_of_America
  • Common postfixes
    e.g. foo:Barack_Obama, bar:Obama
  • Typical usage of prefixes/postfixes: |common|/max(length)
    foo:United_States, bar:United_States_of_America → 13/24
    foo:Barack_Obama, bar:Obama → 5/12
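
The prefix/postfix metric above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the function names are hypothetical, and it assumes the namespace prefix (foo:, bar:) is dropped and every remaining character, including underscores, is counted.

```python
def local_name(term: str) -> str:
    """Drop a namespace prefix such as 'foo:' and keep the local name."""
    return term.split(":", 1)[-1]


def common_prefix_length(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def prefix_similarity(a: str, b: str) -> float:
    """|common prefix| / max(length), computed on the local names."""
    a, b = local_name(a), local_name(b)
    longest = max(len(a), len(b))
    return common_prefix_length(a, b) / longest if longest else 1.0


def postfix_similarity(a: str, b: str) -> float:
    """|common postfix| / max(length): the prefix metric on reversed names."""
    a, b = local_name(a), local_name(b)
    longest = max(len(a), len(b))
    return common_prefix_length(a[::-1], b[::-1]) / longest if longest else 1.0
```

With underscores counted as characters, prefix_similarity("foo:United_States", "bar:United_States_of_America") evaluates to 13/24, and postfix_similarity("foo:Barack_Obama", "bar:Obama") to 5/12.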

Edit Distance
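
As a rough illustration of edit distance, the sketch below computes the Levenshtein distance (insertions, deletions and substitutions, each with cost 1) using the standard dynamic-programming recurrence. Which edit-distance variant the lecture uses is not stated here, so the choice of Levenshtein, the function names and the normalisation by the longer string are assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, keeping only two DP rows."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 - edit_distance(a, b) / longest if longest else 1.0
```

For instance, edit_distance("Mannheim", "Manheim") is 1, because a single deleted "n" turns one label into the other.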

N-gram based Similarity
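
One possible reading of n-gram based similarity is sketched below: both strings are split into sets of character trigrams and compared with the Dice coefficient. The value n = 3, the Dice coefficient (rather than, say, Jaccard) and the function names are assumptions made for illustration only.

```python
def ngrams(text: str, n: int = 3) -> set:
    """Set of character n-grams; strings shorter than n yield themselves."""
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over n-gram sets: 2 * |A & B| / (|A| + |B|)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty strings are treated as identical
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))
```

Unlike a pure prefix or postfix comparison, this measure also rewards overlap in the middle of two labels: "abcd" and "bcde" share one of their two trigrams each and score 0.5.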

Typical Preprocessing Techniques

Language-specific Preprocessing

Using External Knowledge

From Matching Literals to Matching Entities

Preprocessing and Matching Pipelines

9.4 Schema Matching

Schema Matching

9.5 Instance based Matching

Instance based Matching

Enforcing 1:1 Mappings
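
One simple way to enforce 1:1 mappings is greedy selection over scored candidate pairs: take the best-scoring pair, block both of its elements, and repeat. The sketch below assumes a matcher has already produced (left, right, score) triples. The function name, the threshold parameter and the greedy strategy itself are illustrative assumptions, since the notes do not say which strategy is meant; optimal assignment methods are an alternative.

```python
def greedy_one_to_one(candidates, threshold=0.0):
    """Pick candidate pairs by descending score so that every left and
    every right element appears in at most one selected pair."""
    selected = []
    used_left, used_right = set(), set()
    ranked = sorted(candidates, key=lambda pair: pair[2], reverse=True)
    for left, right, score in ranked:
        if score < threshold:
            break  # all remaining candidates score even lower
        if left in used_left or right in used_right:
            continue  # one side is already mapped
        selected.append((left, right, score))
        used_left.add(left)
        used_right.add(right)
    return selected
```

Greedy selection is not guaranteed to maximise the total score of the mapping, but it is easy to follow and often serves as a baseline.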
Schema Matching

9.6 Matcher Combination

Matcher Combination
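
A common way to combine matchers is to aggregate their similarity scores for the same candidate pair, for example as a weighted average. The sketch below illustrates that idea only. The matcher names, the weights and the function name are hypothetical, and the notes do not state which combination scheme the lecture covers.

```python
def combine_scores(scores, weights):
    """Weighted average of per-matcher similarity scores for one pair.

    scores:  matcher name -> similarity in [0, 1]
    weights: matcher name -> non-negative weight
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("at least one matcher needs a positive weight")
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical example: a label-based and an instance-based matcher.
example = combine_scores(
    {"label": 0.8, "instances": 0.4},
    {"label": 3.0, "instances": 1.0},
)
```

Other aggregations, such as taking the maximum or minimum score, fit the same interface.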

Evaluating Matchers
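
Matchers are commonly evaluated by comparing the correspondences they find with a reference alignment (gold standard) and reporting precision, recall and F1. The sketch below assumes this setup, which the notes do not spell out; the function name and the representation of correspondences as pairs are illustrative.

```python
def evaluate_alignment(found, reference):
    """Return (precision, recall, f1) of found pairs against a reference."""
    found, reference = set(found), set(reference)
    true_positives = len(found & reference)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1
```

Precision drops when the matcher returns wrong links, while recall drops when it misses links that are in the reference alignment.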

Challenges in Matching

Summary and Takeaways

Unifying Large Language Models and Knowledge Graphs: A Roadmap

Source: https://blog.csdn.net/weixin_45012798/article/details/134958596