Knowledge Graph: 9. Data Quality and Linking

2023-12-13 04:08:29

9. Data Quality and Linking

9.1 How good is Linked Open Data in practice?

Linked Open Data Best Practices
Provide dereferenceable URIs
Set RDF links pointing at other data sources

Use terms from widely deployed vocabularies

Linked Open Vocabularies (LOV) project
– analyzes the usage of vocabularies

Make proprietary vocabulary terms dereferenceable

Map proprietary vocabulary terms to other vocabularies

Provide provenance metadata

Provide licensing metadata

Provide data-set-level metadata

Refer to additional access methods

More Indicators

9.2 Quality

Linked Data Conformance vs. Quality
Conformance: following standards and best practices. This is the technical dimension and can be evaluated automatically.

Quality: how complete, correct, etc. the data is. This is the content dimension and is hard to evaluate automatically.

Quality of Knowledge Graphs

Issues with Automatic Evaluation

Example: Crowd Evaluation of DBpedia

The quality of Linked Open Data is far from perfect, in both conformance and content
Improving the quality is an active field of research
– Survey 2017: >40 approaches
– since then: a lot of work in KG embeddings

9.3 Links

Previously on Knowledge Graphs

Integrate data from different sources
Make connections between entities in those sources
Facilitate cross data source queries
Overcome data silos

Why do we need Links?

How do we Create the Links?

There is too much data; many publishers interlink their own data with other datasets.

9.3.1 Tool Support

A plethora of names
Mostly used for schema level:

Ontology matching/alignment/mapping
Schema matching/mapping

Mostly used for the instance level:

Instance matching/alignment
Interlinking
Link discovery

9.3.2 Automating Interlinking

Automating Interlinking

Summary and Takeaways

Basic Interlinking Techniques

Sources for Interlinking Signals

Simple String Based Metrics

String equality
e.g. foo:University_of_Mannheim, bar:University_of_Mannheim
Common prefixes
e.g. foo:United_States, bar:United_States_of_America
Common postfixes
e.g. foo:Barack_Obama, bar:Obama
Typical usage of prefixes/postfixes: |common|/max(length)
foo:United_States, bar:United_States_of_America → 13/24
foo:Barack_Obama, bar:Obama → 5/12
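
The prefix/postfix ratio above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names (`local_name`, `prefix_similarity`, `postfix_similarity`) are hypothetical, and the sketch assumes that the namespace part before the colon (`foo:`, `bar:`) is dropped before comparing and that underscores count as ordinary characters.

```python
from fractions import Fraction


def local_name(term: str) -> str:
    """Drop a namespace prefix such as 'foo:' (assumption: split at the first colon)."""
    return term.split(":", 1)[-1]


def common_prefix_len(a: str, b: str) -> int:
    """Count the leading characters that a and b share."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def common_postfix_len(a: str, b: str) -> int:
    """Count the trailing characters that a and b share."""
    return common_prefix_len(a[::-1], b[::-1])


def _ratio(common: int, a: str, b: str) -> Fraction:
    """|common| / max(length), with 0 for two empty names."""
    longest = max(len(a), len(b))
    return Fraction(common, longest) if longest else Fraction(0)


def prefix_similarity(term_a: str, term_b: str) -> Fraction:
    a, b = local_name(term_a), local_name(term_b)
    return _ratio(common_prefix_len(a, b), a, b)


def postfix_similarity(term_a: str, term_b: str) -> Fraction:
    a, b = local_name(term_a), local_name(term_b)
    return _ratio(common_postfix_len(a, b), a, b)


print(postfix_similarity("foo:Barack_Obama", "bar:Obama"))  # 5/12
print(prefix_similarity("foo:United_States", "bar:United_States_of_America"))
```

String equality is the special case in which either ratio equals 1.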

Edit Distance
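
As a rough illustration, edit distance is often taken to be the Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. The sketch below assumes that variant; the normalisation by the longer string's length in `edit_similarity` is likewise an assumption made here to obtain a score in [0, 1].

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """1 - distance / max(length); an assumed normalisation, not the only one."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("Mannheim", "Manheim"))  # 1
```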

N-gram based Similarity
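
One common way to realise an n-gram based similarity is to split both strings into overlapping character n-grams and compare the resulting sets. The sketch below assumes character n-grams without padding and the Jaccard coefficient as the set measure; other choices (Dice, padded n-grams, token n-grams) are equally plausible, and the function names are hypothetical.

```python
def char_ngrams(text: str, n: int = 3) -> set:
    """Set of overlapping character n-grams; a short string yields itself."""
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard coefficient of the two n-gram sets (assumed set measure)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 1.0  # two empty strings are treated as identical
    return len(grams_a & grams_b) / len(union)


print(sorted(char_ngrams("night", 2)))  # ['gh', 'ht', 'ig', 'ni']
print(ngram_similarity("night", "nacht", n=2))
```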

Typical Preprocessing Techniques

Language-specific Preprocessing

Using External Knowledge

From Matching Literals to Matching Entities

Preprocessing and Matching Pipelines

9.4 Schema Matching

Schema Matching

9.5 Instance-based Matching

Instance-based Matching

Enforcing 1:1 Mappings
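
A simple way to enforce 1:1 mappings, given a list of scored candidate correspondences, is a greedy selection: sort candidates by score and accept a pair only if neither side has been used yet. This is one possible strategy chosen here for illustration (an optimal assignment, for example via the Hungarian algorithm, is an alternative); the function name and the example identifiers are hypothetical.

```python
def greedy_one_to_one(candidates):
    """Select correspondences so that every entity appears at most once.

    candidates: iterable of (left_entity, right_entity, score) tuples.
    Returns the accepted tuples in descending score order.
    """
    accepted = []
    used_left, used_right = set(), set()
    for left, right, score in sorted(candidates, key=lambda c: c[2], reverse=True):
        if left in used_left or right in used_right:
            continue  # one side is already mapped, so skip this candidate
        accepted.append((left, right, score))
        used_left.add(left)
        used_right.add(right)
    return accepted


candidates = [
    ("a:Paris", "b:Paris", 0.95),
    ("a:Paris", "b:Paris_Texas", 0.70),
    ("a:Paris_TX", "b:Paris", 0.60),
    ("a:Paris_TX", "b:Paris_Texas", 0.65),
]
for mapping in greedy_one_to_one(candidates):
    print(mapping)
```

The greedy choice is fast and easy to reason about, but it does not guarantee the mapping with the highest total score.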

Schema Matching

9.6 Matcher Combination

Matcher Combination
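
As one hedged illustration of matcher combination, the scores of several individual matchers can be aggregated into a single score and then thresholded. The sketch assumes a weighted average as the aggregation; the matcher names, weights and threshold are made up for the example, and other schemes (maximum, voting, learned combinations) are possible.

```python
def combine_scores(scores: dict, weights: dict) -> float:
    """Weighted average of per-matcher scores, each assumed to lie in [0, 1]."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(weights[name] * value for name, value in scores.items()) / total_weight


def is_match(scores: dict, weights: dict, threshold: float = 0.75) -> bool:
    """Accept the pair if the combined score reaches the (assumed) threshold."""
    return combine_scores(scores, weights) >= threshold


# Hypothetical scores from three string matchers for one candidate pair.
scores = {"edit": 0.8, "ngram": 0.6, "prefix": 1.0}
weights = {"edit": 2.0, "ngram": 1.0, "prefix": 1.0}
print(round(combine_scores(scores, weights), 3))  # 0.8
print(is_match(scores, weights))                  # True
```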

Evaluating Matchers
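
Matchers are commonly evaluated by comparing the correspondences they produce against a reference alignment (gold standard) and reporting precision, recall and F1. The sketch below assumes that setup and treats each correspondence as a plain pair of identifiers; the function name and example data are hypothetical.

```python
def evaluate_alignment(found, reference):
    """Return (precision, recall, f1) of found pairs against reference pairs."""
    found, reference = set(found), set(reference)
    true_positives = len(found & reference)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


reference = {("a:1", "b:1"), ("a:2", "b:2"), ("a:3", "b:3"), ("a:4", "b:4")}
found = {("a:1", "b:1"), ("a:2", "b:2"), ("a:3", "b:9")}
precision, recall, f1 = evaluate_alignment(found, reference)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```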

Challenges in Matching

Summary and Takeaways

Unifying Large Language Models and Knowledge Graphs: A Roadmap

Source: https://blog.csdn.net/weixin_45012798/article/details/134958596