Consolidation principles

Question answering and natural language inference tasks require knowledge from heterogeneous sources. To enable their joint usage, the sources need to be harmonized in a way that allows straightforward access by linguistic tools, easy splitting into arbitrary subsets, and computation of common operations, such as (graph and word) embeddings or KG paths. We devise five principles for the consolidation of sources into CSKG, driven by the pragmatic goals of simplicity, modularity, and utility:

  1. Embrace heterogeneity of nodes: One should preserve the natural node diversity inherent to the variety of sources considered, which entails blurring the distinction between objects (such as those in Visual Genome or Wikidata), classes (such as those in WordNet or ConceptNet), words (in Roget), actions (in ATOMIC or ConceptNet), frames (in FrameNet), and states (as in ATOMIC). This also allows formal nodes, which describe unique objects, to co-exist with fuzzy nodes, which describe ambiguous lexical expressions.
  2. Reuse edge types across resources: To support reasoning algorithms like KagNet, the set of edge types should be kept to a minimum and reused across resources wherever possible. For instance, the ConceptNet edge type /r/LocatedNear could be reused to express spatial proximity in Visual Genome.
  3. Leverage external links: The individual graphs are mostly disjoint according to their formal knowledge. However, high-quality links that connect these KGs and enable path finding may already exist or be easily inferred. For instance, while ConceptNet and Visual Genome do not have direct connections, they can be partially aligned, as both have links to WordNet synsets.
  4. Generate high-quality probabilistic links: Including additional probabilistic links, produced either with off-the-shelf link prediction algorithms or with specialized algorithms, would improve the connectedness of CSKG and help path-finding algorithms reason over it. Given the heterogeneity of nodes, a 'one-method-fits-all' node resolution might not be suitable.
  5. Enable access to labels: The CSKG format should support easy and efficient natural language access. Labels and aliases associated with KG nodes provide application-friendly and human-readable access to the CSKG, and can help us unify descriptions of the same or similar concepts across sources.
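
As a rough illustration of principles 2 and 3, the following Python sketch maps a source-specific relation onto a shared edge type and then finds a path between a Visual Genome node and a ConceptNet node through a common WordNet synset. It is a toy example under stated assumptions, not the CSKG implementation: the node identifiers (`vg:cup`, `wn:cup.n.01`), the `same_as` relation name, the `SHARED_EDGE_TYPES` mapping, and the helper functions are all hypothetical; only the /r/LocatedNear example comes from the text above.

```python
from collections import defaultdict, deque

# Hypothetical mapping from (source, source-specific relation) to a shared
# edge type (principle 2). Only /r/LocatedNear is taken from the text.
SHARED_EDGE_TYPES = {
    ("visual_genome", "near"): "/r/LocatedNear",
    ("visual_genome", "next to"): "/r/LocatedNear",
}


def normalize_relation(source, relation):
    """Return the shared edge type if one is defined, else the original."""
    return SHARED_EDGE_TYPES.get((source, relation), relation)


# Toy edges: (subject, relation, object, source). All identifiers are
# illustrative; "same_as" stands in for an existing link to a WordNet synset.
raw_edges = [
    ("vg:cup", "near", "vg:table", "visual_genome"),
    ("vg:cup", "same_as", "wn:cup.n.01", "visual_genome"),
    ("/c/en/cup", "same_as", "wn:cup.n.01", "conceptnet"),
    ("/c/en/cup", "/r/UsedFor", "/c/en/drinking", "conceptnet"),
]

# Rewrite source-specific relations to shared edge types.
edges = [(s, normalize_relation(src, r), o) for s, r, o, src in raw_edges]


def build_adjacency(triples):
    """Build an undirected adjacency map, ignoring edge types."""
    adjacency = defaultdict(set)
    for subject, _, obj in triples:
        adjacency[subject].add(obj)
        adjacency[obj].add(subject)
    return adjacency


def find_path(adjacency, start, goal):
    """Breadth-first search; returns a shortest node path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in sorted(adjacency[path[-1]]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


# The two sources connect only through the shared WordNet synset (principle 3).
path = find_path(build_adjacency(edges), "vg:table", "/c/en/drinking")
print(path)
```

In this toy graph, the only route from the Visual Genome node to the ConceptNet node passes through the shared synset, which is the kind of partial alignment that principle 3 describes. Removing either `same_as` edge leaves the two sources disconnected.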