HØit Nr. 2-99

Information Modeling


Ky Van Ha

We need languages for describing things, and computers need languages to talk to each other. We are in a period that could be called the "post-Internet" era, in which anything not related to the Internet is dismissed as uninteresting. The future lies in programmable networks, intelligent networks, and mobile agent systems. We would like to share our knowledge with the whole world, and we would like to use the programs and knowledge of anyone else. In such a world, the way we make our information understood is important. How can our agents share their work, and how can different computers around the world communicate with each other? We need a model for representing our knowledge; that is, our information needs to be modeled. In this issue of HØit, and perhaps in future ones, we would like to discuss this important topic: how we can model our information or, more importantly, how others model theirs. Can we arrive at something that we can all share?

Knowledge Representation

There are many different kinds of knowledge we may want to represent: simple facts or complex relationships, mathematical formulas or rules for natural language syntax, associations between related concepts, and inheritance hierarchies between classes. Each type of knowledge places special requirements on both human comprehension and computer manipulation. Knowledge representation is not a one-size-fits-all proposition. Choosing a knowledge representation for any particular application involves tradeoffs between the needs of people and the needs of computers. A good knowledge representation must be easy to use and easy to modify and extend, either by changing the knowledge manually or through automatic machine learning techniques.

There are three popular approaches for storing knowledge in computers:

  1. Procedural Representation: Procedural code not only encodes facts but also defines the sequence of operations for using and manipulating those facts. Programs written in scripting languages such as Visual Basic, JavaScript, and LotusScript are examples of a procedural knowledge representation. In a declarative knowledge representation, by contrast, a user simply states facts, rules, and relationships that represent pure knowledge. Declarative knowledge, however, still needs to be processed by some procedural code.
  2. Relational Representation: Another way to represent knowledge is in relational form, such as that used in relational database systems. Records of information about an item are used to represent knowledge. Each record contains a set of fields or columns defining specific attributes and values of that item. By storing a collection of information in a table, we can use relational calculus to manipulate the data, based on the relations defined, and to query the information stored in the table. Structured Query Language (SQL) is the most popular language for manipulating relational data.
  3. Hierarchical Representation: Another type of knowledge is inheritable knowledge, which centers on relationships and shared attributes between objects in the world. The strength of object inheritance is that it allows a compact representation of knowledge and lets reasoning algorithms operate at different levels of abstraction. A taxonomy or hierarchy of objects or concepts is a useful way to organize collections or categories, because it allows us to reduce complexity and think at higher levels of abstraction where possible. Using objects to model the world and to represent knowledge is becoming increasingly popular.
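
As a rough illustration of the three approaches listed above, the following Python sketch stores the same small piece of knowledge (that a diesel engine is a kind of engine) in each of the three styles. Every name in it (is_engine, the device table, the class names, the sample rows) is hypothetical and chosen only for this illustration.

```python
import sqlite3

# 1. Procedural: the fact and the steps for using it are fused in code.
def is_engine(kind):
    # The knowledge "a diesel engine is an engine" lives in the control flow.
    if kind == "diesel engine":
        return True
    return kind == "engine"

# 2. Relational: facts are rows in a table and a query language retrieves them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE device (name TEXT, kind TEXT, cylinders INTEGER)")
conn.executemany(
    "INSERT INTO device VALUES (?, ?, ?)",
    [("d1", "diesel engine", 6), ("p1", "pump", 0)],
)
diesel_names = [
    row[0]
    for row in conn.execute("SELECT name FROM device WHERE kind = 'diesel engine'")
]

# 3. Hierarchical: shared attributes are inherited down a taxonomy.
class PhysicalObject:
    has_mass = True

class Device(PhysicalObject):
    pass

class Engine(Device):
    produces_power = True

class DieselEngine(Engine):
    fuel = "diesel"

print(is_engine("diesel engine"))
print(diesel_names)
print(DieselEngine.has_mass, DieselEngine.produces_power)
```

In the hierarchical version, DieselEngine never states that it has mass; it inherits that attribute from PhysicalObject, which is the compactness the third item describes.
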

There are many methods one can use to represent knowledge, such as predicate logic, resolution, unification, frames, semantic nets, and representations of uncertainty. Each method uses a basic language to express its messages. Any such language has a surface level, the syntax and format of the messages, and a deeper level, the meaning or semantics. While the syntax is often easily understood, the semantics are not.

When our agents need to talk to each other, they can do it in a variety of ways. They can talk directly to each other, provided they speak the same language. Or they can talk through an interpreter, translator, or facilitator, provided they know how to talk to the interpreter and the interpreter can talk to the other agents. The agents need a shared vocabulary of words and their meanings. This shared vocabulary is called an ontology. In this paper, we review some main features of ontologies and of KIF, one of the frameworks for constructing ontologies.

Ontology

In the Guest Editors' Introduction of [1], W. Swartout and A. Tate state: "An ontology provides the basic structure or armature around which a knowledge base can be built". An ontology provides a set of concepts and terms for describing things in some domain, while a knowledge base uses those terms to represent what is true about some real or hypothetical world. Given a domain, its ontology forms the heart of any system of knowledge representation for that domain. Thus the first step in devising an effective knowledge representation is to perform an effective ontological analysis of the domain. A popular technique for this is object-oriented modeling. Why is ontological analysis important? Because different systems use different concepts and terms for describing domains, and these differences make it difficult to take knowledge out of one system and use it in another. If we could develop ontologies that serve as the basis for multiple systems, those systems would share a common terminology that would facilitate sharing and reuse. Sharing and reuse are important factors for any information modeling technique. This is how the ARPA Knowledge Sharing Effort [3] envisioned, in 1991, that intelligent systems could be built. They proposed the following:

"Building knowledge-based systems today usually entails constructing new knowledge bases from scratch. It could be done by assembling reusable components. Systems developers would then only need to worry about creating the specialized knowledge and reasoners new to the specific task of their system. This new system inter-operates with existing systems, using them to perform some of its reasoning. In this way, declarative knowledge, problem-solving techniques and reasoning services would all be shared among systems. This approach would facilitate building bigger and better systems cheaply."

The Knowledge Sharing Effort currently involves participants from over a dozen different research centers around the United States, as well as a small number of centers abroad. It is organized around four working groups:

  • Interlingua: concerned with translation between different representation languages, with sub-interests in translation at design time and at run-time.
  • KRSS (Knowledge Representation System Specification): concerned with defining common constructs within families of representation languages.
  • External Interfaces: concerned with run-time interactions between knowledge based systems and other modules in a run-time environment, with sub-interests in communication protocols for KB-to-KB and for KB-to-DB.
  • Shared, Reusable Knowledge Bases: concerned with facilitating consensus on contents of sharable knowledge bases, with sub-interests in shared knowledge for particular topic areas and in topic-independent development tools/methodologies.

Constructing ontologies is an ongoing research enterprise. Ontologies range in abstraction from very general terms that form the foundation for knowledge representation in all domains to terms that are restricted to specific knowledge domains. For example, space, time, part, and subpart are terms that apply to all domains; malfunction applies to engineering or biological domains; and hepatitis applies only to medicine [6]. There is no sharp division between domain-independent and domain-specific ontologies. For example, the terms object, physical object, device, engine, and diesel engine all describe objects, but in order of increasing domain specificity.

Let us look at an example: EngMath, the ontology for mathematical modeling in engineering developed by Thomas R. Gruber and Gregory R. Olsen [4]. The ontology includes conceptual foundations for scalar, vector, and tensor quantities; physical dimensions; units of measure; functions of quantities; and dimensionless quantities. The conceptualization builds on abstract algebra and measurement theory but is designed explicitly for knowledge sharing purposes. The ontology is being used as a communication language among cooperating engineering agents and as a foundation for other engineering ontologies. The following is a simple example taken from Gruber and Olsen's paper.

Assume that agent A is a specialist in the design of springs, and agent B is a specialist in quantity algebra. Agent A needs a solution to a set of equations relating spring and material properties that include the following:

$$k = \frac{d^4 G}{8 D^3 N}, \qquad G = 11.5 \times 10^6 \ \text{psi}$$

where k is the spring rate, d is the wire diameter, D is the spring diameter (written Dm in the message below), N is the number of turns, and G is the shear modulus of elasticity.

Agent A sends the following message to agent B:

(scalar-quantity k) 
(= (physical.dimension k) 
   (/ force-dimension length-dimension))
(scalar-quantity d) 
(= (physical.dimension d) length-dimension) 
(scalar-quantity Dm)
(= (physical.dimension Dm) length-dimension) 
(scalar-quantity N) 
(= (physical.dimension N) identity-dimension) 
(scalar-quantity G) 
(= (physical.dimension G) 
   (* force-dimension
      (expt length-dimension -2))) 
(= k (/ (* (expt d 4) G) (* 8 (expt Dm 3) N)))
(= G (* 11.5 (expt 10 6) psi))

This type of information allows agent B to perform algebraic manipulations such as solving simultaneous equations or numerically evaluating individual parameters. The vocabulary used in this interaction, such as the function constant "physical.dimension", is independent of any domain theory for springs. Messages in EngMath are sets of KIF (Knowledge Interchange Format) sentences. The example above uses objects that are introduced by KIF definitions or KIF axioms. For instance, the class "constant-quantity" is defined in [4] as follows:

A constant-quantity is a constant value of some physical-quantity, like 3 meters or 55 miles per hour. Constant quantities are distinguished from function-quantities, which map some quantities to other quantities. For example, the velocity of a particle over some range of time would be represented by a function-quantity mapping values of time (which are constant quantities) to velocity vectors (also constant quantities). All real numbers (and numeric tensors of higher order) are constant quantities whose dimension is the identity-dimension (i.e., the so-called 'dimensionless' or dimensionless-quantity).

This conceptualization is defined by the following KIF axiom:

(<=> (constant-quantity ?X) (and (physical-quantity ?X) (not (function-quantity ?X))))
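
Returning to the spring message, the kind of bookkeeping that agent B might perform can be sketched in Python. This is only an illustration under stated assumptions, not the EngMath implementation: dimensions are modeled as exponent maps over two base dimensions, the helper names (dim_mul, dim_expt, dim_div, spring_rate) are invented here, and the values chosen for d, Dm, and N are made up. Only G = 11.5 x 10^6 psi comes from the message.

```python
# Dimensions as exponent maps over base dimensions (an assumption of this sketch).
FORCE = {"force": 1}
LENGTH = {"length": 1}
IDENTITY = {}  # the dimensionless "identity-dimension"

def dim_mul(a, b):
    """Multiply two dimensions by adding exponents; drop zero exponents."""
    out = dict(a)
    for base, exp in b.items():
        out[base] = out.get(base, 0) + exp
    return {base: exp for base, exp in out.items() if exp != 0}

def dim_expt(a, n):
    """Raise a dimension to an integer power."""
    return {base: exp * n for base, exp in a.items() if exp * n != 0}

def dim_div(a, b):
    """Divide two dimensions."""
    return dim_mul(a, dim_expt(b, -1))

# Declared dimensions, mirroring the physical.dimension sentences in the message.
dims = {
    "k": dim_div(FORCE, LENGTH),
    "d": LENGTH,
    "Dm": LENGTH,
    "N": IDENTITY,
    "G": dim_mul(FORCE, dim_expt(LENGTH, -2)),
}

# Dimension of the right-hand side (d^4 G) / (8 Dm^3 N); the number 8 is dimensionless.
rhs_dim = dim_div(
    dim_mul(dim_expt(dims["d"], 4), dims["G"]),
    dim_mul(dim_expt(dims["Dm"], 3), dims["N"]),
)
dimensionally_consistent = rhs_dim == dims["k"]

def spring_rate(d, G, Dm, N):
    """k = d^4 G / (8 Dm^3 N), with lengths in inches and G in psi (lbf/in^2)."""
    return (d ** 4 * G) / (8 * Dm ** 3 * N)

# G is taken from the message; d, Dm and N are hypothetical sample values.
k = spring_rate(d=0.1, G=11.5e6, Dm=1.0, N=10)
print(dimensionally_consistent, round(k, 3))
```

The dimension check confirms that both sides of the spring equation reduce to force per length, which is the sort of domain-independent reasoning the shared vocabulary makes possible.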

In the next section we discuss some features of KIF syntax and give some examples of how one can write a KIF message.

The Knowledge Interchange Format - KIF

KIF is a language that was expressly designed for the interchange of knowledge between agents. Based on predicate calculus, KIF is a flexible knowledge representation language that supports the definition of objects, functions, relations, rules, and metaknowledge (knowledge about knowledge). In the past ten years, KIF has emerged as the preferred language in efforts to establish a standard knowledge representation format for use between a variety of intelligent agents. KIF is not a programming language. It is designed for the interchange of knowledge among disparate computer systems created by different programmers, at different times, in different languages, and so forth.

The following categorical features are essential to the design of KIF [5]:

  • The language has declarative semantics. It is possible to understand the meaning of expressions in the language without appeal to an interpreter for manipulating those expressions.
  • The language is logically comprehensive -- at its most general, it provides for the expression of arbitrary logical sentences.
  • The language provides for the representation of knowledge about knowledge. This allows the user to make knowledge representation decisions explicit and permits the user to introduce new knowledge representation constructs without changing the language.

KIF is formally defined and is the result of several years of effort by the Defense Advanced Research Projects Agency (DARPA) Knowledge Sharing Effort working group. The basis for the semantics of KIF is a conceptualization of the world in terms of objects and relations among those objects. The following basic objects occur in KIF as well as in every universe of discourse:

  • All numbers, real and complex.
  • All ASCII characters.
  • All finite strings of ASCII characters.
  • Words.
  • All finite lists of objects.
  • Bottom: denotes an undefined object in the universe.

Except for characters following \, the lexical analysis of words in KIF is case insensitive. The word abc is the same as ABC. The word a\bc is the same as A\bC (the escaped b keeps its lower case), but it is different from ABC.
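
The case rule can be sketched as a small Python function. This is an illustration of the rule as stated above, not code from any KIF implementation; the name normalize_word is hypothetical. It returns the sequence of characters a word stands for, folding unescaped characters to upper case and keeping escaped ones as written.

```python
def normalize_word(text):
    """Fold a KIF word to upper case, except characters escaped with a backslash."""
    chars = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            chars.append(text[i + 1])  # the escaped character keeps its case
            i += 2
        else:
            chars.append(text[i].upper())
            i += 1
    return "".join(chars)

print(normalize_word("abc") == normalize_word("ABC"))
print(normalize_word("a\\bc") == normalize_word("ABC"))
```

Two words are the same word exactly when their normalized character sequences are equal, so abc and ABC compare equal while a\bc and ABC do not.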

There are two ways to refer to characters. The first is the character reference (charref) syntax. A character reference consists of the characters # and \, followed by the character to be represented. Because it is difficult to write out a non-printing character this way, a second method can be used. KIF defines the functions char-code and code-char to represent the relationship between characters and their numerical codes:

(= (char-code #\cn) n)
(= (code-char n) #\cn)

Here the code of the character cn is a 7-bit integer n, and cn is the character that has code n. Character references allow us to refer to characters as characters and to distinguish them from one-character symbols, which may refer to other objects.

A string is a list of characters. There are three ways to refer to a string:

  • Quotation: "abc"
  • Block syntax: #3qabc (# + n + q + characters).
  • A list of characters: (listof #\a #\b #\c)

All three examples above refer to the string "abc".
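
The character and string notations above can be mimicked in Python as follows. This is a minimal sketch under simplifying assumptions: the function names char_code, code_char, and read_string are hypothetical Python analogues, escape sequences inside quoted strings are not handled, and a character reference to the space character would not survive the whitespace split used here.

```python
def char_code(ch):
    """Analogue of KIF char-code: the 7-bit code of an ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    return code

def code_char(n):
    """Analogue of KIF code-char: the ASCII character with code n."""
    if not 0 <= n <= 127:
        raise ValueError("code must fit in 7 bits")
    return chr(n)

def read_string(text):
    """Decode a quoted string, a block (#<n>q<chars>), or a (listof #\\a ...) form."""
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return text[1:-1]  # no escape handling in this sketch
    if text.startswith("#"):
        i = 1
        while i < len(text) and text[i].isdigit():
            i += 1
        if i == 1 or i >= len(text) or text[i] not in "qQ":
            raise ValueError("malformed block syntax")
        count, body = int(text[1:i]), text[i + 1:]
        if len(body) != count:
            raise ValueError("block length does not match its count")
        return body
    if text.startswith("(listof") and text.endswith(")"):
        chars = []
        for item in text[len("(listof"):-1].split():
            if len(item) != 3 or not item.startswith("#\\"):
                raise ValueError("expected a character reference")
            chars.append(item[2])
        return "".join(chars)
    raise ValueError("unrecognized string notation")

forms = ['"abc"', "#3qabc", "(listof #\\a #\\b #\\c)"]
print([read_string(f) for f in forms])
print(char_code("a"), code_char(97))
```

All three notations decode to the same Python string, which mirrors the statement that they refer to the same KIF string.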

A word is a contiguous sequence of normal characters or characters preceded by \. Words in KIF fall into three major groups: variables, operators, and constants.

Variables: There are two types of variables: individual variables, which begin with the character '?', such as ?x and ?y, and sequence variables, which begin with the character '@', such as @x and @y.

Operators: Operators are used in forming complex expressions of various sorts. There are three types of operators in KIF: term operators, sentence operators, and definition operators.

  • Term operators are used in forming complex terms or objects. Here are some examples of term operators:

    KIF term                 Meaning
    (if (> a 0) a (- a))     If a > 0 then a, else -a; denotes the absolute value of a
    (listof a b c d)         The list of objects a, b, c, and d
    (+ 2 3)                  2 + 3; denotes the same object as the constant 5

  • Sentence operators are used to construct complex sentences. They include logical operators such as =, /=, and, not, or, => (implication), <= (reverse implication), <=> (equivalence), forall, and exists.
  • Definition operators are used to define objects, functions, relations, and logical constants. The definition operators in KIF allow us to state sentences that are true "by definition" in a way that distinguishes them from sentences that express contingent properties of the world. Examples, each KIF definition followed by its meaning:

    (defobject nil := (listof))
        The object constant nil denotes the empty list.

    (defobject origin :=
      (listof 0 0 0))
        The object constant origin is defined as the list (0, 0, 0).

    (defrelation null (?l) :=
      (= ?l (listof)))
        The relation null holds of l if l is equal to the empty list.

    (defrelation single (?l) :=
      (exists ?x (= ?l (listof ?x))))
        The relation single holds of l if there is an object x such that l is the list containing just x.

    (defrelation even (?x) :=
      (integer (/ ?x 2)))
        x is even if x / 2 is an integer.

    (deffunction abs (?x) :=
      (if (>= ?x 0) ?x (- ?x)))
        The function abs(x) denotes the absolute value of x.

    (deffunction cons (?x ?l) :=
      (if (= ?l (listof @l))
          (listof ?x @l)))
        The function cons(x, l) adds x to the front of the list l: if l is a list with items @l, the result is the list of x followed by those items.
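
To make the term operators more concrete, here is a minimal Python sketch of an evaluator for a small subset of KIF terms. It is an illustration only, not a KIF implementation: the functions tokenize, parse, evaluate, and run are hypothetical, sequence variables (such as @l in cons) are not supported, a missing else branch is modeled with Python's None as a stand-in for KIF's undefined object, and variable bindings are supplied in a plain dictionary.

```python
def tokenize(src):
    """Split a KIF expression into parentheses and words."""
    return src.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Parse one expression, consuming tokens from the front of the list."""
    tok = tokens.pop(0)
    if tok == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # discard the closing parenthesis
        return items
    for convert in (int, float):
        try:
            return convert(tok)
        except ValueError:
            pass
    return tok  # a word: variable or constant name

def evaluate(expr, env):
    """Evaluate a parsed term; env maps variable and constant names to values."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, *args = expr
    if op == "if":
        # (if condition then [else]); only the selected branch is evaluated.
        if evaluate(args[0], env):
            return evaluate(args[1], env)
        return evaluate(args[2], env) if len(args) > 2 else None
    vals = [evaluate(a, env) for a in args]
    if op == "listof":
        return vals
    if op == "+":
        return sum(vals)
    if op == "-":
        return -vals[0] if len(vals) == 1 else vals[0] - sum(vals[1:])
    if op == ">":
        return vals[0] > vals[1]
    if op == ">=":
        return vals[0] >= vals[1]
    if op == "=":
        return vals[0] == vals[1]
    raise ValueError(f"unsupported operator: {op}")

def run(src, env=None):
    return evaluate(parse(tokenize(src)), env or {})

print(run("(+ 2 3)"))
print(run("(if (>= ?x 0) ?x (- ?x))", {"?x": -3}))
print(run("(listof 0 0 0)"))
```

The second call evaluates the body of the abs definition with ?x bound to -3. Note that KIF itself is declarative; an evaluator like this is one procedural reading of a few terms, not the meaning of the language.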

Constants: All KIF words that are not variables or operators are called constants. KIF distinguishes several different types of constants. All numbers, characters, and strings are basic constants in KIF. There are four other categories of constants. Object constants denote individual objects, function constants denote functions on those objects, relation constants denote relations, and logical constants express conditions about the world and are either true or false. The difference between these categories of constants is entirely semantic; syntactically, any constant can be used where any other constant can be used. A definition associates a defining axiom with the constant being defined. Here are some examples of defining axioms:

 (= origin (listof 0 0 0))
 (=> (father ?x) (exists ?y (child ?y ?x)))
 (= (grandfather ?x) (father (father ?x)))
 (<=> (Rational-Number ?x)
      (and (Real-Number ?x)
           (Exists (?y)
             (and (Integer ?y)
                  (Integer (* ?x ?y))))))

The last sentence states that x is a rational number if and only if it is a real number and there exists an integer y such that x*y is an integer.

Terms, sentences, and definitions are the three different types of expression in KIF. Definitions and sentences are called forms. A knowledge base is a finite set of forms.
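
The word groups and the two kinds of form can be sketched in Python as below. The operator sets contain only the operators named in this article, so they are not the complete KIF operator list, and the function names classify_word and classify_form are hypothetical. The sketch also assumes that every top-level expression that is not a definition is a sentence, and it ignores backslash escapes when folding case.

```python
TERM_OPERATORS = {"IF", "LISTOF"}
SENTENCE_OPERATORS = {"=", "/=", "AND", "NOT", "OR", "=>", "<=", "<=>",
                      "FORALL", "EXISTS"}
DEFINITION_OPERATORS = {"DEFOBJECT", "DEFFUNCTION", "DEFRELATION"}

def classify_word(word):
    """Place a KIF word into one of the groups described in the text."""
    w = word.upper()  # case folding; backslash escapes are ignored here
    if w.startswith("?"):
        return "individual variable"
    if w.startswith("@"):
        return "sequence variable"
    if w in TERM_OPERATORS:
        return "term operator"
    if w in SENTENCE_OPERATORS:
        return "sentence operator"
    if w in DEFINITION_OPERATORS:
        return "definition operator"
    return "constant"

def classify_form(form):
    """Classify a top-level form as a definition or a sentence by its leading word."""
    head = form.lstrip("( \t\n").split()[0].rstrip(")")
    if classify_word(head) == "definition operator":
        return "definition"
    return "sentence"

# A knowledge base is a finite set of forms; here it is a tuple of source strings.
knowledge_base = (
    "(defobject nil := (listof))",
    "(= (grandfather ?x) (father (father ?x)))",
    "(<=> (constant-quantity ?X) (and (physical-quantity ?X) (not (function-quantity ?X))))",
)
print([classify_form(f) for f in knowledge_base])
print(classify_word("?x"), classify_word("@l"), classify_word("origin"))
```

Because the difference between kinds of constants is semantic rather than syntactic, the classifier cannot tell an object constant from a relation constant; it can only report "constant".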

Ontologies, and the construction of ontologies, are important in attempts to build large systems from shared knowledge. Many researchers and system developers have become interested in reusing and sharing knowledge across systems. The DARPA Knowledge Sharing Library [3] provides a digital library of papers, email discussion lists, software, and pointers to related projects. Reusable ontologies that were formerly found at that address are now available interactively through the Ontolingua Server. The Knowledge Systems Laboratory at the Department of Computer Science, Stanford University, focuses on the following work:

  • Design and development of knowledge servers
  • Development of multi-use ontologies, knowledge bases, and knowledge system modules
  • Computational environments for modeling the structure, behavior, and functionality of physical devices
  • Compositional modeling
  • Architectures for adaptive intelligent systems
  • Knowledge-based systems for science, engineering, and defense applications

Several software libraries have been built:

  • Generic Frame Protocol: A Lisp API for generic accesses to frame representation systems.
  • Kifparser: a KIF parser written in C++
  • ProLogic: a Common Lisp knowledge representation and reasoning system compatible with KIF.

Moving toward a standard is the right way to decrease development time while improving the robustness and reliability of the resulting knowledge bases. However, we are still far from the ultimate objective. Knowledge reuse by means of ontologies faces three main problems:

  • There are no standardized ways to identify the features that characterize ontologies from the user's point of view.
  • There are no Web sites that present relevant information about ontologies using the same logical organization.
  • The search for appropriate ontologies is hard and time-consuming.

References

[1] IEEE Intelligent Systems & Their Applications, January/February 1999.

[2] The Ontology Page, http://www.kr.org/top

[3] Knowledge Sharing Library on the World Wide Web, http://www-ksl.stanford.edu/knowledge-sharing

[4] An Ontology for Engineering Mathematics, Thomas R. Gruber and Gregory R. Olsen, in Fourth International Conference on Principles of Knowledge Representation and Reasoning, ed. Jon Doyle, Piero Torasso & Erik Sandewall, Bonn, 1994.

[5] Knowledge Interchange Format, draft proposed American National Standard, http://logic.stanford.edu/kif/dpans.html

[6] What Are Ontologies, and Why Do We Need Them?, B. Chandrasekaran, John R. Josephson, and V. R. Benjamins, in [1].


Copyright: 1998, 1999, Høgskolen i Østfold. Last Update: November.99, Jan Høiberg.