Conceptual Modeling & Object Identity - Part 1

Objects can be other than THINGs

Hello all - this paper arose from my teaching an Information Modeling course at Arizona State University (Polytechnic campus) using Terry Halpin's text Information Modeling and Relational Databases, 2nd ed. I intend this material to be of most interest to students looking for general principles to guide their modeling efforts, but it might also serve as a review for seasoned modelers. The focus here is on how language perspectives and insights can help the modeling effort, especially at the conceptual level.

Since Object Role Modeling, and Fact Based Modeling in general, emphasize language interactions with clients (verbalization), I thought it would be helpful to dig a little deeper into language syntax and semantics. This investigation led me to examine the works of various cognitive psychologists and linguists, to see how they might contribute. I found the linguist Ray Jackendoff particularly helpful and have taken his basic ideas and applied them to the ORM Conceptual Schema Design Procedure (CSDP), as described by Halpin. So, I hope you are excited by this combination of abstract cognitive insights and their practical application within ORM tools. Enjoy!

Rob Rucker
2015-08-08
Mesa, Arizona

Cut to the Chase

  • Identify top level conceptual categories that include objects for inclusion or reference in models
  • Identify expressions in natural language that identify and refer to these categories
  • Relate these categories and referring expressions to the Object Role Modeling process (Conceptual Schema Design Procedure, Halpin, 2008)

The goal of this paper is to help information analysts identify and describe the very highest level conceptual categories of objects and note their basic internal structure. (To distinguish this highest level of concepts, I will refer to them as topcat elements and write them in all-caps.) These are the categories (and their subtypes) that can occur within a client's area of interest, the Universe of Discourse (UoD). Being aware of these categories and their “internal structure” will enable us to achieve a more extensive and accurate representation of that UoD.

  • Examples of topcat categories/types we are already familiar with include {THING, LOCATION, NUMBER, TIME}. We often find such categories and their object instances by looking for or hearing noun phrases involving things (“who is that person” or “what an interesting building”), locations (“123 State Street”), numbers (“she is six feet tall”), or times (“she was born in 1970 CE”). It turns out that other verbal/written grammatical expressions, coupled with visual evidence, will lead us to additional categories that are identified and referenced not only by noun phrases but also by verb phrases, adverbial phrases, adjective phrases, prepositional phrases, and whole sentences. That is, we will see that we have to expand our notion of what can be an object: it is not just THINGs. As a consequence of this analysis, we will be able to extend our modeling reach to include these categories and their instances. (Note: while topcats are the very highest categorical level, a particular model may involve many subcategories of them, for example THING ← Person ← Employee ← HourlyEmployee.)
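
The subtype chain in the example above can be sketched as a simple parent lookup. This is only an illustration in Python; the names `PARENT` and `topcat_of` are hypothetical and do not come from Halpin's text or from any ORM tool.

```python
# Hypothetical sketch: each type names its immediate supertype.
# A topcat (written in all-caps) has no supertype of its own.
PARENT = {
    "HourlyEmployee": "Employee",
    "Employee": "Person",
    "Person": "THING",
}

def topcat_of(type_name: str) -> str:
    """Walk up the supertype chain until a type with no parent is reached."""
    current = type_name
    while current in PARENT:
        current = PARENT[current]
    return current

print(topcat_of("HourlyEmployee"))  # THING
```

Going "up the hierarchy" here is just repeated lookup, so any subtype in a model can be traced back to the topcat it ultimately belongs to.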

This document is the first of a series and is intended just to introduce this topic with known topcats and some grammatical and visual evidence for their inclusion. (I am sure you already know all of these concepts. I simply want to bring them to awareness so they become part of your everyday modeling kit.)

Here is the topcat list. This is not an exhaustive list, but the evidence is pretty clear for these:

{ACTION, EVENT, STATE,

MANNER, NUMBER, AMOUNT,

SOUND, SMELL, IMAGE,

THING/OBJECT, LOCATION/PLACE, PATH/DIRECTION,

TIME, PROPERTY}

The next paper will expand on the internal structure of some of these categories, with a view to more detailed modeling using ORM notation. In that paper I will show how Object Role Modeling (ORM), with its emphasis on linguistics in general and its particular emphasis on verbalization, can incorporate these additional features within the Conceptual Schema Design Procedure (CSDP). As a consequence of the approach initiated here, the reader may look forward to a richer exploration of a modeling language beyond FOPL (First-Order Predicate Logic)!

Note: my use of the term object follows that of the Fact Based Modeling community, and Object Role Modeling (ORM version II) in particular. This community considers the category object to have two subcategories, entity object and value object. So, I will consider object to be at a higher level, above both entity object and value object. From this perspective, then, I will take the conceptual category object to be the very highest in the concept hierarchy.

Credit is due to Ray Jackendoff for the basic ideas presented here (Jackendoff, 1985, 1990, 2010). 

A Preliminary ORM Example: Consider a “data use case” expressed by an academic officer as follows: “We enroll students for a given course on a certain date. Subsequently, that student achieves a certain position or ranking in that class.” One attempt to model this use case is via the ORM diagram below. I show a nested/objectified relationship that combines Person, Course, and Date into one ternary abstraction I have called “Enrollment”. To that, I have linked a binary relation involving Position. The external constraint says Enrollment and Position pairs are unique. Let me look more closely at the types and objects involved here and relate them to the topcats.


The ORM diagram above shows several conceptual category subtypes, such as Person, Course, Date, and Position. These, though, are themselves subtypes of the highest level categories (topcats) noted above. Of most interest is the nested/objectified type “Enrollment”. As a first cut, it simply encapsulates a many-to-many-to-many ternary relationship linking Person, Course, and Date into an abstract unit named “Enrollment”. This named abstraction allows us to treat it as an object in its own right. We used this idea to construct a binary relation linking Enrollment and Position.

The topcat types are:

Person → THING
Course → THING
Date → TIME
Position → NUMBER
Enrollment → EVENT

The category of EVENT, however, has an ACTION subcomponent, the enroll action. (This is what I meant by there being “fine structure” associated with these topcats.)
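
As a rough illustration of the objectified relationship described above, here is a minimal Python sketch. It is not ORM notation and not the output of any ORM tool. The names `Enrollment`, `PositionFacts`, and `TOPCAT_OF` are assumptions made for illustration, the sample values ('Kim', 'CS101') are invented, and the duplicate check encodes just one reading of the external constraint, namely that a given Enrollment and Position pair is recorded only once.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Enrollment:
    """Objectified ternary fact: Person enrolled in Course on Date."""
    person: str   # reference value for a Person (THING)
    course: str   # reference value for a Course (THING)
    on: date      # Date (TIME)

# Topcat assignments copied from the list above.
TOPCAT_OF = {
    "Person": "THING",
    "Course": "THING",
    "Date": "TIME",
    "Position": "NUMBER",
    "Enrollment": "EVENT",
}

class PositionFacts:
    """Binary fact type linking an Enrollment to a Position."""

    def __init__(self) -> None:
        self._pairs = set()

    def add(self, enrollment: Enrollment, position: int) -> None:
        # Assumed reading of the constraint: reject a repeated pair.
        pair = (enrollment, position)
        if pair in self._pairs:
            raise ValueError(f"duplicate fact: {pair}")
        self._pairs.add(pair)

    def __len__(self) -> int:
        return len(self._pairs)

facts = PositionFacts()
kim_enrollment = Enrollment("Kim", "CS101", date(2015, 8, 8))  # invented data
facts.add(kim_enrollment, 3)
print(len(facts))  # 1
```

Because `Enrollment` is a frozen dataclass it is hashable, which is what lets the whole ternary fact play a role in a further relation, much as objectification does in the diagram.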

Notes from grammar: We will find that the overall relation, as represented by the sentence “A Person is enrolled in a Course on a certain Date”, is an EVENT. A test for an EVENT is to complete the statement “What happened was that . . .” If you can fill in a grammatical phrase that follows, then you have an EVENT. In this case you would get: “What happened was that a Person enrolled in a Course on a certain Date”.

In general, our language contains lots of distinctions that we generally don't model explicitly, but at least knowing about them lets us decide what to include and what not to include. What tends to happen is that people (we modelers in particular :-) ) take the easy route and try to jam most everything into just a few categories whose subtypes include many of the simple entities we deal with, such as Person, Building, Vehicle, Movie, Language, etc. The trouble is that this works initially, but we lose some essential distinctions, and that will come back to bite us later on.

How This All Works

When we undertake a modeling job, we usually start with lots of data, example documents, forms, and peripheral information, plus our own experiences and those of our clients. From that environment we start identifying the objects of interest and the types they belong to. We then go up the type hierarchy to generalize and down the hierarchy to specialize. At the very highest level of conceptual structure we find categories such as those listed above and in the table below. These concepts have properties other than those of THINGs, and consideration of these various types of concepts will provide additional insight into (and constraints on) developing object relations, organizational services, and processes. I will use the properties of grammatical and visual structures to deduce and then confirm these additional categories. This means that before I can talk about what I see, I must have constructed a language conceptual structure and a visual conceptual structure that are compatible. The name given to this compatible structure will be one of the topcat categories, and the structure is one of the components of this top level mental information structure. For example, consider “I just sold that” (pointing). The conceptual structure constructed from language for “that” and the conceptual structure constructed visually to interpret “that” lead to a shared conceptual structure constituent, denoted by THING. This is how we can talk about what we can see.

To expand on the discussion above in a little more detail, the most basic assumption employed here is that we humans have a fundamental mental level of information, called conceptual structure (whose top-level categories are the topcats), at which all our modalities (sight, hearing, language, motion, smell, etc.) share compatible information structures. If we didn't have such a level, we couldn't talk about what we see, feel, touch, or smell. This being so, we can use that compatibility to tease out a valuable set of these conceptual structure categories.

(Note: This idea is due to Ray Jackendoff and is his Conceptual Structure Hypothesis).

We will do this by linking our understanding of language grammar to its visual correspondences. How we can talk about what we see provides a crucial approach for teasing out these categories and is a consequence of the compatibility of these two modes within conceptual structure. Using our natural grammar to tease out these categories, though, is not as simple as just looking for the “nouns” in documents, user stories, or conversations with our organizational colleagues. We will see that our language identifies objects that aren't just inferred from their position in Noun Phrases. It turns out that Adjective Phrases, Verb Phrases, Prepositional Phrases, and Sentences themselves can also identify objects. Combining some grammar with the corresponding visual structures allows us to find more categories. (This approach is called “using pragmatic anaphora”.)

The table below presents evidence, using natural language and vision simultaneously, for discovering categories of objects other than THINGs. Doing this lets us extend and enrich our representation of an organization's information structures.

Using Language and Vision Together to Discover Categories

Category

Example Sentence combining language & sentient capacity

LOCATION /PLACE

Your book is here [pointing] and your phone is there [pointing].

Where are my books?

I parked by the bookstore over there.

PATH / DIRECTION

They went thataway [pointing]. The stock market is up.

The locations are shown on the chart and they make up a path to follow. Follow the yellow brick road.

Which way did they go?

The path to the river is a difficult one.

ACTION

Can you do that [pointing]? Can you do this? [demonstrating]

Joe did the same thing as Kim did.

What did you do?

MANNER

You mix the vodka and rum like so [demonstrating].

The Person with Surname 'Kim' . . . (the MANNER in which the Person is referenced is by using the string 'Kim'; this is a Halpin example).

How did you make the soup?

NUMBER

The truck has a length of 15 meters. The value of my IQ is just average.

How long was the truck?

AMOUNT

The weight of this truck is 3500 lbf.

How heavy was it?

THING (this is the most commonly stated category, but it is no more important than the others).

Joe was here. Joe is a Person but a Person can be considered a subcategory of THING.

The building has 50 floors. Building is a subcategory of THING.

That boat won't float!

What was that? (This is ambiguous and requires more input to disambiguate).

Who did you see there?

What did you purchase?

TIME

I am out of time. The time it took was inversely proportional to the distance.   

What time is it?

When did that happen?

It happened at 3:23 A.M.

EVENT

Boy, you should have seen the fight. The fight event card didn't begin to describe the actual event.

This better not happen here again.

The cat fell out the window!

What happened then?

IMAGE

That is a pretty picture.

SOUND

Did you hear that?

SMELL

Oooh, what is that sweet smell?

STATE

Max is happy. NOTE: the verb BE is the flagship verb within English grammar that suggests an overall STATE. (BE is an intensive verb that says something about the subject: Max is happy.) “Happy” functions as the subject-predicative.

PROPERTY

Kim wears a red hat.

“Red” is the property here within this sentence.
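
The question forms in the table can be collected into a small lookup, sketched here in Python. The mapping is read off the example questions above. The names `QUESTION_CUES` and `suggest_topcat` are hypothetical, and matching on the opening words of a question is a deliberately naive assumption: it is a prompt for the modeler, not a parser.

```python
# Question openers taken from the table above. More specific cues come
# first so that "what happened" is tried before the general "what".
QUESTION_CUES = [
    ("what happened", "EVENT"),
    ("what did you do", "ACTION"),
    ("what time", "TIME"),
    ("which way", "PATH"),
    ("how heavy", "AMOUNT"),
    ("how long", "NUMBER"),
    ("where", "LOCATION"),
    ("when", "TIME"),
    ("who", "THING"),
    ("how", "MANNER"),
    ("what", "THING"),
]

def suggest_topcat(question: str):
    """Return the topcat suggested by the question's opening words, or None."""
    words = question.lower().strip(" ?!.").split()
    for cue, category in QUESTION_CUES:
        cue_words = cue.split()
        # Compare whole words so that "do" does not match "download".
        if words[:len(cue_words)] == cue_words:
            return category
    return None

print(suggest_topcat("How heavy was it?"))           # AMOUNT
print(suggest_topcat("How did you make the soup?"))  # MANNER
print(suggest_topcat("What happened then?"))         # EVENT
```

A bare “What was that?” falls through to THING here, which is only a default; as the table notes, that question is ambiguous and needs more input.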

 

A Few More Examples of Topcats

In our Halpin text (and in daily life!) we have seen types/categories such as Person, Date, Number, and String. So far, so good, and we all learned that a Noun (phrase), for example, could refer to a “Person, Place, or Thing”. When analyzing a form, document, or even a user story within the agile methodology Extreme Programming (XP), the mantra was: “Look for the nouns and identify them as the objects”. That's okay, but we need to go beyond this and realize that additional grammatical structures such as Adjective phrases, Verb phrases, and Sentences themselves can also refer to objects. What I mean is that our natural language grammar gives us strong clues as to what conceptual category is being referred to. Notice that some categories, such as LOCATION, have a fine structure that involves a nested THING as a reference entity.

Example 1: My book is on the table. “My book” is a Noun Phrase that refers to a THING, book. The phrase “on the table” refers to a LOCATION, with a nested reference object of “table”, a THING, further specifying the LOCATION. The sentence as a whole refers to a STATE.

Example 2: The car hit the railing. “The car” is a Noun Phrase that refers to a THING, car. “Hit the railing” is a Verb Phrase that denotes an ACTION, while the overall sentence denotes an EVENT.

Example 3: Suppose you stand on the corner of a busy intersection. What do you see? For sure you can pick out the THING objects like Cars, Traffic Lights, Pedestrians, Drivers, etc. But you can also pick out EVENTs, for example, “That car ran a red light”. Expanding on that observation, you might want to record multiple EVENTs (note that we just quantified over events!). Here is a case where you are not dealing with a THING at the highest level. It turns out that the EVENT has fine structure that does involve a THING (such as a car), an ACTION, and a LOCATION. The topmost category, though, is an EVENT.
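
The fine structure in Examples 1 to 3 can be pictured as a small tree. The following Python sketch is only one possible encoding. The `Concept` class and `categories` helper are hypothetical names, and the particular nesting chosen for each sentence (for instance, placing “the railing” under the ACTION) is an interpretation of the prose above, not a formalism taken from Jackendoff.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    category: str                      # a topcat such as "THING" or "EVENT"
    label: str                         # the phrase that refers to it
    parts: list["Concept"] = field(default_factory=list)

def categories(concept: Concept) -> list[str]:
    """List the topcats in a concept, outermost first (pre-order walk)."""
    found = [concept.category]
    for part in concept.parts:
        found.extend(categories(part))
    return found

# Example 1: "My book is on the table."
example1 = Concept("STATE", "My book is on the table", [
    Concept("THING", "my book"),
    Concept("LOCATION", "on the table", [Concept("THING", "the table")]),
])

# Example 2: "The car hit the railing."
example2 = Concept("EVENT", "The car hit the railing", [
    Concept("THING", "the car"),
    Concept("ACTION", "hit the railing", [Concept("THING", "the railing")]),
])

print(categories(example1))  # ['STATE', 'THING', 'LOCATION', 'THING']
print(categories(example2))  # ['EVENT', 'THING', 'ACTION', 'THING']

# Quantifying over EVENTs, as in Example 3: count the recorded events.
observations = [example1, example2, example2]
print(sum(1 for c in observations if c.category == "EVENT"))  # 2
```

The outermost category is what the whole sentence refers to, while the nested parts carry the THINGs, ACTIONs, and LOCATIONs that make up its fine structure.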

Example 4: You are in a warehouse whose items are to be “categorized” as to location and item-id.

To model this environment you could start with the LOCATION of each item, then categorize the item at that location as a THING and use a reference value to represent it. You would, of course, have subcategories of THING that would provide a more precise description of the item, but the idea here is that a LOCATION is fundamentally different from the item, which is a THING.

Example 5: Go two blocks up the street, and then turn right. This is clearly a PATH instruction. Think of the directions provided by Google Maps as path instructions. The idea is that PATHs might need to be stored, manipulated, and retrieved, and so ought to be explicitly noted as such. PATH has a particularly rich component structure, which will be considered in the next paper.
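
If PATHs are to be stored and retrieved, one simple possibility is to record them as an ordered sequence of steps. This is a hypothetical Python sketch only: `Step`, its fields, and the step vocabulary are assumptions for illustration, and they do not anticipate the component structure to be discussed in the next paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    action: str          # e.g. "go" or "turn"
    direction: str       # e.g. "up the street" or "right"
    blocks: int = 0      # distance in blocks; 0 when not applicable

# "Go two blocks up the street, and then turn right."
path = (
    Step("go", "up the street", blocks=2),
    Step("turn", "right"),
)

def describe(steps) -> str:
    """Render the stored PATH back into a verbal instruction."""
    phrases = []
    for s in steps:
        if s.blocks:
            phrases.append(f"{s.action} {s.blocks} blocks {s.direction}")
        else:
            phrases.append(f"{s.action} {s.direction}")
    return ", then ".join(phrases)

print(describe(path))  # go 2 blocks up the street, then turn right
```

Keeping the steps in order is the point: a PATH is not a single LOCATION but a sequence that can be verbalized again on retrieval.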

References

Burton-Jones, Noel (1975) Analyzing Sentences, Longman
Halpin, Terry and Tony Morgan (2008) Information Modeling and Relational Databases, 2nd ed., Morgan Kaufmann
Jackendoff, Ray (1977) X Bar Theory, MIT Press
Jackendoff, Ray (1985) Semantics and Cognition, MIT Press
Jackendoff, Ray (1990) Semantic Structures, MIT Press
Jackendoff, Ray (2010) Foundations of Language, Oxford University Press
Levin, Beth (1995) English Verbs and Alternations