Support Resources

[Lecturers who have registered with Pearson Education can access further supplements.]

There is also the miscellany page for topics with a more tenuous connection with the book.

Theory ruminations

The term "entity"

The term "entity" has been with us (software system modelers) since Peter Chen and the mid 1970s. It meant, paraphrasing only a little, something in a model that wasn't a relationship and which entered into relationships. It was widely accepted throughout the 1980s that:

  • An entity exhibits identity or individuality, in a way that attributes don't.
  • An entity plays an important role in the context it inhabits.
  • An entity is described by the values for its attributes and the identities of its associates.
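
As a rough illustration of the first and third points, here is a minimal Python sketch. The names `Money` and `Account` are hypothetical, chosen only for this example. It assumes that attribute values are compared by content, while entities are compared by identity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Money:
    """An attribute value: two equal amounts are interchangeable."""
    amount: int
    currency: str


@dataclass(eq=False)  # eq=False keeps Python's default identity-based equality
class Account:
    """An entity: described by attribute values and the identities of associates."""
    balance: Money
    holders: list = field(default_factory=list)  # references to associated entities


a = Account(Money(100, "EUR"))
b = Account(Money(100, "EUR"))

print(a.balance == b.balance)  # True: attribute values compare by content
print(a == b)                  # False: two distinct entities, however alike
```

Two accounts with identical balances remain two accounts, which is the sense in which an entity exhibits individuality and an attribute does not.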

I, in common with others (e.g.*), would prefer not to change the definition. If one has a new concept, one shouldn't take an existing term and start using it in a different way. And what is more, as you'll know from the book, I find that the original definition still works for what I find most useful to put into an object-oriented analysis (or subject-oriented) model.

What is the term "entity" taken to mean in circles where the UML might be used? UML 2 (ptc-04-10-02) says, in the standard profile, "A persistent information component representing a business concept." I find that quite useless. What does "persistent" mean? Why is the word "component" being used? And what is a "business concept"? (Lest that sound unduly harsh, let me say that I do admire enormously the people working on the UML specifications. I wouldn't want to do the job. And I think they've achieved an immensely successful and worthwhile result. And there's no way, of course, that they can please all of the people all of the time.)

Is there some background to enlighten us more? Well, it's fairly clear that this is influenced by Jacobson's definitions. Ivar Jacobson was one of the three original contributors to the UML back around the UML 1.0/UML 1.1 days. We can understand his use of the term better by putting "entity" back with the other two kinds of model classifiers he was contrasting it with:

"In the analysis model, three different stereotypes on classes are used: «boundary class», «control class» and «entity class». [Interface classes like a Cash Dispenser or a Cashier] are boundary classes that in general are used to model interaction between the system and its actors (i.e. users and external systems). [A Cash Withdrawal, for example] is a control class that is generally used to represent coordination, sequencing, transactions, and control of other objects—and is often also used to encapsulate control related to a specific use case. [An Account, for example] is an entity class that in general is used to model information that is long-lived and often persistent."

What I sense from that is that Jacobson might have started off with the same suspicions as I did: that there was something special about analysis model denizens that did things as opposed to those that recorded things. However, we haven't arrived at quite the same conclusions. First of all, if it's important to respect the Model/View separation in design (which it is), it's even more important to respect it in analysis. While an actor or a system–actor boundary creature might well be a very good hook for helping sponsors, users and readers understand requirements documents, it isn't necessarily a valuable aid for modeling the system-to-be's subject matter or for suggesting good objects for the system-to-be. The criteria for the latter are different from the criteria for the former. Also, as you will know if you've got any way into the book, I believe that outside of the now-much-rarer computerization activity, the patterns of control, coordination, sequencing and transaction among the creatures of the subject matter or context illuminate nothing as to the control, coordination and sequencing among the objects-to-be.

So, while I might make such distinctions in requirements, in business systems analysis or in workflow analysis, and while I might need to express sequential constraints (in state machines, say), and while I might use algorithmic detail of how sequencing and transacting work when I'm designing how a concrete method of a class might work late into the detailed design of version 1 of the system-to-be, I'm not going to find controller and boundary creatures useful in a computer system's first, subject-oriented (i.e. analysis by the book's definition) model.

So I want to locate and model those subject matter creatures that do make sense of the subject matter, and I am fairly positive that such creatures will make good suggestions as to the objects of the system-to-be. The only criteria that seem to make any sense for homing in on these kinds of creatures are that they exhibit identity, they play an important role in the subject matter and their instances can be described by the values of their attributes and the identities of other instances they are related to.

Which is more-or-less where we came in.

I suspect that the creatures I'm after are very close in spirit to Jacobson's "entity classes". I don't understand what "long-lived" means, however, nor do I understand how "long-lived" differs from "persistent" (in the subject matter at least). I also note that "class", perhaps unfortunately, but certainly undoubtedly, has a strong technical connotation for many developers, so it's pleasing that the UML just uses «entity».

So I end up with all my analysis model classifier boxes labeled with an «entity» stereotype. If I find that this unduly upsets a particular reader or client, I switch to my second best tactic, which is to leave analysis classifier boxes without stereotype labels and to exhaustively ensure that all design model classifier boxes have «interface», «implementation class» or «abstract class» stereotypes. (And of course the last one isn't official, so if that upsets someone I reluctantly (and inconsistently in my opinion) resort to the {abstract} constraint instead.)

* If the link has moved, search for "Frans Bouma's blog" entry on "Entity: why do some people who write IT books re-invent definitions?"


There is a spectrum of possibilities for the interpretation of association relationships. And that's the problem. Most people use them without understanding exactly what they mean. And without checking if a designer takes the same meaning that the analyst thought (s)he was using.


At one end of the spectrum is "well, these kinds of things have got something to do with these kinds of things, and it seems to involve a few of those and a few of those". At the other end of the spectrum is "an instance of this would hold a pointer to an instance of that in a variable named thusly".

One would probably argue both against the imprecision of the former and against the low level of the latter. We want to strike a balance where we get associations with an unambiguous and useful implementation interpretation for typical programming languages, yet where we can work at a richer level than pointers.
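
To make the pointer end of the spectrum concrete, here is a minimal Python sketch. The classes `Customer` and `Order` and the variable `placed_by` are hypothetical names invented for the illustration. It assumes the association is implemented as nothing more than a single stored reference.

```python
class Customer:
    def __init__(self, name):
        self.name = name


class Order:
    def __init__(self, placed_by):
        # The entire "association" is this one named variable.
        self.placed_by = placed_by


customer = Customer("Smith")
order = Order(placed_by=customer)

print(order.placed_by.name)  # navigable from the order to the customer

# Nothing above says how many orders a customer may have, and a Customer
# cannot reach its orders without searching every Order.
```

The sketch is precise about storage and navigation in one direction, and silent about everything else an analyst might have meant.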

The UML sits surprisingly far towards the pointer end of the spectrum in that its model for associations is a tuple. Let me explain. FINISH

The example of tables and bookings is quite a good illustration. Our model might have to tell the reader which of at least three possible schemes is the one we have in mind:

  • A table accepts several bookings that are separated in time. Each booking will be allocated to a particular, single table.
  • A table accepts several bookings that are separated in time. A larger-than-normal booking might need to occupy more than one (adjacent) table.
  • Our diners do not mind if they share their table(s). To achieve maximum occupancy, having allocated a booking to a group of tables, we might allocate another booking of the same time period to that same group of tables.
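
One way to see how the three schemes differ is to write each as a rule over the same set of links. The following Python sketch is only an illustration under stated assumptions: a link is taken to be a (booking, table number) pair, times are whole hours, and the names `Booking`, `overlaps`, `one_table_per_booking`, `no_shared_tables` and `tightest_scheme` are all invented here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Booking:
    name: str
    start: int  # hour the booking starts (hypothetical time unit)
    end: int    # hour the booking ends


def overlaps(a, b):
    """True if the two bookings occupy any common period."""
    return a.start < b.end and b.start < a.end


def one_table_per_booking(links):
    """Rule for scheme 1: no booking is linked to more than one table."""
    tables = defaultdict(set)
    for booking, table in links:
        tables[booking].add(table)
    return all(len(ts) == 1 for ts in tables.values())


def no_shared_tables(links):
    """Rule for schemes 1 and 2: bookings on one table are separated in time."""
    by_table = defaultdict(list)
    for booking, table in links:
        by_table[table].append(booking)
    return all(
        not overlaps(a, b)
        for bookings in by_table.values()
        for i, a in enumerate(bookings)
        for b in bookings[i + 1:]
    )


def tightest_scheme(links):
    """Return the most restrictive of the three schemes the links satisfy."""
    if no_shared_tables(links):
        return 1 if one_table_per_booking(links) else 2
    return 3


smith = Booking("Smith", 19, 21)
jones = Booking("Jones", 21, 23)
party = Booking("Party", 19, 21)

print(tightest_scheme({(smith, 1), (jones, 1)}))              # 1
print(tightest_scheme({(smith, 1), (smith, 2), (jones, 1)}))  # 2
print(tightest_scheme({(smith, 1), (smith, 2),
                       (party, 1), (party, 2)}))              # 3
```

The same boxes and the same line between them could stand for any of the three rules, which is why the model has to say which one is intended.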

Let's make our analysis "pointers" slightly more powerful than the typical "I store your address" that's available in typical languages. Let's have "pointers" that can branch and can be used in both directions if necessary. It seems to me that these are the possible configurations our notation must unambiguously distinguish among: