Developer's Guide to UML 2 — A UML Tutorial

Structural Diagrams

"Class" diagrams

The mainstay of the structural diagram is the box, i.e. a solid rectangular outline. It is used for many, many purposes, only one of which is classes, hence the laughing quotes around "class". As one of the themes of this book is clarity as to what one is doing, and should be doing, I strongly recommend that everyone in the team gets into the habit of saying "entity diagram", "type diagram", etc., or of saying "structural diagram". Don't say "class diagram" unless you really do mean a diagram of classes.

The UML does provide a keyword to tell us what purpose a box serves in a given diagram; it puts it at the top of the box, above the name, as a quoted label. The default interpretation of a box is a class, but you should always be specific and put in the label. The particular quote marks used are guillemets (the kind of quotation marks used, for example, in French and Norwegian). See Example boxes.

(Strictly speaking a box symbol usually denotes what the UML terms a classifier, and classes and types, for example, are varieties of classifier. You will frequently see the word "stereotype" used as the term denoting the type label in guillemets. It was a peculiar term and has been given a much tighter definition in UML 2. Stereotypes are now concerned with defined variants of UML called profiles.)

The typical types of box are:

  • «implementation class» This is what is perhaps known as a concrete class in many programming cultures. Officially an «implementation class» is a programming language class in which an instance may not have more than one class. (Since that is the norm in our example programming languages - Java, C++, C# and Smalltalk - indeed in all but the most exotic of languages - I have narrowed its meaning slightly in this book. While the above will be true, an «implementation class» in this book is taken to mean a class that can have, and is meant to have, instances. In other words it is not an abstract class and it is not an interface. The other possibility for indicating concrete classes, which is more practical than the unwieldy «implementation class», is that a classifier box with no keyword and with no {abstract} constraint is assumed to be a concrete class, especially if it has methods and instance variables with specified visibilities. I like things to be a bit more explicit than that, however.)
  • «interface» This specifies a declared set of services that the classes we define can implement. Typically, an interface declares a set of accepted messages.

(At the time of writing there is a puzzle here. This important variety of box is neither in the list of official stereotypes in the specification appendix, nor is it in the list of stereotypes that have been retired. However, it is certainly the case that we sometimes represent an interface in multibox form, and there are certainly example diagrams in the UML specification showing «interface» in the expected position. And, in fact, on page 115 (ptc/03-08-02) it is clearly stated that a box representation would have the «interface» stereotype.)

  • «type» This specifies a domain of objects together with the operations applicable to the objects, without defining the physical implementation of those objects. You might wonder how exactly this is different to an interface. In the UML specification I am working from (ptc/03-08-02) the stereotypes appendix doesn't mention «interface» but does mention «type»; and the main chapters don't mention «type» but have plenty on «interface»!

However, general practice, and the practice of this book, is to treat «interface» as more of a mechanism - something we are going to code up. For our purposes it could be a Java or C# interface, or a C++ pABC (pure abstract base class). In the context of component-based development, EJB for example, an interface could represent the home interface of an enterprise bean.

A «type» on the other hand represents something that might not have a direct, single implementation. In this book we have used it as an element of early design when we were still getting clear on the objects we would need and before we were able to figure out what classes and interfaces we would actually need to implement.

More exotic possibilities are:

  • «entity» In this book, the «entity» stereotype has been treated as a basic keyword; and the meaning has been altered ever so slightly - partly for pedagogical reasons, but also because I prefer and use the amended version.

Officially an «entity» is a persistent information component representing a business concept. (Because I think that assumes too naïve a view of the exactitude and obviousness of correspondence between the software creatures and the subject matter creatures, and since that leaves us talking about "classes" in the very first model, I have suggested using «entity» for the subject matter creature itself, when it first surfaces, in an "analysis" model. Such a creature will evolve into a «type» in an intermediate model, and into «interface» and «implementation class» creatures in the final, design, model. Thus, we keep it clear that the real world isn't actually object-oriented and we end up with good, practical designs for real programming languages.)

  • «metaclass» This is found in languages like Smalltalk where classes are themselves objects. A metaclass is a class whose instances are classes. (In Java, classes are also objects but their origins are partially obscured, whereas a Smalltalk metaclass is nearly a perfectly ordinary class.)
  • «utility» A utility class is something like the Math class of Java and C#. Such a class wouldn't be instantiated but serves to gather a number of related functions that would otherwise be closer to global functions. The methods of such classes are all class methods (static methods in C++, C# and Java). The righteousness of such devices is dubious. Languages could just as easily present such functionality via special singleton instances; then the object-orientation would be purer and not mixed up with class orientation. Many years' experience with Smalltalk showed that having too many class methods is confusing and unnecessary.
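As a rough Java illustration of the two options just described, here is a «utility» class next to the singleton alternative. The names Geometry and Geometer are hypothetical and chosen only for this sketch; neither comes from the UML or from this book.

```java
// Hypothetical «utility» class: only class (static) methods, never instantiated.
final class Geometry {
    private Geometry() { }                      // blocks instantiation

    static double circleArea(double radius) {
        return Math.PI * radius * radius;
    }
}

// The singleton alternative: the same service offered by one ordinary object.
final class Geometer {
    static final Geometer INSTANCE = new Geometer();

    private Geometer() { }

    double circleArea(double radius) {
        return Math.PI * radius * radius;
    }
}

class UtilityDemo {
    public static void main(String[] args) {
        System.out.println(Geometry.circleArea(1.0));           // class method
        System.out.println(Geometer.INSTANCE.circleArea(1.0));  // instance method
    }
}
```

The singleton version can be passed around and substituted like any other object, which is the "purer" object-orientation the text alludes to; the utility version cannot.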

As well as actually marking your use of a box with a stereotype, you should carefully consider the meaning, or semantics, of your box in the kind of model or diagram you are creating. The main chapters of this book have gone to great lengths to make the interpretation and usefulness as clear as possible and to maximize the return on your investment in drawing them. Avoid thinking that these boxes or "classes" are much the same wherever and whenever they appear.

Note that while some particular forms of class get their own label, like «utility», other forms, like abstract, do not. Abstract classes are officially denoted by putting the name in an italic face and by writing {abstract}, a styling that is often used to denote a constraint. Abstract methods also use italic face and have {abstract} written as part of their description. (I find this rather inconsistent, and don't mind at all when I find developers have decided to define their own stereotype of «abstract class» or «ABC», as shown in Official and unofficial abstract class symbol.)

Multiboxes are also used if a graphical definition of an enumeration is preferred, as in Graphic depiction of an enumeration.


Compartments

While it is perfectly OK to have a box with just a name, we often want to portray some of the characteristics or features of a classifier; in other words, the attributes of an entity, the messages of a type or interface, or the methods and instance variables of a class. We divide up our box into compartments. The name and type go in the top compartment and the list of each different kind of feature goes in a separate compartment. Compartments has some example compartments.

Because the UML likes to put instance variables first (and tends to call them attributes), while many object-oriented developers prefer to list the methods first, and because there can be more than two substantive compartments, and because some developers (and this book) like to be quite careful about the interpretation of the contents of the compartments, it's a good idea to use compartment names. There's a hint that such labels, with laudable consistency, are also enclosed in guillemets.

Notice that for an analysis «entity», a design «type» and a design «interface», visibility isn't shown as it would be fairly meaningless; whereas for a class the visibilities are shown.

The possible, increasing, visibilities are:

  • - or private: visible only to the methods of the class itself;
  • # or protected: visible to the methods of the class and of its subclasses;
  • ~ or package: visible to classes in the same package;
  • + or public: visible to all.
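A minimal Java sketch of the four visibilities, using a hypothetical Account class and SavingsAccount subclass that are not part of the original text. Two mismatches are worth flagging: Java writes package visibility (~) as no modifier at all, and Java's protected also grants access to the whole package, so it is slightly wider than the # described above.

```java
// Hypothetical class; the UML marker for each feature is given in the comment.
class Account {
    private long balanceInPence;     // -  private
    protected String holderName;     // #  protected (in Java, also package-visible)
    int branchCode;                  // ~  package (no modifier in Java)

    public Account(String holderName, int branchCode) {
        this.holderName = holderName;
        this.branchCode = branchCode;
    }

    public void deposit(long pence) {   // +  public
        balanceInPence += pence;
    }

    public long balance() {             // +  public
        return balanceInPence;
    }
}

class SavingsAccount extends Account {
    public SavingsAccount(String holderName, int branchCode) {
        super(holderName, branchCode);
    }

    public String label() {
        // Allowed: holderName is protected, so subclass methods can see it.
        // Referring to balanceInPence here would not compile: it is private to Account.
        return "Savings: " + holderName;
    }
}

class VisibilityDemo {
    public static void main(String[] args) {
        SavingsAccount account = new SavingsAccount("Ann", 7);
        account.deposit(250);
        System.out.println(account.label() + " holds " + account.balance());
    }
}
```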

If the keyword is used, it is used as a subsection heading (C++ style) as in Alternative visibility notation. If the symbols are used, they are used as prefixes, like the class in Compartments.

The styling is official: bold centered name; initial upper-case letters only for object type names; all names use capitalization to indicate constituent word boundaries rather than underscores; features are plain face and left justified; names of abstract classes and methods are italic.


Interfaces

An interface, like that of Interface, defines a set of features and obligations forming a coherent service that a conforming class will implement. For our purposes in this book, interfaces are very important: they give us the ability to define pure abstract types. They are the interface of Java and C#, and the pABC of C++.

Interfaces define messages (operations in UML's terminology).

Interfaces can also define "attributes", as shown in Interface "attributes" or properties. Be careful with that term: it is horribly overused. What it means is that interfaces can define what are perhaps more often called properties in commonly encountered programming languages like C#, Object Pascal, Eiffel (and Visual Basic). There is no direct implementation in Java or C++. When we define a property or "attribute" as part of an interface, what we mean is that while a class might not actually store such an item (i.e. would not necessarily have an instance variable representing the property), to the outside world it appears as though it has such a property. As I said, this can be done directly (and very nice it is too) in Eiffel, Object Pascal and C#; in Java and C++ it must be provided via a pair of methods such as:

  • Date date() and void date(Date) (CORBA style, and my preference);
  • Date getDate() and void setDate(Date) ("Bean" style).
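The two accessor styles can be sketched in Java as follows. The interface names Dated and DatedBean and the class Memo are hypothetical, invented for this illustration. The point of the sketch is that the implementing class does not store a Date at all, yet to the outside world it appears to have a date property.

```java
import java.util.Date;

// Hypothetical interface exposing a "date" property, CORBA style.
interface Dated {
    Date date();               // read the property
    void date(Date newDate);   // write the property
}

// The same property, "Bean" style.
interface DatedBean {
    Date getDate();
    void setDate(Date newDate);
}

// The class keeps a long internally; no Date instance variable exists.
class Memo implements Dated, DatedBean {
    private long millisSinceEpoch;

    public Date date() {
        return new Date(millisSinceEpoch);
    }

    public void date(Date newDate) {
        millisSinceEpoch = newDate.getTime();
    }

    public Date getDate() {
        return date();
    }

    public void setDate(Date newDate) {
        date(newDate);
    }
}

class PropertyDemo {
    public static void main(String[] args) {
        Memo memo = new Memo();
        memo.date(new Date(0L));             // CORBA-style write
        System.out.println(memo.getDate());  // Bean-style read of the same property
    }
}
```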

There are several ways in which an interface can be depicted. When it is the details of an interface that are interesting, the box with compartment, like Interface, is probably best. An interface can be reduced in impact if it's not the details that are important. An interface offered and required can be shown as ball and socket, emphasizing the intermediary and decoupling value of an interface. In Interface as ball and socket, the Invoice class is implementing the StatutoryDocument and Debt interfaces and the Customer class uses the Debt interface somewhere, as one of its instance variable, local variable, parameter or return types.

It is also possible to show the details of an interface and show a class that uses the interface. An alternative to the socket is the dependency arrow that we will be detailing later, as shown in Interface dependency.

Where an interface is used as the type of a parameter or instance variable, it can of course appear textually, as can class types and primitive types.

On its own, an interface can do nothing, since by definition, it has no implementation. Classes come along and promise to realize or implement interfaces.[4] The symbol was the fairly intuitive dashed line with triangle arrowhead, pointing from the implementing class to the implemented interface, as depicted in Implementing an interface. It was intuitive because, like just about all appearances of the triangular arrowhead, it too denotes type sharing; but it brings lower coupling than the solid inheritance line. Other writers on UML continue to use this depiction; I shall continue to use this depiction, but strangely it has all but disappeared from the standard (ptc/03-08-02). Let's hope its omission was an oversight.

Although I seem to have been unlucky with many of the examples in this book, it is suggested, quite reasonably, that typical interface names, especially of what you might call "passive" objects, will often end in -able or -ible, Serializable or Comparable, for example.[5]
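In Java terms, the ball-and-socket arrangement described earlier (Invoice offering StatutoryDocument and Debt, Customer requiring Debt) might look roughly like this. The class and interface names come from the text; the operations referenceNumber, amountOwedInPence, incur and totalOwedInPence are assumptions made up for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Operations on these interfaces are invented for illustration.
interface StatutoryDocument {
    String referenceNumber();
}

interface Debt {
    long amountOwedInPence();
}

// Invoice realizes both interfaces (the "balls" it offers).
class Invoice implements StatutoryDocument, Debt {
    private final String reference;
    private final long amountInPence;

    Invoice(String reference, long amountInPence) {
        this.reference = reference;
        this.amountInPence = amountInPence;
    }

    public String referenceNumber() {
        return reference;
    }

    public long amountOwedInPence() {
        return amountInPence;
    }
}

// Customer depends only on Debt (the "socket"); it never mentions Invoice.
class Customer {
    private final List<Debt> debts = new ArrayList<>();

    void incur(Debt debt) {
        debts.add(debt);
    }

    long totalOwedInPence() {
        long total = 0;
        for (Debt debt : debts) {
            total += debt.amountOwedInPence();
        }
        return total;
    }
}

class InterfaceDemo {
    public static void main(String[] args) {
        Customer customer = new Customer();
        customer.incur(new Invoice("INV-1", 1200));
        System.out.println(customer.totalOwedInPence());
    }
}
```

Because Customer sees only Debt, any other class that realizes Debt can be handed to it without Customer changing, which is the decoupling the notation emphasizes.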


Features

The compartments of our classifiers list the features of the instances they classify. Features can be structural or behavioral. In this book we go further and have distinguished the attributes of the entities of the subject matter model from the messages of types and interfaces, and from the instance variables, methods and constructors of implementation (concrete) classes.

Attributes and instance variables

UML doesn't distinguish these, tending to use the term "attribute" for both. They are examples of what the UML calls properties, which are structural features. The intrinsic measurable properties of subject matter entities, however, are rather different from the instance data defined by implementation classes, so in order to give the different models the correct focus, this book has used the traditional term attribute for the former and instance variable for the latter (also known as fields or data members in our example languages). The basic notation is the same:

visibility name: type = default-value {property-string}

for example, as an attribute of a subject matter entity:

weapons certified: boolean = false

Visibility for an attribute of a subject matter entity is pretty meaningless and hasn't been shown in this book. The default value is always optional. The most likely property of an attribute or instance variable would be readOnly. For example, as an instance variable of a class:

- dateOfBirth: Date {readOnly}

There are two more depiction frills. The first is where a property is important enough to be shown but is actually derived from more basic properties. This would be shown with an oblique at the front:

- /ageAtDeath: int

The way the property is derived could be given via OCL in the model repository. The decision as to which of a group of properties is derived is often quite arbitrary, and the symbol can often alternatively be read as a constraint between sets of properties' values.
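One way to read a derived property in code is as a value that is computed on demand rather than stored. The sketch below assumes a hypothetical DeceasedPerson class with a second basic property, dateOfDeath, which the text does not name; it uses java.time.LocalDate in place of the Date type shown in the notation.

```java
import java.time.LocalDate;
import java.time.Period;

class DeceasedPerson {
    private final LocalDate dateOfBirth;   // - dateOfBirth: Date {readOnly}
    private final LocalDate dateOfDeath;   // assumed second basic property

    DeceasedPerson(LocalDate dateOfBirth, LocalDate dateOfDeath) {
        this.dateOfBirth = dateOfBirth;
        this.dateOfDeath = dateOfDeath;
    }

    // /ageAtDeath: int, derived from the two basic properties, so never stored.
    int ageAtDeath() {
        return Period.between(dateOfBirth, dateOfDeath).getYears();
    }
}

class DerivedDemo {
    public static void main(String[] args) {
        DeceasedPerson person = new DeceasedPerson(
                LocalDate.of(1900, 6, 15), LocalDate.of(1980, 6, 15));
        System.out.println(person.ageAtDeath());
    }
}
```

The arbitrariness mentioned above shows up here too: the class could equally have stored the age and date of birth and derived an approximate date of death.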

The second frill applies where the property is represented by a collection of undifferentiated values, in which case the multiplicity can be given and the following extra properties become relevant to describe the nature of the collection (although I am paraphrasing as the current descriptions (ptc/03-08-02) are difficult to follow):

  • {bag} - simplest collection with no ordering and duplicates allowed
  • {set} - no duplicates
  • {seq} - ordered sequence

Multiplicity will be covered later, when we get to association relationships. For example, as an instance variable of a class:

- curvature: SpacePoint[3..*] {seq}
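A possible Java reading of the curvature example: {seq} suggests an ordered java.util.List, and the lower bound of 3 has to be checked by hand because the language has no notion of multiplicity. The enclosing class name Trajectory and the fields of SpacePoint are assumptions for this sketch. By the same reasoning a {set} would map naturally onto java.util.Set, while {bag} has no direct counterpart in the standard collections.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical value type; its fields are invented for illustration.
final class SpacePoint {
    final double x;
    final double y;
    final double z;

    SpacePoint(double x, double y, double z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }
}

class Trajectory {
    // - curvature: SpacePoint[3..*] {seq}
    private final List<SpacePoint> curvature;

    Trajectory(List<SpacePoint> points) {
        if (points.size() < 3) {
            throw new IllegalArgumentException("curvature needs at least 3 points");
        }
        this.curvature = new ArrayList<>(points);   // copy preserves the given order
    }

    List<SpacePoint> curvature() {
        return Collections.unmodifiableList(curvature);
    }
}

class SequenceDemo {
    public static void main(String[] args) {
        List<SpacePoint> points = new ArrayList<>();
        points.add(new SpacePoint(1, 0, 0));
        points.add(new SpacePoint(2, 0, 0));
        points.add(new SpacePoint(3, 0, 0));
        System.out.println(new Trajectory(points).curvature().size());
    }
}
```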

Messages and methods

The basic notation looks like this:

visibility name(parameter-list): return-type {property-string}

Visibility has already been covered.

Parameter list is a comma-separated list of parameter descriptions, each of which looks like this:

direction name: type [multiplicity] = default value {property-string}

C++ programmers will know that default parameter values can get you into trouble, and are probably best avoided. I can only think of one parameter property: something like readOnly; but given that we already have a direction and, as anyone who has compared and struggled with C++'s const and Java's final parameters will know, the interpretation of readOnly parameters is rather subtle, so the parameter property string is probably best avoided as well.

The direction, whose assumed default is in, can be one of:

  • in - passed in only (pass by value)
  • inout - passed in and out again (pass by reference)
  • out - passed in uninitialized, initialized and passed back by reference (Ada and CORBA for example, but not usually available)

The description of the return type is very muddled in the UML at the moment. UML 1.x had the return as it has been shown here. Other writers on UML 2 (like Martin Fowler in "UML Distilled", [Fowler 04]) also have the return as shown here. In the current (ptc/03-08-02) documentation, however, this kind of return depiction only shows up in an example. The main text seems to imply that returns are shown as parameters, in the parameter list, with another direction type, of return, which I don't like at all.

A message (or operation) would be a behavioral feature of a type or interface, and might look like this:

knownAssociates(in forPeriod: Duration): Person[0..*] {set}

A method would be a behavioral feature of a class, and might look like this:

+ dangerousAndNotToBeApproached(): boolean {query}

As you can see, the visibility of the message was not given. A message is a message; there is certainly no obvious meaning to a private or protected message; even a self-message isn't perceived as private since half the elegance of self-messaging is that you don't know where the implementation is.

The main new property of a message or method is {query}. This is important for several reasons:

  • We saw that they are the main message suggestions to emerge from the analysis.
  • It is semantically interesting to note anyway.
  • As query messages or methods do not change the state of the system, they can always be re-ordered, and the more of them that there are in the interface, the less likely it is that you will need to provide a protocol state machine.
  • If the language can be told, as C++ can for example, they are an important safety device.
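Java, unlike C++, has no way of telling the compiler that a method is a query, so there the {query} property remains a documented convention. A minimal sketch, using the method name from the example above inside a hypothetical Suspect class (the violentOffences field and recordViolentOffence method are invented):

```java
class Suspect {
    private int violentOffences;   // hypothetical state

    // + dangerousAndNotToBeApproached(): boolean {query}
    // Reads state and changes nothing, so calls can be repeated or re-ordered freely.
    public boolean dangerousAndNotToBeApproached() {
        return violentOffences > 0;
    }

    // Not a query: this method changes the state of the object.
    public void recordViolentOffence() {
        violentOffences++;
    }
}

class QueryDemo {
    public static void main(String[] args) {
        Suspect suspect = new Suspect();
        System.out.println(suspect.dangerousAndNotToBeApproached());
        suspect.recordViolentOffence();
        System.out.println(suspect.dangerousAndNotToBeApproached());
    }
}
```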

Perhaps you will be kind enough to bear with me on another terminology rant. The UML often correctly distinguishes what one finds in a «type» or an «interface» - message signatures - from what one often finds in an «implementation class» - methods or behaviors. (Of course, the UML tends to say "operation" rather than message signature.) However, the UML also quite often uses the term operation when it surely means method. Here is an example referring to the kind of thing we saw in Alternative visibility notation: "Attributes or operations may be presented grouped by visibility." As some of the examples accompanying this point are private, they cannot possibly be talking about operations.

I suggest that designers of typical object-oriented systems and users of typical object-oriented languages stick consistently to the terms message (or operation if you absolutely must) for a requesting perception and method for an implementing perception.

Class variables and class methods (static features)

The normal object-oriented way to get things done is for object instances to run their methods on their own instance variables. This is what we mean by object-orientation. Sometimes (perhaps too often in some designs) we get the class to execute a method or to hold some data. If you liked the term "instance variable", then perhaps you'll also like the terms class variable and class method for these class devices.

However, borrowing from C for some reason, C++, Java, C# and the UML refer to them as statics. This tends to confuse everyone unless they happen to know how storage is allocated in C and C++. Static does not mean that they don't change. In C and C++ it means that the amount of storage required for them is fixed and known at compile time. There we are though; we are stuck with this dreadful word. However, I would not be alone if I advised you to pronounce the term static as "class".

The depiction of "statics" is also somewhat vexing. We underline static features, as shown in "Static" features. That would be fine except that when we underline the name of a class we mean almost the opposite. Read on.
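A small Java sketch of the distinction, with a hypothetical Ticket class: the class variable issued is shared by the class as a whole and clearly does change, while each instance has its own number.

```java
class Ticket {
    private static int issued = 0;   // class variable (a "static", underlined in UML)
    private final int number;        // instance variable, one per Ticket

    Ticket() {
        issued++;
        number = issued;
    }

    static int issued() {            // class method: sent to the class, not an instance
        return issued;
    }

    int number() {                   // ordinary method: runs on one instance's data
        return number;
    }
}

class StaticDemo {
    public static void main(String[] args) {
        Ticket first = new Ticket();
        Ticket second = new Ticket();
        System.out.println(first.number() + " " + second.number() + " " + Ticket.issued());
    }
}
```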

Depicting instances

In interaction diagrams and sometimes in structural diagrams, we want to show an example instance of a class or entity. We indicate that a classifier box is presenting an instance rather than a class by underlining the label, and by changing to a name: type style label as shown in Object instance depiction. The UML distinguishes diagrams that contain only example instances and their relationships as object diagrams, although it doesn't give them a section of their own, and neither will I.

Typically the "name" is fairly meaningless in a diagram and the instance is depicted as an anonymous instance.

Active objects

A typical object instance does no work until a message brings it a thread of control. If, however, we have managed to arrange for an object that starts to do work as soon as it's created, it is called an active object in the UML. The Runnable start object of a Java thread might be designated an active object. An active object might "pull" its data from a queue rather than wait for message arguments.
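The following Java sketch shows one way such an active object could be arranged: it owns its thread, starts working as soon as it is created, and pulls its data from a queue. The class ReportLogger, its STOP sentinel and its method names are all hypothetical; a factory method is used so that the thread is started only after construction has finished.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical active object: it has its own thread of control.
final class ReportLogger implements Runnable {
    static final String STOP = "<stop>";   // sentinel value, an assumption of this sketch

    private final BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
    private final List<String> processed = new CopyOnWriteArrayList<>();
    private final Thread worker = new Thread(this);

    private ReportLogger() { }

    // Creating the object also starts it working.
    static ReportLogger create() {
        ReportLogger logger = new ReportLogger();
        logger.worker.start();
        return logger;
    }

    void submit(String report) {
        inbox.add(report);
    }

    void shutdown() throws InterruptedException {
        inbox.add(STOP);
        worker.join();   // wait until everything queued before STOP is processed
    }

    List<String> processed() {
        return processed;
    }

    @Override
    public void run() {
        try {
            // "Pull" work from the queue rather than wait for message arguments.
            for (String report = inbox.take(); !report.equals(STOP); report = inbox.take()) {
                processed.add(report.toUpperCase());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class ActiveObjectDemo {
    public static void main(String[] args) throws InterruptedException {
        ReportLogger logger = ReportLogger.create();
        logger.submit("burglary reported");
        logger.shutdown();
        System.out.println(logger.processed());
    }
}
```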

Previously, active objects were depicted with a thick border but that was too difficult to discern (the boxes in this book have fairly thick borders but that's pure aesthetics). Today an active object (or class) is depicted with double box sides, as shown in Active object.


Constructors

There are several possibilities for depicting constructors. The UML isn't very clear on its recommendations. My attitude is that constructors are neither methods nor static methods and if I intend to show them at all, I prefer to have them in a compartment of their own, with the compartment name «constructors».

If you (or your CASE tool) prefer to put constructors in the methods compartment then there are several possibilities for distinguishing them. Firstly they would normally have no return type since the result of running a constructor is implicitly an initialized object of the class. This would only truly distinguish constructors, however, if you had guaranteed always to show returns for ordinary methods, including void to explicitly indicate a decision to return nothing. Secondly there is the name of the constructor, which can either be the same name as the class, which is what Java, C++ and C# do, or alternatively could be the name new(). Additionally, the UML disagrees with me and takes the position that constructors are static members of the class and thus would be underlined. Finally, some older UML descriptions suggested a stereotype of «constructor» at the start of the constructor's entry in the compartment.

All in all, as I say, I prefer a separate «constructors» compartment.


Comments

Although the comment symbol can appear on any kind of diagram, in the UML it is defined in the structural part. A comment is a box with a "dog-ear". If a particular element is the subject of the comment, then a line can call attention to the element. See Comment.


Packages

A package is a particular kind of namespace - a simple kind of namespace that doesn't do much other than act as a namespace. (Be careful when the UML is talking about namespaces and packages, as both terms are meaningful. Programmers who use both Java and C++ might have got into the habit of treating namespace and package as synonymous. In the UML they are not synonymous: packages act as namespaces (and do little else), but there are other kinds of namespaces: a class is a namespace and an interface is a namespace, for example. Of course, that's actually exactly how things are in Java even if namespace isn't a keyword in Java.)

A simple depiction of a package resembles a folder with a tab, as at the top left of Package. The contents of the package can be listed, in which case the package name goes in the tab, as at the top right of Package. Items belonging to a package can be shown outside of the package symbol, in which case they are connected using a line beginning with a circled plus going from the package to the packaged item, as at the bottom of Package. (Why yet another style of line? Well it is different to the other kinds of lines; it is depicting name scoping or name resolution.)

Packages can contain classifiers (entities, classes, interfaces, etc.) and other packages, and thus act to organize groups of packageable model elements. When something is defined in a package (or any kind of namespace) its name is also no longer a global name, thus aiding documentation and reducing name clashes. An element defined in a namespace must either be fully named, logistics::Customer for example, or its diagram can declare (import) the package if it is focusing on one package in particular.


Associations

Associations are characterizing, inter-instance relationships. The basic depiction is with a solid line. Although normally depicted between classifiers in diagrams (entities, classes, etc.) each association, like the one shown in Association, represents a set of links between the instances of the classifiers.

We need to be very careful that we properly understand associations because the target audience of this book will inevitably use pointers to implement them but the UML describes them in tuple terminology (part of the terminology of mappings in mathematics, and of relational database theory). Each link in the set of links represented by an association links exactly one instance from each end of the association relationship. (Theoretically associations can have more than two ends; more on that later. We will focus on associations with two ends; indeed it is probably best to ban associations with more than two ends.)

The above tells us that each and every link in a normal two-ended association also has two ends. That might seem rather obvious, but it means we don't use a picture of branched links even for one-to-many associations or star links for many-to-many associations. Links can therefore be fairly easily implemented by pointers in an object-oriented programming language; pointers also cannot branch.

Since the ends of an association serve to link instances and not the classifiers themselves, it is perfectly feasible to have an "involute" or "reflexive" association, as shown in Association with same classifier at both ends.

Links can be segmented if they have to go round corners; each segment is usually straight and there is no semantic significance to the segments or vertices.

As mentioned above, an association can have more than two ends. Such associations are known as n-ary associations, as shown in N-ary association.

In fact, any association can have a diamond in the middle but we don't bother for two-ended associations; therefore as most developers will vow not to have n-ary associations, we won't see any diamonds in the middles of lines.

Association names

It is possible to give a name to an association, using a label in the center of the line, but role names are usually preferable. To help interpret the name correctly a little triangle has to tell us about the reading direction. In Association name (in the middle) we know to say "Officer investigates Crime" rather than "Crime investigates Officer".

One of the reasons association names are confusing, and role names better, is that the reading direction is fairly trivial and can be the reverse of the far more important navigability direction. More on navigability in a moment.


Role names

Associations have ends and those ends can be named. A more indicative synonym for the name of an association end is role name.

Associations will end up as pointers in the typical implementation languages. Pointers are simple things without intrinsic names of their own; instead the instance doing the pointing has an instance variable to hold the pointer (or to hold the collection object that holds the pointers) and it's that that has the name. It therefore makes more sense to name associations as the associates see them - using role names. Role names go at the ends of associations where the associate is playing a role for the instance at the other end. In Role names we see that crime instances play two roles for an officer instance: one crime is possibly the crime the officer is investigating, and there are other crimes that the officer has investigated.
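A minimal Java sketch of how the two role names could surface as instance variable names. The Officer and Crime names and the two role names come from the text; the multiplicities (at most one crime under investigation, any number investigated), the description field and the method names are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

class Crime {
    final String description;   // hypothetical field

    Crime(String description) {
        this.description = description;
    }
}

class Officer {
    // Two associations with Crime, so two instance variables,
    // each named after the role that Crime plays for the Officer.
    private Crime investigating;                                    // assumed 0..1
    private final Set<Crime> investigated = new LinkedHashSet<>();  // assumed 0..*

    void startInvestigating(Crime crime) {
        investigating = crime;
    }

    void recordInvestigated(Crime crime) {
        investigated.add(crime);
    }

    Crime investigating() {
        return investigating;
    }

    Set<Crime> investigated() {
        return Collections.unmodifiableSet(investigated);
    }
}

class RoleNameDemo {
    public static void main(String[] args) {
        Officer officer = new Officer();
        officer.recordInvestigated(new Crime("forgery"));
        officer.startInvestigating(new Crime("arson"));
        System.out.println(officer.investigating().description);
        System.out.println(officer.investigated().size());
    }
}
```

Note that Crime holds no pointer back to Officer here: the names live with the instance doing the pointing, as the paragraph above describes.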

Notice that if there are two associations, we show them as such. If either association were omitted, the model would lack important information.

If there is mutual characterization, in other words if each plays a role for the other, then we put role names at both ends. Try to ensure that it truly is mutual and not two single role associations in complementary directions. And try not to name one role in terms of the other; for example, "investigates" and "is investigated by"; the likelihood is that one hasn't thought carefully enough if one does that kind of thing.

Careful choice of the tense of the verb, if it is a verbish role name, can distinguish moment-of-now associations (-ing) from associations that gather as time passes (-ed, -es).


Multiplicity

Many things can have a multiplicity as part of their specification, but the typical use is with characterizing relationships (association, aggregation and composition). In the same way as the role name names the way in which one end's instances characterize the other, and in the same "direction", the multiplicity describes the number of instances at its end that are characterizing the other end. Imagine that the instance at one end writes both the role name and multiplicity, for the way in which it sees the instances at the other end, on a sticky note, and that it applies the sticky to that other end, as in Role name and multiplicity "direction".

Typically, as we have seen in many of the examples up until now, a dotted range is used, representing the minimum and the maximum. If a single value is given it stands for both bounds (3 means 3..3), except that a lone * is shorthand for 0..*.

Typical minima are 0 or 1, and typical maxima are 1 or many, many being indicated with a * symbol. If there is a more specific multiplicity, however, it can be shown: 3..7 for example.

The lower bound is slightly more artificial (or subtle) than the upper bound. 0 is usually read as optional - "I am a perfectly valid instance even if my investigating link isn't actually linked to a Crime". 1 is usually read as mandatory (i.e. not zero): "I would be a strange Officer if I had never investigated a single Crime."

One must establish, as a project standard, whether start up conditions are to be encompassed by the multiplicities. If they are, it pretty much makes the lower bound useless as it could always be zero at some point.

The multiplicities that we saw previously, as parts of textual specifications, were enclosed in square brackets. When they are not part of a bigger text string, there are no brackets.

Earlier versions of the UML supported comma-separated, discontinuous multiplicities. I'm not quite sure why they have disappeared. One struggles to find many examples - 2, 4, 6, 8 legs (excluding gastropods) for example - but I can imagine that I might need them, and I don't think they did any harm, so I think I would prefer to keep them.

Association properties and constraints

In Role names we note that the one Crime the officer is investigating might also be one of the many investigated. If it is, we can show that. UML 1.x used to draw a little dashed arrow from one association to the other, whereas UML 2 uses text, as shown in Subset.

The other main properties (or constraints, which is how UML 1.x tended to refer to them) are relevant when the multiplicity is many. The current descriptions of {set}, {bag} and {seq} (and {ordered}) are a little unclear. The project standard should declare which you intend to use, and exactly what you mean by them. Although there is no graphical depiction, you would probably want the model repository to say what the order criterion was.

Derived associations

We can indicate that an important association is derived from more basic model elements with the usual oblique (/) at the beginning of the element's name. Typically this would go at the front of a role name or association name, or where the name would have been if no name is actually given.

However, unlike derived attributes, derived associations can lead to bad thoughts. Most explanations of derived associations that I have seen would lead instantly to serious violations of the encapsulation and information hiding principles of object-orientation. I can't come up with an explanation of derived associations that doesn't violate object-oriented best practices, so while I might have them in analysis models, I wouldn't have them in design models.6


In contrast with the last point, navigability is an important thing to get nailed down for design diagrams, and isn't really worth trying to do definitively at analysis time. (Although, as we have said, the role names mean that analysis models can give valuable direction clues.) The question is, using object-oriented phrasing: given an association relationship between object instances, who do we suggest would be able to message whom directly?

In Role names, the role names hint that Officer instances would be aware of Crime instances, but that the inverse does not apply. If that turned out to be true (which we find out during design, particularly during activities like CRC), and Crime pointers would be held in Officer instances, and messages were going to go from Officer instances to Crime instances, then we would add navigability arrowheads as shown in Navigability and role names.

If we convinced ourselves that navigability was in both directions, i.e. the Crime being investigated would need to message an Officer instance, then arrowheads would go at both ends. The UML makes navigability arrows optional, but I would make them compulsory for design diagrams. In UML 2, non-navigable ends can be made explicit as well, with a cross at the end of the line, although so far I've always been happy to assume that a plain end meant non-navigable.
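As a rough Java sketch of the bidirectional case: each navigable end becomes a private reference, and one side takes responsibility for keeping both ends in step. The escalate and acknowledge methods are hypothetical, added only so that there is a message for the Crime to send.

```java
class Crime {
    // Held only because this end is navigable towards Officer as well.
    private Officer investigator;

    // Package-private: only Officer is meant to maintain this end.
    void setInvestigator(Officer officer) {
        investigator = officer;
    }

    // The Crime can now message its Officer directly.
    String escalate() {
        return investigator == null ? "unassigned" : investigator.acknowledge();
    }
}

class Officer {
    private final String name;
    private Crime investigating; // navigable end towards Crime

    Officer(String name) {
        this.name = name;
    }

    void investigate(Crime crime) {
        if (investigating != null) {
            investigating.setInvestigator(null); // detach the previous Crime
        }
        investigating = crime;
        crime.setInvestigator(this);             // keep both ends in step
    }

    String acknowledge() {
        return name + " acknowledged";
    }
}
```

With navigability in one direction only, the investigator field and setInvestigator would simply not exist, which is a large part of why one-way navigability is cheaper to implement.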

Navigable ends of associations can be adorned with extra information - basically all the information that could be given for an attribute or instance variable. Of these, the type is probably the most useful. Putting a type allows us to separate, in an example diagram, the pointer's type from the type of the object at the end of the pointer, which is very important for the polymorphism pillar of object-orientation. (In UML 1.x, such types were known as interface specifiers.) In Polymorphic typing we see that while the type of the pointer held in the Officer class is still Crime, an example object that it is pointing at comes from the UnsolvedCrime class, which presumably implements the Crime type.

Effectively this means that there are four ways that an interface can show up: as a multibox, as a lollipop (a ball on a stick), as a dependency target and as a type in property text. This is good. As we might already have mentioned, interfaces are important.

One thing to be careful about is that you might see, for some inexplicable reason, the visibility of such roles shown as public , thus breaking the prime directive of object-orientation, which is that all instance variables, including those that hold pointers, should be private .
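A small Java sketch of the typed, private role, assuming that Crime is an interface with an isSolved operation (the operation name is borrowed from the OCL example in the Constraints section; SolvedCrime and caseClosed are hypothetical additions):

```java
interface Crime {
    boolean isSolved();
}

class UnsolvedCrime implements Crime {
    public boolean isSolved() { return false; }
}

// Hypothetical second implementer, to show the pointer type staying put.
class SolvedCrime implements Crime {
    public boolean isSolved() { return true; }
}

class Officer {
    // The pointer's type is Crime; the object at the end of the pointer may
    // come from any class that implements Crime. The field is private.
    private Crime investigating;

    void investigate(Crime crime) {
        investigating = crime;
    }

    boolean caseClosed() {
        return investigating != null && investigating.isSolved();
    }
}
```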


One way of thinking about qualifiers is that they are a bit like a key. In Qualifier, within a given flight, a seat number would lead you to exactly one Passenger . Without the qualifier the flight would have been in a many-to-many association with the passenger. However, qualifiers pose more questions than they answer. Is seat number a property of Flight ? Is it a property of Passenger ? Is it a property of an entity we don't see? In common with many others, I try to do without qualifiers; whatever they get replaced with is probably an improvement.
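One common replacement, sketched here in Java under the assumption that the seat number is simply a String key owned by the Flight, is a map from seat number to Passenger. This is only one of the possible answers to the questions above; the method names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class Passenger {
    private final String name;

    Passenger(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }
}

class Flight {
    // The qualifier becomes the map key: within this flight, one seat number
    // leads to at most one Passenger.
    private final Map<String, Passenger> passengerBySeat = new HashMap<>();

    // Returns false, and changes nothing, if the seat is already taken.
    boolean seat(String seatNumber, Passenger passenger) {
        return passengerBySeat.putIfAbsent(seatNumber, passenger) == null;
    }

    Optional<Passenger> passengerIn(String seatNumber) {
        return Optional.ofNullable(passengerBySeat.get(seatNumber));
    }
}
```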


Like association, aggregation is a characterizing, inter-instance relationship, but with the specific semantics of objects forming other objects. One could say that the diamond was an iconic representation of the role name "has" or "contains" or any of their many synonyms.

In this book's example languages, with the notable exception of C ++ , there isn't an obvious way to differentiate the two syntactically; but the distinction, often established during analysis, is usually a useful one. Establishing the distinction can be difficult however, so for both analysis and design models an easier differentiation is often sought. A common one - mine at least - is existence-dependence.

In Aggregation and existence-dependence the modeler was first tempted, possibly just because of the collective nature of the noun "gang", to model the top relationship as aggregation. On reflection however it was decided that Criminal instances were regular instances that could live an independent life of their own or that could be in association with Gang instances. Criminal instances that ended up in the relationship were not brought into being solely to help form a Gang instance; they were not existence-dependent. A PointOfSimilarity , on the other hand, serves no purpose other than to help form a FingerprintMatch ; they would not go wandering off on their own; they are existence-dependent.

The small diamond goes at the aggregate end rather than the component end. If you find that counter-intuitive, you can do what many do, which is to put a navigability arrow in, which would always go from the aggregate - the characterized - to the components - the characterizers, as in Aggregation and navigability.

(If you do put the navigability arrow on aggregation relationships, then you can use the mnemonic for remembering the direction of all the arrows in the UML: "the arrow goes from the one that knows". In all7 the directed relationships in the UML there is always one end that, in some sense, is aware of its participation whereas the other end isn't. One participant declares the other, messages the other, instantiates the other, etc.)

It is normal to omit the multiplicity of the aggregate, 1..1 being assumed. The aggregated multiplicity would be provided as normal. If the only role name you can think of is has or its synonyms, don't put it in. If, on the other hand, you can think of a better, existence-dependent role name, then by all means put it in.


There is a "stronger" form of aggregation - composition - depicted with a filled-in diamond, that's easier to define. It's where the component end physically forms part of the composite. In C ++ the memory of a component instance can exist within the memory of the composite instance. Most other object-oriented programming languages can't do that however (they only have objects referencing, i.e. pointing at, other objects); but we can still find a meaning for the filled-in composition diamond though - coincident lifetime . If the component instances are constructed during the construction of the composite instance, and if the component instances are destroyed as the composite is destroyed, we can still use the filled-in composition diamond. See Composition or coincident lifetime.

In the ordinary aggregation, while the lifetime of the "component" falls within the lifetime of the "composite", the lifetimes are not coincident.

This is usually a design distinction. Even when it is possible to make the distinction in the subject matter, which it often isn't, it doesn't help us a great deal.
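In Java, which has only references, the difference might be sketched as follows. MatchScore is a hypothetical component invented for the illustration; it is created with its FingerprintMatch and is never handed out, so its lifetime coincides with the composite's. The PointOfSimilarity instances are existence-dependent but arrive later, so their lifetimes fall within the composite's without being coincident. Java has no destructors, so "destroyed" here means becoming unreachable together.

```java
import java.util.ArrayList;
import java.util.List;

class PointOfSimilarity {
    final int x;
    final int y;

    PointOfSimilarity(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

// Hypothetical component, used only to illustrate coincident lifetime.
class MatchScore {
    private int points;

    void increment() { points++; }

    int value() { return points; }
}

class FingerprintMatch {
    // Composition read as coincident lifetime: built with the match,
    // never exposed, unreachable as soon as the match is.
    private final MatchScore score = new MatchScore();

    // Ordinary aggregation: each point exists only to help form this match,
    // but is created some time after the match itself.
    private final List<PointOfSimilarity> points = new ArrayList<>();

    void addPoint(int x, int y) {
        points.add(new PointOfSimilarity(x, y));
        score.increment();
    }

    int score() {
        return score.value();
    }
}
```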


Constraints are expressions that refer to elements of the model, and that must evaluate true. If you have programmed with assert macros or keywords to provide your code with preconditions, postconditions or invariants, you have used constraints. The UML doesn't specify the constraint expression language, but OCL is an obvious candidate. One could use a programming language like Java, but it only has simple logic whereas languages like OCL have higher-order logics, as explained in Object Constraint Language.

Typically constraints are important enough that they might have a graphical presence. The expression would be enclosed in curly braces.

It is very difficult to write model constraints so that they are interpreted correctly. Use them when you really need them. Use them carefully. Get them checked. My preference is that the constraint expression would have the same, unambiguous interpretation even if it were read out of the context of the diagram. Constraint expressions near association ends are often a problem; it depends on how careful your CASE tool is.8 Such expressions start off, for example, by applying to the instances participating in the association, but once a little bit of editing has moved things around, end up looking like they apply to every instance of the classifier. As I say, my preference, although it's more work, is that such constraint expressions, taken as an example, explicitly refer to the instances in the association.

If one model element is the principal constrainee, then the curly brace enclosed constraint expression can be drawn against that element, as in Simple constraint.

(Incidentally, I've just spent ten minutes thinking about whether I put that constraint in the right place. And I'm still not absolutely sure! The full OCL expression9 is:

context Officer inv:

self.solved->reject( isSolved() )->isEmpty()

My initial reaction was to put the constraint near the solved association end, but that wouldn't be correct given the context.)
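For comparison, the same invariant might be written as a checkable Java method on Officer. This is a sketch that assumes Crime offers an isSolved query and that the solved role is held as a list; markSolved, addSolved and invariantHolds are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class Crime {
    private boolean solved;

    void markSolved() { solved = true; }

    boolean isSolved() { return solved; }
}

class Officer {
    // Role "solved": the crimes this officer is credited with solving.
    private final List<Crime> solved = new ArrayList<>();

    void addSolved(Crime crime) {
        solved.add(crime);
    }

    // context Officer inv: self.solved->reject( isSolved() )->isEmpty()
    boolean invariantHolds() {
        // reject(isSolved()) keeps the elements for which isSolved() is false
        List<Crime> rejected = solved.stream()
                .filter(crime -> !crime.isSolved())
                .collect(Collectors.toList());
        return rejected.isEmpty();
    }
}
```

Note that the check lives in Officer, which mirrors the OCL context, rather than being attached to the association end.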

Another option for constraints is to put them in a note symbol that can use callout lines, as in Constraint as note. This is what I tend to prefer.

If two elements are being constrained then a dashed line can connect them and bear the curly brace enclosed constraint expression. If several paths are being constrained, the dashed line bearing the constraint expression can cross them. Again, however, I think I would prefer a note enclosed constraint expression with callout lines to the constrained elements.

There are predefined constraints: the {xor} constraint, for example. We have already encountered another predefined constraint - as part of a text string - the {subsets} constraint (Subset). The {xor} constraint alters the normal semantics of a classifier in multiple associations. Normally an instance of such a classifier could participate in any or all of several associations of its classifier. With {xor} an instance is constrained to participate in just one constrained association, as in {xor} constraint.

If you use the {xor} constraint, there are a couple of things to be wary of. Be very careful with any multiplicity minima. If you were to put them in as 1.. then the model would be in conflict. Also beware that such things are probably telling the object-oriented designer that there might be a polymorphic type needed, covering the constrained classifiers and making things simpler and more flexible. {xor} simplified and object-oriented shows one possibility.
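A sketch of the polymorphic alternative in Java, using a hypothetical example in which a Crime may have either a Criminal or a Gang as its suspect, but never both. Suspect, describe, setSuspect and suspectKind are invented for the illustration and are not taken from the book's figures.

```java
// Hypothetical polymorphic type covering the two {xor}-constrained classifiers.
interface Suspect {
    String describe();
}

class Criminal implements Suspect {
    public String describe() { return "individual"; }
}

class Gang implements Suspect {
    public String describe() { return "gang"; }
}

class Crime {
    // One 0..1 link replaces the two {xor}-constrained links: a single
    // reference cannot point at a Criminal and a Gang at the same time.
    private Suspect suspect;

    void setSuspect(Suspect suspect) {
        this.suspect = suspect;
    }

    String suspectKind() {
        return suspect == null ? "none" : suspect.describe();
    }
}
```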


Generalization , as illustrated in Generalization, is a general-purpose concept in the UML infrastructure and several kinds of model element can be in such relationships.

Between classes, the UML infrastructure suggests the more specific form of the relationship: inheritance .

Instances of the specialized end, the other end to the triangle, can be considered to be instances of the generalized end. This relationship respects what is known as the substitutability principle. (We see that this relationship wouldn't cover private inheritance in C ++ ; in fact, there doesn't seem to be a way to depict private inheritance any more. Prior to UML 2.0 there was generalization's «implementation» stereotype for private inheritance; but it isn't mentioned in 2.0, nor is it in the list of retired stereotypes.)

(Another UML 2 issue concerns multiple inheritance. C ++ allows a class to inherit implementation from more than one base class (superclass). There are potential conflicts if that happens. What if some of the base classes themselves have common base classes? How many copies of the base classes' base class' data members are inherited? C ++ resolves this with the virtual keyword. The UML has never had, as far as I know, such a keyword but we could consider adding «virtual» to be clear.

There is another conflict. What if multiple base classes have member functions (methods) with the same signature; which one would an instance of the derived class (subclass) use? C ++ has rules about this. UML 2 says that it has rules about this but I'm darned if I can find them (ptc/03-08-02).)

In this book, we have stuck to the term generalization for use with entities - for depicting the is-a-kind-of, type sharing relationship between subject matter entities. The term inheritance was used for the type-sharing, and potentially implementation-sharing, relationships between classes. Both are shown with the solid line terminating in a triangular arrowhead, as in Generalization. The same kind of relationship is shown when one interface is related to another interface; I'm not sure whether to call it inheritance or generalization. Java programmers will at least find the symbols easy to remember because in Java the same term is used between classes and between interfaces: in Java, interfaces extend interfaces and classes extend classes (but classes implement interfaces), as shown in Inheritance generalization (and implementation).
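The three Java forms can be seen together in a short sketch. Only Crime comes from the running example; ViolentCrime, AbstractCrime and Assault are hypothetical names.

```java
interface Crime {
    boolean isSolved();
}

// Interface extends interface: type sharing only.
interface ViolentCrime extends Crime {
    int victims();
}

// Class implements interface.
abstract class AbstractCrime implements Crime {
    private boolean solved;

    public boolean isSolved() { return solved; }

    void markSolved() { solved = true; }
}

// Class extends class: type sharing plus implementation sharing.
class Assault extends AbstractCrime implements ViolentCrime {
    public int victims() { return 1; }
}
```

An Assault instance can be used wherever a Crime, a ViolentCrime or an AbstractCrime is expected, which is the substitutability principle at work.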

Generalization sets

In subject matter models, and perhaps in design models for technologies other than object technology, elaborate classification schemes can be depicted with generalization sets .

Normally the use of comb-style (shared target) generalization relationships versus separate line style has no semantic meaning. Normally names are not used on generalization relationships. If names are added to generalization relationships, however, then the line styles start to make a difference.

Let me warn you to put aside any normal (Smalltalk, Java, C# or C ++ ) object-oriented perceptions at this point. If you try to relate the following to normal object-oriented thinking, you'll go nuts. I have only included this section in case you see generalization sets and do start to wonder if you're going insane. Notice that, as with most of the examples you will see in the literature, I've had to omit the keyword (the "stereotype"). Whatever is being classified in Generalization sets isn't an instance of a typical entity, type or class. If your language permits an object to be an instance of more than one class, then generalization sets become more relevant.

What Generalization sets is saying is that there are three Dish generalization sets, and that a Dish can be classified as a spicy, vegetarian starter (vegetable samosa, for example), or a bland, meaty dessert (gooseberry gelatin parfait, for example) but not as a main starter dish or a spicy bland dish. If you don't like the comb (shared target) style notation, or if your CASE tool doesn't support it, you can still use separate lines and indicate the generalization sets with dashed lines across the relevant lines, as shown in Alternative notation for generalization sets.

There's one more thing in this area. And again, what I'm about to say makes no sense in the context of any of the typical object-oriented languages. The above example is using the default of {disjoint} membership. If an object of the general classifier could belong to more than one of the specialized classifiers in a set then you would add the constraint {overlapping} near, or in place of the generalization label. Also, if the model repository has all the possible specializations, whether or not they are all actually showing in the diagram, i.e. if every instance of a particular general classifier is also an instance of at least one of its specific classifiers for the generalization set, then the {complete} constraint would be added. The default is {incomplete} .

(It is talking about "objects of the general classifier belonging to specialized classifiers" for which there is no reasonable object-oriented interpretation.)


We have already seen the implementation relationship, when we first mentioned interfaces.

When an implementation class declares that its instances will honor an interface, we say that the class implements the interface. This is depicted with a triangular arrowhead, like inheritance but, perhaps because it's simpler and less likely to get us into trouble, it has a dashed line style, as shown in Classes implementing an interface.

One of the important things about interfaces and the implementation relationship is that they give an object an easy way to carry several types without the risk of carrying several sources of implementation (the risk that makes multiple inheritance of implementation a rarely used device).
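A brief Java sketch of an object carrying several types from a single source of implementation. Investigator and Witness are hypothetical interfaces invented for the illustration.

```java
// Two hypothetical types that an Officer might be asked to honor.
interface Investigator {
    String investigate(String crime);
}

interface Witness {
    String testify();
}

// One class, one source of implementation, two types.
class Officer implements Investigator, Witness {
    private String lastCase = "nothing";

    public String investigate(String crime) {
        lastCase = crime;
        return "investigating " + crime;
    }

    public String testify() {
        return "I investigated " + lastCase;
    }
}
```

Client code that holds a Witness reference sees only testify, even though the object behind it is an Officer.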

Data types

We have already met the enumeration in Graphic depiction of an enumeration. Enumerations are fairly important. In general if it is necessary to draw attention to a data type, it can be depicted with a box with the «datatype» label.10 A datatype has values that are pure values and exhibit no identity.

Objects (of class type) exhibit value (although we tend to say exhibit state), and often exhibit identity . All application (or "business" or "entity") objects exhibit identity. There are technical (or "value" or "attribute") objects that have weak identity. By identity we mean that they have individuality; two object instances could have the exact same state (value) and yet be quite distinct individuals with their own identity.
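The distinction shows up in Java as the choice of whether to define equality by value. In this sketch Money is a hypothetical datatype with value-based equality, while Officer keeps the default identity-based equality.

```java
import java.util.Objects;

// Hypothetical «datatype»: two Money values with the same state are equal.
final class Money {
    private final long minorUnits;
    private final String currency;

    Money(long minorUnits, String currency) {
        this.minorUnits = minorUnits;
        this.currency = Objects.requireNonNull(currency);
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Money)) {
            return false;
        }
        Money that = (Money) other;
        return minorUnits == that.minorUnits && currency.equals(that.currency);
    }

    @Override
    public int hashCode() {
        return Objects.hash(minorUnits, currency);
    }
}

// Application object: equals is deliberately not overridden, so two officers
// with identical state remain distinct individuals.
class Officer {
    private final String name;

    Officer(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }
}
```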

Dependency relationships

There is a sort of "everything else" relationship in the UML - the dependency relationship - a dashed line with a stick arrowhead, as shown in Instantiate dependency. This is reasonable. Any notation has to strike a balance between enough symbols that you get the picture, yet not so many symbols that no-one can remember what any of them mean. As with some of the rectangles, keywords are used to distinguish finer nuances of the dependency relationship.


A «derive» dependency could, for example, show something that was logically redundant, but which had been implemented for efficiency reasons.

A «refine» dependency could show how a model element evolved. It could relate an analysis «entity» to the «type» or «implementation class» it became.

A «trace» dependency relates the same concept in different models (whereas the «refine» relates a concept to another concept it has evolved from).


An «instantiate» dependency relates one classifier to another whose instances it instantiates.


A «permit» dependency would be called a friend by a C ++ programmer. It shows one model classifier granting special permission to access its otherwise inaccessible features.


A «realization» dependency isn't strictly defined; however, the more specific form implements is, and is very useful. The UML says: "Realization can be used to model stepwise refinement, optimizations, transformations, templates, model synthesis, framework composition, etc."


A «substitute» dependency has similarities with implements but a substitute isn't formally a specialization. In the "STL" corner of the standard C ++ library, for example, there are lots of concepts, such as the concept of an iterator . Now whereas in Java, Iterator is implemented as an interface , and mandates what its implementing classes must provide, for efficiency reasons iterator is not implemented as anything other than documentation in C ++ . In a physical C ++ design diagram (and I'm not saying you should, because this is one of those times when text is probably absolutely fine) you could show an iterator interface, but indicate that the iterator classes were in a substitution relationship with it.


A «use» dependency is where one element makes use of another, but where there is no other structural relationship between the two elements. In UML 1.x we used to use more specific forms of «use» . I found them sufficiently useful that I will probably continue to use them (and hope they're reinstated). They are «argument» , «local» , «return» (although that never was official) and, if you must, «global» .

Sometimes when I look at design patterns, for example, I still have "Aha!" moments - those moments when the mists lift and suddenly one sees how something works in an elegant and clever way. Quite often the insight revolves around the relationships: "So this object is an «associate» of this object but it sends itself as argument with that message; so here, in fact, it is actually an «argument» use relationship that supports the pattern; which is clever; because the coupling is loosened, since we don't need a bidirectional association."

In fact, if one were allowed to put these use keywords in interaction diagrams they would be even more useful.
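The three more specific forms of «use» map quite directly onto code. In this Java sketch, Report and writeUp are hypothetical; Officer has no association with Crime, Report or StringBuilder, only transient usage.

```java
class Crime {
    private final String description;

    Crime(String description) {
        this.description = description;
    }

    String description() {
        return description;
    }
}

class Report {
    private final String text;

    Report(String text) {
        this.text = text;
    }

    String text() {
        return text;
    }
}

class Officer {
    private final String name;

    Officer(String name) {
        this.name = name;
    }

    // «argument» use of Crime: known only for the duration of the call.
    // «return» use of Report: created here and handed back, never stored.
    Report writeUp(Crime crime) {
        // «local» use of StringBuilder: lives and dies inside this method.
        StringBuilder draft = new StringBuilder();
        draft.append(name).append(": ").append(crime.description());
        return new Report(draft.toString());
    }
}
```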

Association classes

Something that was often found in entity-relationship models and semantic data models is the association class . (I usually call it the association entity because, while such things might be found in my subject matter models, they would not be found in my design models since for me, they are not indicative enough of what needs to be implemented. I have this fear I cannot shake off, that three different programmers could form three different interpretations as to what I meant, and that only one of them would be right.)

The UML says that an association class is both an association and a class. "The semantics of an association class is a combination of the semantics of an ordinary association and of a class." I have no idea what that means. So I think of an association class as an existence-dependent class. We have one kind of existence-dependent class already: the "component" end instances of an aggregation are often more clearly understood as being dependent for their very existence on forming the aggregate.

With the association class, however, we have a class whose instances aren't so much dependent upon another class' instances for their existence, as upon a relationship. An association class' instances only exist in order to help associate instances of the other two classes.

In common with several things in the UML, like bidirectional associations and n-ary relationships, association classes might have been crystal-clear when they were first proposed for semantic data models, but they are not necessarily clear for objects. I guess that this is part of the reason why I tend to use them, if I use them at all, in analysis models where we are closest to the spirit of their origins.

In trying to understand exactly what we mean, we can look at how we might expect to implement an association class. In each case, in Implementing association classes, the dashed line of Association class has "become" the 1..1 multiplicities that the associate instance is required to be in, and the original multiplicities have "moved" to become the non-existence-dependent entities' perception of the associating instances. The double 1..1 mandatory links of the associating instances represent the existence-dependence.
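One such implementation, as a hedged Java sketch with hypothetical names: Membership plays the part of the association class between Criminal and Gang. Its two final, non-null references are the double 1..1 mandatory links, and each of the other two classes sees a collection of Membership instances rather than a collection of each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class Criminal {
    private final List<Membership> memberships = new ArrayList<>();

    void add(Membership membership) { memberships.add(membership); }

    int membershipCount() { return memberships.size(); }
}

class Gang {
    private final List<Membership> memberships = new ArrayList<>();

    void add(Membership membership) { memberships.add(membership); }

    int membershipCount() { return memberships.size(); }
}

// Hypothetical associating class: exists only to link one Criminal to one Gang.
class Membership {
    private final Criminal member; // 1..1, mandatory
    private final Gang gang;       // 1..1, mandatory
    private final String rank;     // the association's own attribute

    private Membership(Criminal member, Gang gang, String rank) {
        this.member = Objects.requireNonNull(member);
        this.gang = Objects.requireNonNull(gang);
        this.rank = rank;
    }

    // The only way to create a Membership wires up both ends at once.
    static Membership join(Criminal member, Gang gang, String rank) {
        Membership membership = new Membership(member, gang, rank);
        member.add(membership);
        gang.add(membership);
        return membership;
    }

    String rank() { return rank; }
}
```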

One important thing to be aware of - something that colors a lot of the difficulties of exact interpretation of these things - is that the UML frequently talks of associations, even simple ones, as though they had a life of their own.11 For us regular object-oriented developers however, associations do not have a life of their own. Associations depict the identity-based characteristics of our objects, while attributes depict the value-based characteristics. Attributes and associations are both characteristics of the objects though; associations do not go wandering around on their own. For example, earlier it was recommended that you didn't name associations; that you name association ends (roles) instead. This was partly to discourage the perception that an association is a thing in its own right. When we use an association class, however, the association has become something in its own right; and the name of the association class is the name we might have been tempted to put "in the middle" of an association had there not been an association class.


The relationship between powertypes and types is a similar relationship to that between powersets and sets. A powerset is a set whose members are sets; and a powertype is a type whose instances are types.

If we use subtypes only as types, in inter-class and inter-type relationships, then nothing special is required. If, however, we need to show an inter-instance relationship to a subtype, an association for example, then we need a powertype.

The example in the text was that of Criminal instances who were suspected of specializing in kinds of Crime , where kinds of crime were modeled as specialization classes of a Crime generalization. Another example would be kinds of Vehicle . We might model each Vehicle kind - Car , Bus , Roadroller , etc. - as a subclass of Vehicle ; and we might want Driver instances in a licensing system to be associated with the Vehicle types they were allowed to drive, but not to associate them to particular Car or Bus instances.

A powertype is depicted, as shown in Powertype, with the usual kind of type label, next to the generalization relationship with the subtypes that are the instances of the powertype.
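In Java, class objects can stand in, roughly, for the instances of a powertype. The following sketch of the Vehicle example assumes that Driver holds a set of licensed Vehicle kinds; license and mayDrive are illustrative names.

```java
import java.util.HashSet;
import java.util.Set;

abstract class Vehicle { }

class Car extends Vehicle { }

class Bus extends Vehicle { }

class Roadroller extends Vehicle { }

class Driver {
    // An inter-instance link to Vehicle kinds, not to Vehicle instances:
    // the Class objects play the role of the powertype's instances.
    private final Set<Class<? extends Vehicle>> licensedFor = new HashSet<>();

    void license(Class<? extends Vehicle> kind) {
        licensedFor.add(kind);
    }

    // Exact-kind check only; subclasses of a licensed kind are not covered.
    boolean mayDrive(Vehicle vehicle) {
        return licensedFor.contains(vehicle.getClass());
    }
}
```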

Composite structure diagrams

Sequence diagrams (Sequence diagrams) were added early on in the UML's history. It was quickly realized that they brought an immensely important validation of structural designs. Sequence diagrams focus on instances and their message passing. Composite structure diagrams , which are new to UML 2.0, are class diagrams that focus on instances and their internal structure, providing examples of how the static architecture will achieve a requirement.

(And there are also communication diagrams (Communication diagrams) showing instances, external structure and messages, essentially an alternative portrayal of the same information as a sequence diagram has.)

Most of the symbols in a composite structure will already be known from class diagrams. The main new symbols are:

  • nested box
  • collaboration
  • connector
  • port

Structure and nesting

Unlike uses of nesting that are in other UML diagrams, nesting in a structure diagram indicates composition if the part within has a solid outline or reference ("association") if the part within has a dashed outline. It is effectively an alternative to the association and aggregation/composition relationships. A structure diagram can use a mixture of relationships and nesting. A number in the top right of a nested classifier indicates the multiplicity of its relationship with the containing classifier. Structure diagram shows an example.


This is a dashed ellipse surrounding collaborators. It is more of a comment really. It alerts the reader to the illustrative nature of the chosen collaborators.

In addition to drawing a collaboration around a composite structure, one could draw an ellipse around any selection of elements of structure diagrams that were performing an illustrative role. It is often a very good idea, for example, to accompany a sequence diagram with an illustrative structure diagram showing the types and classes of the sequence diagram's instances, and their relevant relationships. Such a diagram could sport a collaboration ellipse. Collaboration shows a collaboration ellipse.

I find that I don't use it so much when printouts are commonly required. It tends to "waste" our normally rectangular paper's available space.


Connectors encompass instance links of associations and aggregations, but they also include other, more transitory relationships, such as usage relationships. For example, the relationship between the custody sergeant and the arrest in Collaboration is a parameter usage relationship.


A port aids understanding of how a part interacts with its environment. That means it's going to be just as useful in the diagram type coming up next (component diagram).

With a port we can indicate groups of interfaces that are provided and required in particular kinds of interaction, as shown in Ports.

Component diagrams

There is little new in the way of elements that a component diagram will contain. Essentially, it contains a subset of the elements that might be found in a class diagram and a composite structure diagram. Its mission, however, is to support what has become known as component-based development , and clearly depict components . There is no universally recognized definition of a component , or of the difference between a component and an object. Here, though, is my version of the nature of a component.

One of the original hopes for objects was that they would be software ICs (integrated circuits), buyable and pluggable just like ICs. That didn't happen. It's very difficult for an object or its class to survive in the big wide world, outside of its programming language. Visual Basic controls were one of the first successful attempts to produce components that were buyable and pluggable. They weren't written in Visual Basic but part of their success was the prevalence and predictability of the infrastructure in which they would work - Visual Basic.

Looking around for a more modern and more object-oriented example, we might encounter Enterprise JavaBeans (EJB). Essentially one takes a useful Java object like a customer or a flight - an object that one wishes to make available and visible outside of its programming language - one supports it perhaps with other objects that will not themselves be visible to the outside world, one dresses the object in some survival gear (some management methods for example), one writes up (in XML) some clear contracts as to how the object (component) is to be used, one writes up (in XML again) some sample notes (deployment descriptors) as to how the object (component) could be deployed (e.g. its persistence and transactional characteristics), one zips all that up into a file with a shipping note, and voilà, one has one's component. The predictable infrastructure within which such an enterprise bean can be confident it can function is the EJB framework, which has been standardized and is available from several open source projects and from several vendors.

The UML's description of a component is quite useful: "a component represents a modular part of a system that encapsulates its contents and whose manifestation is replaceable within its environment."

We can see that the original hope - that any old object could be a software IC - was a tad naïve, but we are starting to figure out just what is necessary. Any more on component-based development would be outside the scope of this book, but if you are interested, a good way to start would be with a look at EJB.

The main extra as far as the UML is concerned is an iconic embellishment to indicate that a classifier is a component . A component is essentially a class, but to indicate its externalization and componentization a little "Gradygram" can be drawn in the corner; alternatively a keyword can be used.

A Gradygram is a box with two other little boxes sticking out; it was one of the first ways that objects were ever portrayed in diagrams. (UML 1.x had rather different things called components and they were depicted with full-size Gradygrams.) Component shows an example of a component.

Components can engage in the usual kinds of class relationships. Component boxes can have compartments as usual. They can list instance variables and methods. The compartments of a component can also list provided and required interfaces - one of the most important specifications of a component.

A component shows an example of a component using a port with two interfaces. Notice that this component has no required interfaces, functions via callbacks and is therefore likely to be more reusable.

Deployment diagrams

Deployment diagrams show how the fruits of the development - beans, class files, executable files, dynamic link libraries and so on - get hosted by devices and execution environments, and how they communicate with one another.

Three-dimensional box symbols represent nodes. These can be devices - like client machines and server machines - or execution environments - like JSP containers or EJB containers. Ordinary boxes with an «artifact» keyword or a "dog-ear" icon represent the deployed items like the beans or executables.

Deployment specifications represent things like property files or deployment descriptions - the kinds of things that are increasingly written in XML to provide the values for deploy-time variation - things like security and transactional characteristics.

There are a variety of relationships. A simple line indicates a communication pathway. A dependency arrow indicates exactly that. A «manifest» relationship is a keyworded, directed, dashed line with a stick arrowhead, and indicates how model elements like components, classes or objects are deployed within an artifact. Frequently the deployment of artifacts to nodes is shown by nesting boxes, but a directed, dashed line with a stick arrowhead bearing the «deploy» keyword can be used as an alternative.

Deployment diagram has an example deployment diagram. The Booking of the design happens to be manifested as an Enterprise JavaBean within an enterprise archive artifact. The EJB container hosting the bean runs on a particular kind of platform, accessed by other kinds of platforms.

Just about the only deployment diagram element I didn't manage to squeeze into that example is the deployment specification, but the details of the Booking's deployment as a bean would have been described in an XML file such as ejb-jar.xml. That could have appeared in the diagram as a box, as shown in Deployment specification.
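For readers who think more easily in code than in boxes, the deployment relationships described above can be modelled in a few lines of Java. This is only a sketch under my own assumptions: the classes Node and Artifact and the names "booking.ear", "server" and "client" are invented for illustration, and nothing here is part of the UML or of any EJB API.

```java
import java.util.ArrayList;
import java.util.List;

// A deployed item such as an archive file; it «manifest»s model elements.
final class Artifact {
    final String name;
    final List<String> manifests = new ArrayList<>(); // names of model elements
    Artifact(String name) { this.name = name; }
}

// A device or an execution environment; nodes can nest and host artifacts.
final class Node {
    final String name;
    final List<Node> nested = new ArrayList<>();       // nesting of boxes
    final List<Artifact> deployed = new ArrayList<>(); // «deploy»
    final List<Node> pathways = new ArrayList<>();     // communication pathways
    Node(String name) { this.name = name; }

    // A communication pathway is undirected, so record it on both ends.
    void connect(Node other) {
        pathways.add(other);
        other.pathways.add(this);
    }

    // True if the artifact is deployed here or in any nested node.
    boolean hosts(Artifact artifact) {
        if (deployed.contains(artifact)) {
            return true;
        }
        for (Node inner : nested) {
            if (inner.hosts(artifact)) {
                return true;
            }
        }
        return false;
    }
}

class DeploymentDemo {
    public static void main(String[] args) {
        Artifact archive = new Artifact("booking.ear"); // hypothetical file name
        archive.manifests.add("Booking");

        Node container = new Node("EJB container");
        container.deployed.add(archive);

        Node server = new Node("server");
        server.nested.add(container);

        Node client = new Node("client");
        client.connect(server);

        System.out.println(server.name + " hosts " + archive.name + ": " + server.hosts(archive));
    }
}
```

The point of the sketch is only that «manifest» relates model elements to an artifact, «deploy» (or nesting) relates an artifact to a node, and a plain line relates node to node.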



End notes

4. Realization is the UML infrastructure's general term for this kind of relationship; implements is its specific form when found between a class and an interface.

5. If, like me, you can't always remember which is which, most formed words use -able. So if the root is a word in its own right, like laughable, then it's probably -able; if the root isn't a word in its own right, like risible, then it's probably -ible. The number of -ible words is small and fairly fixed.

6. You might be tempted to say something along the lines, "If I follow this link from an A to a B, and then follow this link from the B to some Cs, why I've got a derived link from A to Cs." Well, who is this you? You the programmer won't be there. Typically there won't be some query engine there. This isn't the design of a relational database with a join mechanism. And anyway, if you follow a link to a normal object, you hit a brick wall because any ongoing links are in private instance variables.

7. Anyone more expert than I in UML 2.0, do let me know if any exceptions have arisen.

8. The UML does say that a tool must make it possible to determine the constrained element.

9. inv means invariant and -> is navigation; the rest, hopefully, is fairly obvious.

10. At its defining point in the standard, despite all the other examples being to the contrary, this is shown as «dataType», but as keywords ("stereotypes") are mostly all lower-case, let's keep it that way.

11. The definition of an association (ptc/03-08-02) is as "a set of tuples whose values refers [sic] to typed instances." This perception either means that the UML is trying to make it difficult to implement its models in object-oriented languages like Java, C++ and C#; or that the UML is hinting very strongly (and I don't necessarily disagree) that our object-oriented programming languages desperately need declarative relationships.