The previous article "Does OOP really separate interface from implementation?" gave an example when blindly following OOP principles may lead to rather subtle errors: a "transparent" change in an implementation suddenly breaks client code that was written according to public interfaces. The article used a familiar example of Bags and Sets. The common criticism was that the example violated the Liskov Substitution Principle (LSP), and that a Set is not a subtype of a Bag.
This article will demonstrate that a set truly is-a bag; a set is substitutable for a bag, a set can always be manipulated as a bag, a set maintains every invariant of a bag -- and it is still a set. And C++ can be a pure functional language. We will prove that all the above assertions hold at the same time.
The article will also show that if we abide by practically checkable rules we obtain a guarantee that the subtle subclassing errors cannot occur in principle. The rules do not diminish the power of a language. The Liskov Substitution Principle (LSP) cannot be practically checkable as it is generally undecidable [LSP].
To make clear how the proposed rules preclude the subclassing problem illustrated before, this article follows the format of the previous one. We aim to give a more-or-less "real" example, which you can run and see the result for yourself. By necessity the example had to be coded in some language. The present article uses C++, again, for the sake of comparison. It appears however that similar code (with similar conclusions) can be carried on in many other OO languages (e.g., Java, Python, etc). For style comparisons, one example is coded in Haskell. Makefiles in the complete code archive [Code] tell how to build projects and run the validation tests and examples.
Suppose I was given a task to implement a Bag -- an unordered collection of possibly duplicate items (integers in this example). This time my boss laid out the rules, which we will refer to as BRules:
.h
file)
The rules break the major tenets of OOP: for example, values no longer have a state that is separate from their identity. Prohibitions on virtual methods and on modifications of public objects are severe. It appears that not much of C++ is left. Surprisingly I still can implement my assignment without losing expressiveness -- and perhaps even gaining some. The exercise will also illustrate that that C++ does indeed have a pure functional subset [CPP-multiparadigm] [FCPP], and that you can program in C++ without assignments.
class FBag { public: FBag(void); FBag(const FBag& another); // Copy-constructor ~FBag(void); private: class Cell; // Opaque type const Cell * const head; FBag(const Cell * const cell); // Private constructor // Declaration of three friends elided }; |
int size(const FBag& bag); // The number of elements in the bag int count(const FBag&, const int elem); // Count the number of occurrences // Put an element into the bag FBag put(const FBag& bag, const int elem); // Remove an element from the bag. No error if the element didn't exist FBag del(const FBag& bag, const int elem); // If a bag is empty, return the seed // Otherwise, return // functor(bag.e1,functor(bag.e2, ... functor(bag.en,seed))) template<typename Functor, typename Seed> Seed fold(const FBag& bag, const Functor& functor, const Seed seed); // Standard "print-on" operator ostream& operator << (ostream& os, const FBag& bag); // Union (merge) of the two bags FBag operator + (const FBag& bag1, const FBag& bag2); // Structural equivalence of the bags // Two bags are equal if they contain the same number of the same elements inline bool operator == (const FBag& a, const FBag& b) { return a <= b && a >= b; } // Verify bag's invariants void verify(const FBag& bag); |
It must be stressed that the functions that operate on a FBag are not FBag's methods; in particular, they are not a part of the class FBag, they are not inherited and they cannot be overridden. The implementation is also written in a functional style. FBag's elements are held in a linked list of cells, which are allocated from a pre-defined pool. The pool implements a mark-and-sweep garbage collection, in C++.
Forgoing assignments does not reduce expressiveness as the following snippets from the FBag implementation code show.
// Union (merge) of the two bags struct union_f { FBag operator() (const int elem, const FBag seed) const { return put(seed,elem); } }; FBag operator + (const FBag& bag1, const FBag& bag2) { return fold(bag1,union_f(),bag2); } // Standard "print-on" operator struct print_f { ostream& os; const bool is_first; print_f(ostream& _os, const bool _is_first) : os(_os), is_first(_is_first) {} print_f operator() (const int elem, const print_f seed) const { return print_f(seed.os << (seed.is_first ? "{" : ", ") << elem, false); } }; ostream& operator << (ostream& os, const FBag& bag) { print_f seed(os,true); print_f result = fold(bag,seed,seed); return result.os << (result.is_first ? "{}" : "}"); } |
Following good practice, I wrote a validation code (file vFBag.cc [Code]) that tests all the functions and methods of the FBag package and verifies common invariants.
Suppose you are tasked with implementing a Set package. Your boss defined a set as an unordered collection where each element has a single occurrence. In fact, your boss even said that a set is a bag with no duplicates. You have found my FBag package and realized that it can be used with few additional changes. The definition of a Set as a Bag (with some constraints) made the decision to reuse the FBag code even easier.
class FSet : public FBag { public: FSet(void) {} FSet(const FBag& bag) : FBag(remove_duplicates(bag)) {} }; bool memberof(const FSet& set, const int elem) { return count(set,elem) > 0; } |
Surprisingly, that is the whole implementation of a FSet. A set is fully a bag. Because FSet constructors eventually call FBag constructors and do no alter the latter's result, every post-condition of a FSet constructor implies a post-condition of a FBag constructor. Since FBag and FSet values are immutable, the post-conditions that hold at their birth remain true through their lifespan. Because all FSet values are created by an FBag constructor, all FBag operations automatically apply to an FSet value. This concludes the proof that an FSet is a subtype of a FBag.
The FBag.cc package [Code] has a function
verify(const FBag&)
that checks to make sure its
argument is indeed a bag. The function tests FBag's invariants, for
example:
const FBag bagnew = put(put(bag,5),5); assert( count(bagnew,5) == 2 + count(bag,5) && size(bagnew) == 2 + size(bag) ); assert( count(del(bagnew,5),5) == 1 + count(bag,5) ); |
Your validation code passes a non-empty set to this function to verify
the set is indeed a bag. You can run the validation code vFSet.cc [Code] to see for
yourself that the test passes. On the other hand, FSets do behave like
Sets:
const FSet a112 = put(put(put(FSet(),1),1),2); assert( count(a112,1) == 1 ); const FSet donce = FSet() + a112; const FSet dtwice = donce + a112; assert( dtwice == a112 ); |
where a112
is a non-empty set. The validation code
vFSet.cc you wrote contains many more tests like the above. The code
shows that a FSet is able to pass all of FBag's tests as well
as its own. The implementation of FSets makes it possible to
take a union of a set and a bag; the result is always a bag, which can
be made a set if desired. There are corresponding test cases as well.
To clarify how an FSet may be an FBag at the same time, let's consider one example in more detail:
// An illustration that an FSet is an FBag int cntb(const FBag v) { FBag b1 = put(v, 5); FBag b2 = put(b1, 5); FBag b3 = del(b2, 5); return count(b3, 5); } const int cb1 = cntb(FBag()); // cb1 has the value of 1 const int cb2 = cntb(FSet()); // cb2 has the value of 1 // An illustration that an FSet does act as a set int cnts(const FSet v) { FSet s1 = put(v, 5); FSet s2 = put(s1, 5); FSet s3 = del(s2, 5); return count(s3, 5); } const int cs = cnts(FSet()); // cs has the value of 0 |
cs
has
the value different from that of cb1
if there is no custom
del()
function for FSets?
The statement
FSet s2 = put(s1, 5);
is the most illuminating. On the right-hand side is an expression:
putting an element 5 to a FBag/FSet that already has this element in
it. The result of that expression is a FBag {5,5}
, with
two instances of element 5. The statement then constructs a FSet
s2
from that bag. A FSet constructor is invoked:
class FSet : public FBag { public: FSet(void) {} FSet(const FBag& bag) : FBag(remove_duplicates(bag)) {} }; |
The constructor takes the bag {5,5}
, removes the
duplicate element 5 from it, and "blesses" the resulting FBag to be a
FSet as well. Thus s2
will be a FBag and a FSet, with one
instance of element 5. In fact, s1
and s2
are identical. A FSet constructor guarantees that a FBag it constructs
contains no duplicates. As objects are immutable, this invariant holds
forever.
Surprising as it may be, assertions "a Set is a Bag with no duplicates" and "a Set always acts as a Bag" do not contradict each other, as the following two examples illustrate:
Let ( value ... ) be an unordered collection of values: a
Bag. Let's consider the following values:
|
Let uf-integer denote a natural number whose prime factors are
unique. Let's consider the following values:
|
We introduce operations
|
We introduce operations
|
Bags are closed under operation We may wish for a merge-like operation that, being applied to Sets, always yields a Set. We can introduce a new operation: merge-if-not-there. We can define it specifically for Sets. Alternatively, the operation can be defined on Bags; it would apply to Sets by the virtue of inclusion polymorphism as every Set is a Bag. Sets are closed with respect to merge-if-not-there.
On the other hand, to achieve closure of Sets under |
Integers are closed under operation We may wish for a multiply-like operation that, being applied to uf-integers, always yields a uf-integer. We can introduce a new operation: lcm, the least common multiple. This operation is well-defined on integers; it would apply to uf-integers by the virtue of inclusion polymorphism as every uf-integer is an integer. uf-integers are closed with respect to the lcm operation.
On the other hand, to achieve closure of uf-integers under
|
It has to be stressed that the two columns of the above table are not merely similar: they are isomorphic. Indeed, the right column is derived from the left column by the following substitution of words:
It may appear troublesome that
count(put(put(FSet(),1),1),1)
yields 2, while
count(FSet(put(put(FSet(),1),1)),1)
yields 1. However, this example is not much different from the
following:
cout << (1 + 0.5) <<
endl;
will print 1.5, yet
cout << int(1+0.5) <<
endl;
will print 1. By the same token, expressions
(int)(1/3.0 + 2/3.0 + 4/3.0)
(int)(1/3.0) + (int)(2/3.0) +
(int)(4/3.0)
will yield different results, although in both cases it will be an
integer.
Often introducing intermediary variables, with an explicitly declared
type, helps disambiguate the expressions and clarify the
meaning. Generally writing expressions with FBags and FSets is not very
different from writing arithmetic expressions with integer and real
numbers. One has to watch the type of every operation.
From an extensional point of view [CARD-1985], a type denotes a set of values. By definition of a FSet, it is a particular kind of FBag. Therefore, a set of all FSets is a subset of all FBags: FSet is a subtype of FBag. A FBag or a FSet do not have any "embedded" behavior -- just as integers don't have an embedded behavior. Behavior of numbers is defined by operations, mapping from numbers to numbers. Any function that claims to accept every member of a set of values identified by a type T will also accept any value in a subset of T.
Frequently a value can participate in several sets of operations: a value
can have several types at the same time. For example, a collection
( 42 )
is both a Bag and a Set. This fact should not be
surprising. In C++, a value typically denoted by a numeral
0
can be considered to have a character type, an integer
type, a float type, a complex number type, or a pointer type, for any
declared or yet to be declared pointer type.
This lack of behavior is what puts FBag and FSet apart from CBag and CSet discussed in the previous article. FSet is indeed a subtype of FBag, while CSet is not a subtype of a CBag as CSet has a different behavior. Incidentally LSP is trivially satisfied for values that do not carry their own behavior. FBags and FSets are close to so-called predicate classes. Since instances of FSets are immutable, the predicate needs to be checked only at value construction time.
This section has talked about subtyping rather than subclassing or inheritance. Subtyping is a far more general concept: it applies to languages which have no classes (e.g., Matlab) and to types that are not related by inheritance (e.g, char and int, which are both primitive types).
The FSet/FBag example above showed BRules in the context of subtypes formed by a restriction on a base type. As it turns out, BRules work equally well with existential (abstract) types.
|
The code archive [Code] contains a third directory,
Shapes
, with three source code
files. The files illustrate various ways of handling a collection of
polymorphic values. The collection is populated by Rectangles and
Ellipses. They are instances of concrete classes implementing a common
abstract base class Shape. A Shape is an existential type that knows
how to draw, move and resize itself. A file Shapes-oop.cc gives the conventional,
OOP-like implementation, with virtual functions and such. A file Shapes-no-oop.cc is
another implementation, also in C++. The latter follows BRules, has no assignments or virtual functions. Any
particular Shape value is created by a Shape constructor and is not
altered after that. Shapes-no-oop.cc achieves polymorphic programming
with the full separation of interface and implementation: If an
implementation of a concrete Shape is changed, the code that
constructs and uses Shapes does not even have to be recompiled! The
file defines two concrete instances of the Shape: a Square and a
Rectangle. The absence of mutations and virtual functions guarantees
that any post-condition of a Square or a Rectangle constructor implies
the post-condition of a Shape. Both shapes can be passed to a function
set_dim(const Shape& shape,
const float width, const float height);
Depending on the
new dimensions, a square can become a rectangle or a rectangle
square. You can compile Shapes-no-oop.cc and run it to see that fact
for yourself.
It is instructive to compare Shapes-no-oop.cc with Shapes-h.hs, which implements the same problem in a purely functional, strongly-typed language Haskell. All three code files in the Shapes directory solve the same problem the same way. Two C++ code files -- Shapes-oop.cc and Shapes-no-oop.cc -- look rather different. On the other hand, the purely functional Shapes-no-oop.cc and the Haskell code Shapes-h.hs are uncanny similar -- in some places, frighteningly similar.
This exercise shows that BRules do not constrain the power of a language even when abstract data types are involved.
It is known, albeit not so well, that following the OOP letter and practice may lead to insidious errors [OOP-problems]. The previous article showed how subtle the errors can be even in simple cases. In theory, there are rules -- LSP -- that could prevent the errors. Alas, the rules are in general undecidable and not practically reinforceable [LSP].
In contrast, BRules introduced in this article can be statically checked at compile time. The rules outlaw certain syntactic constructions (for example, assignments in some contexts, and non-private methods) and keywords (e.g., virtual). It is quite straightforward to write a lint-like application that scans source code files and reports if they conform to the rules. When BRules are in effect, subtle subclassing errors like the ones shown in the previous article become impossible. To be more precise, with BRules, subclassing implies subtyping. Subclassing by definition is a way of creating (derived) values by extending, restricting, or otherwise specializing other, parent values. A derived value constructor must invoke a parent value constructor to produce the parent value. The former constructor often has a chance to alter the parent constructor's result before it is cast or incorporated into the derived value. If this chance is taken away, the post-condition of a derived value constructor implies the post-condition of the parent value. Disallowing any further mutations guarantees the behavioral substitutability of derived values for parent values at all times.
As the examples in this article showed, following BRules does not diminish the power of the language. We can still benefit from polymorphism, we can still develop practically relevant code. Yet BRules blur the distinction between the identity and the state, a characteristic of objects. BRules are at odds with the practice if not the very mentality of OOP. This begs the question: Is OOP indeed conducive to software development?
One can argue that OOP -- as every powerful technique -- requires
extreme care: knives are sharp. Likewise, goto
is
expressive, and assembler- or microcode-level programming are very efficient. All of them can lead to bugs that are very difficult, statically
impossible, to find. On the other hand, if you program, for
example, in Scheme, you never have to deal with an "invalid opcode"
exception. That error becomes simply impossible. Furthermore,
"while opinions concerning the benefits of OOSD [Object-Oriented
Software Development] abound in OO literature, there is little
empirical proof of its superiority" [OOP-Perceptions].
From: oleg Message-ID: <8katsh$fmf$1@nnrp1.deja.com> <8kau0t$fn3$1@nnrp1.deja.com> Subject: Subclassing errors, OOP style and mechanical rules to prevent them Date: Sun, 09 Jul 2000 22:21:18 GMT Newsgroups: comp.object,comp.lang.functional,comp.lang.c++.moderated X-Article-Creation-Date: Sun Jul 09 22:21:18 2000 GMT
oleg-at-okmij.org