From oleg Tue Dec 6 08:49:28 CST 1994 Newsgroups: comp.lang.c++ Subject: Cool way of iterating in a local context; nested functions in C++ Summary: foreach() iterators that use an action function in its local context Followup-To: Distribution: Organization: University of North Texas, Denton Keywords: nested functions, local environments, local classes, iterators Cc: Status: RO In a nutshell, this post is about one method of writing very general foreach()-type iterators that call an iterating function within its own (and local) environment, *without* casting to (void *) and fro, without causing name conflicts, and without exposing private parts. I fully realize that I'll be severely trying reader's patience by the _long_ introduction: I apologize. I still hope I make the point at the very end. BTW, using iterators is particularly neat when the action function is defined entirely *inside* another function; it's actually possible and I'll flaunt an example. Suppose we have some (intricate) object like a matrix, image, tree, container, etc. And we have a function like void traverse(const Matrix& matrix, void (*action)(int & el)); that traverses, say, the matrix somehow (and hopefully, very optimally) and calls the 'action' to do some action on each traversed element (assumed to be int). BTW, the famous qsort() function is a less glorified example of an iterator of this kind. This works great if all we need is to, say, traverse the matrix and square each element. Squaring of an element is a pretty context-free operation, I mean, we need only the element itself and nothing more. Printing each element is actually more complicated, but fortunately 'stdout' (or 'cout') is declared (present) in a very global 'context', which everybody can access/use. But suppose we need to write each element into a file/stream (which is kind of 'local', that is, created, opened and closed all within some function), or we want to find the sum of all traversed elements. Here our 'action' function explicitly needs some 'local' context, and this makes things a bit more tangled. There appear to be two common ways of dealing with the situation: make 'em global, or carry around a pointer to the local env. Here are the snippets (sorry about triviality) >>> make a local env global static int sum; void sum_action(int & el) { sum += el; } int add_up(Matrix& matrix) { sum = 0; traverse(matrix,sum_action); return sum; } Note, the accumulator 'sum' is *logically* local to add_up(): it doesn't really make sense and useless when add_up() is finished. Making the accumulator global just doesn't look right (and opens up a possibility of name conflicts, etc). In short, it's not cool. >>> Mess around with a pointer. Here we need to modify our traverse() >>> to take an additional argument, often called 'args', 'refcon', etc. void traverse(const Matrix& matrix, void (*action)(int & el), void * refcon) { int el; for(....) { action(el,refcon); } } struct sum_env { int sum; } void sum_action(int & el, void* env) { ((sum_env*)env)->sum += el; } int add_up(Matrix& matrix) { sum_env env; env.sum = 0; traverse(matrix,sum_action,(void*)&env); return env.sum; } This technique is often used in GUI programming: Windows, Dialogs, etc have a special slot for a 'refcon' or 'user data' pointer. Borland's Container::forEach() is also of the same ilk. Well, it looks a bit better, at least the env is now really local to add_up(). Though sum_env as a name for a structure is "global" and may conflict with other struct/classes names. But the biggest snag is in casting of (sum_env *) into (void *) and back. There is no type checking in here, and sum_action() has no way of making sure that (void *) pointer it got really points to something of 'sum_env' type. Recently I stumbled on a way to make things look a bit neater. No name conflicts, local stuff always stays private, no initialization (or lack of it) problems, etc. Here is the snippet of an actual *working* program: // Class that describes which action is to be // performed on OctTree nodes being traversed. // It serves as a working environment for the // 'action' class TraverseAction { public: TraverseAction(void) {} virtual ~TraverseAction(void) {} // Takes a ref of an OctTree node and uses/ // modifies it virtual void handle_node(int & node) = 0; }; void LaplOctPyrIO::traverse(const short * offsets, TraverseAction& action) { for(...) { .... int * knode_ptr; action.handle_node(knode_ptr[offsets[0]]); ..... } } void LaplOctPyrIO::save_offset_nodes(const short * offsets) { class HistogramBuildAction : public TraverseAction { int total_node_count; int zero_node_count; public: Histogram histogram; HistogramBuildAction(void) : histogram(-600,600), zero_node_count(0), total_node_count(0) {} ~HistogramBuildAction(void) {} void handle_node(int & node) { total_node_count++; if( node == 0 ) // Don't put 0 node in the histogram { // yet - wait until a non-zero node zero_node_count++; // shows up return; // This way, trailing zeros wouldn't } // be in the histogram for(; zero_node_count > 0; zero_node_count--) histogram.put(0); // Put delayed zero nodes histogram.put(node); } int q_trailing_zeros(void) const { return zero_node_count; } int q_total_nodes(void) const { return total_node_count; } }; HistogramBuildAction hist_built; traverse(offsets,hist_built); long no_nodes_written = hist_built.q_total_nodes() - hist_built.q_trailing_zeros(); ..... } This is it. As you see, there are no casts. There are even no function pointers (on the face of it), everything is done through virtual functions. The abstract class TraverseAction provides only the interface, a bare-bone "local" context (and a hidden virtual function ptr). Derived classes may add to the context whatever they wish. The class HistogramBuildAction is 'local' to the function: a quick glance at a mangled name of, say, destructor __$_Q312LaplOctPyrIO17save_offset_nodes20HistogramBuildAction tells it all: the function name is a part of it! So, some other function can declare a (different) class with exactly the same name, HistogramBuildAction, and this would be perfectly alright, no name conflicts! Finally, look at HistogramBuildAction::handle_node(): the function is described completely _within_ another function! That's the trick: inline functions in 'local' classes are actually nested functions. Nested functions are one of the most glaring distinctions between C and Pascal, the thing that makes C in a sense "much simpler" than Pascal. But C++ catches up. And all this *really* works, I mean compiles (gcc 2.5.8) and runs. BTW, since it's possible to declare nested classes (though gcc 2.5.8 isn't very keen on them), it opens up a possibility to define functions within functions within functions, etc. If you have a comment/question, please mail me at oleg@ponder.csci.unt.edu or oleg@unt.edu. I'd appreciate that!