From posting-system@google.com Sat Jan 12 03:56:08 2002
Date: Sat, 12 Jan 2002 00:56:01 -0800
From: oleg-at-pobox.com
Newsgroups: comp.lang.scheme
Subject: The FASTEST subsets function [Was: Subsets of a list]
Message-ID: <7eb8ac3e.0201120056.3fc231c8@posting.google.com>
Status: RO

This article will show the design of the fastest ever interpreted
subsets function. Sorry I'm too excited about this. Again, we start
with the mathematical definition of the problem, which leads to a
simple, correct and stunningly efficient solution. The final, fastest
ever solution is still pure functional.
	
We start with the definition of the problem: given a set 'l' and a
number 'n', return the set of all subsets of 'l' of cardinality
'n'. Sets are represented by lists.
	
The key was to choose the right definition.
	
Let ps(L) is a powerset of L that does not include {empty set} (that
is, a singular set whose element is the empty set).
	length(ps(L)) = (- (expt 2 (length L)) 1)
	
Let L be a non-empty set and let L = A U B where A and B are two
disjoint subsets of L.
	
Obviously,
	ps(L) = ps(A U B) = 
	ps(A) U ps(B) U { y U x | x <- ps(A), y <- ps(B) }
Let (subsets L n) = (filter (lambda (el) (= n (length el))) ps(L) )
	
Thus the desired function 'subsets' is a filtered powerset ps. This
seems to be a stupid definition. Let's not be hasty however. Note that
filter and union commute: the filter of a union of sets is the union
of filtered sets. Therefore, from the previous expression
	
	(subsets L n) = (subsets (union A B) n)
	= (subsets A n) U (subsets B n)
	Union{ y U x | x <- (subsets A k), y <- (subsets B (- n k)),
	       k=1,n-1 }
	
Well, this is it. Note, we didn't say how to split L into two disjoint
subsets A and B. We can do as we wish. For example, we can choose to
split in such a way so that (length B) is n. In the most difficult
case where n = (/ (length L) 2), this corresponds to a "divide and
conquer" strategy, so to speak (at least in the first stages).
	
The Scheme code below implements this idea verbatim. It uses the
accumulator-passing style that was expounded earlier.
	
(define (subsets-v5 l n)
	
  ; The initialization function. Check the boundary conditions
  (define (loop l ln n accum)
    (cond
     ((<= n 0) (cons '() accum))
     ((< ln n) accum)
     ((= ln n) (cons l accum))
     ((= n 1)
      (let fold ((l l) (accum accum))
	(if (null? l) accum
	    (fold (cdr l) (cons (cons (car l) '()) accum)))))
     (else
      (split l ln n accum))))
	
  ; split l in two parts a and b so that (length b) is n
  ; Invariant: (equal? (append a b) l)
  ; ln is the length of l
  (define (split l ln n accum)
    (let loop  ((a '()) (b l) (lna 0) (lnb ln))
      (if (= lnb n) (cont a lna b lnb n accum)
	  (loop (cons (car b) a) (cdr b) (+ 1 lna) (- lnb 1)))))
	
  ; The main body of the algorithm
  (define (cont a lna b lnb n accum)
    (let* ((accum
	    (loop a lna n accum))
	   (accum	; this is actually (loop b lnb n accum)
	    (cons b accum))
	   )
      (let inner ((k 1) (accum accum))
	(if (> k (min lna (- n 1))) ; don't loop past meaningful boundaries
	    accum
	    (let ((as (loop a lna k '()))
		  (bs (loop b lnb (- n k) '())))
	      (inner (+ 1 k)
		     ; compute the cross-product of as and bs
		     ; and append it to accum
		     (let fold ((bs bs) (accum accum))
		       (if (null? bs) accum
			   (fold (cdr bs)
				 (append 
				  (map (lambda (lst) (append lst (car bs))) as)
				  accum))))))))))
 	 
  (loop l (length l) n '()))

	
The benchmark runs in 993 ms of user time and allocates only 36.5 MB
of memory, on Gambit-C interpreter. This is the absolute, incredible
record. Under SCM:
	
subsets-v3 (called combos by John David Stone)
;Evaluation took 1596 mSec (98 in gc) 657662 cells work, 4721364 env, 97 bytes other
	
subsets-v5:
;Evaluation took 700 mSec (322 in gc) 1708112 cells work, 610264 env, 105 bytes
other                                                                           
That is, more than twice as fast.
	
Continuing the table from the previous post:
	
Procedure       Gambit-C               Bigloo 2.4b      Bigloo 2.4b
                interpreter, s         interpreter, s   compiler, s
subsets-v0        285.0                   11.59          5.62    3.14
subsets-v1	    6.3                    5.45          2.22    0.34
subsets-v3	    8.1                    4.78          0.96    0.27
subsets-v20        14.1                    5.53          0.96    0.26    
subsets-v21         7.7                    4.88          0.66    0.26
subsets-v22         5.0                    3.18          0.62    0.25
subsets-v23         4.1                    2.86          0.82    0.25
subsets-v5	    0.9                    1.56          1.10	 0.76
	
Well, compiled code isn't that fast -- but that because subsets-v5
isn't too optimal. The append and map ought to be converted into the
accumulation-passing style. That will reduce the amount of garbage as
well. But it's past midnight.
	
The conclusion of this Friday night exercise is astonishingly trite.
What a Math teacher told us is true: we have to attack the algorithm
if we want to really big improvements. And Math rules!


P.S. Additional optimization [sent in a message to David Feuer]

> > Well, compiled code isn't that fast -- but that because subsets-v5
> > isn't too optimal. The append and map ought to be converted into the
> > accumulation-passing style. That will reduce the amount of garbage as
> > well. But it's past midnight.

I was referring to the piece of code
	(append 
	   (map (lambda (lst) (append lst (car bs))) as)
	 accum)

in subsets-v5, which we will write as
	(append
	   (map fn lst1)
	 lst2)
for clarity. The problem with this code is that map creates a list
(transformed lst1), which 'append' appends to lst2 and discards. Thus
the result of map becomes garbage. Because we consider lists as models
of sets, the code is equivalent to
	(append
	   (map fn (reverse lst1))
	 lst2)
The latter can be re-written as

	(let fold ((l lst1) (accum lst2))
	   (if (null? l) accum
	       (fold (cdr l) (cons (fn (car l)) accum))))

which creates no such garbage. The more optimized code follows. It may
have a better performance when compiled; when interpreted, it loses
out to the simple-minded subsets-v5 above because the latter relies
more on built-in functions such as append.

(define (subsets-v51 l n)
	
  ; The initialization function. Check the boundary conditions
  (define (loop l ln n accum)
    (cond
     ((<= n 0) (cons '() accum))
     ((< ln n) accum)
     ((= ln n) (cons l accum))
     ((= n 1)
      (let fold ((l l) (accum accum))
	(if (null? l) accum
	    (fold (cdr l) (cons (cons (car l) '()) accum)))))
     ((= ln (+ 1 n))			; at this point, l has at least 2 el
       (let fold2 ((l l) (lp '()) (accum accum))
	 (if (null? (cdr l)) (cons lp accum)
	   (fold2 (cdr l) (cons (car l) lp) (cons (append lp (cdr l)) accum))))
       )
     (else
      (split l ln n accum))))
	
  ; split l in two parts a and b so that (length b) is n
  ; Invariant: (equal? (append a b) l)
  ; ln is the length of l
  (define (split l ln n accum)
    (let loop  ((a '()) (b l) (lnb ln))
      (if (= lnb n) (cont a (- ln lnb) b lnb n accum)
	  (loop (cons (car b) a) (cdr b) (- lnb 1)))))
	
  ; The main body of the algorithm
  (define (cont a lna b lnb n accum)
    (let* ((accum
	    (loop a lna n accum))
	   (accum	; this is actually (loop b lnb n accum)
	    (cons b accum))
	   (max-k (min lna (- n 1)))
	   )
      ;(if (or (= 1 lna) (= 1 lnb)) (printf "one ~a ~a ~n" lna lnb))
      (let inner ((k 1) (accum accum))
	(if (> k max-k) ; don't loop past meaningful boundaries
	    accum
	    (let ((as (loop a lna k '()))
		  (bs (loop b lnb (- n k) '())))
		     ; compute the cross-product of as and bs
		     ; and append it to accum
	      (let fold ((bs bs) (accum accum))
		(if (null? bs) (inner (+ 1 k) accum)
		  (let foldi ((as as) (accum accum))
		    (if (null? as) (fold (cdr bs) accum)
		      (foldi (cdr as) 
			(cons (append (car as) (car bs)) accum)))))))))
      ))
 	 
  (loop l (length l) n '()))


(define (set-equal? pred? l1 l2)
  (if (null? l1) (null? l2)
   (let loop ((l2-to-see l2) (l2-seen '()))
     (and (pair? l2-to-see)
	  (if (pred? (car l1) (car l2-to-see))
	      (set-equal? pred? (cdr l1) (append (cdr l2-to-see) l2-seen))
	      (loop (cdr l2-to-see) (cons (car l2-to-see) l2-seen)))))))

; Test two powersets for equality
(define (pset-equal? ps1 ps2)
  (set-equal? (lambda (s1 s2) (set-equal? equal? s1 s2)) ps1 ps2))

(define (v5-51)
  (let ((l '(1 2 3 4 5 6 7 8 9 10)))
    (let ((n5 (subsets-v5 l 5))
	  (n51 (subsets-v51 l 5)))
      (pset-equal? n5 n51))))

(define test-l '(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20))

(begin (time (subsets-v23 test-l 10)) #f)
(begin (time (subsets-v5 test-l 10)) #f)
(begin (time (subsets-v51 test-l 10)) #f)