From posting-system@google.com Sat Jan 12 03:56:08 2002 Date: Sat, 12 Jan 2002 00:56:01 -0800 From: oleg-at-pobox.com Newsgroups: comp.lang.scheme Subject: The FASTEST subsets function [Was: Subsets of a list] Message-ID: <7eb8ac3e.0201120056.3fc231c8@posting.google.com> Status: RO This article will show the design of the fastest ever interpreted subsets function. Sorry I'm too excited about this. Again, we start with the mathematical definition of the problem, which leads to a simple, correct and stunningly efficient solution. The final, fastest ever solution is still pure functional. We start with the definition of the problem: given a set 'l' and a number 'n', return the set of all subsets of 'l' of cardinality 'n'. Sets are represented by lists. The key was to choose the right definition. Let ps(L) is a powerset of L that does not include {empty set} (that is, a singular set whose element is the empty set). length(ps(L)) = (- (expt 2 (length L)) 1) Let L be a non-empty set and let L = A U B where A and B are two disjoint subsets of L. Obviously, ps(L) = ps(A U B) = ps(A) U ps(B) U { y U x | x <- ps(A), y <- ps(B) } Let (subsets L n) = (filter (lambda (el) (= n (length el))) ps(L) ) Thus the desired function 'subsets' is a filtered powerset ps. This seems to be a stupid definition. Let's not be hasty however. Note that filter and union commute: the filter of a union of sets is the union of filtered sets. Therefore, from the previous expression (subsets L n) = (subsets (union A B) n) = (subsets A n) U (subsets B n) Union{ y U x | x <- (subsets A k), y <- (subsets B (- n k)), k=1,n-1 } Well, this is it. Note, we didn't say how to split L into two disjoint subsets A and B. We can do as we wish. For example, we can choose to split in such a way so that (length B) is n. In the most difficult case where n = (/ (length L) 2), this corresponds to a "divide and conquer" strategy, so to speak (at least in the first stages). The Scheme code below implements this idea verbatim. It uses the accumulator-passing style that was expounded earlier. (define (subsets-v5 l n) ; The initialization function. Check the boundary conditions (define (loop l ln n accum) (cond ((<= n 0) (cons '() accum)) ((< ln n) accum) ((= ln n) (cons l accum)) ((= n 1) (let fold ((l l) (accum accum)) (if (null? l) accum (fold (cdr l) (cons (cons (car l) '()) accum))))) (else (split l ln n accum)))) ; split l in two parts a and b so that (length b) is n ; Invariant: (equal? (append a b) l) ; ln is the length of l (define (split l ln n accum) (let loop ((a '()) (b l) (lna 0) (lnb ln)) (if (= lnb n) (cont a lna b lnb n accum) (loop (cons (car b) a) (cdr b) (+ 1 lna) (- lnb 1))))) ; The main body of the algorithm (define (cont a lna b lnb n accum) (let* ((accum (loop a lna n accum)) (accum ; this is actually (loop b lnb n accum) (cons b accum)) ) (let inner ((k 1) (accum accum)) (if (> k (min lna (- n 1))) ; don't loop past meaningful boundaries accum (let ((as (loop a lna k '())) (bs (loop b lnb (- n k) '()))) (inner (+ 1 k) ; compute the cross-product of as and bs ; and append it to accum (let fold ((bs bs) (accum accum)) (if (null? bs) accum (fold (cdr bs) (append (map (lambda (lst) (append lst (car bs))) as) accum)))))))))) (loop l (length l) n '())) The benchmark runs in 993 ms of user time and allocates only 36.5 MB of memory, on Gambit-C interpreter. This is the absolute, incredible record. Under SCM: subsets-v3 (called combos by John David Stone) ;Evaluation took 1596 mSec (98 in gc) 657662 cells work, 4721364 env, 97 bytes other subsets-v5: ;Evaluation took 700 mSec (322 in gc) 1708112 cells work, 610264 env, 105 bytes other That is, more than twice as fast. Continuing the table from the previous post: Procedure Gambit-C Bigloo 2.4b Bigloo 2.4b interpreter, s interpreter, s compiler, s subsets-v0 285.0 11.59 5.62 3.14 subsets-v1 6.3 5.45 2.22 0.34 subsets-v3 8.1 4.78 0.96 0.27 subsets-v20 14.1 5.53 0.96 0.26 subsets-v21 7.7 4.88 0.66 0.26 subsets-v22 5.0 3.18 0.62 0.25 subsets-v23 4.1 2.86 0.82 0.25 subsets-v5 0.9 1.56 1.10 0.76 Well, compiled code isn't that fast -- but that because subsets-v5 isn't too optimal. The append and map ought to be converted into the accumulation-passing style. That will reduce the amount of garbage as well. But it's past midnight. The conclusion of this Friday night exercise is astonishingly trite. What a Math teacher told us is true: we have to attack the algorithm if we want to really big improvements. And Math rules! P.S. Additional optimization [sent in a message to David Feuer] > > Well, compiled code isn't that fast -- but that because subsets-v5 > > isn't too optimal. The append and map ought to be converted into the > > accumulation-passing style. That will reduce the amount of garbage as > > well. But it's past midnight. I was referring to the piece of code (append (map (lambda (lst) (append lst (car bs))) as) accum) in subsets-v5, which we will write as (append (map fn lst1) lst2) for clarity. The problem with this code is that map creates a list (transformed lst1), which 'append' appends to lst2 and discards. Thus the result of map becomes garbage. Because we consider lists as models of sets, the code is equivalent to (append (map fn (reverse lst1)) lst2) The latter can be re-written as (let fold ((l lst1) (accum lst2)) (if (null? l) accum (fold (cdr l) (cons (fn (car l)) accum)))) which creates no such garbage. The more optimized code follows. It may have a better performance when compiled; when interpreted, it loses out to the simple-minded subsets-v5 above because the latter relies more on built-in functions such as append. (define (subsets-v51 l n) ; The initialization function. Check the boundary conditions (define (loop l ln n accum) (cond ((<= n 0) (cons '() accum)) ((< ln n) accum) ((= ln n) (cons l accum)) ((= n 1) (let fold ((l l) (accum accum)) (if (null? l) accum (fold (cdr l) (cons (cons (car l) '()) accum))))) ((= ln (+ 1 n)) ; at this point, l has at least 2 el (let fold2 ((l l) (lp '()) (accum accum)) (if (null? (cdr l)) (cons lp accum) (fold2 (cdr l) (cons (car l) lp) (cons (append lp (cdr l)) accum)))) ) (else (split l ln n accum)))) ; split l in two parts a and b so that (length b) is n ; Invariant: (equal? (append a b) l) ; ln is the length of l (define (split l ln n accum) (let loop ((a '()) (b l) (lnb ln)) (if (= lnb n) (cont a (- ln lnb) b lnb n accum) (loop (cons (car b) a) (cdr b) (- lnb 1))))) ; The main body of the algorithm (define (cont a lna b lnb n accum) (let* ((accum (loop a lna n accum)) (accum ; this is actually (loop b lnb n accum) (cons b accum)) (max-k (min lna (- n 1))) ) ;(if (or (= 1 lna) (= 1 lnb)) (printf "one ~a ~a ~n" lna lnb)) (let inner ((k 1) (accum accum)) (if (> k max-k) ; don't loop past meaningful boundaries accum (let ((as (loop a lna k '())) (bs (loop b lnb (- n k) '()))) ; compute the cross-product of as and bs ; and append it to accum (let fold ((bs bs) (accum accum)) (if (null? bs) (inner (+ 1 k) accum) (let foldi ((as as) (accum accum)) (if (null? as) (fold (cdr bs) accum) (foldi (cdr as) (cons (append (car as) (car bs)) accum))))))))) )) (loop l (length l) n '())) (define (set-equal? pred? l1 l2) (if (null? l1) (null? l2) (let loop ((l2-to-see l2) (l2-seen '())) (and (pair? l2-to-see) (if (pred? (car l1) (car l2-to-see)) (set-equal? pred? (cdr l1) (append (cdr l2-to-see) l2-seen)) (loop (cdr l2-to-see) (cons (car l2-to-see) l2-seen))))))) ; Test two powersets for equality (define (pset-equal? ps1 ps2) (set-equal? (lambda (s1 s2) (set-equal? equal? s1 s2)) ps1 ps2)) (define (v5-51) (let ((l '(1 2 3 4 5 6 7 8 9 10))) (let ((n5 (subsets-v5 l 5)) (n51 (subsets-v51 l 5))) (pset-equal? n5 n51)))) (define test-l '(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20)) (begin (time (subsets-v23 test-l 10)) #f) (begin (time (subsets-v5 test-l 10)) #f) (begin (time (subsets-v51 test-l 10)) #f)