MIT Scheme provides several mechanisms for associating objects with one another. Each of these mechanisms creates a link between one or more objects, called keys, and some other object, called a datum. Beyond this common idea, however, each mechanism has properties that make it appropriate in different situations:
An association list, or alist, is a data structure used very frequently in Scheme. An alist is a list of pairs, each of which is called an association. The car of an association is called the key.
An advantage of the alist representation is that an alist can be
incrementally augmented simply by adding new entries to the front.
Moreover, because the searching procedures assv
et al. search the
alist in order, new entries can "shadow" old entries. If an alist is
viewed as a mapping from keys to data, then the mapping can be not only
augmented but also altered in a non-destructive manner by adding new
entries to the front of the alist.(18)
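For example, here is a small illustration of shadowing (the bindings are arbitrary):

(define e '((a . 1) (b . 2)))
(cdr (assv 'a e))                   =>  1
(cdr (assv 'a (cons '(a . 5) e)))   =>  5   ; the new entry shadows the old one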
The predicate alist? returns #t if object is an association list (including the empty list); otherwise it returns #f. Any object satisfying this predicate also satisfies list?.
The procedures assq, assv, and assoc each search alist for the first association whose car matches object, and return that association; if there is no such association, #f (n.b.: not the empty list) is returned. assq uses eq? to compare object with the car fields of the pairs in alist, while assv uses eqv? and assoc uses equal?.(19)
(define e '((a 1) (b 2) (c 3)))
(assq 'a e)                             =>  (a 1)
(assq 'b e)                             =>  (b 2)
(assq 'd e)                             =>  #f
(assq (list 'a) '(((a)) ((b)) ((c))))   =>  #f
(assoc (list 'a) '(((a)) ((b)) ((c))))  =>  ((a))
(assq 5 '((2 3) (5 7) (11 13)))         =>  unspecified
(assv 5 '((2 3) (5 7) (11 13)))         =>  (5 7)
association-procedure returns a searching procedure similar to assv, except that selector (a procedure of one argument) is used to select the key from the association, and predicate (an equivalence predicate) is used to compare the key to the given item. This can be used to make association lists whose elements are, say, vectors instead of pairs (also see section Searching Lists).
For example, here is how assv could be implemented:
(define assv (association-procedure eqv? car))
Another example is a "reverse association" procedure:
(define rassv (association-procedure eqv? cdr))
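As another illustration, here is a hypothetical sketch (the name vassq is not part of the library) of a searching procedure for association lists whose elements are vectors keyed on element 0:

(define vassq
  (association-procedure eq? (lambda (v) (vector-ref v 0))))

(vassq 'b (list (vector 'a 1) (vector 'b 2)))  =>  #(b 2)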
The procedures del-assq, del-assv, and del-assoc return a newly allocated copy of alist from which all associations with key object have been removed. del-assq uses eq? to compare object with the keys, while del-assv uses eqv? and del-assoc uses equal?.
(define a
  '((butcher . "231 e22nd St.")
    (baker . "515 w23rd St.")
    (hardware . "988 Lexington Ave.")))

(del-assq 'baker a)
  =>  ((butcher . "231 e22nd St.")
       (hardware . "988 Lexington Ave."))
The procedures del-assq!, del-assv!, and del-assoc! are like del-assq, del-assv, and del-assoc, respectively, except that they destructively modify alist. del-assq! uses eq? to compare object with the keys, while del-assv! uses eqv? and del-assoc! uses equal?.
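For example, a small sketch; as with other destructive list operations, the caller should use the returned list rather than relying on the original variable:

(define a (list (cons 'x 1) (cons 'y 2)))
(set! a (del-assq! 'x a))
a  =>  ((y . 2))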
delete-association-procedure returns a deletion procedure similar to del-assv or del-assq!. The predicate and selector arguments are the same as those for association-procedure, while the deletor argument should be either the procedure list-deletor (for non-destructive deletions) or the procedure list-deletor! (for destructive deletions).
For example, here is a possible implementation of del-assv:
(define del-assv (delete-association-procedure list-deletor eqv? car))
alist-copy returns a newly allocated copy of alist. It is like list-copy except that the "association" pairs, i.e. the elements of the list alist, are also copied. alist-copy could have been implemented like this:
(define (alist-copy alist)
  (if (null? alist)
      '()
      (cons (cons (car (car alist)) (cdr (car alist)))
            (alist-copy (cdr alist)))))
1D tables ("one-dimensional" tables) are similar to association
lists. In a 1D table, unlike an association list, the keys of the table
are held weakly: if a key is garbage-collected, its associated
value in the table is removed. 1D tables compare their keys for
equality using eq?.
1D tables can often be used as a higher-performance alternative to the two-dimensional association table (see section The Association Table). If one of the keys being associated is a compound object such as a vector, a 1D table can be stored in one of the vector's slots. Under these circumstances, accessing items in a 1D table will be comparable in performance to using a property list in a conventional Lisp.
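A minimal usage sketch, assuming the constructor make-1d-table and the operations 1d-table/put! (table, key, datum) and 1d-table/get (table, key, default):

(define colors (make-1d-table))
(1d-table/put! colors 'sky 'blue)
(1d-table/get colors 'sky #f)    =>  blue
(1d-table/get colors 'grass #f)  =>  #f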
The predicate 1d-table? returns #t if object is a 1D table, otherwise returns #f. Any object that satisfies this predicate also satisfies list?.
1d-table/lookup.
MIT Scheme provides a generalization of the property-list mechanism found in most other implementations of Lisp: a global two-dimensional association table. This table is indexed by two keys, called x-key and y-key in the following procedure descriptions. These keys and the datum associated with them can be arbitrary objects. eq? is used to discriminate keys.
Think of the association table as a matrix: a single datum can be accessed using both keys, a column using x-key only, and a row using y-key only.
2d-get returns the datum associated with x-key and y-key, or #f if no such association exists.
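For example:

(2d-put! 'foo 'bar 5)
(2d-get 'foo 'bar)   =>  5
(2d-get 'foo 'quux)  =>  #f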
2d-get-alist-x returns an alist of (y-key . datum) pairs, one for each association involving x-key. Returns the empty list if no entries for x-key exist.
(2d-put! 'foo 'bar 5)
(2d-put! 'foo 'baz 6)
(2d-get-alist-x 'foo)  =>  ((baz . 6) (bar . 5))
2d-get-alist-y returns an alist of (x-key . datum) pairs, one for each association involving y-key. Returns the empty list if no entries for y-key exist.
(2d-put! 'bar 'foo 5)
(2d-put! 'baz 'foo 6)
(2d-get-alist-y 'foo)  =>  ((baz . 6) (bar . 5))
Hash tables are a fast, powerful mechanism for storing large numbers of associations. MIT Scheme's hash tables feature automatic resizing, customizable growth parameters, and customizable hash procedures.
The average times for the insertion, deletion, and lookup operations on a hash table are bounded by a constant. The space required by the table is proportional to the number of associations in the table; the constant of proportionality is described below (see section Resizing of Hash Tables).
The hash-table implementation is a run-time-loadable option. To use hash tables, execute
(load-option 'hash-table)
once before calling any of the procedures defined here.
The next few procedures are hash-table constructors. All hash table constructors are procedures that accept one optional argument, initial-size, and return a newly allocated hash table. If initial-size is given, it must be an exact non-negative integer or #f. The meaning of initial-size is discussed below (see section Resizing of Hash Tables).
Hash tables are normally characterized by two things: the equivalence predicate that is used to compare keys, and whether or not the table allows its keys to be reclaimed by the garbage collector. If a table prevents its keys from being reclaimed by the garbage collector, it is said to hold its keys strongly; otherwise it holds its keys weakly (see section Weak Pairs).
make-eq-hash-table creates hash tables that compare their keys using eq?. The keys are held weakly. These are the fastest of the standard hash tables. For compatibility with old code, make-symbol-hash-table is a synonym for this procedure.
make-eqv-hash-table creates hash tables that compare their keys using eqv?. The keys are held weakly, except that booleans, characters, and numbers are held strongly. These hash tables are a little slower than those made by make-eq-hash-table. For compatibility with old code, make-object-hash-table is a synonym for this procedure.
make-equal-hash-table creates hash tables that compare their keys using equal?. The keys are held strongly. These hash tables are quite a bit slower than those made by make-eq-hash-table.
make-string-hash-table creates hash tables that compare their keys using string=?. The keys are held strongly.
The next two procedures are used to create new hash-table constructors. All of the above hash table constructors, with the exception of make-eqv-hash-table, could have been created by calls to these "constructor-constructors"; see the examples below.
The optional argument rehash-after-gc?, if true, says that the values returned by key-hash might change after a garbage collection. If so, the hash-table implementation arranges for the table to be rehashed when necessary. (See section Address Hashing for information about hash procedures that have this property.) Otherwise, it is assumed that key-hash always returns the same value for the same arguments. The default value of this argument is #f.
The constructors returned by strong-hash-table/constructor make hash tables that hold their keys strongly. The constructors returned by weak-hash-table/constructor make hash tables that hold their keys weakly.
Some examples showing how some standard hash-table constructors could have been defined:
(define make-eq-hash-table
  (weak-hash-table/constructor eq-hash-mod eq? #t))

(define make-equal-hash-table
  (strong-hash-table/constructor equal-hash-mod equal? #t))

(define make-string-hash-table
  (strong-hash-table/constructor string-hash-mod string=? #f))
The procedure hash-table/clean! is sometimes useful in conjunction with weak hash tables. Normally it is not needed, because such hash tables clean themselves automatically as they are used.
The procedures described in this section are the basic operations on hash tables. They provide the functionality most often needed by programmers. Subsequent sections describe other operations that provide additional functionality needed by some applications.
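A brief usage sketch, assuming the standard basic operations hash-table/put! (table, key, datum), hash-table/get (table, key, default), and hash-table/remove! (table, key):

(load-option 'hash-table)
(define ht (make-equal-hash-table))
(hash-table/put! ht "foo" 1)
(hash-table/get ht "foo" 'not-found)   =>  1
(hash-table/remove! ht "foo")
(hash-table/get ht "foo" 'not-found)   =>  not-found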
The predicate hash-table? returns #t if object is a hash table, otherwise returns #f.
hash-table->alist returns the contents of hash-table as a newly allocated alist, each element of which is a pair (key . datum) where key is one of the keys of hash-table, and datum is its associated datum. The average and worst-case times required by this operation are linear in the number of associations in the table.
While iterating over a hash table's associations, it is permitted to call hash-table/remove! to remove the association being processed.
The following procedure is an alternate form of hash-table/get that is useful in some situations. Usually, hash-table/get is preferable because it is faster.
hash-table/lookup accepts a key and two procedures, if-found and if-not-found. If hash-table contains an association for the key, if-found is invoked on the datum of that association; otherwise, if-not-found is invoked with no arguments. In either case, the result of the invoked procedure is returned as the result of hash-table/lookup (hash-table/lookup reduces into the invoked procedure, i.e. calls it tail-recursively). The average time required by this operation is bounded by a constant.
Normally, hash tables automatically resize themselves according to need. Because of this, the programmer need not be concerned with management of the table's size. However, some limited control over the table's size is provided, which will be discussed below. This discussion involves two concepts, usable size and physical size, which we will now define.
The usable size of a hash table is the number of associations that the table can hold at a given time. If the number of associations in the table exceeds the usable size, the table will automatically grow, increasing the usable size to a new value that is sufficient to hold the associations.
The physical size is an abstract measure of a hash table that specifies how much space is allocated to hold the associations of the table. The physical size is always greater than or equal to the usable size. The physical size is not interesting in itself; it is interesting only for its effect on the performance of the hash table. While the average performance of a hash-table lookup is bounded by a constant, the worst-case performance is not. For a table containing a given number of associations, increasing the physical size of the table decreases the probability that worse-than-average performance will occur.
The physical size of a hash table is statistically related to the number of associations. However, it is possible to place bounds on the physical size, and from this to estimate the amount of space used by the table:
(define (hash-table-space-bounds count rehash-size rehash-threshold)
  (let ((tf (/ 1 rehash-threshold)))
    (values (if (exact-integer? rehash-size)
                (- (* count (+ 4 tf))
                   (* tf (+ rehash-size rehash-size)))
                (* count (+ 4 (/ tf (* rehash-size rehash-size)))))
            (* count (+ 4 tf)))))
What this formula shows is that, for a "normal" rehash size (that is, not an exact integer), the amount of space used by the hash table is proportional to the number of associations in the table. The constant of proportionality varies statistically, with the low bound being
(+ 4 (/ (/ 1 rehash-threshold) (* rehash-size rehash-size)))
and the high bound being
(+ 4 (/ 1 rehash-threshold))
which, for the default values of these parameters, are 4.25 and 5, respectively. Reducing the rehash size will tighten these bounds, but increases the amount of time spent resizing, so you can see that the rehash size gives some control over the time-space tradeoff of the table.
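To check the arithmetic, here are the two bounds evaluated at the default parameters (rehash size 2.0 and rehash threshold 1):

(+ 4 (/ (/ 1 1) (* 2. 2.)))  =>  4.25   ; low bound, words per association
(+ 4 (/ 1 1))                =>  5      ; high bound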
The programmer can control the size of a hash table by means of three parameters:
If the programmer knows that the table will initially contain a specific number of items, initial-size can be given when the table is created. If initial-size is an exact non-negative integer, it specifies the initial usable size of the hash table; the table will not change size until the number of items in the table exceeds initial-size, after which automatic resizing is enabled and initial-size no longer has any effect. Otherwise, if initial-size is not given or is #f, the table is initialized to an unspecified size and automatic resizing is immediately enabled.
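For example, a program that expects to insert about a thousand items right away might create its table like this:

(define records (make-equal-hash-table 1000))  ; usable size of 1000 before any resizing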
The rehash size specifies how much to increase the usable size of the hash table when it becomes full. It is either an exact positive integer, or a real number greater than one. If it is an integer, the new size is the sum of the old size and the rehash size. Otherwise, it is a real number, and the new size is the product of the old size and the rehash size. Increasing the rehash size decreases the average cost of an insertion, but increases the average amount of space used by the table. The rehash size of a table may be altered dynamically by the application in order to optimize the resizing of the table; for example, if the table will grow quickly for a known period and afterwards will not change size, performance might be improved by using a large rehash size during the growth phase and a small one during the static phase. The default rehash size of a newly constructed hash table is 2.0.
Note well: The use of an exact positive integer for a rehash size is almost always undesirable; this option is provided solely for compatibility with the Common Lisp hash-table mechanism. The reason for this has to do with the time penalty for resizing the hash table. The time needed to resize a hash table is proportional to the number of associations in the table. This resizing cost is amortized across the insertions required to fill the table to the point where it needs to grow again. If the table grows by an amount proportional to the number of associations, then the cost of resizing and the increase in size are both proportional to the number of associations, so the amortized cost of an insertion operation is still bounded by a constant. However, if the table grows by a constant amount, this is not true: the amortized cost of an insertion is not bounded by a constant. Thus, using a constant rehash size means that the average cost of an insertion increases proportionally to the number of associations in the hash table.
The rehash threshold is a real number, between zero exclusive and one inclusive, that specifies the ratio between a hash table's usable size and its physical size. Decreasing the rehash threshold decreases the probability of worse-than-average insertion, deletion, and lookup times, but increases the physical size of the table for a given usable size. The default rehash threshold of a newly constructed hash table is 1.
The procedures described in this section may be used to make very efficient key-hashing procedures for arbitrary objects. All of these procedures are based on address hashing, which uses the address of an object as its hash number. The great advantage of address hashing is that converting an arbitrary object to a hash number is extremely fast and takes the same amount of time for any object.
The disadvantage of address hashing is that the garbage collector changes the addresses of most objects. The hash-table implementation compensates for this disadvantage by automatically rehashing tables that use address hashing when garbage collections occur. Thus, in order to use these procedures for key hashing, it is necessary to tell the hash-table implementation (by means of the rehash-after-gc? argument to the "constructor-constructor" procedure) that the hash numbers computed by your key-hashing procedure must be recomputed after a garbage collection.
eq-hash returns a hash number for an arbitrary object. Objects that are eq? to one another map to the same hash number, provided that the garbage collector does not run during or between the two calls to eq-hash.
The following procedures are the key-hashing procedures used by the standard address-hash-based hash tables.
eq-hash-mod is the key-hashing procedure used by make-eq-hash-table. eqv-hash-mod is the key-hashing procedure used by make-eqv-hash-table. equal-hash-mod is the key-hashing procedure used by make-equal-hash-table.
The procedures in this section allow the programmer to control some of the internal structure of a hash table. Normally, hash tables maintain associations between keys and datums using pairs or weak pairs. These procedures allow the programmer to specify the use of some other data structure to maintain the association. In this section, the data structure that represents an association in a hash table is called an entry.
The arguments to hash-table/constructor define the characteristics of the hash table as follows. In particular, the entry-valid? argument must be a procedure that accepts an entry and returns #f iff the entry's key has been reclaimed by the garbage collector; instead of a procedure, this may be #t, which is equivalent to (lambda (entry) #t). The optional rehash-after-gc? argument defaults to #f.
For example, here is how the constructors for ordinary hash tables could be defined:
(define (strong-hash-table/constructor key-hash key=?
                                       #!optional rehash-after-gc?)
  (hash-table/constructor key-hash key=?
                          cons #t car cdr set-cdr!
                          (if (default-object? rehash-after-gc?)
                              #f
                              rehash-after-gc?)))

(define (weak-hash-table/constructor key-hash key=?
                                     #!optional rehash-after-gc?)
  (hash-table/constructor key-hash key=?
                          weak-cons weak-pair/car? weak-car weak-cdr
                          weak-set-cdr!
                          (if (default-object? rehash-after-gc?)
                              #f
                              rehash-after-gc?)))
These accessor procedures correspond to the arguments of hash-table/constructor. When called, each procedure returns the value of the corresponding argument that was used to construct hash-table.
The following procedures return the contents of a hash table as a collection of entries. While the data structure holding the entries is newly allocated, the entries themselves are not copied. Since hash table operations can modify these entries, the entries should be copied if it is desired to keep them while continuing to modify the table.
hash-table/entries-vector could have been implemented like this:

(list->vector (hash-table/entries-list hash-table))
The MIT Scheme object-hashing facility provides a mechanism for generating a unique hash number for an arbitrary object. This hash number, unlike an object's address, is unchanged by garbage collection. The object-hashing facility is useful in conjunction with hash tables, but it may be used for other things as well. In particular, it is used in the generation of the written representation for some objects (see section Custom Output).
All of these procedures accept an optional argument called table; this table contains the object-integer associations. If given, this argument must be an object-hash table as constructed by hash-table/make (see below). If not given, a default table is used.
hash associates an exact non-negative integer with object and returns that integer. If hash was previously called with object as its argument, the integer returned is the same as was returned by the previous call. hash guarantees that distinct objects (in the sense of eq?) are associated with distinct integers.
unhash takes an exact non-negative integer k and returns the object associated with that integer. If there is no object associated with k, or if the object previously associated with k has been reclaimed by the garbage collector, an error of type condition-type:bad-range-argument is signalled. In other words, if hash previously returned k for some object, and that object has not been reclaimed, then that object is the value of the call to unhash.
An object that is passed to hash as an argument is not protected from being reclaimed by the garbage collector. If all other references to that object are eliminated, the object will be reclaimed. Subsequently calling unhash with the hash number of the (now reclaimed) object will signal an error.
(define x (cons 0 0))       =>  unspecified
(hash x)                    =>  77
(eqv? (hash x) (hash x))    =>  #t
(define x 0)                =>  unspecified
(gc-flip)                   ;force a garbage collection
(unhash 77)                 error-->
The following two procedures provide a lower-level interface to the object-hashing mechanism.
object-hash is like hash, except that it accepts an additional optional argument, insert?. If insert? is supplied and is #f, object-hash will return an integer for object only if there is already an association in the table; otherwise, it will return #f. If insert? is not supplied, or is not #f, object-hash always returns an integer, creating an association in the table if necessary.
object-hash additionally treats #f differently than does hash. Calling object-hash with #f as its argument will return an integer that, when passed to unhash, will signal an error rather than returning #f. Likewise, valid-hash-number? will return #f for this integer.
object-unhash is like unhash, except that when k is not associated with any object or was previously associated with an object that has been reclaimed, object-unhash returns #f. This means that there is an ambiguity in the value returned by object-unhash: if #f is returned, there is no way to tell if k is associated with #f or is not associated with any object at all.
Finally, the procedure hash-table/make makes new object-hash tables.
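A short sketch of keeping associations in a separate table (assuming hash-table/make takes no required arguments):

(define my-table (hash-table/make))
(define x (list 'a 'b))
(define k (hash x my-table))
(unhash k my-table)  =>  (a b)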
Balanced binary trees are a useful data structure for maintaining large sets of associations whose keys are ordered. While most applications involving large association sets should use hash tables, some applications can benefit from the use of binary trees, which have advantages over hash tables.
MIT Scheme provides an implementation of red-black trees. The red-black tree-balancing algorithm provides generally good performance because it doesn't try to keep the tree very closely balanced. At any given node in the tree, one side of the node can be twice as high as the other in the worst case. With typical data the tree will remain fairly well balanced anyway.
A red-black tree takes space that is proportional to the number of associations in the tree. For the current implementation, the constant of proportionality is eight words per association.
Red-black trees hold their keys strongly. In other words, if a red-black tree contains an association for a given key, that key cannot be reclaimed by the garbage collector.
The red-black tree implementation is a run-time-loadable option. To use red-black trees, execute
(load-option 'rb-tree)
once before calling any of the procedures defined here.
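A minimal usage sketch, assuming the constructor make-rb-tree (which takes the key=? and key<? predicates) and the operations rb-tree/insert! and rb-tree/lookup (tree, key, default):

(load-option 'rb-tree)
(define t (make-rb-tree = <))
(rb-tree/insert! t 2 'two)
(rb-tree/insert! t 1 'one)
(rb-tree/lookup t 2 #f)  =>  two
(rb-tree->alist t)       =>  ((1 . one) (2 . two))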
The predicate rb-tree? returns #t if object is a red-black tree, otherwise returns #f.
rb-tree->alist returns the contents of rb-tree as a newly allocated alist, each element of which is a pair (key . datum) where key is one of the keys of rb-tree, and datum is its associated datum. The alist is sorted by key according to the key<? argument used to construct rb-tree. The time required by this operation is proportional to the number of associations in the tree.
The procedure rb-tree/datum-list is equivalent to:
(lambda (rb-tree)
  (map cdr (rb-tree->alist rb-tree)))
rb-tree/equal? compares two red-black trees, returning #t iff they are equal and #f otherwise. The trees must have been constructed with the same equality and order predicates (same in the sense of eq?). The keys of the trees are compared using the key=? predicate used to build the trees, while the datums of the trees are compared using the equivalence predicate datum=?. The worst-case time required by this operation is proportional to the number of associations in the tree.
rb-tree/empty? returns #t iff rb-tree contains no associations; otherwise it returns #f.
The value returned by rb-tree/height satisfies the following:
(lambda (rb-tree)
  (let ((size (rb-tree/size rb-tree))
        (lg (lambda (x) (/ (log x) (log 2)))))
    (<= (lg size)
        (rb-tree/height rb-tree)
        (* 2 (lg (+ size 1))))))
The procedure alist->rb-tree, which builds a red-black tree from an association list, is equivalent to:

(lambda (alist key=? key<?)
  (let ((tree (make-rb-tree key=? key<?)))
    (for-each (lambda (association)
                (rb-tree/insert! tree
                                 (car association)
                                 (cdr association)))
              alist)
    tree))
Balanced binary trees are a useful data structure for maintaining large sets of ordered objects or sets of associations whose keys are ordered. MIT Scheme has a comprehensive implementation of weight-balanced binary trees, which has several advantages over the other data structures for large aggregates:
The non-destructive tree operations produce new trees without modifying their arguments, just as evaluating (+ 1 x) modifies neither the constant 1 nor the value bound to x. The trees are referentially transparent; thus the programmer need not worry about copying them. Referential transparency allows space efficiency to be achieved by sharing subtrees.
These features make weight-balanced trees suitable for a wide range of applications, especially those that require large numbers of sets or discrete maps. Applications that have a few global databases and/or concentrate on element-level operations like insertion and lookup are probably better off using hash-tables or red-black trees.
The size of a tree is the number of associations that it contains. Weight balanced binary trees are balanced to keep the sizes of the subtrees of each node within a constant factor of each other. This ensures logarithmic times for single-path operations (like lookup and insertion). A weight balanced tree takes space that is proportional to the number of associations in the tree. For the current implementation, the constant of proportionality is six words per association.
Weight balanced trees can be used as an implementation for either discrete sets or discrete maps (associations). Sets are implemented by ignoring the datum that is associated with the key. Under this scheme, if an association exists in the tree, the key of the association is a member of the set. Typically a value such as (), #t, or #f is associated with the key.
Many operations can be viewed as computing a result that, depending on whether the tree arguments are thought of as sets or maps, is known by two different names. An example is wt-tree/member?, which, when regarding the tree argument as a set, computes the set membership operation, but, when regarding the tree as a discrete map, is the predicate testing if the map is defined at an element in its domain. Most names in this package have been chosen based on interpreting the trees as sets, hence the name wt-tree/member? rather than wt-tree/defined-at?.
The weight balanced tree implementation is a run-time-loadable option. To use weight balanced trees, execute
(load-option 'wt-tree)
once before calling any of the procedures defined here.
Binary trees require there to be a total order on the keys used to arrange the elements in the tree. Weight balanced trees are organized by types, where the type is an object encapsulating the ordering relation. Creating a tree is a two-stage process. First a tree type must be created from the predicate which gives the ordering. The tree type is then used for making trees, either empty or singleton trees or trees from other aggregate structures like association lists. Once created, a tree `knows' its type and the type is used to test compatibility between trees in operations taking two trees. Usually a small number of tree types are created at the beginning of a program and used many times throughout the program's execution.
make-wt-tree-type creates and returns a new tree type based on the ordering predicate key<?. key<? must be a total ordering, having the property that the following expressions are satisfied for all key values a, b, and c:
(key<? a a)                     =>  #f

(and (key<? a b)
     (key<? b a))               =>  #f

(if (and (key<? a b)
         (key<? b c))
    (key<? a c)
    #t)                         =>  #t
Two key values are assumed to be equal if neither is less than the other by key<?.
Each call to make-wt-tree-type returns a distinct value, and trees are only compatible if their tree types are eq?. A consequence is that trees that are intended to be used in binary tree operations must all be created with a tree type originating from the same call to make-wt-tree-type.
number-wt-type could have been defined by
(define number-wt-type (make-wt-tree-type <))
string-wt-type could have been defined by
(define string-wt-type (make-wt-tree-type string<?))
make-wt-tree creates and returns a newly allocated, empty weight-balanced tree. The tree-type argument must have been created by make-wt-tree-type; the returned tree has this type. Similarly, singleton-wt-tree returns a newly allocated tree containing a single association; its tree-type argument must also have been created by make-wt-tree-type, and the returned tree has this type. The procedure alist->wt-tree, which builds a tree from an association list, could have been defined by:
(lambda (type alist)
  (let ((tree (make-wt-tree type)))
    (for-each (lambda (association)
                (wt-tree/add! tree
                              (car association)
                              (cdr association)))
              alist)
    tree))
This section describes the basic tree operations on weight balanced trees. These operations are the usual tree operations for insertion, deletion and lookup, some predicates and a procedure for determining the number of associations in a tree.
The predicate wt-tree? returns #t if object is a weight-balanced tree, otherwise returns #f.
wt-tree/empty? returns #t if wt-tree contains no associations, otherwise returns #f.
wt-tree/member? returns #t if wt-tree contains an association for key, otherwise returns #f. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in wt-tree.
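A small sketch of these operations, using number-wt-type from above and assuming the insertion, lookup, and deletion operations wt-tree/add (non-destructive), wt-tree/lookup (tree, key, default), wt-tree/size, and wt-tree/delete:

(define t (singleton-wt-tree number-wt-type 1 'one))
(set! t (wt-tree/add t 2 'two))   ; non-destructive: use the returned tree
(wt-tree/size t)                  =>  2
(wt-tree/member? 2 t)             =>  #t
(wt-tree/lookup t 2 #f)           =>  two
(wt-tree/member? 2 (wt-tree/delete t 2))  =>  #f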
In the following the size of a tree is the number of associations that the tree contains, and a smaller tree contains fewer associations.
wt-tree/union computes the map override of wt-tree-1 by wt-tree-2. If the trees are viewed as sets, the result is the set union of the arguments. The worst-case time required by this operation is proportional to the sum of the sizes of both trees. If the minimum key of one tree is greater than the maximum key of the other tree, then the time required is at worst proportional to the logarithm of the size of the larger tree.
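For example, a sketch using number-wt-type; since the result is the map override of wt-tree-1 by wt-tree-2, the datum from the second tree wins for a shared key:

(define t1 (alist->wt-tree number-wt-type '((1 . a) (2 . b))))
(define t2 (alist->wt-tree number-wt-type '((2 . c) (3 . d))))
(define u (wt-tree/union t1 t2))
(wt-tree/size u)         =>  3
(wt-tree/lookup u 2 #f)  =>  c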
wt-tree/intersection computes the domain restriction of wt-tree-1 to (the domain of) wt-tree-2. The time required by this operation is never worse than proportional to the sum of the sizes of the trees.
wt-tree/subset? returns #t iff the key of each association in wt-tree-1 is the key of some association in wt-tree-2; otherwise it returns #f. Viewed as a set operation, wt-tree/subset? is the improper subset predicate.
A proper subset predicate can be constructed:
(define (proper-subset? s1 s2)
  (and (wt-tree/subset? s1 s2)
       (< (wt-tree/size s1) (wt-tree/size s2))))
As a discrete map operation, wt-tree/subset? is the subset test on the domain(s) of the map(s). In the worst case, the time required by this operation is proportional to the size of wt-tree-1.
wt-tree/set-equal? returns #t iff for every association in wt-tree-1 there is an association in wt-tree-2 that has the same key, and vice versa. Viewing the arguments as sets, wt-tree/set-equal? is the set equality predicate. As a map operation it determines if two maps are defined on the same domain.
This procedure is equivalent to
(lambda (wt-tree-1 wt-tree-2)
  (and (wt-tree/subset? wt-tree-1 wt-tree-2)
       (wt-tree/subset? wt-tree-2 wt-tree-1)))
In the worst-case the time required by this operation is proportional to the size of the smaller tree.
wt-tree/fold reduces wt-tree by combining all of its associations, using the given combining procedure, which takes a key, a datum, and the result accumulated so far. wt-tree/fold takes time proportional to the size of wt-tree.
A sorted association list can be derived simply:
(wt-tree/fold (lambda (key datum list)
                (cons (cons key datum) list))
              '()
              wt-tree)
The data in the associations can be summed like this:
(wt-tree/fold (lambda (key datum sum) (+ sum datum))
              0
              wt-tree)
wt-tree/for-each traverses wt-tree in order, applying a procedure to each association; it takes time proportional to the size of wt-tree.
The example prints the tree:
(wt-tree/for-each (lambda (key value)
                    (display (list key value)))
                  wt-tree)
wt-tree/union-merge is like wt-tree/union, except that when a key occurs in both trees, the datum in the result is computed by applying a merge procedure of the form

(lambda (key datum-1 datum-2) ...)
If some key occurs in only one tree, that association will appear in the result tree without being processed by merge. So for this operation to make sense, either merge must have both a right and a left identity that correspond to the association being absent in one of the trees, or some guarantee must be made, for example, that all the keys in one tree are known to occur in the other.
These are all reasonable procedures for merge:
(lambda (key val1 val2) (+ val1 val2))
(lambda (key val1 val2) (append val1 val2))
(lambda (key val1 val2) (wt-tree/union val1 val2))
However, a procedure like
(lambda (key val1 val2) (- val1 val2))
would result in a subtraction of the data for all associations with keys occurring in both trees, but associations with keys occurring in only the second tree would be copied, not negated, as is presumably the intent. The programmer must ensure that this never happens.
This procedure has the same time behaviour as wt-tree/union but with a slightly worse constant factor. Indeed, wt-tree/union might have been defined like this:
(define (wt-tree/union tree1 tree2)
  (wt-tree/union-merge tree1 tree2
                       (lambda (key val1 val2) val2)))
The merge procedure takes the key as a parameter in case the data are not independent of the key.
Weight balanced trees support operations that view the tree as a sorted sequence of associations. Elements of the sequence can be accessed by position, and the position of an element in the sequence can be determined, both in logarithmic time.
wt-tree/index returns the indexth key, wt-tree/index-datum returns the datum associated with the indexth key, and wt-tree/index-pair returns a new pair (key . datum) which is the cons of the indexth key and its datum. The average and worst-case times required by these operations are proportional to the logarithm of the number of associations in the tree.
These operations signal an error if the tree is empty, if index < 0, or if index is greater than or equal to the number of associations in the tree.
Indexing can be used to find the median and maximum keys in the tree as follows:
median:  (wt-tree/index wt-tree (quotient (wt-tree/size wt-tree) 2))
maximum: (wt-tree/index wt-tree (-1+ (wt-tree/size wt-tree)))
wt-tree/rank returns the position of key in the sorted sequence of wt-tree's keys, or #f if the tree has no association for key. This procedure returns either an exact non-negative integer or #f. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in the tree.
wt-tree/min returns the least key, wt-tree/min-datum returns the datum associated with the least key, and wt-tree/min-pair returns a new pair (key . datum) which is the cons of the minimum key and its datum. The average and worst-case times required by these operations are proportional to the logarithm of the number of associations in the tree.
These operations signal an error if the tree is empty. They could be written
(define (wt-tree/min tree)
  (wt-tree/index tree 0))

(define (wt-tree/min-datum tree)
  (wt-tree/index-datum tree 0))

(define (wt-tree/min-pair tree)
  (wt-tree/index-pair tree 0))
wt-tree/delete-min, which removes the association with the least key, is equivalent to

(wt-tree/delete wt-tree (wt-tree/min wt-tree))

and its destructive counterpart wt-tree/delete-min! is equivalent to

(wt-tree/delete! wt-tree (wt-tree/min wt-tree))