
Strings

A string is a mutable sequence of characters. In the current implementation of MIT Scheme, the elements of a string must all satisfy the predicate char-ascii?; if someone ports MIT Scheme to a non-ASCII operating system this requirement will change.

A string is written as a sequence of characters enclosed within double quotes " ". To include a double quote inside a string, precede the double quote with a backslash \ (escape it), as in

"The word \"recursion\" has many meanings."

The printed representation of this string is

The word "recursion" has many meanings.

To include a backslash inside a string, precede it with another backslash; for example,

"Use #\\Control-q to quit."

The printed representation of this string is

Use #\Control-q to quit.

The effect of a backslash that doesn't precede a double quote or backslash is unspecified in standard Scheme, but MIT Scheme specifies the effect for three other characters: \t, \n, and \f. These escape sequences are respectively translated into the following characters: #\tab, #\newline, and #\page. Finally, a backslash followed by exactly three octal digits is translated into the character whose ASCII code is those digits.
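
As a brief illustration of these escapes (a sketch that follows the rules above; octal 101 is the ASCII code for the letter A):

(string-length "say \"hi\"")            =>  8
(string-length "\\")                    =>  1
(string-length "a\tb")                  =>  3
(string-ref "a\tb" 1)                   =>  #\tab
(string=? "\101" "A")                   =>  #t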

If a string literal is continued from one line to another, the string will contain the newline character (#\newline) at the line break. Standard Scheme does not specify what appears in a string literal at a line break.
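
For instance, under the MIT Scheme behavior just described (and assuming the source text uses a single newline character at the line break), a literal split across two lines contains one extra character:

(string-length "ab
cd")                                    =>  5
(string-ref "ab
cd" 2)                                  =>  #\newline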

The length of a string is the number of characters that it contains. This number is an exact non-negative integer that is established when the string is created (but see section Variable-Length Strings). Each character in a string has an index, which is a number that indicates the character's position in the string. The index of the first (leftmost) character in a string is 0, and the index of the last character is one less than the length of the string. The valid indexes of a string are the exact non-negative integers less than the length of the string.

A number of the string procedures operate on substrings. A substring is a segment of a string, which is specified by two integers start and end satisfying these relationships:

0 <= start <= end <= (string-length string)

Start is the index of the first character in the substring, and end is one greater than the index of the last character in the substring. Thus if start and end are equal, they refer to an empty substring, and if start is zero and end is the length of string, they refer to all of string.
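
The following sketch shows how start and end select a substring, using the procedure substring and one of the substring search procedures documented later in this chapter:

(substring "recursion" 2 5)                     =>  "cur"
(substring "recursion" 3 3)                     =>  ""
(substring "recursion" 0 (string-length "recursion"))
                                                =>  "recursion"
(substring-find-next-char "recursion" 2 5 #\r)  =>  4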

Some of the procedures that operate on strings ignore the difference between uppercase and lowercase. The versions that ignore case include `-ci' (for "case insensitive") in their names.

Construction of Strings

procedure: make-string k [char]
Returns a newly allocated string of length k. If you specify char, all elements of the string are initialized to char, otherwise the contents of the string are unspecified. Char must satisfy the predicate char-ascii?.

(make-string 10 #\x)              =>  "xxxxxxxxxx"

procedure+: string char ...
Returns a newly allocated string consisting of the specified characters. The arguments must all satisfy char-ascii?.

(string #\a)                                =>  "a"
(string #\a #\b #\c)                        =>  "abc"
(string #\a #\space #\b #\space #\c)        =>  "a b c"
(string)                                    =>  ""

For compatibility with old code, char->string is a synonym for this procedure.

procedure: list->string char-list
Char-list must be a list of ASCII characters. list->string returns a newly allocated string formed from the elements of char-list. This is equivalent to (apply string char-list). The inverse of this operation is string->list.

(list->string '(#\a #\b))               =>  "ab"
(string->list "Hello")                  =>  (#\H #\e #\l #\l #\o)

procedure: string-copy string
Returns a newly allocated copy of string.

Note regarding variable-length strings: the maximum length of the result depends only on the length of string, not its maximum length. If you wish to copy a string and preserve its maximum length, do the following:

(define (string-copy-preserving-max-length string)
  (let ((length))
    (dynamic-wind 
     (lambda ()
       (set! length (string-length string))
       (set-string-length! string (string-maximum-length string)))
     (lambda ()
       (string-copy string))
     (lambda ()
       (set-string-length! string length)))))
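
Because string-copy returns a newly allocated string, modifying the copy leaves the original untouched. A small sketch (the names a and b are illustrative only):

(define a (make-string 3 #\x))          =>  unspecified
(define b (string-copy a))              =>  unspecified
(string-set! b 0 #\y)                   =>  unspecified
a                                       =>  "xxx"
b                                       =>  "yxx"
(eq? a b)                               =>  #f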

Selecting String Components

procedure: string? object
Returns #t if object is a string; otherwise returns #f.

(string? "Hi")                  =>  #t
(string? 'Hi)                   =>  #f

procedure: string-length string
Returns the length of string as an exact non-negative integer.

(string-length "")              =>  0
(string-length "The length")    =>  10

procedure: string-null? string
Returns #t if string has zero length; otherwise returns #f.

(string-null? "")               =>  #t
(string-null? "Hi")             =>  #f

procedure: string-ref string k
Returns character k of string. K must be a valid index of string.

(string-ref "Hello" 1)          =>  #\e
(string-ref "Hello" 5)          error--> 5 not in correct range

procedure: string-set! string k char
Stores char in element k of string and returns an unspecified value. K must be a valid index of string, and char must satisfy the predicate char-ascii?.

(define str "Dog")              =>  unspecified
(string-set! str 0 #\L)         =>  unspecified
str                             =>  "Log"
(string-set! str 3 #\t)         error--> 3 not in correct range

Comparison of Strings

procedure: string=? string1 string2
procedure+: substring=? string1 start end string2 start end
procedure: string-ci=? string1 string2
procedure+: substring-ci=? string1 start end string2 start end
Returns #t if the two strings (substrings) are the same length and contain the same characters in the same (relative) positions; otherwise returns #f. string-ci=? and substring-ci=? don't distinguish uppercase and lowercase letters, but string=? and substring=? do.

(string=? "PIE" "PIE")                  =>  #t
(string=? "PIE" "pie")                  =>  #f
(string-ci=? "PIE" "pie")               =>  #t
(substring=? "Alamo" 1 3 "cola" 2 4)    =>  #t ; compares "la"

procedure: string<? string1 string2
procedure+: substring<? string1 start1 end1 string2 start2 end2
procedure: string>? string1 string2
procedure: string<=? string1 string2
procedure: string>=? string1 string2
procedure: string-ci<? string1 string2
procedure+: substring-ci<? string1 start1 end1 string2 start2 end2
procedure: string-ci>? string1 string2
procedure: string-ci<=? string1 string2
procedure: string-ci>=? string1 string2
These procedures compare strings (substrings) according to the order of the characters they contain (also see section Comparison of Characters). The arguments are compared using a lexicographic (or dictionary) order. If two strings differ in length but are the same up to the length of the shorter string, the shorter string is considered to be less than the longer string.

(string<? "cat" "dog")          =>  #t
(string<? "cat" "DOG")          =>  #f
(string-ci<? "cat" "DOG")       =>  #t
(string>? "catkin" "cat")       =>  #t ; shorter is lesser

procedure+: string-compare string1 string2 if-eq if-lt if-gt
procedure+: string-compare-ci string1 string2 if-eq if-lt if-gt
If-eq, if-lt, and if-gt are procedures of no arguments (thunks). The two strings are compared: if they are equal, if-eq is applied; if string1 is less than string2, if-lt is applied; otherwise string1 is greater than string2 and if-gt is applied. The value of the procedure is the value of the thunk that is applied.

string-compare distinguishes uppercase and lowercase letters; string-compare-ci does not.

(define (cheer) (display "Hooray!"))
(define (boo)   (display "Boo-hiss!"))
(string-compare "a" "b"  (lambda () 'ignore)  cheer  boo)
        -|  Hooray!
        =>  unspecified
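
Since the value of string-compare is the value of the selected thunk, it can serve as a three-way comparison. A minimal sketch, in which string-order is a hypothetical helper and not part of MIT Scheme:

(define (string-order s1 s2)
  (string-compare s1 s2
                  (lambda () 'equal)      ; if-eq
                  (lambda () 'less)       ; if-lt
                  (lambda () 'greater)))  ; if-gt

(string-order "apple" "apple")          =>  equal
(string-order "apple" "apricot")        =>  less
(string-order "b" "a")                  =>  greater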

procedure+: string-hash string
procedure+: string-hash-mod string k
string-hash returns an exact non-negative integer that can be used for storing the specified string in a hash table. Equal strings (in the sense of string=?) are mapped to equal (=) hash codes, and non-equal but similar strings are usually mapped to distinct hash codes.

string-hash-mod is like string-hash, except that it limits the result to a particular range based on the exact non-negative integer k. The following are equivalent:

(string-hash-mod string k)
(modulo (string-hash string) k)
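
A short sketch of how these might be used to pick a slot in a table of 16 buckets (bucket-index is a hypothetical helper; the actual hash values are not specified, so only relationships between them are shown):

(define (bucket-index key n-buckets)
  (string-hash-mod key n-buckets))

(= (string-hash "abc") (string-hash (string-copy "abc")))
                                        =>  #t
(< (bucket-index "abc" 16) 16)          =>  #t
(= (bucket-index "abc" 16)
   (modulo (string-hash "abc") 16))     =>  #t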

Alphabetic Case in Strings

procedure+: string-capitalized? string
procedure+: substring-capitalized? string start end
These procedures return #t if the first word in the string (substring) is capitalized, and any subsequent words are either lower case or capitalized. Otherwise, they return #f. A word is defined as a non-null contiguous sequence of alphabetic characters, delimited by non-alphabetic characters or the limits of the string (substring). A word is capitalized if its first letter is upper case and all its remaining letters are lower case.

(map string-capitalized? '(""    "A"    "art"  "Art"  "ART"))
                       => (#f    #t     #f     #t     #f)

procedure+: string-upper-case? string
procedure+: substring-upper-case? string start end
procedure+: string-lower-case? string
procedure+: substring-lower-case? string start end
These procedures return #t if all the letters in the string (substring) are upper case (for the upper-case procedures) or lower case (for the lower-case procedures); otherwise they return #f. The string (substring) must contain at least one letter or the procedures return #f.

(map string-upper-case?  '(""    "A"    "art"  "Art"  "ART"))
                       => (#f    #t     #f     #f     #t)
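
For comparison, a sketch of the lower-case predicate on the same inputs, together with strings that contain non-letters (results follow from the description above: non-letters are ignored, but at least one letter is required):

(map string-lower-case?  '(""    "A"    "art"  "Art"  "ART"))
                       => (#f    #f     #t     #f     #f)
(string-upper-case? "R2-D2")            =>  #t
(string-upper-case? "1234")             =>  #f
(substring-lower-case? "ABCdef" 3 6)    =>  #t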

procedure+: string-capitalize string
procedure+: string-capitalize! string
procedure+: substring-capitalize! string start end
string-capitalize returns a newly allocated copy of string in which the first alphabetic character is uppercase and the remaining alphabetic characters are lowercase. For example, "abcDEF" becomes "Abcdef". string-capitalize! is the destructive version of string-capitalize: it alters string and returns an unspecified value. substring-capitalize! destructively capitalizes the specified part of string.
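
A brief sketch of the three procedures (the names s1 and s2 are illustrative; the strings are made with string-copy so that the destructive procedures operate on fresh strings):

(string-capitalize "abcDEF")            =>  "Abcdef"
(define s1 (string-copy "hELLO"))       =>  unspecified
(string-capitalize! s1)                 =>  unspecified
s1                                      =>  "Hello"
(define s2 (string-copy "abcdef"))      =>  unspecified
(substring-capitalize! s2 2 5)          =>  unspecified
s2                                      =>  "abCdef"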

procedure+: string-downcase string
procedure+: string-downcase! string
procedure+: substring-downcase! string start end
string-downcase returns a newly allocated copy of string in which all uppercase letters are changed to lowercase. string-downcase! is the destructive version of string-downcase: it alters string and returns an unspecified value. substring-downcase! destructively changes the case of the specified part of string.

(define str "ABCDEFG")          =>  unspecified
(substring-downcase! str 3 5)   =>  unspecified
str                             =>  "ABCdeFG"

procedure+: string-upcase string
procedure+: string-upcase! string
procedure+: substring-upcase! string start end
string-upcase returns a newly allocated copy of string in which all lowercase letters are changed to uppercase. string-upcase! is the destructive version of string-upcase: it alters string and returns an unspecified value. substring-upcase! destructively changes the case of the specified part of string.
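
By analogy with the string-downcase example above, a short sketch (s1 and s2 are illustrative names):

(string-upcase "Hello, World")          =>  "HELLO, WORLD"
(define s1 (string-copy "abcdefg"))     =>  unspecified
(string-upcase! s1)                     =>  unspecified
s1                                      =>  "ABCDEFG"
(define s2 (string-copy "abcdefg"))     =>  unspecified
(substring-upcase! s2 3 5)              =>  unspecified
s2                                      =>  "abcDEfg"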

Cutting and Pasting Strings

procedure: string-append string ...
Returns a newly allocated string made from the concatenation of the given strings. With no arguments, string-append returns the empty string ("").

(string-append)                         =>  ""
(string-append "*" "ace" "*")           =>  "*ace*"
(string-append "" "" "")                =>  ""
(eq? str (string-append str))           =>  #f ; newly allocated

procedure: substring string start end
Returns a newly allocated string formed from the characters of string beginning with index start (inclusive) and ending with end (exclusive).

(substring "" 0 0)              => ""
(substring "arduous" 2 5)       => "duo"
(substring "arduous" 2 8)       error--> 8 not in correct range

(define (string-copy s)
  (substring s 0 (string-length s)))

procedure+: string-head string end
Returns a newly allocated copy of the initial substring of string, up to but excluding end. It could have been defined by:

(define (string-head string end)
  (substring string 0 end))
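
For example (a sketch following the definition above):

(string-head "uncommon" 2)      =>  "un"
(string-head "uncommon" 0)      =>  ""
(string-head "uncommon" 8)      =>  "uncommon"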

procedure+: string-tail string start
Returns a newly allocated copy of the final substring of string, starting at index start and going to the end of string. It could have been defined by:

(define (string-tail string start)
  (substring string start (string-length string)))

(string-tail "uncommon" 2)      =>  "common"

procedure+: string-pad-left string k [char]
procedure+: string-pad-right string k [char]
These procedures return a newly allocated string created by padding string out to length k, using char. If char is not given, it defaults to #\space. If k is less than the length of string, the resulting string is a truncated form of string. string-pad-left adds padding characters or truncates from the beginning of the string (lowest indices), while string-pad-right does so at the end of the string (highest indices).

(string-pad-left "hello" 4)             =>  "ello"
(string-pad-left "hello" 8)             =>  "   hello"
(string-pad-left "hello" 8 #\*)         =>  "***hello"
(string-pad-right "hello" 4)            =>  "hell"
(string-pad-right "hello" 8)            =>  "hello   "
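
One common use of padding is to line up numbers in fixed-width columns. The following sketch combines the padding procedures with the standard procedure number->string; note that a number wider than the column loses its leading digits, because string-pad-left truncates from the beginning:

(string-pad-left (number->string 42) 5)         =>  "   42"
(string-pad-left (number->string 42) 5 #\0)     =>  "00042"
(string-pad-left (number->string 1234567) 5)    =>  "34567"
(string-pad-right "name" 8 #\.)                 =>  "name...."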

procedure+: string-trim string [char-set]
procedure+: string-trim-left string [char-set]
procedure+: string-trim-right string [char-set]
Returns a newly allocated string created by removing all characters that are not in char-set from: (string-trim) both ends of string; (string-trim-left) the beginning of string; or (string-trim-right) the end of string. Removal at each end stops at the first character that is in char-set, so characters in the interior of string are never removed. Char-set defaults to char-set:not-whitespace.

(string-trim "  in the end  ")          =>  "in the end"
(string-trim "              ")          =>  ""
(string-trim "100th" char-set:numeric)  =>  "100"
(string-trim-left "-.-+-=-" (char-set #\+))
                                        =>  "+-=-"
(string-trim "but (+ x y) is" (char-set #\( #\)))
                                        =>  "(+ x y)"
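
The one-sided variants behave the same way but touch only one end. A short sketch reusing the strings from the examples above:

(string-trim-left "  in the end  ")             =>  "in the end  "
(string-trim-right "  in the end  ")            =>  "  in the end"
(string-trim-right "100th" char-set:numeric)    =>  "100"
(string-trim-left "100th" char-set:numeric)     =>  "100th"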

Searching Strings

procedure+: substring? pattern string
Searches string to see if it contains the substring pattern. Returns the index of the first substring of string that is equal to pattern; or #f if string does not contain pattern.

(substring? "rat" "pirate")             =>  2
(substring? "rat" "outrage")            =>  #f
(substring? "" any-string)              =>  0
(if (substring? "moon" text)
    (process-lunar text)
    'no-moon)

procedure+: string-find-next-char string char
procedure+: substring-find-next-char string start end char
procedure+: string-find-next-char-ci string char
procedure+: substring-find-next-char-ci string start end char
Returns the index of the first occurrence of char in the string (substring); returns #f if char does not appear in the string. For the substring procedures, the index returned is relative to the entire string, not just the substring. The -ci procedures don't distinguish uppercase and lowercase letters.

(string-find-next-char "Adam" #\A)              =>  0 
(substring-find-next-char "Adam" 1 4 #\A)       =>  #f
(substring-find-next-char-ci "Adam" 1 4 #\A)    =>  2 

procedure+: string-find-next-char-in-set string char-set
procedure+: substring-find-next-char-in-set string start end char-set
Returns the index of the first character in the string (or substring) that is also in char-set, or returns #f if none of the characters in char-set occur in string. For the substring procedure, only the substring is searched, but the index returned is relative to the entire string, not just the substring.

(string-find-next-char-in-set my-string char-set:alphabetic)
                =>  start position of the first word in my-string
; Can be used as a predicate:
(if (string-find-next-char-in-set my-string (char-set #\( #\) ))
    'contains-parentheses
    'no-parentheses)

procedure+: string-find-previous-char string char
procedure+: substring-find-previous-char string start end char
procedure+: string-find-previous-char-ci string char
procedure+: substring-find-previous-char-ci string start end char
Returns the index of the last occurrence of char in the string (substring); returns #f if char doesn't appear in the string. For the substring procedures, the index returned is relative to the entire string, not just the substring. The -ci procedures don't distinguish uppercase and lowercase letters.
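
A sketch of the backward searches, ending with one possible way to extract a file name's final extension (the names name and dot are illustrative only):

(string-find-previous-char "banana" #\a)        =>  5
(substring-find-previous-char "banana" 0 5 #\a) =>  3
(string-find-previous-char "banana" #\z)        =>  #f
(string-find-previous-char-ci "Banana" #\b)     =>  0

(define name "archive.tar.gz")                  =>  unspecified
(define dot (string-find-previous-char name #\.))
                                                =>  unspecified
dot                                             =>  11
(string-tail name (+ dot 1))                    =>  "gz"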

procedure+: string-find-previous-char-in-set string char-set
procedure+: substring-find-previous-char-in-set string start end char-set
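Returns the index of the last character in the string (substring) that is also in char-set, or returns #f if no character of the string (substring) is in char-set. For the substring procedure, only the substring is searched, but the index returned is relative to the entire string, not just the substring.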
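
For example, the following sketch locates the last letter of a string that ends in blanks, and the last letter within the first five characters of a string:

(string-find-previous-char-in-set "hello world  " char-set:alphabetic)
                                        =>  10
(substring-find-previous-char-in-set "hello world" 0 5 char-set:alphabetic)
                                        =>  4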

Matching Strings

procedure+: string-match-forward string1 string2
procedure+: substring-match-forward string1 start end string2 start end
procedure+: string-match-forward-ci string1 string2
procedure+: substring-match-forward-ci string1 start end string2 start end
Compares the two strings (substrings), starting from the beginning, and returns the number of characters that are the same. If the two strings (substrings) start differently, returns 0. The -ci procedures don't distinguish uppercase and lowercase letters.

(string-match-forward "mirror" "micro") =>  2  ; matches "mi"
(string-match-forward "a" "b")          =>  0  ; no match
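
Because the result is a count of matching characters, it can be passed directly to string-head to extract the common prefix of two strings. A minimal sketch, in which common-prefix is a hypothetical helper and not part of MIT Scheme:

(define (common-prefix s1 s2)
  (string-head s1 (string-match-forward s1 s2)))

(common-prefix "mirror" "micro")            =>  "mi"
(common-prefix "a" "b")                     =>  ""
(string-match-forward-ci "MIRROR" "micro")  =>  2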

procedure+: string-match-backward string1 string2
procedure+: substring-match-backward string1 start end string2 start end
procedure+: string-match-backward-ci string1 string2
procedure+: substring-match-backward-ci string1 start end string2 start end
Compares the two strings (substrings), starting from the end and matching toward the front, returning the number of characters that are the same. If the two strings (substrings) end differently, returns 0. The -ci procedures don't distinguish uppercase and lowercase letters.

(string-match-backward-ci "BULBOUS" "fractious")
                                        =>  3  ; matches "ous"

procedure+: string-prefix? string1 string2
procedure+: substring-prefix? string1 start1 end1 string2 start2 end2
procedure+: string-prefix-ci? string1 string2
procedure+: substring-prefix-ci? string1 start1 end1 string2 start2 end2
These procedures return #t if the first string (substring) forms the prefix of the second; otherwise they return #f. The -ci procedures don't distinguish uppercase and lowercase letters.

(string-prefix? "abc" "abcdef")         =>  #t
(string-prefix? "" any-string)          =>  #t
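
A few further cases, sketched from the description above, including the case-insensitive and substring variants:

(string-prefix? "abd" "abcdef")                 =>  #f
(string-prefix? "abcdef" "abc")                 =>  #f
(string-prefix-ci? "ABC" "abcdef")              =>  #t
(substring-prefix? "xabc" 1 4 "abcdef" 0 6)     =>  #t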

procedure+: string-suffix? string1 string2
procedure+: substring-suffix? string1 start1 end1 string2 start2 end2
procedure+: string-suffix-ci? string1 string2
procedure+: substring-suffix-ci? string1 start1 end1 string2 start2 end2
These procedures return #t if the first string (substring) forms the suffix of the second; otherwise they return #f. The -ci procedures don't distinguish uppercase and lowercase letters.

(string-suffix? "ous" "bulbous")        =>  #t
(string-suffix? "" any-string)          =>  #t

Modification of Strings

procedure+: string-replace string char1 char2
procedure+: substring-replace string start end char1 char2
procedure+: string-replace! string char1 char2
procedure+: substring-replace! string start end char1 char2
These procedures replace all occurrences of char1 with char2 in the original string (substring). string-replace and substring-replace return a newly allocated string containing the result. string-replace! and substring-replace! destructively modify string and return an unspecified value.

(define str "a few words")              =>  unspecified
(string-replace str #\space #\-)        =>  "a-few-words"
(substring-replace str 2 9 #\space #\-) =>  "a few-words"
str                                     =>  "a few words"
(string-replace! str #\space #\-)       =>  unspecified
str                                     =>  "a-few-words"

procedure: string-fill! string char
Stores char in every element of string and returns an unspecified value.
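
For example (s is an illustrative name):

(define s (make-string 3 #\a))          =>  unspecified
(string-fill! s #\z)                    =>  unspecified
s                                       =>  "zzz"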

procedure+: substring-fill! string start end char
Stores char in elements start (inclusive) to end (exclusive) of string and returns an unspecified value.

(define s (make-string 10 #\space))     =>  unspecified
(substring-fill! s 2 8 #\*)             =>  unspecified
s                                       =>  "  ******  "

procedure+: substring-move-left! string1 start1 end1 string2 start2
procedure+: substring-move-right! string1 start1 end1 string2 start2
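Copies the characters of string1 from index start1 (inclusive) to end1 (exclusive) into string2, starting at index start2. The two procedures differ in the order in which the characters are copied, as follows (note that this matters only when string1 and string2 are eqv? and the source and destination regions overlap):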

substring-move-left!
The copy starts at the left end and moves toward the right (from smaller indices to larger). Thus if string1 and string2 are the same, this procedure moves the characters toward the left inside the string.
substring-move-right!
The copy starts at the right end and moves toward the left (from larger indices to smaller). Thus if string1 and string2 are the same, this procedure moves the characters toward the right inside the string.

The following example shows how these procedures can be used to build up a string (it would have been easier to use string-append):

(define answer (make-string 9 #\*))             =>  unspecified
answer                                          =>  "*********"
(substring-move-left! "start" 0 5 answer 0)     =>  unspecified
answer                                          =>  "start****"
(substring-move-left! "-end" 0 4 answer 5)      =>  unspecified
answer                                          =>  "start-end"
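
The difference between the two procedures shows up only when a string is copied onto itself with overlapping regions. A sketch of the intended use of each, shifting text toward the left and toward the right within one string (s1 and s2 are illustrative names):

(define s1 (string-copy "abcdef"))      =>  unspecified
(substring-move-left! s1 2 6 s1 0)      =>  unspecified
s1                                      =>  "cdefef"

(define s2 (string-copy "abcdef"))      =>  unspecified
(substring-move-right! s2 0 4 s2 2)     =>  unspecified
s2                                      =>  "ababcd"

In both cases the copied characters arrive intact: "cdef" now starts at index 0 of s1, and "abcd" now starts at index 2 of s2.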

Variable-Length Strings

MIT Scheme allows the length of a string to be dynamically adjusted in a limited way. This feature works as follows. When a new string is allocated, by whatever method, it has a specific length. At the time of allocation, it is also given a maximum length, which is guaranteed to be at least as large as the string's length. (Sometimes the maximum length will be slightly larger than the length, but it is a bad idea to count on this. Programs should assume that the maximum length is the same as the length at the time of the string's allocation.) After the string is allocated, the operation set-string-length! can be used to alter the string's length to any value between 0 and the string's maximum length, inclusive.

procedure+: string-maximum-length string
Returns the maximum length of string. The following is guaranteed:

(<= (string-length string)
    (string-maximum-length string))     =>  #t

The maximum length of a string never changes.

procedure+: set-string-length! string k
Alters the length of string to be k, and returns an unspecified value. K must be less than or equal to the maximum length of string. set-string-length! does not change the maximum length of string.
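
A small sketch of shrinking a string and then restoring its original length, which is safe because the maximum length is at least the length at allocation (s is an illustrative name):

(define s (make-string 10 #\x))         =>  unspecified
(<= 10 (string-maximum-length s))       =>  #t
(set-string-length! s 4)                =>  unspecified
(string-length s)                       =>  4
s                                       =>  "xxxx"
(set-string-length! s 10)               =>  unspecified
(string-length s)                       =>  10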

Byte Vectors

MIT Scheme implements strings as packed vectors of 8-bit ASCII bytes. Most of the string operations, such as string-ref, coerce these 8-bit codes into character objects. The following lower-level operations instead work directly on the 8-bit codes.

procedure+: vector-8b-ref string k
Returns character k of string as an ASCII code. K must be a valid index of string.

(vector-8b-ref "abcde" 2)               =>  99 ; ascii for `c'

procedure+: vector-8b-set! string k ascii
Stores ascii in element k of string and returns an unspecified value. K must be a valid index of string, and ascii must be a valid ASCII code.

procedure+: vector-8b-fill! string start end ascii
Stores ascii in elements start (inclusive) to end (exclusive) of string and returns an unspecified value. Ascii must be a valid ASCII code.
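
A sketch of the two storing operations (s is an illustrative name; 98 and 42 are the ASCII codes for `b' and `*'):

(define s (make-string 3 #\a))          =>  unspecified
(vector-8b-set! s 1 98)                 =>  unspecified
s                                       =>  "aba"
(vector-8b-ref s 1)                     =>  98
(vector-8b-fill! s 0 3 42)              =>  unspecified
s                                       =>  "***"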

procedure+: vector-8b-find-next-char string start end ascii
procedure+: vector-8b-find-next-char-ci string start end ascii
Returns the index of the first occurrence of ascii in the given substring; returns #f if ascii does not appear. The index returned is relative to the entire string, not just the substring. Ascii must be a valid ASCII code.

vector-8b-find-next-char-ci doesn't distinguish uppercase and lowercase letters.

procedure+: vector-8b-find-previous-char string start end ascii
procedure+: vector-8b-find-previous-char-ci string start end ascii
Returns the index of the last occurrence of ascii in the given substring; returns #f if ascii does not appear. The index returned is relative to the entire string, not just the substring. Ascii must be a valid ASCII code.

vector-8b-find-previous-char-ci doesn't distinguish uppercase and lowercase letters.
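
A sketch of the byte-level searches (97, 98, and 104 are the ASCII codes for `a', `b', and `h'):

(vector-8b-find-next-char "abcabc" 1 6 97)      =>  3
(vector-8b-find-next-char "abcabc" 0 6 122)     =>  #f
(vector-8b-find-previous-char "abcabc" 0 6 98)  =>  4
(vector-8b-find-next-char-ci "Hello" 0 5 104)   =>  0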

