list(n) 1.4 struct "Tcl Data Structures"
list - Procedures for manipulating lists
package require Tcl 8.0
package require struct ?1.4?
The ::struct::list namespace contains several useful commands
for processing Tcl lists. Generally speaking, they implement
algorithms more complex or specialized than the ones provided by Tcl
itself.
It exports only a single command, struct::list. All
functionality provided here can be reached through a subcommand of
this command.
- ::struct::list longestCommonSubsequence sequence1 sequence2 ?maxOccurs?
-
Returns the longest common subsequence of elements in the two lists
sequence1 and sequence2. If the maxOccurs parameter
is provided, the common subsequence is restricted to elements that
occur no more than maxOccurs times in sequence2.
The return value is a list of two lists of equal length. The first
sublist is of indices into sequence1, and the second sublist is
of indices into sequence2. Each corresponding pair of indices
corresponds to equal elements in the sequences; the sequence returned
is the longest possible.
- ::struct::list longestCommonSubsequence2 sequence1 sequence2 ?maxOccurs?
-
Returns an approximation to the longest common sequence of elements in
the two lists sequence1 and sequence2.
If the maxOccurs parameter is omitted, the subsequence computed
is exactly the longest common subsequence; otherwise, the longest
common subsequence is approximated by first determining the longest
common sequence of only those elements that occur no more than
maxOccurs times in sequence2, and then using that result
to align the two lists, determining the longest common subsequences of
the sublists between the two elements.
As with longestCommonSubsequence, the return value is a list
of two lists of equal length. The first sublist is of indices into
sequence1, and the second sublist is of indices into
sequence2. Each corresponding pair of indices corresponds to
equal elements in the sequences. The sequence approximates the
longest common subsequence.
- ::struct::list lcsInvert lcsData len1 len2
-
This command takes a description of a longest common subsequence
(lcsData), inverts it, and returns the result. Inversion means
here that as the input describes which parts of the two sequences are
identical the output describes the differences instead.
To be fully defined the lengths of the two sequences have to be known
and are specified through len1 and len2.
The result is a list where each element describes one chunk of the
differences between the two sequences. This description is a list
containing three elements, a type and two pairs of indices into
sequence1 and sequence2 respectively, in this order.
The type can be one of three values:
- added
-
Describes an addition. I.e. items which are missing in sequence1
can be found in sequence2.
The pair of indices into sequence1 describes where the added
range had been expected to be in sequence1. The first index
refers to the item just before the added range, and the second index
refers to the item just after the added range.
The pair of indices into sequence2 describes the range of items
which has been added to it. The first index refers to the first item
in the range, and the second index refers to the last item in the
range.
- deleted
-
Describes a deletion. I.e. items which are in sequence1 are
missing from sequence2.
The pair of indices into sequence1 describes the range of items
which has been deleted. The first index refers to the first item in
the range, and the second index refers to the last item in the range.
The pair of indices into sequence2 describes where the deleted
range had been expected to be in sequence2. The first index
refers to the item just before the deleted range, and the second index
refers to the item just after the deleted range.
- changed
-
Describes a general change. I.e a range of items in sequence1
has been replaced by a different range of items in sequence2.
The pair of indices into sequence1 describes the range of items
which has been replaced. The first index refers to the first item in
the range, and the second index refers to the last item in the range.
The pair of indices into sequence2 describes the range of items
replacing the original range. Again the first index refers to the
first item in the range, and the second index refers to the last item
in the range.
|
sequence 1 = {a b r a c a d a b r a}
lcs 1 = {1 2 4 5 8 9 10}
lcs 2 = {0 1 3 4 5 6 7}
sequence 2 = {b r i c a b r a c}
Inversion = {{deleted {0 0} {-1 0}}
{changed {3 3} {2 2}}
{deleted {6 7} {4 5}}
{added {10 11} {8 8}}}
|
Notes:
-
An index of -1 in a deleted chunk refers to just before
the first element of the second sequence.
-
Also an index equal to the length of the first sequence in an
added chunk refers to just behind the end of the sequence.
- ::struct::list lcsInvert2 lcs1 lcs2 len1 len2
-
Similar to lcsInvert. Instead of directly taking the result
of a call to longestCommonSubsequence this subcommand expects
the indices for the two sequences in two separate lists.
- ::struct::list lcsInvertMerge lcsData len1 len2
-
Similar to lcsInvert. It returns essentially the same
structure as that command, except that it may contain chunks of type
unchanged too.
These new chunks describe the parts which are unchanged between the
two sequences. This means that the result of this command describes
both the changed and unchanged parts of the two sequences in one
structure.
|
sequence 1 = {a b r a c a d a b r a}
lcs 1 = {1 2 4 5 8 9 10}
lcs 2 = {0 1 3 4 5 6 7}
sequence 2 = {b r i c a b r a c}
Inversion/Merge = {{deleted {0 0} {-1 0}}
{unchanged {1 2} {0 1}}
{changed {3 3} {2 2}}
{unchanged {4 5} {3 4}}
{deleted {6 7} {4 5}}
{unchanged {8 10} {5 7}}
{added {10 11} {8 8}}}
|
- ::struct::list lcsInvertMerge2 lcs1 lcs2 len1 len2
-
Similar to lcsInvertMerge. Instead of directly taking the
result of a call to longestCommonSubsequence this subcommand
expects the indices for the two sequences in two separate lists.
- ::struct::list reverse sequence
-
The subcommand takes a single sequence as argument and returns a new
sequence containing the elements of the input sequence in reverse
order.
- ::struct::list assign sequence ?varname?...
-
The subcommand assigns the first n elements of the input
sequence to the zero or more variables whose names were listed
after the sequence, where n is the number of specified
variables.
If there are more variables specified than there are elements in the
sequence the empty string will be assigned to the superfluous
variables.
If there are more elements in the sequence than variable names
specified the subcommand returns a list containing the unassigned
elements. Else an empty list is returned.
|
tclsh> ::struct::list assign {a b c d e} foo bar
c d e
tclsh> set foo
a
tclsh> set bar
b
|
- ::struct::list flatten ?-full? ?--? sequence
-
The subcommand takes a single sequence and returns a new
sequence where one level of nesting was removed from the input
sequence. In other words, the sublists in the input sequence are
replaced by their elements.
The subcommand will remove any nesting it finds if the option
-full is specified.
|
tclsh> ::struct::list flatten {1 2 3 {4 5} {6 7} {{8 9}} 10}
1 2 3 4 5 6 7 {8 9} 10
tclsh> ::struct::list flatten -full {1 2 3 {4 5} {6 7} {{8 9}} 10}
1 2 3 4 5 6 7 8 9 10
|
- ::struct::list map sequence cmdprefix
-
The subcommand takes a sequence to operate on and a command
prefix (cmdprefix) specifying an operation, applies the command
prefix to each element of the sequence and returns a sequence
consisting of the results of that application.
The command prefix will be evaluated with a single word appended to
it. The evaluation takes place in the context of the caller of the
subcommand.
|
tclsh> # squaring all elements in a list
tclsh> proc sqr {x} {expr {$x*$x}}
tclsh> ::struct::list map {1 2 3 4 5} sqr
1 4 9 16 25
tclsh> # Retrieving the second column from a matrix
tclsh> # given as list of lists.
tclsh> proc projection {n list} {::lindex $list $n}
tclsh> ::struct::list map {{a b c} {1 2 3} {d f g}} {projection 1}
b 2 f
|
- ::struct::list fold sequence initialvalue cmdprefix
-
The subcommand takes a sequence to operate on, an arbitrary
string initial value and a command prefix (cmdprefix)
specifying an operation.
The command prefix will be evaluated with two words appended to
it. The second of these words will always be an element of the
sequence. The evaluation takes place in the context of the caller of
the subcommand.
It then reduces the sequence into a single value through repeated
application of the command prefix and returns that value. This
reduction is done by
- 1
-
Application of the command to the initial value and the first element
of the list.
- 2
-
Application of the command to the result of the last call and the
second element of the list.
- ...
-
- i
-
Application of the command to the result of the last call and the
i'th element of the list.
- ...
-
- end
-
Application of the command to the result of the last call and the last
element of the list. The result of this call is returned as the result
of the subcommand.
|
tclsh> # summing the elements in a list.
tclsh> proc + {a b} {expr {$a + $b}}
tclsh> ::struct::list fold {1 2 3 4 5} 0 +
15
|
- ::struct::list iota n
-
The subcommand returns a list containing the integer numbers
in the range [0,n). The element at index i
of the list contain the number i.
For "n == 0" an empty list will be returned.
- ::struct::list equal a b
-
The subcommand compares the two lists a and b for
equality. In other words, they have to be of the same length and have
to contain the same elements in the same order. If an element is a
list the same definition of equality applies recursively.
A boolean value will be returned as the result of the command.
This value will be true if the two lists are equal, and
false else.
- ::struct::list repeat value size...
-
The subcommand creates a (nested) list containing the value in
all positions. The exact size and degree of nesting is determined by
the size arguments, all of which have to be integer numbers
greater than or equal to zero.
A single argument size which is a list of more than one element
will be treated as if more than argument size was specified.
If only one argument size is present the returned list will not
be nested, of length size and contain value in all
positions.
If more than one size argument is present the returned
list will be nested, and of the length specified by the last
size argument given to it. The elements of that list
are defined as the result of Repeat for the same arguments,
but with the last size value removed.
An empty list will be returned if no size arguments are present.
|
tclsh> ::struct::list repeat 0 3 4
{0 0 0} {0 0 0} {0 0 0} {0 0 0}
tclsh> ::struct::list repeat 0 {3 4}
{0 0 0} {0 0 0} {0 0 0} {0 0 0}
tclsh> ::struct::list repeat 0 {3 4 5}
{{0 0 0} {0 0 0} {0 0 0} {0 0 0}} {{0 0 0} {0 0 0} {0 0 0} {0 0 0}} {{0 0 0} {0 0 0} {0 0 0} {0 0 0}} {{0 0 0} {0 0 0} {0 0 0} {0 0 0}} {{0 0 0} {0 0 0} {0 0 0} {0 0 0}}
|
- ::struct::list dbJoin ?-inner|-left|-right|-full? ?-keys varname? {keycol table}...
-
The method performs a table join according to relational algebra. The
execution of any of the possible outer join operation is triggered by
the presence of either option -left, -right, or
-full. If none of these options is present a regular inner
join will be performed. This can also be triggered by specifying
-inner. The various possible join operations are explained in
detail in section TABLE JOIN.
If the -keys is present its argument is the name of a
variable to store the full list of found keys into. Depending on the
exact nature of the input table and the join mode the output table may
not contain all the keys by default. In such a case the caller can
declare a variable for this information and then insert it into the
output table on its own, as she will have more information about the
placement than this command.
What is left to explain is the format of the arguments.
The keycol arguments are the indices of the columns in the
tables which contain the key values to use for the joining. Each
argument applies to the table following immediately after it. The
columns are counted from 0, which references the first
column. The table associated with the column index has to have at
least keycol+1 columns. An error will be thrown if there are
less.
The table arguments represent a table or matrix of rows and
columns of values. We use the same representation as generated and
consumed by the methods get rect and set rect of
matrix objects. In other words, each argument is a list,
representing the whole matrix. Its elements are lists too, each
representing a single rows of the matrix. The elements of the
row-lists are the column values.
The table resulting from the join operation is returned as the result
of the command. We use the same representation as described above for
the input tables.
- ::struct::list dbJoinKeyed ?-inner|-left|-right|-full? ?-keys varname? table...
-
The operations performed by this method are the same as described
above for dbJoin. The only difference is in the specification
of the keys to use. Instead of using column indices separate from the
table here the keys are provided within the table itself. The row
elements in each table are not the lists of column values, but a
two-element list where the second element is the regular list of
column values and the first element is the key to use.
The longestCommonSubsequence subcommand forms the core of a
flexible system for doing differential comparisons of files, similar
to the capability offered by the Unix command diff.
While this procedure is quite rapid for many tasks of file comparison,
its performance degrades severely if sequence2 contains many
equal elements (as, for instance, when using this procedure to compare
two files, a quarter of whose lines are blank. This drawback is
intrinsic to the algorithm used (see the Reference for details).
One approach to dealing with the performance problem that is sometimes
effective in practice is arbitrarily to exclude elements that appear
more than a certain number of times.
This number is provided as the maxOccurs parameter. If frequent
lines are excluded in this manner, they will not appear in the common
subsequence that is computed; the result will be the longest common
subsequence of infrequent elements.
The procedure longestCommonSubsequence2 implements this
heuristic.
It functions as a wrapper around longestCommonSubsequence; it
computes the longest common subsequence of infrequent elements, and
then subdivides the subsequences that lie between the matches to
approximate the true longest common subsequence.
This is an operation from relational algebra for relational databases.
The easiest way to understand the regular inner join is that it
creates the cartesian product of all the tables involved first and
then keeps only all those rows in the resulting table for which the
values in the specified key columns are equal to each other.
Implementing this description naively, i.e. as described above will
generate a huge intermediate result. To avoid this the
cartesian product and the filtering of row are done at the same
time. What is required is a fast way to determine if a key is present
in a table. In a true database this is done through indices. Here we
use arrays internally.
An outer join is an extension of the inner join for two
tables. There are three variants of outerjoins, called left,
right, and full outer joins. Their result always
contains all rows from an inner join and then some additional rows.
-
For the left outer join the additional rows are all rows from the left
table for which there is no key in the right table. They are joined to
an empty row of the right table to fit them into the result.
-
For the right outer join the additional rows are all rows from the right
table for which there is no key in the left table. They are joined to
an empty row of the left table to fit them into the result.
-
The full outer join combines both left and right outer join. In other
words, the additional rows are as defined for left outer join, and
right outer join, combined.
We extend all the joins from two to n tables (n > 2) by
executing
|
(...((table1 join table2) join table3) ...) join tableN
|
Examples for all the joins:
|
Inner Join
{0 foo} {0 bagel} {0 foo 0 bagel}
{1 snarf} inner join {1 snatz} = {1 snarf 1 snatz}
{2 blue} {3 driver}
Left Outer Join
{0 foo} {0 bagel} {0 foo 0 bagel}
{1 snarf} left outer join {1 snatz} = {1 snarf 1 snatz}
{2 blue} {3 driver} {2 blue {} {}}
Right Outer Join
{0 foo} {0 bagel} {0 foo 0 bagel}
{1 snarf} right outer join {1 snatz} = {1 snarf 1 snatz}
{2 blue} {3 driver} {{} {} 3 driver}
Full Outer Join
{0 foo} {0 bagel} {0 foo 0 bagel}
{1 snarf} full outer join {1 snatz} = {1 snarf 1 snatz}
{2 blue} {3 driver} {2 blue {} {}}
{{} {} 3 driver}
|
J. W. Hunt and M. D. McIlroy, "An algorithm for differential
file comparison," Comp. Sci. Tech. Rep. #41, Bell Telephone
Laboratories (1976). Available on the Web at the second
author's personal site: http://www.cs.dartmouth.edu/~doug/
assign, common, comparison, diff, differential, equal, equality, flatten, folding, full outer join, inner join, join, left outer join, list, longest common subsequence, map, outer join, reduce, repeating, repetition, reverse, right outer join, subsequence
Copyright © 2003 by Kevin B. Kenny. All rights reserved
Copyright © 2003 Andreas Kupries <andreas_kupries@users.sourceforge.net>