MetaKit for Tcl

The structured database which fits in the palm of your hand

[ Overview | Terminology | Installation | Getting started | Mk4tcl Reference ]

Buzzwords - MetaKit is an embeddable database which runs on Unix, Windows, Macintosh, and other platforms. It lets you build applications which store their data efficiently, in a portable way, and which will not need a complex runtime installation. In terms of the data model, MetaKit takes the middle ground between RDBMS, OODBMS, and flat-file databases - yet it is quite different from each of them.

Technology - Everything is stored variable-sized yet with efficient positional row access. Changing an existing datafile structure is as simple as re-opening it with that new structure. All changes are transacted. You can mix and match software written in C++, Python, and Tcl. Things can't get much more flexible...

Tcl/Tk - The extension for Tcl is called "Mk4tcl". It is being used in a number of commercial projects, for in-house use as well as in commercially distributed products.

Mk4tcl 2.4.9.2 - is a final/production release. The homepage points to a download area with pre-compiled shared libraries for Unix, Windows, and Macintosh. The MetaKit source distribution includes this documentation, the Mk4tcl C++ source code, a small Tcl test suite, a "mkshow.tcl" utility which lets you examine data in any MetaKit datafile from the command line, and a few more goodies.

Changes since 2.01 - the MK core has changed substantially:

New commit-aside and commit-extend modes (see the mk::file command)
Performance improvements, mostly due to a much more scalable file format
The "M" (memo) datatype is gone; use "B" instead, which now handles huge items
Internal changes to take advantage of the new hash custom viewer
Added "mk::file autocommit db" to force a commit on a subsequent close.

License and support - MetaKit 2 and up are distributed under the liberal X/MIT-style open source license. Commercial support is available through an Enterprise License. See the license page for details.

Credits - Are due to Mark Roseman for providing the initial incentive and feedback, and to Matt Newman for a range of suggestions and ideas. Evidently, Mk4tcl could not exist without the Tcl/Tk scripting platform and its superb extensibility.

Updates - The latest version of this document is at http://www.equi4.com/metakit/tcl.html.

Overview

MetaKit is a machine- and language-independent toolkit for storing and managing structured data. This is a description of the Mk4tcl extension, which allows you to create, access, and manipulate MetaKit datafiles using Tcl. Here is a Tcl script which selects, sorts, and displays some previously stored results:

    mk::file open db phonebook.dat -readonly
    foreach i [mk::select db.persons -glob name "Jon*" -sort date] {
        puts "Found [mk::get db.persons!$i name phone date]"
    }

This script illustrates how easy it is to access stored data from Tcl. What it does not show, however, is that numeric data can be stored in binary format (yet remain fully portable), that datafiles can contain complex (nested) datastructures, that the structure of datafiles can be adjusted at any time, and that all modifications use the commit / rollback transaction model.

In actual use, MetaKit more closely resembles an array manipulation package than a database, with the main access mechanism being 'by position', not by primary key. The Tcl interface does not yet cover all operations provided by the complete C++ interface of MetaKit, but as the mk::select command illustrates, it does include quite flexible forms of searching and sorting.

Terminology

There are several ways to say the same thing, depending on where you're coming from. For example, the terms table, list, collection, array, sequence, and vector all denote a more or less similar concept. To help avoid confusion, MetaKit uses a simple (but hopefully precise) terminology.

The terms adopted by MetaKit can be summarized as follows:

A view is an indexable collection of rows (a table of records, an array of elements).
An index is a position in a view, used to specify a row (the first row is at index zero).
Each view has an ordered set of properties, used to refer to the data values of each row.
In MetaKit, each (view, index, property) combination denotes a single data value.
A different way to describe this combination would be: (matrix, row-index, column-id).
Data values can be strings, numeric, untyped data, or a nested view, called a subview.
A cursor is a reference to a specific row in a specific view, i.e. a (view, index) tuple.

The Mk4tcl extension adds several notational conventions:

A tag is an identifier used to refer to an open datafile.
Top-level views are specified as tag.viewname.
Row N in such a view can be specified as tag.viewname!N.
Subviews extend this notation, e.g. tag.viewname!N.subview.
Sub-rows continue in the same way, e.g. tag.viewname!N.subview!M.
The specification of a view (either top-level or subview) is called a path.
Thus, both tag.viewname and tag.viewname!N.subview are paths.
In Mk4tcl, a cursor placed at the Nth row is equivalent to the string "path!N".
A trailing row index is allowed and ignored wherever a path is expected.
As a result, cursors are allowed (and frequently used) as path arguments.
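
The notation above can be tried out with a short sketch. It assumes Mk4tcl is installed and uses a temporary in-memory dataset (no filename); the view and property names are only illustrative:

```tcl
package require Mk4tcl

# No filename: a temporary in-memory dataset, so nothing is written to disk.
mk::file open db
mk::view layout db.book {name {phones {category phone}}}

mk::row append db.book name "Steve"
mk::row append db.book!0.phones category "home"   phone "1234567"
mk::row append db.book!0.phones category "mobile" phone "2345678"

set topRows [mk::view size db.book]             ;# path: tag.viewname
set subRows [mk::view size db.book!0.phones]    ;# path: tag.viewname!N.subview
set mobile  [mk::get db.book!0.phones!1 phone]  ;# cursor: path!M

# A trailing row index is ignored where a path is expected,
# so a cursor can stand in for the view it points into.
set sameRows [mk::view size db.book!0]

puts "$topRows entry, $subRows phones, mobile: $mobile"
mk::file close db
```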

A few more comments about the semantics of MetaKit:

Views are homogeneous: each row in a view contains the same type of information.
This also implies that all subviews within the same view always have the same structure.
Rows are either part of a view on file, or temporary (gone when no longer referenced).
A cursor need not point to an existing row (its current position may be out of range).

Installation

Download the latest version from http://www.equi4.com/pub/download.html
On Unix, rename the appropriate compiled extension to "Mk4tcl.so" (on Win/Mac, use the corresponding file)
Do a small test, by running "demo.tcl". If all is well, you should get some self-explanatory output
Place the extension somewhere on Tcl's package search path (or just leave it in ".")

Getting started

Create a datafile:

package require Mk4tcl
mk::file open db datafile.mk

Create a view (this is the MetaKit term for "table"):

set vw [mk::view layout db.people {first last shoesize:I}]

Add two rows (this is the MetaKit term for "record"):

mk::row append $vw first "John" last "Lennon" shoesize 44
mk::row append $vw first "Flash" last "Gordon" shoesize 42

Commit the changes to file:

mk::file commit db

Show a list of all people:

mk::loop c $vw {puts [mk::get $c first last shoesize]}

Show a list of all people, sorted by last name:

foreach r [mk::select $vw -sort last] {puts [mk::get $vw!$r]}

Show a list of all people with first name 'John':

foreach r [mk::select $vw first "John"] {puts [mk::get $vw!$r]}

Mk4tcl Reference

mk::file		Opening, closing, and saving datafiles
mk::view		View structure and size operations
mk::cursor		Cursor variables for positioning
mk::row		Create, insert, and delete rows
mk::get		Fetch values
mk::set		Store values
mk::loop		Iterate over the rows of a view
mk::select		Selection and sorting
mk::channel		Channel interface (new in 1.2)

mk::file

Opening, closing, and saving datafiles

SYNOPSIS

mk::file open
mk::file open tag
mk::file open tag filename ?-readonly? ?-nocommit? ?-extend? ?-shared?
mk::file views tag
mk::file close tag
mk::file commit tag ?-full?
mk::file rollback tag ?-full?
mk::file load tag channel
mk::file save tag channel
mk::file aside tag tag2
mk::file autocommit tag

DESCRIPTION

The mk::file command is used to open and close MetaKit datafiles. It is also used to force pending changes to disk (commit), to cancel the last changes (rollback), and to send/receive the entire contents of a datafile over a Tcl channel, including sockets (load/save).

Without arguments, 'mk::file open' returns the list of tags and filenames of all datasets which are currently open (of the form tag1 name1 tag2 name2 ...).

The 'mk::file open' command associates a datafile with a unique symbolic tag. A tag must consist of alphanumeric characters, and is used in the other commands to refer to a specific open datafile. If filename is omitted, a temporary in-memory dataset is created (which cannot use commit, but which you could save to an I/O channel). When a datafile is closed, all pending changes will be written to file, unless the -nocommit option is specified. In that case, only an explicit commit will save changes. To open a file only for reading, use the -readonly option. Datafiles can be opened read-only by any number of readers, or by a single writer (no other combinations are allowed). There is an additional mode, specified by the -extend option: in this case changes are always written at the end of the datafile. This allows modifications by one writer without affecting readers. Readers can adjust to new changes made that way by doing a "rollback" (see below). The term is slightly confusing in this case, since it really is a "roll-forward" ... The -shared option causes an open datafile to be visible in every Tcl interpreter, with thread locking as needed. The datafile is still tied to the current interpreter and will be closed when that interpreter is terminated.

The 'mk::file views' command returns a list with the views currently defined in the open datafile associated with tag. You can use the 'mk::view layout' command to determine the current structure of each view.

The 'mk::file close' command closes the datafile and releases all associated resources. If not opened with -readonly or -nocommit, all pending changes will be saved to file before closing it. A tag loses its special meaning after the corresponding datafile has been closed.

The 'mk::file commit' command flushes all pending changes to disk. It should not be used on a file opened with the -readonly option. The optional -full argument is only useful when a commit-aside is active (see below). In that case, changes are merged back into the main datafile instead of being saved separately. The aside dataset is cleared.

The 'mk::file rollback' command cancels all pending changes and reverts the situation to match what was last stored on file. When commit-aside is active, a full rollback causes the state to be rolled back to what it was without the aside changes. The aside dataset will be ignored from then on.

The 'mk::file load' command replaces all views with data read from any Tcl channel. This data must have been generated using 'mk::file save'. Changes are made permanent when commit is called (explicitly or implicitly, when a datafile is closed), or they can be reverted by calling rollback.

The 'mk::file aside' command starts a special "commit-aside" mode, whereby changes are saved to a second database file. This can be much faster than standard commits, because only changes are saved. In commit-aside mode, the main datafile will not be modified at all; in fact, it can be opened in read-only mode.

The 'mk::file autocommit' command sets up a database file to automatically issue a commit when the file is closed later. This is useful if the file was initially opened in -nocommit mode, but you now want to change this setting (there is no way to return to -nocommit, although a rollback has a similar effect).

EXAMPLES

Open a datafile (create it if necessary), for read-write access:

    mk::file open db test.dat

Display the structure of every view in the datafile:

    foreach v [mk::file views db] {
        puts [mk::view layout db.$v]
    }

Send all data across a TCP/IP socket connection:

    set chan [socket 127.0.0.1 12345]
    mk::file save db $chan
    close $chan
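
A sketch of how -nocommit, an explicit commit, and a rollback might be combined. The scratch file name rollback-demo.mk is made up for illustration, and the script assumes the current directory is writable:

```tcl
package require Mk4tcl

# Hypothetical scratch file; remove any leftover from an earlier run.
set fname [file join [pwd] rollback-demo.mk]
file delete $fname

# With -nocommit, closing the file does not save; only explicit commits do.
mk::file open db $fname -nocommit
mk::view layout db.log {msg}
mk::row append db.log msg "kept"
mk::file commit db                      ;# layout and first row are saved

mk::row append db.log msg "discarded"
set before [mk::view size db.log]       ;# row count before the rollback
mk::file rollback db                    ;# revert to the last committed state
set after [mk::view size db.log]        ;# row count after the rollback

puts "rows before rollback: $before, after: $after"
mk::file close db
file delete $fname
```

Because the second row was never committed, the rollback is expected to drop it while the committed row survives.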

mk::view

View structure and size operations

SYNOPSIS

mk::view layout tag.view
mk::view layout tag.view {structure}
mk::view delete tag.view
mk::view size path
mk::view size path size
mk::view info path

DESCRIPTION

The mk::view command is used to query or alter the structure of a view in a datafile (layout, delete), as well as the number of rows it contains (size). The last command (info) returns the list of properties currently defined for a view.

The 'mk::view layout' command returns a description of the current datastructure of tag.view. If a structure is specified, the current data is restructured to match that, by adding new properties with a default value, deleting obsolete ones, and reordering them.

Structure definitions consist of a list of properties. Subviews are specified as a sublist of two entries: the name and the list of properties in that subview. Note that subviews add two levels of nesting (see phones in the phonebook example below). The type of a property is specified by appending a suffix to the property name (the default type is string):

:S: A string property for storing strings of any size, but no null bytes.
:I: An integer property for efficiently storing values as integers (1..32 bits).
:L: A long property for storing values as 64-bit integers.
:F: A float property for storing single-precision floating point values (32 bits).
:D: A double property for storing double-precision floating point values (64 bits).
:B: A binary property for untyped binary data (including null bytes).
:M: Obsolete (now treated as :B).

Properties which are not listed in the layout will only remain set while the datafile is open, but will not be stored. To make properties persist, you must list them in the layout definition, and do so before setting them.
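
As an illustration of typed properties, the following sketch defines a view with a string, an integer, and a double property in an in-memory dataset. The view and property names are invented for the example:

```tcl
package require Mk4tcl

mk::file open db                                ;# in-memory dataset
mk::view layout db.people {name age:I height:D} ;# name defaults to a string

# Only name is set here, so age holds the integer default.
mk::row append db.people name "Ann"
set defaultAge [mk::get db.people!0 age]

# Fill in the typed properties and read them back in layout order.
mk::set db.people!0 age 30 height 1.68
set row [mk::get db.people!0 name age height]

puts "default age: $defaultAge, row: $row"
mk::file close db
```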

The 'mk::view delete' command completely removes a view and all the data it contains from a datafile.

The 'mk::view size' command returns the number of rows contained in the view identified by path. If a size argument is specified, the size of the view is adjusted accordingly, dropping the highest rows if the size is decreased or adding new empty ones if the size is increased. The command 'mk::view size path 0' deletes all rows from a view, but keeps the view in the datafile so rows can be added again later (unlike 'mk::view delete').

The 'mk::view info' command returns the list of properties which are currently defined for path.

Note that the layout and delete sub-commands operate only on top-level views (of the form tag.view), whereas size and info take a path as arguments, which is either a top-level view or a nested subview (of the form 'tag.view!index.subview!subindex...etc...subview').
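
The size and info sub-commands can be exercised as follows. This is a minimal sketch on an in-memory dataset with an invented view db.v; the exact text returned by 'mk::view info' is simply printed rather than assumed:

```tcl
package require Mk4tcl

mk::file open db
mk::view layout db.v {name count:I}

mk::view size db.v 5                ;# grow: five empty rows
set grown [mk::view size db.v]

mk::view size db.v 2                ;# shrink: the highest rows are dropped
set shrunk [mk::view size db.v]

mk::view size db.v 0                ;# remove all rows, keep the view itself
set emptied [mk::view size db.v]
set views [mk::file views db]       ;# the view is still defined

puts "sizes: $grown $shrunk $emptied, views: $views"
puts "properties: [mk::view info db.v]"
mk::file close db
```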

EXAMPLES

Define a phonebook view which can store more than one phone number for each person:

    mk::view layout db.book {name address {phones {category phone}}}

Add a new phonebook entry:

    mk::row append db.book name "Steve" address "Down-under"

Add two phone numbers to phone book entry zero, i.e. "Steve":

    mk::row append db.book!0.phones category "home" phone "1234567"
    mk::row append db.book!0.phones category "mobile" phone "2345678"

Restructure the view in the datafile, adding an integer date field:

    mk::view layout db.book {name address {phones {category phone}} date:I}

Delete all phonebook entries as well as its definition from the datafile:

    mk::view delete db.book

mk::cursor

Cursor variables for positioning

SYNOPSIS

mk::cursor create cursorName ?path? ?index?
mk::cursor position cursorName
mk::cursor position cursorName 0
mk::cursor position cursorName end
mk::cursor position cursorName index
mk::cursor incr cursorName ?step?

DESCRIPTION

The mk::cursor command is used to manipulate 'cursor variables', which offer an efficient means of iterating and repositioning a 'reference to a row in a view'. Though cursors are equivalent to strings of the form somepath!N, it is much more efficient to keep a cursor around in a variable and to adjust it (using the position subcommand) than to evaluate a 'somepath!$index' expression every time a cursor is expected.

The 'mk::cursor create' command defines (or redefines) a cursor variable. The index argument defaults to zero. This is a convenience function, since 'mk::cursor create X somePath N' is equivalent to 'set X somePath!N'.

When both path and index arguments are omitted from the 'mk::cursor create' command, a cursor pointing to an empty temporary view is created, which can be used as buffer for data not stored on file.

The 'mk::cursor position' command returns the current position of a cursor, i.e. the 0-based index of the row it is pointing to. If an extra argument is specified, the cursor position will be adjusted accordingly. The 'end' pseudo-position is the index of the last row (or -1 if the view is currently empty). Note that if 'X' is a cursor equivalent to somePath!N, then 'mk::cursor position X M' is equivalent to the far less efficient 'set X somePath!M'.

The 'mk::cursor incr' command adjusts the current position of a cursor with a specified relative step, which can be positive as well as negative. If step is zero, then this command does nothing. The command 'mk::cursor incr X N' is equivalent to 'mk::cursor position X [expr {[mk::cursor position X] + N}]'.
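
A short sketch of the cursor sub-commands described above, using an in-memory dataset and an invented view db.v. Note that the sub-commands take the name of the cursor variable (c), while commands such as mk::get take its value ($c):

```tcl
package require Mk4tcl

mk::file open db
mk::view layout db.v {name}
foreach n {zero one two three} { mk::row append db.v name $n }

mk::cursor create c db.v 1          ;# cursor variable c, pointing at row 1
set atOne [mk::get $c name]

mk::cursor incr c 2                 ;# relative move: row 1 -> row 3
set pos [mk::cursor position c]

mk::cursor position c end           ;# absolute move to the last row
set atEnd [mk::get $c name]

mk::cursor incr c -3                ;# negative steps move backwards
set atZero [mk::get $c name]

puts "$atOne, position $pos, $atEnd, $atZero"
mk::file close db
```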

mk::row

Create, insert, and delete rows

SYNOPSIS

mk::row create ?prop value ...?
mk::row append path ?prop value ...?
mk::row insert cursor count ?cursor2?
mk::row delete cursor ?count?
mk::row replace cursor ?cursor2?

DESCRIPTION

The mk::row command deals with one or more rows of information. There is a command to allocate a temporary row which is not part of any datafile (create), and the usual set of container operations: appending, inserting, deleting, and replacing rows.

The 'mk::row create' command creates an empty temporary row, which is not stored in any datafile. Each temporary row starts out without any properties. Setting a property in a row will implicitly add that property if necessary. The return value is a unique cursor, pointing to this temporary row. The row (and all data stored in it) will cease to exist when no cursor references to it remain.

The 'mk::row append' command extends the view with a new row, optionally setting some properties in it to the specified values.

The 'mk::row insert' command is similar to the append sub-command, inserting the new row in a specified position instead of at the end. The count argument can be used to efficiently insert multiple copies of a row.

The 'mk::row delete' command deletes one or more rows from a view, starting at the row pointed to by cursor.

The 'mk::row replace' command replaces one row with a copy of another one, or clears its contents if cursor2 is not specified.

EXAMPLES

Define a cursor pointing to a new empty row:

    set cursor [mk::row create]

Initialize a temporary view with 100 copies of the string "Hello":

    mk::cursor create cursor 
    mk::row insert $cursor 100 [mk::row create text "Hello"]
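
Insert, delete, and replace rows in a stored view. This is a sketch on an in-memory dataset; the view db.v and the helper proc texts are made up for the example:

```tcl
package require Mk4tcl

mk::file open db
mk::view layout db.v {text}
foreach t {a b c d} { mk::row append db.v text $t }

# Hypothetical helper: collect the text property of every row.
proc texts {} {
    set out {}
    mk::loop c db.v { lappend out [mk::get $c text] }
    return $out
}

# Insert two copies of a temporary row at index 1.
mk::row insert db.v!1 2 [mk::row create text "new"]
set afterInsert [texts]

# Delete those two rows again, starting at index 1.
mk::row delete db.v!1 2
set afterDelete [texts]

# Replace row 0 with a copy of another (temporary) row.
mk::row replace db.v!0 [mk::row create text "z"]
set afterReplace [texts]

puts "$afterInsert | $afterDelete | $afterReplace"
mk::file close db
```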

mk::get

Fetch values

SYNOPSIS

mk::get cursor ?-size?
mk::get cursor ?-size? prop ...

DESCRIPTION

The mk::get command fetches values from the row specified by cursor.

Without property arguments, get returns a list of 'prop1 value1 prop2 value2 ...'. This format is most convenient for setting an array variable, as the following example illustrates:

    array set v [mk::get db.phonebook!0]
    parray v

Note that the cursor argument can be the value of a cursor variable, or it can be synthesized on the spot, as in the above example.

If the -size option is specified, the size of property values is returned instead of their contents. This is normally in bytes, but for integers it can be a negative value indicating the number of bits used to store ints (-1, -2, or -4). This is an efficient way to determine the sizes of property values without fetching them.

If arguments are specified in the get command, they are interpreted as property names and a list will be returned containing the values of these properties in the specified order.

If cursor does not point to a valid row, default values are returned instead (no properties, and empty strings or numeric zeros, according to the property types).

EXAMPLES

Set up an array containing all the fields in the third row:

    array set fields [mk::get db.phonebook!2]

Create a line with some formatted fields:

    puts [eval [list format {%-20s %d}] [mk::get db.phonebook!2 name date]]
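
Compare the different forms of mk::get. This is a sketch on an in-memory dataset with invented names; the -size result is printed as is, since its values depend on how MetaKit stores each property:

```tcl
package require Mk4tcl

mk::file open db
mk::view layout db.people {name age:I}
mk::row append db.people name "Ann Lee" age 30

# No property names: alternating names and values, ready for 'array set'.
array set rec [mk::get db.people!0]

# Several property names: values only, in the order requested.
set values [mk::get db.people!0 age name]

# -size: stored sizes instead of contents.
set sizes [mk::get db.people!0 -size name age]

puts "rec(name)=$rec(name) rec(age)=$rec(age)"
puts "values: $values"
puts "sizes:  $sizes"
mk::file close db
```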

mk::set

Store values

SYNOPSIS

mk::set cursor ?prop value ...?

DESCRIPTION

The mk::set command stores values into the row specified by cursor.

If a property is specified which does not exist, it will be appended as a new definition for the containing view. As an important side effect, all other rows in this view will now also have such a property, with an appropriate default value for the property. Note that when new properties are defined in this way, they will be created as string properties unless qualified by a type suffix (see 'mk::view layout' for details on property types and their default values).

Using the mk::set command without specifying properties returns the current values and is identical to mk::get.

If cursor points to a non-existent row past the end of the view, an appropriate number of empty rows will be inserted first.
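
The side effects described above can be observed with a small sketch (in-memory dataset, invented view and property names):

```tcl
package require Mk4tcl

mk::file open db
mk::view layout db.v {name}
mk::row append db.v name "first"

# Overwrite a value in an existing row.
mk::set db.v!0 name "renamed"
set renamed [mk::get db.v!0 name]

# Setting a row past the end first pads the view with empty rows.
mk::set db.v!3 name "fourth"
set size   [mk::view size db.v]
set padded [mk::get db.v!2 name]

# An unknown property is added to the whole view (string type by default),
# so other rows get its default value.
mk::set db.v!0 nick "Al"
set otherNick [mk::get db.v!3 nick]

puts "size=$size padded='$padded' otherNick='$otherNick'"
mk::file close db
```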

mk::loop

Iterate over the rows of a view

SYNOPSIS

mk::loop cursorName {body}
mk::loop cursorName path {body}
mk::loop cursorName path first ?limit? ?step? {body}

DESCRIPTION

The mk::loop command offers a convenient way to iterate over the rows of a view. Iteration can be restricted to a certain range, and can optionally use a forward or backward step. This is a convenience function which is more efficient than performing explicit iteration over an index and positioning a cursor.

When called with just a path argument, the loop will iterate over all the rows in the corresponding view. The cursorName loop variable will be set (or reset) on each iteration, and is created if it did not yet exist.

When path is not specified, the cursorName variable must exist and be a valid cursor, although its current position will be ignored. The command 'mk::loop X {...}' is identical to 'mk::loop X $X {...}'.

The first argument specifies the first index position to use (default 0), the limit argument specifies the index position at which iteration stops (default 'end'), and the step argument specifies the increment (default 1). If step is negative and limit exceeds first, then the loop body will never be executed. A zero step value can lead to infinite looping unless the break command is called inside the loop.

The first, limit, and step arguments may be arbitrary integer expressions and are evaluated exactly once when the loop is entered.

Note that you cannot easily use a loop to insert or delete rows, since changes to views do not adjust cursors pointing into that view. Instead, you can use tricks like moving backwards (for deletions), or splitting the work into two separate passes.
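
A sketch of typical loops, on an in-memory dataset with an invented view db.v. The deletion pass is written as a plain Tcl for loop that walks backwards, which is one way to apply the "moving backwards" trick without relying on cursor adjustment:

```tcl
package require Mk4tcl

mk::file open db
mk::view layout db.v {n:I}
for {set i 0} {$i < 6} {incr i} { mk::row append db.v n $i }

# Visit every row.
set all {}
mk::loop c db.v { lappend all [mk::get $c n] }

# Start at index 3 and continue to the end of the view.
set tail {}
mk::loop c db.v 3 { lappend tail [mk::get $c n] }

# Delete rows with even values, highest index first, so that the
# indices of rows still to be visited are not shifted by a deletion.
for {set i [expr {[mk::view size db.v] - 1}]} {$i >= 0} {incr i -1} {
    if {[mk::get db.v!$i n] % 2 == 0} { mk::row delete db.v!$i }
}
set odd {}
mk::loop c db.v { lappend odd [mk::get $c n] }

puts "all: $all / tail: $tail / odd: $odd"
mk::file close db
```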

mk::select

Selection and sorting

SYNOPSIS

mk::select path ?options ...?

DESCRIPTION

The mk::select command combines a flexible selection operation with a way to sort the resulting set of rows. The result is a list of row index numbers (possibly empty), which can be used to reposition a cursor and to address rows directly.

A selection is specified using any combination of these criteria:

prop value: Numeric or case-insensitive match
-min prop value: Property must be greater or equal to value (case is ignored)
-max prop value: Property must be less or equal to value (case is ignored)
-exact prop value: Exact case-sensitive string match
-glob prop pattern: Match "glob-style" expression wildcard
-globnc prop pattern: Match "glob-style" expression, ignoring case
-regexp prop pattern: Match specified regular expression
-keyword prop word: Match word as free text or partial prefix

If multiple criteria are specified, then selection succeeds only if all criteria are satisfied. If prop is a list, selection succeeds if any of the given properties satisfies the corresponding match.

Optional selection constraints:

-first pos: Selection starts at specified row index
-count num: Return no more than this many results

Note: these constraints are not yet very useful in combination with sorting, because sorting is done after the constraints have been applied.

To sort the set of rows (with or without preliminary selection), use:

-sort prop
-sort {prop ...}: Sort on one or more properties, ascending
-rsort prop
-rsort {prop ...}: Sort on one or more properties, descending

Multiple sort options are combined in the order given.

EXAMPLES

Select a range of entries:

    foreach i [mk::select db.phonebook -min date 19980101 -max date 19980131] {
        puts "Dated Jan 1998: [mk::get db.phonebook!$i name]"
    }

Search for a unique match ('-count 2' speeds up selection when many entries match):

    set v [mk::select db.phonebook -count 2 -glob name "John*"]
    switch [llength $v] {
        0       {puts "not found"}
        1       {puts "found: [mk::get db.phonebook![lindex $v 0] name]"}
        2       {puts "there is more than one entry matching 'John*'"}
    }

Sort by descending date and by ascending name:

    foreach i [mk::select db.phonebook -rsort date -sort name] {
        puts "Change log: [mk::get db.phonebook!$i date name]"
    }
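
Compare a few match criteria on the same data. This is a sketch on an in-memory dataset; the rows and names are invented for illustration:

```tcl
package require Mk4tcl

mk::file open db
mk::view layout db.people {first last shoesize:I}
mk::row append db.people first "John"  last "Lennon" shoesize 44
mk::row append db.people first "Flash" last "Gordon" shoesize 42
mk::row append db.people first "JOHN"  last "Smith"  shoesize 40

# Plain 'prop value' ignores case; -exact does not.
set anyCase [mk::select db.people first "john"]
set exact   [mk::select db.people -exact first "John"]

# Combine a numeric lower bound with sorting on last name.
set big [mk::select db.people -min shoesize 42 -sort last]

# Stop after the first match.
set limited [mk::select db.people -count 1 -glob first "J*"]

puts "anyCase: $anyCase / exact: $exact / big: $big / limited: $limited"
mk::file close db
```

Each result is a list of row indices into db.people, so an entry N can be addressed as db.people!N.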

mk::channel

Channel interface

SYNOPSIS

mk::channel path prop ?mode?

DESCRIPTION

The mk::channel command provides a channel interface to binary fields. It needs the path of a row and the name of a binary prop, and returns a channel descriptor which can be used to read from or write to that field.

Channels are opened in one of three modes:

read - open for reading existing contents (default)
write - clear contents and start saving data
append - keep contents, set seek pointer to end

Note: do not insert or delete rows in a view within which there are open channels, because subsequent reads and writes may end up going to the wrong binary property.

EXAMPLES

Write a few values (with line separators):

    mk::view layout db.v {b:B}
    mk::view size db.v 1

    set fd [mk::channel db.v!0 b w]
    puts $fd one
    puts $fd two
    puts $fd three
    close $fd

Read values back, line by line:

    set fd [mk::channel db.v!0 b]
    while {[gets $fd text] >= 0} {
        puts $text
    }
    close $fd