MPbase can store any class and type of data a computer
can contain. Data classes include tabular, raster image, vector
image, spatial, textual, attribute-value pairs, etc. MPbase
does not treat all data classes the same. Each class is stored
in its own "natural" order, and that order differs from class
to class. Different approaches are therefore used to
"naturalize" different classes of data.
Besides differences in data class, there are differences in
data type. Data types include integer, text, floating-point
and binary, as well as several mixed forms. Within MPbase,
data typing can be either hard or soft. With hard typing, an
update fails if the data does not conform to the expected type.
With soft typing, any data can be put into any field or column.
Soft typing can be used to allow "raw" data streams
to feed a database directly. Editing can then be a second step,
which allows the data to be edited in context. It also lets the
rollback logs show the original data when required. With soft
typing, the storage efficiency for "out of type" data
is not very high. However, this is a small price to pay for the
enhanced editing and the ability to include or exclude
"out of type" data from queries.
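The difference between hard and soft typing can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not MPbase's interface: the `Column` class, the `TypeMismatch` exception and the `is_integer` check are all hypothetical names invented for the example.

```python
class TypeMismatch(Exception):
    """Raised when a hard-typed column rejects a value (hypothetical)."""


class Column:
    """Toy column that supports hard or soft typing (not MPbase's API)."""

    def __init__(self, name, conforms, hard=True):
        self.name = name
        self.conforms = conforms  # predicate: does the value fit the type?
        self.hard = hard
        self.values = []          # list of (value, in_type) pairs

    def update(self, value):
        in_type = self.conforms(value)
        if self.hard and not in_type:
            # Hard typing: the update fails outright.
            raise TypeMismatch(f"{value!r} does not fit column {self.name!r}")
        # Soft typing: keep the value, but remember whether it is in type.
        self.values.append((value, in_type))
        return in_type


def is_integer(value):
    """Return True if the value can be read as an integer."""
    try:
        int(value)
        return True
    except (TypeError, ValueError):
        return False


hard_qty = Column("qty", is_integer, hard=True)
soft_qty = Column("qty", is_integer, hard=False)

soft_qty.update("42")
soft_qty.update("n/a")  # accepted, flagged as out of type

try:
    hard_qty.update("n/a")
except TypeMismatch as exc:
    print("rejected:", exc)

# A query can include or exclude the out-of-type values.
in_type_only = [v for v, ok in soft_qty.values if ok]
out_of_type = [v for v, ok in soft_qty.values if not ok]
```

Keeping the in-type flag next to each stored value is one simple way to let a later editing pass find the raw entries that still need attention.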
The efficiency with which MPbase stores any one class and
type depends on the data's information content, or entropy.
This entropy value can vary widely within a database. However,
for any one class it will tend to average into a relatively small
range. As an example, for a scanned photo requiring lossless compression
one should expect compression of around 50%; for tabular data,
between 80% and 99.9%.
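To make the notion of entropy concrete, the sketch below estimates the Shannon entropy of a block of bytes and the saving an ideal coder could reach under that estimate. It is a generic textbook calculation, not MPbase's own measure; the function names are chosen for the example, and the byte-frequency (order-0) model ignores any ordering or context, so a real encoder may do better or worse.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the empirical entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def best_case_saving(data: bytes) -> float:
    """Fraction of space an ideal order-0 coder could save versus 8 bits per byte."""
    return 1.0 - shannon_entropy(data) / 8.0


# A column holding only a handful of distinct symbols has low entropy.
sample = b"NNNNSSEW" * 100
print(f"{shannon_entropy(sample):.2f} bits/byte, "
      f"{best_case_saving(sample):.0%} best-case saving")
```

Data that uses few distinct values, or uses them very unevenly, scores low on this measure and leaves the most room for compact storage.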
It is important to note that MPbase's compression works
significantly better on "real" data than on generated
test data. Random test values carry far more entropy than their
real-world counterparts, so they compress poorly. Part of the naturalization
process involves taking a statistically significant portion of
the real data and analyzing it. This allows MPbase to fine-tune
the type of encoding it will use to store the data. This encoding
may vary over time without any need to reformat the older data.
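The gap between generated and "real" data is easy to reproduce with any general-purpose compressor. The sketch below uses Python's standard `zlib` module purely as a stand-in, since MPbase's own encoding is not described here; the record layout and the city names are made up for the illustration.

```python
import random
import zlib


def saving(data: bytes) -> float:
    """Fraction of space saved by zlib at its highest compression level."""
    return 1.0 - len(zlib.compress(data, 9)) / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Generated test data: uniformly random bytes, close to maximum entropy.
generated = bytes(rng.randrange(256) for _ in range(20000))

# Stand-in for "real" data: records drawn from a small, repetitive vocabulary.
cities = ["Springfield", "Riverton", "Lakeside", "Fairview"]
real_like = "\n".join(
    f"{rng.randrange(1, 500)},{rng.choice(cities)},{rng.choice('NSEW')}"
    for _ in range(2000)
).encode("ascii")

print(f"generated: {saving(generated):.1%}")
print(f"real-like: {saving(real_like):.1%}")
```

The random bytes barely shrink at all, while the repetitive records shrink substantially, which is why benchmarks run on random test values tend to understate what can be achieved on production data.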
In addition to the standard types, MPbase can support special
custom classes and types. These types then become part of MPbase.
One example of such a special type is "address number."
This is a mostly numeric field. However, it also efficiently supports
n, s, e, w, -, / and any trailing letter. This means that "12w345-1/2"
is an "in type" value. In addition, this is a soft type,
so it accepts any "out of type" values as well.
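One possible reading of the "address number" rule can be written as a regular expression. The exact grammar MPbase uses is not given, so the pattern below is an assumption: at least one digit, a body made of digits plus n, s, e, w, - and /, and an optional single trailing letter. Accepting upper-case direction letters is also an assumption, and `classify` is a hypothetical helper name.

```python
import re

# Assumed grammar: a body of digits, n/s/e/w, "-" and "/" containing at
# least one digit, optionally followed by one trailing letter.
ADDRESS_NUMBER = re.compile(r"(?=.*\d)[0-9nsewNSEW/-]+[A-Za-z]?")


def classify(value: str):
    """Return (value, in_type). Soft typing: the value is kept either way."""
    in_type = ADDRESS_NUMBER.fullmatch(value) is not None
    return value, in_type


for candidate in ["12w345-1/2", "221B", "Rural Route 3"]:
    value, in_type = classify(candidate)
    print(f"{value!r}: {'in type' if in_type else 'out of type'}")
```

Because the type is soft, the function never rejects a value; it only reports whether the value fits the compact "in type" form.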
So, as far as data typing in MPbase is concerned, almost anything goes.
© 1998-2004 NPSI