
Data Types in MPbase
Almost anything goes

MPbase can store any class and type of data a computer can contain. Data classes include tabular, raster image, vector image, spatial, textual, attribute-value pairs, etc. MPbase does not treat all data classes the same. Each data class is stored in its natural order, and each class has a different kind of "natural" order. MPbase therefore uses a different approach to "naturalize" each class of data.

In addition to differences in data class, there are also differences in data type. Data types include integer, text, floating-point and binary, as well as several mixed forms. Within MPbase, data typing can be either hard or soft. With hard typing, an update fails if the data does not conform to the expected type. With soft typing, any data can be put into any field or column.
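The difference between hard and soft typing can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not MPbase's interface: the `Column` class, its `hard` flag and the `update` method are hypothetical names, and an integer column is assumed.

```python
class Column:
    """Hypothetical integer column with hard or soft typing (not MPbase's API)."""

    def __init__(self, name, hard=True):
        self.name = name
        self.hard = hard
        self.values = []  # list of (value, in_type) pairs

    @staticmethod
    def _as_int(raw):
        # Try to read the raw value as an integer; report whether it was "in type".
        try:
            return int(str(raw).strip()), True
        except ValueError:
            return raw, False

    def update(self, raw):
        value, in_type = self._as_int(raw)
        if self.hard and not in_type:
            # Hard typing: the update fails and nothing is stored.
            raise TypeError(f"{self.name}: {raw!r} is not an integer")
        # Soft typing: out-of-type data is stored as-is and flagged.
        self.values.append((value, in_type))
        return in_type


soft = Column("age", hard=False)
soft.update("42")   # stored as the integer 42, flagged in type
soft.update("n/a")  # stored unchanged, flagged out of type
```

Keeping the in-type flag next to each value is one possible way to let later steps find the out-of-type entries; how MPbase records this is not described here.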

Soft typing can be used to allow "raw" data streams to feed a database directly. Editing can then be a second step, which allows the data to be edited in context. It also allows the rollback logs to show the original data when required. With soft typing, the storage efficiency for "out of type" data is not very high. However, this is a small price to pay for the enhanced editing and the ability to include "out of type" data in queries or exclude it from them.

The efficiency with which MPbase stores any one class and type depends on the data's information content, or entropy. This entropy value can vary widely within a database. However, for any one class it will tend to average into a relatively small range. As an example, for a scanned photo requiring lossless compression one should expect compression of around 50%; for tabular data, between 80% and 99.9%.
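To make the link between entropy and storage efficiency concrete, the following sketch estimates the zero-order Shannon entropy of a byte string and the best saving a coder working on single bytes could reach. It is a rough illustration only: the function names are made up for this example, and MPbase's own entropy model is not described here and is assumed to be more sophisticated.

```python
import math
from collections import Counter


def shannon_entropy_bits_per_byte(data: bytes) -> float:
    """Zero-order entropy: average information per byte, from 0.0 to 8.0 bits."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def ideal_saving(data: bytes) -> float:
    """Fraction of space an ideal single-byte coder could save (0.0 to 1.0)."""
    return 1.0 - shannon_entropy_bits_per_byte(data) / 8.0
```

Data that uses only a few distinct byte values has low entropy and a high ideal saving; data that uses all 256 byte values equally often has 8 bits of entropy per byte and no saving at this level.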

It is important to note that MPbase's compression works significantly better on "real" data than on generated test data. This is due to the relative entropy content of random test values vs. their real-world counterparts. Part of the naturalization process involves taking a statistically significant portion of the real data and analyzing it. This allows MPbase to fine-tune the type of encoding it will use to store the data. This encoding may vary over time without any need to reformat the older data.
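The gap between random test values and real-world data is easy to reproduce with any general-purpose compressor. The sketch below uses Python's `zlib` purely as a stand-in; it is not the encoding MPbase uses, and the sample rows (city names, state code) are invented for the example.

```python
import random
import zlib


def saving(data: bytes) -> float:
    """Fraction of space saved by zlib at its highest compression level."""
    return 1.0 - len(zlib.compress(data, 9)) / len(data)


# Generated test data: uniformly random bytes (seeded so the run is repeatable).
rng = random.Random(0)
random_rows = bytes(rng.randrange(256) for _ in range(20000))

# Stand-in for "real" tabular data: repeated values and regular structure.
cities = ["Springfield", "Riverside", "Franklin", "Greenville"]
real_rows = "\n".join(
    f"{1000 + i},{cities[i % len(cities)]},CA" for i in range(1500)
).encode("ascii")

print(f"random test data: {saving(random_rows):.1%} saved")
print(f"tabular sample:   {saving(real_rows):.1%} saved")
```

The random bytes leave the compressor nothing to exploit, while the repetitive table shrinks considerably. Benchmarks run only on generated values will therefore understate what a compressor achieves on real data.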

In addition to the standard types, MPbase can support special custom classes and types. These types then become part of MPbase. One example of such a special type is "address number." This is a mostly numeric field. However, it also efficiently supports n, s, e, w, -, / and any trailing letter. This means that "12w345-1/2" is an "in type" value. In addition, this type is soft, so it accepts any "out of type" values as well.
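A check for whether a value is "in type" for such an address number field might look like the sketch below. The exact grammar MPbase accepts is not given above, so the pattern is an assumption: a leading digit, then any mix of digits and n, s, e, w, - and /, then at most one trailing letter, ignoring case. The names `ADDRESS_NUMBER` and `classify` are hypothetical.

```python
import re

# Assumed grammar, not MPbase's definition: a leading digit, then digits and
# n/s/e/w/-// in any order, then an optional single trailing letter.
ADDRESS_NUMBER = re.compile(r"[0-9][0-9nsew/-]*[a-z]?", re.IGNORECASE)


def classify(value: str) -> str:
    """Return "in type" or "out of type"; a soft type stores the value either way."""
    if ADDRESS_NUMBER.fullmatch(value.strip()):
        return "in type"
    return "out of type"


for sample in ["12w345-1/2", "221b", "123", "PO Box 7"]:
    print(f"{sample!r}: {classify(sample)}")
```

Because the type is soft, a result of "out of type" would not reject the value; it would only mean the value is stored less efficiently.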

So, as far as data typing in MPbase is concerned, almost anything goes.

© 1998-2004 NPSI