
Virtual Meta-data
The keys to the information world

First, let's establish a working analogy for meta-data. Meta-data can be viewed as the information in a library's card catalog: a very small but commonly needed subset of the information contained in the complete library. That subset is useful both on its own and as an index into the main body of information. Creating or changing such a catalog, however, is a large and costly task, and that cost severely limits the usefulness of a card catalog or of any set of meta-data.

Now, what if you could use a "magic" card catalog, one where all you needed to do was define what belongs on the cards and then start searching? That is virtual meta-data: a database view containing summary information about the main body of the database. Like conventional meta-data, the view can be used on its own or as an index into the main data store. Unlike conventional meta-data, it is painless to build or change.

When you start looking at computerized meta-data, the analogy starts to break down a little. In some systems the meta-data storage is several times larger than the original data. This is where you really need to start asking, "Is this the best way to work with this data?" When there is more meta-data than data, there are normally millions of precomputed subtotals, most of which will never be used. They exist to support queries that may or may not ever be run. All possible subtotals are generated because, at the time of creation, there is no way to know which ones will be needed.
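The growth in subtotals can be illustrated with a small "data cube" that precomputes a total for every combination of dimensions. That exhaustive precomputation is the kind the paragraph describes. This is a minimal Python sketch under stated assumptions. The store/product/day schema, the `full_cube` function, and the data are all hypothetical and are not taken from any particular product.

```python
from collections import defaultdict
from itertools import combinations, product

def full_cube(rows, dims, measure):
    """Precompute a subtotal of `measure` for every grouping of `dims` (a full data cube)."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            totals = defaultdict(int)
            for row in rows:
                # Dimensions left out of `group` are rolled up ("all stores", "all days", ...).
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            cube[group] = dict(totals)
    return cube

# Hypothetical detail data: 3 stores x 4 products x 5 days, one sale each.
rows = [
    {"store": s, "product": p, "day": d, "sales": 1}
    for s, p, d in product(range(3), range(4), range(5))
]
cube = full_cube(rows, ["store", "product", "day"], "sales")
cells = sum(len(totals) for totals in cube.values())
print(len(rows), "detail rows ->", cells, "precomputed subtotals")
# 60 detail rows -> 120 precomputed subtotals
```

In this dense example the subtotal count is (3+1)(4+1)(5+1) = 120, twice the number of detail rows. Each extra dimension multiplies the count again. With sparse data, each row can contribute a separate cell to every one of the 2^d groupings of d dimensions, so the cube can approach 2^d times the number of detail rows.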

So, why use meta-data at all? If the database is fast enough, as MPbase is, why not just work with all of the data? Because very few database applications can handle the full volume of data in a data warehouse. Even if the database can deliver all of the data in a timely manner, that data would simply overwhelm the application (like filling a teacup from a fire hose). With MPbase, meta-data is still needed to reduce the volume of data reaching the application. The difference is that MPbase produces this meta-data on the fly, so it requires no additional storage space.
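The reduction described above can be sketched generically. The detail rows are split into partitions and each partition is aggregated independently. Only the merged summary is then handed to the application. This is a minimal Python sketch under stated assumptions:

- The region/amount schema, the partition count, and the functions `partial_totals` and `summarize` are hypothetical.
- The thread pool merely stands in for parallel database nodes. In CPython, threads do not actually speed up this CPU-bound loop.
- Nothing here describes how MPbase itself is implemented.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partial_totals(rows):
    """Aggregate one partition: total amount per region."""
    totals = Counter()
    for region, amount in rows:
        totals[region] += amount
    return totals

def summarize(rows, partitions=4):
    """Split rows into partitions, aggregate each one, then merge the partial totals."""
    chunks = [rows[i::partitions] for i in range(partitions)]
    merged = Counter()
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        for partial in pool.map(partial_totals, chunks):
            merged.update(partial)  # Counter.update adds counts rather than replacing them
    return dict(merged)

# Hypothetical detail data: 100,000 rows that collapse into 3 summary rows.
detail = [("east", 10), ("west", 5), ("east", 7), ("north", 1)] * 25_000
summary = summarize(detail)
print(len(detail), "detail rows ->", len(summary), "summary rows")
# 100000 detail rows -> 3 summary rows
```

Only the three summary rows would need to cross from the database to the application, rather than all 100,000 detail rows.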

This virtual meta-data is defined as a database "view". A view is nothing more than a way of telling the database what you wish to see. It can be thought of as a virtual database containing a subset of the complete database, and it may also contain summary information computed from that database. In short, the view is how you create the "magic" card catalog described above.
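The sketch below is a rough illustration of a view acting as virtual meta-data. It uses Python's built-in `sqlite3` module as a stand-in for MPbase, whose own view syntax is not shown here. The `sales` table, its columns, the `build_demo` helper, and the `region_summary` view are hypothetical. The view stores only its definition, and its summary rows are computed each time it is queried.

```python
import sqlite3

def build_demo():
    """Create a small hypothetical detail table plus a summary view over it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("east", "widget", 10), ("east", "gadget", 7),
         ("west", "widget", 5), ("west", "widget", 3)],
    )
    # A view stores only its definition: no summary rows are written anywhere.
    conn.execute(
        """CREATE VIEW region_summary AS
           SELECT region, COUNT(*) AS orders, SUM(amount) AS total
           FROM sales GROUP BY region"""
    )
    return conn

conn = build_demo()
print(conn.execute("SELECT * FROM region_summary ORDER BY region").fetchall())
# [('east', 2, 17), ('west', 2, 8)]

# New detail rows show up in the summary immediately; nothing has to be rebuilt.
conn.execute("INSERT INTO sales VALUES ('east', 'gadget', 1)")
print(conn.execute("SELECT * FROM region_summary ORDER BY region").fetchall())
# [('east', 3, 18), ('west', 2, 8)]

# Redefining what goes "on the cards" is just replacing the view definition.
conn.execute("DROP VIEW region_summary")
conn.execute(
    """CREATE VIEW region_summary AS
       SELECT region, product, SUM(amount) AS total
       FROM sales GROUP BY region, product"""
)
```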

As an example, take a company with a 1-terabyte database and 3.5 terabytes of meta-data. MPbase could reduce the 1-terabyte database to 300 gigabytes and eliminate the need to store the other 3.5 terabytes. The total savings in this case would be 4.2 terabytes.
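The 4.2-terabyte figure is the difference between total storage before and after. Here 300 gigabytes is taken as 0.3 terabytes (decimal units):

```latex
\underbrace{(1 + 3.5)\,\text{TB}}_{\text{data + stored meta-data}}
\;-\;
\underbrace{0.3\,\text{TB}}_{\text{data under MPbase}}
\;=\; 4.2\,\text{TB}
```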

This virtual meta-data from MPbase is the perfect way to use a massively parallel database with a "normal" application. It lets the database do what it does best and leaves the application free to do what it does best.

One key aspect of using "virtual meta-data" is the way you handle "big binary blobs" (BBB). A BBB is a traditional database's way of handling data it does not understand, such as image data. MPbase can work with the information inside a BBB when creating virtual meta-data, which allows queries directly into the BBB's content. In a traditional database environment, this would require a separate application to read the BBBs and produce fixed meta-data.
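The sketch below illustrates querying blob content. It registers Python functions with `sqlite3` through `Connection.create_function`, and these functions read the width and height fields from a PNG header. A view then exposes them as columns. This is an assumption-laden stand-in:

- PNG is just a convenient example of a BBB format.
- The `images` table, the `image_info` view, and the `png_width`, `png_height`, and `fake_png_header` helpers are hypothetical.
- The source does not say how MPbase inspects blob content.

```python
import sqlite3

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def _ihdr_field(blob, offset):
    """Return a 4-byte big-endian IHDR field, or None if the blob is not a PNG."""
    # PNG layout: 8-byte signature, 4-byte chunk length, b"IHDR", width, height, ...
    if not isinstance(blob, bytes) or len(blob) < 24:
        return None
    if not blob.startswith(PNG_SIGNATURE) or blob[12:16] != b"IHDR":
        return None
    return int.from_bytes(blob[offset:offset + 4], "big")

def png_width(blob):
    return _ihdr_field(blob, 16)

def png_height(blob):
    return _ihdr_field(blob, 20)

def fake_png_header(width, height):
    """Hypothetical test data: only the first 24 bytes of a PNG, enough for the IHDR fields."""
    return (PNG_SIGNATURE + (13).to_bytes(4, "big") + b"IHDR"
            + width.to_bytes(4, "big") + height.to_bytes(4, "big"))

conn = sqlite3.connect(":memory:")
# Builds with trusted_schema off reject untagged user functions inside views;
# SQLite versions that predate this pragma silently ignore it.
conn.execute("PRAGMA trusted_schema = ON")
conn.create_function("png_width", 1, png_width)
conn.create_function("png_height", 1, png_height)
conn.execute("CREATE TABLE images (name TEXT, data BLOB)")
conn.executemany(
    "INSERT INTO images VALUES (?, ?)",
    [("logo", fake_png_header(64, 64)),
     ("banner", fake_png_header(1200, 300)),
     ("notes", b"plain text, not an image")],
)
# The view derives meta-data from blob content at query time; nothing extra is stored.
conn.execute(
    """CREATE VIEW image_info AS
       SELECT name, png_width(data) AS width, png_height(data) AS height
       FROM images"""
)
print(conn.execute("SELECT name, width, height FROM image_info WHERE width > 100").fetchall())
# [('banner', 1200, 300)]
```

Non-PNG blobs simply yield NULL columns, so the same view can sit over a mixed blob table.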

© 1998-2004 NPSI