This narrative will walk you through the steps taken while creating
an MPbase. It will follow the creation of the phone book
demo. Although the specifics will change based on the data to
be used, the general flow will remain the same.
1. Select the data to be used.
The 1996 edition of ProCD, Inc.'s Select Phone product was picked. There were three main reasons for this choice. First, the product allowed unlimited downloading of the data into any database. Second, it contained 95 million rows. Third, it was data that anyone would recognize.
2. Naturalize the data.
A) Determine what physical tables will be needed.
For the phone listings, it was decided that two physical tables would be needed. All the information relating to phone number and address would be in one table. The other table would contain the name and business (SIC) information.
B) Determine a natural order for each table.
The order for the first table was set by State, Sorting center
(first three digits of zip code), Town name, Zip code, Area code,
Street name and finally Address. This sequence most closely approximates
how you would physically find this data outside of computers in
the natural world.
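The ordering above can be expressed as a composite sort key. The following is a minimal sketch only: the `Listing` fields, the `natural_key` name, and the sample rows are hypothetical and do not come from the demo itself.

```python
from typing import NamedTuple


class Listing(NamedTuple):
    # Hypothetical column layout for the phone/address table.
    state: str
    town: str
    zip_code: str
    area_code: str
    street: str
    address: str
    phone: str


def sorting_center(zip_code: str) -> str:
    # The sorting center is the first three digits of the zip code.
    return zip_code[:3]


def natural_key(row: Listing) -> tuple:
    # State, sorting center, town, zip code, area code, street, address.
    return (row.state, sorting_center(row.zip_code), row.town,
            row.zip_code, row.area_code, row.street, row.address)


# Made-up sample rows, deliberately out of order.
rows = [
    Listing("NH", "Nashua", "03060", "603", "Main St", "12", "555-0101"),
    Listing("MA", "Boston", "02108", "617", "Beacon St", "1", "555-0102"),
    Listing("NH", "Concord", "03301", "603", "Pleasant St", "7", "555-0103"),
    Listing("MA", "Boston", "02108", "617", "Beacon St", "10", "555-0104"),
]

ordered = sorted(rows, key=natural_key)
```

Note that this sketch compares addresses as plain strings; a real loader would likely need a smarter comparison for house numbers.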
The order for the second table was set by State, First letter of name, Sorting center, and then rest of name. This sequence was influenced by the need to link it to the first table. The most natural order for the second table would have been by full name. However, this would have created the need for a large and unnatural key in the first table. The solution was to link the two tables at the sorting center level. This required "state" and "sorting center" to come before "rest of name."
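To see the effect of placing the sorting center ahead of the rest of the name, consider the sketch below. The function names and sample records are hypothetical, and upper-casing the name is an assumption made only so that the comparison ignores case.

```python
def name_key(state: str, zip_code: str, name: str) -> tuple:
    # State, first letter of name, sorting center, rest of name.
    name = name.upper()
    return (state, name[:1], zip_code[:3], name[1:])


def link_key(state: str, zip_code: str) -> tuple:
    # Both tables can be joined on (state, sorting center).
    return (state, zip_code[:3])


# Made-up records: (state, zip code, name).
records = [
    ("NH", "03060", "Smith John"),
    ("NH", "03301", "Stone Ann"),
    ("NH", "03060", "Stone Bob"),
    ("NH", "03060", "Adams Sue"),
]

ordered_names = [name for _, _, name in
                 sorted(records, key=lambda r: name_key(*r))]
```

In this ordering "Stone Bob" sorts ahead of "Stone Ann" because the two listings fall in different sorting centers, which is the compromise the text describes.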
3. Create data access layer.
A) Determine the groupings to be used.
This is where the directory trees to be used as part of the database are determined. The primary goal here is to create low-level files of 500 Kbytes to 1.5 Mbytes each. A secondary goal is to group the directories in a way that will minimize the impact of any less-than-fully-normal tables.
The directory tree for the first table is table/state/sorting center/file
The directory tree for the second table is table/state/first letter of name/file
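A small sketch of how a row might be mapped onto these two trees follows. The table directory names (`address`, `name`), the `.dat` extension, and the choice of file name within each leaf directory are all assumptions made for illustration; the text does not specify them.

```python
from pathlib import PurePosixPath


def address_path(state: str, zip_code: str) -> PurePosixPath:
    # table/state/sorting center/file
    # Assumption: one file per zip code inside each sorting center.
    return PurePosixPath("address", state, zip_code[:3], zip_code + ".dat")


def name_path(state: str, name: str, zip_code: str) -> PurePosixPath:
    # table/state/first letter of name/file
    # Assumption: one file per sorting center inside each letter directory.
    return PurePosixPath("name", state, name[:1].upper(),
                         zip_code[:3] + ".dat")


example_address = str(address_path("NH", "03060"))
example_name = str(name_path("NH", "Smith John", "03060"))
```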
B) Create the table-specific access-layer code.
For each table, two data-specific programs must be created: one to add and update rows, and one to extract rows. This is part of MPbase's "magic." The low-level files are compressed using multidimensional, data-intelligent, run-length encoding. The result is a continuation of the naturalized structure down into a compressed file format.
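The actual MPbase file format is not described here, so the following is only a generic illustration of why naturally ordered rows suit run-length encoding: once rows are sorted, each column contains long runs of repeated values. The sketch uses plain per-column run-length encoding, and every name in it is hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    # Collapse consecutive repeats into (value, count) pairs.
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(pairs):
    # Expand (value, count) pairs back into the original sequence.
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


def encode_columns(rows):
    # Encode each column independently.
    return [rle_encode(column) for column in zip(*rows)]


def decode_columns(encoded):
    # Rebuild the rows from the decoded columns.
    columns = [rle_decode(column) for column in encoded]
    return [tuple(row) for row in zip(*columns)]


# Made-up rows, already in natural order: (state, sorting center, town).
sample_rows = [
    ("NH", "030", "Nashua"),
    ("NH", "030", "Nashua"),
    ("NH", "033", "Concord"),
]
encoded = encode_columns(sample_rows)
```

With these three rows the state column collapses to a single run, which hints at how the leading, slowly changing columns of a naturalized table compress best.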
4. Load the database.
For the phone book demo, this took over two weeks. Most of this time was spent extracting the data from the source database. Each state was read and loaded individually. A PC ran the extract to a tab-delimited format. This result set was then "naturalized" using Perl, C shell, and sort. The resulting set of flat files was then loaded with the data access layer.
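The demo performed this step with Perl, C shell, and sort. Purely as an illustration of the same flow, the sketch below parses a tab-delimited extract, sorts it, and splits it into one batch per (state, sorting center). The four-column layout and the `naturalize` name are assumptions, not the demo's actual scripts.

```python
import csv
import io
from itertools import groupby


def naturalize(tsv_text: str) -> dict:
    # Assumed columns: state, zip code, town, name.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    rows = [tuple(fields) for fields in reader if fields]

    # Sort by state, sorting center, town, then zip code.
    rows.sort(key=lambda r: (r[0], r[1][:3], r[2], r[1]))

    # One batch (a future flat file) per state and sorting center.
    batches = {}
    for key, group in groupby(rows, key=lambda r: (r[0], r[1][:3])):
        batches[key] = list(group)
    return batches


# Made-up extract with three rows.
extract = (
    "NH\t03301\tConcord\tStone Ann\n"
    "MA\t02108\tBoston\tAdams Sue\n"
    "NH\t03060\tNashua\tSmith John\n"
)
batches = naturalize(extract)
```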
5. Set up the initial interface.
Two Visual Basic programs were then written for this demo. The first shows what can be done with a very powerful interface. The second demonstrates how little is really needed to connect to an MPbase.
© 1998-2004 NPSI