Variable Depth Data Model Hierarchies

 

Introduction

It is sometimes required to have self referencing relationships in relational data models. There are, however, certain use cases where unary relationships in general and, specifically as implemented in Helium DSL, presents challenges with regards to performance and robustness. 

Consider the following use case as an example:

A variable depth hierarchy of locations needs to be maintained in a DSL product. This hierarchy must be configurable in that it can have maximum depth that is variable per application or application instance depending on the physical or governmental locations present in the country of deployment. Each location can have many children locations but can have only one parent location.

 

 

Unary Relationship Solution

A solution consisting of unary relationships will have a location object with a unary relationship representing the relationship between different object instances. In addition a meta object describing the location levels can be included.These location levels can be maintained as part of the application configuration.

Now consider three commonly used operations on such a structure as well as their relative performance:

  • Given a location, insert, a new location that as its child
    Performance wise this would be relatively fast as it simply involves setting a relationship

  • Given a location, query for all children of that location
    Performance wise this would would also be relatively fast 

  • Given a location, query for all descendants of that location
    Using a unary relationship, this will be implemented by looping over each level of location, starting at the provided parent location and at each iteration of the loop executing a relationshipIn selector of the children from the previous iteration or the parent if it is the first iteration of the loop. At each step the results are then added to a collection with the final collection returned after execution. Due to the iteration, multiple relationshipIn selectors and lazy loading in the Helium persistence tier, the performance in this case will be poor.

  • Given a location, query for its descendants of a specific level of location
    This use case is similar to the use case mentioned above but the collection that is built up only contains locations on a single level. The same performance penalties apply.

From the above we can see that insert operations are relatively quick while query operations might be prohibitively expensive.

 

 

Nested Set Solution

In order to overcome some of the challenges when querying data in a variable depth hierarchy is to make use of nested sets. Using nested sets, each location has two additional attributes which represent the start and end of a set. All locations that are descendants of a location will have subset start values larger that that their parent and subset end values smaller than that of their parent. See the diagram below for an example:

Note, that for nested sets, keeping track of the location level meta data for each location is more important and will be required for queries.

Using this solution the relative performance of common operations are as follows:

  • Given a location, insert, a new location that is it's child
    This operation can potentially be very expensive as the elements in the hierarchical structure needs to be renumbered. That is, their start and end subset numbers need to be updated after a new location is inserted. The number of locations that have to be renumbered depends on the current state of the hierarchy and the position in which the new location is to be placed. Generally speaking though, more locations in the current structure results in more renumbering and thus slower inserts. 

  • Given a location, query for all children of that location
    This is a relatively inexpensive operation and, in the DSL, can be implemented with an and selector based on the start and end subset values and the location level.

  • Given a location, query for all descendants of that location
    In contrast to the unary relationship solution, this operation is much more performant. Similarly to the above query it's simply based on the subset start and end values of the parent location.

  • Given a location, query for its descendants of a specific level of location
    Similarly to the query operations mentioned above, this is a relatively performant operation and is again based on an and selector that references the parent location subset start and end values and the location level.

From the above we see that in contrast to a solution making use of unary relationships, a nested set solution will be slow when inserting new locations, but highly performant when querying the structure.

 

 

Nested Set Example DSL Implementation

A sample DSL implementation for managing a variable depth locations hierarchy using the nested set approach is included in this document under the section Nested Set Example DSL Implementation Source Code. In this section we will highlight the key components of this implementation.

 

 

Data Model

The data model of our example consists of two main objects representing the location hierarchy:

// Object that represents locations on any level in our variable depth hierarchy 
persistent object Location {
    
    // Subset start and end values
    int start;
    int end;
    
    // Details of the location and its level
    string description;
    string levelDescription;
    int level;
    
    // Details of the parent and its level, useful for display purposes
    string parentDescription;
    string parentLevelDescription;
    int parentLevel;
    
    // Unary relationship to the parent for easy traversal from descendants and as backup in case hierarchy renumbering fails
    @ManyToOne
    Location parent via children;
}
// Part of system configuration that provides some meta data on the different levels of locations in the variable depth location hierarchy
persistent object LocationLevel {
 
	// The name of the location level
    string description;
 
	// Used for display purposes only
    string descriptionPlural;
 
	// Integer value representing the depth of this level
    int level;
    
    @OneToMany
    Location locations via level;
}

Note that there is still a unary relationship included and maintained on the Location object. This is only used as a backup for an exceptional case of something preventing the correct renumbering of locations in our hierarchy.  In such a case the unary relationship can be used to reconstruct the nested set hierarchy if needed. Including a unary relationship does not have a negative effect on performance as it is not used for any normal queries.

 

 

Insert and Maintenance of the Hierarchy

The basic operation for inserting a new location is as follows:

Determine the position of the new location by traversing the hierarchy structure from the parent of the new location to the root. To do this we simply construct a collection of locations representing a path from the parent of the new location to the root of the hierarchy. Note that this was implemented using the "backup" unary relationship. It could also have been done using subset start and end values.

LocationsUtil:getPathToRoot
 // Path from a location to the root, using unary relationships (this is only for convenience, can be changed to used subset values)
Location[] getPathToRoot(Location location) {
    Location[] path;
    
    for(;location.parent != null;) {
        location = location.parent;
        path.append(location);
    }
    return path;
}
LocationsUtil:getUuidPathToRoot
// Variation to find path to root only keeping UUIDs
uuid[] getUuidPathToRoot(Location location) {
    uuid[] path;
    
    for(;location.parent != null;) {
        location = location.parent;
        path.append(location._id);
    }
    return path;
}

 

Now, renumber the hierarchy by starting at the root of the hierarchy and counting the number of descendants for that location while keeping in mind that the new location might now be a descendant of this location. Counting the number of descendants using the subset start and end values for a location is easy and efficient:

 / Count the original descendants of the current root
int descendantsCount = countDescendants(root);
        
// If the new location is to be a descendant of the current root we take this into consideration
if(containsLocation(pathFromNewLocation, root) == true) {
    descendantsCount++;
}
LocationsUtilHelper:countDescendants
// Number of descendants for the given location
int countDescendants(Location location) {
    return (location.end - location.start) / 2;
}

 

Update the start and end subset values accordingly. Do this recursively for the current location's children and their children etc. Stop making recursive calls when the current location is not updated. This implies that none of its descendants need to be updated.

Note that in the example DSL implementation presented in this document, the subset start and end values are changed directly during renumbering. If many locations need to be renumbered, this implies many operations on the database. These persistent operations are by far the slowest part of the implementation. During the renumbering process, the tree is therefore in an inconsistent state meaning queries for data might return incorrect results. Once the renumbering is complete, however, query results will be correct once again.

Please consult the included source code for more details of this implementation.

 

 

Querying the Hierarchy

As shown in the example functions below, querying the hierarchy is relatively simple making use of greaterThan, lessThan, and equals selectors combined in an and selector. This avoids iteration and multiple relationshipIn selectors that would have been necessary with a solution making use of unary relationships only. 

LocationsUtil:getChildren
// Return children of the provided location, special case for the null location
Location[] getChildren(Location location) {
    if(location == null) {
        return getRoots();
    }
    
    return Location:and(
        greaterThan(start, location.start),
        lessThan(end, location.end),
        equals(level, location.level + 1)
    );
}
LocationsUtil:getDescendants
// Return all locations that are descendants of the provided location
Location[] getDescendants(Location location) {
    return Location:and(
        greaterThan(start, location.start),
        lessThan(end, location.end),
        greaterThan(level, location.level)
    );
}
LocationsUtil:getDescendantsOnLevel
// Return locations that are descendants of the provided location and on a specific location level
Location[] getDescendantsOnLevel(Location location, int level) {
    return Location:and(
        greaterThan(start, location.start),
        lessThan(end, location.end),
        equals(level, level)
    );
}

 

 

 

Testing and Performance Evaluation

The included sample source code contains some functionality that can be used for testing and evaluation of performance.

 

 

Generating Test Data

Included in the app is a "Testing" view. This can be accessed from the "Testing" menu item. This view contains various action buttons for generating test data:

ButtonPurpose
Delete and generate location levelsDeletes all current locations levels and generates hard coded location levels. This needs to be generated before any test location data can be inserted. Location levels can also be added manually using the "Manage Location Levels" menu item.
Delete and generate basic locationsDeletes all current locations and generates basic, hard coded locations.
Delete and generate random locationsDeletes all current locations and generates three fixed root locations followed by 10 random locations. Random locations are added by randomly selecting an existing location as parent and then creating the new location with name that is a concatenation of the parent's location, the level on which the random location is added and a random number.
Generate random locationsThis generates 10 random locations without deleting existing locations. If the three root locations mentioned above already exists, they are not recreated.

 

 

 

User Interface Hierarchy Traversal

The "Manage Locations" menu item allows for the manual traversal of the locations hierarchy starting at a root location. At each step the user can drill down further to view the children of a location using the provided row actions.

 

 

String Representation of Hierarchy

In addition to traversing the hierarchy using the user interface the "Testing" view also contains a widget that displays a string representation of the hierarchy. This can be used for a quick inspection of the whole hierarchy for the purpose of testing, debugging or to better understand the workings of the algorithm.

 

 

 

Asynchronous Processing of Inserts

Given the potentially slow nature of inserts the sample app includes functionality to process insert operations asynchronously. This is done using an additional persistence object and scheduled function:

persistent object AsyncLocationInsertTask {
    datetime requestTstamp;
    string locationName;
    string locationLevelName;
    string parentName;
}
Timers:processAsyncLocationInsertTasks
@Scheduled("* * * * *")
void processAsyncLocationInsertTasks() {
    AsyncLocationInsertTask[] tasks = AsyncLocationInsertTask:all();
    SystemConfig config = ConfigUtil:getSystemConfig();
    if(config.verboseLogging == true) {
        Mez:log(Strings:concat("Processing batch location insert. Current queue size: ", tasks.length()));
    }
    
    tasks.sortDesc("requestTstamp");
    int stop = tasks.length() - 10;
    if(stop < 0) {
        stop = 0;
    }
    
    for(int i = tasks.length() - 1; i >= stop; i--) {
        AsyncLocationInsertTask currentTask = tasks.get(i);
        LocationsUtilHelper:doInsert(currentTask.locationName, currentTask.locationLevelName, currentTask.parentName);
        AsyncLocationInsertTask:delete(currentTask);
    }
    
    tasks = AsyncLocationInsertTask:all();
    if(config.verboseLogging == true) {
        Mez:log(Strings:concat("Batch location insert completed. Current queue size: ", tasks.length()));
    }
}

To activate asynchronous inserts, access the "Configuration" view using the "Configuration" menu item and select the "Asynchronous locations inserts" option.

 

 

Scheduled Function for Generating Inserts

For further testing locations can also be generated randomly using a scheduled function:

Timers:insertRandomLocations
@Scheduled("*/5 * * * *")
void insertRandomLocations() {  
    SystemConfig config = ConfigUtil:getSystemConfig();
    if(config.randomLocationsTimer == true) {
        LocationsTest:generateCountLocationsRandom(30);
    }
}

To activate inserts using this scheduled function, access the "Configuration" view using the "Configuration" menu item and select the "Enable timer to insert random locations" option.

 

 

Measuring the Performance of Inserts

In order to measure the performance of inserts, an additional persistent object is included in the data model:

persistent object InsertResult {
    datetime createdTstamp;
    datetime startTime;
    datetime endTime;
    int seconds;
    int renumberings;
    int totalRecords;
    string logOutput;
}

Each insert automatically creates a record of the above object that can later be inspected manually in the database. In addition real time logging can also be enabled. The logOutput attribute above is then logged and can be seen using the Helium logging service. To enable logging, access the "Configuration" view using the "Configuration" menu item and select the "Verbose logging" option.

 

 

Summary

In summary, when a variable depth data model hierarchy is required, the suggested method is to make use of nested sets. With that said, it should be evaluated for your use case and the following characteristics, advantages and disadvantages should be considered:

  • Inserts on a variable depth hierarchy using nested sets is relatively slow and becomes slower with more data in the hierarchy. It is therefore suited to use cases where inserts only happen occasionally.
  • Using variable depth hierarchies in the first place adds complexity to data models and might impact legibility of code and maintenance costs. Before deciding on the use of a nested set variable depth hierarchy it should be evaluated whether a variable depth hierarchy is needed at all.
  • Altering start and end subset values might leave the hierarchy in a temporary inconsistent state and might produce incorrect or incomplete results when querying while the elements are still be renumbered.

 

 

 

Nested Set Example DSL Implementation Source Code

nested-set-locations-hierarchy.zip

Â