SQL Server 2008/R2/2012 offers hierarchyid data type that can be used for handling hierarchical data. In this blog I will try to take you down the rabbit hole and explore common table expressions and hierarchyid.
Quite often we have to deal with hierarchical or recursive data. e.g.
Product Category (Entertainment Device)
Product Sub-category (TV)
Product Sub-sub-category (LCD)
Product Sub-sub-sub-category (HD)
…..
I guess you get the idea.
Another example would be Employee – Manager relationship where every manager is an employee (but not every employee is a manager). Here’s a sample diagram:
So, how do we store this data in our table and then how do we query it? We have two options:
1. Common Table Expressions – CTE
This works in older versions of SQL Server 2005 and also the newer versions like 2008/R2, 2012. We use the traditional data types like int etc. to handle relationship.
Let’s first create the table so we can store the records and then we will query it.
1: CREATE TABLE dbo.Employee
2: (
3: EmployeeId int NOT NULL IDENTITY(1,1) PRIMARY KEY,
4: Name varchar(50) NOT NULL,
5: ManagerId int
6: )
7: GO
Next step is to populate the table with the data (note: I am using same data that I have in my diagram)
1: INSERT INTO dbo.Employee (Name, ManagerId)
2: VALUES
3: ('CEO', NULL),
4: ('Sales Manager', 1),
5: ('Accounts Manager', 1),
6: ('Senior Sales Person 1', 2),
7: ('Senior Sales Person 2', 2),
8: ('Senior Accounts Person 1', 3),
9: ('Senior Accounts Person 2', 3),
10: ('Junior Sales Person 1', 4),
11: ('Junior Sales Person 2', 5),
12: ('Junior Accounts Person 1', 6),
13: ('Junior Accounts Person 2', 7)
Now we are ready to use common table expressions to query the table.
1: --Finding descendants
2: WITH EmployeeDescendants(EmployeeId, Name, ManagerId)
3: AS
4: (
5: select EmployeeId, Name, ManagerId from dbo.Employee
6: WHERE EmployeeId = 1 -- this is parameter
7: union all
8: select m.EmployeeId, m.Name, m.ManagerId
9: from EmployeeDescendants CTE,dbo.Employee m
10: WHERE CTE.EmployeeId = m.ManagerId
11: )
12: SELECT EmployeeId, Name, ManagerId
13: FROM EmployeeDescendants
14: ORDER BY EmployeeId
15:
16: --Finding ancestors
17:
18: WITH EmployeeAncestor(EmployeeId, Name, ManagerId)
19: AS
20: (
21: select EmployeeId, Name, ManagerId from dbo.Employee
22: WHERE EmployeeId = 11 -- this is parameter
23: union all
24: select m.EmployeeId, m.Name, m.ManagerId
25: from EmployeeAncestor CTE,dbo.Employee m
26: WHERE CTE.ManagerId = m.EmployeeId
27: )
28: SELECT EmployeeId, Name, ManagerId
29: FROM EmployeeAncestor
30: ORDER BY EmployeeId ASC
31:
Common Table Expressions aren’t too bad, eh. You could easily convert the above script into a stored procedure and pass arguments to parameters.
Query Execution Plan for finding descendant nodes looks like this (Note: Plan for finding ancestors is similar):
Ouch, lots is going on in the query execution plan.
2. HierarchyId
This won’t work in SQL Server 2005. You must have newer versions like 2008/R2, 2012. We use the newer managed data type called “hierarchyid” to handle relationship.
Let’s first create the table so we can store the records and then we will query it.
1: CREATE TABLE .
2: (
3: [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
4: [varchar](50) NOT NULL,
5: [int] NULL,
6: NULL
7: )
8:
9: GO
Next step is to populate the table with sample records
1: INSERT INTO dbo.EmployeeNew(Name, ManagerId, OrgNode)
2: VALUES
3: ('CEO', NULL, hierarchyid::Parse('/')),
4: ('Sales Manager', 1, hierarchyid::Parse('/1/')),
5: ('Accounts Manager', 1, hierarchyid::Parse('/2/')),
6: ('Senior Sales Person 1', 2, hierarchyid::Parse('/1/1/')),
7: ('Senior Sales Person 2', 2, hierarchyid::Parse('/1/2/')),
8: ('Senior Accounts Person 1', 3, hierarchyid::Parse('/2/1/')),
9: ('Senior Accounts Person 2', 3, hierarchyid::Parse('/2/2/')),
10: ('Junior Sales Person 1', 4, hierarchyid::Parse('/1/1/1/')),
11: ('Junior Sales Person 2', 5, hierarchyid::Parse('/1/2/1/')),
12: ('Junior Accounts Person 1', 6, hierarchyid::Parse('/2/1/1/')),
13: ('Junior Accounts Person 2', 7, hierarchyid::Parse('/2/2/1/'))
Time to test the table:
1: --display all records
2: select orgnode.ToString() AS OrgNodeString,orgnode.GetLevel() Level,*
3: from dbo.EmployeeNew
4:
5:
6: --getting root
7: DECLARE @root hierarchyid = hierarchyid::GetRoot();
8: SELECT @root, @root.ToString()
9:
10: --employee we are interested in
11: DECLARE @node hierarchyid
12: SELECT @node = OrgNode FROM dbo.EmployeeNew
13: WHERE Name = 'Sales Manager';
14: SELECT @node, @node.ToString()
15:
16: --get ancestors
17: SELECT * FROM dbo.EmployeeNew WHERE @node.IsDescendantOf(orgnode) = 1;
18:
19: --get descendants
20: SELECT * FROM dbo.EmployeeNew WHERE OrgNode.IsDescendantOf(@node) = 1;
21:
22: --get by level (junior employees)
23: SELECT * FROM dbo.EmployeeNew
24: WHERE OrgNode.GetLevel() = 3;
And the query execution plan for finding descendants looks like this (Note: query execution plan for finding ancestors is similar):
That query execution plan is sweet and simple, eh. But, don’t let it mislead you. Under the hood there’s still lots going on. Bulk of the work is getting performed by the managed functions like IsDescendantOf, GetLevel etc.
HierarchyId definitely reduces the lines of code required to query the hierarchical or recursive data.
Conclusion
Use hierarchyid if you have newer SQL Server like 2008 and onwards. It’s designed just to tackle hierarchical scenarios.
Use CTE if you have older SQL Server like 2005. However, CTE is a generic query pattern which means it can be used even when we don’t have hierarchical data.