Home  > Resources  > Blog

SQL - Common Table Expressions (CTE) and HierarchyId

 
April 20, 2012 by Faheem Javed
Category: Microsoft

SQL Server 2008/R2/2012 offers hierarchyid data type that can be used for handling hierarchical data. In this blog I will try to take you down the rabbit hole and explore common table expressions and hierarchyid.

Quite often we have  to deal with hierarchical or recursive data. e.g.

Product Category (Entertainment Device)

     Product Sub-category (TV)

         Product Sub-sub-category (LCD)

              Product Sub-sub-sub-category (HD)

                      …..

I guess you get the idea.

 

Another example would be Employee – Manager relationship where every manager is an employee (but not every employee is a manager). Here’s a sample diagram:

image

So, how do we store this data in our table and then how do we query it? We have two options:

1. Common Table Expressions – CTE

This works in older versions of SQL Server 2005 and also the newer versions like 2008/R2, 2012. We use the traditional data types like int etc. to handle relationship.

Let’s first create the table so we can store the records and then we will query it.

   1:  CREATE TABLE dbo.Employee
   2:   ( 
   3:     EmployeeId int NOT NULL IDENTITY(1,1) PRIMARY KEY, 
   4:     Name varchar(50) NOT NULL, 
   5:     ManagerId int 
   6:  ) 
   7:  GO

 

Next step is to populate the table with the data (note: I am using same data that I have in my diagram)

   1:  INSERT INTO dbo.Employee (Name, ManagerId) 
   2:  VALUES 
   3:  ('CEO', NULL), 
   4:  ('Sales Manager', 1), 
   5:  ('Accounts Manager', 1), 
   6:  ('Senior Sales Person 1', 2), 
   7:  ('Senior Sales Person 2', 2), 
   8:  ('Senior Accounts Person 1', 3), 
   9:  ('Senior Accounts Person 2', 3), 
  10:  ('Junior Sales Person 1', 4), 
  11:  ('Junior Sales Person 2', 5), 
  12:  ('Junior Accounts Person 1', 6), 
  13:  ('Junior Accounts Person 2', 7) 

Now we are ready to use common table expressions to query the table.

   1:  --Finding descendants
   2:  WITH EmployeeDescendants(EmployeeId, Name, ManagerId) 
   3:  AS 
   4:  ( 
   5:  select EmployeeId, Name, ManagerId from dbo.Employee 
   6:  WHERE EmployeeId = 1 -- this is parameter
   7:  union all 
   8:  select m.EmployeeId, m.Name, m.ManagerId 
   9:  from EmployeeDescendants CTE,dbo.Employee m 
  10:  WHERE CTE.EmployeeId = m.ManagerId 
  11:  ) 
  12:  SELECT EmployeeId, Name, ManagerId 
  13:  FROM EmployeeDescendants 
  14:  ORDER BY EmployeeId
  15:   
  16:  --Finding ancestors
  17:   
  18:  WITH EmployeeAncestor(EmployeeId, Name, ManagerId) 
  19:  AS 
  20:  ( 
  21:  select EmployeeId, Name, ManagerId from dbo.Employee 
  22:  WHERE EmployeeId = 11 -- this is parameter
  23:  union all 
  24:  select m.EmployeeId, m.Name, m.ManagerId 
  25:  from EmployeeAncestor CTE,dbo.Employee m 
  26:  WHERE CTE.ManagerId = m.EmployeeId 
  27:  ) 
  28:  SELECT EmployeeId, Name, ManagerId 
  29:  FROM EmployeeAncestor 
  30:  ORDER BY EmployeeId ASC
  31:   

Common Table Expressions aren’t too bad, eh. You could easily convert the above script into a stored procedure and pass arguments to parameters.

Query Execution Plan for finding descendant nodes looks like this (Note: Plan for finding ancestors is similar):

image

Ouch, lots is going on in the query execution plan.

2. HierarchyId

This won’t work in SQL Server 2005. You must have newer versions like 2008/R2, 2012. We use the newer managed data type called “hierarchyid” to handle relationship.

Let’s first create the table so we can store the records and then we will query it.

   1:  CREATE TABLE .
   2:  ( 
   3:       [int] IDENTITY(1,1) NOT NULL PRIMARY KEY, 
   4:       [varchar](50) NOT NULL, 
   5:       [int] NULL, 
   6:        NULL
   7:  )
   8:   
   9:  GO 

Next step is to populate the table with sample records

   1:  INSERT INTO dbo.EmployeeNew(Name, ManagerId, OrgNode) 
   2:  VALUES 
   3:  ('CEO', NULL, hierarchyid::Parse('/')), 
   4:  ('Sales Manager', 1, hierarchyid::Parse('/1/')), 
   5:  ('Accounts Manager', 1, hierarchyid::Parse('/2/')), 
   6:  ('Senior Sales Person 1', 2, hierarchyid::Parse('/1/1/')), 
   7:  ('Senior Sales Person 2', 2, hierarchyid::Parse('/1/2/')), 
   8:  ('Senior Accounts Person 1', 3, hierarchyid::Parse('/2/1/')), 
   9:  ('Senior Accounts Person 2', 3, hierarchyid::Parse('/2/2/')), 
  10:  ('Junior Sales Person 1', 4, hierarchyid::Parse('/1/1/1/')), 
  11:  ('Junior Sales Person 2', 5, hierarchyid::Parse('/1/2/1/')), 
  12:  ('Junior Accounts Person 1', 6, hierarchyid::Parse('/2/1/1/')), 
  13:  ('Junior Accounts Person 2', 7, hierarchyid::Parse('/2/2/1/')) 
     

Time to test the table:

   1:  --display all records 
   2:  select orgnode.ToString() AS OrgNodeString,orgnode.GetLevel()  Level,* 
   3:  from dbo.EmployeeNew 
   4:   
   5:   
   6:  --getting root 
   7:  DECLARE @root hierarchyid = hierarchyid::GetRoot(); 
   8:  SELECT @root, @root.ToString()
   9:   
  10:  --employee we are interested in 
  11:  DECLARE @node hierarchyid 
  12:  SELECT @node = OrgNode FROM dbo.EmployeeNew 
  13:  WHERE Name = 'Sales Manager'; 
  14:  SELECT @node, @node.ToString()
  15:   
  16:  --get ancestors 
  17:  SELECT * FROM dbo.EmployeeNew WHERE @node.IsDescendantOf(orgnode) = 1;
  18:   
  19:  --get descendants 
  20:  SELECT * FROM dbo.EmployeeNew WHERE OrgNode.IsDescendantOf(@node) = 1;
  21:   
  22:  --get by level (junior employees) 
  23:  SELECT * FROM dbo.EmployeeNew 
  24:  WHERE OrgNode.GetLevel() = 3;

And the query execution plan for finding descendants looks like this (Note: query execution plan for finding ancestors is similar):

image

That query execution plan is sweet and simple, eh. But, don’t let it mislead you. Under the hood there’s still lots going on. Bulk of the work is getting performed by the managed functions like IsDescendantOf, GetLevel etc.

HierarchyId definitely reduces the lines of code required to query the hierarchical or recursive data.

 

Conclusion

Use hierarchyid if you have newer SQL Server like 2008 and onwards. It’s designed just to tackle hierarchical scenarios.

Use CTE if you have older SQL Server like 2005. However, CTE is a generic query pattern which means it can be used even when we don’t have hierarchical data.

Follow Us

Blog Categories