Friday, 13 March 2015

Trees, Hierarchical queries in PL/SQL

My intro to simple hierarchical/recursive queries

I am sure we all come across a problem like I did the other day having to implement code based on tree like tables. Structure as simple as:
SELECT project_id,
       project_code,
       project_parent_id
FROM R_PROJECTS
Idea here is to note a few queries I used that proven helpful so far. 

PL/SQL functions that are in play are START WITH, CONNECT_BY, CONNECT_BY_ROOT, SYS_CONNECT_BY_PATH. 

By no means will I claim that I am guru of these features but with a bit of play we all come to something useful at the end and this is the whole point why reinvent when it is already out there.

Of course I could have opted for numerous loops and classic approach but I said now that I do have some time to get into it lets have a look is there an easier way to get in and around the tree structure queries.

Of course ultimate problem is getting all grandparents and their grandchildren across multiple levels with the least acrobatics possible. 

First what is self explanatory is getting all members and their parents plus showing a level they currently are on. 
SELECT project_id,
       project_code,
       project_parent_id,
       level t_level
FROM   R_PROJECTS 
START WITH project_parent_id IS NULL
CONNECT BY PRIOR project_id = project_parent_id;
What we get as result is simply all records from a table "joined" with their parent records plus showing a level of child record. So far so good. 

To get hold of maximum level
SELECT MAX(LEVEL)
FROM PROJECT_HIERARCHY_V
START WITH project_parent_id is null
CONNECT BY PRIOR project_id = project_parent_id;
 For the purpose of easier reading I created a view 
PROJECT_HIERARCHY_V as
SELECT project_id,
       project_code,
       project_parent_id,
       level t_level
FROM   R_PROJECTS 
START WITH project_parent_id IS NULL
CONNECT BY PRIOR project_id = project_parent_id;
For example to get to the number of children each node has counting in all its grandchildren we can use the query above. 

Then our query would be something along
select project_id, ( count(*) -1 ) has_Child_Count
from ( select connect_by_root(project_id) project_id
       from  
PROJECT_HIERARCHY_V

       connect by project_parent_id = prior project_id
     )
group by project_id
Keeping in mind that query contains a parents/grandparents records as well hence Count(*) - 1. CONNECTED_BY_ROOT gives this wonderful ability to keep track of your parents info.

Another great and useful example could be to get path to a child for given project_id:
select substr ( SYS_CONNECT_BY_PATH ( project_id, ' <- ' ), 5 ) req_path
 from PROJECT_HIERARCHY_V
 where project_id = 179039 and connect_by_root ( project_parent_id ) is null
 connect by prior project_id = project_parent_id
As Oracle documentation says SYS_CONNECT_BY_PATH is valid only in hierarchical queries. It returns the path of a column value from root to node, with column values separated by char for each row returned by CONNECT BY condition. Resulting in something like 
410125 <- 360147 <- 179039
Now getting to the core of my initial problem. Some of above queries I used to get all available children under a certain parent ID:
select project_id,
         --root con_over_project,
         IS_PARENT_IND
 from
        (SELECT project_id
              ,CONNECT_BY_ROOT project_id ROOT,                       
              -- simple function that queries table to check if project_paretn_id is null or not                                        CHECK_IS_PARENT_PROJECT(project_id) IS_PARENT_IND
        FROM PROJECT_HIERARCHY_V
        START WITH project_id in (                                   --this query retuns all projects and their leafs of given project_id which is used in another recursive query to get all nodes of these
                 SELECT project_id
                  FROM PROJECT_HIERARCHY_V
                  START WITH project_id = YOUR_PROJECT_ID
                   CONNECT BY PRIOR project_id = project_parent_id)
                CONNECT BY PRIOR project_id = project_parent_id) tree
      /* -- filters only ones we need to calculate figures for excluding Grand parents/parents. By removing a comment you get only leafs that are children
 
where IS_PARENT_IND = 'N'
   and project_id != root*/
   group by project_id,
         --root,
           IS_PARENT_IND
A bit more complex would be to get sums of each individual child rolled up to a parent. Similarly if you wanted to count how many children/ grandchildren there are under a certain root in more technical way than shown previously. I noticed this one online and seems to do the trick:
-- for the count of subprojects
select project_id , tot_children, project_parent_id
from (select
           project_id,
           level lev,
           project_parent_id,
           count(project_id) over(partition by connect_by_root (project_id)
                           order by level desc
                           rows between unbounded preceding
                                and     1 preceding
                               ) tot_children
      from PROJECT_HIERARCHY_V
    connect by prior project_id = project_parent_id)
    where lev=1;
Of course key being the OVER( PARTIOTION BY ....) part where by ordering them we actually get rid of each parent record so results only contains children leafs.

Excellent article and more fine details how some of mentioned functions work were found and inspired by Lucas's blog

Well this was all for now and back to work for me.

Cheers,
SLino

No comments:

Post a Comment