Introduction to Hierarchical Data Structure
We have already discussed overviews of Arrays, Linked Lists, Queues, and Stacks. In this article, the following data structures are discussed:
5. Binary Tree
6. Binary Search Tree
7. Binary Heap
8. Hashing
Binary Tree
Unlike arrays, linked lists, stacks, and queues, which are linear data structures, trees are hierarchical data structures.
A binary tree is a tree data structure in which each node has at most two children, referred to as the left child and the right child. It is mainly implemented using links (pointers).
Binary Tree Representation: A tree is represented by a pointer to the topmost node in the tree. If the tree is empty, then the value of the root is NULL. A Binary Tree node contains the following parts.
1. Data
2. Pointer to left child
3. Pointer to the right child
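For reference, a minimal C++ sketch of such a node might look like the following (the Node name and constructor are illustrative, not part of the original article):
C++
// A binary tree node: data plus pointers to the left and right children.
struct Node {
    int data;      // 1. Data
    Node* left;    // 2. Pointer to left child
    Node* right;   // 3. Pointer to the right child
    Node(int value) : data(value), left(nullptr), right(nullptr) {}
};
An empty tree is then simply a NULL (nullptr) root pointer.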
A Binary Tree can be traversed in two ways:
Depth First Traversal: Inorder (Left-Root-Right), Preorder (Root-Left-Right), and Postorder (Left-Right-Root)
Breadth-First Traversal: Level Order Traversal
Binary Tree Properties:
The maximum number of nodes at level 'l' = 2^l (with the root at level 0).
Maximum number of nodes = 2^(h+1) - 1, where h is the height of the tree, counted as the maximum number of edges on a path from the root to a leaf.
Minimum possible height for n nodes = ceil(Log2(n+1)) - 1.
For example, a tree of height h = 2 holds at most 2^3 - 1 = 7 nodes, and n = 7 nodes require a height of at least ceil(Log2(8)) - 1 = 2.
In a binary tree, the number of leaf nodes is always one more than the number of nodes with two children.
Time Complexity of Tree Traversal: O(n)
Basic Operations on a Binary Tree:
- Inserting an element.
- Removing an element.
- Searching for an element.
- Traversing the tree.
Auxiliary Operations on a Binary Tree:
- Finding the height of the tree.
- Finding the level of a node in the tree.
- Finding the size of the entire tree.
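As a rough sketch of two of these auxiliary operations, again assuming the Node structure from the sketch above (height is counted in edges, so an empty tree has height -1):
C++
#include <algorithm>

// Height of the tree in edges; an empty tree has height -1, a single node 0.
int height(Node* root) {
    if (root == nullptr)
        return -1;
    return 1 + std::max(height(root->left), height(root->right));
}

// Size of the tree: the total number of nodes.
int size(Node* root) {
    if (root == nullptr)
        return 0;
    return 1 + size(root->left) + size(root->right);
}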
Applications of Binary Tree:
- Huffman coding trees are used in data compression algorithms.
- A Priority Queue is another application of binary trees; it supports finding and extracting the maximum or minimum in O(log n) time.
- In compilers, Expression Trees, which are an application of binary trees, are used to represent expressions.
Binary Tree Traversals:
- Preorder Traversal: the order is root – left child – right child. The root node is visited first, then its left subtree, and finally its right subtree.
- Inorder Traversal: the order is left child – root – right child. The left subtree is visited first, then the root node, and finally the right subtree.
- Postorder Traversal: the order is left child – right child – root. The left subtree is visited first, then the right subtree, and finally the root node.
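A minimal recursive sketch of the three depth-first traversals, assuming the Node structure from the earlier sketch:
C++
#include <iostream>

// Preorder: root, then left subtree, then right subtree.
void preorder(Node* root) {
    if (root == nullptr) return;
    std::cout << root->data << ' ';
    preorder(root->left);
    preorder(root->right);
}

// Inorder: left subtree, then root, then right subtree.
void inorder(Node* root) {
    if (root == nullptr) return;
    inorder(root->left);
    std::cout << root->data << ' ';
    inorder(root->right);
}

// Postorder: left subtree, then right subtree, then root.
void postorder(Node* root) {
    if (root == nullptr) return;
    postorder(root->left);
    postorder(root->right);
    std::cout << root->data << ' ';
}
For a root 1 with left child 2 and right child 3, preorder prints 1 2 3, inorder prints 2 1 3, and postorder prints 2 3 1.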
Examples: One reason to use binary trees, or trees in general, is to represent things that form a hierarchy. They are useful in file systems, where each file is located in a particular directory and there is a specific hierarchy of files and directories. Another example is storing hierarchical objects: the HTML Document Object Model (DOM) represents an HTML page as a tree, with nested tags as parent-child relations.
Binary Search Tree
A Binary Search Tree (BST) is a binary tree organized so that a specific element can be searched for efficiently.
Binary Search Tree is a Binary Tree with the following additional properties:
1. The left subtree of a node contains only nodes with keys less than the node’s key.
2. The right subtree of a node contains only nodes with keys greater than the node’s key.
3. The left and right subtree each must also be a binary search tree.
Binary Search Tree Declaration
struct BinarySearchTree {
    int data;
    struct BinarySearchTree* left;
    struct BinarySearchTree* right;
};
Since it is a Binary Tree, its declaration is similar to the Binary Tree.
Primary BST Operations:
- Finding minimum or maximum element.
- Deleting a particular element from the tree.
- Inserting a particular element in the tree.
Auxiliary BST Operations:
- Finding kth smallest element.
- Checking whether a given binary tree is a BST.
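A minimal sketch of search and insert on the BinarySearchTree node declared above (the function names are illustrative):
C++
// Search for a key: go left if the key is smaller, right if it is larger.
BinarySearchTree* search(BinarySearchTree* root, int key) {
    if (root == nullptr || root->data == key)
        return root;
    if (key < root->data)
        return search(root->left, key);
    return search(root->right, key);
}

// Insert a key while keeping the BST property; duplicates are ignored here.
BinarySearchTree* insert(BinarySearchTree* root, int key) {
    if (root == nullptr)
        return new BinarySearchTree{key, nullptr, nullptr};
    if (key < root->data)
        root->left = insert(root->left, key);
    else if (key > root->data)
        root->right = insert(root->right, key);
    return root;
}
Both functions walk a single root-to-leaf path, which is why their cost is O(h), as summarized below.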
Time Complexities:
Search : O(h)
Insertion : O(h)
Deletion : O(h)
Extra Space : O(n) for pointers
h: Height of BST
n: Number of nodes in BST
If the Binary Search Tree is height-balanced, then h = O(Log n).
Self-balancing BSTs such as the AVL Tree, Red-Black Tree, and Splay Tree ensure that the height of the BST remains O(Log n).
BST provides moderate access/search (quicker than a Linked List and slower than an Array).
BST provides moderate insertion/deletion (quicker than an Array and slower than a Linked List).
Examples: Its main use is in search applications where data is constantly entering and leaving and the data needs to be printed in sorted order. For example, in an e-commerce website, new products are added, products go out of stock, and all products must be listed in sorted order.
Binary Heap
A Binary Heap is a Binary Tree with the following properties.
1) It’s a complete tree (All levels are completely filled except possibly the last level and the last level has all keys as left as possible). This property of Binary Heap makes them suitable to be stored in an array.
2) A Binary Heap is either a Min Heap or a Max Heap. In a Min Binary Heap, the key at the root must be the minimum among all keys present in the Binary Heap. The same property must be recursively true for all nodes in the tree. A Max Binary Heap is similar, with the maximum key at the root. A Binary Heap is mainly implemented using an array.
Get Minimum in Min Heap: O(1) [Or Get Max in Max Heap]
Extract Minimum Min Heap: O(Log n) [Or Extract Max in Max Heap]
Decrease Key in Min Heap: O(Log n) [Or Increase Key in Max Heap]
Insert: O(Log n)
Delete: O(Log n)
Here is the algorithm to implement a binary heap (a code sketch follows these steps):
- Create a binary heap class that has a private array for storing the elements of the heap and a private method for maintaining the heap property.
- The class should have a constructor method for initializing the heap and a method for adding a new element to the heap. The add method should start by inserting the new element at the end of the array and then compare it with its parent node. If the value of the new element is larger (for a max-heap) or smaller (for a min-heap) than the value of its parent, then the two elements should be swapped. This process should be repeated until the heap property is satisfied.
- The class should also have a method for removing the root element from the heap. The remove method should start by replacing the root element with the last element in the array and then compare it with its children. If the value of the root element is smaller (for a max-heap) or larger (for a min-heap) than the value of one of its children, then the two elements should be swapped. This process should be repeated until the heap property is satisfied.
- The class should also have a method for checking if the heap is empty.
- To implement the binary heap efficiently, it is recommended to use an array-based implementation where the parent, left child, and right child of a node at index i are stored at indices (i-1)/2, 2i+1, and 2i+2 respectively.
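Based on these steps, a minimal array-based min-heap might look like the sketch below (the MinHeap class and method names are illustrative assumptions, not from the original article; a max-heap would simply flip the comparisons):
C++
#include <stdexcept>
#include <utility>
#include <vector>

// Array-based min-heap: parent of i is (i - 1) / 2, children are 2i + 1 and 2i + 2.
class MinHeap {
    std::vector<int> heap;

    // Restore the heap property upward after an insertion.
    void siftUp(int i) {
        while (i > 0 && heap[(i - 1) / 2] > heap[i]) {
            std::swap(heap[(i - 1) / 2], heap[i]);
            i = (i - 1) / 2;
        }
    }

    // Restore the heap property downward after removing the root.
    void siftDown(int i) {
        int n = static_cast<int>(heap.size());
        while (true) {
            int smallest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < n && heap[l] < heap[smallest]) smallest = l;
            if (r < n && heap[r] < heap[smallest]) smallest = r;
            if (smallest == i) break;
            std::swap(heap[i], heap[smallest]);
            i = smallest;
        }
    }

public:
    bool empty() const { return heap.empty(); }

    // Insert at the end of the array, then sift up: O(Log n).
    void insert(int key) {
        heap.push_back(key);
        siftUp(static_cast<int>(heap.size()) - 1);
    }

    // Remove and return the minimum (the root): O(Log n).
    int extractMin() {
        if (heap.empty()) throw std::runtime_error("heap is empty");
        int root = heap[0];
        heap[0] = heap.back();
        heap.pop_back();
        if (!heap.empty()) siftDown(0);
        return root;
    }
};
Reading the minimum is just heap[0], which is why getting the minimum is O(1) while insert and extractMin are O(Log n).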
Example: Used in implementing efficient priority queues, which in turn are used for scheduling processes in operating systems. Priority Queues are also used in Dijkstra’s and Prim’s graph algorithms.
The Heap data structure can be used to efficiently find the k smallest (or largest) elements in an array, merging k sorted arrays, a median of a stream, etc.
A Heap is a specialized data structure; it cannot be used to efficiently search for an arbitrary element (that takes O(n) time).
Hashing: Hashing is a popular technique for storing and retrieving data as fast as possible. The main reason for using hashing is that search, insert, and delete operations take O(1) time on average.
Why use Hashing?
If you observe carefully, in a balanced binary search tree, searching, inserting, or deleting an element takes O(Log n) time. There may be situations where an application needs these operations to be even faster, and this is where hashing comes into play. With hashing, all of the above operations can be performed in O(1), i.e. constant, time on average. It is important to understand that the worst-case time complexity for hashing remains O(n), but the average-case time complexity is O(1).
A hash function is a function that takes an input (or "message") and returns a fixed-size value, typically called a "digest", derived from the input.
A good hash function should have the following properties:
- Deterministic: The same input will always produce the same hash value.
- Efficiently computable: The hash function should be easy to compute, even for large inputs.
- Uniform distribution: The hash values should be distributed uniformly across the range of possible values.
- Irreversible: It should be difficult to recover the original input given only the hash value.
Here is an example algorithm for implementing a simple hash function:
C++
#include <iostream>
#include <string>

class HashFunction {
public:
    static int hash_function(std::string input_string)
    {
        int hash_value = 0;
        for (size_t i = 0; i < input_string.length(); i++) {
            hash_value += static_cast<int>(input_string[i]);
        }
        return hash_value % 101;
    }
};

int main()
{
    std::string input_string = "Hello, World!";
    int hash_value = HashFunction::hash_function(input_string);
    std::cout << hash_value << std::endl;
    return 0;
}
Java
public class HashFunction {
    public static int hash_function(String input_string) {
        int hash_value = 0;
        for (int i = 0; i < input_string.length(); i++) {
            hash_value += input_string.charAt(i);
        }
        return hash_value % 101;
    }

    public static void main(String[] args) {
        String input_string = "Hello, World!";
        int hash_value = hash_function(input_string);
        System.out.println(hash_value);
    }
}
Python
def hash_function(input_string):
    hash_value = 0
    for char in input_string:
        hash_value += ord(char)
    return hash_value % 101


S = "Hello, World!"
print(hash_function(S))
C#
using System;

public class Program
{
    // Sum the character codes of the string and reduce modulo 101.
    public static int HashFunction(string inputString)
    {
        int hashValue = 0;
        for (int i = 0; i < inputString.Length; i++)
        {
            hashValue += (int)inputString[i];
        }
        return hashValue % 101;
    }

    public static void Main(string[] args)
    {
        string inputString = "Hello, World!";
        int hashValue = HashFunction(inputString);
        Console.WriteLine(hashValue);
    }
}
Javascript
function hash_function(input_string) {
    let hash_value = 0;
    for (let i = 0; i < input_string.length; i++) {
        hash_value += input_string.charCodeAt(i);
    }
    return hash_value % 101;
}

let str = "Hello, World!";
let hash_value = hash_function(str);
document.write(hash_value);
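Each of the programs above computes the same value for "Hello, World!": the character codes sum to 1129, and 1129 % 101 = 18, so the printed hash value is 18.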
Hash Function: A function that converts a given big input (such as a large phone number) to a small practical integer value. The mapped integer value is used as an index in the hash table. In simple terms, a hash function maps a given key to a specific slot index. Its main job is to map each possible key to a slot index; if every key maps to a unique slot index, the hash function is called a perfect hash function. It is very difficult to create a perfect hash function, but our job as programmers is to design hash functions for which the number of collisions is as small as possible. Collisions are discussed below.
A good hash function should have the following properties:
1) Efficiently computable.
2) Should uniformly distribute the keys (each table position equally likely for each key).
3) Should minimize collisions.
4) Should keep the load factor (number of items in the table divided by the size of the table) low.
For example, for phone numbers, a bad hash function would take the first three digits, while a better function considers the last three digits. (This may still not be the best hash function; there may be better approaches.)
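As a rough illustration (assuming 10-digit phone numbers stored as 64-bit integers and an arbitrary table size of 101; both are assumptions for this sketch):
C++
#include <cstdint>

const int TABLE_SIZE = 101;  // assumed table size, for illustration only

// Bad: the first three digits (e.g. an area code) are shared by many numbers,
// so keys cluster into a few slots.
int firstThreeDigitsHash(int64_t phone) {
    return static_cast<int>(phone / 10000000) % TABLE_SIZE;  // first 3 of 10 digits
}

// Better: the last three digits are spread far more evenly across numbers.
int lastThreeDigitsHash(int64_t phone) {
    return static_cast<int>(phone % 1000) % TABLE_SIZE;
}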
Hash Table: An array that stores pointers to the records corresponding to a given phone number. An entry in the hash table is NIL if no existing phone number has a hash value equal to the index of that entry. In simple terms, a hash table is a generalization of an array. A hash table stores a collection of data in such a way that it is easy to find items later, which makes searching for an element very efficient.
Collision Handling: Since a hash function gets us a small number for a key which is a big integer or string, there is the possibility that two keys result in the same value. The situation where a newly inserted key maps to an already occupied slot in the hash table is called collision and must be handled using some collision handling technique. Following are the ways to handle collisions:
Chaining: The idea is to make each cell of the hash table point to a linked list of records that have the same hash function value. Chaining is simple but requires additional memory outside the table.
Open Addressing: In open addressing, all elements are stored in the hash table itself. Each table entry contains either a record or NIL. When searching for an element, we one by one examine table slots until the desired element is found or it is clear that the element is not in the table.
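As a rough sketch of chaining (the ChainedHashTable name and its methods are illustrative assumptions; the hash function reuses the character-summing idea from the example above):
C++
#include <list>
#include <string>
#include <vector>

// Hash table with chaining: each bucket holds a linked list of the keys
// that hash to the same index.
class ChainedHashTable {
    std::vector<std::list<std::string>> buckets;

    int hash(const std::string& key) const {
        int h = 0;
        for (char c : key) h += static_cast<unsigned char>(c);
        return h % static_cast<int>(buckets.size());
    }

public:
    explicit ChainedHashTable(int size = 101) : buckets(size) {}

    // Insert a key into its bucket's chain (no duplicate check here).
    void insert(const std::string& key) { buckets[hash(key)].push_back(key); }

    // Search only the chain at the key's bucket.
    bool contains(const std::string& key) const {
        for (const std::string& k : buckets[hash(key)])
            if (k == key) return true;
        return false;
    }

    // Remove all occurrences of the key from its chain.
    void remove(const std::string& key) { buckets[hash(key)].remove(key); }
};
Open addressing would instead probe successive slots of a single array until the key or an empty slot is found.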
Space : O(n)
Search : O(1) [Average] O(n) [Worst case]
Insertion : O(1) [Average] O(n) [Worst Case]
Deletion : O(1) [Average] O(n) [Worst Case]
Hashing seems better than BST for all of the above operations. But in hashing, elements are unordered, whereas in a BST elements are stored in sorted order. Also, a BST is easy to implement, while good hash functions can sometimes be complex to design. In a BST, we can also efficiently find the floor and ceiling of values.
Example: Hashing can be used to remove duplicates from a set of elements or to find the frequency of all items. For example, web browsers can check visited URLs using hashing, and firewalls can hash IP addresses to detect spam or malicious sources. Hashing can be used in any situation where we want search(), insert(), and delete() in O(1) average time.