Open In App

Find Median from Running Data Stream

Last Updated : 03 Oct, 2023
Improve
Improve
Like Article
Like
Save
Share
Report

Given that integers are read from a data stream. Find the median of elements read so far in an efficient way. 

There are two cases for median on the basis of data set size.

  • If the data set has an odd number then the middle one will be consider as median.
  • If the data set has an even number then there is no distinct middle value and the median will be the arithmetic mean of the two middle values.

Example:

Input Data Stream: 5, 15, 1, 3
Output: 5, 10,5, 4
Explanation:
After reading 1st element of stream – 5 -> median = 5
After reading 2nd element of stream – 5, 15 -> median = (5+15)/2 = 10
After reading 3rd element of stream – 5, 15, 1 -> median = 5
After reading 4th element of stream – 5, 15, 1, 3 -> median = (3+5)/2 = 4

Input Data Stream: 2, 2, 2, 2
Output: 2, 2, 2, 2
Explanation:
After reading 1st element of stream – 2 -> median = 2
After reading 2nd element of stream – 2, 2 -> median = (2+2)/2 = 2
After reading 3rd element of stream – 2, 2, 2 -> median = 2
After reading 4th element of stream – 2, 2, 2, 2 -> median = (2+2)/2 = 2

Recommended Practice

Find Median from Running Data Stream using Insertion Sort:

If we can sort the data as it appears, we can easily locate the median element. Insertion Sort is one such online algorithm that sorts the data appeared so far. At any instance of sorting, say after sorting i-th element, the first i elements of the array are sorted. The insertion sort doesn’t depend on future data to sort data input till that point. In other words, insertion sort considers data sorted so far while inserting the next element. This is the key part of insertion sort that makes it an online algorithm.

However, insertion sort takes O(n2) time to sort n elements. Perhaps we can use binary search on insertion sort to find the location of the next element in O(log n) time. Yet, we can’t do data movement in O(log n) time. No matter how efficient the implementation is, it takes polynomial time in case of insertion sort.

Below is the implementation of the above idea:

C++




// This code is contributed by Anjali Saxena
 
#include <bits/stdc++.h>
 
using namespace std;
 
// Function to find position to insert current element of
// stream using binary search
int binarySearch(float arr[], float item, int low, int high)
{
    if (low >= high) {
        return (item > arr[low]) ? (low + 1) : low;
    }
 
    int mid = (low + high) / 2;
 
    if (item == arr[mid])
        return mid + 1;
 
    if (item > arr[mid])
        return binarySearch(arr, item, mid + 1, high);
 
    return binarySearch(arr, item, low, mid - 1);
}
 
// Function to print median of stream of integers
void printMedian(float arr[], int n)
{
    int i, j, pos;
    float num;
    int count = 1;
 
    cout << "Median after reading 1"
         << " element is " << arr[0] << "\n";
 
    for (i = 1; i < n; i++) {
        float median;
        j = i - 1;
        num = arr[i];
 
        // find position to insert current element in sorted
        // part of array
        pos = binarySearch(arr, num, 0, j);
 
        // move elements to right to create space to insert
        // the current element
        while (j >= pos) {
            arr[j + 1] = arr[j];
            j--;
        }
 
        arr[j + 1] = num;
 
        // increment count of sorted elements in array
        count++;
 
        // if odd number of integers are read from stream
        // then middle element in sorted order is median
        // else average of middle elements is median
        if (count % 2 != 0) {
            median = arr[count / 2];
        }
        else {
            median = (arr[(count / 2) - 1] + arr[count / 2])
                     / 2;
        }
 
        cout << "Median after reading " << i + 1
             << " elements is " << median << "\n";
    }
}
 
// Driver Code
int main()
{
    float arr[] = { 5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4 };
    int n = sizeof(arr) / sizeof(arr[0]);
 
    printMedian(arr, n);
 
    return 0;
}
 
// This code is modified by Susobhan Akhuli


Java




// Java code to implement the approach
import java.util.*;
 
class GFG {
 
    // Function to find position to insert current element
    // of stream using binary search
    static int binarySearch(float arr[], float item,
                            int low, int high)
    {
        if (low >= high) {
            return (item > arr[low]) ? (low + 1) : low;
        }
 
        int mid = (low + high) / 2;
 
        if (item == arr[mid])
            return mid + 1;
 
        if (item > arr[mid])
            return binarySearch(arr, item, mid + 1, high);
 
        return binarySearch(arr, item, low, mid - 1);
    }
 
    // Function to print median of stream of integers
    static void printMedian(float arr[], int n)
    {
        int i, j, pos;
        float num;
        int count = 1;
 
        System.out.println("Median after reading 1"
                           + " element is " + arr[0]);
 
        for (i = 1; i < n; i++) {
            float median;
            j = i - 1;
            num = arr[i];
 
            // find position to insert current element in
            // sorted part of array
            pos = binarySearch(arr, num, 0, j);
 
            // move elements to right to create space to
            // insert the current element
            while (j >= pos) {
                arr[j + 1] = arr[j];
                j--;
            }
 
            arr[j + 1] = num;
 
            // increment count of sorted elements in array
            count++;
 
            // if odd number of integers are read from
            // stream then middle element in sorted order is
            // median else average of middle elements is
            // median
            if (count % 2 != 0) {
                median = arr[count / 2];
            }
            else {
                median = (arr[(count / 2) - 1]
                          + arr[count / 2])
                         / 2;
            }
 
            System.out.println("Median after reading "
                               + (i + 1) + " elements is "
                               + median);
        }
    }
 
    // Driver code
    public static void main(String[] args)
    {
 
        float arr[]
            = { 5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4 };
        int n = arr.length;
 
        printMedian(arr, n);
    }
}
 
// This code is contributed by sanjoy_62.
 
// This code is modified by Susobhan Akhuli


Python3




# Function to find position to insert current element of
# stream using binary search
 
 
def binarySearch(arr, item, low, high):
 
    if (low >= high):
        return (low + 1) if (item > arr[low]) else low
 
    mid = (low + high) // 2
 
    if (item == arr[mid]):
        return mid + 1
 
    if (item > arr[mid]):
        return binarySearch(arr, item, mid + 1, high)
 
    return binarySearch(arr, item, low, mid - 1)
 
# Function to print median of stream of integers
 
 
def printMedian(arr, n):
 
    i, j, pos, num = 0, 0, 0, 0
    count = 1
 
    print(f"Median after reading 1 element is {arr[0]}.0")
 
    for i in range(1, n):
        median = 0
        j = i - 1
        num = arr[i]
 
        # find position to insert current element in sorted
        # part of array
        pos = binarySearch(arr, num, 0, j)
 
        # move elements to right to create space to insert
        # the current element
        while (j >= pos):
            arr[j + 1] = arr[j]
            j -= 1
 
        arr[j + 1] = num
 
        # increment count of sorted elements in array
        count += 1
 
        # if odd number of integers are read from stream
        # then middle element in sorted order is median
        # else average of middle elements is median
        if (count % 2 != 0):
            median = arr[count // 2] / 1
 
        else:
            median = (arr[(count // 2) - 1] + arr[count // 2]) / 2
 
        print(f"Median after reading {i + 1} elements is {median} ")
 
 
# Driver Code
if __name__ == "__main__":
 
    arr = [5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4]
    n = len(arr)
 
    printMedian(arr, n)
 
# This code is contributed by rakeshsahni
 
# This code is modified by Susobhan Akhuli


C#




// C# program for the above approach
using System;
class GFG{
 
  // Function to find position to insert current element of
  // stream using binary search
  static int binarySearch(float[] arr, float item, int low, int high)
  {
    if (low >= high) {
      return (item > arr[low]) ? (low + 1) : low;
    }
 
    int mid = (low + high) / 2;
 
    if (item == arr[mid])
      return mid + 1;
 
    if (item > arr[mid])
      return binarySearch(arr, item, mid + 1, high);
 
    return binarySearch(arr, item, low, mid - 1);
  }
 
  // Function to print median of stream of integers
  static void printMedian(float[] arr, int n)
  {
    int i, j, pos;
    float num;
    int count = 1;
 
    Console.WriteLine( "Median after reading 1"
                       + " element is " + arr[0]);
 
    for (i = 1; i < n; i++) {
      float median;
      j = i - 1;
      num = arr[i];
 
      // find position to insert current element in sorted
      // part of array
      pos = binarySearch(arr, num, 0, j);
 
      // move elements to right to create space to insert
      // the current element
      while (j >= pos) {
        arr[j + 1] = arr[j];
        j--;
      }
 
      arr[j + 1] = num;
 
      // increment count of sorted elements in array
      count++;
 
      // if odd number of integers are read from stream
      // then middle element in sorted order is median
      // else average of middle elements is median
      if (count % 2 != 0) {
        median = arr[count / 2];
      }
      else {
        median = (arr[(count / 2) - 1] + arr[count / 2])
          / 2;
      }
 
      Console.WriteLine( "Median after reading " + (i + 1)
                         + " elements is " + median );
    }
  }
 
 
// Driver Code
public static void Main(String[] args)
{
    float[] arr = { 5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4 };
    int n = arr.Length;
 
    printMedian(arr, n);
}
}
 
// This code is contributed by code_hunt.
 
// This code is modified by Susobhan Akhuli


Javascript




<script>
    // JavaScript implementation for the above approach
 
    // Function to find position to insert current element of
    // stream using binary search
    const binarySearch = (arr, item, low, high) => {
        if (low >= high) {
            return (item > arr[low]) ? (low + 1) : low;
        }
 
        let mid = parseInt((low + high) / 2);
 
        if (item == arr[mid])
            return mid + 1;
 
        if (item > arr[mid])
            return binarySearch(arr, item, mid + 1, high);
 
        return binarySearch(arr, item, low, mid - 1);
    }
 
    // Function to print median of stream of integers
    const printMedian = (arr, n) => {
        let i, j, pos, num;
        let count = 1;
 
        document.write(`Median after reading 1 element is ${arr[0]}<br/>`);
 
        for (i = 1; i < n; i++) {
            let median;
            j = i - 1;
            num = arr[i];
 
            // find position to insert current element in sorted
            // part of array
            pos = binarySearch(arr, num, 0, j);
 
            // move elements to right to create space to insert
            // the current element
            while (j >= pos) {
                arr[j + 1] = arr[j];
                j--;
            }
 
            arr[j + 1] = num;
 
            // increment count of sorted elements in array
            count++;
 
            // if odd number of integers are read from stream
            // then middle element in sorted order is median
            // else average of middle elements is median
            if (count % 2 != 0) {
                median = arr[parseInt(count / 2)];
            }
            else {
                median = (arr[parseInt(count / 2) - 1] + arr[parseInt(count / 2)]) / 2;
            }
 
            document.write(`Median after reading ${i + 1} elements is ${median}<br/>`);
        }
    }
 
    // Driver Code
    let arr = [5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4];
    let n = arr.length;
 
    printMedian(arr, n);
 
 // This code is contributed by rakeshsahni
 // This code is modified by Susobhan Akhuli
 
</script>


Output

Median after reading 1 element is 5
Median after reading 2 elements is 10
Median after reading 3 elements is 5
Median after reading 4 elements is 4
Median after reading 5 elements is 3
Median after re...

Time Complexity: O(n2)
Auxiliary Space: O(1)

Find Median from Running Data Stream using Augmented self-balanced binary search tree (AVL, RB, etc…)

  • At every node of BST, maintain a number of elements in the subtree rooted at that node. We can use a node as the root of a simple binary tree, whose left child is self-balancing BST with elements less than root and right child is self-balancing BST with elements greater than root. The root element always holds effective median.
  • If the left and right subtrees contain a same number of elements, the root node holds the average of left and right subtree root data. Otherwise, the root contains the same data as the root of subtree which is having more elements. After processing an incoming element, the left and right subtrees (BST) are differed atmost by 1.
  • Self-balancing BST is costly in managing the balancing factor of BST. However, they provide sorted data which we don’t need. We need median only. The next method makes use of Heaps to trace the median.

Find Median from Running Data Stream using Heaps

  • Similar to above balancing BST Method, we can use a max heap on the left side to represent elements that are less than effective median, and a min-heap on the right side to represent elements that are greater than effective median.
  • After processing an incoming element, the number of elements in heaps differs atmost by 1 element. When both heaps contain the same number of elements, we pick the average of heaps root data as effective median. When the heaps are not balanced, we select effective median from the root of the heap containing more elements.

Below is the implementation of the above approach:

C++14




// C++ code to implement the approach
 
#include <bits/stdc++.h>
using namespace std;
 
// Function to find the median of stream of data
void streamMed(int A[], int n)
{
    // Declared two max heap
    priority_queue<int> g, s;
   
    for (int i = 0; i < n; i++) {
        s.push(A[i]);
        int temp = s.top();
        s.pop();
       
        // Negation for treating it as min heap
        g.push(-1 * temp);
        if (g.size() > s.size()) {
            temp = g.top();
            g.pop();
            s.push(-1 * temp);
        }
        if (g.size() != s.size())
            cout << (double)s.top() << "\n";
        else
            cout << (double)((s.top() * 1.0
                              - g.top() * 1.0)
                             / 2)
                 << "\n";
    }
}
 
// Driver code
int main()
{
    int A[] = { 5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4 };
    int N = sizeof(A) / sizeof(A[0]);
   
    // Function call
    streamMed(A, N);
    return 0;
}


Java




// Java code to implement the approach
 
import java.io.*;
import java.util.*;
 
class GFG {
   
    // Function to find the median of stream of data
    public static void streamMed(int A[], int N)
    {
        // Declaring two min heap
        PriorityQueue<Double> g = new PriorityQueue<>();
        PriorityQueue<Double> s = new PriorityQueue<>();
        for (int i = 0; i < N; i++) {
             
            // Negation for treating it as max heap
            s.add(-1.0 * A[i]);
            g.add(-1.0 * s.poll());
            if (g.size() > s.size())
                s.add(-1.0 * g.poll());
           
            if (g.size() != s.size())
                System.out.println(-1.0 * s.peek());
            else
                System.out.println((g.peek() - s.peek())
                                   / 2);
        }
    }
 
    // Driver code
    public static void main(String[] args)
    {
        int A[] = { 5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4 };
        int N = A.length;
         
        // Function call
        streamMed(A, N);
    }
}


Python3




# Python code to implement the approach
 
from heapq import heappush, heappop, heapify
import math
 
# Function to find the median of stream of data
def streamMed(arr, N):
     
    # Declaring two min heap
    g = []
    s = []
    for i in range(len(arr)):
       
        # Negation for treating it as max heap
        heappush(s, -arr[i])
        heappush(g, -heappop(s))
        if len(g) > len(s):
            heappush(s, -heappop(g))
 
        if len(g) != len(s):
            print(-s[0])
        else:
            print((g[0] - s[0])/2)
 
 
# Driver code
if __name__ == '__main__':
    A = [5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4]
    N = len(A)
     
    # Function call
    streamMed(A, N)


C#




// C# code to implement the approach
 
using System;
using System.Collections.Generic;
 
public class GFG {
 
  // Function to find the median of stream of data
  static void StreamMed(int[] arr, int N)
  {
    // Declaring two min heap
    SortedSet<int> g = new SortedSet<int>();
    SortedSet<int> s = new SortedSet<int>();
 
    for (int i = 0; i < N; i++) {
      // Negation for treating it as max heap
      s.Add(arr[i]);
      g.Add(s.Max);
      s.Remove(s.Max);
 
      if (g.Count > s.Count) {
        s.Add(g.Min);
        g.Remove(g.Min);
      }
 
      if (s.Count < g.Count)
        Console.WriteLine(g.Min);
      else if (s.Count > g.Count)
        Console.WriteLine(s.Max);
      else
        Console.WriteLine((g.Min + s.Max) / 2.0);
    }
  }
 
  static public void Main()
  {
 
    // Code
    int[] A = { 5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4 };
    int N = A.Length;
 
    // Function call
    StreamMed(A, N);
  }
}
 
// This code is contributed by lokesh.


Javascript




//Javascript  code to implement the approach
function streamMed(arr) {
  // Declaring two min heap
  var g = [];
  var s = [];
 
  for (var i = 0; i < arr.length; i++) {
    // Negation for treating it as max heap
    s.push(-arr[i]);
    s.sort(function(a, b){ return a-b });
    g.push(-s.shift());
    g.sort(function(a, b){ return a-b });
    if (g.length > s.length) {
      s.unshift(-g.pop());
    }
 
    if (g.length != s.length) {
      console.log(-s[0]);
    } else {
      console.log((g[0] - s[0]) / 2);
    }
  }
}
 
// Driver code
var A = [5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4];
streamMed(A);
//This Code is Contributed By Shivam Tiwari


Output

5
10
5
4
3
4
5
6
7
6.5
7
6.5

Time Complexity: O(n * log n), All the operations within the loop (push, pop) take O(log n) time in the worst case for a heap of size N.
Auxiliary Space: O(n)

Median of Stream of Running Integers using STL



Previous Article
Next Article

Similar Reads

Median Of Running Stream of Numbers - (using Set)
Given that integers are being read from a data stream. Find the median of all the elements read so far starting from the first integer until the last integer. This is also called Median of Running Integers. The given link already contains solution of this problem using Priority Queue. However, the following solution uses the same concept but the im
9 min read
Median of Stream of Running Integers using STL | Set 2
Given an array arr[] of size N representing integers required to be read as a data stream, the task is to calculate and print the median after reading every integer. Examples: Input: arr[] = { 5, 10, 15 } Output: 5 7.5 10 Explanation: After reading arr[0] from the data stream, the median is 5. After reading arr[1] from the data stream, the median i
6 min read
Median of Stream of Running Integers using STL
Given that integers are being read from a data stream. Find the median of all the elements read so far starting from the first integer till the last integer. This is also called the Median of Running Integers. The data stream can be any source of data, for example, a file, an array of integers, input stream etc. What is Median? Median can be define
9 min read
Find Median for each Array element by excluding the index at which Median is calculated
Given an array A[] of N integers where N is even, . the task is to generate an array of medians where the median of the array is calculated by taking the median of array A[] excluding A[i]-th element. Examples: Input N = 4, A = [2, 4, 4, 3]Output: [4, 3, 3, 4]Explanation: Median after removing A[0]: New sorted array will be [3, 4, 4]. Median = 4. M
10 min read
Expression for mean and variance in a running stream
Let we have a running stream of numbers as x1,x2,x3,...,xn. The formula for calculating mean and variance at any given point is given as : Mean = E(x) = u = 1/n ∑i=1n xiStandard Deviation = s = 1/n ∑i=1n (xi - u) 2Variance = s2 However, it would be a very slow approach if we calculate these expressions by looping through all numbers each time a new
2 min read
Mode in a stream of integers (running integers)
Given that integers are being read from a data stream. Find the mode of all the elements read so far starting from the first integer till the last integer. Mode is defined as the element which occurs the maximum time. If two or more elements have the same maximum frequency, then take the one with the last occurrence. Examples: Input: stream[] = {2,
6 min read
Facebook Data Privacy Settlement: Time is Running Out to Claim $725M
Facebook users who held an account at any time from 2007 to the end of last year can now apply for their share of the $725 million data privacy settlement. Facebook's parent company, Meta, has recently agreed to pay the sum to settle a host of data privacy-related class-action lawsuits alleging, among other things, that Facebook let third parties a
3 min read
Program for average of an array without running into overflow
Given an array arr[] of size N, the task is to find the average of the array elements without running into overflow. Examples: Input: arr[] = { INT_MAX, INT_MAX }Output:Average by Standard method: -1.0000000000Average by Efficient method: 2147483647.0000000000Explanation: The average of the two numbers by standard method is (sum / 2).Since the sum
6 min read
Shortest path between two cities without running out of fuel
Given a country with N cities connected by M bi-directional roads given in the form of 2D array roads[][], where P cities have fuel stations and (N - P) cities do not. The task is to compute the shortest path from City A to City B, such that a car with maximum fuel capacity of "capacity" can travel without running out of fuel. Note: 1 unit of fuel
10 min read
Find the median array for Binary tree
Prerequisite: Tree Traversals (Inorder, Preorder and Postorder), MedianGiven a Binary tree having integral nodes, the task is to find the median for each position in the preorder, postorder and inorder traversal of the tree. The median array is given as the array formed with the help of PreOrder, PostOrder, and Inorder traversal of a tree, such tha
11 min read