Category: dsa Date: 2026-02-24
Top K Frequent Elements System Design Discussion
Functional Requirements:
Non-functional Requirements:
We will use a distributed system architecture to handle large inputs. The architecture consists of three layers:
Technology Stack:
We will use a distributed hash table (DHT) to store the frequency counts. Each worker node will store the frequency counts for a subset of the input array.
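As a rough illustration of how counts could be spread across nodes, the sketch below uses an in-memory stand-in for the DHT. It assumes each element's count lives on the node chosen by hashing the element id, which the text above does not specify; `ShardedCounts`, `ownerOf`, `increment`, and `frequency` are hypothetical names, not part of any real DHT library.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for a DHT: one mutable map per worker node.
final class ShardedCounts(numNodes: Int) {
  require(numNodes > 0, "numNodes must be positive")

  // Vector.fill evaluates its argument once per slot, so each node gets its own map.
  private val shards: Vector[mutable.Map[Int, Int]] =
    Vector.fill(numNodes)(mutable.Map.empty[Int, Int])

  // Assumption: ownership is decided by the element id modulo the node count.
  // floorMod keeps the index non-negative for negative element ids.
  def ownerOf(elementId: Int): Int = Math.floorMod(elementId, numNodes)

  // Increment the count on the node that owns this element.
  def increment(elementId: Int): Unit = {
    val shard = shards(ownerOf(elementId))
    shard.update(elementId, shard.getOrElse(elementId, 0) + 1)
  }

  // Read the count back from the owning node (0 if the element was never seen).
  def frequency(elementId: Int): Int =
    shards(ownerOf(elementId)).getOrElse(elementId, 0)
}
```

A real DHT would also need replication and rebalancing when nodes join or leave; plain modulo placement moves most keys when the node count changes, which is why consistent hashing is a common alternative.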
DHT Design:
element-frequency-{worker-node-id}.{element-frequency-{worker-node-id}} -> {element-id}:{frequency}
To handle large inputs, we can scale the system horizontally by adding more worker nodes. Each worker node processes a subset of the input array and stores its frequency counts in the DHT.
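The per-worker counting described here can be simulated on one machine. The sketch below is a minimal example under stated assumptions: the input is split into roughly equal slices, each slice is counted independently, and the partial counts are merged by summing. The names `countChunk`, `mergeCounts`, and `distributedCount` are hypothetical, and the "workers" are ordinary function calls rather than real nodes.

```scala
// Count occurrences within one worker's slice of the input.
def countChunk(chunk: Array[Int]): Map[Int, Int] =
  chunk.foldLeft(Map.empty[Int, Int]) { (counts, element) =>
    counts.updated(element, counts.getOrElse(element, 0) + 1)
  }

// Merge two partial count maps by summing the counts of shared keys.
def mergeCounts(left: Map[Int, Int], right: Map[Int, Int]): Map[Int, Int] =
  right.foldLeft(left) { case (acc, (element, count)) =>
    acc.updated(element, acc.getOrElse(element, 0) + count)
  }

// Split the input into roughly equal slices, one per simulated worker,
// count each slice, then fold the partial results into one map.
def distributedCount(input: Array[Int], numWorkers: Int): Map[Int, Int] = {
  require(numWorkers > 0, "numWorkers must be positive")
  val chunkSize = math.max(1, math.ceil(input.length.toDouble / numWorkers).toInt)
  input.grouped(chunkSize)
    .map(countChunk)
    .foldLeft(Map.empty[Int, Int])(mergeCounts)
}
```

Because the same element can appear in several slices, the merge step is required before selecting the top K: an element that is rare in every slice can still be frequent overall.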
Scaling Strategy:
The following are potential bottlenecks in the system:
The following are trade-offs in the system design:
The first principle of system design is to “Keep it Simple, Stupid” (KISS). In this case, we can use a simple algorithm to find the top K frequent elements:
However, this algorithm may not scale to large inputs, because it counts and sorts every distinct element on a single machine. To improve performance, we can use a distributed system architecture and a distributed hash table (DHT) to store the frequency counts.
Code Example:
// Define a class to represent a frequency count
case class FrequencyCount(elementId: Int, frequency: Int)
// Define a class to represent the Top K Frequent Elements result
case class TopKFrequentElements(k: Int, result: List[FrequencyCount])
// Define a function to count the frequency of each element
def countFrequency(inputArray: Array[Int]): Map[Int, Int] = {
  // Use map rather than mapValues: mapValues returns a lazy view
  // (a MapView in Scala 2.13+), which does not conform to Map[Int, Int]
  inputArray.groupBy(identity).map { case (elementId, occurrences) => elementId -> occurrences.length }
}
// Define a function to find the top K frequent elements
def findTopKFrequentElements(inputArray: Array[Int], k: Int): TopKFrequentElements = {
  val frequencyCounts = countFrequency(inputArray)
  // Sort by frequency, highest first
  val sortedFrequencyCounts = frequencyCounts.toList.sortBy(_._2)(Ordering.Int.reverse)
  // Keep the first k entries and wrap each one in a FrequencyCount
  val topKFrequentElements = sortedFrequencyCounts.take(k).map { case (elementId, frequency) => FrequencyCount(elementId, frequency) }
  TopKFrequentElements(k, topKFrequentElements)
}
Note that this is a simplified example and may not be suitable for production use. A more robust solution would involve using a distributed system architecture and a distributed hash table (DHT) to store the frequency counts.
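Short of a distributed deployment, the selection step itself can be made cheaper. The sketch below swaps the full sort for a bounded min-heap, a different technique from the code above: it keeps at most k entries, so selection costs O(n log k) over n distinct elements instead of O(n log n). The function name `topKWithHeap` is hypothetical, and the input is assumed to be an already computed map of element to frequency.

```scala
import scala.collection.mutable

def topKWithHeap(counts: Map[Int, Int], k: Int): List[(Int, Int)] = {
  if (k <= 0) Nil
  else {
    // PriorityQueue is a max-heap; reversing the ordering on frequency
    // places the least frequent retained pair at the head.
    val heap = mutable.PriorityQueue.empty[(Int, Int)](
      Ordering.by[(Int, Int), Int](_._2).reverse
    )
    counts.foreach { pair =>
      heap.enqueue(pair)
      if (heap.size > k) heap.dequeue() // evict the current minimum
    }
    // Dequeue from least to most frequent, prepending each pair,
    // so the final list is ordered most frequent first.
    var result = List.empty[(Int, Int)]
    while (heap.nonEmpty) result = heap.dequeue() :: result
    result
  }
}
```

For example, `topKWithHeap(Map(1 -> 3, 2 -> 2, 3 -> 1, 4 -> 5), 2)` returns `List((4, 5), (1, 3))`. The order among elements with equal frequency is not specified.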