Top K Frequent Elements
Category: dsa
Date: 2026-03-20
Top K Frequent Elements System Design
Requirements
Functional Requirements:
- Input: A stream of integers (e.g., from a web service or file)
- Output: The top K frequent elements in the stream, along with their frequencies
- Real-time processing: Handle new elements as they arrive in the stream
Non-Functional Requirements:
- Scalability: Support a large volume of incoming stream data (e.g., millions of elements per second)
- Low latency: Respond to queries within a few milliseconds
- Data consistency: Ensure accurate frequency counts for all elements
High-Level Architecture
- Stream Processor: Collects and processes the incoming stream data
- Frequency Counter: Maintains a data structure to store the frequency of each element
- Query Service: Handles Top K queries from clients
Database Design
For the Frequency Counter, we’ll use a combination of in-memory data structures and a database for persistence. This design balances performance and data durability:
- In-memory hash map (e.g., Java’s
ConcurrentHashMap): Stores the frequency of each element for fast lookups and updates
- Redis or Memcached: Stores the frequency counts in a database for persistence and scalability
Scaling Strategy
To handle a large volume of incoming stream data:
- Distributed Stream Processor: Use a distributed streaming system like Apache Kafka, Apache Storm, or Flink to process the stream in parallel
- Horizontal scaling: Add more nodes to the Frequency Counter and Query Service to handle increased loads
- Caching: Use Redis or Memcached to cache frequent elements and reduce database queries
Solution using the First Principle of System Design
The first principle of system design states: “Simpllicity is the ultimate sophistication.”
Our solution is simple and effective, with a clear separation of concerns:
- Stream Processor: Focuses on collecting and processing the stream data
- Frequency Counter: Maintains the frequency counts with a simple in-memory hash map
- Query Service: Handles Top K queries with a simple database query
By following this principle, we’ve avoided unnecessary complexity and ensured a scalable, high-performance system.
Bottlenecks
Potential bottlenecks:
- Stream Processor: Overwhelmed by a high volume of incoming stream data
- Frequency Counter: Struggles to keep up with the frequency updates
- Query Service: Faces high latency due to database queries
To mitigate these bottlenecks, we’ll implement horizontal scaling, caching, and efficient data structures.
Trade-offs
Trade-offs:
- Scalability vs. Complexity: A more complex system may be necessary for extreme scalability
- Latency vs. Data Consistency: Balancing low latency with accurate frequency counts may require compromises
- Memory vs. Disk Storage: Choosing between in-memory and disk-based storage depends on the specific use case
Additional Reading