K Most Frequent Strings

  • Find the k most frequent string in an array
  • Return them sorted by desc freq
  • If 2 strings have same freq sort them in lexicographical order

The point:

  • Use a max-heap

Complexity :

Time Space
O(n + klog(n)) O(n)
O(n + u + klog(u)) O(n)
  • O(n + klog(n)) because it takes O(n) to count freq. We pop off the heap k times => O(logn(n))
  • More precisely : O(n + u + klog(u)) : O(n) to count freq, O(u) to build a heap with u unique elements, klog(u) for extractions
  • Space : O(n) because hashmap and heap store at most n pairs

About Rust :

  • In Rust, a BinaryHeap is max-heap by default (while in Python a heap is min-heap by default)
  • Ord implies PartialOrd, so we must implement PartialOrd if we implement Ord
  • Ord, total order, means that all pairs of elements are comparable with each other, and that the comparison is consistent
  • PartialOrd : Not all elements are necessarily comparable. Example : Floating numbers (f32, f64) with NaN. Result is Some(Ordering) or None
  • In struct Pair, if there is a &str we MUST confirm to the compiler that the reference will live as long as the Pair => lifetime specifier
  • YES : tested on the Rust Playground

First implementation with Strings

use std::collections::{BinaryHeap, HashMap}; // In Rust, BinaryHeap is a max-heap by default
use std::cmp::Ordering;

// Define a pair struct to hold the string and its frequency
#[derive(Eq, PartialEq)]    // BinaryHeap => Eq, PartialEq, Ord, PartialOrd must be implemented
                            // Here leverage Eq and PartialEq which compare all fields
struct Pair {
    string: String,
    freq: usize,
}

// BinaryHeap => we must explain how to compare 2 objects
// Implement custom ordering for Pair
// In Rust, Ord implies PartialOrd, so we must implement PartialOrd if we implement Ord.
// Total order means that all pairs of elements are comparable with each other, and that the comparison is consistent.
impl Ord for Pair {
    fn cmp(&self, other: &Self) -> Ordering {        // Ordering::Less Ordering::Equal or Ordering::Greater
        // Prioritize lexicographical order for strings with equal frequencies
        if self.freq == other.freq {
            // Reverse the lexicographical order to match Python's heapq behavior
            // (min-heap via __lt__)
            other.string.cmp(&self.string) // rather than self.string.cmp(&other.string) 
        } else {
            // Primary comparison: frequency (ascending order)
            self.freq.cmp(&other.freq)
        }
    }
}

// Implement PartialOrd in terms of Ord
// PartialOrd : Not all elements are necessarily comparable.
// Example : Floating numbers (f32, f64) with NaN
impl PartialOrd for Pair {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Returns the k most frequent strings using a max-heap
fn k_most_frequent_strings_max_heap(strs: &[String], k: usize) -> Vec<String> {
    // Count the frequency of each string using a HashMap
    let mut freqs: HashMap<&String, usize> = HashMap::new();
    for s in strs {
        *freqs.entry(s).or_insert(0) += 1;
    }

    // Build the heap with custom Pair items
    let mut max_heap: BinaryHeap<Pair> = freqs
        .into_iter()
        .map(|(s, freq)| Pair {
            string: s.clone(),
            freq,
        })
        .collect();

    // Extract the top k elements
    let mut result = Vec::new();
    for _ in 0..k {
        if let Some(pair) = max_heap.pop() {
            result.push(pair.string);
        }
    }

    result
}


fn main(){     // no main() if this code runs in a Jupyter cell 
    let strs = vec!["go".to_string(),
                    "coding".to_string(), 
                    "byte".to_string(), 
                    "byte".to_string(), 
                    "go".to_string(), 
                    "interview".to_string(), 
                    "go".to_string()];

    println!("{:?}", k_most_frequent_strings_max_heap(&strs, 2));  // ["go", "byte"]
} // end of local scope OR end of main()       

Max-Heap : Use &str but does NOT work

  • Does not work (yet)
  • In struct Pair, if there is a &str we MUST confirm to the compiler that the reference will live as long as the Pair
    • This is done with ```rust struct Pair<’a> { string: &’a str, freq: usize, }
* Once the lifetime is specified in the ``Pair`` it must be propagated to the rest of the code
    * See compiler errors


```rust
use std::collections::{BinaryHeap, HashMap};
use std::cmp::Ordering;

// Define a pair struct to hold the string slice and its frequency
#[derive(Eq, PartialEq)]
struct Pair {
    string: &str,
    freq: usize,
}

// Implement custom ordering for Pair
impl Ord for Pair{
    // Will be call as A.cmp(B) where A and B are 2 pairs
    fn cmp(&self, other: &Self) -> Ordering {
        // Prioritize lexicographical order for strings with equal frequencies
        if self.freq == other.freq {
            // Reverse lex order to match Python's min-heap behavior
            other.string.cmp(self.string)
        } else {
            self.freq.cmp(&other.freq)
        }
    }
}

// Implement PartialOrd in terms of Ord
impl PartialOrd for Pair {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Returns the k most frequent strings using a max-heap
fn k_most_frequent_strings_max_heap<'a>(strs: &[&str], k: usize) -> Vec<&str> {
    // Count the frequency of each string using a HashMap
    let mut freqs: HashMap<&str, usize> = HashMap::new();
    for &s in strs {
        *freqs.entry(s).or_insert(0) += 1;
    }

    // Build the heap with custom Pair items
    let mut max_heap: BinaryHeap<Pair> = freqs
        .into_iter()
        .map(|(s, freq)| Pair { string: s, freq })
        .collect();

    // Extract the top k elements
    let mut result = Vec::new();
    for _ in 0..k {
        if let Some(pair) = max_heap.pop() {
            result.push(pair.string);
        }
    }

    result
}


fn main(){     // no main() if this code runs in a Jupyter cell 
    let strs = vec!["go", "coding", "byte", "byte", "go", "interview", "go"];
    println!("{:?}", k_most_frequent_strings_max_heap(&strs, 2));  // 
} // end of local scope OR end of main()       
            

Max-Heap : Use &str (with lifetime specifier)

  • Preferred solution?
use std::collections::{BinaryHeap, HashMap};
use std::cmp::Ordering;

// Define a pair struct to hold the string slice and its frequency
#[derive(Eq, PartialEq)]
struct Pair<'a> {
    string: &'a str,
    freq: usize,
}

// Implement custom ordering for Pair
impl<'a> Ord for Pair<'a> {
    fn cmp(&self, other: &Self) -> Ordering {
        // Prioritize lexicographical order for strings with equal frequencies
        if self.freq == other.freq {
            // Reverse lex order to match Python's min-heap behavior
            other.string.cmp(self.string)
        } else {
            self.freq.cmp(&other.freq)
        }
    }
}

// Implement PartialOrd in terms of Ord
impl<'a> PartialOrd for Pair<'a> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Returns the k most frequent strings using a max-heap
fn k_most_frequent_strings_max_heap<'a>(strs: &'a [&'a str], k: usize) -> Vec<&'a str> {
    
    // Count the frequency of each string using a HashMap
    let mut freqs: HashMap<&'a str, usize> = HashMap::new();
    for &s in strs {
        *freqs.entry(s).or_insert(0) += 1;
    }

    // Build the heap with custom Pair items
    let mut max_heap: BinaryHeap<Pair> = freqs
        .into_iter()
        .map(|(s, freq)| Pair { string: s, freq })
        .collect();

    // Extract the top k elements
    let mut result = Vec::new();
    for _ in 0..k {
        if let Some(pair) = max_heap.pop() {
            result.push(pair.string);
        }
    }

    result
}


fn main(){     // no main() if this code runs in a Jupyter cell 
    let strs = vec!["go", "coding", "byte", "byte", "go", "interview", "go"];

    println!("{:?}", k_most_frequent_strings_max_heap(&strs, 2));  // ["go", "byte"]
} // end of local scope OR end of main()       
2
4
0
7

Min-Heap

The point:

  • Use a min-heap to save heap space (store k and not n strings)

Complexity :

Time Space
O(n log(k)) O(n)
  • O(n log(k)) because it takes O(n) to count freq. Populate the heap with n words. Each op is O(log(k)). Extract/pop k strings in O(klog(k)). Reverse the array in O(k). O(n) + O(nlog(k)) + O(klog(k))+ O(k) = O(nlog(k))
  • Space : O(k) because hashmap and heap store at most n pairs
use std::collections::{BinaryHeap, HashMap};
use std::cmp::Ordering;

// Define a pair struct to hold the string slice and its frequency
#[derive(Eq, PartialEq)]
struct Pair<'a> {
    str_val: &'a str,
    freq: usize,
}

// Implement custom ordering for Pair
impl<'a> Ord for Pair<'a> {
    fn cmp(&self, other: &Self) -> Ordering {
        // Prioritize lexicographical order for strings with equal frequencies
        if self.freq == other.freq {
            self.str_val.cmp(other.str_val) // rather than other.string.cmp(&self.string)
        } else {
            other.freq.cmp(&self.freq) // rather than self.freq.cmp(&other.freq)
        }
    }
}

// Implement PartialOrd in terms of Ord
impl<'a> PartialOrd for Pair<'a> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Returns the k most frequent strings using a max-heap
fn k_most_frequent_strings_min_heap<'a>(strs: &[&'a str], k: usize) -> Vec<&'a str> {
    
    // Count frequency of each string
    let mut freqs: HashMap<&str, usize> = HashMap::new();
    for s in strs {
        *freqs.entry(*s).or_insert(0) += 1;
    }

    let mut min_heap: BinaryHeap<Pair> = BinaryHeap::new();

    // Build the min-heap with top k frequent strings
    for (&s, &f) in freqs.iter() {
        min_heap.push(Pair { str_val: s, freq: f });
        if min_heap.len() > k {
            min_heap.pop();
        }
    }

    // Extract the top k elements
    let mut res = Vec::with_capacity(k);
    while let Some(p) = min_heap.pop() {
        res.push(p.str_val);
    }
    res.reverse(); // in place
    res
}

fn main(){     // no main() if this code runs in a Jupyter cell 
    let strs = vec!["go", "coding", "byte", "byte", "go", "interview", "go"];

    println!("{:?}", k_most_frequent_strings_min_heap(&strs, 2));  // ["go", "byte"]
} // end of local scope OR end of main()       

Back to top

Published on: Jun 22 2025 at 09:00 AM | Last updated: Jun 22 2025 at 09:00 AM

Copyright © 1964-2025 - 40tude