Happy Numbers: Python, C++, and Rust Performance Comparison

Compare Happy Numbers implementations across Python, C++, and Rust with real benchmarks. Learn optimization techniques and see how Rust achieves 80x faster performance than Python.

Happy number algorithm visualization showing the iterative process of summing squared digits

Image from Wikipedia

Happy Numbers: Python, C++, and Rust Performance Comparison

Intro

A positive integer is called happy if repeatedly replacing the number by the sum of the squares of its digits eventually leads to 1.

If the process enters a loop that never reaches 1, the number is considered unhappy.

This simple definition makes Happy Numbers a great playground to:

Practice basic algorithms
Compare multiple implementations of the same logic
Measure real performance instead of guessing

By the end of this article, you will also be ready to solve your first algorithmic puzzle on CodinGame.

A First Python Implementation

Let’s start with a straightforward approach and check whether 24 is a happy number.

def sum_of_squared_digits(n: int) -> int:
    # Compute the sum of the squares of the digits of n
    total = 0
    while n:
        digit = n % 10
        total += digit * digit
        n //= 10
    return total

seen = set()
n = 24

while n != 1 and n not in seen:
    seen.add(n)
    n = sum_of_squared_digits(n)
    print(n)

print("Happy" if n == 1 else "Unhappy")

This output looks like :

20
4
16
37
58
89
145
42
20
Unhappy

This version avoids pow() and ** and directly multiplies digit * digit, which turns out to be faster in tight loops.

A More Compact Python Version

Here I try to simplify sum_of_squared_digits()

The one liner goes like this :
- Convert n in a string
- Then read each “char” (digit) of the string
- Convert each char as an int
- Elevate the int to power of 2

def sum_of_squared_digits(n:int)->int:
    # Convert the number to a string and sum squared digits
    return sum([int(i)**2 for i in str(n)])

n_set = set()
n = 24
while (n!=1 and n not in n_set):
    n_set.add(n)
    n = sum_of_squared_digits(n)
    print(n)

print("Happy") if n==1 else print("Unhappy")

This version is easier to read but introduces string conversion, which has a measurable cost.

At this point, intuition is not enough — it’s time to measure.

Rule #1 of performance work: never assume. Always benchmark.

Benchmark 1 — Python (String-Based)

import time

k_MAX=100_000

def sum_of_squared_digits(n:int)->int:
    return sum([int(i)**2 for i in str(n)])

start_time = time.time()  # Start timer

for n in range(1, k_MAX+1):
    n_set = set()
    n_init = n
    while (n!=1 and n not in n_set):
        n_set.add(n)
        n = sum_of_squared_digits(n)

end_time = time.time()
print(f"Execution time: {end_time - start_time:.6f} seconds")

Execution time: 0.797350 seconds

Benchmark 2 — Python (Arithmetic-Based)

import time

k_MAX = 100_000

def sum_of_squared_digits2(n_in:int)->int:
    n_out = 0
    while n_in:
        digit=n_in%10
        n_out += digit*digit
        n_in=n_in//10
    n_in=n_out
    return n_out

start_time = time.time()  # Start timer

for n in range(1, k_MAX + 1):
    n_set = set()
    n_init = n
    while n != 1 and n not in n_set:
        n_set.add(n)
        n = sum_of_squared_digits2(n)

end_time = time.time()
print(f"Execution time: {end_time - start_time:.6f} seconds")

Execution time: 0.482066 seconds

Despite being longer, the arithmetic-based version is about 1.6× faster due to:

No string allocation
Fewer temporary objects
Better cache locality

C++ Benchmark

If you don’t have a C++ compiler on your host (shame on you! 😁) you can copy, past and run the code below with an online C++ compiler.
I selected C++23 and -02 optimizations and did some test with C++20 and no optimization.

#include <iostream>
#include <unordered_set>
#include <chrono>

constexpr int k_MAX = 100'000;

int sum_of_squared_digits3(int n_in) {
    int n_out = 0;
    while (n_in != 0) {
        int digit = n_in % 10;
        n_out += digit * digit;
        n_in /= 10;
    }
    return n_out;
}

int main() {
    auto start_time = std::chrono::high_resolution_clock::now();

    for (int n = 1; n <= k_MAX; ++n) {
        std::unordered_set<int> n_set;
        int n_current = n;

        while (n_current != 1 && !n_set.contains(n_current)) {
            n_set.insert(n_current);
            n_current = sum_of_squared_digits3(n_current);
        }
    }

    auto end_time = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed_seconds = end_time - start_time;
    std::cout << "Execution time: " << elapsed_seconds.count() << " seconds\n";
    return 0;
}

Execution time: ~0.08 seconds

80 ms that’s roughly 6x faster than Python, with very similar algorithmic structure.

For such a simple algorithm, the syntax is very similar across languages. So, even if Python is the only programming language you know, you should understand what is happening here. Additionally, I kept the variable names the same.
34 lines in C++ vs 24 in Python
80 ms sec in C++ vs 480 ms in Python
6 times faster

Benchmarking C++ V2

#include <iostream>
#include <unordered_set>
#include <array>
#include <chrono>

constexpr int k_MAX = 100'000;

constexpr std::array<int, 10> k_squares = [] {
    std::array<int, 10> squares{};
    for (int i = 0; i < 10; ++i) {
        squares[i] = i * i;
    }
    return squares;
}();

int sum_of_squared_digits2(int n_in) {
    int n_out = 0;
    while (n_in != 0) {
        int digit = n_in % 10;
        n_out += k_squares[digit];
        n_in /= 10;
    }
    return n_out;
}


int main() {
    auto start_time = std::chrono::high_resolution_clock::now();

    for (int n = 1; n <= k_MAX; ++n) {
        std::unordered_set<int> n_set;
        int n_current = n;

        while (n_current != 1 && !n_set.contains(n_current)) {
            n_set.insert(n_current);
            n_current = sum_of_squared_digits2(n_current);
        }
    }

    auto end_time = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed_seconds = end_time - start_time;
    std::cout << "Execution time: " << elapsed_seconds.count() << " seconds\n";
    return 0;
}

The timing is similar.

I suspect the optimizer is doing a great job. I did’nt look at assembly language but you can use Compiler Explorer to do so. It is always very interesting. Read this article to see a real case where looking to assembly language was THE solution.

Again, even on Compiler Explorer, make sure to select C++ 20 or later otherwise the n_set.contains(n_current) function call is not be available.

Benchmarking Rust (Fast & Slow Pointers)

This one use fast and slow pointers
With many others, it is available on this dedicated page
You can copy and paste it in the Rust Playground

use std::time::Instant;
use std::hint::black_box;

fn get_next_number(mut x:u32)->u32{
    let mut next_num : u32 = 0;
    while x > 0{
        let digit = x % 10;
        x /= 10;
        next_num += digit.pow(2); // add the square of the digit to the next number
    }
    next_num
}

fn happy_number(n:u32)->bool{
    let mut slow = n;
    let mut fast = n;
    loop {
        slow = get_next_number(slow);
        fast = get_next_number(get_next_number(fast));

        if fast == 1 {
            return true;
        }

        if slow == fast {
            return false;
        }
    }
}

fn main(){

    const K_MAX: u32 = 1_000_000;
    let start = Instant::now();
    for i in 1..=K_MAX{ // Avoid 0
        black_box(happy_number(i)); // Prevents the optimized compiler from simply not calling the function if its result is not used
    }
    let duration = start.elapsed();
    println!("Execution time: {} ms", duration.as_millis());
}

Execution time: 62 ms

62 ms, this is 25% faster than C++.

Benchmarking Rust (using rayon)

Then using the crate rayon the code runs in 6 ms

13 times faster than C++
80 times faster than Python
40 LOC

use rayon::prelude::*;
use std::hint::black_box;
use std::time::Instant;

fn get_next_number(mut x: u32) -> u32 {
    let mut next_num: u32 = 0;
    while x > 0 {
        let digit = x % 10;
        x /= 10;
        next_num += digit.pow(2);
    }
    next_num
}

fn happy_number(n: u32) -> bool {
    let mut slow = n;
    let mut fast = n;
    loop {
        slow = get_next_number(slow);
        fast = get_next_number(get_next_number(fast));

        if fast == 1 {
            return true;
        }

        if slow == fast {
            return false;
        }
    }
}

fn main() {
    const K_MAX: u32 = 1_000_000;
    let start = Instant::now();
    (1..=K_MAX).into_par_iter().for_each(|i| {
        black_box(happy_number(i));
    });
    let duration = start.elapsed();
    println!("Execution time: {} ms", duration.as_millis());
}

Conclusion

Happy Numbers are deceptively simple, which makes them an excellent playground for experimentation and benchmarking.

Along the way, we’ve seen that:

Readability and performance often pull in opposite directions
Python is expressive and quick to iterate
C++ delivers raw performance
Rust offers an elegance, speed and strong guarantees

Of course, this algorithm could be pushed further. We could experiment with vectorization using NumPy, introduce multithreading, or explore other optimizations. But in practice, performance work is always a tradeoff between the time invested and the gains achieved. Optimizing for the sake of optimizing rarely pays off.

If this topic interests you, the same question (how far should we optimize?) is explored in more depth in this article about the Sieve of Eratosthenes, where looking at the problem from a lower-level perspective made all the difference.

The key takeaway remains simple and universal: Measure first. Optimize second.

At this point, you should be more than ready to solve the Happy Numbers puzzle on https://www.codingame.com/training/easy/happy-numbers

Pick your language of choice, play with different approaches, compare performance, and most importantly, have fun while learning.

Appendix

Cached Python Version

import time
from functools import lru_cache

k_MAX = 100_000

@lru_cache(maxsize=None)
def sum_of_squared_digits2(n_in: int) -> int:
    n_out = 0
    while n_in:
        digit = n_in % 10
        n_out += digit * digit
        n_in = n_in // 10
    return n_out


start_time = time.time()

for n in range(1, k_MAX + 1):
    n_set = set()
    n_init = n
    while n != 1 and n not in n_set:
        n_set.add(n)
        n = sum_of_squared_digits2(n)

end_time = time.time()
print(f"Execution time: {end_time - start_time:.6f} seconds")

Execution time: ~0.29 seconds

Slow and Fast Pointers in Python

Read this Python code
It is part of this highly recommended book