Calling best_solution() after a run with elitism doesn't return best solution

Open raphaelboudreault opened this issue 5 months ago • 2 comments

Hello and thanks for that fantastic library!

I believe I found a bug in the best_solution() method. When it is called after a run with elitism enabled (the default) and without the optional pop_fitness parameter, it calls cal_pop_fitness(), which does not recompute the fitness values of the elites in the population (as expected) but reads them from previous_generation_fitness. However, at the end of the last generation of a run, previous_generation_fitness is set equal to last_generation_fitness (line 1926 of pygad.py). From that point on, previous_generation_fitness no longer holds the fitness values of the generation before the last, so the elites are assigned incorrect fitness values and best_solution() returns the wrong solution.
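The mechanism can be reproduced outside PyGAD with a toy model. The sketch below is not PyGAD's code: `fitness`, `evaluate`, and the hard-coded populations are hypothetical stand-ins. It assumes only that an elite's fitness is looked up in a cache by the elite's index in the previous population.

```python
# Toy model of the stale-cache problem; not PyGAD's implementation.

def fitness(sol):
    # Maximization form of "minimize the sum of squares".
    return -sum(g * g for g in sol)

def evaluate(pop, elite, elite_idx, cache):
    """Score a population, reusing the cached fitness for the elite.

    elite_idx is the elite's position in the PREVIOUS population, so
    `cache` must hold the previous generation's fitness values.
    """
    return [cache[elite_idx] if sol == elite else fitness(sol) for sol in pop]

# Previous generation: the elite sits at index 1.
prev_pop = [[3.0], [0.5], [2.0]]
prev_fit = [fitness(s) for s in prev_pop]
elite_idx = max(range(len(prev_fit)), key=prev_fit.__getitem__)
elite = prev_pop[elite_idx]

# Last generation: the elite is copied to slot 0, followed by offspring.
last_pop = [elite, [5.0], [4.0]]

# During the run the cache is still the previous generation: correct.
last_fit = evaluate(last_pop, elite, elite_idx, cache=prev_fit)

# After the run the cache has been overwritten with last_fit, so index 1
# now points at an offspring's fitness instead of the elite's.
stale_fit = evaluate(last_pop, elite, elite_idx, cache=last_fit)

print(last_fit)   # [-0.25, -25.0, -16.0]
print(stale_fit)  # [-25.0, -25.0, -16.0]
```

In the second evaluation the elite loses its true fitness of -0.25, so the reported best becomes the offspring at index 2.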

While this is unexpected behavior (in my opinion), a workaround is to always call ga_instance.best_solution(ga_instance.last_generation_fitness) once the GA run has completed, as the examples in the docs do.
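A small wrapper makes the workaround harder to forget. This is only a sketch: `best_solution_after_run` is a hypothetical helper name, and it relies on nothing beyond the `best_solution(pop_fitness=...)` call and the `last_generation_fitness` attribute mentioned above.

```python
def best_solution_after_run(ga_instance):
    """Return (solution, fitness, index) using the cached last-generation fitness.

    Passing pop_fitness explicitly means best_solution() does not have to
    rebuild the fitness list, which is the step that goes wrong here.
    Intended to be called only after ga_instance.run() has finished.
    """
    return ga_instance.best_solution(
        pop_fitness=ga_instance.last_generation_fitness
    )
```

After `ga_instance.run()`, the call would be `solution, solution_fitness, solution_idx = best_solution_after_run(ga_instance)`.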

Looking closely at the code, self.previous_generation_fitness = self.last_generation_fitness.copy() is executed at both line 1812 and line 1926 of pygad.py, and I believe only the first assignment should be kept (at the beginning of the loop, before computing the new fitness values).

Thanks in advance!

raphaelboudreault, Aug 27 '25 00:08

Hi.

No, you cannot remove the statement at line 1926: if you do and the best solution is found in the last generation, the best_solution() call won't return it.

However, there is a bug somewhere as you can see if you try this code:

import numpy as np
import pygad

def f(ga_instance, x, x_idx):
    return -sum(x**2)          # PyGAD maximizes fitness, so negate to minimize sum(x^2)

pop_size = 100
crossover_probability = 0.7
mutation_probability = 0.1
num_genes = 3
num_generations = 10
num_parents_mating = pop_size

ga_instance = pygad.GA(num_generations=num_generations,
                       num_parents_mating=num_parents_mating,
                       fitness_func=f,
                       sol_per_pop=pop_size,
                       num_genes=num_genes,
                       crossover_probability=crossover_probability,
                       mutation_probability=mutation_probability,
                       #save_best_solutions=True,
                       random_seed=42,
                       )

ga_instance.run()

print(f'Worst   f(x): {-np.min(ga_instance.best_solutions_fitness):.6f}')
print(f'Average f(x): {-np.mean(ga_instance.best_solutions_fitness):.6f}')
print(f'Best    f(x): {-np.max(ga_instance.best_solutions_fitness):.6f}')
print("Best solutions found at generation", ga_instance.best_solution_generation)
print("-"*50)

solution, solution_fitness, solution_idx = ga_instance.best_solution()
print(f'Best solution : {solution.round(4)}')
print(f'Best fitness  : {-solution_fitness:.6f}')
print(f'Best solution index : {solution_idx}')

The output is:

Worst   f(x): 0.382537
Average f(x): 0.199507
Best    f(x): 0.131763
Best solutions found at generation 3
--------------------------------------------------
Best solution : [ 0.0524 -0.7452 -0.7311]
Best fitness  : 1.092448
Best solution index : 83

But the best fitness in the ga_instance.best_solutions_fitness list (0.131763) and the one returned by best_solution() (1.092448) are not the same, and they should be.

However, if you enable the save_best_solutions=True option, the results are the same.

I think the error is somewhere in the self.last_generation_elitism list (or self.last_generation_elitism_indices): the first solution in the for loop (in best_solution()) enters the elif branch below (lines 1536-1543), which returns the wrong index for the solution/fitness.

elif (self.keep_elitism > 0) and (self.last_generation_elitism is not None) and (len(self.last_generation_elitism) > 0) and (list(sol) in last_generation_elitism_as_list):
    # Return the index of the elitism from the elitism array 'self.last_generation_elitism'.
    # This is not its index within the population. It is just its index in the 'self.last_generation_elitism' array.
    elitism_idx = last_generation_elitism_as_list.index(list(sol))
    # Use the returned elitism index to return its index in the last population.
    elitism_idx = self.last_generation_elitism_indices[elitism_idx]
    # Use the elitism's index to return its pre-calculated fitness value.
    fitness = self.previous_generation_fitness[elitism_idx]

pmguerre, Sep 19 '25 01:09

Just an addendum:

If I call it this way:

solution, solution_fitness, solution_idx = ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness)

it returns the correct value, i.e. the same as given by the call to np.max(ga_instance.best_solutions_fitness)

So, the error is somewhere in the building of the pop_fitness list, after the algorithm is terminated...
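To narrow down where the rebuilt list diverges, the two fitness lists can be compared entry by entry. This is a rough sketch, assuming a single-objective problem and that calling cal_pop_fitness() after the run has no side effects that matter here; `find_fitness_mismatches` and `tol` are hypothetical names.

```python
def find_fitness_mismatches(ga_instance, tol=1e-12):
    """Return (index, cached, recomputed) for every fitness entry that differs.

    cached     -> ga_instance.last_generation_fitness (stored during the run)
    recomputed -> ga_instance.cal_pop_fitness() (what best_solution() falls
                  back to when pop_fitness is not given)
    """
    cached = list(ga_instance.last_generation_fitness)
    recomputed = list(ga_instance.cal_pop_fitness())
    return [
        (idx, c, r)
        for idx, (c, r) in enumerate(zip(cached, recomputed))
        if abs(c - r) > tol
    ]
```

If the explanation given earlier in this thread is right, the mismatching indices should be those of the elites; an empty list would point elsewhere.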

pmguerre, Sep 19 '25 12:09