allow_duplicate_genes not working

Open JanKulbinski opened this issue 4 years ago • 5 comments

Hi!

I am trying to solve TSP with GA and it seems like allow_duplicate_genes is not working.

Reproduction: TSP with 32 cities; each city is represented by a number in [0, ..., 31].

ga_instance = pygad.GA(num_generations=5,
                       num_parents_mating=2,
                       fitness_func=fitness,
                       init_range_low=0,
                       init_range_high=32,
                       num_genes=32,
                       gene_space=a = np.arange(0,32,1),
                       gene_type=int,
                       allow_duplicate_genes=False,
                       )

a = ga_instance.run()
solution, solution_fitness, solution_idx = ga_instance.best_solution()
print(f'{solution}')
solution.sort(axis=0)
print(solution)

It gives:

[25 15 20 1 30 1 19 13 29 10 28 3 24 12 12 5 0 26 26 6 7 2 23 16 20 18 8 11 18 3 17 26]
[ 0 1 1 2 3 3 5 6 7 8 10 11 12 12 13 15 16 17 18 18 19 20 20 23 24 25 26 26 26 28 29 30]

As you can see, the numbers 1, 3, 12, 18, 20, and 26 are duplicated.
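A quick way to confirm which values repeat is to count occurrences with np.unique. This is only a minimal sketch using the solution printed above; the helper name find_duplicates is made up for illustration and is not part of PyGAD.

```python
import numpy as np

def find_duplicates(solution):
    """Return the sorted gene values that occur more than once in a solution."""
    values, counts = np.unique(solution, return_counts=True)
    return values[counts > 1]

# The solution printed above (32 genes, each expected to be a distinct city).
solution = np.array([25, 15, 20, 1, 30, 1, 19, 13, 29, 10, 28, 3, 24, 12, 12, 5,
                     0, 26, 26, 6, 7, 2, 23, 16, 20, 18, 8, 11, 18, 3, 17, 26])

duplicated = find_duplicates(solution)
# Cities that never appear in the solution.
missing = np.setdiff1d(np.arange(32), solution)

print(duplicated.tolist())  # [1, 3, 12, 18, 20, 26]
print(missing.tolist())     # [4, 9, 14, 21, 22, 27, 31]
```

Every duplicated value pushes some other city out of the tour, which is why seven cities are missing here.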

JanKulbinski, Apr 12 '21 14:04

Hi,

Thanks for using PyGAD!

I have some comments on your code. You set gene_space=a = np.arange(0,32,1), which is not valid. Where is the variable a? I wonder whether that code runs at all. The parameter sol_per_pop is also missing. Both sol_per_pop and num_genes must be set whenever the initial_population parameter is not used.

I am using the latest version of PyGAD and I did not see any duplicates with allow_duplicate_genes=False. Note that I built a fitness function that returns random fitness values.

This is the code I tested. It computes the difference between the following 2 sets:

  1. The set of unique values in the solution.
  2. The set of unique gene values (i.e. np.arange(0,32,1))

Since you use 32 genes and the gene space has only 32 values, the difference between those 2 sets is expected to be empty whenever no gene is duplicated. This is what happens in my code, so I think there is no issue with the allow_duplicate_genes parameter.

If my code does not reflect yours, please let me know.

import pygad
import numpy as np

def fitness(sol, idx):
    ss = set(np.unique(sol))
    
    r = set(np.arange(0,32,1)) - ss
    print(r)
    
    if len(r) > 0 :
        print("\n\nSomething is WRONG\n\n")
    
    return np.random.rand()

ga_instance = pygad.GA(num_generations=50,
                       num_parents_mating=2,
                       fitness_func=fitness,
                       init_range_low=0,
                       init_range_high=32,
                       sol_per_pop = 10,
                       num_genes=32,
                       gene_space=np.arange(0,32,1),
                       gene_type=int,
                       allow_duplicate_genes=False)

ga_instance.run()
solution, solution_fitness, solution_idx = ga_instance.best_solution()
# print(f'{solution}')
solution.sort(axis=0)
# print(solution)

ss = set(np.unique(solution))

r = set(np.arange(0,32,1)) - ss
print(r)

if len(r) > 0 :
    print("\n\nSomething is WRONG\n\n")

ahmedfgad, Apr 12 '21 19:04

Yes, it is working. The cause was the lack of a gene_space parameter. Thank you for the response and this amazing library.

JanKulbinski, Apr 12 '21 20:04

ga_instance = pygad.GA(num_generations = num_generations,
                       num_parents_mating = num_parents_mating,
                       sol_per_pop  = population_size,
                       fitness_func = fitness_function,  
                       num_genes = list_size,
                       gene_type = int,
                       gene_space = np.arange(0,list_size,1),
                       allow_duplicate_genes = False,
                       mutation_type = None,
                       on_start=on_start,
                       on_fitness=on_fitness,
                       on_parents=on_parents,
                       on_crossover=on_crossover,
                       on_mutation=on_mutation,
                       on_generation=on_generation,
                       on_stop=on_stop,
                       save_solutions = True)

ga_instance.run()

print('From this')
print(ga_instance.initial_population)

print('To this...')
print(ga_instance.population)

And I am getting solutions with duplicated genes like:

[10 3 14 4 17 10 5 0 7 6 11 8 15 16 13 1 14 4 2 9]]

Mutation is not enabled, but I guess there is something I'm missing... Should allow_duplicate_genes also block duplicates after mating?

Thanks

KevinGalassi, Apr 08 '22 13:04

@KevinGalassi, allow_duplicate_genes takes effect only after mutation is applied. The reason is that a duplicate can be resolved by mutation, because mutation can generate a new value for the duplicated gene.

Crossover, on the other hand, only combines the genes from 2 solutions. It is not meant to introduce new gene values on its own.

But I think it would be a good feature to support. A warning may be issued if mutation is disabled while allow_duplicate_genes=False.
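To illustrate the point about crossover, here is a small NumPy-only sketch. It is not PyGAD's internal code: single_point_crossover and repair_duplicates are hypothetical helpers written for this example, and the repair step assumes the gene space holds at least as many values as there are genes.

```python
import numpy as np

def single_point_crossover(parent1, parent2, point):
    """Take genes [0, point) from parent1 and the remaining genes from parent2."""
    return np.concatenate([parent1[:point], parent2[point:]])

def repair_duplicates(child, gene_space):
    """Overwrite repeated genes with values from gene_space that the child lacks.

    The first occurrence of each value is kept; later repeats are replaced.
    """
    child = child.copy()
    present = set(child.tolist())
    missing = [g for g in gene_space if g not in present]
    seen = set()
    for i, gene in enumerate(child.tolist()):
        if gene in seen:
            child[i] = missing.pop(0)  # next value not yet used by the child
        else:
            seen.add(gene)
    return child

# Both parents are valid permutations with no duplicates.
parent1 = np.array([0, 1, 2, 3, 4, 5])
parent2 = np.array([5, 4, 3, 2, 1, 0])

child = single_point_crossover(parent1, parent2, point=3)
print(child.tolist())     # [0, 1, 2, 2, 1, 0]

repaired = repair_duplicates(child, gene_space=range(6))
print(repaired.tolist())  # [0, 1, 2, 3, 4, 5]
```

The child contains duplicates even though neither parent does, so with mutation disabled nothing gets a chance to replace them.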

ahmedfgad, Apr 08 '22 19:04

My bad, when I looked at the wiki I didn't find this information stated explicitly. I avoided mutation because of the possibility of multiple genes ending up with the same value, but the same problem may arise with crossover too.

BTW, I'm trying to solve a kind of 'Travelling Salesman Problem'; I guess I'll look online.
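For permutation problems like TSP, one commonly used idea is a crossover that keeps every child a valid permutation. Below is a rough sketch of a simplified order-crossover variant that fills the free positions left to right; the function name and parameters are assumptions made for this example, and hooking it into a GA library is left out.

```python
import numpy as np

def order_crossover(parent1, parent2, start, end):
    """Simplified order crossover for permutations.

    Copies parent1[start:end] into the child, then fills the remaining
    positions (left to right) with parent2's genes in the order they
    appear in parent2, skipping genes already copied from parent1.
    """
    size = len(parent1)
    child = np.full(size, -1, dtype=int)
    child[start:end] = parent1[start:end]
    kept = set(parent1[start:end].tolist())
    fill = [g for g in parent2.tolist() if g not in kept]
    free_positions = [i for i in range(size) if not (start <= i < end)]
    for pos, gene in zip(free_positions, fill):
        child[pos] = gene
    return child

parent1 = np.array([0, 1, 2, 3, 4, 5, 6, 7])
parent2 = np.array([7, 6, 5, 4, 3, 2, 1, 0])

child = order_crossover(parent1, parent2, start=2, end=5)
print(child.tolist())  # [7, 6, 2, 3, 4, 5, 1, 0]
```

Because each gene is taken exactly once, the child never needs a repair step as long as both parents are permutations of the same values.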

Thanks

KevinGalassi, Apr 09 '22 08:04

I might be doing something wrong, but allow_duplicate_genes=False is not working for me. Even the best solutions for the fitness function I am using have duplicate genes.

In my case the fitness function takes around 20 minutes, but a dummy fitness function also returns solutions with duplicated genes, like the ones printed at the end:

def Genes_Trial(x, x_idx):
    rng_noise =  np.random.default_rng(678910)
    dummy_fit = rng_noise.random()*100
    x = np.sort(x)
    return dummy_fit


gene_space = np.arange(1,41,1)

ga_instance = pygad.GA(num_generations = 300,
                           num_parents_mating = 40,
                           sol_per_pop = 50,
                           num_genes = 6,
                           init_range_low = gene_space[0],
                           init_range_high = gene_space[-1],
                           gene_space = gene_space,
                           gene_type = int,
                           keep_elitism = 2,
                           mutation_probability = 0.025,
                           fitness_func = Genes_Trial,
                           save_solutions = False,
                           allow_duplicate_genes = False,
                           save_best_solutions = True,
                           random_seed=12345
                           )
ga_instance.run()

trial = ga_instance.solutions
trial = np.sort(trial)

unique_genes = []
for i_genes in range(trial.shape[0]):
    unique_genes.append(np.unique(trial[i_genes,:]))

for i_sol in range(len(unique_genes)):
    if len(unique_genes[i_sol]) < n_sensors:
        print(np.array(ga_instance.solutions[i_sol]))

Initially I tried adaptive mutation and thought that was the problem. But when mutation_type is left at its default and mutation_probability is set, there are still duplicates. However, when mutation_probability is left at its default, no duplicates are generated.

So I am not sure how to proceed, since I am not sure mutation is happening at all when mutation_type and mutation_probability are left at their defaults.

gabrieldelpozo, Oct 18 '22 09:10

@gabrieldelpozo,

A new release will be pushed soon with a fix for this issue. It happens because crossover creates duplicate genes that are sometimes not resolved.

ahmedfgad, Feb 22 '23 13:02