Why does relaxing the constraint (RepeatModification) lead to less successful augmentation?

Open YanghaoZYH opened this issue 1 year ago • 0 comments

To Reproduce

Run the following code:

from textattack.augmentation import Augmenter
from textattack.transformations import WordSwapEmbedding
from textattack.constraints.semantics import WordEmbeddingDistance
from textattack.constraints.grammaticality import PartOfSpeech
from textattack.constraints.pre_transformation import RepeatModification, StopwordModification
from textattack.shared import AttackedText

text_sample = "woody , what happened ?"
num_words_to_swap = len(AttackedText(text_sample).words) - 1  # minus 1 because "what" is a stop word
max_candidates = 50

num_samples = max_candidates**num_words_to_swap
print('max num_samples:', num_samples)

# Define constraints to ensure quality of perturbations
constraints = [StopwordModification(), RepeatModification()]
constraints.append(WordEmbeddingDistance(min_cos_sim=0.5))
constraints.append(PartOfSpeech(allow_verb_noun_swap=True))

# Define the transformation method
transformation = WordSwapEmbedding(
    max_candidates=max_candidates  # Number of candidates to generate per word
)

# Combine transformation and constraints in an Augmenter
augmenter = Augmenter(
    transformation=transformation,
    constraints=constraints,
    pct_words_to_swap=1,  # Percentage of words to swap per perturbation
    transformations_per_example=num_samples  # Number of perturbations to generate per input
)

perturbations = augmenter.augment(text_sample)
actural_num_samples = len(perturbations)
print('actural_num_samples: ',actural_num_samples)

This gives me the output:

max num_samples: 2500
actural_num_samples:  532

However, when I remove the RepeatModification constraint and leave the other constraints and the rest of the code unchanged:

constraints = [StopwordModification()]

I get the output:

max num_samples: 2500
actural_num_samples:  277

Expected behavior

I expected that relaxing the constraints would increase the number of samples, but the opposite happens. Have I misunderstood something, or is this a bug?
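To make the expectation concrete, here is a toy sketch in plain Python. It is not TextAttack code: the candidate lists, the `no_long_words` filter, and the `count_augmentations` helper are all made up for illustration, and it does not attempt to model how Augmenter actually selects words or how a pre-transformation constraint such as RepeatModification is applied. It only shows the intuition that removing a filter should never shrink the set of possible outputs.

```python
from itertools import product


def count_augmentations(candidates_per_word, constraints):
    """Count distinct sentences when each swappable word is replaced once.

    Hypothetical helper: a candidate survives only if every constraint
    function returns True for it.
    """
    filtered = [
        [cand for cand in cands if all(ok(cand) for ok in constraints)]
        for cands in candidates_per_word
    ]
    return len(set(product(*filtered)))


# Made-up candidate lists for the two swappable words in the sample text.
candidates = [
    ["woodsy", "forested", "wooded", "timbered"],
    ["occurred", "transpired", "arose"],
]


def no_long_words(word):
    """Made-up constraint: reject candidates longer than 8 characters."""
    return len(word) <= 8


strict = count_augmentations(candidates, [no_long_words])
relaxed = count_augmentations(candidates, [])

print("with constraint:   ", strict)
print("without constraint:", relaxed)
```

In this toy model the relaxed count can never be smaller than the strict one, which is why the numbers above (532 with RepeatModification, 277 without) surprised me.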

System Information (please complete the following information):

  • OS: Linux
  • Library versions: torch==2.3.0, transformers==4.40.1
  • TextAttack version: 0.3.10

YanghaoZYH, May 03 '24 20:05