Skip to content

API Reference

The finchGE API consists of:

finchge.core.Population(initialiser, population_size)

Container for Individuals in an evolutionary run.

Population is representation-agnostic and does not perform genotype, tree, or phenotype initialisation logic.

Create a Population.

Exactly one of individuals or initialiser must be provided.

Parameters:

Name Type Description Default
initialiser Optional[GEInitialiser]

Initializer used to generate individuals.

required
population_size Optional[int]

Number of individuals to generate (required if initialiser is used).

required

from_individuals(individuals, population_size) classmethod

Construct a Population directly from a list of Individuals.

This is used when offspring are produced by crossover/mutation strategies that already return fully-formed Individuals.

finchge.core.Individual(*, genotype=None, phenotype=None, used_genome=None, used_codon_count=0, invalid=False, tree=None)

Represents a single individual in an evolutionary algorithm.

An Individual encapsulates the genetic representation (genotype), its expressed form (phenotype), and metadata produced during evaluation and selection. Fitness values are not assigned at construction time and must be computed by a FitnessEvaluator.

Algorithm-specific information (e.g., Pareto rank, crowding distance, or reference-point association) is stored in the meta dictionary and is expected to be populated and cleared by the evolutionary algorithm.

Attributes:

Name Type Description
genotype Optional[list[int]]

Genetic representation of the individual. A list of integers or None if not yet initialized or derived from a tree-based initialiser.

phenotype str

Phenotypic representation derived from the genotype via grammar-based mapping. Empty until mapping occurs.

used_genome Optional[list[int]]

Portion of the genotype consumed during genotype-to-phenotype mapping. Set during evaluation.

used_codon_count Optional[int]

Number of codons consumed during mapping.

invalid bool

Indicates whether the individual is invalid (e.g., mapping failure). Invalid individuals should not participate in fitness-based selection.

tree Optional[str]

Serialized derivation tree (e.g., JSON) produced during mapping or tree-based initialisation.

fitness list[float]

Fitness values assigned during evaluation. Empty until evaluated. Supports both single-objective and multi-objective optimization.

meta Dict[str, Any]

Algorithm-specific metadata managed by the evolutionary algorithm, such as Pareto rank, crowding distance, or dominance information.

Initialize an Individual.

Parameters:

Name Type Description Default
genotype Optional[list[int]]

Genetic representation as a list of integers, or None if the individual is initialized from a tree or will be derived later.

None
phenotype str

Phenotypic representation of the individual. Defaults to an empty string and is populated during mapping.

None
used_genome Optional[list[int]]

Portion of the genotype actually consumed during mapping. Populated during evaluation.

None
used_codon_count Optional[int]

Number of codons consumed during genotype-to-phenotype mapping.

0
invalid bool

Whether the individual is invalid (e.g., mapping failure).

False
tree Optional[str]

Optional serialized derivation tree (e.g., JSON) associated with this individual.

None

clone()

Create a deep copy of this individual.

Returns:

Type Description
Individual

A new Individual with copied genotype, phenotype, fitness, and metadata.

from_genotype(genotype) classmethod

Construct an Individual from a genotype.

Parameters:

Name Type Description Default
genotype list[int]

Genetic representation as a list of integers.

required

Returns: A new Individual initialized with the given genotype.

from_tree(tree) classmethod

Construct an Individual from a derivation tree.

Parameters:

Name Type Description Default
tree str

Root of the derivation tree used to initialize the individual.

required

Returns:

Type Description
Individual

A new Individual initialized with the given tree representation.

get_meta(key, expected_type=None)

Retrieve algorithm-specific metadata stored on the individual.

Parameters:

Name Type Description Default
key str

Metadata key to retrieve.

required
expected_type Type[T] | None

Expected type of the metadata value if being strict.

None

Raises:

Type Description
ValueError

If the key is missing or the value does not match the expected type.

Returns:

Type Description
Any

The metadata value associated with the given key.

mutated(mutation_strategy)

Apply a mutation strategy to this individual.

Parameters:

Name Type Description Default
mutation_strategy GEMutationStrategy

Mutation operator used to produce a mutated individual.

required

Returns:

Type Description
Individual

A new Individual representing the mutated offspring.

sort_key(maximize)

Return a scalar key suitable for sorting individuals by fitness. Invalid or unevaluable individuals are always pushed to the end for minimization use +inf and for maximization: use -inf

finchge.grammar.Grammar(grammar_str, parser=None)

Represents a grammar and provides methods for mapping genotypes to phenotypes using grammatical evolution.

This class utilizes GrammarParser to parse the grammar (typically, Backus-Naur Form (BNF))

Parameters:

Name Type Description Default
grammar_str str

The grammar definition as a BNF-formatted string.

required
parser Optional[GrammarParser]

GrammarParser will be used if not provided

None

analyze()

Analyze grammar structure for initialisation and diagnostics. Computes recursion, min/max derivation depth, arity, and termination feasibility. Results are stored on Rule objects.

References

[1] Conor Ryan, J. J. Collins, and Michael O'Neill. 1998. Grammatical Evolution: Evolving Programs for an Arbitrary Language. In Proceedings of the First European Workshop on Genetic Programming (EuroGP '98). Springer-Verlag, Berlin, Heidelberg, 83-96.

[2] O’Neill, M. and Ryan, C. (2001,) Grammatical Evolution. IEEE Transactions on Evolutionary Computation, 5, 349-358. https://doi.org/10.1109/4235.942529

compute_min_ramp(*, population_size, max_init_depth)

Compute minimum ramp depth.

Ramping should start at the first depth where the grammar can generate enough distinct derivations to support population diversity across the requested ramp range.

compute_permutations_by_depth(max_depth)

Estimate the number of distinct derivation trees available at each exact depth.

The internal count is cumulative up to a depth budget. The stored permutation count subtracts the previous depth budget so each entry represents trees whose maximum depth is exactly that depth.

describe(expanded=True)

Returns summary information about the grammar, including rule counts and structure. Set expanded=False to display original contracted versions like [a-z] for range if the grammar uses that syntax Default setting is to display expanded version.

Parameters:

Name Type Description Default
expanded bool

flag whether to show expanded version True by default.

True

Returns:

Name Type Description
str str

A formatted string containing grammar statistics and structure.

from_file(filename, parser=None) classmethod

Create a Grammar instance from a file containing BNF rules. This method reads the BNF grammar from a text file and initializes the grammar with the same parameters as the direct constructor.

Parameters:

Name Type Description Default
filename str

Path to the file containing BNF grammar rules

required
parser Optional[GrammarParser]

GrammarParser will be used if not provided

None

finchge.grammar.parser

GrammarParser

Bases: ABC

Base class for all grammar parsers.

Any custom parser must implement the parse() method which returns a standardized tuple containing grammar components.

parse() abstractmethod

Parse input data and return grammar components.

Returns:

Name Type Description
tuple tuple[dict[str, Rule], dict[str, Rule], str, list[str], list[str]]

containing the rules, terminals and non-terminals

Response Contains
  • rules_original (dict[str, Rule]): Mapping of non-terminal symbols to Rule objects (original written form without range expansion).
  • rules (dict[str, Rule]): Mapping of non-terminal symbols to Rule objects in Expanded (range notation) form.
  • start_rule (str): The first non-terminal rule, considered the start rule.
  • terminals (set[str]): Set of terminal symbols.
  • non_terminals (set[str]): Set of non-terminal symbols.

BNFGrammarParser(grammar_str)

Bases: GrammarParser

Parses a grammar string written in Backus-Naur Form (BNF) into structured rules.

The parser identifies terminal and non-terminal symbols, splits productions into alternatives, and validates the structure of the grammar.

Attributes:

Name Type Description
grammar_str str

The raw BNF grammar string.

Args: grammar_str (str): The grammar definition in BNF format.

parse()

Parses the BNF grammar string into rules, terminals, and non-terminals.

Returns:

Name Type Description
Tuple tuple[dict[str, Rule], dict[str, Rule], str, list[str], list[str]]

Tuple containing rules (original), rules_expanded, start_rule, terminals and non-terminals.

Response contains
  • rules_original (dict[str, Rule]): Mapping of non-terminal symbols to Rule objects. (Original may be contracted form)
  • rules (dict[str, Rule]): Mapping of non-terminal symbols to Rule objects in Expanded (range notation) form.
  • start_rule (str): The first non-terminal rule, considered the start rule.
  • terminals (set[str]): List of terminal symbols.
  • non_terminals (set[str]): List of non-terminal symbols.

Raises:

Type Description
ValueError

If the grammar syntax is invalid or contains undefined symbols.

remove_comments(lines)

Remove comments and empty lines.

consolidate_multiline_rules(lines)

If same rule sreads across multiple lines, then just make them a single line Simple rule is unti we see a token before ::= , its the same rule

Parameters:

Name Type Description Default
lines list[str]
required

Returns:

split_choices(rhs_symbols)

Split RHS into production choices For now it simply finds choices based on presence of choice symbol (Can be improved later, if needed.)

finchge.grammar.rule

Rule(symbol, choices)

Represents a single production rule in a BNF grammar.

Each rule contains a non-terminal symbol (e.g., "") and a list of possible expansions (choices) for that symbol.

Parameters:

Name Type Description Default
symbol str

The non-terminal symbol on the left-hand side (LHS) of the rule.

required
choices list[list[str]]

A list of possible right-hand side (RHS) expansions. Each choice is a list of symbols (either terminals or non-terminals).

required

finchge.grammar.GenotypeMapper(*, grammar, max_tree_depth=None, max_recursion_depth=None, max_wraps=0, repair_strategy=None, random_state=None)

Bases: RandomStateMixin

Bidirectional genotype-phenotype codec for Grammatical Evolution.

GenotypeMapper implements the core mapping logic of Grammatical Evolution (GE), providing:

  • Decoding (map): genotype >>> derivation tree >>> phenotype
  • Encoding (reverse_map): derivation tree >>> genotype (that reproduces the same tree)
Follows classic GE semantics
  • Stack-based depth-first expansion
  • Modulo-based production selection
  • Rightmost-first expansion (LIFO stack)
  • Codon wrapping with a configurable limit
  • Explicit recursion depth control

This class is grammar-aware but grammar-agnostic: all syntactic structure comes from the provided Grammar instance.

Attributes:

Name Type Description
grammar Grammar

Grammar defining the genotype-to-phenotype mapping rules.

rules dict[str, Rule]

Grammar production rules indexed by non-terminal symbols.

non_terminals list[str]

Set of grammar non-terminal symbols.

start_rule str

Start symbol used for derivation.

max_recursion_depth int

Maximum allowed recursion depth during decoding.

max_wraps int

Maximum number of genotype wraps allowed during decoding.

repair_strategy Optional[RepairStrategy]

Optional phenotype repair strategy applied after decoding.

Initialize a GenotypeMapper with grammar and mapping constraints.

Parameters:

Name Type Description Default
grammar Grammar

Grammar defining production rules and non-terminals used for mapping.

required
max_tree_depth Optional[int]

Max depth limit for resulting trees.

None
max_recursion_depth int

Maximum depth allowed during derivation tree expansion. Mapping is marked invalid if exceeded. Defaults to 20.

None
max_wraps int

Maximum number of times the genotype may be wrapped when codons are exhausted. Mapping is marked invalid if exceeded. Defaults to 6.

0
repair_strategy Optional[RepairStrategy]

Optional strategy to repair or post-process the phenotype after successful decoding. Defaults to None.

None
random_state int

random state

None

map(genotype)

Decode a genotype into a derivation tree and phenotype using standard Grammatical Evolution (GE) mapping. This implementation performs stack-based depth-first expansion using leftmost derivation semantics.

Follows leftmost derivation sematnics
  • Children are attached to the tree in left-to-right order
  • Children are pushed onto the stack in reverse order ensuring leftmost child expands first

Codons are consumed sequentially and mapped to grammar productions using modulo selection.

Wrapping is supported when the genotype is exhausted, up to max_wraps times.

Mapping is aborted and marked invalid if
  • wrapping limit is exceeded
  • recursion depth exceeds max_recursion_depth
  • the final tree contains unresolved non-terminals

Parameters:

Name Type Description Default
genotype list[int]

List of integer codons representing the genotype.

required

Returns:

Name Type Description
MappingResult MappingResult

Object containing:

Text Only
    - phenotype (str): Generated phenotype string.
    - used_genome (list[int]): Codons actually consumed.
    - used_codon_count (int): Number of codons consumed.
    - invalid (bool): Whether mapping failed.
    - tree (TreeNode): Root of the derivation tree.
    - tree_json (str): Serialized tree representation.

reverse_map(tree, *, codon_size=127, pad_to_length=None, pad_mode='random')

Encode a derivation tree into a genotype that reproduces the same tree.

This method traverses the tree in the same order that map() consumes codons: leftmost depth-first expansion of non-terminals.

Parameters:

Name Type Description Default
tree Union[TreeNode, str]

TreeNode root or JSON serialized tree.

required
codon_size int

Maximum codon value (inclusive).

127
pad_to_length Optional[int]

Optional genome length padding.

None
pad_mode str

"zeros" or "random".

'random'

Returns:

Type Description
list[int]

list[int]: Genotype reproducing this tree.

finchge.grammar.mapper.MappingResult(phenotype, used_genome, used_codon_count, invalid, tree, tree_str) dataclass

Result of genotype to phenotype mapping.

finchge.grammar.derivation_tree.TreeNode(symbol)

Represents a derivation tree (root node) and its children. The Derivation Tree is used during genotype-to-phenotype mapping.

As TreeNode is used to represent both node and full derivation tree, the properties such as depth, max_depth should be used with caution. The propertydepth is the depth of current node, while max_depth should be used to get the depth of the tree.

Tree depth is derived dynamically from parent links. The property depth gives the depth of the current node, while max_depth gives the maximum depth of the full tree.

Each node holds a symbol (terminal or non-terminal) and may have children. The tree supports JSON and string (CSV-like) serialization. Args: symbol (str): The symbol associated with this node.

add_child(child)

Adds a child to the current node and updates depth, parent, and root.

Parameters:

Name Type Description Default
child TreeNode

Node to add as a child.

required

Raises:

Type Description
TypeError

If child is not a TreeNode instance.

clone()

Deep copy of this node and its entire subtree. Parent pointers are reconstructed correctly.

from_dict(data) classmethod

Constructs a TreeNode from a dictionary.

Parameters:

Name Type Description Default
data dict

Dictionary with keys 'symbol' and 'children'.

required

Returns:

Name Type Description
TreeNode TreeNode

Reconstructed tree.

Raises:

Type Description
KeyError

If required fields are missing.

from_json(json_str) classmethod

Constructs a TreeNode from a JSON string.

Parameters:

Name Type Description Default
json_str str

JSON representation of the tree.

required

Returns:

Name Type Description
TreeNode TreeNode

Reconstructed tree.

Raises:

Type Description
JSONDecodeError

If JSON string is invalid.

from_string(s) staticmethod

Deserialize a tree from its canonical string representation.

Parameters:

Name Type Description Default
s str

Serialized tree string.

required

Returns:

Name Type Description
TreeNode TreeNode

Root node of reconstructed tree.

replace_subtree_with(new_subtree)

Replace this subtree with new_subtree.

Parameters:

Name Type Description Default
new_subtree TreeNode

Root of the subtree that will replace self.

required

Returns:

Type Description
TreeNode

The root of the resulting tree.

size()

convert the tree into a phenotype

swap_subtree_with(other)

Swap this subtree with another subtree.

Returns the new roots of both trees.

to_dict()

Converts the node and its subtree into a dictionary.

Returns:

Name Type Description
dict dict[str, Any]

Dictionary representation of the tree.

to_json(indent=2)

Serializes the tree into a JSON-formatted string.

Parameters:

Name Type Description Default
indent int

Indentation for the JSON string.

2

Returns:

Name Type Description
str str

JSON string representation of the tree.

to_phenotype()

convert the tree into a phenotype

to_string()

Reversible tree serialization using length-prefixed symbols.

Format

Length followd by the symbol and the children in braces. For example.

Text Only
<len>:<symbol>{child1,child2,...}

Leaf : 1:x, internal node: 3:add{1:x,1:y}

Returns:

Name Type Description
str str

Canonical serialized tree string.

finchge.algorithm

GeneticAlgorithm(selection, crossover, mutation, replacement, elite_size, fitness_evaluator, random_state=None)

Bases: BaseAlgorithm

Genetic Algorithm

Parameters:

Name Type Description Default
selection GESelectionStrategy

Selection strategy instance or function.

required
crossover GECrossoverStrategy

Crossover strategy instance or function.

required
mutation GEMutationStrategy

Mutation strategy instance or function.

required
replacement GEReplacementStrategy

Replacement strategy instance or function.

required
elite_size int

Number of elite individuals to carry over.

required
random_state Optional[int]

random state

None

evolve_one_generation(population)

Perform one generation of evolution on the given population.

Parameters:

Name Type Description Default
population Population

The population to evolve.

required

Returns:

Name Type Description
Population Population

The evolved population.

sort_population(population)

Sorts the population based on fitness values. Supports the fitness_evaluator with single fitness function.

Parameters:

Name Type Description Default
population Population

The population to sort.

required

NSGA2(selection, crossover, mutation, replacement, elite_size, fitness_evaluator, random_state=None)

Bases: BaseAlgorithm

NSGA-II algorithm.

Parameters:

Name Type Description Default
selection GESelectionStrategy

Selection strategy.

required
crossover GECrossoverStrategy

Crossover strategy.

required
mutation GEMutationStrategy

Mutation strategy.

required
replacement GEReplacementStrategy

Replacement strategy.

required
elite_size int

Number of elite individuals to carry over.

required
fitness_evaluator FitnessEvaluator

Evaluator used to score individuals.

required
random_state Optional[int]

Seed for reproducible randomness.

None

evolve_one_generation(population)

Perform one generation of evolution on the given population.

Parameters:

Name Type Description Default
population Population

The population to evolve.

required

Returns:

Name Type Description
Population Population

The evolved population.

get_pareto_front(population)

Return the Pareto front (rank 0 individuals).

sort_population(population)

Sort the population using NSGA-II rank and crowding distance.

Parameters:

Name Type Description Default
population Population

The population to sort.

required

NSGA3(selection, crossover, mutation, fitness_evaluator, num_divisions=12, random_state=None)

Bases: BaseAlgorithm

NSGA-III algorithm.

Parameters:

Name Type Description Default
selection GESelectionStrategy

Selection strategy.

required
crossover GECrossoverStrategy

Crossover strategy.

required
mutation GEMutationStrategy

Mutation strategy.

required
fitness_evaluator FitnessEvaluator

Evaluator used to score individuals.

required
num_divisions int

Number of divisions for reference points.

12
random_state Optional[int]

Seed for reproducible randomness.

None

evolve_one_generation(population)

Perform one generation of evolution on the given population.

Parameters:

Name Type Description Default
population Population

The population to evolve.

required

Returns:

Name Type Description
Population Population

The evolved population.

get_pareto_front(population)

Return the Pareto front.

Parameters:

Name Type Description Default
population Population

Population to extract the Pareto front from.

required

Returns:

Type Description
list[Individual]

Individuals in the first non-dominated front.

finchge.initialisation

RandomGenomeInitialiser(genome_length, codon_size, random_state=None)

Bases: GEInitialiser

Random genome initialiser for Grammatical Evolution.

Generates Individuals with a fixed-length integer genome where each codon is sampled uniformly from [0, codon_size].

This initialiser requires genome_length and codon_size to be provided explicitly via configuration or constructor.

from_config(config) classmethod

Create a RandomGenomeInitialiser using FinchConfig object

Parameters:

Name Type Description Default
config FinchConfig

FinchConfig instance

required

Returns:

Type Description
RandomGenomeInitialiser

RandomGenomeInitialiser

Raises:

Type Description
ConfigError

If required configuration keys are missing or invalid.

RVDInitialiser(genome_length, codon_size, population_size, random_state=None, mapper=None)

Bases: GEInitialiser

Random Valid Distinct (RVD) initialiser.

Generates individuals by sampling random genomes while rejecting invalid mappings and duplicate phenotypes. - RVD initialisation [Nicolau, 2017].

Note

RVD initializer instance is supposed to be used once per run or the stored RVD state may cause issues. To reset the stored state call reset() , if same RVDInitializer instance has to be reused.

from_config(config) classmethod

Create a RVDInitializer from the [ge] config section.

Parameters:

Name Type Description Default
config FinchConfig

FinchConfig instance

required

Returns:

Type Description
RVDInitialiser

RandomGenomeInitialiser

Raises:

Type Description
ConfigError

If required configuration keys are missing or invalid.

reset()

If same RVD instance is used for multiple runs. must reset

FullTreeInitialiser(init_min_depth, init_max_depth, strict_full=True, random_state=None)

Bases: GETreeInitialiser

Full tree initialiser as defined by Koza (1992).

Produces trees where all internal nodes expand until max depth. Terminals are only allowed at the maximum depth.

GrowTreeInitialiser(init_min_depth, init_max_depth, random_state=None)

Bases: GETreeInitialiser

Grow tree initialiser as defined by Koza (1992).

Allows productions to terminate early below max depth, resulting in irregular tree shapes.

PIGrowInitialiser(init_max_depth, population_size, random_state=None)

Bases: GETreeInitialiser

Position-Independent Grow (PI-Grow) tree initialiser.

Generates derivation trees using grow-style initialisation with position-independent expansion.

from_config(config) classmethod

Create a PI-Grow initialiser from GE configuration.

Parameters:

Name Type Description Default
config FinchConfig

FinchConfig instance

required

Returns:

Type Description
PIGrowInitialiser

PIGrowInitializer

Raises:

Type Description
ConfigError

If required configuration keys are missing or invalid.

RHHInitialiser(init_max_depth, population_size, strict_full=True, random_state=None)

Bases: GETreeInitialiser

Ramped Half-and-Half (RHH) Initialisation for Grammatical Evolution.

This implementation of RHH Initialisation is the Grammatical Evolution is based on "Sensible Initialization" approach as an analogue of Koza’s Ramped Half-and-Half (RHH) method originally developed for tree-based Genetic Programming. It generates an initial population of derivation trees using a combination of Full and Grow construction strategies across a range of depth limits, promoting structural diversity while respecting grammar constraints.

In this approach, grammar production rules assume the role of GP functions in controlling tree growth. Recursive production rules are preferentially selected during Full initialisation to ensure tree expansion until the specified depth limit is reached. During Grow initialisation, both recursive and terminating productions may be selected, allowing trees of irregular shape to be generated.

Production feasibility is guided by grammar analysis, including recursion detection and minimum derivation depth calculation, ensuring that generated trees can terminate within the allowed depth budget.

References

Koza, J. R. (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press.

Ryan, C., Collins, J. J., & O’Neill, M. (1998). Grammatical Evolution: Evolving Programs for an Arbitrary Language. In Proceedings of the First European Workshop on Genetic Programming.

Ryan, C., & Azad, R. M. A. (2003). Sensible Initialisation in Grammatical Evolution. In Grammatical Evolution: Evolutionary Automatic Programming in an Arbitrary Language. Springer.

Fenton, M., McDermott, J., Fagan, D., Forstenlechner, S., Hemberg, E., & O’Neill, M. (2017). PonyGE2: Grammatical Evolution in Python. arXiv:1703.08535.

from_config(config) classmethod

Create a RampedHalfAndHalfInitializer from configuration.

Parameters:

Name Type Description Default
config FinchConfig

FinchConfig instance

required

Returns:

Type Description
RHHInitialiser

RampedHalfAndHalfInitializer

Raises:

Type Description
ConfigError

If required configuration keys are missing or invalid.

PTC2Initialiser(target_size, random_state=None, max_depth=None)

Bases: GETreeInitialiser

RampedPTC2Initialiser(init_min_size, init_max_size, population_size, max_depth=None, random_state=None)

Bases: GETreeInitialiser

Ramped size-based population initialiser using PTC2 family algorithms.

This initialiser generates individuals using Probabilistic Tree Creation 2 across a ramped distribution of target tree sizes.

Behaviour

• If max_depth is None: Uses PTC2 (size-controlled only).

• If max_depth is provided: Uses PTC2D (size + depth constrained).

finchge.operators.selection

GESelectionStrategy(max_best, random_state=None)

Bases: RandomStateMixin, ABC

TournamentSelection(max_best, tournament_size=3, random_state=None)

Bases: GESelectionStrategy

Selection strategy that chooses individuals using tournament selection.

In tournament selection, a fixed number of individuals (tournament_size) are randomly chosen from the population, and the best among them (based on fitness) is selected. This process repeats until the desired number of individuals is selected. Selection pressure increases with larger tournament sizes.

Parameters:

Name Type Description Default
max_best bool

Whether higher fitness values are better. True for maximization, False for minimization.

required
tournament_size int

The number of individuals competing in each tournament. Must be >= 2. Defaults to 3.

3
random_state Optional[int]

Random seed.

None

select(population_size, individuals)

Selects a subset of individuals using tournament selection.

Repeatedly selects a group of individuals (a "tournament") at random from the population, and chooses the best among them based on fitness. This process continues until the desired number of individuals is selected. Whether the best individual is determined by maximizing or minimizing fitness depends on the max_best setting.

Parameters:

Name Type Description Default
population_size int

The number of individuals to select for the next generation.

required
individuals list[Individual]

A list of individuals to select from. Each must have a fitness attribute, which is expected to be a list containing a single float value (e.g., [fitness]).

required

Returns:

Type Description
list[Individual]

A list of selected individuals based on tournament outcomes.

RouletteWheelSelection(max_best, random_state=None)

Bases: GESelectionStrategy

Fitness-proportionate selection (roulette wheel selection).

Roulette wheel selection is one of the earliest and most widely used selection mechanisms in evolutionary computation. The method assigns selection probabilities to individuals proportional to their fitness, enabling stochastic sampling while maintaining selection pressure toward fitter individuals.

Each individual occupies a proportion of a conceptual roulette wheel, where the area assigned to each individual is proportional to its fitness. Selection is performed by sampling from this distribution, ensuring that individuals with higher fitness are more likely—but not guaranteed—to be selected.

Fitness-proportionate selection originates from early genetic algorithm research and is commonly attributed to the foundational work on genetic algorithms by John Holland.

Parameters:

Name Type Description Default
max_best bool

Whether higher fitness values are better. True for maximization, False for minimization.

required
random_state Optional[int]

Random seed.

None

select(population_size, individuals)

Selects a subset of individuals using roulette wheel selection.

This method performs fitness-proportionate selection, where individuals are assigned selection probabilities based on their fitness values. The fitness values are shifted to ensure they are non-negative, and if necessary, inverted for minimization problems. If the total weight of the shifted fitness values is zero or negative, the method falls back to uniform random selection.

Parameters:

Name Type Description Default
population_size int

The number of individuals to select for the next generation.

required
individuals list[Individual]

A list of individuals to select from. Each individual must have a fitness attribute, which is expected to be a list containing a single float value (e.g., [fitness]).

required

Returns:

Type Description
list[Individual]

A list of selected individuals based on roulette wheel selection probabilities.

RankSelection(max_best, selection_pressure=1.5, random_state=None)

Bases: GESelectionStrategy

Selection strategy that selects individuals based on their rank in the population.

In rank selection, individuals are first sorted by fitness, and then assigned selection probabilities based on their rank rather than their raw fitness value. The selection pressure can be adjusted using the selection_pressure parameter, which influences how strongly the best individuals are favored. A higher selection pressure makes the selection more biased toward higher-ranked individuals.

Parameters:

Name Type Description Default
max_best bool

Whether higher fitness values are better. True for maximization, False for minimization.

required
selection_pressure float

The selection pressure that controls the bias toward higher-ranked individuals. Must be between 1.0 and 2.0. Defaults to 1.5.

1.5
random_state Optional[int]

Random seed.

None

Raises:

Type Description
ValueError

If selection_pressure is not between 1.0 and 2.0.

select(population_size, individuals)

Selects a subset of individuals using linear rank-based selection.

This method applies rank selection by sorting individuals based on fitness and assigning selection probabilities based on their rank rather than their raw fitness values. The selection pressure parameter controls how much more likely the best-ranked individuals are to be selected compared to lower-ranked ones.

Parameters:

Name Type Description Default
population_size int

The number of individuals to select.

required
individuals list[Individual]

A list of individuals to select from. Each individual must have a fitness attribute, which is a list containing a single float value representing its fitness.

required

Returns:

Type Description
list[Individual]

A list of selected individuals from the population based on rank-weighted probabilities.

TruncationSelection(max_best, truncation_threshold=0.5, random_state=None)

Bases: GESelectionStrategy

Truncation selection strategy.

Truncation selection is a deterministic selection mechanism in which only the top-performing portion of a population is allowed to reproduce. The selected elite subset is determined by sorting individuals according to fitness and selecting a predefined fraction of the best individuals.

The population is ranked by fitness and only the top k fraction of individuals is retained as eligible parents. Selection from this elite pool may then occur randomly or deterministically.

Parameters:

Name Type Description Default
max_best bool

Whether higher fitness values are better. True for maximization, False for minimization.

required
truncation_threshold float

Threshold value.

0.5
random_state Optional[int]

Random seed.

None

select(population_size, individuals)

Selects a subset of individuals using truncation selection.

First, the population is sorted by fitness (descending for maximization, ascending for minimization). Then, the top fraction specified by truncation_threshold is selected as the elite pool. Finally, individuals are randomly chosen from this elite pool to reach the desired population size.

Parameters:

Name Type Description Default
population_size int

The number of individuals to select for the next generation.

required
individuals list[Individual]

A list of individuals to select from. Each individual must have a fitness attribute that is comparable.

required

Returns:

Type Description
list[Individual]

A list of selected individuals randomly chosen from the elite portion of

list[Individual]

the population.

LexicaseSelection(case_key='errors', case_max_best=False, random_state=None)

Bases: GESelectionStrategy

Implements Lexicase Selection for parent selection in Genetic Programming.

This strategy filters a population by evaluating individuals against one case at a time in a randomized order. This is particularly effective for problems where different individuals might excel at different parts of the fitness landscape, as it preserves diversity better than standard tournament selection.

Attributes:

Name Type Description
case_key str

The key used to retrieve the per-case fitness vector from an individual's metadata.

required_case_keys tuple

A tuple containing the primary case key needed for evaluation.

Initializes the selection strategy.

Parameters:

Name Type Description Default
case_key str

Metadata key where the casewise error/fitness list is stored.

'errors'
case_max_best bool

Set to True if higher values in the case vector are better (e.g., accuracy). Defaults to False (e.g., error/loss).

False
random_state Optional[int]

Seed for the random number generator.

None

select(population_size, individuals)

Selects parents using Lexicase selection.

EpsilonLexicaseSelection(case_key='errors', epsilon=0.0, case_max_best=False, random_state=None)

Bases: LexicaseSelection

A variant of Lexicase Selection that uses a margin of error (epsilon).

Instead of requiring individuals to exactly match the 'best' performance on a given case, this strategy allows individuals to survive if they are within a specified distance (epsilon) of the best performer. This helps prevent premature convergence caused by tiny differences in floating-point fitness values.

Initializes the Epsilon Lexicase strategy.

Parameters:

Name Type Description Default
case_key str

Metadata key for the casewise fitness vector.

'errors'
epsilon float

The buffer allowed when comparing fitness values.

0.0
case_max_best bool

True if higher values are better.

False
random_state Optional[int]

Seed for the random number generator.

None

NSGA2TournamentSelection(max_best=False, tournament_size=2, exploration_prob=0.1, random_state=None)

Bases: GESelectionStrategy

Tournament selection using NSGA-II crowded-comparison operator.

This selection method implements the parent selection strategy introduced in the NSGA-II multi-objective evolutionary algorithm. Selection is based on Pareto rank and crowding distance to balance convergence toward the Pareto front with population diversity.

Individuals are compared using a crowded comparison operator:

  • Prefer individuals with lower Pareto rank.
  • If ranks are equal, prefer individuals with higher crowding distance.

Tournament selection is applied using this comparison operator.

crowded_comparison_operator(tournament)

Selects the best individual using NSGA-II's crowded comparison operator.

The operator selects individuals based on lower Pareto rank (better) If same rank, higher crowding distance (better)

Parameters:

Name Type Description Default
tournament list[Individual]

The individuals competing in the tournament.

required

Returns:

Type Description
Individual

The selected winner from the tournament.

select(population_size, individuals)

Selects individuals using NSGA-II tournament selection.

For each tournament: 1. With probability exploration_prob, a random individual is chosen 2. Otherwise, the crowded comparison operator selects the best individual based on Pareto rank and crowding distance

Parameters:

Name Type Description Default
population_size int

Number of individuals to select.

required
individuals list[Individual]

List of individuals to select from. Each individual must have 'rank' and 'crowding_distance' metadata.

required

Returns:

Type Description
list[Individual]

List of selected individuals.

NSGA3TournamentSelection(tournament_size=2, exploration_prob=0.1, random_state=None)

Bases: GESelectionStrategy

Tournament selection using NSGA-III rank-based selection.

This selection strategy implements the rank-based selection component of NSGA-III, a many-objective evolutionary algorithm designed for problems involving four or more objective functions.

Selection is performed using Pareto rank only. When multiple individuals share the same rank, selection among them is performed randomly to preserve diversity. NSGA-III relies on reference point-based niching rather than crowding distance.

rank_only_operator(tournament)

Selects an individual based on Pareto rank only. First finds the individuals with the best (lowest) rank, then randomly selects one from among them to maintain diversity.

Parameters:

Name Type Description Default
tournament list[Individual]

The individuals competing in the tournament.

required

Returns:

Type Description
Individual

A randomly selected individual from those with the best rank.

select(population_size, individuals)

Selects individuals using NSGA-III tournament selection.

For each tournament a random individual is chosen with probability exploration_prob, otherwise, the rank-only operator selects from the best-ranked individuals

Parameters:

Name Type Description Default
population_size int

Number of individuals to select.

required
individuals list[Individual]

List of individuals to select from. Each individual must have 'rank' metadata.

required

Returns:

Type Description
list[Individual]

List of selected individuals.

finchge.operators.crossover

OnePointCrossover(codon_size, crossover_proba, within_used=True, mode='fixed', random_state=None)

Bases: GECrossoverStrategy

One-point crossover operator.

A single crossover point is selected in each parent genome, and the genome segments after this point are exchanged to form offspring. Parent genomes are split at a randomly selected point. Offspring are constructed by concatenating prefix segments from one parent with suffix segments from the other. Supports both fixed and variable one-point crossover variants with mode parameter.

  • variable: Select one crossover point independently in each parent. This can change offspring genome lengths.

  • fixed: Select the same crossover point in both parents. This preserves offspring genome lengths.

Initialize the one-point crossover strategy.

This operator supports both fixed and variable one-point crossover:

  • "fixed": Uses the same crossover point in both parents. Offspring genome lengths are preserved. Equivalent to PonyGE-style fixed one-point crossover.

  • "variable": Uses independent crossover points in each parent. Offspring genome lengths may differ from parents. Equivalent to PonyGE-style variable one-point crossover.

Parameters:

Name Type Description Default
codon_size int

Maximum codon value (inclusive).

required
crossover_proba float

Probability that crossover is applied; otherwise, offspring are copies of the parents.

required
within_used bool

If True, crossover points are restricted to the used portion of each parent's genome (i.e., within used_codon_count). If False, the full genome length is considered.

True
mode Literal['variable', 'fixed']

Crossover mode. Either "fixed" or "variable".

'fixed'
random_state Optional[int]

Optional random seed for reproducibility.

None

cross(parent1, parent2)

Perform one-point crossover

Parameters:

Name Type Description Default
parent1 Individual

First parent.

required
parent2 Individual

Second parent.

required

Returns:

Type Description
Tuple[Individual, Individual]

Two offspring individuals.

SubtreeCrossover(crossover_proba, non_terminals, random_state=None)

Bases: GECrossoverStrategy

Subtree crossover exchanges randomly selected subtrees between two parent program trees while preserving syntactic correctness with respect to a grammar.

Two compatible non-terminal nodes are selected from parent trees, and the subtrees rooted at these nodes are swapped.

Initialize the subtree crossove.

Parameters:

Name Type Description Default
crossover_proba float

Probability of crossover occurring.

required
non_terminals list[str]

List of grammar non-terminal symbols that are valid crossover points.

required

cross(p_0, p_1)

Perform subtree crossover between two parent individuals.

If crossover does not occur, cloned trees are returned as new individuals.

Parameters:

Name Type Description Default
p_0 Individual

First parent individual.

required
p_1 Individual

Second parent individual.

required

Returns:

Type Description
Tuple[Individual, Individual]

A tuple containing two newly constructed offspring individuals.

TwoPointCrossover(codon_size, crossover_proba, within_used=True, mode='fixed', random_state=None)

Bases: GECrossoverStrategy

Two-point crossover generalizes one-point crossover by selecting two crossover points in each parent genome and exchanging the intermediate segments.

Offspring are formed by combining prefix and suffix regions from one parent with the middle region from the other parent.

Initialize the two-point crossover strategy.

This operator supports both fixed and variable two-point crossover:

  • "fixed": Uses the same two crossover points in both parents. A segment between the two points is exchanged, preserving offspring genome lengths. Equivalent to PonyGE-style fixed two-point crossover.

  • "variable": Uses independently selected crossover points in each parent. Exchanged segments may differ in length, allowing offspring genomes to grow or shrink. Equivalent to PonyGE-style variable two-point crossover.

Parameters:

Name Type Description Default
codon_size int

Maximum codon value (inclusive).

required
crossover_proba float

Probability that crossover is applied; otherwise, offspring are copies of the parents.

required
within_used bool

If True, crossover points are restricted to the used portion of each parent's genome (i.e., within used_codon_count). If False, the full genome length is considered.

True
mode Literal['fixed', 'variable']

Crossover mode. Either "fixed" or "variable".

'fixed'
random_state Optional[int]

Optional random seed for reproducibility.

None

cross(parent1, parent2)

Perform two-point crossover.

Parameters:

Name Type Description Default
parent1 Individual

First parent.

required
parent2 Individual

Second parent.

required

Returns:

Type Description
Tuple[Individual, Individual]

Two offspring individuals.

UniformCrossover(crossover_proba, within_used=True, random_state=None)

Bases: GECrossoverStrategy

Uniform crossover operator.

Uniform crossover recombines parent genomes by independently swapping genes between parents at each position with fixed probability.

Instead of using fixed crossover points, uniform crossover applies a binary mixing mask across the genome, providing maximal mixing of parental genetic material.

Initialize uniform crossover

Parameters:

Name Type Description Default
crossover_proba float

Probability of crossover occurring.

required
within_used bool

If True, crossover is limited to the used portion of each parent's genome.

True

cross(parent1, parent2)

Perform uniform crossover.

Parameters:

Name Type Description Default
parent1 Individual

First parent.

required
parent2 Individual

Second parent.

required

Returns:

Type Description
Tuple[Individual, Individual]

Two offspring individuals.

finchge.operators.mutation

CyclicMutation(mutation_probability, segment_size=3, random_state=None)

Bases: GEMutationStrategy

Cyclic mutation strategy.

Rotates fixed-size contiguous segments of the genome with a per-gene mutation probability.

Initialize the cyclic mutation strategy.

Parameters:

Name Type Description Default
mutation_probability float

Probability in [0.0, 1.0] that a given position triggers a cyclic rotation.

required
segment_size int

Size of the segment to rotate.

3

Raises:

Type Description
ValueError

If mutation_probability is not in [0.0, 1.0].

ValueError

If segment_size is less than 2.

mutate(individual)

Apply cyclic mutation to an individual.

Each genome position is independently selected with mutation_probability. When selected, the contiguous segment starting at that position is rotated by one step.

Parameters:

Name Type Description Default
individual Individual

Individual to mutate.

required

Returns:

Type Description
Individual

A new Individual with a mutated genotype.

DuplicationMutation(mutation_probability, segment_size=2, random_state=None)

Bases: GEMutationStrategy

Duplication mutation strategy.

With a given probability, duplicates a contiguous segment of fixed size and copies it into another non-overlapping location in the genome.

Initialize the duplication mutation strategy.

Parameters:

Name Type Description Default
mutation_probability float

Probability in [0.0, 1.0] of applying the duplication mutation.

required
segment_size int

Length of the segment to duplicate, must be > 0.

2

Raises:

Type Description
ValueError

If mutation_probability is not in [0.0, 1.0].

ValueError

If segment_size is not a positive integer.

mutate(individual)

Apply duplication mutation to an individual.

A segment of length segment_size is copied from one position and overwrites another non-overlapping segment with probability mutation_probability.

Parameters:

Name Type Description Default
individual Individual

Individual to mutate.

required

Returns:

Type Description
Individual

A new Individual with a mutated genotype.

GaussianMutation(mutation_probability, std_dev=1.0, random_state=None)

Bases: GEMutationStrategy

Gaussian noise mutation strategy.

Adds Gaussian noise to selected genes and clamps results to non-negative integers.

Initialize the Gaussian mutation strategy.

Parameters:

Name Type Description Default
mutation_probability float

Probability in [0.0, 1.0] that a given gene is mutated.

required
std_dev float

Standard deviation of the Gaussian noise.

1.0

Raises:

Type Description
ValueError

If mutation_probability is not in [0.0, 1.0].

mutate(individual)

Apply Gaussian mutation to an individual.

Each gene is independently selected with mutation_probability. Selected genes receive Gaussian noise, are rounded to integers, and clamped to be non-negative.

Parameters:

Name Type Description Default
individual Individual

Individual to mutate.

required

Returns:

Type Description
Individual

A new Individual with a mutated genotype.

IntFlipMutation(mutation_probability, codon_size, mode='per_codon', mutation_events=1, within_used=True, random_state=None)

Bases: GEMutationStrategy

Integer flip mutation strategy.

Parameters:

Name Type Description Default
mutation_probability float

Probability of mutation per gene.

required
codon_size int

Maximum codon value (inclusive).

required

mutate(individual)

Perform integer flip mutation on an Individual.

Parameters:

Name Type Description Default
individual Individual

Individual to apply mutation on.

required

Returns:

Name Type Description
Individual Individual

Mutated Individual.

InversionMutation(segment_probability, random_state=None)

Bases: GEMutationStrategy

Inversion mutation strategy.

With a given probability, selects a random contiguous segment of the genome and reverses its order.

Initialize the inversion mutation strategy.

Parameters:

Name Type Description Default
segment_probability float

Probability in [0.0, 1.0] of applying the inversion mutation.

required

Raises:

Type Description
ValueError

If segment_probability is not in [0.0, 1.0].

mutate(individual)

Apply inversion mutation to an individual.

With probability segment_probability, a random contiguous segment [start:end) of the genome is reversed.

Parameters:

Name Type Description Default
individual Individual

Individual to mutate.

required

Returns:

Type Description
Individual

A new Individual with a mutated genotype.

MultipleMutation(strategies, probabilities=None, random_state=None)

Bases: GEMutationStrategy

Multiple mutation strategies combiner.

Randomly selects and applies one mutation strategy according to provided probabilities.

Initialize the multiple mutation strategy.

Parameters:

Name Type Description Default
strategies list[GEMutationStrategy]

List of mutation strategies to choose from.

required
probabilities Optional[list[float]]

Optional selection probabilities for each strategy. If None, all strategies are selected uniformly.

None

Raises:

Type Description
ValueError

If probabilities length does not match strategies or if probabilities sum to zero.

mutate(individual)

Apply a randomly selected mutation strategy.

One mutation strategy is selected according to the configured probabilities and applied to the individual.

Parameters:

Name Type Description Default
individual Individual

Individual to mutate.

required

Returns:

Type Description
Individual

A mutated Individual.

SubtreeMutation(mutation_probability, non_terminals, tree_generator, mutation_max_depth, mutation_events=1, random_state=None)

Bases: GEMutationStrategy

Subtree mutation replaces randomly selected subtrees within a program tree with newly generated random subtrees. Mutation targets nodes representing non-terminal grammar symbols. The subtree rooted at the selected node is replaced with a newly generated subtree that satisfies grammar constraints. Subtree mutation is a core mutation mechanism in genetic programming and was introduced alongside subtree crossover in early GP research.

Characteristics:

  • Maintains syntactic validity of individuals.
  • Introduces structural novelty.
  • Controls search-space exploration via subtree depth limits.

mutate(individual)

Mutate an individual using subtree mutation.

Parameters:

Name Type Description Default
individual Individual

Parent individual.

required

Returns:

Type Description
Individual

A newly constructed mutated individual.

SwapMutation(mutation_probability, random_state=None)

Bases: GEMutationStrategy

Swap mutation strategy. Randomly swaps pairs of genes in the genotype based on a per gene mutation probability.

Initialize the swap mutation strategy.

Parameters:

Name Type Description Default
mutation_probability float

Probability in [0.0, 1.0] that a given gene position is selected for swapping.

required

Raises:

Type Description
ValueError

If mutation_probability is not in [0.0, 1.0].

mutate(individual)

Apply swap mutation to an individual.

Each gene position is independently selected with mutation_probability. Selected positions are randomly paired and their values swapped.

Parameters:

Name Type Description Default
individual Individual

Individual to mutate.

required

Returns:

Type Description
Individual

A new Individual with a mutated genotype.

finchge.operators.replacement

GenerationalReplacement(max_best, random_state=None)

Bases: GEReplacementStrategy

Elitist generational replacement strategy for single-objective optimization.

This replacement strategy implements classic generational replacement used in standard genetic algorithms. At each generation, the entire population is replaced by newly generated offspring, while preserving a fixed number of elite individuals from the previous population.

Individuals are ranked using a scalar fitness value, and elites are selected based on either maximization or minimization of that fitness.

Parameters:

Name Type Description Default
max_best bool

If True, higher fitness values are considered better (maximization). If False, lower fitness values are considered better (minimization).

required

Methods:

Name Description
replace

Replaces the old population with a new population while preserving elite individuals from the previous generation.

Notes
  • Assumes that each individual's fitness attribute is a scalar value.
  • Elites are selected exclusively from the old population.
  • Fitness values must already be evaluated prior to replacement.
See Also
  • SteadyStateReplacement
  • RandomElitistReplacement
  • NSGA2ElitistReplacement

replace(new_population, old_population, elite_size, population_size)

Replaces old population with new population, preserving elite individuals.

Parameters:

Name Type Description Default
new_population list[Individual]

List of new individuals

required
old_population list[Individual]

List of current individuals

required
elite_size int

Number of elite individuals to preserve

required
population_size int

Target size of the population

required

Returns:

Name Type Description
list list[Individual]

New population with elites

NSGA2ElitistReplacement(maximize_flags, random_state=None)

Bases: GEReplacementStrategy

NSGA-II elitist replacement strategy using Pareto rank and crowding distance.

This replacement strategy implements the environmental selection step of NSGA-II. It preserves elite individuals from the previous population and fills the remaining population by selecting individuals from successive Pareto fronts, prioritizing diversity via crowding distance.

The strategy assumes that: - Non-dominated sorting has already been performed. - Each individual has a valid Pareto rank stored in meta["rank"]. - Crowding distance is computed per front using NSGA-II rules.

This class is not compatible with NSGA-III, as it relies on crowding distance for diversity preservation. For NSGA-III, reference-point-based environmental selection must be used instead.

Parameters:

Name Type Description Default
maximize_flags list[bool]

List indicating whether each objective should be maximized (True) or minimized (False). This is used to perform non-dominated sorting when combining parent and offspring populations.

required

Methods:

Name Description
replace

Performs elitist environmental selection according to NSGA-II.

See Also
  • fast_non_dominated_sort
  • calculate_crowding_distance
  • NSGA3Replacement

replace(new_population, old_population, elite_size, population_size)

Replacement strategy that preserves elite solutions and maintains diversity.

RandomElitistReplacement(max_best, random_state=None)

Bases: GEReplacementStrategy

Random replacement strategy with elitism for single-objective optimization.

This replacement strategy preserves a fixed number of elite individuals from the current population and fills the remaining population slots by randomly sampling from the non-elite individuals of the old population and the newly generated individuals.

Elites are selected based on a scalar fitness value, using either maximization or minimization as specified at construction time. Diversity is introduced through random sampling rather than explicit distance or dominance measures.

Parameters:

Name Type Description Default
max_best bool

If True, higher fitness values are considered better (maximization). If False, lower fitness values are considered better (minimization).

required

Methods:

Name Description
replace

Preserves elite individuals and randomly selects remaining individuals to form the next generation.

Raises:

Type Description
ValueError

If elite_size is invalid.

ValueError

If any individual does not have a single-objective fitness representation (fitness list length != 1).

ValueError

If there are not enough eligible individuals to fill the population.

Notes
  • Assumes that fitness values are evaluated prior to replacement.
  • Does not use Pareto dominance, crowding distance, or reference-point niching.
  • Random sampling is performed without replacement.
See Also
  • GenerationalReplacement
  • SteadyStateReplacement
  • NSGA2ElitistReplacement

SteadyStateReplacement(max_best, random_state=None)

Bases: GEReplacementStrategy

Steady-state replacement strategy for single-objective optimization.

This replacement strategy implements a steady-state evolutionary model, where only a subset of individuals is replaced at each generation rather than replacing the entire population. A fixed number of elite individuals from the current population are preserved, while the remaining slots are filled with the best individuals from the newly generated population.

Selection is based on a scalar fitness value, using either maximization or minimization as specified at construction time.

Parameters:

Name Type Description Default
max_best bool

If True, higher fitness values are considered better (maximization). If False, lower fitness values are considered better (minimization).

required

Methods:

Name Description
replace

Preserves elite individuals from the old population and replaces the remaining individuals with the best-performing offspring.

Notes
  • Assumes that each individual's fitness attribute is a scalar value.
  • Fitness values must be evaluated prior to replacement.
  • Elites are selected exclusively from the old population.
  • The number of preserved individuals is controlled by elite_size.
See Also
  • GenerationalReplacement
  • RandomElitistReplacement
  • NSGA2ElitistReplacement

replace(new_population, old_population, elite_size, population_size)

Replaces worst individuals in old population with new individuals. Preserves elite_size best individuals.

finchge.config

FinchConfig(data)

Configuration manager for Grammar Evolution (GE) utils.

FinchConfig provides an interface for loading, accessing, and modifying GE configuration from YAML and INI files.

Can load conifg as dictionaries (dict[str, dict[str, Any]]) through a construtor of from_file. Supports automatic file discovery and typed access to configuration sections. When working in a single proplem project with one configuration, FinchConfig can detect the file automatiaally. Although automatic config detection is handy during quick tasks such as checking grammar and other components. For more organized utils it is recommended to provide the actual file. As this class may silently pick the config file from current working directory if available any.

Attributes:

Name Type Description
_data

Internal storage for configuration data as nested dictionaries.

Initializes a FinchConfig instance with configuration data.

Parameters:

Name Type Description Default
data dict[str, dict[str, Any]]

Nested dictionary where outer keys are section names and inner dictionaries contain section key-value pairs. Example:

Text Only
    {"eperiment": {"start_symbol": "<expr>"}, ...}

required
Note

Prefer using the class methods from_file, from_yaml, or from_ini to create instances rather than calling this constructor directly.

experiment property

Retrieves the experiment configuration section. Returns: Dictionary of experiment configuration parameters.

ge property

Retrieves the grammatical evolution configuration section.

Typically contains
  • start_symbol: The starting non-terminal symbol
  • rules: Grammar production rules
  • max_depth: Maximum derivation depth
  • crossover_rate: Probability of applying crossover
  • mutation_rate: Probability of applying mutation
  • selection_method: Selection strategy (tournament, roulette, etc.)
  • tournament_size: If using tournament selection
  • other related parameters

Returns:

Type Description
dict[str, Any]

Dictionary of grammar configuration parameters.

parallel property

Retrieves the parallel configuration section.

Returns:

Type Description
dict[str, Any]

Dictionary of experiment configuration parameters.

copy(update=None)

Creates a deep copy of the configuration with optional updates.

Parameters:

Name Type Description Default
update dict[str, dict[str, Any]] | None

Optional dictionary of updates to apply to the copy. Format: {section_name: {key: value, ...}, ...} If a section doesn't exist, it will be created.

None

Returns:

Type Description
FinchConfig

A new FinchConfig instance with the copied (and optionally updated) data.

Example
Text Only
>>> new_config = config.copy({
...     "experiment": {"num_generations": 500},
...     "new_section": {"key": "value"}
... })

from_dict(data) classmethod

Loads dict as FinchConfig. Internally finchGE uses FinchConfig class. Args: data (dict[str, Any]): configuration as a dictionary

Returns: FinchConfig instance

from_file(path=None) classmethod

Creates a FinchConfig instance from a configuration file.

Automatically detects file format based on extension and loads the configuration. If no path is provided, searches for common config filenames in the current directory.

If the config files follow expected name (ge_config) and are in current working directory, they can be automatically detected. Othewise they have to be passed as path argument.

Parameters:

Name Type Description Default
path str | None

Optional path to configuration file. If None, searches for 'ge_config.yaml', 'ge_config.yml', or 'ge_config.ini' in the current directory.

None

Returns:

Type Description
FinchConfig

FinchConfig instance loaded with configuration data.

Raises:

Type Description
FileNotFoundError

If no configuration file is found.

RuntimeError

If multiple configuration files are found (when path is None).

ValueError

If the file format is unsupported or the file content is invalid.

from_ini(path) classmethod

Creates a FinchConfig instance from an INI configuration file.

Parses INI sections and converts string values to appropriate Python types using ast.literal_eval. Values that cannot be evaluated remain as strings.

Parameters:

Name Type Description Default
path str

Path to the INI configuration file.

required

Returns:

Type Description
FinchConfig

FinchConfig instance with parsed configuration data.

Raises:

Type Description
FileNotFoundError

If the INI file does not exist.

Error

If the INI file is malformed.

Example INI format
Text Only
[grammar]
start_symbol = "<expr>"
max_depth = 10

[initialisation]
population_size = 100
method = "ramped_half_and_half"

from_yaml(path) classmethod

Creates a FinchConfig instance from a YAML configuration file.

Parameters:

Name Type Description Default
path str

Path to the YAML configuration file.

required

Returns:

Type Description
FinchConfig

FinchConfig instance with parsed configuration data.

Raises:

Type Description
FileNotFoundError

If the YAML file does not exist.

YAMLError

If the YAML file is malformed.

ValueError

If the top-level YAML structure is not a dictionary.

Example YAML format:

Text Only
    grammar:
        start_symbol: "<expr>"
        max_depth: 10
    initialisation:
        population_size: 100
        method: "ramped_half_and_half"

section(name)

Retrieves a configuration section by name.

Parameters:

Name Type Description Default
name str

Name of the configuration section.

required

Returns:

Type Description
dict[str, Any]

Dictionary containing the section's key-value pairs.

dict[str, Any]

Returns an empty dictionary if the section does not exist.

to_dict()

Converts the FinchConfig instance into a dict

Returns: dictionary for the FinchConfig

to_json(indent=2)

Converts the GECOnfig instance to json Args: indent: indent for the json

Returns: json string

ConfigValidator

Defines validation rules for FinchGE configuration files.

This class contains the schema and validation logic used to verify the structure and values of a FinchGE configuration.

Attributes:

Name Type Description
MANDATORY_SECTIONS set[str]

Sections that must be present for a valid configuration.

RECOMMENDED_SECTIONS set[str]

Sections that are optional but strongly recommended.

OPTIONAL_SECTIONS set[str]

Fully optional sections.

REQUIRED_FIELDS dict[str, set[str]]

Required fields for each configuration section.

FIELD_VALIDATORS dict[str, dict[str, Callable]]

Field-level validators keyed by section and field name.

finchge.fitness

GEFitnessFunction(maximize=False, case_data_key=None)

Bases: ABC

Base class for a fitness function used in genetic algorithms.

This abstract class defines the interface for evaluating the fitness of a given phenotype. Subclasses must implement the evaluate method.

Parameters:

Name Type Description Default
maximize bool

Indicates whether the goal is to maximize (True) or minimize (False) the fitness score.

False

required_context_keys property

Declare what context keys this fitness function needs. Override this in subclasses that need more than y_pred/y_true.

evaluate(context) abstractmethod

Evaluate the fitness score using the provided context.

Parameters:

Name Type Description Default
context EvaluationContext

EvaluationContext Typeddict containing evaluation inputs, such as 'y_pred', 'y_val', 'x_val', etc.

required

Returns:

Name Type Description
Fitness Fitness

The computed fitness.

AccuracyFitness()

Bases: GEFitnessFunction

Fitness function that evaluates model accuracy on a validation set.

This metric computes the classification accuracy between the predicted labels and the ground-truth labels from the validation data. It is intended to be used in supervised classification tasks, and is a maximization objective.

Methods:

Name Description
evaluate

Computes accuracy using 'y_pred' and 'y_true' from the context.

evaluate(context)

Evaluates the accuracy of the model's predictions.

Parameters:

Name Type Description Default
context EvaluationContext

EvaluationContext Typeddict containing evaluation inputs. Must include: - 'y_pred': Predicted labels from the model (numpy array or array-like). - 'y_test': True labels for the test set (numpy array or array-like).

required

Returns:

Name Type Description
float Fitness

Accuracy score between 0.0 and 1.0

Raises:

Type Description
ValueError

If y_test and y_pred have different shapes or are empty

MAEFitness()

Bases: GEFitnessFunction

Mean Absolute Error fitness for regression problems. Lower values indicate better fit (minimization objective).

evaluate(context)

Calculate Mean Absolute Error between true and predicted values.

Parameters:

Name Type Description Default
context dict[str, Any]

Must contain: - 'y_true': Ground truth values (array-like) - 'y_pred': Predicted values (array-like)

required

MSEFitness()

RMSEFitness()

CrossEntropyFitness(eps=1e-12)

Bases: GEFitnessFunction

CrossEntropy (log loss) fitness for classification.

Lower values are better (minimize). Requires predicted probabilities.

HingeLossFitness()

Bases: GEFitnessFunction

Mean Hinge Loss fitness for binary classification problems. This fitness evaluates how well a model separates two classes by measuring the margin between predictions and true labels. Hinge loss is calculated as hinge_i = max(0, 1 - y_true_i * y_pred_score_i) Should be used only for margin-based classifiers. The ground truth should be encoded between -1 and 1 , not 0/1. Similarly, the prediction scores should be raw model outputs (real-valued scores), representing the model’s confidence or margin, not class labels. The overall fitness value is the mean hinge loss across all samples.

evaluate(context)

Parameters:

Name Type Description Default
context dict[str, Any]

Context containing y_true labels encoded as -1 or +1, and y_pred_score with raw model outputs.

required

Returns:

Type Description
Fitness

Fitness value for the hinge loss.

RewardFitness(maximize=True, optimal_fitness=None)

Bases: GEFitnessFunction

Fitness function for control problems with reward maximization. Simply returns the reward (higher is better).

Initialize reward fitness.

Parameters:

Name Type Description Default
maximize bool

Whether to maximize (True) or minimize (False)

True
optimal_fitness Optional[float]

Known optimal fitness value

None

evaluate(context)

Evaluate fitness from context.

Expected context keys

y_pred: Array of rewards from runner

Parameters:

Name Type Description Default
context dict[str, Any]

context with y_pred

required

Returns:

Type Description
Fitness

Total reward (sum across episodes)

StringMatchFitness(target)

finchge.grammar.repair_strategy

RepairStrategy

Bases: ABC

Abstract base class for phenotype repair strategies.

repair(phenotype) abstractmethod

Repair the given phenotype and return repaired phenotype

finchge.core.GrammaticalEvolution(fitness_evaluator, grammar=None, config=None, initialiser=None, algorithm=None, expt_logger=None, checkpoint_manager=None, random_state=None)

Bases: RandomStateMixin

Grammatical evolution class for running the evolution. This is the evolution controller class responsible for running FinchGE utils. This class is responsible for wiring up different components. This includes loading experiment configurations, loading grammar, initializing the initial population, setting up the fitness evaluator, and using a genetic algorithm to run the evolution. GrammaticalEvolution class is also responsible for running the evolution loop.

Parameters:

Name Type Description Default
fitness_evaluator FitnessEvaluator

Evaluator to evaluate the fitness of individuals.

required
grammar Optional[Grammar]

BNF Grammar to be used. If not provided config must be available and must contain grammar_file value

None
config Optional[FinchConfig | dict[str, Any]]

Configuration settings for the GE algorithm.

None
initialiser GEInitialiser | GETreeInitialiser | None

Initializer class either integer based or tree based initializer

None
algorithm Optional[Any]

Evolutionary algorithm to be used e.g., GA, NSGA.

None
expt_logger BaseLogger | None

Logger used for experiment-level records.

None
checkpoint_manager CheckpointManager | None

Checkpoint manager used to save and resume runs.

None
random_state Optional[int]

Seed for reproducible randomness.

None

Note: If algorithm is not provided, GA will be used (provided that config is available)

halt(min_gens_allowed=0)

halt signal to stop generation loop : can specify minimum allowed generations, if halt signal received it will wait if min_generations are not satisfied yet, if halt signal is received after minimum generations allowed, will halt immediately

Parameters:

Name Type Description Default
min_gens_allowed int

Minimum number of generations before halt

0

run()

Finds the fittest individual ( or a pareto front for multi-objective) in the population and logs the results.

finchge.fitness.FitnessEvaluator(fitness_functions, mapper, runner=None, encode_trees=False, parallel_config=None, require_case_data=False)

Fitness evaluator is used to evaluate the fitness of the individuals. It is a wrapper for fitness function to allow convenient fitness evaluation especially for use cases where data and models are involved. For example in hyperparameter optimization or neural architecture search. The phenotype is converted to model using the 'phenotype_to_model' parameter.

Supports both model-based evaluation (e.g., using train/validation data) and phenotype-only evaluation (e.g., string matching or symbolic tasks).

Parameters:

Name Type Description Default
fitness_functions Union[GEFitnessFunction, list[GEFitnessFunction]]

One or more fitness function instances.

required
mapper GenotypeMapper

Genotype mapper.

required
runner PhenotypeRunner | None

Runner for evaluating phenotypes.

None
encode_trees bool

Whether to encode trees to integer genotypes. This is needed if genome-based operators are used after tree-based initialization.

False
parallel_config dict[str, Any] | None

Parallel section of the config.

None
require_case_data bool

Whether case-based evaluation is required, for example when using lexicase selection.

False

count_objectives()

Counts the number of objectives.

Returns:

Name Type Description
int int

Number of fitness functions (objectives).

derive_eval_seed(base_seed, phenotype, env_version)

Although we share same random state using private RNG shared in different components, It will not work for parallelized evaluation. Each process, or even each machine, will run evaluations in a different order, so random state will fail.

So, we must use some deterministic way to evaluate. One way is to use different seeds in deterministic way Same phenotype and base seed combination should result in same seed in each run.

So we derive a 32-bit seed for evaluation/training in a worker process. Same (base_seed, phenotype, env_version) will give same evaluation seed.

evaluate_individual(individual)

Evaluates a given phenotype by training the corresponding model and applying fitness functions.

Parameters:

Name Type Description Default
individual Individual

An individual configuration or representation used to generate a model.

required

Returns:

Name Type Description
list None

A list of fitness scores corresponding to each fitness function.

evaluate_population(population)

NEW: Evaluates entire population. Uses parallel execution if enabled, sequential otherwise.

Users should call this instead of manually calling evaluate().

get_env_version()

Return current environment version. To distinguish environments in adaptive environments.

get_fitness_functions()

Returns the list of fitness function instances.

Returns:

Name Type Description
list list[GEFitnessFunction]

List of fitness function objects.

get_maximize_flags()

Returns the maximize flags for each objective :return:

get_objective_names()

Retrieves human-readable names of the objectives.

Returns:

Name Type Description
list list[str]

List of objective names derived from fitness function class names.

is_multi_objective()

Determines if the evaluation is multi-objective.

Returns:

Name Type Description
bool bool

True if more than one fitness function is provided, False otherwise.

refesh_mapping(ind)

Ensure that genotype, phenotype, and tree are consistent.

Rules
  • If already mapped, do nothing.
  • If tree exists and genotype is absent:
    • reverse-map genotype when encode_trees=True
    • otherwise derive phenotype directly from tree
  • If genotype exists, map through the mapper

set_env_version(version)

Update environment version (e.g. generation, dataset update).

shutdown() async

Cleanup parallel resources

finchge.utils.cache

CacheManager(*, cache_type='lru', experiment_id=None, evaluator_namespace=None, **kwargs)

Bases: Generic[V]

Central cache manager.

Responsibilities: - key construction (hashing, namespacing) - backend selection (LRU / disk ) - fitness cache access

finchge.utils.logger

BaseLogger

Bases: ABC

Interface for experiment tracking. Observer Pattern

on_generation_end(generation, population, best=None, pareto_front=None, fitness_stats=None) abstractmethod

Called at the end of each generation.

on_run_end(result) abstractmethod

Called when a run ends.

on_run_start(log_dir, obj_names, config) abstractmethod

Called when a run starts.

ExperimentLogger(exclude=None, compress_genotypes=False, log_population_samples=False, sample_size=5, custom_log_hook=None)

Bases: BaseLogger

Experiment logger

Logs essential data for post-hoc analysis. This provides minimal essential logs only. To add more logs, this class can be extended or wrapped to add custom metrics and analysis.

on_generation_end(generation, population, best=None, pareto_front=None, fitness_stats=None, extra_data=None)

Log data at the end of each generation.

Parameters:

Name Type Description Default
extra_data Optional[dict[str, Any]]

Additional user-provided data to log

None

on_run_end(result)

Finalize logging at the end of the run.

on_run_start(log_dir, obj_names, config, algorithm=None)

Initialize logging when the experiment run starts.

Parameters:

Name Type Description Default
algorithm Optional[Any]

Optional reference to algorithm instance for custom logging hooks

None

setup_logging(logger_id='finch', group_name=None, verbose=False)

Setup logging for a specific project instance.

get_log_dir(logger_id='finch')

summary Retrieve the log directory for a given project instance.

Parameters:

Name Type Description Default
logger_id str

description. Defaults to "finch".

'finch'

Raises:

Type Description
KeyError

description

Returns:

Name Type Description
str str

description

finchge.utils.random_mixin

RandomStateMixin(*args, random_state=None, **kwargs)

Mixin to add random state control All classes using randomness either using python random or numpy random should extend from RandomMixin. And, should provide random_state parmeter in the constructor, Otherwise, the determinism is not guaranteed across runs.

__getstate__()

Called when pickling. Removes unpicklable RNGs and stores only the seed.

__setstate__(state)

Called when unpickling. Restores object state and reinitializes RNGs.

get_seed_info()

Get information about the seed for logging

finchge.utils.results

ResultHelper(project_id='finch')

StatsHelper

Utility class for computing population-level fitness statistics.

compute_fitness_stats(*, individuals, objective_names=None) staticmethod

Compute fitness statistics for a population.

Parameters:

Name Type Description Default
individuals list[Individual]

List of evaluated Individuals.

required
objective_names Optional[list[str]]

Optional list of objective names. If None, objectives will be indexed numerically.

None

Returns:

Type Description
list[Dict[str, Any]]

A list of dictionaries, one per objective, containing:

list[Dict[str, Any]]
  • fitness_name
list[Dict[str, Any]]
  • min
list[Dict[str, Any]]
  • max
list[Dict[str, Any]]
  • avg
list[Dict[str, Any]]
  • std
list[Dict[str, Any]]
  • total_count
list[Dict[str, Any]]
  • valid_count

finchge.runners

PhenotypeRunner(random_state=None)

Bases: RandomStateMixin, ABC

Base class for all phenotype runners.

Every runner must implement run() which returns (predictions, targets) as numpy arrays.

get_metadata()

Optional metadata about the executor/problem.

Returns:

Type Description
dict[str, Any]

Dictionary with metadata (can be empty)

run(phenotype, context_hints=None) abstractmethod

Run phenotype and return context dictionary.

Parameters:

Name Type Description Default
phenotype str

Program to evaluate

required
context_hints Optional[set[str]]

What context keys the fitness functions need. None means assume minimal (y_pred/y_true only).

None

Returns:

Type Description
dict[str, Any]

Dictionary with at least 'y_pred' and 'y_true', plus any hinted keys.

SymbolicRegressionRunner(data_train, data_val=None, data_test=None, random_state=None)

Bases: DirectEvalRunner

Runner for symbolic regression.

The phenotype is interpreted as a symbolic expression and evaluated directly on the active evaluation split.

ControlRunner(env_factory, n_episodes=1, optimal_fitness=None, random_state=None)

Bases: PhenotypeRunner, ABC, Generic[T]

Base class for control problem runners.

Type parameter T represents the environment type.

Initialize control runner.

Parameters:

Name Type Description Default
env_factory Callable[[], T]

Function that creates new environment instances

required
n_episodes int

Number of episodes to run per evaluation

1
optimal_fitness Optional[float]

Known optimal fitness value

None

get_metadata()

Return metadata about the control problem.

run(phenotype, context_hints=None)

Run multiple episodes and return rewards.

LogicRunner(X, y, interpreter=None, random_state=None)

Bases: PhenotypeRunner

Runner for logic problems like multiplexer.

Evaluates phenotype on all truth table combinations. Returns predictions and true values for fitness calculation.

run(phenotype, context_hints=None)

Evaluate phenotype on all input combinations.

Parameters:

Name Type Description Default
context_hints Optional[set[str]]

What context keys the fitness functions need. None means assume minimal (y_pred/y_true only).

None
phenotype str

Logic program string

required

Returns:

Type Description
dict[str, Any]

dict with context having y_pred, y_true and other context info as requested in context_hints argument

MLModelRunner(data_train, data_val, model_parser, data_test=None, random_state=None)

Bases: TrainEvalRunner

Runner for problems where phenotype builds and trains a model. This runner can be used in problems such as hyperparameter optimization and architecture search etc.

The phenotype is parsed into a trainable model or configuration, fitted on the training split, and evaluated on the active evaluation split.

finchge.utils.checkpoint

CheckpointManager

Bases: Protocol

Checkpoint manager interface.

exists()

Return True if at least one checkpoint exists.

load_latest(*, expected_config_hash=None)

Load and return the latest checkpoint.

save(*, generation, population, algorithm, config, py_rng_state, np_rng_state)

Persist a checkpoint and return the path written.

should_save(generation)

Return True if this generation should be checkpointed. Based on 'every' parameter's value, decide whether to save or not.

CheckpointState(version, config_hash, generation, population, algorithm, rng_state) dataclass

All state required to resume a GE run deterministically.

FileCheckpointManager(directory, *, every=10, keep_last=5, filename_prefix='checkpoint_gen_')

Bases: CheckpointManager

Filesystem checkpoint manager using pickle. Population and Algorithm must be pickle-serializable. RNG state is stored to guarantee deterministic resume. Files are written as: checkpoint_gen_[GEN].pkl

load_latest(*, expected_config_hash=None)

Load latest checkpoint and verify config hash.

should_save(generation)

Return True if this generation should be checkpointed. Based on 'every' parameter's value, decide whether to save or not.

RNGState(py_state, np_state) dataclass

Serializable RNG states for deterministic resume.

stable_config_hash(config)

Create a hash of config for checkpoint compatibility checks.

finchge.symbolic

SymbolicExpression(expression)

Parse and evaluate a restricted symbolic expression.

The class provides a small, controlled expression language intended for symbolic regression and genetic programming. Expressions are parsed using Python's AST and evaluated using protected numerical operators to avoid NaN/Inf where possible.

Supported features
  • Variables of the form x0, x1, x2, ...
  • Basic arithmetic operations (+, -, , /, *)
  • Selected mathematical functions (sin, cos, exp, log, etc.)
  • Constant expressions
  • Optional normalization of NumPy-style column syntax (e.g. x[:, 0] → x0)
  • Simple structural metrics (node count, depth)
Notes

This is not a full symbolic algebra system. Expressions are evaluated numerically and are not automatically simplified.

Attributes:

Name Type Description
original_expression str

The input expression string as provided by the user.

expression str

Normalized expression string after preprocessing (e.g. slice syntax rewritten).

variables list[str]

Sorted list of variable names used in the expression (e.g. ['x0', 'x2']).

_tree Expression

Parsed AST representation of the expression.

Raises:

Type Description
ExpressionSyntaxError

If the expression contains unsupported syntax.

ExpressionEvaluationError

If evaluation fails at runtime.

eval(X)

Evaluate the expression on X.

Parameters

X: Input data. Expected shape is (n_samples, n_features). A scalar is treated as shape (1, 1). A 1D array is treated as a single sample with multiple features.

Returns

np.ndarray Shape (n_samples,).

GERegressor(grammar, config, fitness_functions, generations=100, population_size=100, random_state=None)

Bases: RandomStateMixin, BaseEstimator, RegressorMixin

Scikit-learn compatible symbolic regression model based on Grammatical Evolution (GE).

The estimator evolves mathematical expressions defined by a grammar and selects the best-performing individual according to the provided fitness function(s). The resulting symbolic expression can then be evaluated to make predictions.

Attributes:

Name Type Description
ge_result_ GEResult

Result object returned by the GE run.

selected_individual_ Individual | None

Individual currently selected for prediction.

selected_expression_ SymbolicExpression | None

Parsed symbolic expression corresponding to the selected individual.

Initialize the symbolic regression estimator.

Parameters:

Name Type Description Default
grammar Grammar

Grammar defining the search space of valid symbolic expressions.

required
config FinchConfig | dict[str, Any]

Configuration controlling GE behavior (operators, mutation rates, limits, etc.). A dictionary will be converted into a FinchConfig.

required
fitness_functions GEFitnessFunction | list[GEFitnessFunction]

Fitness function(s) used to evaluate candidate expressions.

required
generations int

Number of evolutionary generations. Defaults to 100.

100
population_size int

Number of individuals per generation. Defaults to 100.

100
random_state int | None

Random seed for reproducibility.

None

fit(X, y)

Run grammatical evolution to discover a symbolic expression that fits the provided data.

This launches the evolutionary search, evaluates individuals using the configured fitness function(s), and stores the resulting GE run along with the best individual (if available).

Parameters:

Name Type Description Default
X Any

Training features.

required
y Any

Target values.

required

Returns:

Name Type Description
GERegressor GERegressor

The fitted estimator.

predict(X)

Evaluate the selected symbolic expression on the input data.

Parameters:

Name Type Description Default
X Any

Input features.

required

Returns:

Type Description
NDArray[float64]

NDArray[np.float64]: Predicted target values.

Raises:

Type Description
RuntimeError

If the estimator has not been fitted or no individual has been selected.

select_individual(individual)

Manually select an individual from the GE population for prediction.

Useful for multi-objective runs where no single best individual is automatically chosen.

Parameters:

Name Type Description Default
individual Individual

Individual whose phenotype will be used as the prediction expression.

required

finchge.grammar.range_handlers

ArraySliceHandler()

Bases: RangeHandler

Handles x[:, 0], x[:, 1], etc. This will be useful if we don't have many columns we can just have a grammar like: ::= x[:, 0] | x[:, 1] However, when the number of columns is higher we may need ArraySliceRangeHandler.

ArraySliceRangeHandler()

Bases: RangeHandler

Handles Array slicer with longer range. For example: x[:, 0..4] will expand to x[:, 0], x[:, 1], ..., x[:, 4]

If it's not a range and we just need to pick columns randomly we can use individual array slices.

CharClassHandler()

Bases: RangeHandler

Handles character classes: [a-z], [A-Za-z0-9_]

CharRangeHandler()

Bases: RangeHandler

Handles character ranges: 'a'..'z' or 'a'..'z' step 2

NumericRangeHandler()

Bases: RangeHandler

Handles numeric ranges: 10..50, -5..5, 10..50 step 5

RangeHandler(priority=0)

Bases: ABC

Base class for all range handlers. This is the interface for different types of range handlers. There are several range handlers currently supported by finchGE BNFGrammarParser. To add custom range support new range handler can be created and should register by calling.

parser.range_registry.register(MyCustomRangeHandler())

All range handlers must inherit from RangeHandler.

pattern abstractmethod property

Return regex pattern that matches this range type.

can_handle(token) abstractmethod

Check if this handler can process the token.

expand(token) abstractmethod

Expand the range into individual symbols.

RangeHandlerRegistry()

Registry for all range handlers with priority support.

expand_token(token)

Expand a token using the appropriate handler (checking in priority order).

get_combined_pattern()

Get a combined regex pattern for all handlers in priority order.

get_handler(token)

Find a handler that can process the token (checking in priority order).

register(handler)

Register a new handler, maintaining priority order.

SymbolicVariableRangeHandler()

Bases: RangeHandler

Handles variable ranges for evaluation .

Replaces ArraySliceRangeHandler. Example: x[0..4] expands to x0, x1, x2, x3, x4 OR x0..4 expands to x0, x1, x2, x3, x4

SymolicVariableHandler()

Bases: RangeHandler

Handles individual variables like x0, x1, etc in symbolic regression, supporting easy evaluation with SymPy like libraries.

finchge.benchmarks

Benchmark(random_state=None)

Bases: RandomStateMixin, ABC

Abstract base class for all benchmarks.

uses_data()

Check if this benchmark uses traditional train/test data.

Returns:

Type Description
bool

True if the benchmark implements _generate_data, False for control problems

finchge.benchmarks.regression

KozaQuarticBenchmark(random_state=None, train_samples=None, test_samples=None, x_range=None, train_type='uniform', test_type='grid')

Bases: Benchmark

KeijzerBenchmark(version, random_state=None, train_samples=None, test_samples=None)

Bases: Benchmark

NguyenBenchmark(version, random_state=None, train_samples=None, test_samples=None, x_range=None, noise_std=None, train_type='grid', test_type='uniform')

Bases: Benchmark

VladislavlevaBenchmark(version, random_state=None, train_samples=1024, test_samples=5000, x_range=None, noise_std=None)

Bases: Benchmark

finchge.benchmarks.logic

MultiplexerBenchmark(version, random_state=None)

Bases: Benchmark

finchge.benchmarks.control

MazeBenchmark(version='medium', max_steps=None)

Bases: Benchmark

grammar()

Return the Grammar object.

MazeEnvironment(grid, max_steps=200, step_penalty=-1.0, goal_reward=100.0)

Bases: ControlEnvironment

from_version(version='medium') classmethod

Creating env for given version

get_available_actions()

Returns only the actions that won't result in hitting a wall or going OOB.

MazeInterpreter()

Interpreter for maze navigation programs.

SantaFeTrailBenchmark(random_state=None, max_steps=None)

Bases: Benchmark

Santa Fe Trail (Artificial Ant) benchmark.

The ant must navigate a 32x32 grid with 89 food pellets along a winding trail. Fitness is the number of food pellets eaten within the time limit.

grammar_str()

Return grammar for Santa Fe Trail problem.

SantaFeEnvironment(grid, max_steps, total_food, start_pos, start_dir)

Bases: ControlEnvironment

from_default() classmethod

Static factory to create environment

get_action_space()

Return complete action space.

get_available_actions()

Return available actions (all actions always available).

SantaFeInterpreter()

Interpreter for Santa Fe ant programs using cleaner grammar.

evaluate(tokens, observation, pc=0)

Evaluate tokenized program.

Parameters:

Name Type Description Default
tokens List[str]

List of tokens from tokenize()

required
observation bool

Current food-ahead sensor value

required
pc int

Program counter (index into tokens)

0

Returns:

Type Description
Tuple[Optional[str], int]

Tuple of (action, new_pc)

program_to_string(tokens)

Convert tokens back to readable program string. Tokens separated by space

tokenize(program)

Convert program string to list of tokens.

CartPoleBenchmark(random_state=None, max_steps=None, n_episodes=1)

Bases: Benchmark

grammar()

Return the Grammar object.

CartPoleEnvironment(max_steps, gravity, mass_cart, mass_pole, length, force_mag, tau, theta_threshold_degrees, x_threshold, random_state=None)

Bases: ControlEnvironment

CartPoleInterpreter()

Interpreter for cartpole balancing programs.