API Reference
The finchGE API consists of:
finchge.core.Population(initialiser, population_size)
Container for Individuals in an evolutionary run.
Population is representation-agnostic and does not perform genotype, tree, or phenotype initialisation logic.
Create a Population.
Exactly one of individuals or initialiser must be provided.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
initialiser
|
Optional[GEInitialiser]
|
Initializer used to generate individuals. |
required |
population_size
|
Optional[int]
|
Number of individuals to generate (required if initialiser is used). |
required |
from_individuals(individuals, population_size)
classmethod
Construct a Population directly from a list of Individuals.
This is used when offspring are produced by crossover/mutation strategies that already return fully-formed Individuals.
finchge.core.Individual(*, genotype=None, phenotype=None, used_genome=None, used_codon_count=0, invalid=False, tree=None)
Represents a single individual in an evolutionary algorithm.
An Individual encapsulates the genetic representation (genotype), its expressed form (phenotype), and metadata produced during evaluation and selection. Fitness values are not assigned at construction time and must be computed by a FitnessEvaluator.
Algorithm-specific information (e.g., Pareto rank, crowding distance, or
reference-point association) is stored in the meta dictionary and is expected
to be populated and cleared by the evolutionary algorithm.
Attributes:
| Name | Type | Description |
|---|---|---|
genotype |
Optional[list[int]]
|
Genetic representation of the individual. A list of integers or None if not yet initialized or derived from a tree-based initialiser. |
phenotype |
str
|
Phenotypic representation derived from the genotype via grammar-based mapping. Empty until mapping occurs. |
used_genome |
Optional[list[int]]
|
Portion of the genotype consumed during genotype-to-phenotype mapping. Set during evaluation. |
used_codon_count |
Optional[int]
|
Number of codons consumed during mapping. |
invalid |
bool
|
Indicates whether the individual is invalid (e.g., mapping failure). Invalid individuals should not participate in fitness-based selection. |
tree |
Optional[str]
|
Serialized derivation tree (e.g., JSON) produced during mapping or tree-based initialisation. |
fitness |
list[float]
|
Fitness values assigned during evaluation. Empty until evaluated. Supports both single-objective and multi-objective optimization. |
meta |
Dict[str, Any]
|
Algorithm-specific metadata managed by the evolutionary algorithm, such as Pareto rank, crowding distance, or dominance information. |
Initialize an Individual.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
genotype
|
Optional[list[int]]
|
Genetic representation as a list of integers, or None if the individual is initialized from a tree or will be derived later. |
None
|
phenotype
|
str
|
Phenotypic representation of the individual. Defaults to an empty string and is populated during mapping. |
None
|
used_genome
|
Optional[list[int]]
|
Portion of the genotype actually consumed during mapping. Populated during evaluation. |
None
|
used_codon_count
|
Optional[int]
|
Number of codons consumed during genotype-to-phenotype mapping. |
0
|
invalid
|
bool
|
Whether the individual is invalid (e.g., mapping failure). |
False
|
tree
|
Optional[str]
|
Optional serialized derivation tree (e.g., JSON) associated with this individual. |
None
|
clone()
Create a deep copy of this individual.
Returns:
| Type | Description |
|---|---|
Individual
|
A new Individual with copied genotype, phenotype, fitness, and metadata. |
from_genotype(genotype)
classmethod
Construct an Individual from a genotype.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
genotype
|
list[int]
|
Genetic representation as a list of integers. |
required |
Returns: A new Individual initialized with the given genotype.
from_tree(tree)
classmethod
Construct an Individual from a derivation tree.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tree
|
str
|
Root of the derivation tree used to initialize the individual. |
required |
Returns:
| Type | Description |
|---|---|
Individual
|
A new Individual initialized with the given tree representation. |
get_meta(key, expected_type=None)
Retrieve algorithm-specific metadata stored on the individual.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
key
|
str
|
Metadata key to retrieve. |
required |
expected_type
|
Type[T] | None
|
Expected type of the metadata value if being strict. |
None
|
Raises:
| Type | Description |
|---|---|
ValueError
|
If the key is missing or the value does not match the expected type. |
Returns:
| Type | Description |
|---|---|
Any
|
The metadata value associated with the given key. |
mutated(mutation_strategy)
Apply a mutation strategy to this individual.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mutation_strategy
|
GEMutationStrategy
|
Mutation operator used to produce a mutated individual. |
required |
Returns:
| Type | Description |
|---|---|
Individual
|
A new Individual representing the mutated offspring. |
sort_key(maximize)
Return a scalar key suitable for sorting individuals by fitness. Invalid or unevaluable individuals are always pushed to the end for minimization use +inf and for maximization: use -inf
finchge.grammar.Grammar(grammar_str, parser=None)
Represents a grammar and provides methods for mapping genotypes to phenotypes using grammatical evolution.
This class utilizes GrammarParser to parse the grammar (typically, Backus-Naur Form (BNF))
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
grammar_str
|
str
|
The grammar definition as a BNF-formatted string. |
required |
parser
|
Optional[GrammarParser]
|
GrammarParser will be used if not provided |
None
|
analyze()
Analyze grammar structure for initialisation and diagnostics. Computes recursion, min/max derivation depth, arity, and termination feasibility. Results are stored on Rule objects.
References
[1] Conor Ryan, J. J. Collins, and Michael O'Neill. 1998. Grammatical Evolution: Evolving Programs for an Arbitrary Language. In Proceedings of the First European Workshop on Genetic Programming (EuroGP '98). Springer-Verlag, Berlin, Heidelberg, 83-96.
[2] O’Neill, M. and Ryan, C. (2001,) Grammatical Evolution. IEEE Transactions on Evolutionary Computation, 5, 349-358. https://doi.org/10.1109/4235.942529
compute_min_ramp(*, population_size, max_init_depth)
Compute minimum ramp depth.
Ramping should start at the first depth where the grammar can generate enough distinct derivations to support population diversity across the requested ramp range.
compute_permutations_by_depth(max_depth)
Estimate the number of distinct derivation trees available at each exact depth.
The internal count is cumulative up to a depth budget. The stored permutation count subtracts the previous depth budget so each entry represents trees whose maximum depth is exactly that depth.
describe(expanded=True)
Returns summary information about the grammar, including rule counts and structure. Set expanded=False to display original contracted versions like [a-z] for range if the grammar uses that syntax Default setting is to display expanded version.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
expanded
|
bool
|
flag whether to show expanded version True by default. |
True
|
Returns:
| Name | Type | Description |
|---|---|---|
str |
str
|
A formatted string containing grammar statistics and structure. |
from_file(filename, parser=None)
classmethod
Create a Grammar instance from a file containing BNF rules. This method reads the BNF grammar from a text file and initializes the grammar with the same parameters as the direct constructor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
filename
|
str
|
Path to the file containing BNF grammar rules |
required |
parser
|
Optional[GrammarParser]
|
GrammarParser will be used if not provided |
None
|
finchge.grammar.parser
GrammarParser
Bases: ABC
Base class for all grammar parsers.
Any custom parser must implement the parse() method which returns a standardized tuple containing grammar components.
parse()
abstractmethod
Parse input data and return grammar components.
Returns:
| Name | Type | Description |
|---|---|---|
tuple |
tuple[dict[str, Rule], dict[str, Rule], str, list[str], list[str]]
|
containing the rules, terminals and non-terminals |
Response Contains
rules_original(dict[str, Rule]): Mapping of non-terminal symbols to Rule objects (original written form without range expansion).rules(dict[str, Rule]): Mapping of non-terminal symbols to Rule objects in Expanded (range notation) form.start_rule(str): The first non-terminal rule, considered the start rule.terminals(set[str]): Set of terminal symbols.non_terminals(set[str]): Set of non-terminal symbols.
BNFGrammarParser(grammar_str)
Bases: GrammarParser
Parses a grammar string written in Backus-Naur Form (BNF) into structured rules.
The parser identifies terminal and non-terminal symbols, splits productions into alternatives, and validates the structure of the grammar.
Attributes:
| Name | Type | Description |
|---|---|---|
grammar_str |
str
|
The raw BNF grammar string. |
Args: grammar_str (str): The grammar definition in BNF format.
parse()
Parses the BNF grammar string into rules, terminals, and non-terminals.
Returns:
| Name | Type | Description |
|---|---|---|
Tuple |
tuple[dict[str, Rule], dict[str, Rule], str, list[str], list[str]]
|
Tuple containing rules (original), rules_expanded, start_rule, terminals and non-terminals. |
Response contains
rules_original(dict[str, Rule]): Mapping of non-terminal symbols to Rule objects. (Original may be contracted form)rules(dict[str, Rule]): Mapping of non-terminal symbols to Rule objects in Expanded (range notation) form.start_rule(str): The first non-terminal rule, considered the start rule.terminals(set[str]): List of terminal symbols.non_terminals(set[str]): List of non-terminal symbols.
Raises:
| Type | Description |
|---|---|
ValueError
|
If the grammar syntax is invalid or contains undefined symbols. |
remove_comments(lines)
Remove comments and empty lines.
consolidate_multiline_rules(lines)
If same rule sreads across multiple lines, then just make them a single line Simple rule is unti we see a token before ::= , its the same rule
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
lines
|
list[str]
|
|
required |
Returns:
split_choices(rhs_symbols)
Split RHS into production choices For now it simply finds choices based on presence of choice symbol (Can be improved later, if needed.)
finchge.grammar.rule
Rule(symbol, choices)
Represents a single production rule in a BNF grammar.
Each rule contains a non-terminal symbol (e.g., "
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
symbol
|
str
|
The non-terminal symbol on the left-hand side (LHS) of the rule. |
required |
choices
|
list[list[str]]
|
A list of possible right-hand side (RHS) expansions. Each choice is a list of symbols (either terminals or non-terminals). |
required |
finchge.grammar.GenotypeMapper(*, grammar, max_tree_depth=None, max_recursion_depth=None, max_wraps=0, repair_strategy=None, random_state=None)
Bases: RandomStateMixin
Bidirectional genotype-phenotype codec for Grammatical Evolution.
GenotypeMapper implements the core mapping logic of Grammatical Evolution (GE),
providing:
- Decoding (map): genotype >>> derivation tree >>> phenotype
- Encoding (reverse_map): derivation tree >>> genotype (that reproduces the same tree)
Follows classic GE semantics
- Stack-based depth-first expansion
- Modulo-based production selection
- Rightmost-first expansion (LIFO stack)
- Codon wrapping with a configurable limit
- Explicit recursion depth control
This class is grammar-aware but grammar-agnostic: all syntactic structure
comes from the provided Grammar instance.
Attributes:
| Name | Type | Description |
|---|---|---|
grammar |
Grammar
|
Grammar defining the genotype-to-phenotype mapping rules. |
rules |
dict[str, Rule]
|
Grammar production rules indexed by non-terminal symbols. |
non_terminals |
list[str]
|
Set of grammar non-terminal symbols. |
start_rule |
str
|
Start symbol used for derivation. |
max_recursion_depth |
int
|
Maximum allowed recursion depth during decoding. |
max_wraps |
int
|
Maximum number of genotype wraps allowed during decoding. |
repair_strategy |
Optional[RepairStrategy]
|
Optional phenotype repair strategy applied after decoding. |
Initialize a GenotypeMapper with grammar and mapping constraints.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
grammar
|
Grammar
|
Grammar defining production rules and non-terminals used for mapping. |
required |
max_tree_depth
|
Optional[int]
|
Max depth limit for resulting trees. |
None
|
max_recursion_depth
|
int
|
Maximum depth allowed during derivation tree expansion. Mapping is marked invalid if exceeded. Defaults to 20. |
None
|
max_wraps
|
int
|
Maximum number of times the genotype may be wrapped when codons are exhausted. Mapping is marked invalid if exceeded. Defaults to 6. |
0
|
repair_strategy
|
Optional[RepairStrategy]
|
Optional strategy to repair or post-process the phenotype after successful decoding. Defaults to None. |
None
|
random_state
|
int
|
random state |
None
|
map(genotype)
Decode a genotype into a derivation tree and phenotype using standard Grammatical Evolution (GE) mapping. This implementation performs stack-based depth-first expansion using leftmost derivation semantics.
Follows leftmost derivation sematnics
- Children are attached to the tree in left-to-right order
- Children are pushed onto the stack in reverse order ensuring leftmost child expands first
Codons are consumed sequentially and mapped to grammar productions using modulo selection.
Wrapping is supported when the genotype is exhausted, up to
max_wraps times.
Mapping is aborted and marked invalid if
- wrapping limit is exceeded
- recursion depth exceeds
max_recursion_depth - the final tree contains unresolved non-terminals
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
genotype
|
list[int]
|
List of integer codons representing the genotype. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
MappingResult |
MappingResult
|
Object containing: Text Only
|
reverse_map(tree, *, codon_size=127, pad_to_length=None, pad_mode='random')
Encode a derivation tree into a genotype that reproduces the same tree.
This method traverses the tree in the same order that map() consumes
codons: leftmost depth-first expansion of non-terminals.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tree
|
Union[TreeNode, str]
|
TreeNode root or JSON serialized tree. |
required |
codon_size
|
int
|
Maximum codon value (inclusive). |
127
|
pad_to_length
|
Optional[int]
|
Optional genome length padding. |
None
|
pad_mode
|
str
|
"zeros" or "random". |
'random'
|
Returns:
| Type | Description |
|---|---|
list[int]
|
list[int]: Genotype reproducing this tree. |
finchge.grammar.mapper.MappingResult(phenotype, used_genome, used_codon_count, invalid, tree, tree_str)
dataclass
Result of genotype to phenotype mapping.
finchge.grammar.derivation_tree.TreeNode(symbol)
Represents a derivation tree (root node) and its children. The Derivation Tree is used during genotype-to-phenotype mapping.
As TreeNode is used to represent both node and full derivation tree, the properties such as depth,
max_depth should be used with caution. The propertydepth is the depth of current node,
while max_depth should be used to get the depth of the tree.
Tree depth is derived dynamically from parent links.
The property depth gives the depth of the current node,
while max_depth gives the maximum depth of the full tree.
Each node holds a symbol (terminal or non-terminal) and may have children. The tree supports JSON and string (CSV-like) serialization. Args: symbol (str): The symbol associated with this node.
add_child(child)
Adds a child to the current node and updates depth, parent, and root.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
child
|
TreeNode
|
Node to add as a child. |
required |
Raises:
| Type | Description |
|---|---|
TypeError
|
If |
clone()
Deep copy of this node and its entire subtree. Parent pointers are reconstructed correctly.
from_dict(data)
classmethod
Constructs a TreeNode from a dictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
dict
|
Dictionary with keys 'symbol' and 'children'. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
TreeNode |
TreeNode
|
Reconstructed tree. |
Raises:
| Type | Description |
|---|---|
KeyError
|
If required fields are missing. |
from_json(json_str)
classmethod
Constructs a TreeNode from a JSON string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
json_str
|
str
|
JSON representation of the tree. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
TreeNode |
TreeNode
|
Reconstructed tree. |
Raises:
| Type | Description |
|---|---|
JSONDecodeError
|
If JSON string is invalid. |
from_string(s)
staticmethod
Deserialize a tree from its canonical string representation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
s
|
str
|
Serialized tree string. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
TreeNode |
TreeNode
|
Root node of reconstructed tree. |
replace_subtree_with(new_subtree)
size()
convert the tree into a phenotype
swap_subtree_with(other)
Swap this subtree with another subtree.
Returns the new roots of both trees.
to_dict()
Converts the node and its subtree into a dictionary.
Returns:
| Name | Type | Description |
|---|---|---|
dict |
dict[str, Any]
|
Dictionary representation of the tree. |
to_json(indent=2)
Serializes the tree into a JSON-formatted string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
indent
|
int
|
Indentation for the JSON string. |
2
|
Returns:
| Name | Type | Description |
|---|---|---|
str |
str
|
JSON string representation of the tree. |
to_phenotype()
convert the tree into a phenotype
to_string()
finchge.algorithm
GeneticAlgorithm(selection, crossover, mutation, replacement, elite_size, fitness_evaluator, random_state=None)
Bases: BaseAlgorithm
Genetic Algorithm
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
selection
|
GESelectionStrategy
|
Selection strategy instance or function. |
required |
crossover
|
GECrossoverStrategy
|
Crossover strategy instance or function. |
required |
mutation
|
GEMutationStrategy
|
Mutation strategy instance or function. |
required |
replacement
|
GEReplacementStrategy
|
Replacement strategy instance or function. |
required |
elite_size
|
int
|
Number of elite individuals to carry over. |
required |
random_state
|
Optional[int]
|
random state |
None
|
evolve_one_generation(population)
Perform one generation of evolution on the given population.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
population
|
Population
|
The population to evolve. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
Population |
Population
|
The evolved population. |
sort_population(population)
Sorts the population based on fitness values. Supports the fitness_evaluator with single fitness function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
population
|
Population
|
The population to sort. |
required |
NSGA2(selection, crossover, mutation, replacement, elite_size, fitness_evaluator, random_state=None)
Bases: BaseAlgorithm
NSGA-II algorithm.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
selection
|
GESelectionStrategy
|
Selection strategy. |
required |
crossover
|
GECrossoverStrategy
|
Crossover strategy. |
required |
mutation
|
GEMutationStrategy
|
Mutation strategy. |
required |
replacement
|
GEReplacementStrategy
|
Replacement strategy. |
required |
elite_size
|
int
|
Number of elite individuals to carry over. |
required |
fitness_evaluator
|
FitnessEvaluator
|
Evaluator used to score individuals. |
required |
random_state
|
Optional[int]
|
Seed for reproducible randomness. |
None
|
evolve_one_generation(population)
Perform one generation of evolution on the given population.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
population
|
Population
|
The population to evolve. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
Population |
Population
|
The evolved population. |
get_pareto_front(population)
Return the Pareto front (rank 0 individuals).
sort_population(population)
Sort the population using NSGA-II rank and crowding distance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
population
|
Population
|
The population to sort. |
required |
NSGA3(selection, crossover, mutation, fitness_evaluator, num_divisions=12, random_state=None)
Bases: BaseAlgorithm
NSGA-III algorithm.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
selection
|
GESelectionStrategy
|
Selection strategy. |
required |
crossover
|
GECrossoverStrategy
|
Crossover strategy. |
required |
mutation
|
GEMutationStrategy
|
Mutation strategy. |
required |
fitness_evaluator
|
FitnessEvaluator
|
Evaluator used to score individuals. |
required |
num_divisions
|
int
|
Number of divisions for reference points. |
12
|
random_state
|
Optional[int]
|
Seed for reproducible randomness. |
None
|
evolve_one_generation(population)
Perform one generation of evolution on the given population.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
population
|
Population
|
The population to evolve. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
Population |
Population
|
The evolved population. |
get_pareto_front(population)
Return the Pareto front.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
population
|
Population
|
Population to extract the Pareto front from. |
required |
Returns:
| Type | Description |
|---|---|
list[Individual]
|
Individuals in the first non-dominated front. |
finchge.initialisation
RandomGenomeInitialiser(genome_length, codon_size, random_state=None)
Bases: GEInitialiser
Random genome initialiser for Grammatical Evolution.
Generates Individuals with a fixed-length integer genome where each codon is sampled uniformly from [0, codon_size].
This initialiser requires genome_length and codon_size to be provided explicitly via configuration or constructor.
from_config(config)
classmethod
Create a RandomGenomeInitialiser using FinchConfig object
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config
|
FinchConfig
|
FinchConfig instance |
required |
Returns:
| Type | Description |
|---|---|
RandomGenomeInitialiser
|
RandomGenomeInitialiser |
Raises:
| Type | Description |
|---|---|
ConfigError
|
If required configuration keys are missing or invalid. |
RVDInitialiser(genome_length, codon_size, population_size, random_state=None, mapper=None)
Bases: GEInitialiser
Random Valid Distinct (RVD) initialiser.
Generates individuals by sampling random genomes while rejecting invalid mappings and duplicate phenotypes. - RVD initialisation [Nicolau, 2017].
Note
RVD initializer instance is supposed to be used once per run or the stored RVD state may cause issues. To reset the stored state call reset() , if same RVDInitializer instance has to be reused.
from_config(config)
classmethod
Create a RVDInitializer from the [ge] config section.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config
|
FinchConfig
|
FinchConfig instance |
required |
Returns:
| Type | Description |
|---|---|
RVDInitialiser
|
RandomGenomeInitialiser |
Raises:
| Type | Description |
|---|---|
ConfigError
|
If required configuration keys are missing or invalid. |
reset()
If same RVD instance is used for multiple runs. must reset
FullTreeInitialiser(init_min_depth, init_max_depth, strict_full=True, random_state=None)
Bases: GETreeInitialiser
Full tree initialiser as defined by Koza (1992).
Produces trees where all internal nodes expand until max depth. Terminals are only allowed at the maximum depth.
GrowTreeInitialiser(init_min_depth, init_max_depth, random_state=None)
Bases: GETreeInitialiser
Grow tree initialiser as defined by Koza (1992).
Allows productions to terminate early below max depth, resulting in irregular tree shapes.
PIGrowInitialiser(init_max_depth, population_size, random_state=None)
Bases: GETreeInitialiser
Position-Independent Grow (PI-Grow) tree initialiser.
Generates derivation trees using grow-style initialisation with position-independent expansion.
from_config(config)
classmethod
Create a PI-Grow initialiser from GE configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config
|
FinchConfig
|
FinchConfig instance |
required |
Returns:
| Type | Description |
|---|---|
PIGrowInitialiser
|
PIGrowInitializer |
Raises:
| Type | Description |
|---|---|
ConfigError
|
If required configuration keys are missing or invalid. |
RHHInitialiser(init_max_depth, population_size, strict_full=True, random_state=None)
Bases: GETreeInitialiser
Ramped Half-and-Half (RHH) Initialisation for Grammatical Evolution.
This implementation of RHH Initialisation is the Grammatical Evolution is based on "Sensible Initialization" approach as an analogue of Koza’s Ramped Half-and-Half (RHH) method originally developed for tree-based Genetic Programming. It generates an initial population of derivation trees using a combination of Full and Grow construction strategies across a range of depth limits, promoting structural diversity while respecting grammar constraints.
In this approach, grammar production rules assume the role of GP functions in controlling tree growth. Recursive production rules are preferentially selected during Full initialisation to ensure tree expansion until the specified depth limit is reached. During Grow initialisation, both recursive and terminating productions may be selected, allowing trees of irregular shape to be generated.
Production feasibility is guided by grammar analysis, including recursion detection and minimum derivation depth calculation, ensuring that generated trees can terminate within the allowed depth budget.
References
Koza, J. R. (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press.
Ryan, C., Collins, J. J., & O’Neill, M. (1998). Grammatical Evolution: Evolving Programs for an Arbitrary Language. In Proceedings of the First European Workshop on Genetic Programming.
Ryan, C., & Azad, R. M. A. (2003). Sensible Initialisation in Grammatical Evolution. In Grammatical Evolution: Evolutionary Automatic Programming in an Arbitrary Language. Springer.
Fenton, M., McDermott, J., Fagan, D., Forstenlechner, S., Hemberg, E., & O’Neill, M. (2017). PonyGE2: Grammatical Evolution in Python. arXiv:1703.08535.
from_config(config)
classmethod
Create a RampedHalfAndHalfInitializer from configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config
|
FinchConfig
|
FinchConfig instance |
required |
Returns:
| Type | Description |
|---|---|
RHHInitialiser
|
RampedHalfAndHalfInitializer |
Raises:
| Type | Description |
|---|---|
ConfigError
|
If required configuration keys are missing or invalid. |
PTC2Initialiser(target_size, random_state=None, max_depth=None)
Bases: GETreeInitialiser
RampedPTC2Initialiser(init_min_size, init_max_size, population_size, max_depth=None, random_state=None)
Bases: GETreeInitialiser
Ramped size-based population initialiser using PTC2 family algorithms.
This initialiser generates individuals using Probabilistic Tree Creation 2 across a ramped distribution of target tree sizes.
Behaviour
• If max_depth is None: Uses PTC2 (size-controlled only).
• If max_depth is provided: Uses PTC2D (size + depth constrained).
finchge.operators.selection
GESelectionStrategy(max_best, random_state=None)
Bases: RandomStateMixin, ABC
TournamentSelection(max_best, tournament_size=3, random_state=None)
Bases: GESelectionStrategy
Selection strategy that chooses individuals using tournament selection.
In tournament selection, a fixed number of individuals (tournament_size) are randomly chosen from the population, and the best among them (based on fitness) is selected. This process repeats until the desired number of individuals is selected. Selection pressure increases with larger tournament sizes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
max_best
|
bool
|
Whether higher fitness values are better. True for maximization, False for minimization. |
required |
tournament_size
|
int
|
The number of individuals competing in each tournament. Must be >= 2. Defaults to 3. |
3
|
random_state
|
Optional[int]
|
Random seed. |
None
|
select(population_size, individuals)
Selects a subset of individuals using tournament selection.
Repeatedly selects a group of individuals (a "tournament") at random from the
population, and chooses the best among them based on fitness. This process
continues until the desired number of individuals is selected. Whether the
best individual is determined by maximizing or minimizing fitness depends on
the max_best setting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
population_size
|
int
|
The number of individuals to select for the next generation. |
required |
individuals
|
list[Individual]
|
A list of individuals to select from. Each must have a |
required |
Returns:
| Type | Description |
|---|---|
list[Individual]
|
A list of selected individuals based on tournament outcomes. |
RouletteWheelSelection(max_best, random_state=None)
Bases: GESelectionStrategy
Fitness-proportionate selection (roulette wheel selection).
Roulette wheel selection is one of the earliest and most widely used selection mechanisms in evolutionary computation. The method assigns selection probabilities to individuals proportional to their fitness, enabling stochastic sampling while maintaining selection pressure toward fitter individuals.
Each individual occupies a proportion of a conceptual roulette wheel, where the area assigned to each individual is proportional to its fitness. Selection is performed by sampling from this distribution, ensuring that individuals with higher fitness are more likely—but not guaranteed—to be selected.
Fitness-proportionate selection originates from early genetic algorithm research and is commonly attributed to the foundational work on genetic algorithms by John Holland.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
max_best
|
bool
|
Whether higher fitness values are better. True for maximization, False for minimization. |
required |
random_state
|
Optional[int]
|
Random seed. |
None
|
select(population_size, individuals)
Selects a subset of individuals using roulette wheel selection.
This method performs fitness-proportionate selection, where individuals are assigned selection probabilities based on their fitness values. The fitness values are shifted to ensure they are non-negative, and if necessary, inverted for minimization problems. If the total weight of the shifted fitness values is zero or negative, the method falls back to uniform random selection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
population_size
|
int
|
The number of individuals to select for the next generation. |
required |
individuals
|
list[Individual]
|
A list of individuals to select from. Each individual must have a
|
required |
Returns:
| Type | Description |
|---|---|
list[Individual]
|
A list of selected individuals based on roulette wheel selection probabilities. |
RankSelection(max_best, selection_pressure=1.5, random_state=None)
Bases: GESelectionStrategy
Selection strategy that selects individuals based on their rank in the population.
In rank selection, individuals are first sorted by fitness, and then assigned
selection probabilities based on their rank rather than their raw fitness value.
The selection pressure can be adjusted using the selection_pressure parameter,
which influences how strongly the best individuals are favored. A higher selection
pressure makes the selection more biased toward higher-ranked individuals.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
max_best
|
bool
|
Whether higher fitness values are better. True for maximization, False for minimization. |
required |
selection_pressure
|
float
|
The selection pressure that controls the bias toward higher-ranked individuals. Must be between 1.0 and 2.0. Defaults to 1.5. |
1.5
|
random_state
|
Optional[int]
|
Random seed. |
None
|
Raises:
| Type | Description |
|---|---|
ValueError
|
If selection_pressure is not between 1.0 and 2.0. |
select(population_size, individuals)
Selects a subset of individuals using linear rank-based selection.
This method applies rank selection by sorting individuals based on fitness and assigning selection probabilities based on their rank rather than their raw fitness values. The selection pressure parameter controls how much more likely the best-ranked individuals are to be selected compared to lower-ranked ones.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
population_size
|
int
|
The number of individuals to select. |
required |
individuals
|
list[Individual]
|
A list of individuals to select from. Each individual must have a
|
required |
Returns:
| Type | Description |
|---|---|
list[Individual]
|
A list of selected individuals from the population based on rank-weighted probabilities. |
TruncationSelection(max_best, truncation_threshold=0.5, random_state=None)
Bases: GESelectionStrategy
Truncation selection strategy.
Truncation selection is a deterministic selection mechanism in which only the top-performing portion of a population is allowed to reproduce. The selected elite subset is determined by sorting individuals according to fitness and selecting a predefined fraction of the best individuals.
The population is ranked by fitness and only the top k fraction of individuals is retained as eligible parents. Selection from this elite pool may then occur randomly or deterministically.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
max_best
|
bool
|
Whether higher fitness values are better. True for maximization, False for minimization. |
required |
truncation_threshold
|
float
|
Threshold value. |
0.5
|
random_state
|
Optional[int]
|
Random seed. |
None
|
select(population_size, individuals)
Selects a subset of individuals using truncation selection.
First, the population is sorted by fitness (descending for maximization,
ascending for minimization). Then, the top fraction specified by
truncation_threshold is selected as the elite pool. Finally, individuals
are randomly chosen from this elite pool to reach the desired population size.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
population_size
|
int
|
The number of individuals to select for the next generation. |
required |
individuals
|
list[Individual]
|
A list of individuals to select from. Each individual must have a
|
required |
Returns:
| Type | Description |
|---|---|
list[Individual]
|
A list of selected individuals randomly chosen from the elite portion of |
list[Individual]
|
the population. |
LexicaseSelection(case_key='errors', case_max_best=False, random_state=None)
Bases: GESelectionStrategy
Implements Lexicase Selection for parent selection in Genetic Programming.
This strategy filters a population by evaluating individuals against one case at a time in a randomized order. This is particularly effective for problems where different individuals might excel at different parts of the fitness landscape, as it preserves diversity better than standard tournament selection.
Attributes:
| Name | Type | Description |
|---|---|---|
case_key |
str
|
The key used to retrieve the per-case fitness vector from an individual's metadata. |
required_case_keys |
tuple
|
A tuple containing the primary case key needed for evaluation. |
Initializes the selection strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
case_key
|
str
|
Metadata key where the casewise error/fitness list is stored. |
'errors'
|
case_max_best
|
bool
|
Set to True if higher values in the case vector are better (e.g., accuracy). Defaults to False (e.g., error/loss). |
False
|
random_state
|
Optional[int]
|
Seed for the random number generator. |
None
|
select(population_size, individuals)
Selects parents using Lexicase selection.
EpsilonLexicaseSelection(case_key='errors', epsilon=0.0, case_max_best=False, random_state=None)
Bases: LexicaseSelection
A variant of Lexicase Selection that uses a margin of error (epsilon).
Instead of requiring individuals to exactly match the 'best' performance on a given case, this strategy allows individuals to survive if they are within a specified distance (epsilon) of the best performer. This helps prevent premature convergence caused by tiny differences in floating-point fitness values.
Initializes the Epsilon Lexicase strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
case_key
|
str
|
Metadata key for the casewise fitness vector. |
'errors'
|
epsilon
|
float
|
The buffer allowed when comparing fitness values. |
0.0
|
case_max_best
|
bool
|
True if higher values are better. |
False
|
random_state
|
Optional[int]
|
Seed for the random number generator. |
None
|
NSGA2TournamentSelection(max_best=False, tournament_size=2, exploration_prob=0.1, random_state=None)
Bases: GESelectionStrategy
Tournament selection using NSGA-II crowded-comparison operator.
This selection method implements the parent selection strategy introduced in the NSGA-II multi-objective evolutionary algorithm. Selection is based on Pareto rank and crowding distance to balance convergence toward the Pareto front with population diversity.
Individuals are compared using a crowded comparison operator:
- Prefer individuals with lower Pareto rank.
- If ranks are equal, prefer individuals with higher crowding distance.
Tournament selection is applied using this comparison operator.
crowded_comparison_operator(tournament)
Selects the best individual using NSGA-II's crowded comparison operator.
The operator selects individuals based on lower Pareto rank (better) If same rank, higher crowding distance (better)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tournament
|
list[Individual]
|
The individuals competing in the tournament. |
required |
Returns:
| Type | Description |
|---|---|
Individual
|
The selected winner from the tournament. |
select(population_size, individuals)
Selects individuals using NSGA-II tournament selection.
For each tournament:
1. With probability exploration_prob, a random individual is chosen
2. Otherwise, the crowded comparison operator selects the best individual
based on Pareto rank and crowding distance
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
population_size
|
int
|
Number of individuals to select. |
required |
individuals
|
list[Individual]
|
List of individuals to select from. Each individual must have 'rank' and 'crowding_distance' metadata. |
required |
Returns:
| Type | Description |
|---|---|
list[Individual]
|
List of selected individuals. |
NSGA3TournamentSelection(tournament_size=2, exploration_prob=0.1, random_state=None)
Bases: GESelectionStrategy
Tournament selection using NSGA-III rank-based selection.
This selection strategy implements the rank-based selection component of NSGA-III, a many-objective evolutionary algorithm designed for problems involving four or more objective functions.
Selection is performed using Pareto rank only. When multiple individuals share the same rank, selection among them is performed randomly to preserve diversity. NSGA-III relies on reference point-based niching rather than crowding distance.
rank_only_operator(tournament)
Selects an individual based on Pareto rank only. First finds the individuals with the best (lowest) rank, then randomly selects one from among them to maintain diversity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tournament
|
list[Individual]
|
The individuals competing in the tournament. |
required |
Returns:
| Type | Description |
|---|---|
Individual
|
A randomly selected individual from those with the best rank. |
select(population_size, individuals)
Selects individuals using NSGA-III tournament selection.
For each tournament a random individual is chosen with probability exploration_prob,
otherwise, the rank-only operator selects from the best-ranked individuals
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
population_size
|
int
|
Number of individuals to select. |
required |
individuals
|
list[Individual]
|
List of individuals to select from. Each individual must have 'rank' metadata. |
required |
Returns:
| Type | Description |
|---|---|
list[Individual]
|
List of selected individuals. |
finchge.operators.crossover
OnePointCrossover(codon_size, crossover_proba, within_used=True, mode='fixed', random_state=None)
Bases: GECrossoverStrategy
One-point crossover operator.
A single crossover point is selected in each parent genome, and the genome segments after this point are exchanged to form offspring. Parent genomes are split at a randomly selected point. Offspring are constructed by concatenating prefix segments from one parent with suffix segments from the other. Supports both fixed and variable one-point crossover variants with mode parameter.
-
variable: Select one crossover point independently in each parent. This can change offspring genome lengths.
-
fixed: Select the same crossover point in both parents. This preserves offspring genome lengths.
Initialize the one-point crossover strategy.
This operator supports both fixed and variable one-point crossover:
-
"fixed": Uses the same crossover point in both parents. Offspring genome lengths are preserved. Equivalent to PonyGE-style fixed one-point crossover.
-
"variable": Uses independent crossover points in each parent. Offspring genome lengths may differ from parents. Equivalent to PonyGE-style variable one-point crossover.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
codon_size
|
int
|
Maximum codon value (inclusive). |
required |
crossover_proba
|
float
|
Probability that crossover is applied; otherwise, offspring are copies of the parents. |
required |
within_used
|
bool
|
If True, crossover points are restricted to the
used portion of each parent's genome (i.e., within
|
True
|
mode
|
Literal['variable', 'fixed']
|
Crossover mode. Either "fixed" or "variable". |
'fixed'
|
random_state
|
Optional[int]
|
Optional random seed for reproducibility. |
None
|
cross(parent1, parent2)
Perform one-point crossover
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
parent1
|
Individual
|
First parent. |
required |
parent2
|
Individual
|
Second parent. |
required |
Returns:
| Type | Description |
|---|---|
Tuple[Individual, Individual]
|
Two offspring individuals. |
SubtreeCrossover(crossover_proba, non_terminals, random_state=None)
Bases: GECrossoverStrategy
Subtree crossover exchanges randomly selected subtrees between two parent program trees while preserving syntactic correctness with respect to a grammar.
Two compatible non-terminal nodes are selected from parent trees, and the subtrees rooted at these nodes are swapped.
Initialize the subtree crossove.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
crossover_proba
|
float
|
Probability of crossover occurring. |
required |
non_terminals
|
list[str]
|
List of grammar non-terminal symbols that are valid crossover points. |
required |
cross(p_0, p_1)
Perform subtree crossover between two parent individuals.
If crossover does not occur, cloned trees are returned as new individuals.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
p_0
|
Individual
|
First parent individual. |
required |
p_1
|
Individual
|
Second parent individual. |
required |
Returns:
| Type | Description |
|---|---|
Tuple[Individual, Individual]
|
A tuple containing two newly constructed offspring individuals. |
TwoPointCrossover(codon_size, crossover_proba, within_used=True, mode='fixed', random_state=None)
Bases: GECrossoverStrategy
Two-point crossover generalizes one-point crossover by selecting two crossover points in each parent genome and exchanging the intermediate segments.
Offspring are formed by combining prefix and suffix regions from one parent with the middle region from the other parent.
Initialize the two-point crossover strategy.
This operator supports both fixed and variable two-point crossover:
-
"fixed": Uses the same two crossover points in both parents. A segment between the two points is exchanged, preserving offspring genome lengths. Equivalent to PonyGE-style fixed two-point crossover.
-
"variable": Uses independently selected crossover points in each parent. Exchanged segments may differ in length, allowing offspring genomes to grow or shrink. Equivalent to PonyGE-style variable two-point crossover.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
codon_size
|
int
|
Maximum codon value (inclusive). |
required |
crossover_proba
|
float
|
Probability that crossover is applied; otherwise, offspring are copies of the parents. |
required |
within_used
|
bool
|
If True, crossover points are restricted to the
used portion of each parent's genome (i.e., within
|
True
|
mode
|
Literal['fixed', 'variable']
|
Crossover mode. Either "fixed" or "variable". |
'fixed'
|
random_state
|
Optional[int]
|
Optional random seed for reproducibility. |
None
|
cross(parent1, parent2)
Perform two-point crossover.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
parent1
|
Individual
|
First parent. |
required |
parent2
|
Individual
|
Second parent. |
required |
Returns:
| Type | Description |
|---|---|
Tuple[Individual, Individual]
|
Two offspring individuals. |
UniformCrossover(crossover_proba, within_used=True, random_state=None)
Bases: GECrossoverStrategy
Uniform crossover operator.
Uniform crossover recombines parent genomes by independently swapping genes between parents at each position with fixed probability.
Instead of using fixed crossover points, uniform crossover applies a binary mixing mask across the genome, providing maximal mixing of parental genetic material.
Initialize uniform crossover
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
crossover_proba
|
float
|
Probability of crossover occurring. |
required |
within_used
|
bool
|
If True, crossover is limited to the used portion of each parent's genome. |
True
|
cross(parent1, parent2)
Perform uniform crossover.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
parent1
|
Individual
|
First parent. |
required |
parent2
|
Individual
|
Second parent. |
required |
Returns:
| Type | Description |
|---|---|
Tuple[Individual, Individual]
|
Two offspring individuals. |
finchge.operators.mutation
CyclicMutation(mutation_probability, segment_size=3, random_state=None)
Bases: GEMutationStrategy
Cyclic mutation strategy.
Rotates fixed-size contiguous segments of the genome with a per-gene mutation probability.
Initialize the cyclic mutation strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mutation_probability
|
float
|
Probability in [0.0, 1.0] that a given position triggers a cyclic rotation. |
required |
segment_size
|
int
|
Size of the segment to rotate. |
3
|
Raises:
| Type | Description |
|---|---|
ValueError
|
If mutation_probability is not in [0.0, 1.0]. |
ValueError
|
If segment_size is less than 2. |
mutate(individual)
Apply cyclic mutation to an individual.
Each genome position is independently selected with
mutation_probability. When selected, the contiguous
segment starting at that position is rotated by one step.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
individual
|
Individual
|
Individual to mutate. |
required |
Returns:
| Type | Description |
|---|---|
Individual
|
A new Individual with a mutated genotype. |
DuplicationMutation(mutation_probability, segment_size=2, random_state=None)
Bases: GEMutationStrategy
Duplication mutation strategy.
With a given probability, duplicates a contiguous segment of fixed size and copies it into another non-overlapping location in the genome.
Initialize the duplication mutation strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mutation_probability
|
float
|
Probability in [0.0, 1.0] of applying the duplication mutation. |
required |
segment_size
|
int
|
Length of the segment to duplicate, must be > 0. |
2
|
Raises:
| Type | Description |
|---|---|
ValueError
|
If mutation_probability is not in [0.0, 1.0]. |
ValueError
|
If segment_size is not a positive integer. |
mutate(individual)
Apply duplication mutation to an individual.
A segment of length segment_size is copied from one position and
overwrites another non-overlapping segment with probability
mutation_probability.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
individual
|
Individual
|
Individual to mutate. |
required |
Returns:
| Type | Description |
|---|---|
Individual
|
A new Individual with a mutated genotype. |
GaussianMutation(mutation_probability, std_dev=1.0, random_state=None)
Bases: GEMutationStrategy
Gaussian noise mutation strategy.
Adds Gaussian noise to selected genes and clamps results to non-negative integers.
Initialize the Gaussian mutation strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mutation_probability
|
float
|
Probability in [0.0, 1.0] that a given gene is mutated. |
required |
std_dev
|
float
|
Standard deviation of the Gaussian noise. |
1.0
|
Raises:
| Type | Description |
|---|---|
ValueError
|
If mutation_probability is not in [0.0, 1.0]. |
mutate(individual)
Apply Gaussian mutation to an individual.
Each gene is independently selected with
mutation_probability. Selected genes receive
Gaussian noise, are rounded to integers, and
clamped to be non-negative.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
individual
|
Individual
|
Individual to mutate. |
required |
Returns:
| Type | Description |
|---|---|
Individual
|
A new Individual with a mutated genotype. |
IntFlipMutation(mutation_probability, codon_size, mode='per_codon', mutation_events=1, within_used=True, random_state=None)
Bases: GEMutationStrategy
Integer flip mutation strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mutation_probability
|
float
|
Probability of mutation per gene. |
required |
codon_size
|
int
|
Maximum codon value (inclusive). |
required |
mutate(individual)
Perform integer flip mutation on an Individual.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
individual
|
Individual
|
Individual to apply mutation on. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
Individual |
Individual
|
Mutated Individual. |
InversionMutation(segment_probability, random_state=None)
Bases: GEMutationStrategy
Inversion mutation strategy.
With a given probability, selects a random contiguous segment of the genome and reverses its order.
Initialize the inversion mutation strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
segment_probability
|
float
|
Probability in [0.0, 1.0] of applying the inversion mutation. |
required |
Raises:
| Type | Description |
|---|---|
ValueError
|
If segment_probability is not in [0.0, 1.0]. |
mutate(individual)
Apply inversion mutation to an individual.
With probability segment_probability, a random contiguous
segment [start:end) of the genome is reversed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
individual
|
Individual
|
Individual to mutate. |
required |
Returns:
| Type | Description |
|---|---|
Individual
|
A new Individual with a mutated genotype. |
MultipleMutation(strategies, probabilities=None, random_state=None)
Bases: GEMutationStrategy
Multiple mutation strategies combiner.
Randomly selects and applies one mutation strategy according to provided probabilities.
Initialize the multiple mutation strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
strategies
|
list[GEMutationStrategy]
|
List of mutation strategies to choose from. |
required |
probabilities
|
Optional[list[float]]
|
Optional selection probabilities for each strategy. If None, all strategies are selected uniformly. |
None
|
Raises:
| Type | Description |
|---|---|
ValueError
|
If probabilities length does not match strategies or if probabilities sum to zero. |
mutate(individual)
Apply a randomly selected mutation strategy.
One mutation strategy is selected according to the configured probabilities and applied to the individual.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
individual
|
Individual
|
Individual to mutate. |
required |
Returns:
| Type | Description |
|---|---|
Individual
|
A mutated Individual. |
SubtreeMutation(mutation_probability, non_terminals, tree_generator, mutation_max_depth, mutation_events=1, random_state=None)
Bases: GEMutationStrategy
Subtree mutation replaces randomly selected subtrees within a program tree with newly generated random subtrees. Mutation targets nodes representing non-terminal grammar symbols. The subtree rooted at the selected node is replaced with a newly generated subtree that satisfies grammar constraints. Subtree mutation is a core mutation mechanism in genetic programming and was introduced alongside subtree crossover in early GP research.
Characteristics:
- Maintains syntactic validity of individuals.
- Introduces structural novelty.
- Controls search-space exploration via subtree depth limits.
mutate(individual)
Mutate an individual using subtree mutation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
individual
|
Individual
|
Parent individual. |
required |
Returns:
| Type | Description |
|---|---|
Individual
|
A newly constructed mutated individual. |
SwapMutation(mutation_probability, random_state=None)
Bases: GEMutationStrategy
Swap mutation strategy. Randomly swaps pairs of genes in the genotype based on a per gene mutation probability.
Initialize the swap mutation strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mutation_probability
|
float
|
Probability in [0.0, 1.0] that a given gene position is selected for swapping. |
required |
Raises:
| Type | Description |
|---|---|
ValueError
|
If mutation_probability is not in [0.0, 1.0]. |
mutate(individual)
Apply swap mutation to an individual.
Each gene position is independently selected with
mutation_probability. Selected positions are randomly
paired and their values swapped.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
individual
|
Individual
|
Individual to mutate. |
required |
Returns:
| Type | Description |
|---|---|
Individual
|
A new Individual with a mutated genotype. |
finchge.operators.replacement
GenerationalReplacement(max_best, random_state=None)
Bases: GEReplacementStrategy
Elitist generational replacement strategy for single-objective optimization.
This replacement strategy implements classic generational replacement used in standard genetic algorithms. At each generation, the entire population is replaced by newly generated offspring, while preserving a fixed number of elite individuals from the previous population.
Individuals are ranked using a scalar fitness value, and elites are selected based on either maximization or minimization of that fitness.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
max_best
|
bool
|
If True, higher fitness values are considered better (maximization). If False, lower fitness values are considered better (minimization). |
required |
Methods:
| Name | Description |
|---|---|
replace |
Replaces the old population with a new population while preserving elite individuals from the previous generation. |
Notes
- Assumes that each individual's
fitnessattribute is a scalar value. - Elites are selected exclusively from the old population.
- Fitness values must already be evaluated prior to replacement.
See Also
- SteadyStateReplacement
- RandomElitistReplacement
- NSGA2ElitistReplacement
replace(new_population, old_population, elite_size, population_size)
Replaces old population with new population, preserving elite individuals.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
new_population
|
list[Individual]
|
List of new individuals |
required |
old_population
|
list[Individual]
|
List of current individuals |
required |
elite_size
|
int
|
Number of elite individuals to preserve |
required |
population_size
|
int
|
Target size of the population |
required |
Returns:
| Name | Type | Description |
|---|---|---|
list |
list[Individual]
|
New population with elites |
NSGA2ElitistReplacement(maximize_flags, random_state=None)
Bases: GEReplacementStrategy
NSGA-II elitist replacement strategy using Pareto rank and crowding distance.
This replacement strategy implements the environmental selection step of NSGA-II. It preserves elite individuals from the previous population and fills the remaining population by selecting individuals from successive Pareto fronts, prioritizing diversity via crowding distance.
The strategy assumes that:
- Non-dominated sorting has already been performed.
- Each individual has a valid Pareto rank stored in meta["rank"].
- Crowding distance is computed per front using NSGA-II rules.
This class is not compatible with NSGA-III, as it relies on crowding distance for diversity preservation. For NSGA-III, reference-point-based environmental selection must be used instead.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
maximize_flags
|
list[bool]
|
List indicating whether each objective should be maximized (True) or minimized (False). This is used to perform non-dominated sorting when combining parent and offspring populations. |
required |
Methods:
| Name | Description |
|---|---|
replace |
Performs elitist environmental selection according to NSGA-II. |
See Also
- fast_non_dominated_sort
- calculate_crowding_distance
- NSGA3Replacement
replace(new_population, old_population, elite_size, population_size)
Replacement strategy that preserves elite solutions and maintains diversity.
RandomElitistReplacement(max_best, random_state=None)
Bases: GEReplacementStrategy
Random replacement strategy with elitism for single-objective optimization.
This replacement strategy preserves a fixed number of elite individuals from the current population and fills the remaining population slots by randomly sampling from the non-elite individuals of the old population and the newly generated individuals.
Elites are selected based on a scalar fitness value, using either maximization or minimization as specified at construction time. Diversity is introduced through random sampling rather than explicit distance or dominance measures.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
max_best
|
bool
|
If True, higher fitness values are considered better (maximization). If False, lower fitness values are considered better (minimization). |
required |
Methods:
| Name | Description |
|---|---|
replace |
Preserves elite individuals and randomly selects remaining individuals to form the next generation. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
ValueError
|
If any individual does not have a single-objective fitness representation (fitness list length != 1). |
ValueError
|
If there are not enough eligible individuals to fill the population. |
Notes
- Assumes that fitness values are evaluated prior to replacement.
- Does not use Pareto dominance, crowding distance, or reference-point niching.
- Random sampling is performed without replacement.
See Also
- GenerationalReplacement
- SteadyStateReplacement
- NSGA2ElitistReplacement
SteadyStateReplacement(max_best, random_state=None)
Bases: GEReplacementStrategy
Steady-state replacement strategy for single-objective optimization.
This replacement strategy implements a steady-state evolutionary model, where only a subset of individuals is replaced at each generation rather than replacing the entire population. A fixed number of elite individuals from the current population are preserved, while the remaining slots are filled with the best individuals from the newly generated population.
Selection is based on a scalar fitness value, using either maximization or minimization as specified at construction time.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
max_best
|
bool
|
If True, higher fitness values are considered better (maximization). If False, lower fitness values are considered better (minimization). |
required |
Methods:
| Name | Description |
|---|---|
replace |
Preserves elite individuals from the old population and replaces the remaining individuals with the best-performing offspring. |
Notes
- Assumes that each individual's
fitnessattribute is a scalar value. - Fitness values must be evaluated prior to replacement.
- Elites are selected exclusively from the old population.
- The number of preserved individuals is controlled by
elite_size.
See Also
- GenerationalReplacement
- RandomElitistReplacement
- NSGA2ElitistReplacement
replace(new_population, old_population, elite_size, population_size)
Replaces worst individuals in old population with new individuals. Preserves elite_size best individuals.
finchge.config
FinchConfig(data)
Configuration manager for Grammar Evolution (GE) utils.
FinchConfig provides an interface for loading, accessing, and modifying GE configuration from YAML and INI files.
Can load conifg as dictionaries (dict[str, dict[str, Any]]) through a construtor of from_file. Supports automatic file discovery and typed access to configuration sections. When working in a single proplem project with one configuration, FinchConfig can detect the file automatiaally. Although automatic config detection is handy during quick tasks such as checking grammar and other components. For more organized utils it is recommended to provide the actual file. As this class may silently pick the config file from current working directory if available any.
Attributes:
| Name | Type | Description |
|---|---|---|
_data |
Internal storage for configuration data as nested dictionaries. |
Initializes a FinchConfig instance with configuration data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
dict[str, dict[str, Any]]
|
required |
Note
Prefer using the class methods from_file, from_yaml, or from_ini
to create instances rather than calling this constructor directly.
experiment
property
Retrieves the experiment configuration section. Returns: Dictionary of experiment configuration parameters.
ge
property
Retrieves the grammatical evolution configuration section.
Typically contains
- start_symbol: The starting non-terminal symbol
- rules: Grammar production rules
- max_depth: Maximum derivation depth
- crossover_rate: Probability of applying crossover
- mutation_rate: Probability of applying mutation
- selection_method: Selection strategy (tournament, roulette, etc.)
- tournament_size: If using tournament selection
- other related parameters
Returns:
| Type | Description |
|---|---|
dict[str, Any]
|
Dictionary of grammar configuration parameters. |
parallel
property
Retrieves the parallel configuration section.
Returns:
| Type | Description |
|---|---|
dict[str, Any]
|
Dictionary of experiment configuration parameters. |
copy(update=None)
Creates a deep copy of the configuration with optional updates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
update
|
dict[str, dict[str, Any]] | None
|
Optional dictionary of updates to apply to the copy. Format: {section_name: {key: value, ...}, ...} If a section doesn't exist, it will be created. |
None
|
Returns:
| Type | Description |
|---|---|
FinchConfig
|
A new FinchConfig instance with the copied (and optionally updated) data. |
from_dict(data)
classmethod
Loads dict as FinchConfig. Internally finchGE uses FinchConfig class. Args: data (dict[str, Any]): configuration as a dictionary
Returns: FinchConfig instance
from_file(path=None)
classmethod
Creates a FinchConfig instance from a configuration file.
Automatically detects file format based on extension and loads the configuration. If no path is provided, searches for common config filenames in the current directory.
If the config files follow expected name (ge_config) and are in current working directory, they can be automatically detected. Othewise they have to be passed as path argument.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
path
|
str | None
|
Optional path to configuration file. If None, searches for 'ge_config.yaml', 'ge_config.yml', or 'ge_config.ini' in the current directory. |
None
|
Returns:
| Type | Description |
|---|---|
FinchConfig
|
FinchConfig instance loaded with configuration data. |
Raises:
| Type | Description |
|---|---|
FileNotFoundError
|
If no configuration file is found. |
RuntimeError
|
If multiple configuration files are found (when path is None). |
ValueError
|
If the file format is unsupported or the file content is invalid. |
from_ini(path)
classmethod
Creates a FinchConfig instance from an INI configuration file.
Parses INI sections and converts string values to appropriate Python types
using ast.literal_eval. Values that cannot be evaluated remain as strings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
path
|
str
|
Path to the INI configuration file. |
required |
Returns:
| Type | Description |
|---|---|
FinchConfig
|
FinchConfig instance with parsed configuration data. |
Raises:
| Type | Description |
|---|---|
FileNotFoundError
|
If the INI file does not exist. |
Error
|
If the INI file is malformed. |
from_yaml(path)
classmethod
Creates a FinchConfig instance from a YAML configuration file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
path
|
str
|
Path to the YAML configuration file. |
required |
Returns:
| Type | Description |
|---|---|
FinchConfig
|
FinchConfig instance with parsed configuration data. |
Raises:
| Type | Description |
|---|---|
FileNotFoundError
|
If the YAML file does not exist. |
YAMLError
|
If the YAML file is malformed. |
ValueError
|
If the top-level YAML structure is not a dictionary. |
Example YAML format:
section(name)
Retrieves a configuration section by name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name
|
str
|
Name of the configuration section. |
required |
Returns:
| Type | Description |
|---|---|
dict[str, Any]
|
Dictionary containing the section's key-value pairs. |
dict[str, Any]
|
Returns an empty dictionary if the section does not exist. |
to_dict()
Converts the FinchConfig instance into a dict
Returns: dictionary for the FinchConfig
to_json(indent=2)
Converts the GECOnfig instance to json Args: indent: indent for the json
Returns: json string
ConfigValidator
Defines validation rules for FinchGE configuration files.
This class contains the schema and validation logic used to verify the structure and values of a FinchGE configuration.
Attributes:
| Name | Type | Description |
|---|---|---|
MANDATORY_SECTIONS |
set[str]
|
Sections that must be present for a valid configuration. |
RECOMMENDED_SECTIONS |
set[str]
|
Sections that are optional but strongly recommended. |
OPTIONAL_SECTIONS |
set[str]
|
Fully optional sections. |
REQUIRED_FIELDS |
dict[str, set[str]]
|
Required fields for each configuration section. |
FIELD_VALIDATORS |
dict[str, dict[str, Callable]]
|
Field-level validators keyed by section and field name. |
finchge.fitness
GEFitnessFunction(maximize=False, case_data_key=None)
Bases: ABC
Base class for a fitness function used in genetic algorithms.
This abstract class defines the interface for evaluating the fitness
of a given phenotype. Subclasses must implement the evaluate method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
maximize
|
bool
|
Indicates whether the goal is to maximize (True) or minimize (False) the fitness score. |
False
|
required_context_keys
property
Declare what context keys this fitness function needs. Override this in subclasses that need more than y_pred/y_true.
evaluate(context)
abstractmethod
Evaluate the fitness score using the provided context.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
context
|
EvaluationContext
|
EvaluationContext Typeddict containing evaluation inputs, such as 'y_pred', 'y_val', 'x_val', etc. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
Fitness |
Fitness
|
The computed fitness. |
AccuracyFitness()
Bases: GEFitnessFunction
Fitness function that evaluates model accuracy on a validation set.
This metric computes the classification accuracy between the predicted labels and the ground-truth labels from the validation data. It is intended to be used in supervised classification tasks, and is a maximization objective.
Methods:
| Name | Description |
|---|---|
evaluate |
Computes accuracy using 'y_pred' and 'y_true' from the context. |
evaluate(context)
Evaluates the accuracy of the model's predictions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
context
|
EvaluationContext
|
EvaluationContext Typeddict containing evaluation inputs. Must include: - 'y_pred': Predicted labels from the model (numpy array or array-like). - 'y_test': True labels for the test set (numpy array or array-like). |
required |
Returns:
| Name | Type | Description |
|---|---|---|
float |
Fitness
|
Accuracy score between 0.0 and 1.0 |
Raises:
| Type | Description |
|---|---|
ValueError
|
If y_test and y_pred have different shapes or are empty |
MAEFitness()
Bases: GEFitnessFunction
Mean Absolute Error fitness for regression problems. Lower values indicate better fit (minimization objective).
evaluate(context)
Calculate Mean Absolute Error between true and predicted values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
context
|
dict[str, Any]
|
Must contain: - 'y_true': Ground truth values (array-like) - 'y_pred': Predicted values (array-like) |
required |
MSEFitness()
Bases: GEFitnessFunction
RMSEFitness()
Bases: GEFitnessFunction
CrossEntropyFitness(eps=1e-12)
Bases: GEFitnessFunction
CrossEntropy (log loss) fitness for classification.
Lower values are better (minimize). Requires predicted probabilities.
HingeLossFitness()
Bases: GEFitnessFunction
Mean Hinge Loss fitness for binary classification problems. This fitness evaluates how well a model separates two classes by measuring the margin between predictions and true labels. Hinge loss is calculated as hinge_i = max(0, 1 - y_true_i * y_pred_score_i) Should be used only for margin-based classifiers. The ground truth should be encoded between -1 and 1 , not 0/1. Similarly, the prediction scores should be raw model outputs (real-valued scores), representing the model’s confidence or margin, not class labels. The overall fitness value is the mean hinge loss across all samples.
evaluate(context)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
context
|
dict[str, Any]
|
Context containing |
required |
Returns:
| Type | Description |
|---|---|
Fitness
|
Fitness value for the hinge loss. |
RewardFitness(maximize=True, optimal_fitness=None)
Bases: GEFitnessFunction
Fitness function for control problems with reward maximization. Simply returns the reward (higher is better).
Initialize reward fitness.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
maximize
|
bool
|
Whether to maximize (True) or minimize (False) |
True
|
optimal_fitness
|
Optional[float]
|
Known optimal fitness value |
None
|
evaluate(context)
Evaluate fitness from context.
Expected context keys
y_pred: Array of rewards from runner
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
context
|
dict[str, Any]
|
context with y_pred |
required |
Returns:
| Type | Description |
|---|---|
Fitness
|
Total reward (sum across episodes) |
StringMatchFitness(target)
Bases: GEFitnessFunction
finchge.grammar.repair_strategy
RepairStrategy
Bases: ABC
Abstract base class for phenotype repair strategies.
repair(phenotype)
abstractmethod
Repair the given phenotype and return repaired phenotype
finchge.core.GrammaticalEvolution(fitness_evaluator, grammar=None, config=None, initialiser=None, algorithm=None, expt_logger=None, checkpoint_manager=None, random_state=None)
Bases: RandomStateMixin
Grammatical evolution class for running the evolution. This is the evolution controller class responsible for running FinchGE utils. This class is responsible for wiring up different components. This includes loading experiment configurations, loading grammar, initializing the initial population, setting up the fitness evaluator, and using a genetic algorithm to run the evolution. GrammaticalEvolution class is also responsible for running the evolution loop.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
fitness_evaluator
|
FitnessEvaluator
|
Evaluator to evaluate the fitness of individuals. |
required |
grammar
|
Optional[Grammar]
|
BNF Grammar to be used. If not provided config must be available and must contain grammar_file value |
None
|
config
|
Optional[FinchConfig | dict[str, Any]]
|
Configuration settings for the GE algorithm. |
None
|
initialiser
|
GEInitialiser | GETreeInitialiser | None
|
Initializer class either integer based or tree based initializer |
None
|
algorithm
|
Optional[Any]
|
Evolutionary algorithm to be used e.g., GA, NSGA. |
None
|
expt_logger
|
BaseLogger | None
|
Logger used for experiment-level records. |
None
|
checkpoint_manager
|
CheckpointManager | None
|
Checkpoint manager used to save and resume runs. |
None
|
random_state
|
Optional[int]
|
Seed for reproducible randomness. |
None
|
Note: If algorithm is not provided, GA will be used (provided that config is available)
halt(min_gens_allowed=0)
halt signal to stop generation loop : can specify minimum allowed generations, if halt signal received it will wait if min_generations are not satisfied yet, if halt signal is received after minimum generations allowed, will halt immediately
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
min_gens_allowed
|
int
|
Minimum number of generations before halt |
0
|
run()
Finds the fittest individual ( or a pareto front for multi-objective) in the population and logs the results.
finchge.fitness.FitnessEvaluator(fitness_functions, mapper, runner=None, encode_trees=False, parallel_config=None, require_case_data=False)
Fitness evaluator is used to evaluate the fitness of the individuals. It is a wrapper for fitness function to allow convenient fitness evaluation especially for use cases where data and models are involved. For example in hyperparameter optimization or neural architecture search. The phenotype is converted to model using the 'phenotype_to_model' parameter.
Supports both model-based evaluation (e.g., using train/validation data) and phenotype-only evaluation (e.g., string matching or symbolic tasks).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
fitness_functions
|
Union[GEFitnessFunction, list[GEFitnessFunction]]
|
One or more fitness function instances. |
required |
mapper
|
GenotypeMapper
|
Genotype mapper. |
required |
runner
|
PhenotypeRunner | None
|
Runner for evaluating phenotypes. |
None
|
encode_trees
|
bool
|
Whether to encode trees to integer genotypes. This is needed if genome-based operators are used after tree-based initialization. |
False
|
parallel_config
|
dict[str, Any] | None
|
Parallel section of the config. |
None
|
require_case_data
|
bool
|
Whether case-based evaluation is required, for example when using lexicase selection. |
False
|
count_objectives()
Counts the number of objectives.
Returns:
| Name | Type | Description |
|---|---|---|
int |
int
|
Number of fitness functions (objectives). |
derive_eval_seed(base_seed, phenotype, env_version)
Although we share same random state using private RNG shared in different components, It will not work for parallelized evaluation. Each process, or even each machine, will run evaluations in a different order, so random state will fail.
So, we must use some deterministic way to evaluate. One way is to use different seeds in deterministic way Same phenotype and base seed combination should result in same seed in each run.
So we derive a 32-bit seed for evaluation/training in a worker process. Same (base_seed, phenotype, env_version) will give same evaluation seed.
evaluate_individual(individual)
Evaluates a given phenotype by training the corresponding model and applying fitness functions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
individual
|
Individual
|
An individual configuration or representation used to generate a model. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
list |
None
|
A list of fitness scores corresponding to each fitness function. |
evaluate_population(population)
NEW: Evaluates entire population. Uses parallel execution if enabled, sequential otherwise.
Users should call this instead of manually calling evaluate().
get_env_version()
Return current environment version. To distinguish environments in adaptive environments.
get_fitness_functions()
Returns the list of fitness function instances.
Returns:
| Name | Type | Description |
|---|---|---|
list |
list[GEFitnessFunction]
|
List of fitness function objects. |
get_maximize_flags()
Returns the maximize flags for each objective :return:
get_objective_names()
Retrieves human-readable names of the objectives.
Returns:
| Name | Type | Description |
|---|---|---|
list |
list[str]
|
List of objective names derived from fitness function class names. |
is_multi_objective()
Determines if the evaluation is multi-objective.
Returns:
| Name | Type | Description |
|---|---|---|
bool |
bool
|
True if more than one fitness function is provided, False otherwise. |
refesh_mapping(ind)
Ensure that genotype, phenotype, and tree are consistent.
Rules
- If already mapped, do nothing.
- If tree exists and genotype is absent:
- reverse-map genotype when encode_trees=True
- otherwise derive phenotype directly from tree
- If genotype exists, map through the mapper
set_env_version(version)
Update environment version (e.g. generation, dataset update).
shutdown()
async
Cleanup parallel resources
finchge.utils.cache
CacheManager(*, cache_type='lru', experiment_id=None, evaluator_namespace=None, **kwargs)
Bases: Generic[V]
Central cache manager.
Responsibilities: - key construction (hashing, namespacing) - backend selection (LRU / disk ) - fitness cache access
finchge.utils.logger
BaseLogger
Bases: ABC
Interface for experiment tracking. Observer Pattern
on_generation_end(generation, population, best=None, pareto_front=None, fitness_stats=None)
abstractmethod
Called at the end of each generation.
on_run_end(result)
abstractmethod
Called when a run ends.
on_run_start(log_dir, obj_names, config)
abstractmethod
Called when a run starts.
ExperimentLogger(exclude=None, compress_genotypes=False, log_population_samples=False, sample_size=5, custom_log_hook=None)
Bases: BaseLogger
Experiment logger
Logs essential data for post-hoc analysis. This provides minimal essential logs only. To add more logs, this class can be extended or wrapped to add custom metrics and analysis.
on_generation_end(generation, population, best=None, pareto_front=None, fitness_stats=None, extra_data=None)
Log data at the end of each generation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
extra_data
|
Optional[dict[str, Any]]
|
Additional user-provided data to log |
None
|
on_run_end(result)
Finalize logging at the end of the run.
on_run_start(log_dir, obj_names, config, algorithm=None)
Initialize logging when the experiment run starts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
algorithm
|
Optional[Any]
|
Optional reference to algorithm instance for custom logging hooks |
None
|
setup_logging(logger_id='finch', group_name=None, verbose=False)
Setup logging for a specific project instance.
get_log_dir(logger_id='finch')
summary Retrieve the log directory for a given project instance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
logger_id
|
str
|
description. Defaults to "finch". |
'finch'
|
Raises:
| Type | Description |
|---|---|
KeyError
|
description |
Returns:
| Name | Type | Description |
|---|---|---|
str |
str
|
description |
finchge.utils.random_mixin
RandomStateMixin(*args, random_state=None, **kwargs)
Mixin to add random state control All classes using randomness either using python random or numpy random should extend from RandomMixin. And, should provide random_state parmeter in the constructor, Otherwise, the determinism is not guaranteed across runs.
__getstate__()
Called when pickling. Removes unpicklable RNGs and stores only the seed.
__setstate__(state)
Called when unpickling. Restores object state and reinitializes RNGs.
get_seed_info()
Get information about the seed for logging
finchge.utils.results
ResultHelper(project_id='finch')
StatsHelper
Utility class for computing population-level fitness statistics.
compute_fitness_stats(*, individuals, objective_names=None)
staticmethod
Compute fitness statistics for a population.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
individuals
|
list[Individual]
|
List of evaluated Individuals. |
required |
objective_names
|
Optional[list[str]]
|
Optional list of objective names. If None, objectives will be indexed numerically. |
None
|
Returns:
| Type | Description |
|---|---|
list[Dict[str, Any]]
|
A list of dictionaries, one per objective, containing: |
list[Dict[str, Any]]
|
|
list[Dict[str, Any]]
|
|
list[Dict[str, Any]]
|
|
list[Dict[str, Any]]
|
|
list[Dict[str, Any]]
|
|
list[Dict[str, Any]]
|
|
list[Dict[str, Any]]
|
|
finchge.runners
PhenotypeRunner(random_state=None)
Bases: RandomStateMixin, ABC
Base class for all phenotype runners.
Every runner must implement run() which returns (predictions, targets) as numpy arrays.
get_metadata()
Optional metadata about the executor/problem.
Returns:
| Type | Description |
|---|---|
dict[str, Any]
|
Dictionary with metadata (can be empty) |
run(phenotype, context_hints=None)
abstractmethod
Run phenotype and return context dictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
phenotype
|
str
|
Program to evaluate |
required |
context_hints
|
Optional[set[str]]
|
What context keys the fitness functions need. None means assume minimal (y_pred/y_true only). |
None
|
Returns:
| Type | Description |
|---|---|
dict[str, Any]
|
Dictionary with at least 'y_pred' and 'y_true', plus any hinted keys. |
SymbolicRegressionRunner(data_train, data_val=None, data_test=None, random_state=None)
Bases: DirectEvalRunner
Runner for symbolic regression.
The phenotype is interpreted as a symbolic expression and evaluated directly on the active evaluation split.
ControlRunner(env_factory, n_episodes=1, optimal_fitness=None, random_state=None)
Bases: PhenotypeRunner, ABC, Generic[T]
Base class for control problem runners.
Type parameter T represents the environment type.
Initialize control runner.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
env_factory
|
Callable[[], T]
|
Function that creates new environment instances |
required |
n_episodes
|
int
|
Number of episodes to run per evaluation |
1
|
optimal_fitness
|
Optional[float]
|
Known optimal fitness value |
None
|
get_metadata()
Return metadata about the control problem.
run(phenotype, context_hints=None)
Run multiple episodes and return rewards.
LogicRunner(X, y, interpreter=None, random_state=None)
Bases: PhenotypeRunner
Runner for logic problems like multiplexer.
Evaluates phenotype on all truth table combinations. Returns predictions and true values for fitness calculation.
run(phenotype, context_hints=None)
Evaluate phenotype on all input combinations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
context_hints
|
Optional[set[str]]
|
What context keys the fitness functions need. None means assume minimal (y_pred/y_true only). |
None
|
phenotype
|
str
|
Logic program string |
required |
Returns:
| Type | Description |
|---|---|
dict[str, Any]
|
dict with context having y_pred, y_true and other context info as requested in context_hints argument |
MLModelRunner(data_train, data_val, model_parser, data_test=None, random_state=None)
Bases: TrainEvalRunner
Runner for problems where phenotype builds and trains a model. This runner can be used in problems such as hyperparameter optimization and architecture search etc.
The phenotype is parsed into a trainable model or configuration, fitted on the training split, and evaluated on the active evaluation split.
finchge.utils.checkpoint
CheckpointManager
Bases: Protocol
Checkpoint manager interface.
exists()
Return True if at least one checkpoint exists.
load_latest(*, expected_config_hash=None)
Load and return the latest checkpoint.
save(*, generation, population, algorithm, config, py_rng_state, np_rng_state)
Persist a checkpoint and return the path written.
should_save(generation)
Return True if this generation should be checkpointed. Based on 'every' parameter's value, decide whether to save or not.
CheckpointState(version, config_hash, generation, population, algorithm, rng_state)
dataclass
All state required to resume a GE run deterministically.
FileCheckpointManager(directory, *, every=10, keep_last=5, filename_prefix='checkpoint_gen_')
Bases: CheckpointManager
Filesystem checkpoint manager using pickle.
Population and Algorithm must be pickle-serializable.
RNG state is stored to guarantee deterministic resume.
Files are written as: checkpoint_gen_[GEN].pkl
load_latest(*, expected_config_hash=None)
Load latest checkpoint and verify config hash.
should_save(generation)
Return True if this generation should be checkpointed. Based on 'every' parameter's value, decide whether to save or not.
RNGState(py_state, np_state)
dataclass
Serializable RNG states for deterministic resume.
stable_config_hash(config)
Create a hash of config for checkpoint compatibility checks.
finchge.symbolic
SymbolicExpression(expression)
Parse and evaluate a restricted symbolic expression.
The class provides a small, controlled expression language intended for symbolic regression and genetic programming. Expressions are parsed using Python's AST and evaluated using protected numerical operators to avoid NaN/Inf where possible.
Supported features
- Variables of the form x0, x1, x2, ...
- Basic arithmetic operations (+, -, , /, *)
- Selected mathematical functions (sin, cos, exp, log, etc.)
- Constant expressions
- Optional normalization of NumPy-style column syntax (e.g. x[:, 0] → x0)
- Simple structural metrics (node count, depth)
Notes
This is not a full symbolic algebra system. Expressions are evaluated numerically and are not automatically simplified.
Attributes:
| Name | Type | Description |
|---|---|---|
original_expression |
str
|
The input expression string as provided by the user. |
expression |
str
|
Normalized expression string after preprocessing (e.g. slice syntax rewritten). |
variables |
list[str]
|
Sorted list of variable names used in the expression (e.g. ['x0', 'x2']). |
_tree |
Expression
|
Parsed AST representation of the expression. |
Raises:
| Type | Description |
|---|---|
ExpressionSyntaxError
|
If the expression contains unsupported syntax. |
ExpressionEvaluationError
|
If evaluation fails at runtime. |
eval(X)
Evaluate the expression on X.
Parameters
X: Input data. Expected shape is (n_samples, n_features). A scalar is treated as shape (1, 1). A 1D array is treated as a single sample with multiple features.
Returns
np.ndarray Shape (n_samples,).
GERegressor(grammar, config, fitness_functions, generations=100, population_size=100, random_state=None)
Bases: RandomStateMixin, BaseEstimator, RegressorMixin
Scikit-learn compatible symbolic regression model based on Grammatical Evolution (GE).
The estimator evolves mathematical expressions defined by a grammar and selects the best-performing individual according to the provided fitness function(s). The resulting symbolic expression can then be evaluated to make predictions.
Attributes:
| Name | Type | Description |
|---|---|---|
ge_result_ |
GEResult
|
Result object returned by the GE run. |
selected_individual_ |
Individual | None
|
Individual currently selected for prediction. |
selected_expression_ |
SymbolicExpression | None
|
Parsed symbolic expression corresponding to the selected individual. |
Initialize the symbolic regression estimator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
grammar
|
Grammar
|
Grammar defining the search space of valid symbolic expressions. |
required |
config
|
FinchConfig | dict[str, Any]
|
Configuration controlling
GE behavior (operators, mutation rates, limits, etc.).
A dictionary will be converted into a |
required |
fitness_functions
|
GEFitnessFunction | list[GEFitnessFunction]
|
Fitness function(s) used to evaluate candidate expressions. |
required |
generations
|
int
|
Number of evolutionary generations. Defaults to 100. |
100
|
population_size
|
int
|
Number of individuals per generation. Defaults to 100. |
100
|
random_state
|
int | None
|
Random seed for reproducibility. |
None
|
fit(X, y)
Run grammatical evolution to discover a symbolic expression that fits the provided data.
This launches the evolutionary search, evaluates individuals using the configured fitness function(s), and stores the resulting GE run along with the best individual (if available).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
X
|
Any
|
Training features. |
required |
y
|
Any
|
Target values. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
GERegressor |
GERegressor
|
The fitted estimator. |
predict(X)
Evaluate the selected symbolic expression on the input data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
X
|
Any
|
Input features. |
required |
Returns:
| Type | Description |
|---|---|
NDArray[float64]
|
NDArray[np.float64]: Predicted target values. |
Raises:
| Type | Description |
|---|---|
RuntimeError
|
If the estimator has not been fitted or no individual has been selected. |
select_individual(individual)
Manually select an individual from the GE population for prediction.
Useful for multi-objective runs where no single best individual is automatically chosen.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
individual
|
Individual
|
Individual whose phenotype will be used as the prediction expression. |
required |
finchge.grammar.range_handlers
ArraySliceHandler()
Bases: RangeHandler
Handles x[:, 0], x[:, 1], etc. This will be useful if we don't have many columns we can just have a grammar like: ::= x[:, 0] | x[:, 1] However, when the number of columns is higher we may need ArraySliceRangeHandler.
ArraySliceRangeHandler()
Bases: RangeHandler
Handles Array slicer with longer range. For example: x[:, 0..4] will expand to x[:, 0], x[:, 1], ..., x[:, 4]
If it's not a range and we just need to pick columns randomly we can use individual array slices.
CharClassHandler()
CharRangeHandler()
NumericRangeHandler()
RangeHandler(priority=0)
Bases: ABC
Base class for all range handlers. This is the interface for different types of range handlers. There are several range handlers currently supported by finchGE BNFGrammarParser. To add custom range support new range handler can be created and should register by calling.
parser.range_registry.register(MyCustomRangeHandler())
All range handlers must inherit from RangeHandler.
pattern
abstractmethod
property
Return regex pattern that matches this range type.
can_handle(token)
abstractmethod
Check if this handler can process the token.
expand(token)
abstractmethod
Expand the range into individual symbols.
RangeHandlerRegistry()
Registry for all range handlers with priority support.
expand_token(token)
Expand a token using the appropriate handler (checking in priority order).
get_combined_pattern()
Get a combined regex pattern for all handlers in priority order.
get_handler(token)
Find a handler that can process the token (checking in priority order).
register(handler)
Register a new handler, maintaining priority order.
SymbolicVariableRangeHandler()
Bases: RangeHandler
Handles variable ranges for evaluation .
Replaces ArraySliceRangeHandler.
Example:
x[0..4] expands to x0, x1, x2, x3, x4
OR
x0..4 expands to x0, x1, x2, x3, x4
SymolicVariableHandler()
Bases: RangeHandler
Handles individual variables like x0, x1, etc in symbolic regression, supporting easy evaluation with SymPy like libraries.
finchge.benchmarks
Benchmark(random_state=None)
Bases: RandomStateMixin, ABC
Abstract base class for all benchmarks.
uses_data()
Check if this benchmark uses traditional train/test data.
Returns:
| Type | Description |
|---|---|
bool
|
True if the benchmark implements _generate_data, False for control problems |
finchge.benchmarks.regression
KozaQuarticBenchmark(random_state=None, train_samples=None, test_samples=None, x_range=None, train_type='uniform', test_type='grid')
Bases: Benchmark
KeijzerBenchmark(version, random_state=None, train_samples=None, test_samples=None)
Bases: Benchmark
NguyenBenchmark(version, random_state=None, train_samples=None, test_samples=None, x_range=None, noise_std=None, train_type='grid', test_type='uniform')
Bases: Benchmark
VladislavlevaBenchmark(version, random_state=None, train_samples=1024, test_samples=5000, x_range=None, noise_std=None)
Bases: Benchmark
finchge.benchmarks.logic
MultiplexerBenchmark(version, random_state=None)
Bases: Benchmark
finchge.benchmarks.control
MazeBenchmark(version='medium', max_steps=None)
Bases: Benchmark
grammar()
Return the Grammar object.
MazeEnvironment(grid, max_steps=200, step_penalty=-1.0, goal_reward=100.0)
Bases: ControlEnvironment
from_version(version='medium')
classmethod
Creating env for given version
get_available_actions()
Returns only the actions that won't result in hitting a wall or going OOB.
MazeInterpreter()
Interpreter for maze navigation programs.
SantaFeTrailBenchmark(random_state=None, max_steps=None)
Bases: Benchmark
Santa Fe Trail (Artificial Ant) benchmark.
The ant must navigate a 32x32 grid with 89 food pellets along a winding trail. Fitness is the number of food pellets eaten within the time limit.
grammar_str()
Return grammar for Santa Fe Trail problem.
SantaFeEnvironment(grid, max_steps, total_food, start_pos, start_dir)
Bases: ControlEnvironment
from_default()
classmethod
Static factory to create environment
get_action_space()
Return complete action space.
get_available_actions()
Return available actions (all actions always available).
SantaFeInterpreter()
Interpreter for Santa Fe ant programs using cleaner grammar.
evaluate(tokens, observation, pc=0)
Evaluate tokenized program.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tokens
|
List[str]
|
List of tokens from tokenize() |
required |
observation
|
bool
|
Current food-ahead sensor value |
required |
pc
|
int
|
Program counter (index into tokens) |
0
|
Returns:
| Type | Description |
|---|---|
Tuple[Optional[str], int]
|
Tuple of (action, new_pc) |
program_to_string(tokens)
Convert tokens back to readable program string. Tokens separated by space
tokenize(program)
Convert program string to list of tokens.
CartPoleBenchmark(random_state=None, max_steps=None, n_episodes=1)
Bases: Benchmark
grammar()
Return the Grammar object.
CartPoleEnvironment(max_steps, gravity, mass_cart, mass_pole, length, force_mag, tau, theta_threshold_degrees, x_threshold, random_state=None)
Bases: ControlEnvironment
CartPoleInterpreter()
Interpreter for cartpole balancing programs.