Synthetic Transcriptional Regulatory Networks (SynTReN)

Category Cross-Omics>Pathway Analysis/Gene Regulatory Networks/Tools

Abstract SynTReN is a ‘network generator’ that creates synthetic transcriptional regulatory networks (TRNs) and produces simulated gene expression data that approximates experimental data.

‘Network topologies’ are generated by selecting subnetworks from previously described ‘regulatory networks’. Interaction kinetics are modeled by equations based on Michaelis-Menten and Hill kinetics.

The manufacturer's results show that the statistical properties of these topologies more closely approximate those of genuine ‘biological networks’ than do those of different types of random graph models. Several user-definable parameters adjust the complexity of the resulting data set, and thus how difficult it is for structure learning algorithms.

According to the manufacturer, this ‘network generation’ technique offers a valid alternative to existing methods: the topological characteristics of the ‘generated networks’ more closely resemble those of real ‘transcriptional networks’, and the simulation scales well to large networks.

SynTReN models different types of ‘biological interactions’ and produces biologically plausible synthetic gene expression data.

SynTReN's bioinformatics motivation --

The development of algorithms to infer the structure of ‘gene regulatory networks’ (GRNs) based on expression data is an important subject in bioinformatics research.

Validation of these algorithms requires benchmark data sets for which the underlying network is known. Since experimental data sets of the appropriate size and design are usually Not available, there is a clear need to generate well-characterized ‘synthetic data sets’ that allow thorough testing of ‘learning algorithms’ in a fast and reproducible manner.

SynTReN's Network topology --

SynTReN produces synthetic ‘transcriptional regulatory networks’ (TRNs) and corresponding ‘microarray data sets’. In these networks, the nodes represent the genes and the edges correspond to the ‘regulatory interactions’ between the genes at the transcriptional level. The data generation process comprises three (3) essential steps.

In the first step, a ‘network topology’ is selected from a known source network using either of two selection strategies. In the second step, ‘transition functions’ and their parameters are assigned to the edges in the network.

In the third step, ‘mRNA expression’ levels for the genes in the network are simulated under different conditions. After optionally adding noise, a data set representing normalized and scaled microarray measurements is obtained.

SynTReN's Selection of subnetworks -

To generate a ‘network topology’ that resembles a true TRN as closely as possible, ‘network structures’ are selected from previously described ‘biological networks’. The choice of source network is user-definable, and only one source network is used at a time. Two (2) different strategies for selecting a connected subgraph from the source graph are implemented.

In the first strategy, called ‘neighbor addition’, a randomly selected node is chosen as an initial seed. Subsequent nodes are added in an iterative process. Only randomly selected nodes that have at least one connection to the current graph are retained.
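A minimal Python sketch of ‘neighbor addition’ follows, assuming the source network is given as an undirected adjacency dictionary (node → set of neighbors); the function name, data format and toy network are hypothetical, not SynTReN's own API. Instead of drawing arbitrary nodes and discarding unconnected ones, it draws directly from the nodes adjacent to the current subgraph, which gives the same selection distribution without wasted draws.

```python
import random

def neighbor_addition(adj, size, seed=None):
    """Grow a connected subgraph of `size` nodes from a random seed node.

    adj: undirected adjacency dict {node: set_of_neighbors} (assumed format).
    """
    if size > len(adj):
        raise ValueError("requested size exceeds the source network")
    rng = random.Random(seed)
    selected = {rng.choice(sorted(adj))}  # random initial seed node
    while len(selected) < size:
        # Candidates: unselected nodes with at least one edge into the subgraph.
        frontier = sorted(n for n in adj if n not in selected and adj[n] & selected)
        if not frontier:
            raise ValueError("the seed's connected component is too small")
        selected.add(rng.choice(frontier))
    return selected


# Toy source network (hypothetical); edges are treated as undirected.
source = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a"},
    "d": {"b", "e"},
    "e": {"d"},
}
subnetwork = neighbor_addition(source, size=3, seed=1)
```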

In an alternative strategy, called ‘cluster addition’, a randomly selected node and all of its neighbors are selected as an initial graph. In each iteration a randomly selected node and all of its neighbors are added to the graph.

Similarly, only nodes that have at least one connection to the current graph are retained. Because cycles (e.g. feedback loops) are present in the original source network, they can also appear in the generated topology.
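‘Cluster addition’ can be sketched the same way, again assuming an undirected adjacency dictionary and hypothetical names. How SynTReN handles overshooting the requested size is not described here, so this sketch simply stops once the target is reached or exceeded; it only picks nodes whose cluster would add at least one new node, so every iteration makes progress.

```python
import random

def cluster_addition(adj, size, seed=None):
    """Grow a connected subgraph by repeatedly adding a node plus its neighbors.

    adj: undirected adjacency dict {node: set_of_neighbors} (assumed format).
    The result can exceed `size` because whole clusters are added at once.
    """
    rng = random.Random(seed)
    start = rng.choice(sorted(adj))
    selected = {start} | adj[start]  # initial graph: a node and all its neighbors
    while len(selected) < size:
        # Eligible nodes connect to the current graph and would add something new.
        candidates = sorted(
            n for n in adj
            if adj[n] & selected and ({n} | adj[n]) - selected
        )
        if not candidates:
            raise ValueError("the connected component is too small")
        node = rng.choice(candidates)
        selected |= {node} | adj[node]
    return selected


# Toy source network (hypothetical); edges are treated as undirected.
source = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a"},
    "d": {"b", "e"},
    "e": {"d"},
}
subnetwork = cluster_addition(source, size=3, seed=2)
```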

SynTReN's Background network -

In a real biological microarray experiment, it is generally assumed that only part of the genome is triggered by the applied conditions. In the manufacturer's setup, the part of the network Not elicited by the simulated ‘experimental conditions’ is modeled by adding background genes.

These background genes increase the dimension of the data set without being a part of the network to be inferred. Their ‘expression values’ are assumed to be constitutive but change in a correlated way as a result of the ‘biological noise’ modeled in the transition functions.

In this way, the ‘background network’ mimics pathways that are Not influenced by the simulated conditions.

SynTReN's Transition functions -

After generating the topology, transition functions representing the ‘regulatory interactions’ between the genes are assigned to the edges in the network. A transition function defines how the mRNA concentration of a gene depends on the mRNA concentrations of each of its input ‘transcription factors’.

SynTReN's Michaelis-Menten and Hill kinetics -

Non-linear functions based on Michaelis-Menten and Hill enzyme kinetic equations are used to model ‘gene regulation’ in steady-state conditions. As a result of this choice, the generation of expression data scales linearly with the number of genes and therefore allows ‘fast simulation’ of large networks comprising thousands of genes.
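The Python sketch below uses the textbook single-regulator Hill forms, where a Hill coefficient n = 1 gives the Michaelis-Menten case. SynTReN's exact equations and parameterization are not given in this abstract, so these forms and the toy chain are assumptions. Because each gene's steady-state level is computed once from its regulator's level, the cost grows linearly with the number of genes in an acyclic network.

```python
def hill_activation(x, k, n):
    """Normalized output of a gene driven by one activator at level x.

    k is the half-saturation constant and n the Hill coefficient;
    n = 1 gives the Michaelis-Menten form.
    """
    return x**n / (k**n + x**n)


def hill_repression(x, k, n):
    """Normalized output of a gene driven by one repressor at level x."""
    return k**n / (k**n + x**n)


# Steady-state evaluation of a toy acyclic chain g1 -> g2 -| g3:
# each gene is computed once from its regulator.
g1 = 0.8                                # externally set input level
g2 = hill_activation(g1, k=0.5, n=1)    # Michaelis-Menten activation
g3 = hill_repression(g2, k=0.5, n=2)    # Hill-type repression
```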

Biological noise, corresponding to stochastic variations in gene expression that are unrelated to the applied experimental procedures, is modeled by a function based on a log-normal distribution superposed on the ‘kinetic equations’.
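One way to superpose log-normal noise on a kinetic equation is sketched below in Python. The multiplicative form (a log-normal factor with median 1) and the clipping to the normalized range [0, 1] are assumptions of this sketch, not documented SynTReN behavior; `noisy_activation` is a hypothetical name.

```python
import random

def hill_activation(x, k, n):
    """Deterministic single-activator Hill response (textbook form)."""
    return x**n / (k**n + x**n)


def noisy_activation(x, k, n, sigma, rng):
    """Hill response times a log-normal factor with median 1.

    The multiplicative form and the clipping to [0, 1] are assumptions
    of this sketch, chosen to keep levels in the normalized range.
    """
    value = hill_activation(x, k, n) * rng.lognormvariate(0.0, sigma)
    return min(max(value, 0.0), 1.0)


rng = random.Random(0)
# Repeated draws scatter around the noiseless value of 0.5.
samples = [noisy_activation(0.5, k=0.5, n=2, sigma=0.1, rng=rng) for _ in range(1001)]
```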

SynTReN's Choosing interactions types -

‘Regulatory interactions’ between genes can be either activating or inhibiting. When a given gene interacts with more than one regulator, the ‘different regulators’ can either act independently or exhibit more complex effects on their target genes, such as cooperativity, synergism or antagonism. SynTReN implements several of these interaction types.

For each combination of a gene and its regulators, an appropriate enzyme kinetic equation is selected, depending on the number of activators and repressors and on user-defined settings that control the fraction of ‘complex interactions’ (see ‘Generator parameters’ below).
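The selection step can be illustrated with the hypothetical Python function below. The labels and rules are assumptions made for illustration: a single regulator always gets a simple activation or repression equation, antagonism is offered only when both activators and repressors are present, and the user-defined complex fraction is treated as the probability of choosing a complex type for genes with several regulators.

```python
import random

# Hypothetical labels for the interaction types named in the text.
COMPLEX_TYPES = ("cooperativity", "synergism", "antagonism")

def choose_interaction(n_activators, n_repressors, complex_fraction, rng):
    """Pick an interaction type for one target gene and its regulators.

    All rules here are assumptions of this sketch, not SynTReN's own logic.
    """
    n_regulators = n_activators + n_repressors
    if n_regulators == 0:
        return "constitutive"
    if n_regulators == 1:
        return "activation" if n_activators else "repression"
    if rng.random() < complex_fraction:
        # Antagonism only makes sense with opposing regulators (assumption).
        options = [t for t in COMPLEX_TYPES
                   if t != "antagonism" or (n_activators and n_repressors)]
        return rng.choice(options)
    return "independent"


rng = random.Random(0)
kind = choose_interaction(n_activators=2, n_repressors=1, complex_fraction=0.3, rng=rng)
```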

SynTReN's Setting transition function parameters -

Choosing realistic parameter settings for these equations is a nontrivial task. Except for a few well-characterized networks, No data about the parameters of the Michaelis-Menten and Hill functions is available. Therefore, the value of each parameter is chosen from a distribution that allows the wide variation in ‘interaction kinetics’ likely to occur in true networks (including linear activation functions, sigmoid functions, ...), while avoiding very steep transition functions.
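A minimal sketch of "wide variation without very steep curves" is rejection sampling on the slope of the Hill curve. For f(x) = xⁿ/(kⁿ + xⁿ), the slope at the half-saturation point x = k is n/(4k), so draws whose slope there exceeds a cap are rejected. The uniform ranges and the cap below are illustrative assumptions, not SynTReN's actual distributions; Hill coefficients near 1 give gently saturating responses and larger ones give sigmoid responses.

```python
import random

def sample_hill_parameters(rng, k_range=(0.1, 1.0), n_range=(1.0, 4.0), max_slope=5.0):
    """Draw (k, n) for a Hill function, rejecting overly steep curves.

    The slope of x**n / (k**n + x**n) at x = k equals n / (4 * k).
    Ranges and the slope cap are assumptions of this sketch.
    """
    while True:
        k = rng.uniform(*k_range)
        n = rng.uniform(*n_range)
        if n / (4.0 * k) <= max_slope:
            return k, n


rng = random.Random(42)
params = [sample_hill_parameters(rng) for _ in range(500)]
```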

SynTReN's Sampling data --

The following describes how a ‘gene expression’ data set is obtained by simulating the ‘synthetic network’ under different simulated experimental conditions.

SynTReN's Generating gene expression data -

When generating data, the manufacturer assumes that the expression of the genes depends on how changes in external conditions trigger the network.

External conditions are modeled by choosing a ‘gene set’ whose genes lack regulatory inputs and setting their expression levels to a different value for each experiment, simulating the response to changing experimental conditions. The remaining genes without ‘regulatory inputs’ are assigned a random constitutive expression level.
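The sampling scheme can be sketched in Python for a small acyclic network. The edge format, the `simulate` function and the rule of combining several regulators by multiplying their individual Hill responses are hypothetical choices for illustration; cycles, which SynTReN topologies may contain, are not handled here (the topological sort raises an error on them). Input genes get a fresh random level per experiment, while other regulator-free genes keep one constitutive level.

```python
import random
from graphlib import TopologicalSorter  # Python 3.9+

def hill(x, k, n, activating):
    """Single-regulator Hill response; repression is 1 minus activation."""
    ratio = x**n / (k**n + x**n)
    return ratio if activating else 1.0 - ratio


def simulate(edges, input_genes, n_experiments, seed=None):
    """Sample steady-state expression for an acyclic toy network.

    edges: {target: [(regulator, k, n, activating), ...]} (hypothetical format).
    Multiplying the regulators' individual responses is an assumption.
    """
    rng = random.Random(seed)
    deps = {gene: [r[0] for r in regs] for gene, regs in edges.items()}
    order = list(TopologicalSorter(deps).static_order())  # regulators first
    roots = [g for g in order if not edges.get(g)]
    constitutive = {g: rng.random() for g in roots if g not in input_genes}
    experiments = []
    for _ in range(n_experiments):
        level = dict(constitutive)
        for gene in input_genes:
            level[gene] = rng.random()  # new external condition per experiment
        for gene in order:
            if gene in level:
                continue
            value = 1.0
            for reg, k, n, activating in edges[gene]:
                value *= hill(level[reg], k, n, activating)
            level[gene] = value
        experiments.append(level)
    return experiments


toy_edges = {
    "g2": [("g1", 0.5, 2, True)],                         # g1 activates g2
    "g3": [("g1", 0.5, 1, True), ("g4", 0.3, 2, False)],  # g4 represses g3
}
data = simulate(toy_edges, input_genes=["g1"], n_experiments=5, seed=0)
```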

SynTReN's Adding noise -

After sampling from the network, a data set with mRNA expression levels for all genes is obtained for different simulated conditions. All gene expression values are normalized between 0 and 1, where 0 indicates that No transcription occurred and 1 refers to a maximal level of transcription.

Besides the biological noise, microarrays are subject to random experimental noise. This experimental noise is added to the simulated microarray data and is approximated by a log-normal distribution.
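Experimental noise acts on the final measurements rather than inside the transition functions. The hypothetical Python function below applies an independent log-normal factor (median 1) to each value of a genes-by-samples matrix and clips the result to [0, 1]; the multiplicative form and the clipping are assumptions of this sketch.

```python
import random

def add_experimental_noise(data, sigma, seed=None):
    """Return a noisy copy of a genes-by-samples matrix of levels in [0, 1].

    Each measurement is multiplied by an independent log-normal factor
    (median 1) and clipped back to [0, 1] (assumed noise model).
    """
    rng = random.Random(seed)
    return [
        [min(max(v * rng.lognormvariate(0.0, sigma), 0.0), 1.0) for v in row]
        for row in data
    ]


clean = [[0.2, 0.5, 0.9], [0.0, 1.0, 0.4]]  # 2 genes x 3 samples (toy values)
noisy = add_experimental_noise(clean, sigma=0.2, seed=3)
```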

SynTReN's Generator parameters -

To benchmark an algorithm, it is useful to have access to data sets of increasing complexity. Experience shows that, in real data, the difficulty of the ‘structure learning’ task is largely determined by the topology of the network to be inferred and by the types of regulatory interactions present. For example, more data is required to resolve interactions that are Not fully exercised.

Initial performance testing of an algorithm can be done on rather easy problems (e.g. small, noiseless networks without synergism or cooperativity between regulators). Increasingly difficult data sets can then be generated to further optimize the ‘inference method’.

The following parameters controlling the ‘gene network generation’ and sampling process are user-definable:

1) The choice of source network;

2) The size of the network in number of nodes;

3) The number of background nodes;

4) The number of available experiments and samples for each condition;

5) The level of stochastic and experimental noise; and

6) The fraction of complex interactions.
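The six settings above can be collected in a small configuration object, for example to define an easy benchmark and a harder one. The Python dataclass below is hypothetical: its field names, defaults and validation rules are illustrative and do not reproduce SynTReN's own parameter file or option names.

```python
from dataclasses import dataclass

@dataclass
class GeneratorSettings:
    """Hypothetical container for the user-definable generator settings."""
    source_network: str = "source_network.txt"  # 1) choice of source network
    n_nodes: int = 20                            # 2) network size in nodes
    n_background: int = 0                        # 3) number of background nodes
    n_experiments: int = 10                      # 4) experiments ...
    samples_per_condition: int = 1               #    ... and samples per condition
    biological_noise: float = 0.0                # 5) stochastic noise level
    experimental_noise: float = 0.0              #    experimental noise level
    complex_fraction: float = 0.0                # 6) fraction of complex interactions

    def __post_init__(self):
        if self.n_nodes < 1 or self.n_experiments < 1 or self.samples_per_condition < 1:
            raise ValueError("sizes and counts must be positive")
        if self.n_background < 0:
            raise ValueError("n_background cannot be negative")
        if min(self.biological_noise, self.experimental_noise) < 0:
            raise ValueError("noise levels cannot be negative")
        if not 0.0 <= self.complex_fraction <= 1.0:
            raise ValueError("complex_fraction must lie in [0, 1]")


# An easy benchmark (small, noiseless, no complex interactions) and a harder one.
easy = GeneratorSettings(n_nodes=10)
hard = GeneratorSettings(n_nodes=200, n_background=100, biological_noise=0.1,
                         experimental_noise=0.1, complex_fraction=0.3)
```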

System Requirements

Contact manufacturer.

Manufacturer

Manufacturer Web Site SynTReN

Price Contact manufacturer.

G6G Abstract Number 20584

G6G Manufacturer Number 104187