Info for Researchers

Genestorian will be an Open Source web application that will allow researchers to organise collections of model organism strains and recombinant DNA.

Tracking the history of plasmids and strains

The focus of Genestorian is on tracking the history of plasmids and strains in a collection. This involves keeping track of two lineages:

  • The lineages of sequences: How new sequences (in alleles or recombinant DNA) were generated from existing ones.

  • The lineages of strains: What alleles exist in a given strain, and how they were acquired.

Let’s imagine a typical case, in which we generate a new strain by integrating a fragment from a plasmid into a given locus. Then, we mate that strain to generate a new strain that contains a combination of the alleles of the parents:

[Figure: genealogies]
  1. First, a new strain is generated by integrating a fragment of a plasmid into the genome. This creates a new allele sequence, and a new strain.
[Figure: genealogies]
  2. Second, a new strain is generated through mating. No new sequences are generated, but a new strain is generated in which existing alleles are combined in a new way.
[Figure: genealogies]

If we keep track of the lineages of strains and sequences, we can easily navigate our collection and find out how an allele was generated, in which strains it is present, its sequence, etc. The ultimate objective is to be able to track the origin of all sequences of our collection up to the wild-type sequences that generated them, or the sequences that were received from a repository or collaborator.
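As a rough illustration of the two lineages described above, the sketch below models sequences and strains as plain Python records that point to their parents, and replays the two-step example (integration, then mating). All class, field, and function names here are hypothetical and chosen for illustration; they are not Genestorian's actual data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Sequence:
    """A plasmid or allele sequence and the sequences it was derived from."""
    name: str
    parents: list[Sequence] = field(default_factory=list)


@dataclass
class Strain:
    """A strain, the alleles it carries, and the strains it was derived from."""
    name: str
    alleles: list[Sequence] = field(default_factory=list)
    parents: list[Strain] = field(default_factory=list)


def sequence_origins(seq: Sequence) -> set[str]:
    """Walk up the sequence lineage and return the names of the root sequences."""
    if not seq.parents:
        return {seq.name}
    origins: set[str] = set()
    for parent in seq.parents:
        origins |= sequence_origins(parent)
    return origins


def strains_with_allele(strains: list[Strain], allele_name: str) -> list[str]:
    """Return the names of the strains that carry a given allele."""
    return [s.name for s in strains if any(a.name == allele_name for a in s.alleles)]


# Step 1: integrating a plasmid fragment creates a new allele and a new strain.
wild_type_locus = Sequence("wild-type locus")
plasmid = Sequence("plasmid from repository")
new_allele = Sequence("tagged allele", parents=[wild_type_locus, plasmid])
parent_strain = Strain("wild-type strain")
strain_a = Strain("strain A", alleles=[new_allele], parents=[parent_strain])

# Step 2: mating creates a new strain but no new sequences.
other_allele = Sequence("other allele")
strain_b = Strain("strain B", alleles=[other_allele])
strain_c = Strain("strain C", alleles=[new_allele, other_allele], parents=[strain_a, strain_b])

collection = [parent_strain, strain_a, strain_b, strain_c]
print(sequence_origins(new_allele))
print(strains_with_allele(collection, "tagged allele"))
```

Keeping sequences and strains as separate records mirrors the point made above: mating adds a strain to the collection without adding any sequence, while integration adds both.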

Want more details?

Prototypes of the functionality

Prototype of strain history

Below you can find a video and a link to a prototype of Genestorian for strain collections. By clicking on a strain, we can access its alleles, and the related strains. We can also click on the alleles in the strain, and find out how they were generated, etc. Note that this prototype is only intended to show the use-case of allele tracking in fission yeast, and does not generate new sequences from parent sequences in silico.

Prototype of sequence history

Below you can find a video and a link to a prototype of ShareYourCloning, a tool that will be integrated into Genestorian and used to document the generation of new sequences from existing ones. At this point it supports only a small number of cloning strategies, but it illustrates the use-case.

More about lineage tracking

The application will have separate functionality for strains and recombinant DNA, so you will be able to use it solely for recombinant DNA if your lab does not use strains (see figure below).

The figure below represents how the information will be stored in the database. Biological resources will be mapped to entities in a database. Abstract entities in the database, referred to as sources, will store the information about the strain/plasmid generation process. Genestorian will leverage bioinformatic resources to access genomic DNA sequences (like the Ensembl API) or recombinant DNA repositories (like Addgene), and to generate allele and plasmid sequences from existing ones (in silico molecular biology libraries).

[Figure: database_image]

Open Source tools we are using to build Genestorian
  • To do cloning in silico we use pydna.
  • To show sequence maps we use openVectorEditor.
  • To store the information about molecular cloning, we use JSON for now, but in the future we want to move to SBOL. SBOL is a web standard with rich functionality for describing sequence features, well beyond what conventional formats such as GenBank (.gb) or SnapGene’s .dna format offer.

SBOL provides abstract functionality for linking entities to their ancestors through an experimental step, but no specific implementations of concrete cloning steps exist. For example, for PCR amplification, there is no established way either to link the template sequence to the primers or to specify primer binding sites. Consequently, no software tools are available to systematically generate new entities from existing ones by reading experimental information. This also means that there is no way to verify that the relationships among entities described in a file are correct. We would like to work together with SBOL developers to extend this functionality, but developing a standard requires the consensus of a community and the exploration of many use-cases.
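To make the PCR example concrete, here is a minimal sketch of what "linking the template to the primers" and "verifying a relationship" could look like. It uses only the Python standard library and a deliberately naive model (a linear template and primers that match it exactly over their whole length). It is not how pydna, SBOL, or Genestorian do this, and the record keys (`forward_binding_site`, `reverse_binding_site`, `product`) are hypothetical.

```python
import json


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def simulate_pcr(template: str, fwd_primer: str, rev_primer: str) -> dict:
    """Locate exact primer binding sites on a linear template and build a PCR record.

    Simplification: only the first perfect, full-length match of each primer
    is considered; real primers may carry tails or mismatches.
    """
    template = template.upper()
    fwd = fwd_primer.upper()
    rev_site = reverse_complement(rev_primer)  # reverse primer as read on the top strand
    fwd_start = template.find(fwd)
    if fwd_start == -1:
        raise ValueError("forward primer does not bind the template")
    rev_start = template.find(rev_site, fwd_start + len(fwd))
    if rev_start == -1:
        raise ValueError("reverse primer does not bind downstream of the forward primer")
    rev_end = rev_start + len(rev_site)
    return {
        "type": "PCR",
        "forward_binding_site": [fwd_start, fwd_start + len(fwd)],
        "reverse_binding_site": [rev_start, rev_end],
        "product": template[fwd_start:rev_end],
    }


def verify_pcr(record: dict, template: str, fwd_primer: str, rev_primer: str) -> bool:
    """Check that a documented PCR product is what the template and primers would give."""
    try:
        expected = simulate_pcr(template, fwd_primer, rev_primer)["product"]
    except ValueError:
        return False
    return expected == record["product"].upper()


# Made-up sequences, for illustration only.
template = "GGATCCAAATTTCCCGGGTTTAAAGAATTC"
fwd_primer = "AAATTT"
rev_primer = "AAACCC"  # binds the bottom strand opposite GGGTTT

record = simulate_pcr(template, fwd_primer, rev_primer)
print(json.dumps(record, indent=2))
print(verify_pcr(record, template, fwd_primer, rev_primer))
```

Because the record stores both the inputs' binding sites and the product, a file describing this step can be re-checked later by recomputing the product, which is the kind of verification the paragraph above says is currently missing.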

To keep the development of Genestorian lean, we plan to adopt SBOL progressively. As a first step, we will keep our custom schema to describe the cloning steps and use SBOL for the sequences and their features. In the future, if consensus schemas emerge, we will introduce them into Genestorian.

Migrating your collection to Genestorian

As researchers, we understand that this might sound interesting, but that moving your collection from a spreadsheet or equivalent software might seem overwhelming.

[Figure: database_image]

We will develop a progressive adoption system for strain collections, where strains in the old text format and in the new format can coexist. Strains will undergo up to four steps of documentation (see figure above):

  1. Ancestry and progeny.
  2. Identification of plasmids and alleles listed in the genotype, along with their provenance.
  3. Identification of resources used to modify the genomic DNA and generate a new allele.
  4. DNA sequences of the alleles.

Starting with a strain spreadsheet, steps 1 and 2 can be partially automated. The rest will depend on the available information, but after step 2 it becomes possible to trace the inheritance of alleles and plasmids, which is a major improvement over spreadsheets. Importantly, migrating a small fraction of key strains will be sufficient to get started. For example, among the ~12,000 strains in the collection of my current laboratory, only 118 have been used more than 20 times to generate new strains in the past 12 years.
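One way to find such key strains is to count how often each strain appears as a parent in the spreadsheet. The sketch below assumes a CSV export with hypothetical `strain_id` and `parents` columns, with parents separated by semicolons; the rows are made-up example data, and a real spreadsheet will likely need its own parsing rules.

```python
import csv
import io
from collections import Counter

# Made-up spreadsheet export: one row per strain, parents separated by ";".
spreadsheet = """strain_id,parents,genotype
S1,,h- wild type
S2,S1,h- ade6-M210
S3,S1,h+ leu1-32
S4,S2;S3,h+ ade6-M210 leu1-32
S5,S1;S4,h- leu1-32
"""


def count_parent_usage(csv_text: str) -> Counter:
    """Count how many times each strain was used as a parent of another strain."""
    usage: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        for parent in row["parents"].split(";"):
            parent = parent.strip()
            if parent:  # skip strains with no recorded parents
                usage[parent] += 1
    return usage


def key_strains(usage: Counter, min_uses: int) -> list[str]:
    """Return the strains used as a parent at least `min_uses` times, sorted by name."""
    return sorted(strain for strain, uses in usage.items() if uses >= min_uses)


usage = count_parent_usage(spreadsheet)
print(usage.most_common())
print(key_strains(usage, min_uses=2))
```

With a real collection, raising `min_uses` gives a short list of strains to migrate first.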

Planned features

  • Possibility to add experimental evidence that a cloning step was successfully completed (e.g. a sequencing file or a gel electrophoresis image).
  • Parallel collection of resources that are planned or in production (have not yet been experimentally generated or verified). These will be selectable for further manipulations, but will not be part of the main collection.
  • Exporting a subset of your collection for a publication in PDF, Word, LaTeX or plain text.
  • Integration with Electronic Laboratory Notebooks, so that you can reference the strains from your Genestorian collection when you document your experiments, or in the notebook's inventory management.
  • … any ideas? Write to us at genestorian@gmail.com.

How will I be able to use Genestorian in the future?

All the features of Genestorian will be free and Open Source, so you will be able to install it on your institution's servers or in the cloud. We also plan to offer hosting and pro support as a service in the future.