Inventa


a computational tool to discover chemical novelty in natural extracts libraries

Version 1.0

View the Project on GitHub luigiquiros/inventa

Getting started

To run Inventa, the minimum inputs needed are:

This will run at least the Feature component (FC).

Optionally, the following inputs can be added:

Where to start?

Check the inputs

1.1 Metadata table:

The standard GNPS format is preferred:

metadata_filename: it uses the GNPS format.

When creating the ‘metadata’ table, some columns are MANDATORY:

Taxonomy: the species, genus and family are needed if the LC and CC are to be computed. The taxonomy should be cleaned to currently accepted names; you can use the Open Tree of Life for this.

The header of each column can follow the GNPS format or the user’s preferences; however, the following parameters need to be indicated:

        species_column = 'yourspeciesnamecolumn'  #ATTRIBUTE_species
        genus_column =   'yourgenusnamecolumn'    #ATTRIBUTE_genus
        family_column =  'yourfamilynamecolumn'   #ATTRIBUTE_family
        organe_column =  'yourorganenamecolumn'   #ATTRIBUTE_organie
        filename_header = 'yourfilenamecolumn'    #filename

The organe_column should be specified if you have different parts (or solvents) from the same species. If you prefer to use only the filename as the identifier for the results, this can be specified directly in the notebook.
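Before running the notebook, it can help to confirm that the column names you configured actually exist in the metadata table. The following is a minimal sketch, not part of Inventa itself: the helper `missing_metadata_columns`, the tab-separated layout and the two example rows are assumptions made for illustration, and the column values are the GNPS-style names shown in the comments above.

```python
import io
import pandas as pd

# Column settings as described above; the values are the GNPS-style
# names given in the comments of the configuration snippet.
species_column = "ATTRIBUTE_species"
genus_column = "ATTRIBUTE_genus"
family_column = "ATTRIBUTE_family"
filename_header = "filename"


def missing_metadata_columns(metadata, required):
    """Return the required column names that are absent from the table.

    Hypothetical helper for illustration; it is not an Inventa function.
    """
    return [col for col in required if col not in metadata.columns]


# Hypothetical two-row metadata table (assumed to be tab-separated).
example_tsv = (
    "filename\tATTRIBUTE_species\tATTRIBUTE_genus\tATTRIBUTE_family\n"
    "sample_01.mzML\tCatha edulis\tCatha\tCelastraceae\n"
    "sample_02.mzML\tCelastrus orbiculatus\tCelastrus\tCelastraceae\n"
)
metadata = pd.read_csv(io.StringIO(example_tsv), sep="\t")

required = [filename_header, species_column, genus_column, family_column]
missing = missing_metadata_columns(metadata, required)
if missing:
    print("Missing metadata columns:", missing)
```

With your own data, replace the in-memory example with `pd.read_csv(metadata_filename, sep="\t")` (assuming a tab-separated file) and add `organe_column` to `required` if you use it.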

1.2 Feature quantitative table:

quantitative_data_filename: MZmine output format using only the ‘Peak area’, ‘row m/z’ and ‘row retention time’ columns.

-Inventa takes input directly from MZmine 2 or MZmine 3. It is possible to use other processing software; however, the input should then be manually formatted to the MZmine 2 format.

-Inventa is capable of performing the calculations based on the results from Ion Identity, reducing the total number of features.

        # Drop a column by name; replace the placeholder with the actual column name
        df.drop('name of the column', axis=1, inplace=True)
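As an alternative to dropping unwanted columns one at a time, the table can be reduced to the expected columns in a single step. This is only a sketch under stated assumptions: the helper `keep_expected_columns` is hypothetical, the example table is invented, and it assumes that peak-area columns end with "Peak area" and that a "row ID" column is present and should be kept (the latter is not stated above).

```python
import io
import pandas as pd

# Hypothetical MZmine-style export with one extra column ("Peak height").
example_csv = (
    "row ID,row m/z,row retention time,"
    "sample_01.mzML Peak area,sample_01.mzML Peak height\n"
    "1,301.1410,2.35,15000.0,900.0\n"
    "2,455.3520,5.10,0.0,0.0\n"
)
quant = pd.read_csv(io.StringIO(example_csv))

# Fixed columns to keep; "row ID" is an assumption, the other two are
# the columns named in the description above.
FIXED_COLUMNS = ("row ID", "row m/z", "row retention time")


def keep_expected_columns(table):
    """Keep the fixed columns plus every column ending in 'Peak area'.

    Hypothetical helper for illustration; it is not an Inventa function.
    """
    keep = [
        col for col in table.columns
        if col in FIXED_COLUMNS or col.endswith("Peak area")
    ]
    return table[keep]


quant_clean = keep_expected_columns(quant)
```

Selecting the columns to keep, rather than listing the ones to drop, avoids having to know in advance which extra columns a given export contains.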

1.3 In silico annotation using timaR:

tima_results_filename: timaR reweighted (“reponderated”) output format.

1.4 Chemical taxonomy results:

canopus_npc_summary_filename: Sirius CANOPUS recomputed output format.

This output needs an additional step after running Sirius; please follow these instructions:

        from canopus import Canopus
        C = Canopus(sirius="sirius_projectspace")
        C.npcSummary().to_csv("npc_summary.tsv")

1.5 Annotations with Sirius:

sirius_annotations_filename: Sirius annotations output format, containing the ZODIAC and COSMIC results.

1.6 MEMO dissimilarity matrix:

vectorized_data_filename: MEMO package output format.

Examples of all these inputs can be found in /format_examples
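Once the filenames are set, a quick check that each configured file exists can catch path mistakes before the notebook runs. The sketch below is an illustration only: `find_missing_inputs` is a hypothetical helper, the paths are placeholders, and optional inputs that are not used are assumed to be set to `None`.

```python
from pathlib import Path


def find_missing_inputs(paths):
    """Return the names of configured inputs whose files do not exist.

    Entries set to None are treated as unused optional inputs and skipped.
    Hypothetical helper for illustration; it is not an Inventa function.
    """
    return [
        name for name, path in paths.items()
        if path is not None and not Path(path).is_file()
    ]


# Parameter names come from the sections above; the paths are placeholders.
inputs = {
    "metadata_filename": "data/metadata.tsv",
    "quantitative_data_filename": "data/quantitative_table.csv",
    "tima_results_filename": None,         # optional input, not used here
    "canopus_npc_summary_filename": None,  # optional input, not used here
    "sirius_annotations_filename": None,   # optional input, not used here
    "vectorized_data_filename": None,      # optional input, not used here
}

missing = find_missing_inputs(inputs)
if missing:
    print("Input files not found:", missing)
```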

Continue to Configurations and running

Back to home page