----------------------------------------------------------------------------
PENTACON NSAID Project Curation
Princeton University, NJ, USA
University of Pennsylvania, PA, USA
----------------------------------------------------------------------------

Readme file name: README Protein_DNA_Interactions_20141009_preliminary.txt

Readme for the following file:
Protein_DNA_Interactions_20141009_preliminary.txt
Version: preliminary
Date: 10/09/2014

Corresponding gene file 1:
AAP_genelist_20130725_production.txt
Version: production
Date: 07/25/2013

Corresponding gene file 2:
BP_AAE_genelist_20130725_preliminary.txt
Version: preliminary
Date: 07/25/2013

Curation Overview
-----------------
Curation was performed by PENTACON curators based on information available in the primary published literature (reviews were excluded). The curation process involved reading papers containing studies on gene regulation and transcription factor binding at regulatory regions of the genes listed in the "Gene Lists" section below. Here, the REGULATOR protein is defined as the protein interacting with the TARGET DNA. Direct DNA binding is annotated, as well as experiments demonstrating that the REGULATOR affects transcription of the TARGET DNA in vivo or in vitro. TARGETs are designated by the PENTACON project as genes involved in arachidonic acid metabolism.

Gene Lists:
-----------
The curation in this file has been carried out for a subset of genes from the following gene lists:

AAP_genelist_20130725_production.txt file:
This gene list includes information on the curated "Gene Set" AAP (Arachidonic Acid Pathway).

BP_AAE_genelist_20130725_preliminary.txt
This gene list contains information on the curated "Gene Set" AAE (Arachidonic Acid Extended), which includes genes related to the arachidonic acid pathway that are not already present in the list of arachidonic acid metabolism genes (AAP_genelist_20130725_production.txt).
It also contains information on the "Gene Set" BP (Blood Pressure), which is used for genes related to the phenotype of blood pressure.

Each gene list includes information on the various curated "Gene Sets" and "Gene Set Qualifiers" that are used to rank the genes as being "Gold Standard (Direct)", "Likely (Indirect)", or "Predicted" based on evidence codes and details for each gene, as described below:

Evidence codes used in gene lists:
The evidence code C is used to denote review articles.
The evidence code E is used for articles that present experimental evidence including, but not limited to, tissue distribution and enzyme characterization.
The evidence code P is used for publications that (1) predict presence based on evidence in mice/rabbits, (2) use bioinformatics tools to identify human genes, or (3) contain non-traceable author statements. Bioinformatics approaches include using conserved sequence motifs to identify candidate genes and using a known human gene to identify sequences with significant identity (and finding the cDNA in an EST database).
If there are multiple PubMed IDs in the PubMed ID column but only one evidence code in the Evidence Type column, all PubMed IDs were assigned the same evidence code. If there are multiple evidence codes, each evidence code corresponds to its respective PMID in the PubMed ID column.

Gold Standard (Direct):
The gene set qualifier "Gold Standard" is assigned when experimental evidence demonstrates involvement of the gene in the arachidonic acid metabolism and arachidonic acid remodeling pathways. Experimental evidence means that an enzyme has been assayed with substrates that are in these pathways, a receptor binds ligands in these pathways, or the protein interacts with another protein in the pathway. Genes assigned the "Gold Standard" gene set qualifier can be used for computational analyses.

Likely (Indirect):
The gene set qualifier "Likely" is assigned when genes ‘likely’ participate in the arachidonic acid metabolism and remodeling pathways. Genes are assigned "Likely" when there is experimental evidence for the predicted/expected activity with a relevant probe substrate, but no definitive experimental evidence for participation in the arachidonic acid metabolism and remodeling pathways. For example, an enzyme expected to be involved in AA remodeling for which activity was demonstrated using palmitic, oleic, or linoleic acid, but not arachidonic acid, would be assigned the gene set qualifier "Likely". These genes can be included in a computational analysis at the programmer's discretion.

Predicted:
The gene set qualifier "Predicted" is assigned when genes have been inferred to be involved in the arachidonic acid metabolism and remodeling pathways based on (1) evidence from other organisms or (2) homology. "Predicted" is assigned to genes for which participation in the arachidonic acid pathway is purely predicted and/or whose gene products have not been characterized. These genes should not be used in a computational analysis.

Curated Isoforms
----------------------
Isoform information for interactors is captured from the literature when possible and specified with UniProt identifiers (UniProt Data Value columns). Isoforms may be products of alternative splicing, or may arise from the use of alternate start sites for transcription or translation; they are considered Alternative Products at UniProt (UniProt Data Type columns). UniProt appends a dash-number suffix (-1, -2, -3, etc.) to IDs to specify the isoform (e.g. P04180-1;Isoform 1 for LCAT). UniProt entries that have only a full-length gene product documented and do not have any Alternative Products specified are not assigned a dash-number suffix at UniProt.
However, in this file, PENTACON assigned the suffix -1 to full-length proteins participating in protein-protein-DNA interactions that involve at least one interactor that is an isoform. For example, in the "REGULATOR, ADDITIONAL UniProt Data Value" column, P18146-1|P19544-2|P19544-4 means that EGR1 (full length), WT1 isoform 2, and WT1 isoform 4 were bound to each other. The Data Type for the entry was set to "Comment/alternative products/isoform". The PENTACON Notes column records this assignment by PENTACON: Full length protein P18146 entered as P18146-1 in Regulator, Additional Data Value. This example refers to PENTACON Annotation No:4000000220 in the curation file.

Curation files:
----------------------
Public file(s):

Protein_DNA_Interactions_20141009_preliminary.txt
    Experimentally determined protein-DNA interactions influencing expression of AAP genes

README Protein_DNA_Interactions_20141009_preliminary.txt
    Readme for above file.

Columns for "Protein_DNA_Interactions_20130816_preliminary.txt" file:
--------------------------------------------------------------
(Notes provided for specific columns where applicable.)

Curator Name

REGULATOR UniProt ID
    ID assigned by UniProt
REGULATOR NCBI Gene ID
    ID assigned by NCBI
REGULATOR Gene Name
    Official gene name (NCBI and UniProt)
REGULATOR Type
REGULATOR Species Name
REGULATOR Taxonomy ID
    NCBI taxonomic identifier
REGULATOR UniProt Data Type
    UniProt category from which isoform information was taken for REGULATOR
REGULATOR UniProt Data Value
    UniProt identifier for REGULATOR protein isoforms
REGULATOR UniProt Version Date
    Date of UniProt record used in this curation
REGULATOR UniProt Version
    UniProt version number for record used in this curation
REGULATOR Part of Complex
    Indicates whether authors describe REGULATOR as part of a complex (whether or not they present data pertaining to the relevant protein-protein interaction(s)); regulators identified specifically as not being part of a complex were also flagged.
REGULATOR Name of Complex
    Indicates name of complex discussed by authors
BioGRID PPI Curation
    Indicates, with the phrase "Protein Complex", that the paper reports protein-protein interactions involving the regulator that have been curated for the BioGRID database. Note that the relevant annotation in BioGRID will also contain the phrase "Protein Complex" for ease of identification.

REGULATOR, ADDITIONAL UniProt ID(s)
    UniProt ID(s) of additional protein(s) affecting REGULATOR's role in modulating gene expression of TARGET, including proteins directly bound to the DNA-bound REGULATOR
REGULATOR, ADDITIONAL NCBI Gene ID(s)
    NCBI ID(s) of additional protein(s) affecting REGULATOR's role in modulating gene expression of TARGET, including proteins directly bound to the DNA-bound REGULATOR
REGULATOR, ADDITIONAL Gene Name(s)
    Official gene name(s) of additional protein(s) affecting REGULATOR's role in modulating gene expression of TARGET, including proteins directly bound to the DNA-bound REGULATOR
REGULATOR, ADDITIONAL Type
REGULATOR, ADDITIONAL Species Name(s)
REGULATOR, ADDITIONAL Taxonomy ID
    NCBI taxonomic identifier of additional protein(s) affecting REGULATOR's role in modulating gene expression of TARGET, including proteins directly bound to the DNA-bound REGULATOR
REGULATOR, ADDITIONAL UniProt Data Type
    UniProt category from which isoform information was taken for REGULATOR, ADDITIONAL
REGULATOR, ADDITIONAL UniProt Data Value
    UniProt identifier for REGULATOR, ADDITIONAL isoforms
REGULATOR, ADDITIONAL UniProt Version Date
    Date of UniProt record used in this curation
REGULATOR, ADDITIONAL UniProt Version
    UniProt version number for record used in this curation

TARGET UniProt ID
    ID assigned by UniProt
TARGET NCBI Gene ID
    ID assigned by NCBI
TARGET Gene Name
    Official gene name (NCBI and UniProt)
TARGET Type
TARGET Species Name
TARGET Taxonomy ID
    NCBI taxonomic identifier
TARGET UniProt Data Type
    UniProt category from which isoform information was taken for TARGET
TARGET UniProt Data Value
    UniProt identifier for TARGET isoforms
TARGET UniProt Version Date
    Date the UniProt record version was created
TARGET UniProt Version
    UniProt version number for record used in this curation

Experiment Type
    Evidence Code Ontology
Experiment Type ID
    Evidence Code Ontology ID
Molecular Interactions Ontology Term
    Molecular Interactions Ontology
Molecular Interactions Ontology ID
GO Term
    Gene Ontology
GO ID
Uberon Term
    Uber Anatomy Ontology
Uberon ID
BRENDA Tissue Ontology Term
    BRENDA - The Comprehensive Enzyme Information System
BRENDA Tissue Ontology ID
Human Disease Ontology Term
    Human Disease Ontology
Human Disease Ontology ID
ChEBI Chemical Term
    Chemical Entities of Biological Interest
CHEBI Chemical ID

Reviewed (Y/N)
    Reviewed by a PENTACON curator (Y=yes, N=no)
Use (Y/N)
    Curation for that row is valid and can be used (Y=yes, N=no)
Source
    Typically the PubMed ID
PENTACON Notes
    Internal PENTACON curator notes

REGULATOR Gene Set
    Abbreviation used to indicate the corresponding gene collection relevant to a certain biological pathway or network in which this protein is designated by PENTACON, e.g. AAP is used for genes in the 'arachidonic acid pathway'. Gene sets were not assigned to non-human genes at the time of this curation.
REGULATOR, ADDITIONAL Gene Set
    Abbreviation used to indicate the corresponding gene collection relevant to a certain biological pathway or network in which this protein is designated by PENTACON, e.g. AAP is used for genes in the 'arachidonic acid pathway'. Gene sets were not assigned to non-human genes at the time of this curation.
TARGET Gene Set
    Abbreviation used to indicate the corresponding gene collection relevant to a certain biological pathway or network in which this protein is designated by PENTACON, e.g. AAP is used for genes in the 'arachidonic acid pathway'. Gene sets were not assigned to non-human genes at the time of this curation.

PENTACON Annotation No
    Unique annotation ID assigned by PENTACON. These IDs exist only in the flat file released 8/16/2013. Some rows contain "PAN" instead of an ID. These are rows that were corrected and need to be assigned a new PAN.
Obsolete PAN
    These are PAN IDs that should be obsoleted because the row they were associated with in the file released 8/16/2013 has been updated/corrected and hence should be assigned a new PAN. Obsolete PANs do not exist in the database; they exist only in the flat file released 8/16/2013.

----------------------------------------------------------------------------
For questions please contact Rose Oughtred (rose at genomics.princeton.edu).
----------------------------------------------------------------------------
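
Usage sketch (illustrative only, not part of the PENTACON release): the column descriptions above suggest two common steps when reading the curation file, namely keeping only rows marked as usable in the "Use (Y/N)" column and splitting a pipe-separated "UniProt Data Value" entry into accessions and isoform numbers. The Python below is a minimal sketch under stated assumptions: it assumes the file is tab-delimited with a header row whose names match the column names listed above, and the helper names split_isoforms and usable_rows are hypothetical. The sample data is built in memory from the EGR1/WT1 example given under "Curated Isoforms".

```python
import csv
import io


def split_isoforms(data_value):
    """Split a pipe-separated UniProt Data Value into (accession, isoform) pairs.

    The isoform is an int taken from the dash-number suffix, or None when the
    token carries no suffix. Any text after ';' (e.g. ';Isoform 1') is ignored.
    """
    pairs = []
    for token in data_value.split("|"):
        token = token.split(";")[0].strip()
        if not token:
            continue
        accession, dash, suffix = token.partition("-")
        isoform = int(suffix) if dash and suffix.isdigit() else None
        pairs.append((accession, isoform))
    return pairs


def usable_rows(handle):
    """Return rows whose 'Use (Y/N)' column is 'Y'.

    Assumption: the file is tab-delimited with a header row.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader
            if (row.get("Use (Y/N)") or "").strip().upper() == "Y"]


# In-memory stand-in for the real file, reduced to two columns.
sample = (
    "REGULATOR, ADDITIONAL UniProt Data Value\tUse (Y/N)\n"
    "P18146-1|P19544-2|P19544-4\tY\n"
    "\tN\n"
)

rows = usable_rows(io.StringIO(sample))
isoforms = split_isoforms(rows[0]["REGULATOR, ADDITIONAL UniProt Data Value"])
print(isoforms)  # [('P18146', 1), ('P19544', 2), ('P19544', 4)]
```

To apply the same sketch to the released file, open it with open(path, newline="") and pass the handle to usable_rows; if the actual delimiter or header names differ, adjust them accordingly.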