2025-02-27
Las bases de datos de tipo ChipDb, OrgDb y TxDb heredan todos los métodos de la clase AnnotationDbi::AnnotationDb.
Podemos aplicar los métodos: columns, keytypes,keys y select.
Cargamos AnnotationDbi y un paquete de anotación, por ejemplo, hgu133a.db.
[1] "hgu133a" "hgu133a_dbconn" "hgu133a_dbfile"
[4] "hgu133a_dbInfo" "hgu133a_dbschema" "hgu133a.db"
[7] "hgu133aACCNUM" "hgu133aALIAS2PROBE" "hgu133aCHR"
[10] "hgu133aCHRLENGTHS" "hgu133aCHRLOC" "hgu133aCHRLOCEND"
[13] "hgu133aENSEMBL" "hgu133aENSEMBL2PROBE" "hgu133aENTREZID"
[16] "hgu133aENZYME" "hgu133aENZYME2PROBE" "hgu133aGENENAME"
[19] "hgu133aGO" "hgu133aGO2ALLPROBES" "hgu133aGO2PROBE"
[22] "hgu133aMAP" "hgu133aMAPCOUNTS" "hgu133aOMIM"
[25] "hgu133aORGANISM" "hgu133aORGPKG" "hgu133aPATH"
[28] "hgu133aPATH2PROBE" "hgu133aPFAM" "hgu133aPMID"
[31] "hgu133aPMID2PROBE" "hgu133aPROSITE" "hgu133aREFSEQ"
[34] "hgu133aSYMBOL" "hgu133aUNIPROT"
Con el nombre del paquete también tenemos información.
ChipDb object:
| DBSCHEMAVERSION: 2.1
| Db type: ChipDb
| Supporting package: AnnotationDbi
| DBSCHEMA: HUMANCHIP_DB
| ORGANISM: Homo sapiens
| SPECIES: Human
| MANUFACTURER: Affymetrix
| CHIPNAME: Affymetrix HG-U133A Array
| MANUFACTURERURL: http://www.affymetrix.com
| EGSOURCEDATE: 2021-Apr14
| EGSOURCENAME: Entrez Gene
| EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| CENTRALID: ENTREZID
| TAXID: 9606
| GOSOURCENAME: Gene Ontology
| GOSOURCEURL: http://current.geneontology.org/ontology/go-basic.obo
| GOSOURCEDATE: 2021-02-01
| GOEGSOURCEDATE: 2021-Apr14
| GOEGSOURCENAME: Entrez Gene
| GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| KEGGSOURCENAME: KEGG GENOME
| KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
| KEGGSOURCEDATE: 2011-Mar15
| GPSOURCENAME: UCSC Genome Bioinformatics (Homo sapiens)
| GPSOURCEURL:
| GPSOURCEDATE: 2021-Feb16
| ENSOURCEDATE: 2021-Feb16
| ENSOURCENAME: Ensembl
| ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
| UPSOURCENAME: Uniprot
| UPSOURCEURL: http://www.UniProt.org/
| UPSOURCEDATE: Mon Apr 26 21:53:12 2021
En los paquetes de anotación tenemos columns. Algunas de estas columnas pueden ser keys. Podemos realizar consultas en la base de datos utilizando una key y pedir que nos devuelva una o más de una columns.
¿Qué información podemos recuperar utilizando select?
[1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
[6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
[11] "GENETYPE" "GO" "GOALL" "IPI" "MAP"
[16] "OMIM" "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM"
[21] "PMID" "PROBEID" "PROSITE" "REFSEQ" "SYMBOL"
[26] "UCSCKG" "UNIPROT"
No todas las variables que hemos obtenido con columns son utilizables para realizar consultas.
Aquellas utilizables para las consultas las podemos conocer con keytypes.
A estas variables las llamamos llaves keys.
[1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
[6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
[11] "GENETYPE" "GO" "GOALL" "IPI" "MAP"
[16] "OMIM" "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM"
[21] "PMID" "PROBEID" "PROSITE" "REFSEQ" "SYMBOL"
[26] "UCSCKG" "UNIPROT"
[1] "10" "100" "1000" "10000" "100008586" "10001"
[1] "ENSG00000121410" "ENSG00000175899" "ENSG00000291190" "ENSG00000171428"
[5] "ENSG00000156006" "ENSG00000196136"
[1] "208696_at" "221786_at" "210096_at" "207957_s_at" "201862_s_at"
AnnotationDbi::select(hgu133a.db,keys=ids,columns=c("ENTREZID","ENSEMBL",
"SYMBOL"),keytype="PROBEID")
PROBEID ENTREZID ENSEMBL SYMBOL
1 208696_at 22948 ENSG00000150753 CCT5
2 221786_at 387263 ENSG00000185127 C6orf120
3 210096_at 1580 ENSG00000142973 CYP4B1
4 207957_s_at 5579 ENSG00000166501 PRKCB
5 201862_s_at 9208 ENSG00000124831 LRRFIP1
¿Qué sondas tienen correspondencia en Entrez?
Por ejemplo, la primera posición de la lista nos da el identificador de Affymetrix y su identificador Entrez.
[1] "hgu133a" "hgu133a_dbconn" "hgu133a_dbfile"
[4] "hgu133a_dbInfo" "hgu133a_dbschema" "hgu133a.db"
[7] "hgu133aACCNUM" "hgu133aALIAS2PROBE" "hgu133aCHR"
[10] "hgu133aCHRLENGTHS" "hgu133aCHRLOC" "hgu133aCHRLOCEND"
[13] "hgu133aENSEMBL" "hgu133aENSEMBL2PROBE" "hgu133aENTREZID"
[16] "hgu133aENZYME" "hgu133aENZYME2PROBE" "hgu133aGENENAME"
[19] "hgu133aGO" "hgu133aGO2ALLPROBES" "hgu133aGO2PROBE"
[22] "hgu133aMAP" "hgu133aMAPCOUNTS" "hgu133aOMIM"
[25] "hgu133aORGANISM" "hgu133aORGPKG" "hgu133aPATH"
[28] "hgu133aPATH2PROBE" "hgu133aPFAM" "hgu133aPMID"
[31] "hgu133aPMID2PROBE" "hgu133aPROSITE" "hgu133aREFSEQ"
[34] "hgu133aSYMBOL" "hgu133aUNIPROT"
ids = c("39730_at", "1635_at", "1674_at", "40504_at", "40202_at")
pacman::p_load("hgu95av2.db")
columns(hgu95av2.db)
[1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
[6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
[11] "GENETYPE" "GO" "GOALL" "IPI" "MAP"
[16] "OMIM" "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM"
[21] "PMID" "PROBEID" "PROSITE" "REFSEQ" "SYMBOL"
[26] "UCSCKG" "UNIPROT"
[1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
[6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
[11] "GENETYPE" "GO" "GOALL" "IPI" "MAP"
[16] "OMIM" "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM"
[21] "PMID" "PROBEID" "PROSITE" "REFSEQ" "SYMBOL"
[26] "UCSCKG" "UNIPROT"
columns = c("PFAM","SYMBOL")
AnnotationDbi::select(hgu95av2.db, keys=ids, columns, keytype="PROBEID")
PROBEID PFAM SYMBOL
1 39730_at PF07714 ABL1
2 39730_at PF00017 ABL1
3 39730_at PF08919 ABL1
4 39730_at PF00018 ABL1
5 1635_at PF07714 ABL1
6 1635_at PF00017 ABL1
7 1635_at PF08919 ABL1
8 1635_at PF00018 ABL1
9 1674_at PF00017 YES1
10 1674_at PF07714 YES1
11 1674_at PF00018 YES1
12 40504_at PF01731 PON2
13 40202_at PF00096 KLF9
[1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
[6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
[11] "GENETYPE" "GO" "GOALL" "IPI" "MAP"
[16] "OMIM" "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM"
[21] "PMID" "PROSITE" "REFSEQ" "SYMBOL" "UCSCKG"
[26] "UNIPROT"
[1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
[6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
[11] "GENETYPE" "GO" "GOALL" "IPI" "MAP"
[16] "OMIM" "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM"
[21] "PMID" "PROSITE" "REFSEQ" "SYMBOL" "UCSCKG"
[26] "UNIPROT"
[1] "ENSG00000121410" "ENSG00000175899" "ENSG00000291190" "ENSG00000171428"
[5] "ENSG00000156006" "ENSG00000196136"
[1] "GO:0002764" "GO:0003674" "GO:0005576" "GO:0005615" "GO:0005886"
[6] "GO:0008150"
ENTREZID SYMBOL
1 1 A1BG
2 2 A2M
3 3 A2MP1
4 9 NAT1
5 10 NAT2
ENSEMBL GO EVIDENCE ONTOLOGY
1 ENSG00000000003 GO:0005515 IPI MF
2 ENSG00000000003 GO:0005886 IBA CC
3 ENSG00000000003 GO:0039532 IMP BP
4 ENSG00000000003 GO:0043123 HMP BP
5 ENSG00000000003 GO:0043124 IDA BP
6 ENSG00000000003 GO:0070062 HDA CC
Y los términos GO serían
GOID
1 GO:0005515
2 GO:0005886
3 GO:0039532
4 GO:0043123
5 GO:0043124
6 GO:0070062
TERM
1 protein binding
2 plasma membrane
3 negative regulation of cytoplasmic pattern recognition receptor signaling pathway
4 positive regulation of canonical NF-kappaB signal transduction
5 negative regulation of canonical NF-kappaB signal transduction
6 extracellular exosome
[1] "CDSCHROM" "CDSEND" "CDSID" "CDSNAME" "CDSSTART"
[6] "CDSSTRAND" "EXONCHROM" "EXONEND" "EXONID" "EXONNAME"
[11] "EXONRANK" "EXONSTART" "EXONSTRAND" "GENEID" "TXCHROM"
[16] "TXEND" "TXID" "TXNAME" "TXSTART" "TXSTRAND"
[21] "TXTYPE"
[1] "CDSID" "CDSNAME" "EXONID" "EXONNAME" "GENEID" "TXID" "TXNAME"
[1] "chr2L" "chr2R" "chr3L" "chr3R" "chr4" "chrX"
[7] "chrU" "chrM" "chr2LHet" "chr2RHet" "chr3LHet" "chr3RHet"
[13] "chrXHet" "chrYHet" "chrUextra"
columns = c("TXNAME", "TXSTART","TXEND","TXSTRAND")
AnnotationDbi::select(txdb, keysGENEID, columns, keytype="GENEID")
GENEID TXNAME TXSTRAND TXSTART TXEND
1 FBgn0000003 FBtr0081624 + 2648220 2648518
2 FBgn0000008 FBtr0100521 + 18024494 18060339
3 FBgn0000008 FBtr0071763 + 18024496 18060346
4 FBgn0000008 FBtr0071764 + 18024938 18060346
5 FBgn0000014 FBtr0306337 - 12632936 12655767
6 FBgn0000014 FBtr0083388 - 12633349 12653845
7 FBgn0000014 FBtr0083387 - 12633349 12655300
8 FBgn0000014 FBtr0300485 - 12633349 12655474
GRanges object with 5384 ranges and 2 metadata columns:
seqnames ranges strand | tx_id tx_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr2L 7529-9484 + | 1 FBtr0300689
[2] chr2L 7529-9484 + | 2 FBtr0300690
[3] chr2L 7529-9484 + | 3 FBtr0330654
[4] chr2L 21952-24237 + | 4 FBtr0309810
[5] chr2L 66584-71390 + | 5 FBtr0306539
... ... ... ... . ... ...
[5380] chr2L 22892306-22918560 - | 5380 FBtr0331166
[5381] chr2L 22892306-22918647 - | 5381 FBtr0111127
[5382] chr2L 22959606-22960915 - | 5382 FBtr0111241
[5383] chr2L 22959606-22961179 - | 5383 FBtr0111239
[5384] chr2L 22959606-22961179 - | 5384 FBtr0111240
-------
seqinfo: 1 sequence from dm3 genome
GRanges object with 13850 ranges and 1 metadata column:
seqnames ranges strand | exon_id
<Rle> <IRanges> <Rle> | <integer>
[1] chr2L 7529-8116 + | 1
[2] chr2L 8193-8589 + | 2
[3] chr2L 8193-9484 + | 3
[4] chr2L 8229-9484 + | 4
[5] chr2L 8668-9484 + | 5
... ... ... ... . ...
[13846] chr2L 22959606-22959815 - | 13846
[13847] chr2L 22959877-22960833 - | 13847
[13848] chr2L 22959877-22960876 - | 13848
[13849] chr2L 22959877-22960915 - | 13849
[13850] chr2L 22960932-22961179 - | 13850
-------
seqinfo: 1 sequence from dm3 genome
GRanges object with 5384 ranges and 3 metadata columns:
seqnames ranges strand | tx_id tx_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr2L 7529-9484 + | 1 FBtr0300689
[2] chr2L 7529-9484 + | 2 FBtr0300690
[3] chr2L 7529-9484 + | 3 FBtr0330654
[4] chr2L 21952-24237 + | 4 FBtr0309810
[5] chr2L 66584-71390 + | 5 FBtr0306539
... ... ... ... . ... ...
[5380] chr2L 22892306-22918560 - | 5380 FBtr0331166
[5381] chr2L 22892306-22918647 - | 5381 FBtr0111127
[5382] chr2L 22959606-22960915 - | 5382 FBtr0111241
[5383] chr2L 22959606-22961179 - | 5383 FBtr0111239
[5384] chr2L 22959606-22961179 - | 5384 FBtr0111240
gene_id
<CharacterList>
[1] FBgn0031208
[2] FBgn0031208
[3] FBgn0031208
[4] FBgn0263584
[5] FBgn0067779
... ...
[5380] FBgn0250907
[5381] FBgn0250907
[5382] FBgn0086683
[5383] FBgn0086683
[5384] FBgn0086683
-------
seqinfo: 1 sequence from dm3 genome
GRanges object with 11003 ranges and 1 metadata column:
seqnames ranges strand | cds_id
<Rle> <IRanges> <Rle> | <integer>
[1] chr2L 7680-8116 + | 1
[2] chr2L 8193-8589 + | 2
[3] chr2L 8193-8610 + | 3
[4] chr2L 8229-8610 + | 4
[5] chr2L 8668-9276 + | 5
... ... ... ... . ...
[10999] chr2L 22959877-22960833 - | 10999
[11000] chr2L 22959877-22960873 - | 11000
[11001] chr2L 22959877-22960876 - | 11001
[11002] chr2L 22960932-22960995 - | 11002
[11003] chr2L 22960932-22961048 - | 11003
-------
seqinfo: 1 sequence from dm3 genome
En los tres casos hemos obtenido un objeto GRanges.
A este tipo de objetos podemos aplicar otros métodos que nos dan información adicional.
[1] 11003
factor-Rle of length 11003 with 2 runs
Lengths: 5489 5514
Values : + -
Levels(3): + - *
GRangesList object of length 2986:
$FBgn0000018
GRanges object with 1 range and 2 metadata columns:
seqnames ranges strand | tx_id tx_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr2L 10973443-10975273 - | 4279 FBtr0080168
-------
seqinfo: 1 sequence from dm3 genome
$FBgn0000052
GRanges object with 3 ranges and 2 metadata columns:
seqnames ranges strand | tx_id tx_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr2L 6041178-6045593 - | 3537 FBtr0079221
[2] chr2L 6041178-6045970 - | 3538 FBtr0079219
[3] chr2L 6041178-6045970 - | 3539 FBtr0079220
-------
seqinfo: 1 sequence from dm3 genome
$FBgn0000053
GRanges object with 2 ranges and 2 metadata columns:
seqnames ranges strand | tx_id tx_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr2L 7014861-7023940 - | 3704 FBtr0079431
[2] chr2L 7018085-7023940 - | 3705 FBtr0100353
-------
seqinfo: 1 sequence from dm3 genome
...
<2983 more elements>
GRangesList object of length 2986:
$FBgn0000018
GRanges object with 2 ranges and 2 metadata columns:
seqnames ranges strand | exon_id exon_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr2L 10973443-10974210 - | 11021 <NA>
[2] chr2L 10974268-10975273 - | 11022 <NA>
-------
seqinfo: 1 sequence from dm3 genome
$FBgn0000052
GRanges object with 6 ranges and 2 metadata columns:
seqnames ranges strand | exon_id exon_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr2L 6041178-6042011 - | 8949 <NA>
[2] chr2L 6042075-6044351 - | 8950 <NA>
[3] chr2L 6044428-6045526 - | 8951 <NA>
[4] chr2L 6044428-6045593 - | 8952 <NA>
[5] chr2L 6045894-6045970 - | 8955 <NA>
[6] chr2L 6045921-6045970 - | 8956 <NA>
-------
seqinfo: 1 sequence from dm3 genome
$FBgn0000053
GRanges object with 8 ranges and 2 metadata columns:
seqnames ranges strand | exon_id exon_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr2L 7014861-7016374 - | 9389 <NA>
[2] chr2L 7016437-7017486 - | 9390 <NA>
[3] chr2L 7017545-7017966 - | 9391 <NA>
[4] chr2L 7018085-7018374 - | 9392 <NA>
[5] chr2L 7018149-7018374 - | 9393 <NA>
[6] chr2L 7018431-7019080 - | 9394 <NA>
[7] chr2L 7019135-7019382 - | 9395 <NA>
[8] chr2L 7023529-7023940 - | 9396 <NA>
-------
seqinfo: 1 sequence from dm3 genome
...
<2983 more elements>
GRangesList object of length 4951:
$`1`
GRanges object with 2 ranges and 3 metadata columns:
seqnames ranges strand | cds_id cds_name exon_rank
<Rle> <IRanges> <Rle> | <integer> <character> <integer>
[1] chr2L 7680-8116 + | 1 <NA> 1
[2] chr2L 8193-8610 + | 3 <NA> 2
-------
seqinfo: 1 sequence from dm3 genome
$`2`
GRanges object with 3 ranges and 3 metadata columns:
seqnames ranges strand | cds_id cds_name exon_rank
<Rle> <IRanges> <Rle> | <integer> <character> <integer>
[1] chr2L 7680-8116 + | 1 <NA> 1
[2] chr2L 8193-8589 + | 2 <NA> 2
[3] chr2L 8668-9276 + | 5 <NA> 3
-------
seqinfo: 1 sequence from dm3 genome
$`3`
GRanges object with 2 ranges and 3 metadata columns:
seqnames ranges strand | cds_id cds_name exon_rank
<Rle> <IRanges> <Rle> | <integer> <character> <integer>
[1] chr2L 7680-8116 + | 1 <NA> 1
[2] chr2L 8229-8610 + | 4 <NA> 2
-------
seqinfo: 1 sequence from dm3 genome
...
<4948 more elements>
GRangesList object of length 5384:
$`1`
GRanges object with 1 range and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr2L 8117-8192 +
-------
seqinfo: 1 sequence from dm3 genome
$`2`
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr2L 8117-8192 +
[2] chr2L 8590-8667 +
-------
seqinfo: 1 sequence from dm3 genome
$`3`
GRanges object with 1 range and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr2L 8117-8228 +
-------
seqinfo: 1 sequence from dm3 genome
...
<5381 more elements>
Las regiones UTR 5’ y 3’ agrupadas por transcrito.
GRangesList object of length 4733:
$`1`
GRanges object with 1 range and 3 metadata columns:
seqnames ranges strand | exon_id exon_name exon_rank
<Rle> <IRanges> <Rle> | <integer> <character> <integer>
[1] chr2L 7529-7679 + | 1 <NA> 1
-------
seqinfo: 1 sequence from dm3 genome
$`2`
GRanges object with 1 range and 3 metadata columns:
seqnames ranges strand | exon_id exon_name exon_rank
<Rle> <IRanges> <Rle> | <integer> <character> <integer>
[1] chr2L 7529-7679 + | 1 <NA> 1
-------
seqinfo: 1 sequence from dm3 genome
$`3`
GRanges object with 1 range and 3 metadata columns:
seqnames ranges strand | exon_id exon_name exon_rank
<Rle> <IRanges> <Rle> | <integer> <character> <integer>
[1] chr2L 7529-7679 + | 1 <NA> 1
-------
seqinfo: 1 sequence from dm3 genome
...
<4730 more elements>
GRangesList object of length 4721:
$`1`
GRanges object with 1 range and 3 metadata columns:
seqnames ranges strand | exon_id exon_name exon_rank
<Rle> <IRanges> <Rle> | <integer> <character> <integer>
[1] chr2L 8611-9484 + | 3 <NA> 2
-------
seqinfo: 1 sequence from dm3 genome
$`2`
GRanges object with 1 range and 3 metadata columns:
seqnames ranges strand | exon_id exon_name exon_rank
<Rle> <IRanges> <Rle> | <integer> <character> <integer>
[1] chr2L 9277-9484 + | 5 <NA> 3
-------
seqinfo: 1 sequence from dm3 genome
$`3`
GRanges object with 1 range and 3 metadata columns:
seqnames ranges strand | exon_id exon_name exon_rank
<Rle> <IRanges> <Rle> | <integer> <character> <integer>
[1] chr2L 8611-9484 + | 4 <NA> 2
-------
seqinfo: 1 sequence from dm3 genome
...
<4718 more elements>
1880-letter DNAString object
seq: CTACTCGCATGTAGAGATTTCCACTTATGTTTTCTC...CAGAGAATCTAGTTTTTCAATAAAATTTCCCCAAGT
626-letter AAString object
seq: LLACRDFHLCFLYFQQPRREPTFEQVSACGQQLSPL...LSYLIELSWPRDVLVQ*LVLISDCNRESSFSIKFPQ
AAStringSet object of length 5384:
width seq names
[1] 626 LLACRDFHLCFLYFQQPRREPT...Q*LVLISDCNRESSFSIKFPQ 1
[2] 600 LLACRDFHLCFLYFQQPRREPT...Q*LVLISDCNRESSFSIKFPQ 2
[3] 614 LLACRDFHLCFLYFQQPRREPT...Q*LVLISDCNRESSFSIKFPQ 3
[4] 743 IAVVIHLVHKFV*TLQRFKSFP...H*FPRGRK**E*SEK*IKLYS 4
[5] 1350 RQINT*YVCTYTLNTRSQEKTG...LIITVPKNNFK*CK*IMMLKI 5
... ... ...
[5380] 2389 LDFL*KCDISS*PRPKGKCTFV...*FILYNHIFAN*TI*IKNKLN 5380
[5381] 2365 ITLVKGCVQIIGCYRRLQQLIK...LIYFI*PHICELNYLNKK*IK 5381
[5382] 416 PI*IIQLLIHPM*LMTPVVNLR...KKCLPHRTSLVNFY*KLLSIK 5382
[5383] 486 V*IGH*VEQLIVIKSIGP*K*I...KNVYRTELVL*ISIKNYFLSN 5383
[5384] 471 V*IGH*VEQLIVIKSIGP*K*I...*KMFTAQN*SCKFLLKTTFYQ 5384
cds2seqs = extractTranscriptSeqs(BSgenome.Dmelanogaster.UCSC.dm3,cdsBy(txdb, by="tx"))
translate(cds2seqs)
AAStringSet object of length 4951:
width seq names
[1] 285 MGERDQPQSSERISIFNPPVYT...HDRFNEITQDDKSTVWQRIY* 1
[2] 481 MGERDQPQSSERISIFNPPVYT...QSEMLYFRKKMALEIVDGEL* 2
[3] 273 MGERDQPQSSERISIFNPPVYT...HDRFNEITQDDKSTVWQRIY* 3
[4] 1008 MDAQFEHLCRICAANTKSKTNS...YKGPTSKSHHSMGTRSTRQR* 5
[5] 1008 MDAQFEHLCRICAANTKSKTNS...YKGPTSKSHHSMGTRSTRQR* 6
... ... ...
[4947] 2287 MHPYVPSVLVVVVLAISVKAHI...LRTLSQELLGIPGQKAKDCI* 5380
[4948] 2287 MHPYVPSVLVVVVLAISVKAHI...LRTLSQELLGIPGQKAKDCI* 5381
[4949] 364 MTPVVNLRSKRLADPEVTCFAP...GRQVRAGFYNYDKFKCFQLH* 5382
[4950] 404 MDLYDGIDTRARSSQIDGWSSG...GRQVRAGFYNYDKFKCFQLH* 5383
[4951] 372 MVFGNKNVTDPAGSKNGCQKTN...GRQVRAGFYNYDKFKCFQLH* 5384
[1] "1" "2"
columns = c("SYMBOL","TXNAME")
AnnotationDbi::select(Homo.sapiens, keys, columns, keytype="ENTREZID")
ENTREZID SYMBOL TXNAME
1 1 A1BG uc002qsd.4
2 1 A1BG uc002qsf.2
3 2 A2M uc001qvk.1
4 2 A2M uc009zgk.1
GRanges object with 82960 ranges and 2 metadata columns:
seqnames ranges strand | TXNAME SYMBOL
<Rle> <IRanges> <Rle> | <CharacterList> <CharacterList>
[1] chr1 11874-14409 + | uc001aaa.3 DDX11L1
[2] chr1 11874-14409 + | uc010nxq.1 DDX11L1
[3] chr1 11874-14409 + | uc010nxr.1 DDX11L1
[4] chr1 69091-70008 + | uc001aal.1 OR4F5
[5] chr1 321084-321115 + | uc001aaq.2 <NA>
... ... ... ... . ... ...
[82956] chrUn_gl000237 1-2686 - | uc011mgu.1 <NA>
[82957] chrUn_gl000241 20433-36875 - | uc011mgv.2 <NA>
[82958] chrUn_gl000243 11501-11530 + | uc011mgw.1 <NA>
[82959] chrUn_gl000243 13608-13637 + | uc022brq.1 <NA>
[82960] chrUn_gl000247 5787-5816 - | uc022brr.1 <NA>
-------
seqinfo: 93 sequences (1 circular) from hg19 genome