2025-03-11
The data set is available from GEO under accession GSE21942.
Let us modify the rownames.
We download the processed data set.
The samples are in the same order in the processed data set.
We take the phenotypic variables from the processed data.
We modify the name of the last phenotypic variable.
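As a minimal sketch of this renaming step, the snippet below uses a plain data frame as a stand-in for the phenotype table (the kind of object `pData()` returns). The column names and the new name `type` are hypothetical and are not taken from the original data.

```r
# Toy phenotype table; all column names here are hypothetical
pd = data.frame(title = c("s1", "s2"),
                long.variable.name = c("control", "case"),
                stringsAsFactors = FALSE)

# Rename the last column, whatever its position
names(pd)[ncol(pd)] = "type"
names(pd)
```

Indexing by `ncol(pd)` avoids hard-coding the column position, so the same line keeps working if variables are added before the last one.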
The samples GSM545845 and GSM545846 are technical replicates. We remove them from the ExpressionSet. First, we can see that they are the last two samples.
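The removal can be sketched on a toy ExpressionSet as follows. This is only an illustration under assumptions: the expression values are random, the sample names `S1` to `S3` and the object names `eset` and `eset_clean` are hypothetical, and only the two GSM identifiers come from the text. It assumes the Biobase package is installed.

```r
library(Biobase)

set.seed(1)
# Toy expression matrix: 4 probes x 5 samples, replicates in the last two columns
expr = matrix(rnorm(20), nrow = 4,
              dimnames = list(paste0("probe", 1:4),
                              c("S1", "S2", "S3", "GSM545845", "GSM545846")))
eset = ExpressionSet(assayData = expr)

# Check that the replicates are the last two samples
tail(sampleNames(eset), 2)

# Drop them by name rather than by position
replicates = c("GSM545845", "GSM545846")
eset_clean = eset[, !(sampleNames(eset) %in% replicates)]
dim(eset_clean)
```

Dropping by name is a little safer than using negative column indices, because it does not depend on the samples staying in the last two positions.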
The new ExpressionSet would be
library(Biobase)         ## featureNames(), fData()
library(hgu133plus2.db)  ## Annotation package queried below
## Map each probe to its ENTREZID and ENSEMBL identifiers
a = AnnotationDbi::select(hgu133plus2.db,
                          keys=featureNames(gse21942a),
                          columns=c("ENTREZID","ENSEMBL"),
                          keytype="PROBEID")
a = a[!is.na(a[,"ENTREZID"]),] ## Remove probes without an ENTREZID
c1 = match(unique(a[,1]),a[,1]) ## First row of each PROBEID
a1 = a[c1,]
c2 = match(unique(a1[,2]),a1[,2]) ## First row of each ENTREZID
a2 = a1[c2,]
dim(a2)
gse21942 = gse21942a[match(a2[,1],featureNames(gse21942a)),]
fData(gse21942) = a2
all(featureNames(gse21942) == a2$PROBEID) ## Check the correspondence
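To see what the two `match(unique(...))` steps do, here is a small sketch on a made-up annotation table. The probe and gene identifiers are invented for illustration; only the column names and the filtering logic follow the code above.

```r
# Toy annotation table; identifiers are invented for illustration
a = data.frame(PROBEID  = c("p1", "p1", "p2", "p3", "p4"),
               ENTREZID = c("10", "11", "10", NA, "12"),
               stringsAsFactors = FALSE)

a = a[!is.na(a$ENTREZID), ]                          # drop probes without an ENTREZID
a1 = a[match(unique(a$PROBEID), a$PROBEID), ]        # keep the first row of each probe
a2 = a1[match(unique(a1$ENTREZID), a1$ENTREZID), ]   # keep the first probe of each gene
a2

# The same row selection can be written with duplicated()
same = identical(match(unique(a$PROBEID), a$PROBEID),
                 which(!duplicated(a$PROBEID)))
```

After both steps each probe and each ENTREZID appears exactly once, so the table can be used as feature data with one row per retained probe. Note that which probe represents a gene depends only on row order, since the first occurrence is kept.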