2025-04-09
\[ y_i = \beta_0 + \beta_1 x_i, \ i = 1, \ldots,n \]
\[ y_i \approx \beta_0 + \beta_1 x_i, \ i = 1, \ldots,n \]
\[ S_t = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2. \]
\[ \frac{\partial S_t}{\partial \beta_0} = -2 \sum_{i=1}^n(y_i - \beta_0 - \beta_1 x_i), \]
\[ \frac{\partial S_t}{\partial \beta_1} = -2 \sum_{i=1}^n x_i (y_i - \beta_0 - \beta_1 x_i), \]
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^n x_i y_i - (\sum_{i=1}^nx_i \sum_{i=1}^n y_i)/n}{\sum_{i=1}^n x_i^2 - (\sum_{i=1}^nx_i)^2/n}. \]
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}. \]
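The two estimators above follow from setting both partial derivatives of \(S_t\) to zero. As a worked step, using only the symbols already defined, this gives the normal equations
\[ \sum_{i=1}^n y_i = n \beta_0 + \beta_1 \sum_{i=1}^n x_i, \qquad \sum_{i=1}^n x_i y_i = \beta_0 \sum_{i=1}^n x_i + \beta_1 \sum_{i=1}^n x_i^2. \]
Dividing the first equation by \(n\) gives \(\beta_0 = \bar{y} - \beta_1 \bar{x}\). Substituting this into the second equation gives
\[ \sum_{i=1}^n x_i y_i - \frac{\sum_{i=1}^n x_i \sum_{i=1}^n y_i}{n} = \beta_1 \left( \sum_{i=1}^n x_i^2 - \frac{(\sum_{i=1}^n x_i)^2}{n} \right), \]
and solving for \(\beta_1\) gives the expression for \(\hat{\beta}_1\).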
\[ S_{xy} = \sum_{i=1}^n (x_i-\bar{x}) (y_i - \bar{y}), \]
and
\[ S_{xx} = \sum_{i=1}^n (x_i-\bar{x})^2, \ \ S_{yy} = \sum_{i=1}^n (y_i-\bar{y})^2, \]
then
\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}. \]
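The closed-form estimates can be checked numerically. The following is a minimal sketch on simulated data (the values of `x` and `y` are an assumption made for illustration, not the data analysed in these notes); it computes \(S_{xy}/S_{xx}\) by hand and compares the result with `lm`.

```r
# Simulated data (an assumption for illustration only).
set.seed(1)
x <- c(0, 1, 6, 24, 0, 1, 6, 24)
y <- 8 - 0.1 * x + rnorm(length(x), sd = 0.5)

# Centred sum of cross-products and sum of squares.
Sxy <- sum((x - mean(x)) * (y - mean(y)))
Sxx <- sum((x - mean(x))^2)

# Closed-form least-squares estimates.
beta1_hat <- Sxy / Sxx
beta0_hat <- mean(y) - beta1_hat * mean(x)

# The same estimates obtained with lm(), for comparison.
fit <- lm(y ~ x)
print(c(beta0_hat, beta1_hat))
print(coef(fit))
```

Both printed vectors should agree up to floating-point error.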
\[ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i. \]
- The residual is
\[ e_i = y_i - \hat{y}_i. \]
\[ Y_i | x_i \sim N(\beta_0 + \beta_1 x_i,\sigma^2), \]
and that the different \(Y_i\) are mutually independent.
\[ \mathbf Y = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix}; \ \ \mathbf X = \begin{bmatrix} 1 & x_1 \\ \vdots& \vdots \\ 1 & x_n \end{bmatrix}; \ \ \mathbf \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}; \ \ \mathbf \epsilon = \begin{bmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{bmatrix} \]
- The simple linear regression model can be written as \[ \mathbf Y = \mathbf X \mathbf \beta + \mathbf \epsilon. \]
- The vector of random errors \(\mathbf \epsilon\) follows a multivariate normal distribution \[ \mathbf \epsilon \sim N_n(\mathbf 0_n,\sigma^2 \mathbf I_n). \]
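In matrix form the least-squares estimate solves \(\mathbf X^\top \mathbf X \mathbf \beta = \mathbf X^\top \mathbf Y\). The sketch below illustrates this on simulated data (an assumption for illustration); the names `X`, `beta_hat` and `fit` are hypothetical.

```r
# Simulated data (an assumption for illustration only).
set.seed(2)
n <- 10
x <- seq_len(n)
y <- 2 + 0.5 * x + rnorm(n)

# n x 2 design matrix: a column of ones and the predictor.
X <- cbind(1, x)

# Solve the normal equations X'X b = X'y without forming an explicit inverse.
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Compare with the coefficients returned by lm().
fit <- lm(y ~ x)
print(as.vector(beta_hat))
print(unname(coef(fit)))
```

`crossprod(X)` computes \(\mathbf X^\top \mathbf X\) and `crossprod(X, y)` computes \(\mathbf X^\top \mathbf y\); `lm` itself uses a QR decomposition, so the two results agree only up to rounding.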
We read the data.
The first is the most orthodox.
`print` shows a brief description of an object of class `lm`: the call and the estimated coefficients.
`coef` or `coefficients` returns the estimated coefficients.
(Intercept) time
7.9684362 -0.1330703
(Intercept) time
7.9684362 -0.1330703
`predict` applied to the object of class `lm` gives us the predictions for the observations.
GSM618324.CEL.gz GSM618325.CEL.gz GSM618326.CEL.gz GSM618327.CEL.gz
7.968436 7.968436 7.835366 7.835366
GSM618328.CEL.gz GSM618329.CEL.gz GSM618330.CEL.gz GSM618331.CEL.gz
7.170014 7.170014 4.774749 4.774749
GSM618332.CEL.gz GSM618333.CEL.gz GSM618334.CEL.gz GSM618335.CEL.gz
7.968436 7.968436 7.835366 7.835366
GSM618336.CEL.gz GSM618337.CEL.gz GSM618338.CEL.gz GSM618339.CEL.gz
7.170014 7.170014 4.774749 4.774749
GSM618340.CEL.gz GSM618341.CEL.gz GSM618342.CEL.gz GSM618343.CEL.gz
7.968436 7.968436 7.835366 7.835366
GSM618344.CEL.gz GSM618345.CEL.gz GSM618346.CEL.gz GSM618347.CEL.gz
7.170014 7.170014 4.774749 4.774749
`residuals` or `resid` gives us the residuals.
GSM618324.CEL.gz GSM618325.CEL.gz GSM618326.CEL.gz GSM618327.CEL.gz
0.3545741 0.8777147 -0.9836174 0.5831095
GSM618328.CEL.gz GSM618329.CEL.gz GSM618330.CEL.gz GSM618331.CEL.gz
-1.2520196 0.9791937 -0.4345729 0.3560575
GSM618332.CEL.gz GSM618333.CEL.gz GSM618334.CEL.gz GSM618335.CEL.gz
-0.1650657 0.5102758 -1.2935869 0.8132667
GSM618336.CEL.gz GSM618337.CEL.gz GSM618338.CEL.gz GSM618339.CEL.gz
-1.6008608 1.1191170 -0.4417852 -0.2617990
GSM618340.CEL.gz GSM618341.CEL.gz GSM618342.CEL.gz GSM618343.CEL.gz
-0.4191948 0.8119677 -1.1244602 1.1839258
GSM618344.CEL.gz GSM618345.CEL.gz GSM618346.CEL.gz GSM618347.CEL.gz
-1.0545087 0.2315680 0.4836551 0.7270455
GSM618324.CEL.gz GSM618325.CEL.gz GSM618326.CEL.gz GSM618327.CEL.gz
0.3545741 0.8777147 -0.9836174 0.5831095
GSM618328.CEL.gz GSM618329.CEL.gz GSM618330.CEL.gz GSM618331.CEL.gz
-1.2520196 0.9791937 -0.4345729 0.3560575
GSM618332.CEL.gz GSM618333.CEL.gz GSM618334.CEL.gz GSM618335.CEL.gz
-0.1650657 0.5102758 -1.2935869 0.8132667
GSM618336.CEL.gz GSM618337.CEL.gz GSM618338.CEL.gz GSM618339.CEL.gz
-1.6008608 1.1191170 -0.4417852 -0.2617990
GSM618340.CEL.gz GSM618341.CEL.gz GSM618342.CEL.gz GSM618343.CEL.gz
-0.4191948 0.8119677 -1.1244602 1.1839258
GSM618344.CEL.gz GSM618345.CEL.gz GSM618346.CEL.gz GSM618347.CEL.gz
-1.0545087 0.2315680 0.4836551 0.7270455
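The relationship between these accessor functions can be sketched on simulated data. The data frame `df_sim` below is hypothetical: it reuses the time design visible in the output (0, 0, 1, 1, 6, 6, 24, 24, three times), but the expression values are simulated.

```r
# Simulated stand-in for the data (values are an assumption).
set.seed(3)
time <- rep(rep(c(0, 1, 6, 24), each = 2), times = 3)
df_sim <- data.frame(
  time = time,
  expression = 8 - 0.13 * time + rnorm(length(time), sd = 0.9)
)

fit <- lm(expression ~ time, data = df_sim)

# Fitted values and residuals rebuilt from the coefficients.
b <- coef(fit)
fitted_by_hand <- b[1] + b[2] * df_sim$time
resid_by_hand <- df_sim$expression - fitted_by_hand

# predict() and residuals() return the same numbers.
print(all.equal(unname(predict(fit)), unname(fitted_by_hand)))
print(all.equal(unname(residuals(fit)), unname(resid_by_hand)))

# coefficients() and resid() are aliases of coef() and residuals().
print(identical(coef(fit), coefficients(fit)))
print(identical(residuals(fit), resid(fit)))
```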
`summary` applied to an `lm` object.
Call:
lm(formula = expression ~ time, data = df0)
Residuals:
Min 1Q Median 3Q Max
-1.6009 -0.5772 0.2931 0.7483 1.1839
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.96844 0.23130 34.451 < 2e-16 ***
time -0.13307 0.01868 -7.122 3.84e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.8836 on 22 degrees of freedom
Multiple R-squared: 0.6975, Adjusted R-squared: 0.6837
F-statistic: 50.73 on 1 and 22 DF, p-value: 3.842e-07
$names
[1] "call" "terms" "residuals" "coefficients"
[5] "aliased" "sigma" "df" "r.squared"
[9] "adj.r.squared" "fstatistic" "cov.unscaled"
$class
[1] "summary.lm"
(Intercept) time
GSM618324.CEL.gz 1 0
GSM618325.CEL.gz 1 0
GSM618326.CEL.gz 1 1
GSM618327.CEL.gz 1 1
GSM618328.CEL.gz 1 6
GSM618329.CEL.gz 1 6
GSM618330.CEL.gz 1 24
GSM618331.CEL.gz 1 24
GSM618332.CEL.gz 1 0
GSM618333.CEL.gz 1 0
GSM618334.CEL.gz 1 1
GSM618335.CEL.gz 1 1
GSM618336.CEL.gz 1 6
GSM618337.CEL.gz 1 6
GSM618338.CEL.gz 1 24
GSM618339.CEL.gz 1 24
GSM618340.CEL.gz 1 0
GSM618341.CEL.gz 1 0
GSM618342.CEL.gz 1 1
GSM618343.CEL.gz 1 1
GSM618344.CEL.gz 1 6
GSM618345.CEL.gz 1 6
GSM618346.CEL.gz 1 24
GSM618347.CEL.gz 1 24
attr(,"assign")
[1] 0 1
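A design matrix of this shape can be reproduced with `model.matrix`. The sketch below rebuilds only the `time` column shown in the output (the vector is typed in by hand, which is an assumption about how the samples are ordered); the sample names are not reproduced.

```r
# Time design as it appears in the printed matrix: 0, 0, 1, 1, 6, 6, 24, 24, three times.
time <- rep(rep(c(0, 1, 6, 24), each = 2), times = 3)

# One column of ones for the intercept and one column for the predictor.
X <- model.matrix(~ time)

print(dim(X))             # number of observations and of coefficients
print(colnames(X))
print(attr(X, "assign"))  # term index of each column: 0 is the intercept
```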
With `summary` we obtain the hypothesis tests.
Call:
lm(formula = expression ~ time, data = df0)
Residuals:
Min 1Q Median 3Q Max
-1.6009 -0.5772 0.2931 0.7483 1.1839
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.96844 0.23130 34.451 < 2e-16 ***
time -0.13307 0.01868 -7.122 3.84e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.8836 on 22 degrees of freedom
Multiple R-squared: 0.6975, Adjusted R-squared: 0.6837
F-statistic: 50.73 on 1 and 22 DF, p-value: 3.842e-07
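Each t value in the table is the estimate divided by its standard error (for example, 7.96844 / 0.23130 ≈ 34.45 and -0.13307 / 0.01868 ≈ -7.12), and with a single predictor the F statistic is the square of the slope's t value (7.122² ≈ 50.7). A sketch of the same computations on simulated data follows; the data and the names `s`, `tab`, `t_by_hand` and `p_by_hand` are assumptions for illustration.

```r
# Simulated data with the same time design (values are an assumption).
set.seed(4)
time <- rep(rep(c(0, 1, 6, 24), each = 2), times = 3)
y <- 8 - 0.13 * time + rnorm(length(time), sd = 0.9)

s <- summary(lm(y ~ time))
tab <- s$coefficients

# t statistic: estimate divided by its standard error.
t_by_hand <- tab[, "Estimate"] / tab[, "Std. Error"]

# Two-sided p-value from the t distribution with the residual degrees of freedom.
df_res <- s$df[2]
p_by_hand <- 2 * pt(abs(t_by_hand), df = df_res, lower.tail = FALSE)

print(cbind(t_by_hand, p_by_hand))
print(tab["time", "t value"]^2)  # equals the F statistic below
print(s$fstatistic["value"])
```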
Source | SS | df | MS | F | p |
---|---|---|---|---|---|
Between | SS(B) | I-1 | \(\frac{SS(B)}{I-1}\) | \(\frac{SS(B)/ (I-1)}{SS(W)/(n-I)}\) | \(P(> F)\) |
Within | SS(W) | n-I | \(\frac{SS(W)}{n - I}\) | | |
Total | SS(B) + SS(W) | | | | |
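The entries of the table can be computed directly. The following sketch uses simulated data with four groups of six observations (group labels and means are assumptions for illustration) and compares the hand computation with `anova`.

```r
# Simulated one-way layout (an assumption for illustration only).
set.seed(5)
g <- factor(rep(c("A", "B", "C", "D"), each = 6))
y <- rnorm(length(g), mean = c(6.5, 5.3, 8.7, 7.3)[as.integer(g)])

n <- length(y)
I <- nlevels(g)

group_means <- as.vector(tapply(y, g, mean))
group_sizes <- as.vector(tapply(y, g, length))

# Between-group and within-group sums of squares.
SSB <- sum(group_sizes * (group_means - mean(y))^2)
SSW <- sum((y - ave(y, g))^2)  # ave() repeats each group mean per observation

# F statistic and its p-value.
F_stat <- (SSB / (I - 1)) / (SSW / (n - I))
p_val <- pf(F_stat, I - 1, n - I, lower.tail = FALSE)

# Reference computation with anova().
tab <- anova(lm(y ~ g))
print(c(SSB = SSB, SSW = SSW, F = F_stat, p = p_val))
print(tab)
```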
We consider the indicator variables for the categories, taking the first group as the reference category.
\[ E[Y_{1j}] = \beta_0 \] and \[ E[Y_{ij}] = \beta_0 + \beta_i \] for \(i=2, \ldots, I\). Jointly: \[ E[Y_{ij}] = \beta_0 + \beta_2 v_{2j} + \ldots + \beta_I v_{Ij} \] where \(v_{ij} = 1\) if we are in group \(i\) and zero otherwise.
The null hypothesis that there are no differences between the means of the groups is then formulated as \[ H_0: \beta_2 = \ldots = \beta_I =0. \]
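With this coding the intercept is the mean of the reference group and every other coefficient is a difference of means with respect to it. A minimal sketch on simulated data (group names and means are assumptions for illustration):

```r
# Simulated data with three groups (an assumption for illustration only).
set.seed(6)
g <- factor(rep(c("g1", "g2", "g3"), each = 5))
y <- rnorm(length(g), mean = c(5, 6, 8)[as.integer(g)])

# lm() uses treatment coding by default: the first level is the reference.
fit <- lm(y ~ g)

# Group means, and the coefficients rebuilt from them.
m <- as.vector(tapply(y, g, mean))
coef_by_hand <- c(m[1], m[2] - m[1], m[3] - m[1])

print(coef(fit))
print(coef_by_hand)

# The overall F statistic tests H0: all non-reference coefficients are zero.
print(summary(fit)$fstatistic)
```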
We analyse the data for probe `261892_at`.
We consider the phenotypic variables `time2` and `Pi`.
time time2 Pi replication
GSM618324.CEL.gz 0 Short Treatment 1
GSM618325.CEL.gz 0 Short Control 2
# paste0() is vectorised, so the combined labels are built in one call, without a loop.
time2Pi = factor(paste0(pData(gse25171)[,"time2"],
                        pData(gse25171)[,"Pi"]))
# The levels are the four time2/Pi combinations.
levels(time2Pi)
[1] "MediumControl" "MediumTreatment" "ShortControl" "ShortTreatment"
We build a `data.frame` containing the expression of the probe and the variable we have just constructed.
sel0 = which("261892_at"==fData(gse25171)[,"PROBEID"])
df1 = data.frame(time2Pi,expression=exprs(gse25171)[sel0,])
summary(df1[,"time2Pi"])
MediumControl MediumTreatment ShortControl ShortTreatment
6 6 6 6
Call:
lm(formula = expression ~ time2Pi, data = df1)
Residuals:
Min 1Q Median 3Q Max
-1.98463 -0.62805 0.04225 0.54559 1.79155
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.4976 0.4007 16.217 5.66e-13 ***
time2PiMediumTreatment -1.2419 0.5666 -2.192 0.040407 *
time2PiShortControl 2.2010 0.5666 3.884 0.000922 ***
time2PiShortTreatment 0.7991 0.5666 1.410 0.173828
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9814 on 20 degrees of freedom
Multiple R-squared: 0.6607, Adjusted R-squared: 0.6098
F-statistic: 12.98 on 3 and 20 DF, p-value: 6.219e-05
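Since `MediumControl` is the reference level, the fitted group means can be read off the coefficient table as a worked calculation:
\[ \hat{\mu}_{\text{MediumControl}} = 6.4976, \qquad \hat{\mu}_{\text{MediumTreatment}} = 6.4976 - 1.2419 = 5.2557, \]
\[ \hat{\mu}_{\text{ShortControl}} = 6.4976 + 2.2010 = 8.6986, \qquad \hat{\mu}_{\text{ShortTreatment}} = 6.4976 + 0.7991 = 7.2967. \]
Each t test in the table compares one group with `MediumControl`, whereas the F statistic (12.98 on 3 and 20 degrees of freedom) tests \(H_0: \beta_2 = \ldots = \beta_I = 0\) for all groups at once.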
`time` and `Pi`
We analyse the data for probe `261892_at`.
We consider the phenotypic variables `time` and `Pi`.
time time2 Pi replication
GSM618324.CEL.gz 0 Short Treatment 1
GSM618325.CEL.gz 0 Short Control 2
(Intercept) time PiTreatment
GSM618324.CEL.gz 1 0 1
GSM618325.CEL.gz 1 0 0
GSM618326.CEL.gz 1 1 1
GSM618327.CEL.gz 1 1 0
GSM618328.CEL.gz 1 6 1
GSM618324.CEL.gz GSM618325.CEL.gz GSM618326.CEL.gz GSM618327.CEL.gz
1.01552765 0.21676116 -0.32266386 -0.07784405
GSM618328.CEL.gz GSM618329.CEL.gz GSM618330.CEL.gz GSM618331.CEL.gz
-0.59106601 0.31824015 0.22638066 -0.30489608
GSM618332.CEL.gz GSM618333.CEL.gz GSM618334.CEL.gz GSM618335.CEL.gz
0.49588789 -0.15067778 -0.63263328 0.15231308
GSM618336.CEL.gz GSM618337.CEL.gz GSM618338.CEL.gz GSM618339.CEL.gz
-0.93990720 0.45816343 0.21916840 -0.92275255
GSM618340.CEL.gz GSM618341.CEL.gz GSM618342.CEL.gz GSM618343.CEL.gz
0.24175878 0.15101414 -0.46350659 0.52297219
GSM618344.CEL.gz GSM618345.CEL.gz GSM618346.CEL.gz GSM618347.CEL.gz
-0.39355515 -0.42938559 1.14460870 0.06609189
`time2` and `Pi`
We analyse the data for probe `261892_at`.
We consider the phenotypic variables `time2` and `Pi`.
time time2 Pi replication
GSM618324.CEL.gz 0 Short Treatment 1
GSM618325.CEL.gz 0 Short Control 2
(Intercept) time2Medium PiTreatment
GSM618324.CEL.gz 1 0 1
GSM618325.CEL.gz 1 0 0
GSM618326.CEL.gz 1 0 1
GSM618327.CEL.gz 1 0 0
GSM618328.CEL.gz 1 1 1
Df Sum Sq Mean Sq F value Pr(>F)
time2 1 26.99 26.992 29.36 2.24e-05 ***
Pi 1 10.48 10.485 11.41 0.00285 **
Residuals 21 19.30 0.919
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Df Sum Sq Mean Sq F value Pr(>F)
time2 1 26.992 26.992 28.02 3.51e-05 ***
Pi 1 10.485 10.485 10.88 0.00358 **
time2:Pi 1 0.038 0.038 0.04 0.84370
Residuals 20 19.264 0.963
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
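In both tables each F value is the term's mean square divided by the residual mean square (for instance 26.992 / 0.919 ≈ 29.4 in the additive model), and the residual sum of squares of the additive model splits into the interaction and residual rows of the second table (0.038 + 19.264 ≈ 19.30). A sketch of the two fits on simulated data follows; the data frame `d` and its effect sizes are assumptions for illustration.

```r
# Simulated balanced 2 x 2 layout with 6 replicates per cell (an assumption).
set.seed(7)
d <- expand.grid(rep = 1:6,
                 time2 = c("Medium", "Short"),
                 Pi = c("Control", "Treatment"))
d$y <- 6.5 + 2 * (d$time2 == "Short") - 1.2 * (d$Pi == "Treatment") +
  rnorm(nrow(d))

# Additive model and model with interaction.
fit_add <- aov(y ~ time2 + Pi, data = d)
fit_int <- aov(y ~ time2 * Pi, data = d)

print(summary(fit_add))
print(summary(fit_int))

# Comparing the nested models gives the same F as the interaction row.
cmp <- anova(fit_add, fit_int)
print(cmp)
```

If the interaction is not significant, as in the output above, the additive model is usually the one retained.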
GSEAlm
(Intercept) time PiTreatment
GSM618324.CEL.gz 1 0 1
GSM618325.CEL.gz 1 0 0
GSM618326.CEL.gz 1 1 1
GSM618327.CEL.gz 1 1 0
GSM618328.CEL.gz 1 6 1
GSM618329.CEL.gz 1 6 0
And the statistics for the tests that the coefficients are zero, with
limma
(Intercept) pData(gse25171)[, "time"]
244919_at 5.086290 -0.004315513
244920_s_at 8.371483 -0.004977996
pData(gse25171)[, "Pi"]Treatment
244919_at -0.11665034
244920_s_at -0.08573999
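A table of per-probe coefficients like this one comes from fitting the same linear model to every row of the expression matrix. The sketch below is a base R stand-in and does not call `limma`; the matrix `E`, its row names and all its values are simulated assumptions, and only the `time` and `Pi` design follows the output shown earlier.

```r
# Simulated expression matrix: 3 hypothetical probes x 24 samples (an assumption).
set.seed(8)
time <- rep(rep(c(0, 1, 6, 24), each = 2), times = 3)
Pi <- factor(rep(c("Treatment", "Control"), times = 12),
             levels = c("Control", "Treatment"))
E <- matrix(rnorm(3 * 24, mean = 7), nrow = 3,
            dimnames = list(paste0("probe", 1:3), NULL))

# Common design matrix for all probes.
design <- model.matrix(~ time + Pi)

# lm.fit() accepts a matrix response: one column (here, one probe) per fit.
fit_all <- lm.fit(design, t(E))
coefs <- t(fit_all$coefficients)  # probes in rows, coefficients in columns
print(coefs)

# Check against a single-probe fit with lm().
print(coef(lm(E[1, ] ~ time + Pi)))
```

Fitting all probes in one call is possible because they share the same design matrix, so the decomposition of `design` is computed only once.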