Statistical data analysis - R


Basic Operation

  1. data3 <- read.csv("data3.csv") when reading data, note that the file name must be in double quotes
  2. table(df$sex, df$age) gives the frequencies of sex by age => contingency table; a single variable also works
  3. length(x) returns the length of a vector (for a data frame, the number of columns)
  4. x <- df[df$smoke== 0,] extracts the rows where smoke == 0; note the comma "," when extracting rows. df['colname'] extracts a specific column
  5. df$smoke <- as.factor(df$smoke) creates factors; for factor/nominal/categorical variables we need to tell R there is no natural order. df$smoke <- ordered(df$smoke) creates ordered factors.
  6. cbind() function: combine vectors, matrices, or data frames by column.
  7. tapply(df1$W, df1$A, mean) computes the mean of W within each level of A
  8. rep("M", 825): the function rep() repeats the character string "M" 825 times. The resulting output is a character vector of length 825, with each element of the vector containing the string "M".

Check the normality

  • Draw the Q-Q plot
    x=df$bwt
    qqnorm(x)
    abline(mean(x), sd(x), col=2,lwd=2) 
    
  • Draw the P-P plot
    x=df$bwt
    plot((rank(x)-0.5)/length(x), pnorm(x, mean=mean(x), sd=sd(x)), main="P-P plot")
    
  • Shapiro-Wilk Test

H0: The variable xx is normally distributed

H1: The variable xx is not normally distributed

shapiro.test(df$bwt)
  • Kolmogorov-Smirnov Test
    ks.test(df$xx,"pnorm", mean=mean(df$xx), sd=sd(df$xx))
    
  • Cramér-von Mises Test
    library(goftest) 
    cvm.test(df$xx,"pnorm", mean=mean(df$xx), sd=sd(df$xx))
    

Plotting

To use ggplot, first load the package: library(ggplot2)

# boxplot
p <- ggplot(df, aes(x=smoke, y=bwt, fill=smoke)) # fill is optional
p + geom_boxplot(outlier.colour="red")+
     ggtitle("boxplot of bwt versus smoke groups")+
     xlab("smoke")+
     ylab("birth weight in grams")

Association of Nominal Variable

Check whether A and B are independent, where A and B are factor/ nominal/ categorical variables

  1. Pearson Chi-Square (χ2) Test: chisq.test(table, correct=F)
  2. Likelihood Ratio Test (LRT)
  3. Fisher's Exact Test: fisher.test(table, alternative = c("two.sided", "less", "greater"))
  • Both Fisher's Exact test and the Pearson chi-square test can test the independence between the row and column variables of a 2×2 contingency table. However, the Pearson chi-square test can only do the two-sided test, while Fisher's Exact test can also do one-sided tests (not only whether there exists an association, but also whether it is positive or negative).
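
A minimal sketch of these tests on a hypothetical 2×2 table (the G-test below is one way to carry out the LRT by hand):

tab <- matrix(c(30, 20, 10, 25), nrow = 2,
              dimnames = list(sex = c("M", "F"), smoke = c("yes", "no")))
chisq.test(tab, correct = FALSE)             # Pearson chi-square without continuity correction
fisher.test(tab, alternative = "two.sided")  # Fisher's exact test
# LRT (G-test) by hand: G = 2 * sum(O * log(O / E))
E <- outer(rowSums(tab), colSums(tab)) / sum(tab)  # expected counts under independence
G <- 2 * sum(tab * log(tab / E))
1 - pchisq(G, df = 1)                        # df = (r-1)(c-1) = 1 for a 2x2 table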

Two-sample z-test

Assumption: the samples x and y are independent

Consider the contingency table; this test checks whether there is a significant difference in xxx between men and women

Hypothesis: Px -> the proportion of xxx among men
Py -> the proportion of xxx among women
H0: Px = Py
H1: Px ≠ Py

prop.test(x=c(40,23), n=c(80,66), correct=F) # 40, 23 are the counts with the behaviour of interest in each group; 80, 66 are the group totals, e.g. 80 men and 66 women
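
The same test by hand (a sketch using the counts above and the pooled proportion under H0):

x1 <- 40; n1 <- 80; x2 <- 23; n2 <- 66
p_hat <- (x1 + x2) / (n1 + n2)        # pooled proportion under H0
z <- (x1/n1 - x2/n2) / sqrt(p_hat * (1 - p_hat) * (1/n1 + 1/n2))
2 * pnorm(abs(z), lower.tail = FALSE) # two-sided p-value; z^2 matches the prop.test X-squared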

Testing for equality of two variances

H0: σx^2 = σy^2 v.s. H1: σx^2 ≠ σy^2

Test statistic: F = varx/vary ~ F(m-1, n-1) under H0, where m and n are the sample sizes of x and y.

x <- df[df$smoke== 0,]$bwt
y <- df[df$smoke== 1,]$bwt
varx = var(x)
vary = var(y)
F <- varx/vary
m = length(x)
n = length(y)
p_value <- pf(F, m-1, n-1, lower.tail = FALSE)
p_value*2 # two-sided test: double the one-tail probability to count both tails
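
The built-in var.test() performs the same F test in one call, including the two-sided p-value, so it can be used to check the by-hand result:

var.test(x, y) # F test of H0: var(x) = var(y)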


Two-sample t-test

Test whether the means of xxx are equal between the two groups.

H0: μx = μy
H1: μx ≠ μy / μx > μy / μx < μy (choose according to the question; for a two-sided test count both tails, p_value*2)

x <- df[df$smoke== 0,]$bwt
y <- df[df$smoke== 1,]$bwt

x_bar = mean(x)
y_bar = mean(y)
varx = var(x)
vary = var(y)
m = length(x)
n = length(y)
s2 = ((m-1)*varx + (n-1)*vary) / (m+n-2)
t_obs = (x_bar - y_bar) / sqrt(s2*(1/m + 1/n)) 

# p-value
p_value = 1-pt(t_obs,m+n-2) # note the 1- : this gives the upper-tail probability (assumes t_obs > 0)
cat('p-value: ', p_value*2 , "\n")
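
The built-in t.test() with var.equal = TRUE reproduces this pooled two-sample t-test and can be used as a check:

t.test(x, y, var.equal = TRUE) # should match t_obs and the two-sided p-value above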

Linear regression model

full_model <- lm(bwt ~ (race) + age + lwt + (smoke) + ptl + (ht) + ui + ftv, data = df)
The parentheses ( ) around certain variables are used here to mark the categorical variables (factors); lm() converts factors into binary indicator variables. You also need df$smoke <- as.factor(df$smoke) to turn them into factors first.

  • Find the best regression model by using backward/forward/stepwise elimination (selection) method:
library(leaps)
fit = regsubsets(bwt~(race)+age+lwt+(smoke)+ptl+(ht)+ui+ftv, data=df, method="backward") # with the backward method, read the results from bottom to top
summary(fit) # R prints the selection path
# fit the submodels according to the selection path returned above
fit1=lm(bwt~(race)+lwt+(smoke)+ptl+(ht)+ui+ftv, data=df)
fit2=lm(bwt~(race)+lwt+(smoke)+ptl+(ht)+ui, data=df)
fit3=lm(bwt~(race)+lwt+(smoke)+(ht)+ui, data=df)
fit4=lm(bwt~(race)+(smoke)+(ht)+ui, data=df)
fit5=lm(bwt~(race)+(smoke)+ui, data=df)
fit6=lm(bwt~(race)+ui, data=df)
fit7=lm(bwt~ui, data=df)
print(c(summary(full_model)$adj.r.squared,summary(fit1)$adj.r.squared,summary(fit2)$adj.r.squared,summary(fit3)$adj.r.squared,summary(fit4)$adj.r.squared,summary(fit5)$adj.r.squared,summary(fit6)$adj.r.squared,summary(fit7)$adj.r.squared)) # pick the best model by your preferred criterion
# Here the adjusted R² criterion is used; R², adjusted R², Min_Max Accuracy => the larger the better
# explain: about R²*100% of the variation in the response can be explained by the model
library(olsrr) # for the ols_mallows_cp() function, which calculates Mallows' Cp for model selection => smaller is better, close to p+1 with small p, where p is the number of variables
# AIC, BIC, MAPE (mean absolute percentage error), MSE => the lower the better
print(cbind(c(ols_mallows_cp(fit1, full_model),
              ols_mallows_cp(fit2, full_model),
              ols_mallows_cp(fit3, full_model),
              ols_mallows_cp(fit4, full_model),
              ols_mallows_cp(fit5, full_model),
              ols_mallows_cp(fit6, full_model)),
            c(summary(fit1)$r.squared,summary(fit2)$r.squared,
              summary(fit3)$r.squared,summary(fit4)$r.squared,
              summary(fit5)$r.squared,summary(fit6)$r.squared),
            c(summary(fit1)$adj.r.squared,summary(fit2)$adj.r.squared,
              summary(fit3)$adj.r.squared,summary(fit4)$adj.r.squared,
              summary(fit5)$adj.r.squared,summary(fit6)$adj.r.squared)))
  • Compare the model in (d) and the best model selected in (e). Which model is better?
    anova(fit3, full_model) to obtain the F-statistic and p-value.
    By comparing the two models, we can determine whether adding additional predictors significantly improves the fit.

The F-statistic is used to test the null hypothesis that the simpler model (fit3) is just as good as the more complex model (full_model), against the alternative hypothesis that the more complex model provides a better fit.

ANOVA table

Difference between Regression and ANOVA:
Regression -> investigates the effect of one or more continuous independent variables on Y
ANOVA -> investigates the effect of one or more categorical independent variables on Y

  • One-way ANOVA

used to test the null hypothesis that the means of k independent normal populations with a common variance are identical
H0: μ1 = μ2 = ... = μk
v.s. H1: at least two means are not equal

-e.g. Conduct a hypothesis test of whether the means of all six treatment*exercise groups are equal.

onewayanova = aov(score ~ factor(interaction(treatment,exercise)),data = data3)
summary(onewayanova)

Here the interaction(treatment, exercise) function creates a new factor by combining the levels of the treatment and exercise variables. The factor() function then converts this combined variable into a factor so that it can be used in the ANOVA model.

Three assumptions:

  1. Independence
  2. Constant variance
  3. Normality

When a one-way ANOVA is performed, it assumes that the group variances are statistically equal.

  • So we need to test the Homogeneity of Variance/ equality of group variances first before conducting ANOVA
    Null hypothesis: the variances are equal across all groups
    Alternative hypothesis: at least one variance is different
  1. Bartlett's Test: bartlett.test(x~grps, dat)
    bartlett.test(score ~ interaction(treatment, exercise), data = data3)
    -Notice that when we need to specify the interaction between two categorical variables, we use the interaction() function.
    -Note that bartlett.test assumes the data are normally distributed within each group. If this assumption is not met, the test may not be appropriate and alternatives, such as Levene's test, may be more suitable.
    The two tests below are less sensitive to departures from normality:
  2. Levene's Test
library(car)
leveneTest(score ~ treatment*exercise, data = data3)
  3. Brown-Forsythe Test (see the sketch below)
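
In car::leveneTest() the center argument selects the variant: center = mean gives the classical Levene's test, while center = median (the default) gives the Brown-Forsythe test:

library(car)
leveneTest(score ~ treatment*exercise, data = data3, center = median) # Brown-Forsythe
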
  • Write down the R output ANOVA table for a specific model
# build the model
model <- lm(score~age+treatment+exercise+treatment*exercise+age*treatment+age*exercise+age*exercise*treatment,data=data3)
library(car) # Use the Anova function under package "car"
Anova(model, type="II")
  • Multiple Comparisons

After we perform an ANOVA and reject the null hypothesis, we know that at least two group means are different. We then consider methods for multiple comparison (pairwise comparisons) to identify significant differences in group means.
H0: μi = μj

  1. Fisher's least significant difference (LSD) method

res = aov(dat$x~ dat$grps)
summary(res) # From this, obtain DFerror = Residuals Df and MSerror = Residuals Mean Sq

library(agricolae)
out = LSD.test(dat$x, dat$grps, DFerror=15, MSerror=0.386, p.adj="none") # obtain the LSD
out
# compare the absolute difference of each pair of means (from out$groups) with the LSD; if |difference| > LSD, reject H0

The function: LSD.test(y, trt, DFerror, MSerror, alpha = 0.05, p.adj=c("none","holm","hommel","hochberg","bonferroni","BH","BY","fdr"), group=TRUE, main = NULL, console=FALSE)

Arguments: y: model(aov or lm) or answer of the experimental unit

trt: Constant( only y=model) or vector treatment applied to each experimental unit

DFerror: Degrees of freedom of the experimental error

MSerror: the mean squared error

  2. Bonferroni's method/ correction/ adjustment
    k groups -> c=k(k-1)/2 comparisons; the significance level for each comparison is α/c
# conduct a t test for each pair
# Bonferroni adjustment
pairwise.t.test(data3$score,interaction(data3$treatment,data3$exercise), p.adj = "bonferroni") 
# Note: here the dataset can only be specified by prefixing variables with data3$; appending ",data=data3" does not work

Note: since p.adj = "bonferroni" already multiplies the p-values by c, compare them directly with α (unadjusted p-values would be compared with α/c).

  3. Tukey's method
# Tukey’s HSD
Para <- aov(x~factor(grps), data=dat)
summary(Para)

TukeyHSD(Para, "factor(grps)", ordered = TRUE)

  • Non-parametric method - Kruskal-Wallis Test

When the assumptions of ANOVA are not met, e.g. the group variances differ and the data are not normally distributed, the result provided by the F-test is not reliable. We therefore consider the Kruskal-Wallis test on equality of distributions.
H0: there is no difference between the median scores for all three chefs, μA=μB=μC
H1: there is a difference between the median scores for at least two chefs.

  1. Calculation by hand:
# There are 18 records in total: first rank all records, split them into groups, and compute each group's average rank
rankA = rank(x)[1:6]
rankB = rank(x)[7:12]
rankC = rank(x)[13:18]
c(mean(rankA), mean(rankB), mean(rankC))


$H = \frac{12}{N(N+1)} \sum_{i=1}^k n_i\overline{R}_{i}^2 - 3(N+1)$

where N indicates the total number of observations across all groups
ni: number of observations in the i-th group
$\overline{R}_{i}$: the average rank of the i-th group (the means obtained above). After computing H, the p-value is 1-pchisq(H, 2).

  2. Use R function: kruskal.test(x~grps, data=dat)
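
Putting the by-hand calculation from step 1 together (a sketch, assuming x holds all 18 scores ordered as chefs A, B, C):

N <- 18; n <- c(6, 6, 6)
Rbar <- c(mean(rank(x)[1:6]), mean(rank(x)[7:12]), mean(rank(x)[13:18]))
H <- 12 / (N * (N + 1)) * sum(n * Rbar^2) - 3 * (N + 1)
1 - pchisq(H, 2) # df = k - 1 = 2; compare with kruskal.test()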

ANCOVA table

Covariance analysis combines ANOVA and linear regression analysis.
It includes one or more continuous variables (covariates) in the ANOVA model.

-Assumptions of one-way ANCOVA:

  1. Linearity between the covariate and the outcome variable
    Check by creating a grouped scatter plot of covariate and the outcome variable
  2. The outcome variable should be approximately normally distributed
    Shapiro-Wilk test
  3. Homoscedasticity/homogeneity of residual variance for all groups: the residuals are assumed to have a constant variance
  4. No significant outliers in the groups

-General Model in Covariance Analysis
$y_{ij} = \mu + \alpha_i + \beta x_{ij} + \beta_i x_{ij} + \epsilon_{ij}$, where $\alpha_k = \beta_k = 0$ and $\epsilon_{ij} \sim N(0,\sigma^2)$, $i=1,2,...,k$; $j=1,2,...,n_i$
1st testing hypothesis:
$H_0: \beta_1 = \beta_2 = ... = \beta_{k-1} = 0$
v.s. $H_1$: at least one of the $\beta$'s is not zero

# test the significance of the interaction term 
anova(lm(W ~ A + B, data = df1), model.2)
# model.2 is the model with interaction term
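
For reference, a minimal sketch of the two models being compared, assuming A is the treatment factor and B the covariate in df1:

model.1 <- lm(W ~ A + B, data = df1)        # common slope for all treatment groups
model.2 <- lm(W ~ A + B + A:B, data = df1)  # separate slope per group (interaction)
anova(model.1, model.2)                     # F test of the interaction term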

Obtain the p-value and compare with 5%. If > 0.05, the null hypothesis is not rejected; if $H_0$ cannot be rejected -> equal slopes.
Consider $y_{ij} = \mu + \alpha_i + \beta x_{ij} + \epsilon_{ij}$, $\epsilon_{ij} \sim N(0,\sigma^2)$
Test the hypotheses: $H_0^x: \beta = 0$ (no x effect)
v.s. $H_1^x: \beta \neq 0$ (x effect present)
$H_0^{treatment}: \alpha_1 = \alpha_2 = ... = \alpha_{k-1} = 0$ (no treatment effect)
v.s. $H_1^{treatment}$: at least one of the $\alpha$'s is not zero
If both $H_1^x$ and $H_1^{treatment}$ are true, consider multiple comparisons for all pairs of treatments based on the least squares means.
-The mean of Y for treatment i at $X = \overline{x}$ is $E(Y|X=\overline{x}) = \mu + \alpha_i + \beta\overline{x}$; the corresponding least squares mean is $\hat\mu + \hat\alpha_i + \hat\beta\overline{x}$.
-The mean of Y for treatment k at $X = \overline{x}$ is $E(Y|X=\overline{x}) = \mu + \beta\overline{x}$; the corresponding least squares mean is $\hat\mu + \hat\beta\overline{x}$.
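
One way to obtain these least squares means and their pairwise comparisons in R is the emmeans package (a sketch on the equal-slopes model above, assuming the treatment factor is A):

library(emmeans)
emmeans(model.1, pairwise ~ A) # LS means evaluated at the mean covariate value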

Two-Way Analysis of Variance

Two-way ANOVA: two nominal independent variables
-Two-way interaction plot

interaction.plot(x.factor = df1$B, trace.factor = df1$A, 
                 response = df1$Days, fun = mean, 
                 type = "b", legend = TRUE, 
                 xlab = "B", ylab="Days",
                 pch=c(1,20), col = c("#00AFBB", "#E7B800"))

If there is no interaction, the difference between the levels of one factor will be the same regardless of the level of the other factor.
Plot of means -> parallel lines
-Estimate σ^2 based on the final model

We use MSE (Mean Square of Error) to estimate σ^2

sum(residuals(model.3)^2) / 56 # 56 = residual degrees of freedom of the final model (model.3)

Generalized Linear Model (GLM)

GLM for Binary Data - logit Model

  • Fit a logistic regression model
# 1. Convert the required class variables into factors
data4$diabetes = as.factor(data4$diabetes)
data4$age_bucket = as.factor(data4$age_bucket)
data4$preg_bucket = as.factor(data4$preg_bucket)

# 2. Set the reference level for each class variable
data4$age_bucket = relevel(data4$age_bucket, ref = "20-30")
data4$preg_bucket = relevel(data4$preg_bucket, ref = "10+")

# fit logistic regression model
fit1 <- glm(diabetes~glucose+pressure+triceps+insulin+mass+pedigree+age_bucket+preg_bucket, family = binomial(link = "logit"), data = data4)
# binomial(link = "logit") specifies the logit link function explicitly.
# Adding `weights = count` to glm() builds the logistic regression model from grouped data with a count per group

Plug in the estimated coefficients to obtain the fitted logistic regression: $logit(\pi) = \log(\frac{\pi}{1-\pi}) = -6.8451 + 1.8271 * FIBRIN$
The probability is $\pi = P(Y=1) = \frac{\exp(x\beta)}{1+\exp(x\beta)}$, where $x\beta$ is the formula just obtained.
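
As a quick check, plogis() evaluates exp(xβ)/(1+exp(xβ)) directly; using the fitted FIBRIN equation above:

eta <- -6.8451 + 1.8271 * 2.2 # linear predictor at FIBRIN = 2.2
plogis(eta)                   # estimated P(Y = 1), same as exp(eta) / (1 + exp(eta))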

  • Estimated probability
glm.probs = predict(glm.fit, newdata=data.frame(FIBRIN=2.2), type = "response", se.fit=T,interval="prediction")
# newdata holds the values to predict at; check whether the required probability is for Y=0 or Y=1
glm.probs

If the model is adequate -> use this model

  • Report the exponentiated regression coefficients. Interpret each exponentiated regression coefficient in terms of the odds ratio.
    To find the estimated odds ratio between group A and group B, use exp(coef(model)); note that the model needs its reference class set to group B.
  • Confidence intervals
# confint(fit3)
exp(confint(fit3)) # choose the row for A

-If the confidence interval does not include 1, the odds ratio is significantly greater (or less) than 1, e.g. children of age 3 are significantly more likely to believe in Santa than children of age 6.
-If the confidence interval includes 1, the odds ratio is not significantly different from 1, e.g. there is no significant difference between the odds of believing in Santa for children of age 4 and age 5.

  • Conduct a hypothesis test to check whether there is a difference between the coefficients for age_bucket 31-40 and age_bucket 50+ levels
# Wald test for difference between coefficients for age_bucket 31-40 and age_bucket 50+ levels
# Create the contrast matrix
L <- matrix(c(0, 0, 0, 0, 1, 0, -1), nrow = 1, ncol = 7)
# the L matrix encodes the relationship between the two coefficients we want to test
library(aod)
# Perform the Wald test
wald.test(b = coef(fit2), Sigma = vcov(fit2), L = L)
  • Do you think that the sample size is sufficiently large for the asymptotic theory of maximum likelihood estimation to hold in this case? Explain.
fit0 <- glm(success~1, family = "binomial", data = df)

library(lmtest)
lrtest(fit1, fit0)

library(aod)
wald.test(b = coef(fit1), Sigma = vcov(fit1), Terms = 2:12) # test the significance of the 2-12 terms

Obtain the chi-square statistics and p-values from the LRT and the Wald test, then compare them.
If the sample size is large enough, the difference between the two tests should be small, since the Wald test is the least reliable when the sample size is small to moderate.

  • Goodness-of-fit
    1. Test for the global significance of this model (goodness-of-fit test). The goodness-of-fit test in a logistic regression model is based on the likelihood ratio test (or deviance).
#  likelihood ratio test 
# -- Full model: logit(pi) = intercept + beta1*Age3 + beta2*Age4 + beta3*Age5
# -- Null model: logit(pi) = intercept
fit0 <- glm(cbind(bYes,bNo)~1, family = "binomial", data = santa3)
# `~1` specifies a null (intercept-only) model
library(lmtest)
lrtest(fit3, fit0)

Obtain the p-value from lrtest, which tests whether the larger model fits better. If p-value < 0.05, the model is significant at the 5% level of significance: our model as a whole fits significantly better than an empty model.
2.Deviance

fit0 <- glm(diabetes~1, family = "binomial",data = data4)
summary(fit0)
summary(fit2)
# obtain the residual deviance and DF of each model from the two summaries
# the differences give the chi-square statistic and df for the p-value
pchisq(498.1-341.42, 391-385, lower.tail = FALSE)
  • estimated probabilities

Depending on the input variables, pick the corresponding coefficients from summary(model), substitute them into the $x\beta$ part, and compute each probability.
Then we can obtain the estimated odds by the formula: $odds = \frac{\pi}{1-\pi} = \exp(x\beta)$

GLM for Count Data - Poisson Model

Let Y be a Poisson random variable with $\mu=E(Y)$
The probability mass function of Poisson($\mu$) is $f(y;\mu)=\frac{e^{-\mu}\mu^y}{y!}=\exp(y\log(\mu)-\mu-\log(y!))$. The log-link $g(\mu)=\log(\mu)$ is the canonical link.
The sum of independent Poisson random variables is also Poisson.
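
A quick numerical check of the pmf identity with hypothetical values:

y <- 3; mu <- 2.5
dpois(y, mu)                          # built-in Poisson pmf
exp(y * log(mu) - mu - lgamma(y + 1)) # exp(y*log(mu) - mu - log(y!)); lgamma(y+1) = log(y!)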


  • Fit Poisson regression model
# Read Publication Data
pub <- read.csv(file = "publications.csv", header = TRUE, sep = ",")
pub$gen <- relevel(pub$gen, ref="Male") # set the reference class for each categorical variable first (in R >= 4.0, convert to a factor with as.factor() before relevel())
pub$mar <- relevel(pub$mar, ref="Single")

# Fit Poisson regression model(Model 1)
fit1 <- glm(art ~ gen + mar + kid5 + phd + ment + gen:kid5, 
             family = "poisson", data = pub)
summary(fit1)
drop1(fit1, test = "LR")
# Refine the model: fit a reduced model without the most insignificant interaction term; repeat until no interaction term can be removed -> final fitted model
  • Give an interpretation for each parameter estimate in the final fitted model.
    The estimated constant term $\hat{\beta_0}=0.3452$ is just a baseline value that does not have a specific meaning.

The coefficient for the gender effect is estimated as $\hat{\beta}_{gen}=-0.2253$, indicating that female PhD biochemists produced fewer publications than male PhD biochemists, when the other variables are kept fixed. The ratio of means is estimated as $e^{-0.2253}=0.7983$, i.e. about 20.2% = (1-0.7983)*100% fewer expected articles published by female PhD biochemists in the last three years.

The coefficient for the marital effect is estimated as $\hat{\beta}_{mar}=0.1522$, indicating that married PhD biochemists produced more publications than single PhD biochemists, when the other variables are kept fixed. The ratio of means is estimated as $e^{0.1522}=1.1644$, i.e. about 16.4% more expected articles published by married PhD biochemists in the last three years.

  • Assume the model in (b) is valid. Find a 95% confidence interval for the mean number of articles published by a female scientist who is married, graduated from a program with prestige rank 3.5, has two kids, and a mentor having 10 published articles.
predict = predict(model2, newdata=data.frame(fem=factor(1),mar=factor(1),phd=3.5,kid5=2,ment=10), type = "response", se.fit=T,interval="prediction")
predict # use the predicted expected number and its SE to compute the CI
# CI = expected number +/- 1.96 * SE
lower=1.173127-1.96*0.09010848 
higher=1.173127+1.96*0.09010848 
cat("95% confidence interval:", lower, "-", higher, "\n")
# With two separate estimates, the CI of their difference is: CI_diff = (expected value_1 - expected value_2) +/- 1.96 * sqrt(SE_1^2 + SE_2^2)
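
A sketch of that difference CI, reusing the estimate above for group 1 and a hypothetical second estimate:

e1 <- 1.173127; se1 <- 0.09010848 # group 1 (from the predict output above)
e2 <- 0.95;     se2 <- 0.08       # group 2 (hypothetical)
d <- e1 - e2
c(d - 1.96 * sqrt(se1^2 + se2^2), d + 1.96 * sqrt(se1^2 + se2^2))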


Negative binomial regression model

In the Poisson log-linear model, the assumption is Var(Y) = E(Y)
Over-dispersion: Var(Y) > E(Y)
The negative binomial distribution is appropriate for describing the over-dispersion phenomenon.

fitting a negative binomial regression model with log-link:

# Fit Negative Binomial regression model
library(MASS) # glm.nb() is in the MASS package
fit4 <- glm.nb(art ~ gen + mar + kid5 + phd + ment + gen:kid5, 
               link = log, data = pub)
summary(fit4)

-Comparing the Negative Binomial and the Poisson model:

  1. From the two model summaries, compare Residual Deviance / df: the model whose ratio is closer to 1 indicates a more adequate fit than the other model (where Deviance/df = 1.800).
  2. The lack of fit of the Poisson model is probably due to over-dispersion, as also indicated by the estimated dispersion $\phi$ = 1/Theta from the summary of the negative binomial fit.
    If $\phi$ is close to 0, the Negative binomial reduces to the Poisson model.

Classification table

  • Calculate the predicted responses for all observations in the original data, then compare the predicted responses with the true data. Report the sensitivity and specificity.
# Make predictions for all observations
predictions=predict(fit2,type = "response")

# Convert predictions to binary format (0 or 1) using a threshold of 0.5
predicted_diabetes <- ifelse(predictions > 0.5, 1, 0)

# Compare predicted diabetes status with true diabetes status
table(data4$diabetes, predicted_diabetes)
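
From this table, sensitivity and specificity follow directly (a sketch, taking 1 = diabetes as the positive class; rows are the true status, columns the predictions):

cm <- table(data4$diabetes, predicted_diabetes)
TN <- cm[1, 1]; FP <- cm[1, 2]; FN <- cm[2, 1]; TP <- cm[2, 2]
sensitivity <- TP / (TP + FN)    # true positive rate
specificity <- TN / (TN + FP)    # true negative rate
accuracy <- (TP + TN) / sum(cm)  # proportion of correct classifications
c(sensitivity = sensitivity, specificity = specificity, accuracy = accuracy)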


The proportion of correct classification -> accuracy
