Inferential Statistics Wiki
Confidence Intervals
- Confidence Interval:
- An estimate that gives you an interval of numbers, calculated from a data set drawn from a population with unknown parameters. A confidence interval consists of two parts: a point estimate and a margin of error. (Morgan Correa)
- p' `+-` EBP; lower = p' - EBP; upper = p' + EBP; EBP (ME) = (upper - lower)/2; p' = (upper + lower)/2 (April Meadows)
- They can be written as: point estimate `+-` ME ('aka' margin of error) OR as an interval: (lower bound value, upper bound value). (Siobhan Holman)
- The point of finding a Confidence Interval is to estimate a population parameter, like average age or percent of males. (Jennie)
- To be 95% confident when estimating a population parameter means 95% of similar intervals will contain the actual mean. (Jennie)
- p-m=l
- p+m=h
- to find point estimate (mean): (h+l)/2=p
- to find margin of error: (h-l)/2=m (jennie)
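A minimal sketch of the bullet formulas above in Python, using the (23, 25) interval from the commute-time example further down; the numbers are only illustrative.

```python
# Recover the point estimate and margin of error from a confidence interval.
lower, upper = 23, 25                  # example interval bounds (illustrative values)

point_estimate = (upper + lower) / 2   # (h + l) / 2
margin_of_error = (upper - lower) / 2  # (h - l) / 2

print(point_estimate)   # 24.0
print(margin_of_error)  # 1.0
```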
Model: Interpreting a Confidence Interval for the Mean
General Template:
We are _ 1_% confident that the _2_ is between _3_ and _4_ _5_.
1 – Confidence level as a percent number
2 – Description of the population mean
3 – Lower bound of CI
4 – Upper bound of CI
5 – Units for data
Example: Confidence interval for the mean time taken to drive to work in Chesapeake is (23, 25) with a confidence level of 90%.
Interpretation statement: We are 90% confident that the mean commute time to work in Chesapeake is between 23 and 25 minutes.
Model: Interpreting a Confidence Interval for the Proportion
General Template:
We are _1_% confident that the _2_ is between _3_ and _4_.
1 – Confidence level as a percent number
2 – Description of the population proportion
3 – Lower bound of CI (can use percent number)
4 – Upper bound of CI (can use percent number)
Example: Confidence interval for the proportion of individuals who use public transportation in Chesapeake is (0.07, 0.09) with a confidence level of 95%.
Interpretation statement: We are 95% confident that the proportion of individuals that use public transportation in Chesapeake is between 7% and 9%. (jennie)
- Population Proportion
- Use a sample proportion to estimate the population proportion
- We make histograms out of many sample proportions
- represented by p'
- ex: if x = # of peanuts, n = total (sample size): x=52, n=100
- p = the percent/proportion of peanuts in a can of mixed nuts
- p'= 52/100= 0.52 = 52% (we use percent for communication and interpretation) (jennie)
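A tiny Python sketch of the peanut example above (x = 52 successes out of n = 100); the numbers come straight from the bullet.

```python
x, n = 52, 100           # successes and sample size from the example
p_prime = x / n          # sample proportion p'
print(p_prime)           # 0.52
print(f"{p_prime:.0%}")  # 52% (percent form for communication)
```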
- Sampling Distribution:
- The probability distribution of a given statistic based on a random sample. The sampling distribution of the sample mean is produced by repeatedly selecting simple random samples of the same size from the same population, then computing the sample mean for each of these samples. (Brenda Uselman)
- The mean of the sampling distribution of the mean is the same as the population mean. For example, if you have a population mean of 125, then the mean of the sampling distribution of the mean will also be 125, as long as you have a large enough sample size. (Griffen Mattson)
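A short simulation sketch of this idea: draw many samples from an assumed population with mean 125, and the mean of the sample means lands near 125. The population shape, spread, and sample sizes are assumptions chosen for illustration.

```python
import random

random.seed(1)
population_mean, population_sd = 125, 15   # assumed population parameters
sample_size, num_samples = 40, 5000

# Build the sampling distribution of the mean: many sample means.
sample_means = []
for _ in range(num_samples):
    sample = [random.gauss(population_mean, population_sd) for _ in range(sample_size)]
    sample_means.append(sum(sample) / sample_size)

# The mean of the sampling distribution is close to the population mean (125).
print(sum(sample_means) / num_samples)
```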
- Confidence Level:
- CL is NOT a probability of success (We never know the exact success)
- CL IS a measure of how confident we are in our procedure.
- Confidence levels are chosen based on the importance of the test. Most results are reported at a 95% confidence level, which means that 95% of intervals constructed this way will contain the true mean. (April Meadows)
- To be 95% confident when estimating a population parameter means 95% of similar intervals will contain the actual mean. (Jennie)
- How confident we are in our a) sampling technique and/or b) our process of calculating the interval (jennie)
- If it has the word "chance", then we're talking about success and Confidence Level is NOT about success (jennie)
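A sketch of what "95% confident" says about the procedure rather than any single interval: simulate many samples, build a 95% interval from each, and count how many capture the true mean. The population values here are assumptions.

```python
import random, statistics

random.seed(2)
true_mean, true_sd = 50, 10      # assumed population parameters
n, trials, z = 30, 2000, 1.96    # 1.96 is the z critical value for 95%

hits = 0
for _ in range(trials):
    sample = [random.gauss(true_mean, true_sd) for _ in range(n)]
    xbar = statistics.mean(sample)
    me = z * true_sd / n ** 0.5          # margin of error with sigma known
    if xbar - me <= true_mean <= xbar + me:
        hits += 1                        # this interval captured the true mean

print(hits / trials)   # close to 0.95: ~95% of similar intervals contain the mean
```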
- Point Estimate:
- A number taken from a sample and used to estimate a population parameter
(point estimate - error bound, point estimate + error bound) (Casandra Jensen)
- Error Bound on a Population Proportion:
- Formula used: `EBP = z_(alpha/2)*sqrt((p'q')/n)`, where
p' = x/n
(q' = 1-p') (Casandra Jensen)
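A sketch of the error bound formula applied to the peanut example (p' = 0.52, n = 100) at a 95% confidence level; scipy is assumed to be available for the z critical value.

```python
from scipy.stats import norm

x, n, cl = 52, 100, 0.95
p_prime = x / n
q_prime = 1 - p_prime
alpha = 1 - cl

z = norm.ppf(1 - alpha / 2)                # z critical value (about 1.96)
ebp = z * (p_prime * q_prime / n) ** 0.5   # EBP = z * sqrt(p'q'/n)

print(ebp)                                 # margin of error, about 0.098
print(p_prime - ebp, p_prime + ebp)        # the confidence interval
```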
- Error Bound on a Population Mean
- Formula used (when the population standard deviation `sigma` is known): `EBM = z_(alpha/2)*(sigma/sqrt(n))` (Casandra Jensen)
- Student's t Distribution:
- When the sample size is small and the population standard deviation is unknown, we use the t-distribution.
Formula for t-scores: `t=(barx-mu)/(s/sqrt(n))`
degrees of freedom: n-1
Finding a probability with a spreadsheet: =TDIST(t, n-1, tails)
Finding t from a probability with a spreadsheet: =TINV(`alpha`, n-1)
To find the Margin of Error: `ME = t*(s/sqrt(n))` (also, note the lower (mean - M.E.) and upper (mean + M.E.) bounds)
(Casandra Jensen)
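A sketch of a t-based interval for a small made-up sample; scipy supplies the critical value that =TINV(`alpha`, n-1) would give in a spreadsheet.

```python
import statistics
from scipy.stats import t

data = [22, 25, 24, 27, 23, 26, 24, 25]   # hypothetical small sample
n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)                # sample standard deviation
cl = 0.95
alpha = 1 - cl

t_crit = t.ppf(1 - alpha / 2, df=n - 1)   # t critical value with n-1 degrees of freedom
me = t_crit * s / n ** 0.5                # ME = t * s / sqrt(n)

print(xbar - me, xbar + me)               # lower and upper bounds of the interval
```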
- Sample Size Estimate for a Population Proportion
- Solving this equation for n will give you the sample size:
`n = ((z_(alpha/2))^2*(p'q'))/(EBP^2)` (Hanna Strate)
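A sketch of solving for n, assuming a prior estimate p' = 0.52, a desired error bound of 0.03, and 95% confidence (if p' is unknown, 0.5 is the conservative choice, as noted in the spreadsheet section below).

```python
import math
from scipy.stats import norm

p_prime, ebp, cl = 0.52, 0.03, 0.95   # assumed planning values
q_prime = 1 - p_prime
z = norm.ppf(1 - (1 - cl) / 2)        # z_(alpha/2), about 1.96

n = (z ** 2 * p_prime * q_prime) / ebp ** 2
print(math.ceil(n))                   # always round the sample size up
```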
- Sample Size Estimate for a Population Mean
- Formula for the Margin of Error of a population mean, where the population standard deviation is not known (Madyson Handy):
EBM = t * `(s/sqrt(n))`
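A sketch matching the heading above: the margin-of-error formula rearranges into the standard sample-size planning formula n = (z*`sigma`/EBM)^2 (not written out in these notes); the planning values below are assumptions.

```python
import math
from scipy.stats import norm

sigma, ebm, cl = 15, 2, 0.95     # assumed population sd and target margin of error
z = norm.ppf(1 - (1 - cl) / 2)   # z_(alpha/2), about 1.96

n = (z * sigma / ebm) ** 2       # rearranged from EBM = z * sigma / sqrt(n)
print(math.ceil(n))              # round up: about 217
```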
- Symbols/Formulas/Abbreviations for Confidence Intervals
- x = a data value, or the count of successes in a sample
- `barx` = sample mean
- n = sample size
- p = proportion in population
- p' = sample proportion: p'=x/n
- ME = margin of error
- EBP = error bound proportion
- ME (Margin of Error) (EBM) (error bound on the mean): high/upper/max minus low/lower/min, divided by 2: (h-l)/2=m; also t-score times s/`sqrt(n)`
- CL = confidence level
- `alpha` , alpha, Significance Level: if not given, 1-CL=`alpha`
- z-score = (x-`barx`)/s: data value minus sample mean, divided by the standard deviation
- IQR = interquartile range = Q3-Q1
- Range = max-min
- Min = minimum, smallest number in data set
- Q1 = first quartile
- m (Q2) = median, second quartile
- Q3 = third quartile
- Max = maximum, largest number in data set
- Lower bounds = Mean - Margin of Error
- Upper bounds = Mean + Margin of Error
- Low fence = Q1-(1.5*IQR)
- Upper fence = Q3+(1.5*IQR)
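A sketch of several of these symbols computed on a small made-up data set; note that Python's quartile method can differ slightly from a spreadsheet's QUARTILE.

```python
import statistics

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]        # hypothetical data set

q1, q2, q3 = statistics.quantiles(data, n=4)   # quartiles (method differs slightly from Excel)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

xbar = statistics.mean(data)
s = statistics.stdev(data)
z = (21 - xbar) / s                            # z-score of the value 21: (x - xbar) / s

print(min(data), q1, q2, q3, max(data))        # five-number summary
print(iqr, lower_fence, upper_fence, z)
```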
Hypothesis Testing
- Hypothesis Test
- When the region of rejection is on both sides of the sampling distribution, it is called a two-tail test. There can also be a left-tail or a right-tail test. Which type of test it will be is determined by the alternative hypothesis. (Kendall Davis)
- When we perform a hypothesis test of a single population mean "`mu` " using a normal distribution we take a simple random sample from the population:
- population we are testing is normally distributed or our sample size is sufficiently large AND
- we know the value of the population standard deviation
- When we perform a hypothesis test of a single population proportion "p" we take a simple random sample from the population,and we must meet the conditions for a binomial distribution, which are:
- there are a certain number "n" of independent trials
- the outcomes of any trial are success or failure
- each trial has the same probability of a success "p"
- the quantities "np" and "nq" must both be greater than five (np>5, nq>5)
- `mu` = p and `sigma` = `sqrt((pq)/n)` (remember that q=1-p)
- Standard deviation of the sample mean (the standard error) = `sigma` / `sqrt(n)` (population standard deviation divided by the square root of the sample size) (jennie)
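A small sketch of the checks above for a proportion test: verify np > 5 and nq > 5, then compute `sqrt((pq)/n)`. The null proportion and sample size are assumed.

```python
n, p = 100, 0.5   # assumed sample size and null-hypothesis proportion
q = 1 - p

print(n * p > 5 and n * q > 5)   # conditions for using the normal approximation
print((p * q / n) ** 0.5)        # sigma = sqrt(pq/n), the standard error (0.05 here)
```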
- Null Hypothesis
- The Null Hypothesis is the result you expect before running an experiment. (e.g. innocent until proven guilty); it is a statement based on prior knowledge (Jennie Wolfe)
- When hypothesis testing for difference the Null Hypothesis is usually stated as 'there is no difference' or 'no effect'.
- We can either Reject or Fail to Reject the Null Hypothesis, based on evidence.
- The Null Hypothesis can not be accepted because we can not possibly PROVE it.
- fail to reject the null-meaning the sample data supports the null hypothesis.
- Reject the null-meaning the sample data supports the alternative hypothesis. (Baylee Powers)
- The Null Hypothesis always contains an equals sign.
- Null Hypothesis -`H_0`
- Possible Signs: `>=`, `=`, `<=`
Proportions: `H_0:` `p>=0`
`H_0:` `p=0`
`H_0:` `p<=0`
Means: `H_0:` `mu>=0`
`H_0:` `mu=0 `
`H_0:` `mu<=0 `
- The Null Hypothesis can be rejected in three ways.
- The test statistic is in the critical area.
- `P-value < alpha`
- The hypothesis value is outside the confidence interval (Gio)
- The null value (the hypothesized population parameter) always goes in the middle of the graph (Jennie)
- Alternative Hypothesis
- The alternative hypothesis states there is a significant relationship between two variables.
- The possible signs for an alternative hypothesis are <, >, or `!=`; it can never be =.
- < indicates a left tail test, > indicates a right tail test and `!=` indicates a two tail test
- Proportions `Ha` : p > 0, `Ha` : p < 0, `Ha` : p`!=` 0
- Means `Ha` : `mu` > 0, `Ha` : `mu` < 0, `Ha` : `mu` `!=` 0 (Yvonne)
- Significance Level
- Also known as alpha (`alpha` ), is a probability that is closely related to the confidence level (`alpha` + cl = 1). It is the probability that the calculated confidence interval does not contain the true population mean. For example, a significance level of 0.05 indicates that 5% of similar confidence intervals will miss the actual parameter. It is also the probability of making a type 1 error (rejecting the null hypothesis when it is true). (Shannon Hatley)
- On a graph, the significance level is shaded in the tail on the side of the test; it refers to the chance of making a mistake (a Type I error); it is determined by the statistician before collecting sample data. (Jennie)
- p-value
- When you perform a hypothesis test, a p-value helps you determine the significance of your results. The p-value is a number between 0 and 1 and is interpreted in a specific way. A small p-value (≤ 0.05) indicates strong evidence against the null hypothesis, which means you reject the null hypothesis. (Jennifer McDougall)
- Probability of finding another sample with a more extreme statistic; calculation of how unusual an event or sample is based on the null hypothesis; it is a conditional probability. (Jennie)
- p-value high, null must fly; p-value low, null must go. `alpha` > p-value (alpha high, p-value low) = reject H0; `alpha` < p-value (alpha low, p-value high) = do not reject H0 (Jennie)
- If the p-value is low, the results of the sample data are significant; there is sufficient evidence to conclude that H0 is an incorrect belief and that Ha may be correct (Jennie)
- If the p-value is high, the results of the sample data are not significant; there is not sufficient evidence to conclude that the Ha may be correct (Jennie)
- Test Statistic
- A random variable that is calculated from a data sample and compared to the null hypothesis. It is used to decide whether the null hypothesis will be rejected or not rejected. (Riley Lankford)
- A test statistic can either be a z-score, t-score, or a sample statistic. (Siobhan Holman)
- For Hypothesis Testing, a test statistic refers to a sample statistic. If the test statistic ends up in the critical region, we must reject the null hypothesis, and accept the alternative hypothesis. (Jennie)
- In StatKeys, we change the cutoff number to the sample statistic number. (Jennie)
- Equation to find a Test Statistic for a Proportion (Felix Hernandez)
- `z=(P'-P)/sqrt((PQ)/n)`
- P = hypothesized proportion from the Null Hypothesis, P' = Sample Proportion, Q = 1 - P, n = Sample Size
- Equation to find a Test Statistic for a Mean (Felix Hernandez)
- `t=(barx-mu)/(s/sqrt(n))`
- `barx` = Sample Mean, `mu` = Population Mean, S = Sample Standard Deviation, n = Sample size
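A sketch of both test-statistic formulas with made-up sample values.

```python
# Test statistic for a proportion: z = (p' - p) / sqrt(pq/n)
p, p_prime, n = 0.50, 0.56, 200     # hypothetical null proportion, sample proportion, size
q = 1 - p
z = (p_prime - p) / (p * q / n) ** 0.5
print(z)                            # about 1.70

# Test statistic for a mean: t = (xbar - mu) / (s / sqrt(n))
mu, xbar, s, n = 24, 25.1, 3.2, 36  # hypothetical values
t = (xbar - mu) / (s / n ** 0.5)
print(t)                            # about 2.06
```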
- Type 1 Error
- A type 1 error is the rejection of a true null hypothesis; it is also known as a false positive. Because the p-value depends on the size of the sample, a very large sample can make even a tiny, unimportant difference look statistically significant. (Jenna Doherty)
- = P(Type I error): probability of a Type 1 error; alpha `alpha` should be as small as possible because it is the possibility of error (Jennie)
- `alpha` > p-value (alpha high, p-value low) = reject H0; `alpha` < p-value (alpha low, p-value high) = do not reject H0 (Jennie)
- Type 2 Error
- A type 2 error is when you make a decision to not reject the null hypothesis, when in reality the null hypothesis is false and you should reject it. (Giovanna Garcia)
- False negative
- = P(Type II error): probability of a Type II error; beta `beta` should be as small as possible because it is the possibility of error
- `alpha` > p-value (alpha high, p-value low) = reject H0; `alpha` < p-value (alpha low, p-value high) = do not reject H0 (Jennie)
- Unusual Events
- An unusual event is an event that has a low probability of occurring. 95% of all values from a normal distribution will usually be within 2 standard deviations of the mean. (Jordan Schmidt)
- If the probability of an event occurring is greater than or equal to 0.05, the event is not unusual; outcomes with probability < 0.05 are unusual. (Brenda Uselman)
- Power of a Test
- The power of a test is the probability of correctly rejecting a false null hypothesis. We are usually only interested in power when the null hypothesis is in fact false. The significance level, sample size, and variability will affect power. A power calculation tells you how many samples or trials you need to avoid incorrectly failing to reject a false null. Statistical power equals 1 minus the probability of a Type II error (power = 1 - `beta`). (Tiah Benedict)
- Decision
- After determining which hypothesis the sample supports, we make a decision: "reject null" if sample data/information favors the alternative hypothesis OR "do not reject the null" or "decline to reject the null" if sample info is insufficient to reject the null hypothesis (Jennie)
- Four possible outcomes: (also see above table at Power of a Test and Type I Error)
- The decision is not to reject H0 when H0 is true: CORRECT.
- The decision is to reject H0 when H0 is true: TYPE I ERROR, false positive.
- The decision is not to reject H0 when, in fact, H0 is false: TYPE II ERROR, false negative
- The decision is to reject H0 when H0 is false: CORRECT, aka Power of the Test. (Jennie)
- If the p-value is low, the results of the sample data are significant; there is sufficient evidence to conclude that H0 is an incorrect belief and that Ha may be correct (Jennie)
- If the p-value is high, the results of the sample data are not significant; there is not sufficient evidence to conclude that the Ha may be correct (Jennie)
- Order of Operations for Hypothesis Testing
- State the null and alternative hypothesis
- Identify the type of test
- Calculate the test statistic
- Calculate the p-value
- Draw a picture including the null value, critical value, significance level, test statistic, and p-value
- compare the p-value to alpha
- Make a decision: reject or fail to reject the null hypothesis
- Write a conclusion
(Gary Parker)
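A sketch of the steps above as one worked example for a single mean; the data, null value, and `alpha` are assumed, and scipy supplies the p-value (step 5, the picture, is left out).

```python
import statistics
from scipy.stats import t

# 1. Hypotheses: H0: mu = 24   Ha: mu > 24   (right-tail test for a mean)
mu0, alpha = 24, 0.05
data = [25, 27, 24, 26, 28, 23, 25, 27, 26, 25]   # hypothetical sample

# 2-3. Type of test and test statistic (sigma unknown, so use t)
n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)
t_stat = (xbar - mu0) / (s / n ** 0.5)

# 4. p-value for a right-tail test: area to the right of t_stat
p_value = t.sf(t_stat, df=n - 1)

# 6-7. Compare the p-value to alpha and decide
decision = "reject H0" if p_value < alpha else "fail to reject H0"

# 8. Conclusion
print(t_stat, p_value, decision)
```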
- Important Spreadsheet Functions
P72 (72nd percentile): =PERCENTILE(data, 0.72)
Q3 (third quartile): =QUARTILE(data, 3)
std: =STDEV(value1,value2)
mean: =AVERAGE(value1,value2)
max: =max(value1,value2)
min: =min(value1,value2)
left-tail probability (area below x): =NORMDIST(x, mean, std, TRUE)
right-tail probability (area above x): =1-NORMDIST(x, mean, std, TRUE)
probability within a range: =NORMDIST(high, mean, std, TRUE) - NORMDIST(low, mean, std, TRUE)
value of x from a left-tail probability: =NORMINV(prob, mean, std)
value of x from a right-tail probability: =NORMINV(1-prob, mean, std)
(Mark Otton)
p-value (for a proportion, left tail) = NORMDIST(p', p, sqrt(p*q/n), TRUE) (Siobhan Holman)
p' = x/n; x = favorable, n = sample size; p = proportion; q= 1-p (Adriana Ruiz)
(P-value For hypothesis test for mean) =TDIST(t, n-1, # of tails [1 or 2]) (Siobhan Holman)
p= population proportion
p1=sample proportion (Kylee Smith)
Confidence Interval Sample Size (Proportion): n = (p1*q1*z^2)/ME^2; find `alpha/2`, then z = NormInv(`alpha/2`, 0, 1) (Kylee Smith). If p' isn't known, use 0.5. (Adriana Ruiz)
Critical t-value =tinv(1-CL, n-1) (Felix Hernandez)
Margin of Error ME = t*(s/`sqrt(n)`) (Felix Hernandez)
Critical z-value =NormInv(`alpha/2,0,1)` (Felix Hernandez)
Margin of Error ME= z*(`sigma`/`sqrt(n)`) (Felix Hernandez)
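For anyone working outside a spreadsheet, rough scipy equivalents of the functions above (a sketch; classic =TDIST and =TINV use two-tail conventions, so those lines are written out explicitly).

```python
from scipy.stats import norm, t

x, mean, std = 1.2, 0, 1     # example inputs (assumed values)
prob, alpha = 0.05, 0.05
t_score, n = 1.5, 25

print(norm.cdf(x, mean, std))            # =NORMDIST(x, mean, std, TRUE): left-tail area
print(1 - norm.cdf(x, mean, std))        # =1-NORMDIST(x, mean, std, TRUE): right-tail area
print(norm.ppf(prob, mean, std))         # =NORMINV(prob, mean, std): value with that left-tail area
print(t.sf(t_score, df=n - 1))           # =TDIST(t, n-1, 1): one-tail p-value (t >= 0)
print(2 * t.sf(abs(t_score), df=n - 1))  # =TDIST(t, n-1, 2): two-tail p-value
print(t.ppf(1 - alpha / 2, df=n - 1))    # like =TINV(alpha, n-1): two-tail critical t
```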
Probability Formulas
- `P(A)="# of times A was observed"/"total # of observations"`
- `0<=P(A)<=1`
- P(A) + P(A') = 1
- P(A') = 1 - P(A)
- P(A or B) = P(A) + P(B) - P(A and B)
- P(A | B) = P(A given B) = P(A knowing B) = P(A and B) / P(B)
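A tiny sketch checking the addition rule and conditional probability on made-up counts for two events A and B.

```python
# Hypothetical counts from 100 observations of events A and B.
total = 100
count_A, count_B, count_A_and_B = 40, 50, 20

P_A = count_A / total
P_B = count_B / total
P_A_and_B = count_A_and_B / total

P_A_or_B = P_A + P_B - P_A_and_B       # addition rule
P_A_given_B = P_A_and_B / P_B          # conditional probability
P_not_A = 1 - P_A                      # complement rule

print(P_A_or_B, P_A_given_B, P_not_A)  # 0.7 0.4 0.6
```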