35 cards
Mean
The average of a data set, calculated by dividing the sum of all values by the number of values.
Median
The middle value in a data set when the values are arranged in ascending order; with an even number of values, the mean of the two middle values.
Mode
The value that appears most frequently in a data set.
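As a quick illustration of the three measures of center above, here is a minimal sketch using Python's standard `statistics` module. The data values are made up for the example.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum of values / number of values = 30 / 6
median = statistics.median(data)  # even count, so the average of 3 and 5
mode = statistics.mode(data)      # 3 appears more often than any other value
```

Here the mean is 5, the median is 4, and the mode is 3, which shows that the three measures need not agree.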
Standard Deviation
A measure of how spread out the values in a data set are around the mean, equal to the square root of the variance.
Variance
The square of the standard deviation, representing the average of the squared differences from the mean.
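The relationship between variance and standard deviation can be checked with a small sketch. The data set is hypothetical; `pvariance` and `pstdev` treat the data as a whole population (divide by n), while `variance` treats it as a sample (divide by n - 1).

```python
import statistics

# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population versions: average of squared differences from the mean.
pop_variance = statistics.pvariance(data)
pop_std_dev = statistics.pstdev(data)

# Sample version: divides by (n - 1) instead of n.
sample_variance = statistics.variance(data)
```

For this data the squared differences sum to 32, so the population variance is 32 / 8 = 4 and the population standard deviation is 2, while the sample variance is 32 / 7.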
Normal Distribution
A probability distribution that is symmetric about the mean, with a bell-shaped curve.
What is a Z-score?
A measure of how many standard deviations a value lies above or below the mean.
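In formula form, z = (x - mean) / standard deviation. A minimal sketch, where the function name `z_score` and the exam numbers are illustrative assumptions:

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (x - mean) / std_dev

# Hypothetical exam: mean 70, standard deviation 10, one score of 85.
z = z_score(85, 70, 10)
```

A score of 85 is 15 points above the mean, which is 1.5 standard deviations, so `z` is 1.5.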
Central Limit Theorem
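The theorem stating that the sampling distribution of the mean of independent, identically distributed observations with finite variance approaches a normal distribution as the sample size becomes large, regardless of the shape of the underlying distribution.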
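The theorem can be explored with a small simulation. This is only a sketch: the exponential distribution, the sample size of 50, and the 2,000 repetitions are arbitrary choices made for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50
NUM_SAMPLES = 2000

# Draw from an exponential distribution (mean 1, standard deviation 1),
# which is strongly right-skewed, and record the mean of each sample.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

center = statistics.fmean(sample_means)  # should be near the population mean, 1
spread = statistics.stdev(sample_means)  # should be near 1 / sqrt(50)
```

Even though the individual draws are skewed, the sample means cluster around 1 with a spread close to 1 / sqrt(50), roughly 0.14, and a histogram of `sample_means` would look approximately bell-shaped.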
What does correlation measure?
The strength and direction of a linear relationship between two variables.
Pearson Correlation Coefficient
A measure of the linear correlation between two variables, ranging from -1 to 1.
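The coefficient can be computed as the sum of the products of deviations from each mean, divided by the product of the square roots of the summed squared deviations. A minimal sketch, with `pearson_r` as a hypothetical helper name:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator).
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations (denominator).
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cross / (dev_x * dev_y)

r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # perfectly increasing line
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # perfectly decreasing line
```

The two examples sit at the extremes of the range: a perfect increasing line gives 1 and a perfect decreasing line gives -1.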
Sample Space
The set of all possible outcomes in a probability experiment.
P-value
The probability of obtaining test results at least as extreme as the observed results, assuming that the null hypothesis is true.
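As one concrete case, the sketch below computes a two-sided p-value for a z-test, assuming the test statistic follows a standard normal distribution under the null hypothesis. The function name `two_sided_p_value` is an illustrative assumption.

```python
import math

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, using the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))

p = two_sided_p_value(1.96)
```

A test statistic of 1.96 gives a p-value very close to 0.05, which is why 1.96 is the familiar cutoff for a two-sided test at the 5% level.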
Type I Error
The error of rejecting a true null hypothesis (a false positive).
Type II Error
The error of failing to reject a false null hypothesis (a false negative).
Hypothesis Testing
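A method of using sample data to decide whether there is enough evidence to reject a null hypothesis in favor of an alternative, whether the data come from a controlled experiment or an observational study.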
Confidence Interval
A range of values, computed from sample data, produced by a procedure that captures the true population parameter in a stated proportion of repeated samples (for example, 95%).
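A minimal sketch of one common case: an interval for a population mean using the normal approximation, which assumes a reasonably large sample (for small samples a t-based interval is usually preferred). The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    if n < 2 or sample_sd < 0 or not 0 < confidence < 1:
        raise ValueError("invalid arguments")
    # Critical value: about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sample_sd / math.sqrt(n)  # critical value times standard error
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample: mean 100, standard deviation 15, 225 observations.
low, high = mean_confidence_interval(100, 15, 225)
```

Here the standard error is 15 / sqrt(225) = 1, so the 95% interval runs from about 98.04 to about 101.96.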
Regression Analysis
A statistical process for estimating the relationship between a dependent variable and one or more independent variables.
What is an outlier?
An observation that lies far from the other observations, which may reflect natural variability in the data or indicate a measurement or experimental error.
Binomial Distribution
A discrete probability distribution of the number of successes in a fixed number of independent trials, each with the same probability of success.
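The probability of exactly k successes in n trials is C(n, k) * p^k * (1 - p)^(n - k). A minimal sketch, with `binomial_pmf` as a hypothetical helper name:

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    if not 0 <= k <= n or not 0 <= p <= 1:
        raise ValueError("need 0 <= k <= n and 0 <= p <= 1")
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability of exactly 2 heads in 4 fair coin flips: 6 * 0.5**4.
prob = binomial_pmf(2, 4, 0.5)
```

There are 6 ways to place 2 heads among 4 flips, each with probability 1/16, so `prob` is 0.375.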
Poisson Distribution
A discrete probability distribution that gives the probability of a given number of events occurring in a fixed interval of time or space, assuming events occur independently at a constant average rate.
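With an average rate of lam events per interval, the probability of exactly k events is e^(-lam) * lam^k / k!. A minimal sketch, where `poisson_pmf` and the call-center numbers are illustrative assumptions:

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average count per interval is lam."""
    if k < 0 or lam <= 0:
        raise ValueError("need k >= 0 and lam > 0")
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Hypothetical call center averaging 2 calls per minute:
# probability of a minute with no calls at all.
p_none = poisson_pmf(0, 2.0)
```

With k = 0 the formula reduces to e^(-2), so a silent minute happens a little under 14% of the time in this example.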
Cumulative Distribution Function (CDF)
A function that represents the probability that a random variable is less than or equal to a certain value.
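As an example, the CDF of a normal distribution can be written in terms of the error function. A minimal sketch, with `normal_cdf` as a hypothetical helper name:

```python
import math

def normal_cdf(x, mean=0.0, std_dev=1.0):
    """P(X <= x) for a normally distributed X."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return 0.5 * (1 + math.erf((x - mean) / (std_dev * math.sqrt(2))))

at_mean = normal_cdf(0.0)    # half the probability lies at or below the mean
upper_tail = normal_cdf(1.96)  # close to 0.975
```

Because a CDF accumulates probability from left to right, it never decreases and its values always lie between 0 and 1.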
What is the purpose of a scatter plot?
To visualize the relationship between two quantitative variables and identify potential correlations.
Chi-square Test
A statistical test used to determine whether there is a significant association between categorical variables.
ANOVA (Analysis of Variance)
A statistical method used to compare means of three or more samples.
What is the Law of Large Numbers?
A principle stating that as the number of trials increases, the experimental (observed) probability of an event gets closer to its theoretical probability.
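A simulated fair coin gives a feel for this. The sketch below is illustrative only: the seed and the numbers of flips are arbitrary choices.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

def heads_proportion(num_flips):
    """Simulate fair coin flips and return the observed proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

# Observed proportion of heads for increasingly long runs.
proportions = {n: heads_proportion(n) for n in (10, 1_000, 100_000)}
```

Short runs can stray noticeably from 0.5, while the proportion for 100,000 flips should sit very close to the theoretical probability.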
Bayes' Theorem
A formula that describes how to update the probabilities of hypotheses when given evidence.
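In symbols, P(H | E) = P(E | H) * P(H) / P(E), where P(E) = P(E | H) * P(H) + P(E | not H) * P(not H). A minimal sketch, where the function name and the screening-test numbers are hypothetical:

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | E) from the prior P(H) and the two likelihoods."""
    numerator = p_evidence_given_h * prior
    # Total probability of the evidence, over H and not-H.
    evidence = numerator + p_evidence_given_not_h * (1 - prior)
    if evidence == 0:
        raise ValueError("the evidence has zero probability")
    return numerator / evidence

# Hypothetical screening test: 1% of people have the condition, the test
# detects it 95% of the time, and it gives a false positive 5% of the time.
p_condition_given_positive = posterior(0.01, 0.95, 0.05)
```

Despite the accurate-sounding test, the posterior is only about 16%, because true positives (0.0095) are outnumbered by false positives (0.0495) when the condition is rare.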
What is a random variable?
A variable whose possible values are numerical outcomes of a random phenomenon.
Skewness
A measure of the asymmetry of the probability distribution of a real-valued random variable about its mean.
Kurtosis
A measure of the 'tailedness' of the probability distribution of a real-valued random variable.
What is the difference between descriptive and inferential statistics?
Descriptive statistics summarize the data in a sample using measures such as the mean and standard deviation, while inferential statistics use sample data, which are subject to random variation, to draw conclusions about a population.
What is the difference between a population and a sample?
A population includes every element of the group being studied, while a sample is a subset of observations drawn from that population.
What is a probability density function (PDF)?
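A function that describes the relative likelihood of a continuous random variable taking a value near a given point; the probability that the variable falls within a range is the area under the function over that range.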
What is a t-test?
A statistical test used to determine if there is a significant difference between the means of two groups.
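There are several variants of the t-test. As one example, the sketch below computes the test statistic for Welch's two-sample t-test, which does not assume equal variances. The function name and data are illustrative assumptions, and turning the statistic into a p-value additionally requires the t distribution, which is not shown here.

```python
import math
import statistics

def welch_t_statistic(group_a, group_b):
    """Difference in sample means divided by its estimated standard error."""
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("each group needs at least two observations")
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    std_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    if std_error == 0:
        raise ValueError("both groups are constant; the statistic is undefined")
    return (statistics.mean(group_a) - statistics.mean(group_b)) / std_error

# Hypothetical measurements from two small groups.
t = welch_t_statistic([5, 6, 7], [1, 2, 3])
```

The means differ by 4 and the standard error is sqrt(1/3 + 1/3), so `t` is about 4.9. The larger the statistic in absolute value, the stronger the evidence that the group means differ.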
What is heteroscedasticity?
A condition in which the variance of the errors (residuals) is not constant across all levels of an independent variable.
What is multicollinearity?
A situation in which two or more independent variables in a multiple regression model are highly correlated.