Analyze the symmetry and tailedness of your data distribution.
Understanding Skewness and Kurtosis
💡 Skewness and kurtosis are statistical measures that describe the shape of a data distribution. They capture its asymmetry and the weight of its tails, which the mean and variance alone do not.
What is Skewness?
Skewness measures the asymmetry of a distribution:
- Positive Skew: The tail on the right side of the distribution is longer/more pronounced.
- Negative Skew: The tail on the left side is longer/more pronounced.
- Zero Skew: The two tails balance out. Any symmetric distribution, such as the normal distribution, has zero skew.
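The formula for the sample skewness (\(g_1\)), in the bias-adjusted form that the JavaScript implementation on this page computes, is:
\[ g_1 = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3 \]
where:
- \(n\): Number of data points
- \(\bar{x}\): Mean of the data
- \(s\): Sample standard deviation (computed with \(n-1\) in the denominator)
- \(x_i\): Individual data points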
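To make the sign convention concrete, the bias-adjusted sample skewness can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function name `sample_skewness` and the three small datasets are assumptions made up for this example, and the estimator is the same adjusted form as in the JavaScript code further down.

```python
import math


def sample_skewness(data):
    """Bias-adjusted sample skewness: n/((n-1)(n-2)) * sum(((x - mean)/s)**3)."""
    n = len(data)
    if n < 3:
        raise ValueError("skewness needs at least 3 data points")
    mean = sum(data) / n
    # Sample standard deviation (n - 1 in the denominator)
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all values are identical")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


# Hypothetical example data, chosen only to show the sign of the result
right_tailed = [1, 2, 2, 3, 3, 3, 4, 12]  # one large value stretches the right tail
left_tailed = [-x for x in right_tailed]  # mirror image: the tail is now on the left
symmetric = [1, 2, 3, 4, 5]

for name, values in [("right-tailed", right_tailed),
                     ("left-tailed", left_tailed),
                     ("symmetric", symmetric)]:
    print(f"{name}: {sample_skewness(values):+.4f}")
```

Mirroring the data flips the sign of the skewness without changing its magnitude, and the symmetric dataset comes out at zero up to floating-point rounding.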
What is Kurtosis?
Kurtosis measures the "tailedness" of a distribution. In terms of Pearson's kurtosis \(K\):
- Mesokurtic (K=3): A normal distribution.
- Leptokurtic (K>3): More extreme outliers (heavier tails).
- Platykurtic (K<3): Fewer outliers (lighter tails).
The formulas for kurtosis are:
Fisher's Kurtosis (Excess Kurtosis)
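Fisher's kurtosis, also known as excess kurtosis, measures how much the tails deviate from those of a normal distribution, for which it equals 0. In the bias-adjusted sample form that the JavaScript implementation on this page computes:
\[ g_2 = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)} \]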
Pearson's Kurtosis
Pearson's kurtosis includes the baseline kurtosis of a normal distribution (3) and is calculated by adding 3 to Fisher's kurtosis:
\[ K = g_2 + 3 \]
where:
- \(g_2\): Fisher's kurtosis (excess kurtosis)
- \(K\): Pearson's kurtosis
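The relationship between the two kurtosis conventions, and the platykurtic/leptokurtic labels, can be illustrated with a short plain-Python sketch. The function name `sample_excess_kurtosis` and both datasets are assumptions introduced for this example; the estimator is the bias-adjusted sample form used by the JavaScript code further down, not the SciPy default.

```python
import math


def sample_excess_kurtosis(data):
    """Bias-adjusted sample excess (Fisher) kurtosis."""
    n = len(data)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 data points")
    mean = sum(data) / n
    # Sample standard deviation (n - 1 in the denominator)
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    if s == 0:
        raise ValueError("kurtosis is undefined when all values are identical")
    fourth_power_sum = sum(((x - mean) / s) ** 4 for x in data)
    factor = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return factor * fourth_power_sum - correction


# Hypothetical example data
evenly_spaced = list(range(1, 11))  # flat, no extreme values
with_outlier = [5] * 9 + [50]       # a single extreme value

for name, values in [("evenly spaced", evenly_spaced),
                     ("with outlier", with_outlier)]:
    g2 = sample_excess_kurtosis(values)  # Fisher (excess) kurtosis
    K = g2 + 3                           # Pearson kurtosis
    label = "leptokurtic" if g2 > 0 else "platykurtic" if g2 < 0 else "mesokurtic"
    print(f"{name}: g2 = {g2:.4f}, K = {K:.4f} ({label})")
```

The evenly spaced values have lighter tails than a normal distribution, so their excess kurtosis is negative, while the single outlier pushes the second dataset well above zero.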
Key Concepts
- Symmetry: Skewness reveals the symmetry (or lack thereof) of the data distribution.
- Tailedness: Kurtosis indicates how heavy or light the tails of the distribution are compared to a normal distribution. It is not a reliable measure of how sharp the peak is.
- Higher Moments: Skewness and kurtosis are the standardized third and fourth central moments of a distribution, respectively.
Note: While skewness and kurtosis provide useful insights, interpreting them should be done alongside visualizations and other statistical measures.
Real-Life Applications
Skewness and kurtosis are used across many fields:
- Finance: Measuring the risk of extreme events in stock returns.
- Healthcare: Analyzing test results or patient characteristics.
- Social Sciences: Studying income distributions or survey data.
Python Implementation
from scipy.stats import skew, kurtosis
# Example data
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
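# Calculate skewness (SciPy defaults to bias=True, the population formula;
# pass bias=False for the bias-adjusted sample value)
data_skewness = skew(data)
# Calculate Fisher's kurtosis (fisher=True, i.e. excess kurtosis, is the default)
fisher_kurtosis = kurtosis(data, fisher=True)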
# Calculate Pearson's kurtosis
pearson_kurtosis = fisher_kurtosis + 3
# Print results
print(f"Skewness: {data_skewness:.4f}")
print(f"Fisher's Kurtosis (Excess): {fisher_kurtosis:.4f}")
print(f"Pearson's Kurtosis: {pearson_kurtosis:.4f}")
R Implementation
# Load the required library
library(moments)
# Example data
data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
# Calculate skewness
data_skewness <- skewness(data)
# Calculate Pearson's kurtosis (default from moments::kurtosis())
pearson_kurtosis <- kurtosis(data)
# Calculate Fisher's kurtosis (excess kurtosis)
fisher_kurtosis <- pearson_kurtosis - 3
# Print results
cat(sprintf("Skewness: %.4f\n", data_skewness))
cat(sprintf("Fisher's Kurtosis (Excess): %.4f\n", fisher_kurtosis))
cat(sprintf("Pearson's Kurtosis: %.4f\n", pearson_kurtosis))
JavaScript Implementation
// Arithmetic mean of a numeric array
function calculateMean(data) {
  return data.reduce((a, b) => a + b, 0) / data.length;
}

// Sample standard deviation (n - 1 denominator); reuses a precomputed mean if given
function calculateSD(data, mean = null) {
  const m = mean === null ? calculateMean(data) : mean;
  const variance = data.reduce((a, b) => a + Math.pow(b - m, 2), 0) / (data.length - 1);
  return Math.sqrt(variance);
}
// Bias-adjusted sample skewness
function calculateSkewness(data) {
  // Input validation
  if (!Array.isArray(data)) {
    throw new Error('Input must be an array');
  }
  if (data.length < 3) {
    throw new Error('Skewness requires at least 3 data points');
  }
  // Number.isFinite rejects non-numbers, NaN, and +/-Infinity
  if (!data.every(x => Number.isFinite(x))) {
    throw new Error('All elements must be valid numbers');
  }
  const n = data.length;
  const mean = calculateMean(data);
  const sd = calculateSD(data, mean);
  // Check for division by zero
  if (sd === 0) {
    // All values are identical: skewness is undefined (0/0); 0 is returned by convention
    return 0;
  }
  const skewnessFactor = n / ((n - 1) * (n - 2));
  return skewnessFactor * data.reduce((sum, x) => sum + Math.pow((x - mean) / sd, 3), 0);
}
// Bias-adjusted sample kurtosis; fisher = true returns excess kurtosis, false returns Pearson's
function calculateKurtosis(data, fisher = true) {
  // Input validation
  if (!Array.isArray(data)) {
    throw new Error('Input must be an array');
  }
  if (data.length < 4) {
    throw new Error('Kurtosis requires at least 4 data points');
  }
  // Number.isFinite rejects non-numbers, NaN, and +/-Infinity
  if (!data.every(x => Number.isFinite(x))) {
    throw new Error('All elements must be valid numbers');
  }
  const n = data.length;
  const mean = calculateMean(data);
  const sd = calculateSD(data, mean);
  // Check for division by zero
  if (sd === 0) {
    // All values are identical: kurtosis is undefined (0/0); these are placeholder values
    return fisher ? -3 : 0;
  }
  const kurtosisFactor = (n * (n + 1)) / ((n - 1) * (n - 2) * (n - 3));
  const fourthMoment = data.reduce((sum, x) => sum + Math.pow((x - mean) / sd, 4), 0);
  const correction = (3 * Math.pow(n - 1, 2)) / ((n - 2) * (n - 3));
  const excessKurtosis = kurtosisFactor * fourthMoment - correction;
  return fisher ? excessKurtosis : excessKurtosis + 3;
}
// Example usage with error handling
try {
  const data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
  const skewness = calculateSkewness(data);
  const fisherKurtosis = calculateKurtosis(data, true);
  const pearsonKurtosis = calculateKurtosis(data, false);
  console.log(`Skewness: ${skewness.toFixed(10)}`);
  console.log(`Fisher's Kurtosis (Excess): ${fisherKurtosis.toFixed(10)}`);
  console.log(`Pearson's Kurtosis: ${pearsonKurtosis.toFixed(10)}`);
} catch (error) {
  console.error('Error:', error.message);
}
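Note: The three implementations above do not all compute the same estimator. SciPy's skew() and kurtosis() default to bias=True, and the moments package in R likewise uses population moments (dividing by n), whereas the JavaScript code applies the bias-adjusted sample formulas. For the example data, this gives an excess kurtosis of about -1.2242 in Python and R versus -1.2000 in JavaScript. Passing bias=False to the SciPy functions yields the adjusted values. Any remaining differences come from floating-point rounding.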
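Most of the gap between the outputs above comes from the choice of estimator rather than from the language. The following plain-Python sketch computes the excess kurtosis of the example data both ways. The function names are assumptions made for this illustration, and the conversion in `adjusted_excess_kurtosis` is the standard algebraic link between the population and bias-adjusted forms.

```python
def central_moment(data, k):
    """k-th central moment with n in the denominator (population form)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def population_excess_kurtosis(data):
    """m4 / m2**2 - 3: the form SciPy uses by default (bias=True)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


def adjusted_excess_kurtosis(data):
    """Bias-adjusted form, equivalent to the JavaScript formula on this page."""
    n = len(data)
    g2 = population_excess_kurtosis(data)
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(f"Population (biased) excess kurtosis: {population_excess_kurtosis(data):.4f}")
print(f"Bias-adjusted excess kurtosis:       {adjusted_excess_kurtosis(data):.4f}")
```

The two values converge as the sample size grows, so the distinction matters mostly for small datasets like this one.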
Further Reading
- Wikipedia: Skewness – A detailed explanation of skewness with examples.
- Wikipedia: Kurtosis – Insights into kurtosis and its applications.
- NIST/SEMATECH e-Handbook of Statistical Methods – Fisher’s and Pearson’s kurtosis definitions and formulae.
- The Research Scientist Pod Calculators – Explore other statistical calculators.
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.