Main pipeline

The single entry point and the profile object it returns.

profile_data()
Profile a data frame
is_data_profile()
Is an object a data_profile?
report()
Render a profile to a self-contained HTML report

Methods for a profile

S3 methods on the data_profile object.

print(<data_profile>)
Print a concise overview of a data profile
summary(<data_profile>)
Detailed summary of a data profile
plot(<data_profile>)
Plot a data profile

Profiling engine

Column types, missingness, summary statistics, data-quality score.

infer_column_types()
Infer a semantic type for each column
analyze_missing()
Analyse missing values
summarize_columns()
Summary statistics by column type
data_quality_score()
Data quality score
skewness()
Sample skewness
kurtosis()
Sample excess kurtosis
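
For readers unfamiliar with the two moment statistics listed above, here is a minimal base-R sketch of the moment-based definitions (skewness as m3 / m2^(3/2), excess kurtosis as m4 / m2^2 - 3). The function names `sample_skewness()` and `sample_excess_kurtosis()` are hypothetical and chosen to avoid implying that this is how the package's own `skewness()` and `kurtosis()` are implemented; the package may use a different estimator (for example a bias-corrected one) or different handling of missing values.

```r
# Illustrative sketch only: moment-based estimators, not the package's code.
# m_k denotes the k-th central moment computed with denominator n.

sample_skewness <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  if (length(x) < 3) return(NA_real_)   # too few points to be meaningful
  d  <- x - mean(x)
  m2 <- mean(d^2)
  m3 <- mean(d^3)
  if (m2 == 0) return(NA_real_)         # constant vector: undefined
  m3 / m2^1.5
}

sample_excess_kurtosis <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  if (length(x) < 4) return(NA_real_)
  d  <- x - mean(x)
  m2 <- mean(d^2)
  m4 <- mean(d^4)
  if (m2 == 0) return(NA_real_)
  m4 / m2^2 - 3                         # subtract 3 so a normal distribution gives 0
}

sample_skewness(c(1, 1, 1, 10))         # positive: long right tail
sample_excess_kurtosis(1:5)             # negative: flatter than a normal distribution
```

Under these definitions a symmetric sample has skewness 0, and subtracting 3 centres the kurtosis so that a normal distribution scores 0.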

Statistical analysis

Normality, outliers, correlation, categorical association, dates, groups.

normality_tests()
Normality tests for numeric columns
detect_outliers()
Detect outliers in a numeric vector
outlier_summary()
Outlier summary across numeric columns
correlation_analysis()
Correlation analysis
categorical_association()
Categorical association (Cramér's V)
analyze_dates()
Profile date / datetime columns
compare_groups()
Compare numeric columns across groups
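
The categorical association entry above names Cramér's V. As a rough illustration of what that statistic measures, the following base-R sketch computes the uncorrected version, V = sqrt(chi2 / (n * (k - 1))), where k is the smaller of the number of row and column categories. The helper name `cramers_v()` is hypothetical, and the package's `categorical_association()` may apply a bias correction or treat missing values differently.

```r
# Illustrative sketch only: uncorrected Cramér's V for two categorical vectors.
cramers_v <- function(x, y) {
  tab <- table(x, y)                                  # pairs with an NA are dropped
  # Remove empty rows/columns (e.g. unused factor levels) to avoid 0/0.
  tab <- tab[rowSums(tab) > 0, colSums(tab) > 0, drop = FALSE]
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab))
  if (n == 0 || k < 2) return(NA_real_)               # undefined with one category
  expected <- outer(rowSums(tab), colSums(tab)) / n   # counts under independence
  chi2 <- sum((tab - expected)^2 / expected)          # Pearson chi-squared, no continuity correction
  sqrt(chi2 / (n * (k - 1)))
}

cramers_v(c("a", "a", "b", "b"), c("u", "u", "v", "v"))  # perfectly associated
cramers_v(c("a", "a", "b", "b"), c("u", "v", "u", "v"))  # no association
```

The result lies between 0 (no association) and 1 (one variable fully determines the other), which is what makes it convenient to display in a heatmap alongside correlation coefficients.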

Visualisation

ggplot2 figures for each part of the profile.

plot_association()
Categorical association heatmap
plot_boxplots()
Boxplots for numeric columns
plot_correlation()
Correlation heatmap
plot_distribution()
Distribution plot for a single column
plot_missing()
Missing-value heatmap
plot_pairs()
Pairwise scatterplot matrix

Package

dataProfilerR dataProfilerR-package
dataProfilerR: automated exploratory data analysis