Changelog • dataProfilerR

dataProfilerR 0.2.1

CRAN release: 2026-06-24

Changes requested during the initial CRAN review:

Added method references (Shapiro-Wilk, Anderson-Darling, Cramer’s V) to the Description field.
normality_tests() no longer touches the global random-number state. Large columns are now reduced with a deterministic, evenly-spaced subsample instead of set.seed() + sample(); the seed argument has been removed.
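
A minimal sketch of the kind of deterministic, evenly-spaced subsampling described above, using base R only. The helper name even_subsample() and the max_n cap are assumptions made for illustration and are not the package's internal code; the cap of 5000 is chosen only because base R's shapiro.test() accepts at most 5000 values.

```r
# Hypothetical helper (illustration only, not dataProfilerR's implementation).
even_subsample <- function(x, max_n = 5000L) {
  n <- length(x)
  if (n <= max_n) return(x)
  # Evenly spaced positions from the first to the last element; no RNG is used.
  idx <- unique(round(seq(1, n, length.out = max_n)))
  x[idx]
}

set.seed(1)                      # only to make sure .Random.seed exists
seed_before <- .Random.seed

x  <- seq_len(100000)
s1 <- even_subsample(x)
s2 <- even_subsample(x)

identical(s1, s2)                      # TRUE: the same subsample on every call
identical(seed_before, .Random.seed)   # TRUE: global RNG state is untouched
```

Because no random numbers are drawn, repeated calls return the same values and a caller's own set.seed() stream is left undisturbed.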

New analysis and reporting:

report() renders a complete profile to a self-contained HTML file (requires pandoc).
categorical_association() and plot_association() add Cramer’s V between categorical columns (the categorical analogue of the correlation matrix).
analyze_dates() profiles date/datetime columns: range, unique count, and the largest gap between consecutive timestamps.
compare_groups() summarises numeric columns within the levels of a grouping column (grouped/comparative profiling).
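
For orientation, two of the quantities listed above can be sketched in base R. The helper names cramers_v() and largest_gap() are hypothetical, and the package's own functions may differ in details such as bias correction or missing-value handling.

```r
# Hypothetical helpers (illustration only, not dataProfilerR's implementation).

# Cramer's V from a two-way contingency table, without bias correction.
cramers_v <- function(x, y) {
  tab <- table(x, y)                       # NA values are dropped by default
  k <- min(nrow(tab), ncol(tab)) - 1
  if (k < 1) return(NA_real_)              # undefined with a single level
  # Pearson chi-squared statistic without Yates' continuity correction.
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * k)))
}

# Largest gap between consecutive distinct, non-missing timestamps.
largest_gap <- function(x) {
  x <- sort(unique(x[!is.na(x)]))
  if (length(x) < 2) return(NA)
  max(diff(x))
}

v_perfect <- cramers_v(c("a", "a", "b", "b"), c("u", "u", "v", "v"))  # 1
v_none    <- cramers_v(c("a", "a", "b", "b"), c("u", "v", "u", "v"))  # 0

d   <- as.Date(c("2024-01-01", "2024-01-03", "2024-01-10", NA))
gap <- largest_gap(d)                      # a difftime of 7 days
```

Cramer's V ranges from 0 (no association) to 1 (one column fully determines the other), which is what makes it usable as a categorical counterpart to a correlation matrix.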

Pipeline changes:

profile_data() gains two arguments: group_by, which adds a grouped comparison to the diagnostics, and distributions, which can be set to FALSE to skip the eager per-column distribution plots on wide data. Association and date results are now part of the returned object, and plot() accepts which = "association".
summary() now also prints date, association and grouped-comparison sections when present.

First version: profile_data() with type inference, missing-value analysis, summary statistics (incl. skewness/kurtosis), normality tests, outlier detection (IQR/z-score/robust), correlation analysis, a data-quality score, and ggplot2 figures, returned as a data_profile S3 object with print(), summary() and plot() methods.