IT Fundamentals · DA0-001

CompTIA Data+: Data Concepts, Analysis, Visualisation, and Governance

CompTIA Data+ is the vendor-neutral data analytics certification for professionals who work with data — cleaning, analysing, visualising, and communicating insights. It bridges the gap between business analysts who understand data but lack technical depth and developers who code but do not think analytically. Data+ validates you can take a business question, find the right data, analyse it correctly, visualise the findings clearly, and ensure data governance throughout.

Data Concepts and Types

Data+ starts with foundational data literacy. Data types fall into two families. Quantitative data is numerical: discrete values are counts (number of events), while continuous values can take any value in a range (temperature, revenue). Qualitative data is categorical: nominal categories have no order (colour, country), while ordinal categories do (low/medium/high, star ratings).

Data also varies by structure. Structured data fits rows and columns, as in relational databases, and is easily queried with SQL. Semi-structured data has partial structure, typically key-value or hierarchical, as in JSON and XML. CSV is sometimes listed here too, although its fixed columns make it effectively tabular. Unstructured data has no predefined schema (text documents, images, video, audio) and needs techniques such as NLP or computer vision to extract structure.

Data sources are either primary or secondary. Primary data is collected directly for the current purpose, for example through surveys, sensors, or experiments. Secondary data was collected for another purpose, for example public datasets, purchased data, or operational databases.

Data pipelines follow one of two patterns. ETL extracts from the source, transforms the data (clean, reshape, join), then loads it into the destination. ELT extracts, loads the raw data, then transforms it inside the destination; this pattern is common with cloud data warehouses.

Data quality is assessed along six dimensions:
accuracy (values are correct), completeness (no missing data), consistency (the same values across systems), timeliness (current enough for the use case), uniqueness (no duplicates), and validity (values conform to defined rules).
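The ETL pattern and three of the quality dimensions can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a production pipeline. The CSV content, the `orders` table, and the rule "amounts cannot be negative" are all hypothetical. An in-memory SQLite database stands in for the destination.

```python
import csv
import io
import sqlite3

# Hypothetical raw source: a CSV export containing a duplicate row,
# a missing email, and an invalid (negative) order amount.
RAW_CSV = """customer_id,email,amount
1,ana@example.com,120.50
2,,80.00
2,,80.00
3,li@example.com,-15.00
4,sam@example.com,42.00
"""

def extract(text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def completeness(rows, field):
    """Completeness dimension: share of rows where `field` is non-empty."""
    return sum(1 for r in rows if r[field].strip()) / len(rows)

def transform(rows):
    """Transform: drop exact duplicates (uniqueness) and rows that break
    a validity rule, then cast values to proper types."""
    seen = set()
    clean = []
    for r in rows:
        key = tuple(r.values())
        if key in seen:
            continue  # uniqueness: skip an exact duplicate row
        seen.add(key)
        amount = float(r["amount"])
        if amount < 0:
            continue  # validity: hypothetical rule, amounts cannot be negative
        clean.append((int(r["customer_id"]), r["email"] or None, amount))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into the destination table."""
    conn.execute("CREATE TABLE orders (customer_id INTEGER, email TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

raw = extract(RAW_CSV)
print(f"email completeness: {completeness(raw, 'email'):.0%}")
conn = sqlite3.connect(":memory:")
load(transform(raw), conn)
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone())
```

In an ELT variant, the raw rows would be loaded into the destination first. The duplicate and validity filters would then run there, typically as SQL.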

Data Analysis and Statistics

Statistical analysis for Data+ starts with descriptive statistics:
- the mean (sum divided by count; sensitive to outliers);
- the median (middle value of the sorted data; resistant to outliers and preferred for skewed distributions such as income);
- the mode (most frequent value; useful for categorical data);
- the range (max minus min);
- the variance (average squared deviation from the mean);
- the standard deviation (square root of the variance, roughly the typical distance of values from the mean; a larger SD means more spread).

Distributions: the normal distribution is a symmetric bell curve in which mean = median = mode. The 68-95-99.7 rule says roughly 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean. Skewness: a right-skewed distribution has its long tail on the right, and usually mean > median (income data). A left-skewed distribution has its tail on the left, and usually mean < median.

Correlation measures the strength and direction of the linear relationship between two variables. The correlation coefficient (r) ranges from -1 to 1:
- positive r means Y tends to increase as X increases;
- negative r means Y tends to decrease as X increases;
- r = 0 means no linear relationship.

Correlation is not causation: a third (confounding) variable may drive both. Regression predicts the value of a dependent variable from one or more independent variables; linear regression fits a straight line to the data.

Hypothesis testing compares a null hypothesis (H0: no effect, no difference) with an alternative hypothesis (H1: there is an effect). The p-value is the probability of seeing results at least as extreme as those observed if H0 were true. If the p-value is below the chosen significance level (alpha, commonly 0.05), reject the null hypothesis: the result is statistically significant.

Data Visualisation and Reporting

Choosing the right visualisation is a core Data+ competency. Common chart types:

  • Bar chart: compare categories; best for comparing discrete groups
  • Line chart: show a trend over time; best for continuous time-series data
  • Scatter plot: show the relationship between two numeric variables (visualises correlation)
  • Pie chart: show proportions of a whole; keep to 5-7 slices at most and use a bar chart when there are more
  • Histogram: show the distribution of a single continuous variable as binned frequencies
  • Box plot: summarise a distribution (median, quartiles, outliers) and compare distributions across groups
  • Heat map: show matrix data with colour intensity; good for correlation matrices
  • Waterfall chart: show the cumulative effect of positive and negative changes, as in a financial P&L

Visualisation best practices:
- Match the chart type to the data type and the question.
- Eliminate chart junk (3D effects, unnecessary gridlines, decorative elements).
- Use colour purposefully: to highlight, to encode a third dimension, or to group categories, not for decoration.
- Always label axes.
- Include the data source and date.

Dashboard design: executive dashboards show KPIs and trend indicators; operational dashboards show real-time metrics; analytical dashboards allow drill-down exploration.
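Histograms and box plots are drawn from a few simple computations on the data. The sketch below computes those numbers without a plotting library, using hypothetical response-time data.

A few choices here are assumptions:
- The histogram uses fixed-width bins labelled by their lower edge.
- The outlier rule is the common 1.5 x IQR convention; plotting tools may apply other rules.
- Quartiles come from `statistics.quantiles` with its default method.

```python
import statistics as st
from collections import Counter

# Hypothetical response times in milliseconds, with one slow outlier.
times = [120, 135, 128, 150, 142, 138, 131, 147, 125, 460]

def histogram_bins(data, width):
    """Count values per fixed-width bin, as a histogram would draw them.
    Each bin is labelled by its lower edge."""
    counts = Counter((v // width) * width for v in data)
    return dict(sorted(counts.items()))

def box_plot_stats(data):
    """Numbers a box plot encodes: quartiles, median, and outliers.
    Outliers use the common 1.5 x IQR convention (an assumption here)."""
    q1, med, q3 = st.quantiles(data, n=4)   # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "outliers": [v for v in data if v < low or v > high],
    }

print(histogram_bins(times, 25))
print(box_plot_stats(times))
```

The histogram shows where most values cluster. The box plot summary isolates the single slow request as an outlier, without letting it distort the median.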

Data Governance and Ethics

Data governance ensures data is trustworthy, secure, and used appropriately. A data governance programme typically includes four components:
- A data catalogue: a metadata inventory of all data assets, recording what exists, where it is, what it means, and who owns it.
- Data lineage: tracking data from its origin through transformations to final use. This is essential for debugging data quality issues.
- Data classification: sensitivity labels (public, internal, confidential, restricted) that drive access controls and retention policies.
- Master data management (MDM): a single authoritative source for key business entities such as customer, product, and employee. MDM prevents duplicates and inconsistencies across systems.

Data privacy regulations:

GDPR (EU):
- Requires a lawful basis for processing, such as consent.
- Grants rights of access and erasure.
- Requires notifying the supervisory authority of a personal data breach, generally within 72 hours of becoming aware of it.
- Applies to organisations established in the EU, and to organisations elsewhere that offer goods or services to, or monitor, people in the EU, regardless of their citizenship.

CCPA (California):
- Gives the right to opt out of sale, the right to know, and the right to delete.
- Applies to California residents.

HIPAA (US healthcare):
- Protects PHI held by covered entities and their business associates.

Data ethics: collect only what you need (data minimisation), use data only for its stated purposes (purpose limitation), examine training data and model outputs for bias (fairness), and be clear about how data is used and how decisions are made (transparency).
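Classification labels become useful once they actually gate access. The sketch below is a minimal example of that idea. It orders the four labels from the text, assigns each column of a hypothetical customer table a label, and returns only the fields a caller is both cleared for and needs for a stated purpose. This combines access control with data minimisation and purpose limitation.

The column names, the record, and the fail-closed default for unlabelled columns are all illustrative assumptions. They are not taken from any standard.

```python
from enum import IntEnum

class Sensitivity(IntEnum):
    """Classification labels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

# Hypothetical column-level classification for a customer table.
COLUMN_LABELS = {
    "customer_id": Sensitivity.INTERNAL,
    "country": Sensitivity.PUBLIC,
    "email": Sensitivity.CONFIDENTIAL,
    "diagnosis": Sensitivity.RESTRICTED,
}

def visible_record(record, clearance, purpose_fields):
    """Return only fields that are needed for the stated purpose
    (minimisation, purpose limitation) and that the caller's clearance
    allows (access control). Unlabelled columns are treated as
    RESTRICTED, a fail-closed assumption."""
    return {
        k: v
        for k, v in record.items()
        if k in purpose_fields
        and COLUMN_LABELS.get(k, Sensitivity.RESTRICTED) <= clearance
    }

row = {"customer_id": 17, "country": "DE",
       "email": "x@example.com", "diagnosis": "sample-code"}

# An internal analyst asking for three fields sees only the two they are cleared for.
print(visible_record(row, Sensitivity.INTERNAL, {"customer_id", "country", "email"}))
```

Using `IntEnum` lets labels be compared with `<=`. A higher clearance therefore automatically covers every lower label.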

Key exam facts — DA0-001

  • Mean is sensitive to outliers; median is resistant — use median for skewed distributions
  • Correlation coefficient (r) range: -1 to 1; r = 0 means no linear relationship
  • Correlation is not causation — confounding variables may explain both
  • P-value below the chosen alpha (commonly 0.05) = statistically significant; reject the null hypothesis
  • Bar chart: compare categories; Line chart: trends over time; Scatter plot: correlation between variables
  • Data quality dimensions: accuracy, completeness, consistency, timeliness, uniqueness, validity
  • ETL: Extract > Transform > Load; ELT: Extract > Load raw > Transform in destination
  • GDPR: lawful basis (such as consent) required, right to erasure, 72-hour breach notification to the supervisory authority; protects people in the EU regardless of citizenship
  • Data lineage tracks data from origin through all transformations to final use
  • Master data management (MDM): single authoritative source for key business entities

Common exam traps

More data always leads to better analysis

More data amplifies both signal and noise. Unclean, biased, or irrelevant data can lead to worse conclusions than a smaller, clean, representative dataset. Quality comes before quantity.

A correlation coefficient of 0.5 means there is a 50% relationship between variables

The correlation coefficient is not a percentage. r = 0.5 indicates a moderate positive linear relationship. Squaring it gives R-squared (0.5 squared = 0.25). In a simple linear regression of Y on X, this means 25% of the variance in Y is explained by X, which is a more interpretable measure of explanatory power.
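The standard Pearson formula shows why r is not a percentage. It divides the co-variation of X and Y by the product of their individual spreads, so r is a unitless ratio bounded by -1 and 1. For a simple linear regression with a single predictor, R-squared is just r squared:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\qquad
R^2 = r^2 = 0.5^2 = 0.25
```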
