
Selecting and validating integrity assessment tools is not a one-time procurement decision. It is a technical and operational process that begins before vendor selection and continues after launch. Different tools measure different constructs, use different scoring models, and come with validity evidence that may or may not transfer to your specific jobs, sites, and outcome definitions. An assessment that performs well in a publisher’s validation study may produce no measurable lift in your environment if the risk criteria differ, the scoring logic does not match your use case, or the workflow produces inconsistent data.
This guide covers the full process: defining organizational risk targets, evaluating predictive validity and reliability evidence, conducting adverse impact analysis, confirming legal compliance, integrating with your HRIS and ATS, and running a structured pilot. The goal is a selection and validation approach that produces defensible evidence—not a vendor promise or a post-procurement checkbox.
For a complete overview of integrity assessments — including what they predict, how validity evidence works, compliance requirements, and how to build a defensible implementation — see our integrity assessments guide.
Why Tool Validation Matters Before Deployment
Deploying an integrity assessment without a defined validation approach means optimizing a hiring workflow around an outcome you are not measuring. In high-volume environments, this failure mode surfaces quickly: pass rates tighten without a corresponding drop in incidents or turnover, operations requests score overrides, and within one quarter the tool is functionally inconsistent across sites.
Validation and governance protect the program in three places simultaneously. First, defensibility: under the EEOC Uniform Guidelines on Employee Selection Procedures (UGESP), selection procedures must be supported by job-related validity evidence that matches the way the tool is actually used. Content validity—demonstrating that test content represents job tasks—generally does not satisfy this requirement for integrity assessments, which measure traits and characteristics rather than observable job behaviors. Second, performance: integrity tests often show stronger relationships with self-reported counterproductive behavior than with employer records. If your outcome targets are attendance points, incident logs, or termination codes, you need local evidence that the tool predicts those specific criteria. Third, adoption: without defined cut-score logic and a documented rationale, a single quarter of low incident rates can make the tool appear proven or useless by coincidence, eroding manager confidence and recruiter compliance.
Skipping validation creates four predictable failure modes:
- Legal exposure: if adverse impact questions arise, you cannot document why the screen is job-related and necessary
- Inflated ROI expectations: validity coefficients against self-reported counterproductive behavior are consistently higher than coefficients against employer records—the outcomes HR leaders are accountable for
- Operational instability: without a defined criterion and cut-score policy, base-rate variance will produce misleading period-to-period comparisons
- Trust breakdown: hiring managers bypass a screen they cannot explain; candidates disengage from an assessment they cannot interpret
Before you evaluate any vendor, answer three questions: Which outcome will you track and how will you measure it? What effect size would be operationally meaningful in your environment? What will you do if scores do not predict what leadership is accountable for?
Defining Organizational Goals and Risk Profile

Validating an integrity assessment begins with risk criteria, not tool selection. Under UGESP, a vague goal is indefensible because validity evidence has to be tied to a specific, measurable outcome. If leadership defines the goal as “reduce theft” but your only consistent, reliably coded data is attendance points and involuntary termination codes, you cannot build a criterion-related validity study against theft. You will end up evaluating opinions rather than outcomes.
Identify one primary risk per role family and a secondary risk only if you can measure it reliably within 90 days of hire. Convert each risk into a criterion you can pull from your HR systems at consistent coding quality across sites and supervisors:
| Element | What to Specify | Examples from HR Systems |
|---|---|---|
| Risk | What specific behavior are you trying to reduce? | Policy violations, falsified time, no-call/no-show, shrinkage, safety shortcuts |
| Criterion | Where will the outcome show up in your data? | Attendance system points, HRIS termination reason codes, incident logs, safety coaching events, quality escapes |
| Feasibility | Can you observe it reliably and at sufficient volume? | Consistent coding across sites and supervisors; base rate high enough to detect movement within the measurement window |
| Guardrail | What can the tool not harm? | Time-to-fill, candidate drop-off rate, adverse impact outcomes, site-level acceptance |
A common mismatch: if your accountable outcome is 90-day turnover, an assessment optimized to predict theft may generate impressive publisher validity coefficients while producing no movement on the metric operations cares about. Defining criteria before vendor evaluation makes this gap visible before procurement, not after.
If you cannot name the outcome and explain how it will be measured before speaking to vendors, you are not yet selecting an integrity assessment—you are selecting a story to present to executives. For implementation guidance on how the assessment fits within the hiring workflow once criteria are defined, see our article on how to implement integrity assessments in hiring.
Evaluating Predictive Validity and Reliability Evidence
Evaluate predictive validity evidence against your specific use case, not against a publisher’s summary correlation. Meta-analytic results are a starting point, not a substitute for local evidence or a documented transportability argument. Before evaluating vendor claims, understand the range of published validity estimates and the conditions under which they were obtained.
The foundational meta-analysis by Ones, Viswesvaran & Schmidt (1993; 665 validity coefficients, 576,460 data points) reported a mean operational predictive validity of .41 for supervisory ratings of overall job performance. However, an updated meta-analysis of 104 studies applying stricter methodological standards found corrected validity of .15 for job performance and .32 for counterproductive work behavior when publisher-authored studies were analyzed alongside non-publisher studies (Van Iddekinge et al., Journal of Applied Psychology, 2012). Critically, that same analysis found that corrected validities for counterproductive work behavior were .42 when the criterion was self-reported deviance, versus .15 when based on employer records. This gap between self-reported and employer-recorded criteria is where most integrity programs underperform their projections.
This means that when a vendor presents a validity coefficient, the first question is: what was the criterion? Supervisor ratings of overall performance, self-reported counterproductive behavior, and attendance points from an HR system produce very different validity estimates for the same tool. Claims of validity against “job performance” without criterion definition are not actionable.
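One way to make the difference between criteria concrete is to square each coefficient, which gives the share of criterion variance the score accounts for. The sketch below uses only the coefficients quoted above; the dictionary labels and the function name are illustrative, not part of any cited study.

```python
# Validity coefficients quoted in the text above (labels are illustrative).
coefficients = {
    "overall job performance (Ones et al.)": 0.41,
    "job performance (Van Iddekinge et al.)": 0.15,
    "CWB, self-reported deviance": 0.42,
    "CWB, employer records": 0.15,
}


def variance_explained(r: float) -> float:
    """Share of criterion variance accounted for by a correlation r (r squared)."""
    return r * r


for label, r in coefficients.items():
    print(f"{label}: r = {r:.2f}, variance explained = {variance_explained(r):.1%}")
```

A coefficient of .42 corresponds to roughly 18% of criterion variance, while .15 corresponds to roughly 2%, which is why the choice of criterion matters more than the headline number.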
Selecting a Feasible Validation Strategy
Which validation strategy you use depends on what evidence you can actually produce in your organization. Three practical strategies fit within UGESP, and the right choice depends on your sample size, data quality, and timeline.
Criterion-related validity (predictive): Scores at hire are correlated with outcomes measured after hire. This is the strongest and most defensible approach. Prerequisites include a sufficient sample of new hires in a consistent role, a criterion that is coded reliably across sites and supervisors, and a measurement window long enough for the outcome to appear. If you average two recordable incidents per quarter across a site, you cannot detect a stable correlation in a local predictive study. Options include pooling similar sites, extending the measurement window, or using a more frequently coded proxy criterion such as standardized safety coaching events.
Criterion-related validity (concurrent): Incumbents complete the assessment and their scores are correlated with existing performance data. This design is administratively easier than a predictive study, but it can mislead in high-turnover environments where the incumbents represent a survivor sample. The employees most likely to show counterproductive behavior patterns may have already separated.
Transportability: Existing validity evidence from other settings is borrowed and documented as applicable to your use case. The SIOP Principles for the Validation and Use of Personnel Selection Procedures (5th ed., 2018) define transportability as a formal argument, not an assumption. You document the similarity between the setting where validity was established and your context—job tasks, risk exposure, administration mode, scoring model—and explain why the evidence should generalize. A file comparing warehouse selector roles across three DCs with the sample in a publisher study is transportability evidence. A verbal assertion that “the industry is similar” is not.
A workable approach for multi-site operations: run predictive validation for the highest-volume, highest-risk role family where clean data is available, and use transportability with tight adverse impact monitoring for smaller job groups until local outcome data accumulates.
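The predictive design described above comes down to correlating scores at hire with a later outcome and asking how much uncertainty surrounds that correlation. The following is a minimal sketch, assuming a Pearson correlation and a Fisher z confidence interval; the function names, the sample data, and the observed correlation of -0.15 are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(scores, outcomes):
    """Pearson correlation between assessment scores and a numeric criterion."""
    n = len(scores)
    if n != len(outcomes) or n < 4:
        raise ValueError("need two equal-length samples with at least 4 pairs")
    mean_x = sum(scores) / n
    mean_y = sum(outcomes) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, outcomes))
    sxx = sum((x - mean_x) ** 2 for x in scores)
    syy = sum((y - mean_y) ** 2 for y in outcomes)
    if sxx == 0 or syy == 0:
        raise ValueError("scores and outcomes must both vary")
    return sxy / math.sqrt(sxx * syy)


def fisher_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for r using the Fisher z transform."""
    z = math.atanh(r)  # undefined for r = +/-1
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical scores at hire and attendance points after hire.
scores = [62, 70, 75, 81, 88, 93]
points = [5, 6, 3, 4, 2, 1]
print(f"sample r = {pearson_r(scores, points):.2f}")

# Hypothetical observed correlation, evaluated at two sample sizes.
observed_r = -0.15
for n in (100, 1000):
    low, high = fisher_confidence_interval(observed_r, n)
    print(f"n = {n}: r = {observed_r:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

With these hypothetical inputs, the interval at 100 hires spans zero while the interval at 1,000 hires does not, which illustrates why pooling similar sites or extending the measurement window can be necessary before a local study is interpretable.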
Before accepting vendor evidence, require four specific artifacts:
- Technical manual: reliability estimates by job family, criterion-related validity results with sample sizes and study details
- Criterion definitions: exactly what outcomes the tool predicted—supervisor ratings, incident logs, self-reported admissions—and how those outcomes were measured
- Study characteristics: industry, job level, geography, administration mode, and the scoring model used—these affect whether results transport to your context
- Independent corroboration: any non-publisher research, with clear boundary conditions and limitations stated explicitly
Content validity is not an acceptable primary justification for integrity assessments. UGESP explicitly limits content validity to procedures that sample observable job behaviors. Integrity constructs—rule-following, honesty, risk tolerance—are traits and characteristics, not job behaviors, and do not meet that threshold. If a vendor claims content validity as the primary support for their tool, ask for criterion-related evidence instead.
Conducting Adverse Impact and Bias Analysis
Build adverse impact and bias analysis into the program design before launch rather than waiting for a complaint to trigger it. An assessment can show acceptable overall validity and still create compliance exposure if it screens out one protected group at a substantially higher rate. Overall correlations and average scores mask what drives EEOC questions: who advances, who does not, and whether a cut score amplifies a small mean difference into a systematic barrier.
Apply the UGESP four-fifths rule at each stage where the assessment affects candidate movement, not just at the overall hire rate. Administration conditions can confound the result: if your manufacturing sites administer the assessment on a shared kiosk during a night shift while day-shift candidates complete it on mobile at home, differences in administration mode may produce selection-rate patterns that look like bias in the instrument. Break out results by site, shift, and administration mode before drawing conclusions about the instrument itself.
The four-fifths rule is an investigation trigger, not a legal threshold. If one group’s selection rate falls below 80% of the highest-selecting group’s rate, the required response is:
- Pause any automatic screen at that decision point
- Re-check job-relatedness and transportability for that specific job group
- Evaluate process factors: placement stage, administration mode, recruiter instructions, retest handling
- Test alternatives: adjusting cut scores, applying score bands, or reducing the tool’s weight while gathering cleaner criterion data
Decide your response protocol before you look at the data. Organizations that define the response threshold and escalation path in advance are more likely to act on the signal correctly than those that encounter adverse impact unexpectedly and respond reactively.
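The four-fifths check described above can be sketched as a comparison of each group's selection rate against the highest group's rate at a single decision stage. This is an illustrative sketch only: the function names, group labels, and counts are hypothetical, and ratios computed from small counts are unstable and should be read with caution.

```python
def impact_ratios(stage_counts):
    """Ratio of each group's selection rate to the highest group's rate.

    stage_counts maps a group label to (advanced, total) at one decision stage.
    """
    rates = {
        group: advanced / total
        for group, (advanced, total) in stage_counts.items()
        if total > 0
    }
    if not rates:
        raise ValueError("no group has any candidates at this stage")
    top_rate = max(rates.values())
    if top_rate == 0:
        raise ValueError("no group advanced at this stage")
    return {group: rate / top_rate for group, rate in rates.items()}


def groups_below_threshold(stage_counts, threshold=0.8):
    """Groups whose impact ratio falls below the four-fifths trigger."""
    ratios = impact_ratios(stage_counts)
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


# Hypothetical counts at the assessment stage: (advanced, total).
assessment_stage = {"group_a": (120, 200), "group_b": (45, 100)}
print(impact_ratios(assessment_stage))
print(groups_below_threshold(assessment_stage))
```

Here group_a advances at 60% and group_b at 45%, so group_b's ratio is 0.75 and it is flagged for investigation. Running the same function separately for each site, shift, and administration mode supports the breakout recommended above.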
For a detailed review of adverse impact testing requirements and how to structure an integrity assessment bias analysis, see our article on whether integrity tests discriminate and how to monitor for adverse impact.
Confirming Legal Compliance and Documentation Requirements
Documentation of how the assessment is used, including how it is integrated with your HRIS and ATS, is a compliance deliverable, not an administrative formality. Under UGESP, if a regulator or plaintiff asks why a score affects a hiring decision, the answer must come from a documented file, not a vendor slide deck. Your documentation must match how the tool is actually used: the job groups covered, the validation or transportability rationale, how scores affect candidate movement, cut-score logic, and adverse impact monitoring outputs by stage.
Maintain a file complete enough that the rationale and decision rules can be reconstructed from it six months after deployment. If that is not feasible with your current documentation practices, the program has a compliance gap independent of the tool’s validity.
Confirm that the assessment does not create exposure under the Employee Polygraph Protection Act (EPPA). EPPA prohibits most private employers from using lie detector tests for pre-employment screening. Written and online integrity assessments that do not use physiological measurement devices fall outside EPPA’s scope. However, an assessment positioned in candidate-facing materials, vendor marketing, or internal policy language as a lie detection or deception identification tool creates EPPA exposure. Require vendors to confirm that their candidate disclosures, product documentation, and marketing materials characterize the tool as a measure of integrity-relevant attitudes and behaviors, not as a deception detector.
On data governance: if candidates complete the assessment on shared kiosks, require documented controls for data minimization, access logging, retention schedules, and breach notification procedures before connecting the tool to your ATS or HRIS.
Integrating With Your HRIS and ATS
A technically valid integrity assessment produces inconsistent outcomes when its integration with your HRIS and ATS is not designed as a controlled decision point. If results are delivered as PDF attachments in recruiter email or require manual data entry to create a record, the program will produce inconsistent data across sites and create audit gaps that surface under compliance review.
Before turning on the integration, define four elements of the workflow:
- Scoring and decisioning: map score fields into structured ATS fields and specify the decision rule—cut score, score band, or mandatory review—that matches how you documented your intended use of the tool. If the documentation says “one input among several” and the ATS workflow functions as an automatic reject, the process and documentation are inconsistent
- Retest rules: specify the waiting period, the maximum number of attempts, and the qualifying conditions for a retest (for example, confirmed system error or an approved accommodation request). Without a documented policy, retest handling becomes a site-level discretionary decision that undermines score comparability
- Accessibility: document accommodation procedures and alternate administration options including mobile delivery, kiosk mode, language support, and extended timing. Candidates screened out by the delivery method rather than the content create both a compliance risk and a measurement validity problem
- Audit trail: require that the integration captures timestamps, stage movement, override events with reason codes, and score versioning. This is the data that enables adverse impact monitoring by stage and compliance reconstruction if needed
If managers can override scores without a documented reason code captured in the ATS, the assessment is not functioning as a consistent selection procedure—it is functioning as an advisory input with inconsistent application. For guidance on structuring ATS integration for pre-employment tools, see our article on talent assessment tools and ATS integration.
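The audit-trail and override requirements above can be expressed as a record that refuses to exist without the fields an audit would need. The sketch below is a hypothetical data structure, not the schema of any particular ATS; every field name and the allowed decision values are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

ALLOWED_DECISIONS = {"advance", "review", "reject"}  # assumed decision values


@dataclass(frozen=True)
class StageEvent:
    """One assessment-driven stage movement, as it might be logged for audit."""

    candidate_id: str
    stage: str
    score: float
    score_version: str  # which scoring model produced the score
    decision: str
    recorded_at: datetime
    override: bool = False
    override_reason_code: Optional[str] = None

    def __post_init__(self):
        if self.decision not in ALLOWED_DECISIONS:
            raise ValueError(f"unknown decision: {self.decision}")
        # An override without a reason code cannot be audited, so reject it.
        if self.override and not self.override_reason_code:
            raise ValueError("an override requires a reason code")


event = StageEvent(
    candidate_id="C-0001",
    stage="assessment",
    score=71.0,
    score_version="v1",
    decision="advance",
    recorded_at=datetime.now(timezone.utc),
    override=True,
    override_reason_code="SYSTEM_ERROR_RETEST",
)
print(event)
```

Enforcing the reason code when the record is created, rather than in a later report, means an undocumented override never enters the data in the first place.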
Building a Business Case for Integrity Assessment
A business case for integrity assessment depends on knowing what outcomes your organization already tracks and what a meaningful reduction in those outcomes is worth. A business case built on vendor benchmarks or generic ROI claims will not survive a budget review from a finance partner who asks to see the source data.
Frame the case around the specific loss metrics identified in the risk profiling step. Calculate the current annual cost of each metric in your target role family:
- Early turnover: cost to replace a role (recruitment, onboarding, lost productivity) multiplied by the number of 90-day terminations per year
- Workers’ compensation: claim frequency multiplied by average claim cost in your industry classification. For context, OSHA estimates US employers pay approximately $1 billion per week in workers’ compensation costs (OSHA, Business Case for Safety and Health)
- Shrinkage: inventory loss attributable to internal theft or process failure as a percentage of revenue, compared against industry benchmarks from the National Retail Federation National Retail Security Survey where applicable
IntegrityFirst client data shows that organizations implementing validated integrity assessment programs have achieved a 48% reduction in workers’ compensation claim frequency, a 30% reduction in turnover, and a 19% reduction in claim severity. Apply these as directional planning assumptions for your context, not as guaranteed outcomes. Present them alongside your organization’s current metric baselines so leadership can evaluate the potential return against a realistic implementation cost.
A credible business case includes four elements: a current-state baseline for each target metric, a projection based on conservative assumptions about effect size, an implementation cost estimate covering assessment licensing, ATS integration, and monitoring overhead, and a measurement plan specifying how outcomes will be tracked and attributed. Presenting projected savings without a measurement plan will raise questions you cannot answer at the funding review.
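The arithmetic behind the baseline, projection, and cost elements above can be sketched in a few lines. All figures below are hypothetical placeholders, not benchmarks: the termination count, replacement cost, assumed reduction, and implementation cost should be replaced with your own source data.

```python
def annual_cost(events_per_year: float, cost_per_event: float) -> float:
    """Current-state baseline: how much one loss metric costs per year."""
    return events_per_year * cost_per_event


def projected_net_savings(
    baseline_cost: float, assumed_reduction: float, implementation_cost: float
) -> float:
    """Baseline cost times a conservative assumed reduction, minus program cost."""
    if not 0 <= assumed_reduction <= 1:
        raise ValueError("assumed_reduction must be a fraction between 0 and 1")
    return baseline_cost * assumed_reduction - implementation_cost


# Hypothetical inputs for one role family (placeholders, not benchmarks).
terminations_90_day = 200    # 90-day terminations per year
replacement_cost = 4_000     # cost to replace one role
assumed_reduction = 0.10     # conservative planning assumption
implementation_cost = 50_000 # licensing, ATS integration, monitoring

baseline = annual_cost(terminations_90_day, replacement_cost)
net = projected_net_savings(baseline, assumed_reduction, implementation_cost)
print(f"baseline early-turnover cost: {baseline:,.0f}")
print(f"projected net savings: {net:,.0f}")
```

Keeping the assumed reduction as an explicit input makes it easy to show leadership how the projection changes under more or less conservative assumptions.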
Running a Pilot to Validate Your Integrity Assessment Program

How you structure the pilot determines whether it produces usable evidence or noise. The most common pilot failure is treating the assessment as a tool to “try out” rather than as a selection procedure undergoing a structured evaluation. Loosely governed pilots generate exceptions, inconsistent data, and inconclusive results that neither justify expansion nor provide a clear rationale for stopping.
Run the pilot as a controlled selection procedure from day one. Select one high-volume role family across two to four similar sites—for example, picker/packer roles at two distribution centers hiring to the same job description. Deploy the assessment inside the live ATS workflow with locked rules: who receives an invitation, how scores affect stage movement, and what constitutes an override. Do not modify scoring logic during the pilot window.
Define a success scorecard before launch that mixes leading indicators (which appear quickly) with lagging outcome data (which takes time to accumulate):
- Completion rate by site and administration mode: a completion rate below 80% signals a workflow or communication problem that must be resolved before outcome data is interpretable
- Time-to-fill impact: track whether the additional screening step adds days to your hiring cycle and whether the impact varies by site or shift
- Override rate and reason codes: high override rates indicate that hiring managers do not trust the scores or that the decision rule is set incorrectly
- Adverse impact by stage: run the four-fifths analysis at the assessment stage specifically, not only at final hire
- Early outcome indicators: attendance points, standardized safety coaching codes, and early involuntary termination codes within the first 60 days are more frequent and more consistently coded than rare events like recordable incidents or confirmed theft cases
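The first scorecard item, completion rate by site and administration mode, can be computed directly from invitation records. The sketch below assumes each record is a tuple of site, mode, and a completed flag; that record shape, the function names, and the sample data are hypothetical.

```python
from collections import defaultdict


def completion_rates(invitations):
    """Completion rate for each (site, mode) pair.

    invitations is an iterable of (site, mode, completed) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # (site, mode) -> [completed, invited]
    for site, mode, completed in invitations:
        cell = counts[(site, mode)]
        cell[1] += 1
        if completed:
            cell[0] += 1
    return {key: done / invited for key, (done, invited) in counts.items()}


def below_threshold(rates, threshold=0.80):
    """Site and mode combinations whose completion rate is under the threshold."""
    return sorted(key for key, rate in rates.items() if rate < threshold)


# Hypothetical pilot records.
records = (
    [("DC1", "kiosk", True)] * 9
    + [("DC1", "kiosk", False)] * 1
    + [("DC2", "mobile", True)] * 7
    + [("DC2", "mobile", False)] * 3
)
rates = completion_rates(records)
print(rates)
print(below_threshold(rates))
```

In this hypothetical data, DC1 kiosk completes at 90% and DC2 mobile at 70%, so only DC2 mobile is flagged as needing a workflow or communication fix before its outcome data is interpreted.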
Do not promise that a 60 to 90 day pilot will demonstrate reductions in rare events. In most operations, the base rate for recordable incidents, confirmed theft cases, or workers’ compensation claims is too low to detect a statistically meaningful signal within a short pilot window. Set that expectation with leadership before launch. The pilot earns continued funding by demonstrating controlled execution and early directional signals—not by resolving a validity question that requires 12 to 18 months of outcome data.
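To show leadership why rare events cannot be resolved in a short pilot, it can help to estimate how many hires would be needed to detect a given reduction. The sketch below uses the standard normal-approximation sample size formula for comparing two proportions with equal group sizes; the base rates and the 30% relative reduction are hypothetical inputs chosen for illustration.

```python
import math
from statistics import NormalDist


def hires_per_group(p_baseline, p_target, alpha=0.05, power=0.80):
    """Approximate hires needed in each of two equal groups to detect a drop
    in an event rate from p_baseline to p_target (two-sided test)."""
    if not 0 < p_target < p_baseline < 1:
        raise ValueError("expected 0 < p_target < p_baseline < 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_baseline * (1 - p_baseline) + p_target * (1 - p_target)
    n = (z_alpha + z_beta) ** 2 * variance / (p_baseline - p_target) ** 2
    return math.ceil(n)


# Hypothetical base rates, each with a 30% relative reduction.
rare_event = hires_per_group(0.02, 0.014)   # e.g. a rare incident type
common_event = hires_per_group(0.30, 0.21)  # e.g. a frequently coded early outcome
print(f"rare event: about {rare_event} hires per group")
print(f"common event: about {common_event} hires per group")
```

Under these assumptions the rare outcome requires thousands of hires per group while the more frequent outcome requires a few hundred, which is the reasoning behind using frequently coded early indicators during the pilot.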
Start with conservative cut scores or score bands. If you cannot explain in writing what you changed and why between the pilot and full deployment, the pilot did not produce evidence—it produced a trial run.
Next Steps: Implementing a Defensible Integrity Assessment Program
Selecting and validating integrity assessment tools is a structured process that spans risk definition, evidence evaluation, compliance documentation, ATS integration, and a governed pilot. Each step produces the evidence and operational controls that make the program defensible—to leadership, to regulators, and to hiring managers who need to trust the score before they act on it.
IntegrityFirst Tests works with HR leaders to design and implement integrity assessment programs that produce verifiable outcomes. Our team supports the full process: risk criterion definition, vendor evidence review, adverse impact monitoring design, ATS integration, and pilot governance. We provide documentation that matches UGESP requirements and outcome tracking that connects scores to the metrics leadership cares about.
Contact IntegrityFirst Tests to discuss how to select and validate integrity assessment tools for your hiring environment.