How AI is transforming the way we assess workforce skills
As federal organizations adopt merit-based hiring, leaders face growing pressure to assess job-relevant skills fairly, accurately, and at scale. AI is reshaping how skills assessments are designed and scored, reducing effort while keeping human judgment, validity, and accountability at the center.
Federal leaders are accelerating the shift toward merit-based hiring, consistent with guidance from the Office of Personnel Management, as they seek fairer, more defensible ways to hire, promote, and develop talent. Skills, not credentials or prior job titles, are increasingly treated as the most reliable indicators of performance. But this shift raises a critical question: how can agencies assess job-relevant skills accurately, consistently, and at scale?
For years, that question came with tradeoffs. High-quality assessments required significant time, specialized expertise, and sustained investment—forcing organizations to choose between speed, scale, and precision.
Artificial intelligence is changing that equation. When applied responsibly, AI can reduce the effort required to design, test, and score skills-based assessments—allowing experts to focus on what matters most: defining merit and ensuring assessments reflect real performance demands.
AI-enabled assessment methods
Traditional methods such as interviews, simulations, and direct observations remain essential to assessing job-relevant skills. But on their own, they are difficult to scale and slow to adapt as roles evolve. For leaders pursuing merit-based hiring, the challenge is not whether these methods work, but whether they can be applied consistently and quickly enough to meet workforce demands.
AI helps address this constraint. It can streamline the design, validation, and scoring of assessments—freeing up expert time for higher-value work. Crucially, this is not a shift toward automated decision-making. Experts remain responsible for how assessments are constructed, interpreted, and used. AI expands what teams can evaluate, and how quickly, while keeping accountability, transparency, and professional standards firmly in human hands.
Applying AI to real-world assessment challenges
The value of AI in skills assessment depends on how it is governed and applied. ICF pairs AI with the expertise of organizational psychologists and assessment professionals to measure job-relevant skills in ways that remain transparent, defensible, and grounded in real performance demands.
The following examples, drawn from work supporting the U.S. Army, show how AI can strengthen different aspects of assessment while keeping humans accountable for standards, interpretation, and outcomes.
Think-aloud assessments
Think-aloud assessments evaluate complex skills such as reasoning, judgment, and problem solving by capturing how individuals arrive at decisions—not just the outcomes they produce. The challenge is scale: transcribing, coding, and scoring verbal data requires significant analyst time.
In work supporting the U.S. Army, we applied AI to extract job-relevant evidence directly from recorded responses, align that evidence to behaviorally anchored rubrics, and support consistent scoring. Assessment experts defined performance criteria, reviewed outputs, and validated results.
This approach improves defensibility as well as efficiency. By tying ratings directly to cited evidence, assessment decisions become easier to review, explain, and audit—giving leaders greater confidence in how judgments are made.
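To make the idea concrete, the sketch below shows one way an evidence-citing pipeline like this can be structured. It is illustrative only, not ICF's or the Army's implementation: the `call_llm` helper, the `Evidence` structure, and the three-anchor rubric are all simplified assumptions. The point it demonstrates is that every proposed rating must carry a verbatim quote tied to a specific rubric anchor, so a human rater can confirm or reject it before any score is recorded.

```python
# Illustrative sketch only; not the actual system described above.
# `call_llm` is a hypothetical stand-in for whatever model or service is used.
from dataclasses import dataclass

@dataclass
class Evidence:
    anchor_id: str   # which behaviorally anchored rubric level the quote supports
    quote: str       # verbatim excerpt from the transcript (the audit trail)
    rationale: str   # model's explanation, reviewed by an assessment expert

# Simplified, invented rubric anchors for problem solving.
RUBRIC = {
    "PS-1": "Identifies the core problem before proposing a solution",
    "PS-2": "Weighs at least two alternative courses of action",
    "PS-3": "States assumptions and how they affect the decision",
}

def call_llm(prompt: str) -> str:
    """Hypothetical model call; replace with the provider of your choice."""
    raise NotImplementedError

def extract_evidence(transcript: str) -> list[Evidence]:
    """Ask the model to map transcript excerpts to rubric anchors."""
    anchors = "\n".join(f"{k}: {v}" for k, v in RUBRIC.items())
    raw = call_llm(
        "Read the think-aloud transcript below. For each rubric anchor it "
        "supports, return the anchor ID, a verbatim quote, and a one-sentence "
        f"rationale.\n\nRubric:\n{anchors}\n\nTranscript:\n{transcript}"
    )
    # Placeholder: parsing and quote verification (each quote must appear
    # verbatim in the transcript) would go here, before a human rater sees it.
    return []
```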
Analogies
Analogy-based items assess reasoning by asking respondents to infer relationships and apply them in new contexts. They are effective for evaluating cognitive flexibility and pattern recognition, but labor-intensive to design.
In support of the U.S. Army, we used AI to generate dozens of draft items, then relied on assessment experts to evaluate, select, and refine the strongest outputs. Approximately 60%–80% of generated items met initial quality thresholds, giving experts a substantial head start.
This approach shifts expert effort from production to judgment—expanding item banks while maintaining control over quality and alignment to job requirements.
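A generate-then-review workflow like the one described can be outlined in a few lines. The sketch below is hypothetical and reuses the same placeholder `call_llm` helper; the item fields and structural checks are assumptions. What it illustrates is the division of labor: the model produces volume, a cheap automated screen removes obvious failures, and every surviving draft still goes to an expert before it enters the item bank.

```python
# Illustrative sketch of a generate-then-review item pipeline (hypothetical).
def call_llm(prompt: str) -> list[dict]:
    """Hypothetical model call returning draft items as dictionaries."""
    raise NotImplementedError

def draft_analogy_items(skill: str, n: int = 50) -> list[dict]:
    """Generate n draft analogy items targeting a named reasoning skill."""
    return call_llm(
        f"Write {n} analogy test items that assess {skill}. "
        "Each item needs a stem pair, four options, one keyed answer, "
        "and a brief rationale."
    )

def automated_screen(item: dict) -> bool:
    """Cheap structural checks before human review; not a substitute for it."""
    return (
        len(item.get("options", [])) == 4
        and item.get("key") in item.get("options", [])
        and bool(item.get("rationale"))
    )

def build_review_queue(skill: str) -> list[dict]:
    """Only structurally sound drafts reach experts, who select or reject each one."""
    return [item for item in draft_analogy_items(skill) if automated_screen(item)]
```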
Critical thinking scenarios
Situational judgment tests and other critical thinking scenarios assess how individuals evaluate conditions and weigh alternative courses of action. Designing these scenarios requires both assessment expertise and deep domain knowledge.
In this work, we used AI to generate draft scenarios aligned to real operating contexts that Army officers face, then relied on industrial-organizational psychologists and subject-matter experts to review, refine, and approve each item. This reduced the average number of review cycles from two or three to one while maintaining quality.
For leaders, the implication is clear: assessments can adapt more quickly as roles evolve—without compromising rigor.
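One way to picture how drafting and review can converge on a single cycle is sketched below. It is a hypothetical illustration using the same placeholder `call_llm` helper, not the workflow used in the Army engagement: it simply shows the design choice of supplying the operating context and standing reviewer guidance up front, so drafts arrive closer to approvable and the expert's single pass is a decision rather than a rewrite.

```python
# Hypothetical sketch: front-loading context so one review cycle is enough.
def call_llm(prompt: str) -> str:
    """Hypothetical model call; replace with the provider of your choice."""
    raise NotImplementedError

def draft_scenario(operating_context: str, sme_guidance: str) -> str:
    """Draft a situational judgment scenario grounded in a real context.

    Reviewer guidance gathered from earlier items is supplied up front, so
    common defects are avoided before an expert ever sees the draft.
    """
    return call_llm(
        "Draft a situational judgment scenario with four response options.\n"
        f"Operating context: {operating_context}\n"
        f"Standing reviewer guidance: {sme_guidance}\n"
        "Each option must be plausible; exactly one should reflect the "
        "strongest course of action, with a rationale an SME can verify."
    )
```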
Written communication skill assessments
Written assessments are useful for evaluating communication skills but are traditionally slow to score and difficult to apply consistently at scale. When evaluations rely heavily on manual review, results can vary across reviewers, creating risks to perceived fairness and limiting how broadly writing samples can be used in merit-based decisions.
In this work, we applied AI to support scoring and feedback, reducing manual review time while improving consistency across responses. Experts defined scoring criteria, monitored performance, and retained responsibility for how assessment results were interpreted and used.
The result is a more consistent and scalable way to assess writing—making it a more practical and reliable input into hiring, development, and mobility decisions.
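Consistency here comes from scoring every response against the same explicit criteria and checking the AI's agreement with expert raters before trusting it at scale. The sketch below is illustrative only; the `call_llm` helper, the three criteria, and the tolerance-based agreement check are assumptions standing in for whatever monitoring an assessment team actually puts in place.

```python
# Hypothetical sketch: rubric-based scoring with an expert-agreement check.
CRITERIA = {"clarity": 5, "organization": 5, "mechanics": 5}  # max points each

def call_llm(prompt: str) -> dict:
    """Hypothetical model call returning {criterion: score}."""
    raise NotImplementedError

def score_response(text: str) -> dict:
    """Score one writing sample against the shared criteria."""
    criteria = ", ".join(f"{c} (0-{m})" for c, m in CRITERIA.items())
    return call_llm(
        f"Score the writing sample on: {criteria}. "
        f"Return one integer per criterion.\n\nSample:\n{text}"
    )

def agreement_rate(ai_scores: list[dict], expert_scores: list[dict],
                   tolerance: int = 1) -> float:
    """Share of criterion-level scores within `tolerance` points of an expert.

    Experts define the acceptable agreement threshold and decide what
    happens when the model drifts below it.
    """
    pairs = [
        (a[c], e[c])
        for a, e in zip(ai_scores, expert_scores)
        for c in CRITERIA
    ]
    return sum(abs(a - e) <= tolerance for a, e in pairs) / len(pairs)
```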
Reconsidering skills assessment for the future
The opportunity for federal leaders goes beyond faster assessment—it’s a shift in how skills are defined, measured, and applied across hiring, mobility, and development.
AI can help make rigorous, merit‑based assessment more accessible, but it does not remove the need for judgment. Leaders still must decide which skills matter, how evidence should be weighed, and how accountability is maintained.
As merit-based strategies reshape the federal workforce, the question is no longer whether assessments can scale. It’s whether they are built to stand up to scrutiny, adapt as roles evolve, and support better decisions over time. When those conditions are met, skills assessment becomes a strategic asset—not an administrative hurdle.