Codility AI Capabilities
Assess real work
in an AI-first world
Engineers use AI every day. Your assessments should reflect that reality, with full visibility into how candidates and employees collaborate with AI, and reviewable evidence your team can trust.
The challenge
Screening at scale when every candidate has AI
You are sending thousands of assessments, but AI-generated submissions look identical to hand-written code. Your scoring models cannot differentiate senior engineers from candidates who pasted the task into an AI tool. Manual review of every submission is not an option.
Recruiters review every CV by hand
“We’re missing better candidates because their CV isn’t strong, but they perform well.” Manual filtering at the top of the funnel misses the engineers worth pulling forward.
AI tools push every candidate to 90%+
When correctness scoring tops out for everyone, the funnel collapses. Hiring managers default back to CV signal, which defeats the purpose of skill-based hiring.
Heavy proctoring burden, unsustainable at scale
“Heavy proctoring burden, unsustainable as we scale.” At enterprise volume, manually reviewing every flagged session breaks down. Engineers pulled in to verify integrity end up doing work that a defensible record should handle.
Before and after
From blind scoring to reviewable signal
| Status quo | With Codility Screen |
|---|---|
| Correctness score with no insight into how it was produced | Score plus code evolution timeline, AI interaction log, and similarity analysis |
| Ban AI and get fake signal, or allow AI and lose auditability | Enable or disable AI per assessment, with every candidate interaction captured as reviewable AI activity |
| Senior engineers manually verify every promising submission | AI-generated follow-up questions probe candidate understanding automatically |
| Candidates receive a pass/fail with no feedback | AI-generated coaching feedback helps candidates improve, even after a low score |
| Top-of-funnel scores cluster at 90%+ with no differentiation | Similarity analysis and AI interaction logs reveal who understands their code and who relied on generation |
AI capabilities in Screen
AI in the assessment
How candidates interact with AI during their assessment, and how AI generates feedback and follow-up signal for reviewers.
Cody: In-Assessment AI Assistant
Trained on Codility’s content library, Cody helps candidates clarify tasks and iterate on their approach. It is guardrailed: candidates can explore ideas and get guidance, but Cody will not generate a full solution. Every prompt and response is logged as reviewable AI activity. Cody does not affect scoring.
AI Copilot in VS Code
A multi-model AI Copilot inside real VS Code dev containers: tab completion, inline suggestions, multi-file edits, and chat-based assistance. All prompts, accepted suggestions, and rejected completions are captured for playback.
AI Follow-Up Questions
After submission, the system generates contextual questions about the candidate’s approach, trade-offs, and edge cases. Responses are captured for reviewer analysis.
AI Feedback for Candidates
Growth-oriented feedback identifies positive patterns alongside areas for improvement. Even low-scoring candidates receive actionable coaching on structure, naming, and approach.
AI Readiness: Tech Roles
130+ tasks covering machine learning, NLP, model training, and AI tool collaboration. Assess whether engineers can build with AI: prompt engineering, output verification, debugging, and iterating on model outputs.
AI Task Creation via MCP
Build assessment tasks directly from your IDE using the Model Context Protocol (MCP). The MCP server is live and powering customer task publishing across hiring and skills assessments.
AI Readiness for Business Tasks
Work simulations for non-technical roles where AI is part of the job: customer support, success, sales, product management, marketing, data analysis, and finance. Candidates use Cody to produce real outputs. Reviewers see prompt quality, output evaluation, and judgment.
Integrity signals
How Codility surfaces reviewable evidence about candidate behaviour during assessments.
Integrity Risk
A calculated integrity score per screening session that combines behavioural, identity, and plagiarism signals into one reviewable indicator. Reviewers see what drove the score and act on it directly.
Similarity Check
Detects code pasted from external sources and submissions that match known AI output patterns. Pairs with the AI interaction log to differentiate authored work from generated output.
Pattern Detection
Analyses typing cadence, paste patterns, and editing behaviour to detect when solutions were retyped from another device or screen. Surfaces reviewable evidence for your team to act on.
Cheating apps detection
Detects unauthorized applications running alongside the assessment, including tools designed to be invisible to screen sharing. Detected apps appear in the integrity widget and on the candidate timeline for reviewer playback.
What customers are saying
Signal from the field
“Everyone’s using AI and it feels sometimes unfair to disregard candidates the access. But then we can monitor it and ensure they’re not using it for the whole thing.”
Demi, Engineering Leader, Consumer Analytics
“The QA team currently spends the first part of follow-up interviews testing a candidate’s understanding of what they completed in Codility.”
Sonia, Talent Acquisition Manager, E-Commerce
The challenge
Live interviews need to reflect how engineers work
Your engineers use VS Code, containers, and AI copilots every day. But your live interviews still run in a stripped-down browser editor with no tooling. The result: you are assessing how candidates perform in an artificial environment instead of how they build real software.
Generic live coding tools weren’t built for AI
Browser editors test syntax recall. They cannot run containers, install packages, or capture how a candidate uses AI mid-session. The interviewer ends up guessing what was generated.
Ad hoc interviews across hiring managers
“Interviews are not necessarily run the same way.” Without a shared environment and a shared rubric, two interviewers hiring for the same role produce different signal. The decision drifts to gut feel.
Senior engineers want their hours back
Live interviews eat senior engineering time. Without a real environment that captures what the candidate actually did, interviewers re-test in follow-on sessions. The cost compounds.
Before and after
From artificial test to real work simulation
| Status quo | With Codility Interview |
|---|---|
| Browser-based editor with no terminal, no packages, no tooling | Full VS Code environment with dev containers, sidecar services, and extensions |
| Ban AI and assess an unrealistic workflow | Enable AI Copilot with full interaction capture: prompts, completions, and acceptance patterns |
| Interviewer guesses what was AI-generated | Code timeline shows every edit, AI suggestion, and candidate decision in playback |
| Interviewer notes are the only record | Full transcript, code evolution, and structured evaluation form for every session |
AI capabilities in Interview
AI in the assessment
How candidates interact with AI during live interviews, and how interviewers gain visibility into that collaboration.
Cody: In-Assessment AI Assistant
Trained on Codility’s content library, Cody helps candidates clarify tasks and iterate on their approach during live sessions. It is guardrailed: candidates can explore ideas, but Cody will not generate a full solution. Every interaction is logged as reviewable AI activity and visible to interviewers in real time.
AI Copilot in VS Code
A multi-model AI Copilot with tab completion, inline suggestions, multi-file edits, and chat. Interviewers see exactly how candidates use AI: what they prompted, what they accepted, and what they rejected.
AI Readiness: Tech Roles
Interview tasks designed to evaluate AI collaboration skills: prompt engineering, output verification, debugging AI-generated code, and iterating on model outputs.
AI Task Creation via MCP
Build interview tasks directly from your IDE using the Model Context Protocol (MCP). The MCP server is live and powering customer task publishing across hiring and skills assessments.
AI Readiness for Business Tasks
Live interview tasks for non-technical roles where AI collaboration is part of the work. Interviewers see prompt quality, output evaluation, and judgment in real time.
Integrity signals
How Codility surfaces reviewable evidence about candidate behaviour during live sessions.
Pattern Detection
Analyses typing cadence, paste patterns, and editing behaviour to detect when solutions were retyped from another device or screen. Surfaces evidence for the interviewer to review.
Cheating apps detection
Detects unauthorized applications running alongside the interview session, including tools designed to be invisible to screen sharing. Detected apps appear in the integrity widget and on the candidate timeline for interviewer playback.
What customers are saying
Signal from the field
“If I was in an interview with someone and I could just see them put everything into AI, I may feel disengaged and question that candidate’s capabilities.”
Demi, Engineering Leader, Consumer Analytics
“Within 12 months I think all of the vendors in the market are going to have very similar AI copilot integration. I think the real differentiation is: what are the insights?”
Engineering Leader, European Fintech
The challenge
You invested in AI tools. Can your workforce use them?
Your organization is spending millions on Copilot, ChatGPT, and internal AI tools. But you have no objective measurement of whether employees can actually use them effectively. Self-reporting and adoption metrics tell you who logged in. They say nothing about who built something valuable.
We can’t prove ROI on AI investment
Leadership asks for data. We have license counts and login rates. The board wants evidence that training moved the needle on actual AI proficiency.
No shared definition of good AI skills
Prompting quality? Output evaluation? Multi-step collaboration? Without a framework, every team defines readiness differently. The benchmark drifts by department.
We need our team to meet the benchmark we hire to
“We need to know our existing teams meet the benchmark of the candidates we already have here.” Internal mobility, project staffing, and upskilling run on manager opinion today. There is no objective way to check.
Before and after
From training spend to measured capability
| Status quo | With Codility Skills Intelligence |
|---|---|
| AI training completion rates with no skill validation | Before-and-after assessments that prove capability improvement by role |
| Manager reviews drive internal mobility decisions | Objective skills data identifies who is ready for new roles |
| “AI readiness” lives in slide decks with no way to validate it | 130+ tasks across ML, NLP, model training, and AI collaboration with deterministic scoring |
| Employees take assessments and never hear back | AI-generated feedback turns every assessment into a coaching moment |
| License counts prove adoption without measuring proficiency | Skills Intelligence maps AI capability across the entire engineering org |
AI capabilities in Skills Intelligence
AI in the assessment
How employees interact with AI during skills assessments, and how AI generates actionable feedback and proficiency scoring.
AI Feedback for Employees
The feature Skills Intelligence customers ask about most. After every assessment, employees receive coaching-oriented feedback that identifies strengths, surfaces specific improvement areas, and provides actionable guidance on structure, naming, and approach. This shifts the perception of assessments from “testing” to “development,” driving higher engagement and repeat participation across the org.
Cody: In-Assessment AI Assistant
Trained on Codility’s content library, Cody helps employees clarify tasks and iterate on their approach. It is guardrailed: employees can explore ideas and get guidance, but Cody will not generate a full solution. Logged interactions reveal prompt quality, iteration patterns, and output evaluation.
AI Copilot in VS Code
A multi-model AI Copilot in dev containers. Captures how employees use the AI tools they rely on daily: tab completion, multi-file edits, and chat-based assistance. Provides signal on real-world AI collaboration patterns.
AI Readiness: Tech Roles
130+ tasks for ML engineers, data scientists, MLOps, and AI integrators. Assess real AI capability with deterministic scoring that the board can trust.
AI Readiness for Business Tasks
Work simulations for non-technical roles: customer support, success, sales, product management, marketing, data analysis, and finance. Employees use Cody to produce real outputs, and reviewers see prompt quality, output evaluation, and judgment. Soft-launched on codility.ai while validation pilots run.
Scoring 2.0
Adds maintainability metrics to skills proficiency scoring: code structure, naming conventions, modularity. Maps actual engineering quality across the full spectrum of code craftsmanship.
AI Task Creation via MCP
Build assessment tasks directly from your IDE using the Model Context Protocol (MCP). The MCP server is live and powering customer task publishing across hiring and skills assessments.
Integrity signals
How Codility maintains trust across internal assessments.
Similarity Check
Identifies code pasted from external sources and submissions that match known AI output patterns. Reinforces the integrity of pre- and post-training capability measurements.
Cheating apps detection
Detects unauthorized applications running alongside the assessment, including tools designed to be invisible to screen sharing. Detected apps appear in the integrity widget and on the employee’s assessment timeline for reviewer playback.
What customers are saying
Signal from the field
“We are all in on AI. There’s no going back at this point. I would put us in that 5% category of companies who can demonstrate ROI. There’s not a tremendous amount when you really look under the covers.”
Mike, Executive, Global Financial Services
“That’s a burning question for me right now: how do we test, how do we know capability in AI prompting, how do we do that?”
Mel, Tech Academy Leader, Financial Services
Why Codility
What sets Codility apart
Controlled AI collaboration with reviewable activity
Enable or disable AI per assessment, with every candidate interaction captured as reviewable AI activity. Cody does not affect scoring, and more granular controls are on the roadmap. No decisions are outsourced to opaque models.
Real IDE that mirrors daily engineering work
VS Code dev containers with optional sidecar services and a multi-model AI Copilot. Candidates and employees work in the environment they know, with the AI tools they use daily, while every interaction is captured for review.
Assessment science with a maintainability lens
Validation is led by industrial-organizational (I/O) psychologists through the Engineering Skills Model, which engineering leaders have also validated. Scoring 2.0 adds 25+ maintainability metrics that differentiate candidates in the 90 to 100% score range, where AI inflates results.
Enterprise trust and regulatory alignment
EU data storage in Frankfurt, plus SOC 2 Type II, ISO 27001, GDPR, and WCAG 2.1 AA compliance. Codility’s human-in-the-loop approach aligns with EU AI Act requirements. Codility does not use customer or candidate data to train AI models.
Availability
AI capabilities across products
| Capability | Screen | Interview | Skills Intelligence |
|---|---|---|---|
| Cody: AI Assistant | Available | Available | Available |
| AI Copilot in VS Code | Preview | Preview | Preview |
| AI Follow-Up Questions | Preview | n/a | n/a |
| AI Feedback | Preview | n/a | Preview |
| AI Readiness: Tech Roles | Available | Available | Available |
| AI Readiness for Business Tasks | Preview | Preview | Preview |
| Integrity Risk | Available | n/a | n/a |
| Pattern Detection | Preview | Preview | n/a |
| Cheating apps detection | Preview | Preview | Preview |
| Similarity Check | Available | n/a | Available |
| Scoring 2.0 | n/a | n/a | Available |
| AI Task Creation via MCP | Available | Available | Available |