A company I advised hired an ML engineer with an impressive resume—PhD from a top program, papers at major conferences, experience at a well-known AI lab. Six months later, they let him go. He could design elegant models in notebooks but couldn't ship anything. Production systems? Foreign territory. Working code at scale? Not his strength. Collaborating with product teams? A constant struggle.
"We interviewed for research," the engineering director told me, "but we needed engineering."
This story plays out constantly in ML hiring. The field attracts candidates with impressive academic credentials who may or may not be able to build production systems. It attracts software engineers who learned some scikit-learn but don't understand why their model works. It attracts people who can talk confidently about transformers without being able to implement one.
| Candidate Profile | Interview Strength | Common Gap | Risk Level |
|---|---|---|---|
| Academic researcher | Theory, novel approaches | Production systems, shipping | High |
| Self-taught ML | Practical implementation | Statistical foundations | Medium |
| SWE learning ML | Engineering practices | Model intuition, debugging | Medium |
| ML bootcamp grad | Framework usage | Depth, edge cases | High |
| Industry ML veteran | Full-stack ML | May be dated on latest methods | Low |
After helping over 50 companies build ML teams at SmithSpektrum, I've learned that assessing ML engineers requires a fundamentally different approach than assessing traditional software engineers. The surface signals—publications, prestigious employers, confident jargon—often mislead. The real skills are harder to see[^1].
## The ML Engineering Spectrum
Before you can assess ML engineers, you need to understand what you're actually hiring for. "ML engineer" means vastly different things depending on context.
At one end sits research-focused ML engineering: designing novel architectures, pushing state-of-the-art benchmarks, publishing papers. These engineers need deep mathematical intuition, comfort with ambiguity, and patience for experimentation. They may not ship production code for months.
At the other end sits production ML engineering: taking models from research and making them work reliably at scale. These engineers need software engineering discipline, infrastructure knowledge, and the ability to optimize for latency, throughput, and cost rather than just accuracy. They ship code weekly.
In between sits applied ML engineering: taking existing techniques and adapting them to specific business problems. These engineers need practical judgment about which approaches will work, strong experimentation skills, and the ability to translate business needs into ML solutions.
Most companies need applied or production ML engineers. Most impressive resumes belong to research-oriented candidates. This mismatch explains a lot of failed hires.
When I work with companies on ML hiring, I start by making them articulate exactly what they need. Will this person be inventing new approaches, or implementing existing ones well? Will they work mostly in notebooks, or mostly in production code? Do they need to collaborate closely with product teams, or operate relatively independently on research problems?
The answers shape everything about how you should assess candidates.
## What Actually Matters
Here's what I look for across different types of ML roles:
For production ML engineers, software engineering skills matter more than ML depth. I want to see candidates who can write clean, tested, maintainable code. Who understand distributed systems and can reason about scale. Who know how to debug production issues and instrument systems for observability. Who can make pragmatic trade-offs between model quality and operational constraints.
The best production ML engineers I've placed often had relatively modest ML credentials but strong software engineering backgrounds. They knew enough ML to understand what they were building, and enough engineering to actually build it.
For applied ML engineers, practical judgment matters most. Can they look at a problem and quickly identify which techniques are likely to work? Do they have good intuition for the difference between problems that ML can solve and problems that look like ML problems but aren't? Can they set up experiments that actually answer the relevant questions?
I've seen brilliant researchers fail as applied ML engineers because they wanted to solve every problem elegantly rather than effectively. And I've seen scrappy practitioners with middling credentials succeed because they understood what "good enough" meant and how to get there quickly.
For research ML engineers, depth and creativity matter most. These candidates need mathematical sophistication, the patience for long experimentation cycles, and the ability to generate novel ideas. They also need good taste—the ability to identify promising research directions and abandon dead ends.
Research roles are genuinely hard to assess. The best signal I've found is drilling into their published work to see if they can explain it clearly and respond thoughtfully to probing questions. If they can't defend their own research, something's wrong.
## The Assessment Framework
My standard ML interview process has five components, weighted differently depending on role type.
Component 1: Technical fundamentals. This is a conversation—not a quiz—about ML concepts. I want to understand how candidates think about the field, not just what they've memorized. Good questions probe understanding rather than recall.
Instead of "Explain gradient descent," ask "You've trained a model and the loss isn't decreasing. Walk me through how you'd diagnose what's wrong." This reveals whether they understand gradient descent well enough to debug it, which is what actually matters.
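For calibration on what a strong answer covers, here is one item from that diagnosis sketched in code. The function name and toy data are invented for illustration, not a prescribed method: verify the loop can overfit a tiny batch, because a loop that cannot drive loss toward zero on three points has a wiring bug, not a modeling problem.

```python
# Sanity check: a healthy training loop should be able to overfit a tiny
# batch. The "model" here is a single weight fit by gradient descent on MSE.

def mse(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w, xs, ys):
    # Derivative of MSE with respect to w.
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def overfit_tiny_batch(xs, ys, lr=0.1, steps=200):
    w = 0.0
    losses = [mse(w, xs, ys)]
    for _ in range(steps):
        w -= lr * grad(w, xs, ys)
        losses.append(mse(w, xs, ys))
    return w, losses

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # tiny batch; true weight is 2
w, losses = overfit_tiny_batch(xs, ys)
print(w, losses[0], losses[-1])
```

If the loss refuses to collapse even here, the candidate should be suspecting the learning rate, the gradient computation, or the data pipeline before the architecture.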
Instead of "What's the bias-variance tradeoff?" ask "Your model performs great on the training set but poorly on the test set. What might be happening and how would you address it?" Same concept, but requires application rather than recitation.
For senior candidates, I ask questions that don't have clean answers: "When would you choose a simple linear model over a deep learning approach?" "How do you decide when a model is good enough to ship?" These reveal judgment, which is harder to teach than technique.
Component 2: Coding assessment. Every ML engineer needs to code. The question is what kind of coding to assess.
For production-focused roles, I use standard software engineering problems—data structures, algorithms, clean code. If someone can't write a working solution to a medium-difficulty coding problem, they can't build production ML systems, no matter how many papers they've published.
For research-focused roles, I include ML-specific coding: implement a simple model from scratch, write a training loop, debug provided ML code. This tests whether they understand the mechanics beneath the abstractions.
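As a rough sketch of the bar this sets (the toy dataset and hyperparameters are invented for illustration), the exercise is roughly this: logistic regression with a hand-written training loop, no framework.

```python
# Logistic regression from scratch: batch gradient descent on binary
# cross-entropy, implemented without any ML framework.

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of BCE with respect to the logit
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, xi):
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5)

# Linearly separable toy data: the label tracks the first feature.
X = [[0.0, 1.0], [0.2, 0.8], [0.9, 0.1], [1.0, 0.3]]
y = [0, 0, 1, 1]
w, b = train_logreg(X, y)
print([predict(w, b, xi) for xi in X])
```

A candidate who can produce something like this on a whiteboard or laptop clearly understands what the framework is doing on their behalf.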
The worst mistake in ML coding interviews is letting candidates off the hook because "they're researchers, not engineers." Even researchers need to write code. If they can't, they can't be productive.
Component 3: System design. ML systems are complex, and designing them well requires integrating ML knowledge with engineering judgment.
I give candidates realistic problems: "Design a system to detect fraudulent transactions in real-time" or "Design a recommendation system for a news app with millions of users." Then I probe their thinking across the full stack: data collection and preprocessing, feature engineering, model selection and training, inference serving, monitoring and retraining.
What distinguishes senior candidates is attention to operational concerns. Junior candidates focus on the model. Senior candidates think about how to get data, how to handle model failures, how to monitor for drift, how to retrain without downtime. These are the problems that make ML systems hard.
Component 4: Domain conversation. I want to understand what candidates have actually done—not what they claim to have done, but what they genuinely contributed and what they learned.
I pick a project from their resume and drill into it: "Tell me about this project. What was the business problem? How did you approach it? What did you try that didn't work? What would you do differently now?" Real contributors can answer these questions in detail. People who worked on teams and claim credit for the whole project stumble.
I pay attention to how they talk about failures. The best ML work involves lots of failure—experiments that don't pan out, approaches that don't scale, models that don't generalize. Candidates who can't describe failures either haven't done real work or can't learn from their experiences.
Component 5: Collaboration and communication. ML engineers rarely work in isolation. They collaborate with data teams, product teams, infrastructure teams. They need to explain complex concepts to non-experts and advocate for their approaches without being arrogant about it.
I assess this throughout the interview. Can they explain technical concepts clearly? Do they listen to questions or talk past them? How do they handle pushback on their ideas? The brilliant jerk who can't collaborate is a net negative on most teams.
## Specific Questions That Reveal Understanding
I've collected questions over the years that distinguish candidates who truly understand ML from those who've memorized surface-level answers.
On model selection: "You're building a model to predict customer churn. You could use logistic regression, random forest, or a neural network. Walk me through how you'd decide." Good candidates discuss the tradeoffs: interpretability, training data requirements, computational cost, feature engineering needs. Weak candidates either default to the most complex option or can't articulate why they'd choose anything.
On features: "You're building a recommendation system and you notice one feature is improving your model significantly but you're not sure why. How do you handle this?" This reveals whether candidates understand the risks of unexplainable features and how they balance predictive power against interpretability and potential leakage.
On deployment: "Your model works great in development but performs worse in production. What might be happening?" Strong candidates immediately think about training-serving skew, data distribution drift, and environmental differences. This question separates researchers who've never deployed from engineers who have.
On evaluation: "You're comparing two models. Model A has higher accuracy but Model B has higher precision. How do you decide which is better?" The right answer is "it depends on the business context"—followed by intelligent discussion of when precision matters versus recall, and how to frame the decision for stakeholders.
On debugging: "You've trained a model and the test performance is much worse than validation performance. Walk me through your debugging process." This reveals systematic thinking. Good candidates have a mental checklist: check for data leakage, examine the validation split, look for distribution differences, verify preprocessing.
## Red Flags I've Learned to Watch For
After years of ML hiring, certain patterns reliably predict problems.
Candidates who can't explain their own work simply. If someone published a paper but can't explain the key insight to a smart non-expert, they either didn't do the core work or don't understand it deeply. Real understanding enables simple explanation.
Candidates who always reach for complex solutions. When I ask how they'd approach a simple problem and they immediately propose a transformer architecture, I worry. The best ML engineers try simple things first. Sophistication that can't be turned off is often a mask for insecurity.
Candidates who dismiss engineering concerns. "I focus on the research; someone else can figure out production" is disqualifying for any role that involves shipping. Even researchers need to understand the constraints their colleagues face.
Candidates who've only worked on mature systems. Someone who's always worked with abundant labeled data and clean pipelines may not handle the ambiguity of building from scratch. I probe for experience with messy real-world problems.
Candidates who can't discuss failures. Everyone who's done real ML work has tried things that didn't work. Candidates who can only discuss successes either haven't done much or can't learn from experience.
## Compensation Realities
ML engineers command premiums, but the premium varies enormously based on actual skill level—not credential level.
In my experience, top-tier ML engineers at major tech companies earn $350K-$600K+ in total compensation. But "ML engineer" is one of the most inflated titles in tech. I've seen candidates with ML titles who were really data analysts with some Python skills, and candidates with modest titles who were doing genuinely sophisticated work.
For applied ML roles at well-funded startups or established tech companies, expect to pay $180K-$280K base for solid mid-level candidates, $250K-$350K total comp for strong senior candidates. Research-focused roles at top companies pay more but require correspondingly more impressive backgrounds.
The hardest candidates to price are the ones at the intersection: strong software engineers who've developed real ML skills but don't have prestigious ML pedigrees. These are often the best hires, but they don't fit standard comp frameworks. I recommend paying for demonstrated skill, not credential prestige.
## Building the Interview Panel
Who interviews ML candidates matters as much as what you ask.
You need at least one person with genuine ML depth—someone who can distinguish real understanding from sophisticated-sounding nonsense. This person should evaluate the ML-specific components and calibrate the candidate's actual skill level.
You also need strong software engineers who can assess production readiness. An ML expert might be impressed by elegant model architecture while missing that the candidate writes unmaintainable code. Engineering evaluation keeps you from hiring people who can't ship.
For senior roles, include a product or business stakeholder. ML engineers who can't communicate with non-technical teams create friction. Someone outside engineering can assess whether the candidate explains things clearly and seems collaborative.
That company with the PhD who couldn't ship? They eventually built a strong ML team—but not by hiring more researchers. They hired pragmatic engineers with ML skills, paired them with their one strong ML researcher, and built a culture where shipping mattered as much as sophistication. Their models weren't as elegant as the researcher would have designed, but they actually worked in production. Which, in the end, is the point.
## References
[^1]: SmithSpektrum ML engineering placement data, 50+ companies, 2020-2026.

[^2]: Chip Huyen, *Designing Machine Learning Systems*, O'Reilly Media, 2022.

[^3]: Google, "Machine Learning: The High Interest Credit Card of Technical Debt," NeurIPS 2015.

[^4]: LinkedIn AI Engineering Survey, 2025.
Building an ML team? Contact SmithSpektrum for specialized ML recruiting and assessment design.
Author: Irvan Smith, Founder & Managing Director at SmithSpektrum