
The LEGO Fallacy: Why Validated Components Don't Make a Validated Framework

By Marc Molas · March 16, 2026 · 9 min read

There's a common pattern in how new management frameworks get justified: each individual practice has supporting research, the citations are good, and the framework as a whole is presented as the sum of its evidence. This is structurally seductive and often wrong. The integrated framework can produce different outcomes than any of its individual pillars predict, because the pillars interact.

The recent paper The Honey Badger Management Framework for Human-AI Hybrid Organizations: A Proxy Validation and Integration Analysis (Fradelos, January 2026) does something I rarely see in this space: it explicitly names this risk as the LEGO fallacy — "unsupported linear composition of supported parts" — and tries to address it head-on.

This is worth understanding because the LEGO fallacy isn't specific to one framework. It's a pattern that recurs in management methodologies pitched as "evidence-based." Recognizing it changes how you evaluate any new framework, and how you should re-evaluate the methodologies you already use.

What Proxy Validation Actually Is

Proxy validation is a specific evidentiary stance. It says: we don't have a longitudinal study of the integrated framework in a real organization, so we won't claim we do. Instead, for each pillar of the framework, we identify the closest empirical evidence base in the literature, classify the strength of that evidence, and explicitly flag the integration tensions where pillar-level evidence may not compose.

The Honey Badger Management Framework (HBMF) paper applies this method to four pillars:

7-day cancellable sprints: backed by real-options theory and batch-size economics. Evidence is strong.
Governed intra-team competition: tournament theory predicts effort effects. Evidence on effort is real, but evidence on the governed version (with anti-sabotage governance, helping routines, psychological safety guardrails) is contingent. Sabotage and cooperation erosion under competition are well-documented; whether governance succeeds in mitigating them is context-sensitive.
AI teaming: individual-level productivity is supported by recent RCTs and field studies. Evidence at the team level is moderate-to-thin.
Redundancy buffers: well-supported by reliability engineering and organizational psychology.
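
The classification above can be kept as a small evidence ledger. The following is a minimal sketch, not anything taken from the paper: the `Strength` scale, its ordering, the split of AI teaming into two entries, and the `pilot_priorities` helper are all assumptions made for illustration. Mapping "well-supported" to `STRONG` and "moderate-to-thin" to `THIN` is also a simplification.

```python
from dataclasses import dataclass
from enum import Enum


class Strength(Enum):
    # Hypothetical ordinal scale; the ordering is an assumption.
    STRONG = 3
    MODERATE = 2
    CONTINGENT = 1
    THIN = 0


@dataclass(frozen=True)
class Pillar:
    name: str
    evidence_base: str
    strength: Strength


@dataclass(frozen=True)
class Tension:
    pillar: str         # name of the pillar involved
    competes_with: str  # the outcome it may undermine


PILLARS = [
    Pillar("7-day cancellable sprints",
           "real-options theory, batch-size economics", Strength.STRONG),
    Pillar("governed intra-team competition",
           "tournament theory", Strength.CONTINGENT),
    Pillar("AI teaming (individual level)",
           "RCTs and field studies", Strength.STRONG),
    # "Moderate-to-thin" is recorded conservatively as THIN here.
    Pillar("AI teaming (team level)",
           "team-level evidence", Strength.THIN),
    Pillar("redundancy buffers",
           "reliability engineering, organizational psychology", Strength.STRONG),
]

TENSIONS = [
    Tension("governed intra-team competition", "psychological safety"),
    Tension("AI teaming (team level)", "team learning"),
    Tension("redundancy buffers", "velocity"),
]


def pilot_priorities(pillars, tensions):
    """Names of pillars a pilot should watch first, weakest evidence first.

    A pillar is flagged if its evidence is not STRONG or if it is named
    in an integration tension.
    """
    flagged = {t.pillar for t in tensions}
    risky = [p for p in pillars
             if p.strength is not Strength.STRONG or p.name in flagged]
    return [p.name for p in sorted(risky, key=lambda p: p.strength.value)]


print(pilot_priorities(PILLARS, TENSIONS))
```

Note that a pillar with strong standalone evidence (redundancy buffers) still lands on the list, because it is named in a tension. That is the point of the method: strong pillar evidence does not exempt a pillar from integration risk.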

The honest framing matters more than the specific results. "Evidence is strong here, moderate there, contingent here, thin there" is the kind of stance most framework advocates avoid because it makes the framework harder to sell. Adopting it makes the framework more credible to the people who would actually have to bet their organization on it.

Why the LEGO Fallacy Is Endemic

The reason this fallacy keeps showing up is structural: the people who design management frameworks typically can't run the longitudinal studies that would validate the integrated framework. Such studies are expensive, slow, and counterfactual-poor. So the literature is full of pillar-level evidence and short on integration-level evidence.

The honest options are limited:

1. Wait for longitudinal evidence before claiming validation. This is academically pure and operationally unhelpful: frameworks that wait for full validation get scooped by frameworks that don't.
2. Claim integrated validation based on pillar evidence. This is the LEGO fallacy and produces overclaiming.
3. Adopt a proxy-validation stance: classify the pillar-level evidence, flag the integration tensions, propose a minimal pilot to test the integrated framework.

Option 3 is harder to write and easier to evaluate. It also turns out to be more useful for engineering teams trying to decide whether to adopt the framework, because it tells them where the framework is most likely to break.

Integration Tensions Worth Naming

The integration tensions HBMF's analysis surfaces are general: they apply to any framework that combines short cycles, internal competition, AI augmentation, and redundancy. They are worth understanding even if you don't adopt HBMF.

Competition vs. psychological safety

Tournament theory predicts higher effort under competition. Behavioral studies also find that competition erodes helping behavior, increases sabotage incentives, and can reduce psychological safety. These two effects are not independent: the same competitive incentive produces both.

The framework's governance answer is the Guru role plus mandatory daily help sessions and an explicit anti-sabotage culture. Whether this works depends on execution. The honest framing is that this pillar is contingent, not validated. CTOs evaluating any management approach with internal competition components should not assume that the governance actually mitigates the side effects.

AI augmentation vs. team learning

Individual-level AI augmentation has strong evidence: RCTs and field studies show productivity improvements when AI is used on individual tasks. Team-level evidence is thinner. The mechanism by which individual gains compose into team gains is not well-established, and there are plausible failure modes: AI-produced shortcuts that bypass learning, deskilling on tasks that the AI handles, and asymmetric capability accumulation across team members.

The framework's answer is structured knowledge transfer (mandatory gap declarations, daily help sessions, AI access for all roles including top management) to keep individual gains flowing into team capability. Whether this works at scale is an empirical question.

Redundancy vs. velocity

Redundancy buffers — overlapping expertise, dual sub-teams — improve resilience and learning rate, at the cost of nominal velocity (you're "doing the same thing twice"). Reliability engineering supports the resilience claim. But the velocity penalty is real, and frameworks that promise both higher velocity and higher resilience need to be specific about how the trade-off resolves.

The framework's argument is that integration effects (faster learning, better feedback, lower outage cost) more than offset the nominal velocity penalty. This is plausible but context-dependent. In low-uncertainty, high-throughput environments, the redundancy may not pay for itself.
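
One way to make the trade-off concrete is a back-of-the-envelope break-even check. The sketch below is an illustrative model, not something taken from the paper: the function name, its parameters, and every number are hypothetical, and it counts only avoided outage cost while ignoring the learning and feedback benefits, so it understates the case for redundancy.

```python
def redundancy_net_value(
    velocity_penalty: float,      # fraction of output lost to duplicated work
    period_output_value: float,   # value of one period's output without redundancy
    outage_prob_single: float,    # outage probability per period, single team
    outage_prob_dual: float,      # outage probability per period, dual teams
    outage_cost: float,           # cost of one outage
) -> float:
    """Expected net value of redundancy for one period (illustrative model)."""
    velocity_cost = velocity_penalty * period_output_value
    avoided_outage_cost = (outage_prob_single - outage_prob_dual) * outage_cost
    return avoided_outage_cost - velocity_cost


# Hypothetical numbers, chosen only to show the sign flipping with context.
high_uncertainty = redundancy_net_value(0.20, 100.0, 0.30, 0.05, 200.0)
low_uncertainty = redundancy_net_value(0.20, 100.0, 0.05, 0.01, 200.0)

print("pays off under high uncertainty:", high_uncertainty > 0)
print("pays off under low uncertainty:", low_uncertainty > 0)
```

With the same velocity penalty, the sign of the result depends entirely on how much outage risk the redundancy removes, which is why the claim has to be checked per environment rather than assumed.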

The Minimal Pilot Plan

The most useful part of the proxy-validation paper, in my view, is its proposal for a minimal pilot — what would actually count as validating the integrated framework, in language any CTO would recognize.

The proposed pilot includes:

DORA-style engineering performance metrics: lead time, deployment frequency, change failure rate, MTTR. These are the standard outcome metrics for engineering organizations.
Psychological safety measurement: repeated, validated surveys (e.g., Edmondson-style instruments) to detect erosion under competitive structures.
AI augmentation effect measurement: comparison of work done with and without AI assistance, controlling for task type and contributor experience.
Redundancy effect measurement: outage and recovery metrics in dual-team versus single-team configurations.
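
The four DORA-style metrics in the first item can be computed from a plain log of deployments. This is a minimal sketch under assumed definitions: the `Deployment` record and `dora_summary` function are hypothetical names, lead time is taken as commit-to-deploy, and MTTR is averaged over failed deployments that have a recorded restore time. Real tooling may define these differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set only for failed deployments


def dora_summary(deploys: list[Deployment], window_days: int) -> dict:
    """Summarize lead time, deployment frequency, change failure rate, MTTR."""
    if not deploys:
        raise ValueError("need at least one deployment")
    lead_secs = [(d.deployed_at - d.committed_at).total_seconds() for d in deploys]
    failures = [d for d in deploys if d.failed]
    restore_secs = [
        (d.restored_at - d.deployed_at).total_seconds()
        for d in failures
        if d.restored_at is not None
    ]
    mttr = sum(restore_secs) / len(restore_secs) / 3600 if restore_secs else None
    return {
        "median_lead_time_hours": median(lead_secs) / 3600,
        "deploys_per_day": len(deploys) / window_days,
        "change_failure_rate": len(failures) / len(deploys),
        "mttr_hours": mttr,
    }


# Made-up example data for a one-week window.
t0 = datetime(2026, 1, 5, 9, 0)
h, d = timedelta(hours=1), timedelta(days=1)
log = [
    Deployment(t0, t0 + 4 * h),
    Deployment(t0 + d, t0 + d + 8 * h, failed=True,
               restored_at=t0 + d + 10 * h),
    Deployment(t0 + 2 * d, t0 + 2 * d + 6 * h),
    Deployment(t0 + 3 * d, t0 + 3 * d + 10 * h, failed=True,
               restored_at=t0 + 3 * d + 14 * h),
]
summary = dora_summary(log, window_days=7)
print(summary)
```

Running the same summary separately for dual-team and single-team configurations gives a starting point for the redundancy comparison in the last item, provided the two groups handle comparable work.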

The framing is correct: a pilot that doesn't measure the integration tensions can't tell you whether the framework is working as a system. A pilot that measures only velocity will produce false-positive validations whenever competition is producing short-term effort gains while eroding longer-term capability.

What This Means for Any Framework Decision

Three things every CTO should take from the proxy-validation method:

1. Pillar evidence does not validate integrated frameworks

When a framework is sold to you with citations, ask which citations are pillar-level and which are integration-level. Most are pillar-level. That's not disqualifying — it's the state of the evidence — but the framework should be presented honestly as such.

2. Integration tensions are where frameworks fail

The places frameworks fail in production are usually the integration tensions, not the individual pillars. A framework that can name its own integration tensions is more trustworthy than one that can't, because the tensions are where you'll need to invest extra governance.

3. The pilot you run is the validation you have

If you adopt a framework, the pilot data you generate is the only integrated-framework evidence you have. Design it to measure the integration tensions, not just the velocity outcomes. A pilot that measures only velocity tells you nothing about whether the framework is sustainable.
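
As a sketch of what measuring the tension alongside velocity can look like, the check below refuses to call a pilot supported when a velocity gain coincides with a drop in a psychological safety score. Everything here is hypothetical: the `PilotSnapshot` fields, the `pilot_verdict` function, the `max_safety_drop` threshold, and the sample numbers are assumptions, and a real pilot would need a proper statistical comparison rather than two point estimates.

```python
from dataclasses import dataclass


@dataclass
class PilotSnapshot:
    deploys_per_week: float
    psych_safety_score: float  # mean of a repeated survey, higher is better


def pilot_verdict(baseline: PilotSnapshot, pilot: PilotSnapshot,
                  max_safety_drop: float = 0.0) -> str:
    """Classify a pilot using velocity and one integration-tension metric."""
    faster = pilot.deploys_per_week > baseline.deploys_per_week
    safety_drop = baseline.psych_safety_score - pilot.psych_safety_score
    eroded = safety_drop > max_safety_drop
    if faster and eroded:
        # The false-positive case: a velocity-only pilot would report success.
        return "suspect: velocity gain with safety erosion"
    if faster:
        return "supported so far"
    return "no velocity gain"


before = PilotSnapshot(deploys_per_week=5.0, psych_safety_score=4.2)
after = PilotSnapshot(deploys_per_week=7.0, psych_safety_score=3.6)
print(pilot_verdict(before, after))
```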

The Broader Lesson

The proxy-validation stance is useful well beyond hybrid-team management. The same pattern shows up in:

DevOps maturity models: each practice has evidence; the integrated transformation often doesn't.
AI deployment frameworks: individual model evals are well-developed; integrated agent performance under real-world distribution is much less so.
Engineering org transformations: every individual practice has supporting research; the transformation as a whole is rarely validated.

Adopting the proxy-validation stance internally (naming what's pillar-validated, what's subject to integration tension, and what's contingent on context) produces more honest framework evaluations and more defensible adoption decisions.

The frameworks worth adopting are the ones that can name their own contingencies. The frameworks worth avoiding are the ones that promise integrated benefits without naming the integration tensions.

Source: Fradelos, G. The Honey Badger Management Framework for Human-AI Hybrid Organizations: A Proxy Validation and Integration Analysis (Geneva, January 6, 2026). SSRN 6306679.
