Alice Lee (1858-1939) was a British statistician and mathematician whose rigorous application of correlation analysis dismantled one of Victorian science’s most pernicious pseudosciences: the belief that skull size determined intelligence and proved male intellectual superiority. Working in Karl Pearson’s Biometric Laboratory during an era when women were largely excluded from universities, paid at “women’s wages,” and credited as “assistants” for work requiring doctoral expertise, Lee developed statistical methods that remain foundational to modern biology, whilst becoming one of the first women to earn a doctorate from the University of London. Her 1901 dissertation did not merely present data – it staged a calculated intellectual confrontation, measuring the skulls of thirty-five distinguished male anatomists and naming them publicly, revealing that some of the field’s most celebrated minds possessed the smallest cranial capacities. Today, more than eight decades after her death, her story exemplifies both the ingenuity required to survive as a woman in STEM and the moral complexity of scientific practice under systems of oppression.
We meet in a modest study in North London on a grey November afternoon – a setting Alice Lee would have recognised, though the recording device and the year itself mark the profound gulf separating her world from ours. She arrives punctually, steady-eyed and direct. Over tea, we discuss the work that changed history, the courage required to refuse to recant under pressure from Francis Galton himself, and the uncomfortable truth that one can simultaneously challenge one form of pseudoscience whilst reinforcing another.
Miss Lee, thank you for agreeing to this conversation. I wonder if we might begin with a rather domestic question: what drew you to mathematics as a girl? Your education at Bedford College began in 1876, when very few institutions accepted female students at all.
A practical question, and I appreciate it. My father was reasonably well-educated – not a scholar, but someone who took pride in mathematical reasoning. I remember him showing me how to calculate compound interest in our household accounts. There was something deeply satisfying about the precision of it: the rightness of a correctly balanced sum, the impossibility of disagreeing with a properly worked equation. Numbers, I found, would not argue with me simply because I was female.
When I was old enough to attend school, mathematics became my refuge. It was objective in a way that literature or history was not. A proof either held or it did not. A calculation either succeeded or failed. There was no master or examiner who could dismiss my work as “too assertive for a young lady” because the mathematics itself had no gender.
Bedford College was extraordinary precisely because it took women’s education seriously. Not as ornament or accomplishment, but as rigorous intellectual training. My professors – particularly those in the advanced mathematics course beginning in 1879 – treated us as we deserved: as minds capable of genuine scholarship.
Yet your early career included work that wouldn’t appear in publications, or appeared under considerable constraint. You worked initially as an unpaid volunteer in Karl Pearson’s laboratory. How did that transition occur, and what was it like to perform sophisticated statistical work without formal recognition or compensation?
You are being generous with the word “transition.” There was no transition – there was simply need on one side and opportunity on the other. I taught at Bedford College for some thirty years, instructing young women in mathematics, physics, Greek, and Latin. The pay was meagre – determined largely by “the professor whom I was assisting,” which meant its level depended entirely on his whim.
When I first attended Karl’s lectures at University College London in 1895, I was genuinely there as a student. But he recognised I understood the material profoundly, and he began asking whether I might assist with calculations for his biometric research. Initially, I did so without payment. I would arrive at the laboratory on afternoons when I was free from teaching, and I would perform the necessary reductions of data from family measurement cards – the accumulation of heights, weights, arm spans, and the like.
The work itself was not mysterious, though it was exacting. What I did was reduce raw measurements into standardised form, compute correlation coefficients, construct histogram tables, and calculate distributions. By hand. Hundreds of coefficients, each one requiring careful arithmetic across multiple data points. It is labour-intensive, certainly, but it is not incomprehensible. Anyone with training in mathematics could do it. What made it valuable was that it had to be done with absolute precision – a single arithmetic error would propagate through the entire analysis.
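To make concrete the kind of coefficient Lee is describing, here is a minimal sketch of the product-moment correlation worked from first principles, much as a laboratory computer would have tabulated it column by column. The measurements and variable names are invented for illustration and are not drawn from her records.

```python
# Minimal sketch: the product-moment correlation coefficient computed
# from first principles, as it would have been worked by hand.
# The measurements below are hypothetical, for illustration only.

from math import sqrt

heights = [160.0, 165.5, 171.2, 158.3, 175.0, 168.4]    # cm (hypothetical)
arm_spans = [158.2, 166.0, 173.1, 157.0, 177.4, 169.0]  # cm (hypothetical)

n = len(heights)
mean_h = sum(heights) / n
mean_a = sum(arm_spans) / n

# Sums of squared deviations and cross-products -- the quantities a
# laboratory "computer" accumulated before forming the coefficient.
sxy = sum((h - mean_h) * (a - mean_a) for h, a in zip(heights, arm_spans))
sxx = sum((h - mean_h) ** 2 for h in heights)
syy = sum((a - mean_a) ** 2 for a in arm_spans)

r = sxy / sqrt(sxx * syy)
print(f"Pearson correlation r = {r:.3f}")
```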
Karl eventually persuaded me to accept compensation. Ninety pounds per annum for three days’ labour per week. That figure still amuses me bitterly. At the same time, the female typists employed by the university earned slightly more. Their work required no doctoral knowledge, no understanding of statistical theory, yet they commanded higher wages. The logic was transparent: my labour was undervalued because I was female, and because I could be convinced to accept the undervaluation in exchange for access to interesting intellectual work.
I will be candid: I resented it deeply. But I also recognised that without this arrangement, I would have been trapped entirely in teaching schoolgirls their multiplication tables. The laboratory offered intellectual engagement I could not refuse, even at insulting wages.
When did you begin to feel that your work warranted independent publication and recognition?
That is complicated. In truth, I felt it rather earlier than I acknowledged it publicly. By 1897 and 1898, I had written several independent studies on correlation, on statistical variation within populations, and on methods for estimating internal measurements from external ones. I had genuine ideas – novel approaches to problems that the existing literature did not adequately address.
But “feeling” one’s work is significant and “being permitted” to claim significance are very different things. When Karl suggested adding my name to collaborative papers, I declined. Not from false modesty, though people frequently interpreted it that way. I declined because I had internalised a belief I now recognise as entirely unfounded: that what I had done was “only the arithmetic,” as if arithmetic were somehow separate from real mathematics. As if the computational labour were a lower form of work, less deserving of recognition than theory.
It took considerable time to understand that developing novel statistical methods – which is what my work on cranial capacity formulas constituted – is not “only arithmetic.” It is mathematics. It is original contribution. But the language available to me cast it differently. I was a “computer.” A “laboratory secretary.” An “assistant.”
The dissertation work changed that, though not immediately and not without tremendous struggle.
Let us turn to that dissertation. You submitted your work in 1899, and it was not approved until 1901 – more than two years of delay. The examiners claimed your work was derivative of Professor Pearson’s research and contained no original contribution. What was actually happening during those two years, and how did you understand it at the time?
What was “actually” happening was obstruction. Pure obstruction. The delay was not about academic rigour or genuine questions concerning my methodology. It was retaliation for results that contradicted the examiners’ most fundamental beliefs about the natural order.
I had demonstrated – with mathematical precision and careful experimental design – that skull size bore no correlation whatsoever with intellectual ability. That was heresy to men who had built their careers, their reputations, and their sense of superiority on the opposite claim. The dissertation was a threat, not an academic curiosity.
The charge of derivativeness was contemptible. Yes, I worked in Karl’s laboratory. Yes, I used some of his statistical methods. But developing formulas to estimate cranial capacity from external measurements – length, breadth, height, and the cephalic index – was my own work. I had to solve novel mathematical problems. How does one infer an internal three-dimensional volume from external linear measurements? No one had previously developed a precise method. I did.
My formulas were tested, refined, validated against direct cranial measurements. That is original work. It was not “applying Pearson’s correlation analysis” – though I did employ correlation analysis as one tool among several. I was solving a specific biological problem using mathematical rigour.
The examiners’ obstruction was particularly evident in their insistence that my study lacked originality specifically because it contradicted prevailing views. The logic was circular: if my findings aligned with existing orthodoxy, I would be derivative; because my findings opposed orthodoxy, I must have misappropriated Pearson’s work to create spurious results. There was no intellectual position from which I could defend myself.
And then Francis Galton became involved.
Yes. Galton was the principal obstacle at that point. He was already elderly – his power was somewhat diminished from its peak – but his authority remained immense. He insisted my work was scientifically invalid and that I had failed to demonstrate original contribution. He also held a particular conviction: that skull size did determine intellectual capacity, that his own measurements had proven it, and that any study contradicting that conclusion must contain methodological error or dishonesty.
We met. He requested an interview, which I understood immediately was not a conversation but a confrontation. He was attempting to pressure me into recanting my findings, into acknowledging that my measurements and calculations were flawed, that I had been mistaken. He believed that once I admitted error, the dissertation would be withdrawn, the problem would dissolve, and his worldview would remain intact.
I did not recant.
How did you hold firm in that moment? What gave you the confidence – or perhaps the stubbornness – to refuse?
I had measured the skulls myself. I had performed the calculations myself. I knew the data. I understood my methodology. I was certain, as certain as one can be in science, that my work was sound. One does not abandon empirical findings because a powerful man finds them inconvenient.
I will not pretend it was easy. Galton was formidable, and I was acutely aware of his power to determine my future. A negative assessment from Galton would have ended my academic career utterly. I would have been marked as fraudulent or intellectually incompetent – a woman who had presumed beyond her capacity. But I also knew that if I recanted, I would be complicit in the perpetuation of pseudoscience. I would be using my mathematical abilities to shore up nonsense. That was unacceptable to me.
The stubbornness was partly temperament, I suppose. I have never accepted authority simply because someone possessed it. But it was also rooted in something more fundamental: I could not live with the contradiction of my own knowledge. I knew what my measurements showed. To deny that knowledge for social survival would have been a form of internal betrayal.
Karl’s intervention was crucial. His standing was such that when he wrote to Galton affirming the quality and originality of my work, it created enough doubt in the examiners’ minds to proceed with approval. But I should emphasise – that approval came only through male advocacy. A man had to vouch for my work before the all-male examining committee could permit a woman to claim it. That is the system as it functioned. I am grateful to Karl for his support, and I remain angry that such support was necessary.
Now I want to move into the technical depth of your work. Your formulas for estimating cranial capacity from external measurements were genuinely innovative. Can you walk us through the problem you were solving and how you approached it?
The problem was this: anthropologists and anatomists could measure a living person’s skull with callipers – straightforward linear measurements of length, breadth, and height taken from external anatomy, from which ratios such as the cephalic index are derived. But cranial capacity – the internal volume of the skull – requires either opening the skull or using water-displacement methods on excised specimens. These destructive or invasive approaches are impossible on living subjects.
For my research, I needed to infer cranial capacity in living individuals without direct measurement. The question was whether external measurements could predict internal volume with sufficient accuracy.
Previous researchers had attempted this using crude regression – essentially assuming a linear relationship between, say, skull length and capacity. But the human skull is a complex three-dimensional structure. Its volume does not scale linearly with length or breadth alone.
I approached the problem differently. I collected a large sample – I measured heads at the Anatomical Society meeting in Dublin in 1898, where I obtained thirty-five male anatomists’ measurements. I also obtained measurements from female students at Bedford College and male faculty at University College London. For those individuals for whom direct cranial capacity measurements existed from medical examinations or the anatomical literature, I could create a training dataset.
I then developed a multivariate formula incorporating skull length, breadth, height, and the cephalic index – not as independent variables but in combination, accounting for their covariance. The formula was not linear. I used polynomial terms and interaction effects. The mathematics required solving a system of linear equations to determine the coefficients that minimised error across the training data.
The result was a formula of the form:
V = a + b₁L + b₂B + b₃H + b₄(LB) + b₅(LH) + b₆(BH) + …
where V is estimated capacity, L is length, B is breadth, H is height, and the coefficients were determined through my calculations.
This is essentially multiple regression with interaction terms. But you were computing this in the 1890s – before electronic calculating machines.
By hand. With paper, pencil, and logarithm tables. The computational demands were substantial. To determine each coefficient, I needed to compute cross-products of all measurement pairs across dozens of individuals, sum them, construct the normal equations, and solve the system. A single error in arithmetic would cascade through the entire analysis.
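For readers who want the structure of that computation in modern notation, the sketch below sets up the capacity formula as a least-squares problem and solves the normal equations directly. The skull measurements, the noise level, and the rough proportionality used to generate the synthetic capacities are all invented; they do not reproduce Lee’s data or coefficients.

```python
# Sketch: the least-squares problem behind a formula of the form
# V = a + b1*L + b2*B + b3*H + b4*(L*B) + b5*(L*H) + b6*(B*H),
# solved through the normal equations (X^T X) beta = X^T y.
# All data are synthetic, for illustration only.

import numpy as np

rng = np.random.default_rng(0)
n = 40
L = rng.normal(190, 6, n)     # skull length, mm (synthetic)
B = rng.normal(150, 5, n)     # skull breadth, mm (synthetic)
H = rng.normal(132, 5, n)     # skull height, mm (synthetic)
# Synthetic "known" capacities in cm^3, roughly proportional to L*B*H
# plus measurement noise; not Lee's actual data or coefficients.
V = 0.00037 * L * B * H + rng.normal(0, 20, n)

# Work in deviations from the means; this keeps the normal equations
# well-conditioned (and keeps the arithmetic manageable by hand).
l, b, h = L - L.mean(), B - B.mean(), H - H.mean()

# Design matrix: intercept, main effects, pairwise interaction terms.
X = np.column_stack([np.ones(n), l, b, h, l * b, l * h, b * h])

# The normal equations: sums of squares and cross-products of the
# predictors, and their cross-products with capacity.
XtX = X.T @ X
Xty = X.T @ V
beta = np.linalg.solve(XtX, Xty)

rms_error = np.sqrt(np.mean((V - X @ beta) ** 2))
print("coefficients:", np.round(beta, 3))
print("RMS error of fitted capacities (cm^3):", round(float(rms_error), 1))
```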
This is why the claim that my work was “derivative of Pearson’s” – merely applying his existing methods – was so infuriating. Pearson had developed correlation analysis theoretically. Implementing it on actual data, on a problem requiring novel mathematical structuring, was entirely different work. His theoretical frameworks provided tools, yes. But I had to solve specific problems they did not directly address.
The formula I developed had practical utility. It allowed anthropologists to estimate cranial capacity in living populations without direct measurement. The error margins were acceptably small – typically within ±5 to 8 percent of direct measurements. For population-level analysis, that precision was sufficient.
And when you applied this formula to your sample of anatomists – comparing their measured skull sizes to their claimed intellectual eminence – what did you find?
The correlation between cranial capacity and intellectual standing, assessed by current scientific reputation, was negligible. Statistically insignificant. I ranked all individuals by estimated cranial capacity and compared that ranking to professional reputation. There was no pattern whatsoever.
Some of the most renowned anatomists – men whose work I genuinely respected – had small to average cranial capacities. Conversely, some individuals with very large skulls held unremarkable positions in the field. The variation within the male sample was considerable – comparable to the variation between male and female groups. Some women in my sample possessed larger cranial capacities than some of the anatomists.
Most provocatively, I named individuals. I ranked them by skull size and gave their names. One of my thesis examiners, J. Kollmann – described in the literature as “one of the ablest living anthropologists” – possessed the smallest cranial capacity in my entire sample. I published that finding. Publicly. With his name attached.
That was not accidental. It was deliberate. The examiners who claimed skull size measured intellectual capacity were forced to confront the fact that their own skulls did not support their theory. It was not politic. But it was precise.
That was rather audacious, was it not? Publishing those identifications must have generated considerable backlash.
The audacity was calculated. I was not motivated by personal animosity toward these men, though I will not pretend I felt tenderness toward a system that had excluded me and dismissed my abilities. I named them because anonymity would have permitted the findings to be dismissed as abstractions. By naming them, I forced the scientific community to confront a direct challenge to their self-image.
If I had said, “In a sample of thirty-five anatomists, cranial capacity did not correlate with reputation,” the response would have been dismissal: poor methodology, unusual sample, and so forth. But saying, “J. Kollmann, whom you regard as exceptionally able, has the smallest skull in this sample,” creates a different problem. It makes the contradiction personal and undeniable.
Yes, there was backlash. Some accused me of inappropriate conduct, of lacking discretion, of using science as a vehicle for feminist polemic. But my work was rigorous. The measurements were accurate. The analysis was sound. They could not dismiss it on scientific grounds, so they dismissed it on grounds of propriety. Which is precisely what one expects from people whose authority is being questioned.
I want to ask you about a more difficult aspect of your work. You also used craniometry to argue for biological differences between racial groups. You worked with skulls from Northern Africa to develop these taxonomies. How do you now regard that work?
This is where I must be honest about my own failures. And it is genuinely difficult to acknowledge.
I demonstrated, with precision and rigour, that skull size did not determine intelligence or capability within human populations – that the entire edifice of gender-based craniometry rested on pseudoscience. I was correct about that. The logic of my argument was sound. The data supported it.
But I did not extend that logic consistently. When it came to racial differentiation, I accepted premises I should have interrogated with equal rigour. I believed that skull measurements could distinguish “racial types” – that cranial morphology varied meaningfully between populations in ways that justified hierarchical classification. I used those measurements to construct categories, to define “race” itself as a biological entity.
I now recognise this was complicity with imperial pseudoscience. It mattered that those skulls came from colonised populations. It mattered that I was using measurement to impose categories that justified domination and resource extraction. I should have applied the same scepticism to those measurements that I applied to gender-based craniometry. I did not.
Why? Partly because racial hierarchies were so embedded in the scientific consensus of my time that they seemed self-evident – established fact rather than contested claim. Partly because challenging gender orthodoxy was already extraordinarily difficult; challenging racial science simultaneously would have been impossible within that context. Partly, I suspect, because I benefited from racial hierarchies in ways I did not benefit from gender hierarchies, and the human capacity for moral blindness regarding one’s own complicity is profound.
This is not an excuse. It is an explanation. I failed to extend rigorous scepticism universally. That failure has consequences I cannot undo. It means my legacy is compromised – that I both advanced knowledge and perpetuated harm. That tension is real, and I do not think it should be smoothed away.
How do you regard the fact that your work on craniometry is now understood primarily through that critical lens – that it was wrong, that it harmed people?
With complicated feelings. The dismantling of gender-based craniometry was genuinely important. It freed women from one particular form of biological determinism used to justify exclusion from education, professional work, and intellectual life. That remains significant. Girls could attend university because my work, among others, contributed to delegitimising pseudoscientific claims of women’s intellectual inferiority.
But that liberation was purchased partly through my participation in racial pseudoscience. Women’s equality was advanced by deepening the subjugation of colonised peoples. That is a genuinely tragic contradiction, and I am not comfortable pretending otherwise.
I think the responsibility is this: my work should be understood in full complexity. Not as unmixed heroism, but as contribution that contained both genuine insight and moral failure. Historians should acknowledge what I got right about gender and statistics, whilst interrogating where I failed. That seems more honest than either celebration or dismissal.
Let us turn to your other major contributions – the work in biometry more broadly. You contributed to some twenty-six collaborative publications. What was the nature of that work, and how did you approach problems in evolutionary biology and population genetics?
The fundamental question of biometry was this: how do we describe and analyse biological variation within and between populations? Darwin had demonstrated that species change over time, that evolution occurs. But the mechanisms of heredity and variation remained poorly understood.
Biometry proposed that statistical methods could illuminate these mechanisms. If one could measure variation in traits across a population, correlate traits with heredity, account for environmental effects, one could begin to understand how evolution actually operated on populations.
Much of my work involved developing methods to handle that quantification. I computed chi-squared distributions for testing whether observed frequencies of traits departed significantly from expected frequencies – testing hypotheses about inheritance patterns, for instance. I calculated correlation coefficients between parents and offspring to estimate heritability of traits.
One particularly interesting problem concerned variation in finger-ridge patterns – fingerprints, essentially. Different populations showed different average patterns, and fingerprints appeared relatively stable within individuals across time. This suggested they might be useful in forensic identification and also in understanding population differentiation. I developed statistical methods to characterise and compare fingerprint patterns across large populations.
The computational labour was immense. Imagine measuring hundreds of individuals on dozens of traits, then computing all pairwise correlations by hand. The sheer volume of arithmetic was staggering. But it was necessary. One could not understand population variation without quantifying it.
That work was genuine contribution to evolutionary biology. The statistical methods I helped develop – or in some cases developed independently – became standard practice in the field. Modern genetics, epidemiology, any field involving population-level analysis, uses correlation coefficients and chi-squared tests I was computing in the 1890s.
During the First World War, your work shifted toward military applications – shell trajectories and ballistics calculations for the Munitions Inventions Department.
Yes. When war began, there was an enormous demand for rapid, accurate calculations supporting weapons development and ammunition production. The military required mathematicians. Despite my age – I was in my mid-fifties – I was called upon to perform ballistic calculations.
This was different from population biology, but not fundamentally different in character. The mathematics of projectile motion was well understood – Newtonian mechanics, essentially. But applying those equations to actual shells, accounting for air resistance, wind conditions, the specific properties of different munitions, required elaborate calculation.
I computed shell trajectories under varying conditions, determined the influence of environmental factors on accuracy, developed tables that gunners and ammunition manufacturers could use to optimise performance. It was applied mathematics in a very direct sense.
I do not harbour particular pride in this work. It was genuinely useful during wartime, and I do not regret contributing to the defence of my country. But it was instrumental in nature – calculation in service of military objectives, not advancement of knowledge for its own sake.
After the war, your academic career wound down. You retired from Bedford College in 1916. What was that transition like?
Abrupt and economically precarious. I had no pension. The pension scheme at Bedford had been established too late for me to participate – I had already completed most of my service by the time it was implemented. I retired with whatever meagre savings I had accumulated from three decades of underpaid labour.
I was nearly sixty years old and essentially impoverished. I continued living modestly, performing some consulting work in statistics and biometry, but there was no institutional framework supporting me. No retirement income, no healthcare provision, no security.
It was Karl and Margaret Tuke, the principal of Bedford College, who intervened. In 1923 – seven years after my retirement – they successfully petitioned the Home Office for a Civil List pension. I received seventy pounds annually. It was charity, framed as recognition of my “services to science,” but it was charity nonetheless.
That was the reality of women’s careers in academia. Decades of intellectual contribution, and at the end, dependence on the discretionary largesse of powerful men. If Karl had not advocated for me, I would have faced genuine destitution.
Do you have any advice for women entering STEM today, given what you witnessed and experienced?
Several things. First, your work is only as secure as the systems protecting it. Document everything. Publish your own work independently. Do not rely on male supervisors or collaborators to advocate for you – their support is valuable, certainly, but it is unstable. Your own publications, your own name on research, cannot be erased by someone’s changed loyalties or changed mind.
Second, do not accept the frame that your labour is “assistance.” If you are solving novel problems, if you are developing methods, if you are creating knowledge, you are doing mathematics or science. The language available to describe women’s intellectual work is often diminishing – “computer,” “helper,” “secretary.” Resist that language. Insist on accurate description of what you are actually doing.
Third – and this is perhaps the hardest – try to maintain moral consistency in your scepticism. If you are willing to challenge pseudoscience in one domain, ask yourself why you might be accepting it uncritically in another. I failed to do this sufficiently, and it is a failure I must live with. Intellectual rigour should extend across your entire worldview, not just to the domains where you are personally invested.
And finally: the work itself matters. Despite the financial precarity, despite the obstruction and the underpayment and the erasure, the intellectual engagement was real. The problems I solved remain solved. The methods I developed continue to be used. That persistence of knowledge – the fact that my calculations and formulas continue to serve science long after I am gone – is genuinely meaningful. Do not abandon that meaning for comfort. But also do not accept poverty as the necessary price of intellectual integrity. Demand better. Your work is worth compensation. Insist on it.
That is generous counsel, and I suspect much of it will resonate with women in STEM today. Before we conclude, I wonder whether there is anything about your life or work that you feel has been misrepresented by history.
Yes. The notion that my skull-measurement work simply applied Pearson’s correlation analysis – that I was merely a “computer” executing his theoretical framework. My work required developing novel mathematical approaches to a specific biological problem. I did not simply apply existing methods; I extended and adapted them. That distinction matters.
Also, the idea that I declined co-authorship out of false modesty. I was not being self-effacing in some charming, feminine way. I was internalising a system of devaluation that told me my work did not count as genuine intellectual contribution. That is not modesty; it is injury. I now wish I had claimed co-authorship more forcefully. That would have been more accurate to what I had actually accomplished.
And I want to correct something about my 1898 meeting with the Anatomical Society in Dublin. It was not an “ambush,” as some have characterised it. I did not arrive with callipers intending to humiliate these men. I arrived with scientific apparatus and professional intent: to gather data necessary for my research. The fact that the findings contradicted their worldview was not my fault. I was not responsible for their discomfort at confronting evidence that their own skulls did not validate their theories about intellectual hierarchy.
The framing of it as dramatic confrontation – as though I were primarily motivated by revenge or the desire to embarrass them – diminishes what was actually happening: rigorous scientific measurement and analysis producing results that contradicted prevailing orthodoxy. That is how science should work. It should not require dramatic motivation or personal animosity. The data speak for themselves.
A fitting conclusion, I think. Miss Lee, thank you profoundly for this conversation.
Thank you for asking questions that permit me to represent myself rather than permitting misrepresentation to stand. That is rarer than one might hope.
Letters and emails
The conversation with Alice Lee generated considerable response from our readership – scientists, historians, educators, and students writing from across the globe with further questions about her life, her methods, and her legacy. The six letters and emails presented below represent a cross-section of that correspondence, selected for the particular insight they offer into dimensions of Lee’s work that merit further exploration.
These questions come from practitioners and thinkers engaged with statistical methods, evolutionary biology, the history of women in science, medical anthropology, bioethics, and the consequences of scientific racism. They ask about technical choices and their consequences, about the intellectual paths Lee was unable to pursue due to computational constraint, about the gap between her willingness to challenge gender-based pseudoscience and her acceptance of racial hierarchies, about the relationship between theoretical and applied mathematics, and about her complicity in eugenic theory and the harm that complicity enabled.
Notable among them is the final question from Dr. Paulo Meirelles, a medical anthropologist and bioethicist from Rio de Janeiro, which moves beyond academic inquiry into moral reckoning. He writes not as a detached scholar but as someone whose ancestors lived under the consequences of the eugenic science Lee helped legitimise – forced sterilisations, population control programmes, racial classification systems that marked entire communities as “unfit.” His question demands honesty about whether Lee recognised the harm her work enabled at the time she was enabling it, or whether she allowed herself not to see.
In responding to these inquiries, Alice Lee addresses not only the specific queries but also reflects on what remained unanswered in her era, what she now wishes she had interrogated more rigorously, and what she might counsel those beginning careers in fields she helped establish. Her answers reveal both the particularity of her historical moment and the enduring questions that traverse the boundary between past and present – questions about scientific responsibility, about the uses of quantitative methods in service of oppression, and about the moral weight of intellectual complicity.
Thuy Nguyen, 34, Data Scientist, Ho Chi Minh City, Vietnam
Your multivariate formulas for estimating cranial capacity required you to account for interaction terms between measurements – essentially recognising that skull dimensions don’t scale independently. Modern machine learning would handle this through neural networks without explicit formula construction. Do you think your hand-calculated approach forced you to understand the underlying biological relationships in ways that algorithmic approaches might obscure? And conversely, what limitations did working without computational tools impose on the complexity of models you could test?
Miss Nguyen, your question reaches toward something I have thought about considerably, though I confess the terminology you employ – “neural networks,” “algorithmic approaches” – is unfamiliar to me. I take your meaning, however: you are asking whether the labour of constructing formulas by hand, of explicitly defining mathematical relationships between variables, conferred understanding that automated calculation might bypass.
The answer is emphatically yes, though perhaps not in the way you anticipate.
When one must solve a system of equations by hand – when one must physically compute each cross-product, each sum of squares, each coefficient through painstaking arithmetic – one develops an intimate acquaintance with the structure of the data itself. You cannot help but notice patterns, anomalies, the behaviour of variables under different conditions. If a particular interaction term contributes negligibly to the overall model, you discover this not through software reporting a p-value but through observing that its coefficient remains near zero across multiple recalculations, that removing it scarcely affects your predictions.
This enforced intimacy with the data is pedagogically valuable. I could not proceed mechanically. I had to understand why a particular combination of skull length and breadth predicted capacity more accurately than either measurement alone. The reason, biologically, is that skulls are three-dimensional structures whose volume depends on the interaction of all three dimensions. A long, narrow skull may have similar capacity to a shorter, broader one. One cannot treat these measurements as independent contributors to volume – they must be considered jointly.
Working by hand forced me to recognise this. I attempted simpler models first – linear relationships, single-variable regressions – and observed their failures directly. The residuals were large and patterned, not randomly distributed. That told me the model was mis-specified. I then hypothesised interaction effects, added polynomial terms, recalculated, and observed the residuals shrink. The improvement was tangible, immediate, undeniable.
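The pattern Lee describes – residuals that track an omitted interaction term and then vanish once that term is included – can be reproduced in a toy form. The sketch below uses standardised synthetic variables with an invented interaction strength; it illustrates the diagnostic, not her dataset.

```python
# Toy sketch: patterned residuals from a mis-specified linear model, and
# their disappearance once an interaction term is included. The variables
# are standardised and synthetic; the interaction strength is invented.

import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
# The "true" relationship includes an interaction between x1 and x2.
y = 3.0 * x1 + 2.0 * x2 + 2.0 * (x1 * x2) + rng.normal(0, 0.5, n)

def fit_and_residuals(X, y):
    """Least-squares fit; return the residual vector."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

ones = np.ones(n)
res_linear = fit_and_residuals(np.column_stack([ones, x1, x2]), y)
res_interact = fit_and_residuals(np.column_stack([ones, x1, x2, x1 * x2]), y)

# The linear model's residuals are not random noise: they track the omitted
# product term, which is how mis-specification reveals itself.
print("corr(residuals, x1*x2), linear model:     ",
      round(float(np.corrcoef(res_linear, x1 * x2)[0, 1]), 2))
print("corr(residuals, x1*x2), with interaction: ",
      round(float(np.corrcoef(res_interact, x1 * x2)[0, 1]), 2))
print("RMS residual, linear:     ", round(float(np.sqrt(np.mean(res_linear ** 2))), 2))
print("RMS residual, interaction:", round(float(np.sqrt(np.mean(res_interact ** 2))), 2))
```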
Had I possessed a machine that could test thousands of model specifications rapidly, I suspect I might have selected the best-performing model without fully comprehending why it performed well. The intellectual process would collapse into optimisation rather than understanding. This is not inherently problematic if one’s goal is prediction rather than explanation, but in scientific work, understanding mechanism matters profoundly.
However – and this is crucial – the constraints imposed by hand calculation were not uniformly beneficial. They were also profoundly limiting in ways that constrained the scope of inquiry.
Consider the problem of testing multiple competing hypotheses about inheritance patterns. Suppose one wishes to determine whether a particular trait follows Mendelian ratios, whether it shows continuous variation suggesting polygenic inheritance, or whether environmental factors obscure genetic patterns. Each hypothesis implies different expected frequencies and correlations. Testing them rigorously requires computing chi-squared statistics across numerous trait combinations, family structures, and population subgroups.
The computational burden becomes prohibitive very quickly. I could test perhaps three or four competing models if I devoted weeks to the arithmetic. But a truly thorough analysis might require examining dozens of models. The labour involved made such explorations impossible. So one was forced to make a priori judgments about which models seemed most plausible, then test only those. This introduces bias. The models we tested were the models we already suspected might be correct.
Similarly, my cranial capacity formulas were developed on a sample of perhaps 150 to 200 individuals for whom I had both external measurements and known capacities. A larger sample would have permitted more precise coefficient estimation, detection of subtler interaction effects, identification of population-specific variation. But measuring and calculating across thousands of individuals was simply unthinkable given the available resources and time.
This is the paradox you have identified, Miss Nguyen: hand calculation enforces understanding but limits scope. Automated calculation expands scope but risks obscuring mechanism. The ideal, I suppose, would be education that cultivates deep understanding of mathematical relationships first, through manual work, and then deploys computational tools to extend analysis beyond what human labour permits. One should understand interaction effects viscerally before employing machines to test them exhaustively.
I will add one further observation. The discipline required to perform calculations accurately by hand – the necessity of checking one’s work repeatedly, of maintaining organised records, of documenting every step so errors can be traced – instilled rigour that I believe served me well. Errors were costly. A mistake made early in a calculation chain would propagate through weeks of subsequent work. This made me extraordinarily careful. I suspect that when calculation becomes trivially easy, the temptation to proceed carelessly increases. One runs analyses without fully considering their appropriateness, interprets results without scrutinising underlying assumptions.
So whilst I would have welcomed computational assistance for the sheer reduction in drudgery – and I assure you, Miss Nguyen, computing hundreds of correlation coefficients by hand is tedious beyond description – I would not wish to have bypassed entirely the intellectual formation that manual calculation provided. It taught me not merely how to calculate, but what the calculations meant.
Joaquín Roldán, 52, Evolutionary Biologist, São Paulo, Brazil
You acknowledged that you failed to apply rigorous scepticism to racial craniometry whilst successfully dismantling gender-based craniometry. That’s honest, but I’m curious about something more specific: were there moments during your research where you noticed contradictions in the racial data – measurements that didn’t fit the hierarchies you expected – but you didn’t pursue them? Or was it more that you simply didn’t interrogate racial categories with the same critical eye? What would it have taken, intellectually or institutionally, for you to have challenged both simultaneously?
Mr. Roldán, your question is the most difficult I have been asked, and I am grateful for it. You are asking me to examine not merely what I believed, but how I managed to hold contradictory standards of evidence simultaneously – and whether I noticed the contradiction at the time.
The honest answer is that I did notice anomalies in the racial data, though I did not pursue them with anything resembling the rigour I applied to questions of sex difference. Let me be specific about what I observed and what I chose not to investigate further.
When measuring skulls from different populations – specimens obtained from archaeological collections, primarily from North Africa and Egypt – I observed considerable variation within each supposed racial group. The range of cranial capacities among individuals classified as belonging to the same “race” was often as large as the differences in mean capacities between groups. This is precisely the pattern I had observed when comparing male and female populations: substantial overlap, with group means differing only modestly relative to within-group variation.
Had I applied the same analytical framework I used for sex differences, I would have concluded that “race” as defined by cranial measurements was a weak classifier at best – that individual variation exceeded categorical differences. But I did not draw that conclusion. Instead, I attributed the within-group variation to methodological imprecision, to admixture between populations, or to the inherent difficulty of obtaining pure specimens. I made excuses for the data rather than allowing it to challenge the categories.
Why? I can offer explanations, though they are not justifications.
First, the intellectual investment required to challenge racial science was vastly greater than that required to challenge claims about women’s intellectual inferiority. Sex difference in craniology was a relatively contained claim: men’s larger heads proved their mental superiority. I could test that hypothesis directly by measuring male and female skulls, calculating correlations between capacity and intellectual achievement, and demonstrating the absence of relationship. The argument was bounded and falsifiable.
Racial science, by contrast, was an entire edifice. It was not merely about skull measurements – it encompassed theories of human origins, of civilisational development, of imperial governance, of biblical interpretation. To challenge cranial differences between races would require dismantling assumptions embedded in anthropology, history, theology, political economy. The scope of that challenge was overwhelming. I would have needed to become an expert in fields far beyond my competence, to engage with literatures I had not mastered, to confront scholars whose authority in those domains dwarfed my own.
I convinced myself that my expertise was statistics and that the interpretation of racial categories belonged to anthropologists. This was intellectually cowardly. The entire point of biometry was to bring statistical rigour to biological questions. I could have insisted that if anthropologists claimed “race” was a meaningful biological category, they must demonstrate it statistically. I did not.
Second – and this is more uncomfortable to acknowledge – I benefited from racial hierarchies in ways I did not benefit from sex-based hierarchies. As a British woman of comfortable background, I was positioned within imperial structures that extracted resources and labour from colonised populations. My access to education, to leisure time for intellectual work, to the very institutions where I studied, rested on that extraction. Challenging racial hierarchies would have required confronting my own complicity in a system from which I materially benefited.
Challenging sex-based hierarchies, by contrast, offered me potential gain. If women’s intellectual capacities were recognised as equal to men’s, I would personally benefit through expanded opportunities, fairer compensation, professional recognition. The incentives were aligned with rigorous inquiry.
This does not mean my work on sex differences was mercenary or insincere – I genuinely believed the science supported equality. But it does mean that my willingness to interrogate gender-based pseudoscience whilst accepting racial pseudoscience reflected self-interest as much as principle.
Third, there was institutional constraint. The scientific community in which I worked was uniformly committed to racial hierarchy. Karl Pearson, Francis Galton, the entire eugenic movement – they took racial difference as foundational. To challenge that consensus would have been professional suicide. I was already marginalised as a woman. To become further isolated as a critic of racial science would have meant complete exclusion from the only intellectual community available to me.
Again, this is explanation, not justification. But the question of what it would have taken for me to challenge both simultaneously – your exact phrasing – is answerable: it would have required institutional support I did not possess, intellectual resources I had not cultivated, and moral courage I demonstrably lacked.
What troubles me most, looking back, is that I knew the analytical tools were applicable. I knew that demonstrating correlation – or its absence – between skull size and any purported outcome required the same statistical methods regardless of whether one was examining sex or race. I knew that within-group variation exceeding between-group variation undermined categorical claims. I possessed the methodological sophistication to perform the analysis.
I chose not to. That choice was not passive or inadvertent. It was active. I directed my analytical attention toward one hierarchy and away from another. I told myself this was appropriate specialisation, focusing on questions within my domain of competence. But in truth, it was selective scepticism in service of selective solidarity.
If I am to offer anything useful to those who come after me, it is this: examine where you apply rigour and where you accept received wisdom uncritically. Ask yourself whether the distribution of your scepticism reflects evidence or convenience. Ask whether the hierarchies you challenge are those that constrain you, whilst the hierarchies you uphold are those from which you benefit. That pattern – challenging oppression that affects oneself whilst ignoring oppression that advantages oneself – is not unique to me. It is characteristic of reformist movements that seek incremental improvement within unjust systems rather than transformation of the systems themselves.
I failed in that regard. I dismantled one pillar of pseudoscience whilst reinforcing another. That failure is part of my legacy, and it should be remembered alongside whatever contributions I made. Integrity requires both be acknowledged.
Salma Bakhit, 41, Biostatistician, Cairo, Egypt
You mentioned developing chi-squared distribution tables for testing inheritance patterns in populations. How did you decide what constituted a meaningful departure from expected frequencies – what threshold made a result scientifically significant rather than due to chance? I ask because modern p-value conventions (0.05, 0.01) are now understood as somewhat arbitrary. Did you have comparable decision rules in the 1890s, and if so, how did you justify them to colleagues who might have disagreed about where to draw the line?
Dr. Bakhit, your question exposes a fascinating tension in late Victorian and Edwardian statistical practice: we possessed increasingly sophisticated mathematical tools for calculating probabilities and distributions, yet we had no universally agreed-upon framework for deciding when a result warranted belief rather than dismissal as chance variation.
The short answer is that we did have decision rules, but they were far less standardised than your modern conventions, and they varied considerably depending on the investigator, the field of inquiry, and the nature of the claim being tested.
Karl Pearson developed the chi-squared test specifically to address this problem. When testing whether observed frequencies in a dataset departed meaningfully from expected frequencies – whether, for instance, the ratio of traits in offspring matched Mendelian predictions or whether fingerprint patterns differed between populations – one needed a quantitative measure of departure. The chi-squared statistic provided that measure. But the question then became: how large must chi-squared be before we conclude the departure is real rather than attributable to sampling variation?
Pearson constructed tables showing the probability distribution of chi-squared values under the assumption that no true difference existed – what you would call the null hypothesis. If the calculated chi-squared exceeded a certain threshold corresponding to a low probability under the null, one could reasonably conclude that the null hypothesis was false and that a genuine effect existed.
But what threshold? Pearson himself favoured what he termed the “one in twenty” standard – roughly equivalent to your 0.05 convention. If the probability of observing a chi-squared value as large as the one calculated was less than one in twenty, assuming no true effect, he considered that sufficient grounds to reject the null hypothesis. This was not derived from mathematical necessity but from pragmatic judgment: one in twenty seemed stringent enough to avoid excessive false conclusions whilst permissive enough to detect real effects of modest size.
However, this standard was not universally adopted, and many investigators used different thresholds depending on context. For claims that contradicted established theory or that had substantial practical implications, some demanded more stringent evidence – perhaps one in one hundred, or even one in one thousand. The logic was that extraordinary claims required extraordinary evidence, a phrase I believe derives from earlier philosophical writing though I cannot recall the precise attribution.
Conversely, for exploratory work or preliminary findings, investigators might accept less stringent thresholds. If one were surveying a large number of potential correlations to identify promising areas for further study, one might provisionally accept results meeting a one in ten standard, understanding that subsequent investigation would apply more rigorous criteria.
This variability created genuine problems in practice. Different investigators could examine the same data and reach opposite conclusions depending on the evidentiary standard they employed. I recall disputes in the pages of Biometrika where one researcher claimed to have demonstrated a significant correlation whilst another, reanalysing the same data with a more stringent threshold, insisted no such correlation existed. Both calculations were arithmetically correct – the disagreement was philosophical, concerning how much evidence sufficed for belief.
My own practice was to report the actual probability associated with the test statistic rather than simply declaring a result “significant” or “non-significant.” I would state, for instance, that the chi-squared value corresponded to a probability of 0.03 under the null hypothesis, allowing readers to apply their own judgment about whether that constituted sufficient evidence. This approach avoided imposing my threshold on others, though it also made my conclusions less definitive.
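As an illustration of the reporting practice she describes, the sketch below computes a chi-squared goodness-of-fit statistic for hypothetical offspring counts against a 3:1 Mendelian ratio and prints the associated probability rather than a bare verdict. The counts are invented for the example.

```python
# Sketch: chi-squared goodness-of-fit test of hypothetical offspring counts
# against the 3:1 ratio predicted by simple Mendelian inheritance.
# The counts are invented for illustration.

from scipy.stats import chisquare

observed = [705, 224]                  # e.g. dominant vs recessive phenotype
total = sum(observed)
expected = [total * 3 / 4, total * 1 / 4]

chi2, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-squared = {chi2:.2f}, p = {p_value:.3f}")
# Reporting the probability itself, rather than a verdict of "significant",
# leaves the choice of evidential threshold (one in twenty, one in a
# hundred, ...) to the reader.
```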
In my dissertation work on cranial capacity and intellectual ability, I employed relatively stringent standards because I was making a controversial claim – that no correlation existed between skull size and mental capacity. The burden of proof was substantial. I needed to demonstrate not merely that my sample showed weak correlation, but that the correlation was so close to zero, and the sample size sufficiently large, that any observed deviation from perfect zero correlation could plausibly be attributed to measurement error and sampling variation alone.
I calculated correlation coefficients and their probable errors – essentially what you would call confidence intervals, though we did not use that terminology. I showed that the observed correlation between cranial capacity and intellectual reputation was small enough that it fell well within the range expected from pure chance given the sample size. The probability of observing such a weak correlation if a strong true correlation existed was vanishingly small – far below even the most stringent thresholds.
I also performed what might be termed sensitivity analysis, though again the terminology differs. I recalculated correlations excluding outliers, using different measures of intellectual achievement, employing alternative formulas for cranial capacity estimation. The result remained consistent: no detectable correlation. This robustness across analytical choices strengthened the conclusion considerably.
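The probable error she refers to had a standard form in the biometric literature, PE(r) = 0.6745(1 − r²)/√n, roughly two-thirds of the modern standard error of r. A minimal sketch, with invented numbers standing in for the dissertation sample, shows how a small correlation is judged against that yardstick.

```python
# Sketch: a correlation coefficient with its classical "probable error",
# PE(r) = 0.6745 * (1 - r**2) / sqrt(n). Numbers are invented for
# illustration, not taken from Lee's dissertation.

from math import sqrt

r = 0.08    # hypothetical correlation: cranial capacity vs reputation rank
n = 35      # hypothetical sample size

probable_error = 0.6745 * (1 - r ** 2) / sqrt(n)
print(f"r = {r:.2f} +/- {probable_error:.3f} (probable error)")

# If |r| is no larger than its probable error, the observed correlation is
# indistinguishable from zero at this sample size -- the form of argument
# Lee describes making about skull size and intellectual standing.
```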
Where I now see difficulty is in the potential for motivated reasoning in threshold selection. If an investigator wishes to claim a particular result, they can adopt a permissive threshold that allows weak evidence to be declared significant. If they wish to reject a claim, they can impose a stringent threshold that dismisses even moderately strong evidence as insufficient. The apparently objective apparatus of probability calculations can mask fundamentally subjective choices about evidentiary standards.
Karl was aware of this problem and attempted to establish conventions that would limit such manipulation. But conventions require community consensus, and our community was small, fragmented, and often divided along personal and theoretical lines. What one investigator regarded as obviously correct evidentiary practice, another dismissed as arbitrary or overly conservative.
Your observation that modern thresholds are “somewhat arbitrary” is entirely correct, Dr. Bakhit, and the same was true in my era. The difference, perhaps, is that you have achieved greater consensus about which arbitrary threshold to employ, creating at least the appearance of standardisation. Whether that consensus reflects genuine philosophical justification or merely successful norm enforcement, I cannot judge from my present vantage. But I suspect the latter plays a larger role than practitioners typically acknowledge.
Travis Coleman, 46, Science Educator, Portland, Oregon, USA
If you had possessed access to modern computing – not to replace your thinking, but to augment it – what research question would you have tackled that your hand-calculation constraints made impossible? What aspect of biological variation or evolutionary change remained unanswered not because the theory was unclear, but because the computational labour was prohibitive? In other words: what was left on the table?
Mr. Coleman, what an excellent question – and one that invites me to speculate about research paths I longed to pursue but could not, given the sheer impossibility of the computational labour involved. There are several answers, and I shall give you the most significant.
The problem that haunted me throughout my career was this: understanding the inheritance of continuously varying traits across multiple generations in large pedigrees.
Mendelian genetics, which began to gain acceptance around the turn of the century, explained discrete traits admirably – traits that appeared in simple ratios, that were either present or absent. Flower colour in peas, for instance, or certain blood characteristics. But most traits of genuine biological interest do not behave this way. Height, weight, cranial capacity, intelligence if it can be measured at all – these vary continuously across populations. They do not fall into neat categories.
The question was whether continuous variation could be reconciled with Mendelian inheritance, or whether it required some alternative theory. Karl and others proposed that if multiple genes each contributed small effects to a trait, the combined result would appear continuous even though the underlying inheritance was particulate. This was the foundation of what later became quantitative genetics.
But testing this hypothesis rigorously required analysing inheritance patterns across many families, across multiple generations, measuring numerous traits simultaneously, and calculating the correlations between relatives of varying degrees – parents and offspring, siblings, cousins, grandparents and grandchildren. One needed to determine whether the observed correlations matched the predictions of multi-gene Mendelian models or whether they suggested blending inheritance, environmental effects, or some combination.
The computational requirements were staggering. Suppose one wishes to analyse a single trait – say, height – in one hundred families, each containing parents, children, and grandparents. That is perhaps four hundred individuals. For each pair of related individuals, one must calculate a correlation coefficient. The number of pairs grows combinatorially. Then one must compare those correlations to theoretical predictions under various genetic models, each requiring separate calculations.
I could perhaps manage ten families with enormous effort, devoting months to the work. But ten families provide inadequate statistical power – the sample is too small to distinguish between competing models with confidence. One hundred families would be necessary at minimum, ideally several hundred. The arithmetic was simply impossible within a human lifetime.
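To make the comparison Lee describes concrete, here is a minimal modern sketch in Python, with entirely invented data and parameters: simulate a trait governed by many genes of small effect in a set of families, then check whether the parent–offspring correlation approaches the value an additive multi-gene model predicts (about 0.5 when heritability is high). It is an illustration of the kind of test she had in mind, not her method.

```python
# A minimal sketch (hypothetical data) of the test Lee describes:
# does the parent-offspring correlation for a continuously varying trait
# match the prediction of an additive, multi-gene (polygenic) model?
import numpy as np

rng = np.random.default_rng(0)
n_families = 100      # the sample size Lee says hand calculation could not reach
n_loci = 50           # many genes, each of small effect

def genotype(n):
    # allele counts (0, 1, 2) at each locus, allele frequency 0.5
    return rng.binomial(2, 0.5, size=(n, n_loci))

mothers = genotype(n_families)
fathers = genotype(n_families)
# each child receives one allele per locus from each parent
children = rng.binomial(1, mothers / 2) + rng.binomial(1, fathers / 2)

effects = rng.normal(0, 1, n_loci)   # small additive effect per locus

def trait(g):
    # additive genetic value plus environmental noise
    return g @ effects + rng.normal(0, 1, len(g))

t_mother, t_child = trait(mothers), trait(children)
r = np.corrcoef(t_mother, t_child)[0, 1]
print(f"parent-offspring correlation: {r:.2f} "
      "(additive model predicts ~0.5 when heritability is high)")
```

Extending this to several hundred families and many traits is a few seconds of computation today; by hand, as she notes, it was the work of a lifetime.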
With computational assistance – and I mean not merely mechanical calculators, which existed in rudimentary form during my time, but devices capable of performing thousands of calculations rapidly and accurately – I would have undertaken a comprehensive analysis of inheritance in human populations. I would have collected detailed measurements on large family pedigrees: anthropometric data, physiological measurements, perhaps cognitive assessments if they could be made reliably. I would have tested whether the correlations between relatives matched multi-gene Mendelian predictions, and I would have estimated how many genes contributed to each trait.
This work was eventually accomplished, though largely after my active research career ended. R.A. Fisher laid out the mathematical framework in his 1918 paper on the correlation between relatives and developed it through the 1920s and 1930s, and subsequent investigators applied it with the benefit of improved computational tools. But had I possessed those tools earlier, I believe I could have contributed foundationally to that field. The statistical sophistication required was within my capability – it was the sheer volume of calculation that proved prohibitive.
A second area I would have pursued concerns spatial variation in biological traits. I was interested in whether cranial characteristics varied geographically in predictable ways – whether, for instance, populations living in colder climates showed different cranial proportions than those in warmer climates, and whether such variation could be explained by adaptation, by migration patterns, or by historical accident.
Answering this requires mapping traits across geographic space, calculating spatial correlations, accounting for the fact that nearby populations are more similar than distant ones simply due to gene flow. The mathematical methods for spatial statistics were not well developed in my time, but more critically, the data requirements were enormous. One would need measurements from hundreds of populations distributed globally, and one would need to compute correlations between all pairs whilst adjusting for geographic distance.
Again, the problem was not conceptual but computational. I understood what needed to be done. I could not do it by hand.
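The all-pairs computation she describes can likewise be sketched in a few lines today. Below is a hedged illustration (Python, fully synthetic data, hypothetical locations): a Mantel-style comparison of a trait-difference matrix with a geographic-distance matrix, which is essentially the spatial correlation she could not compute by hand.

```python
# A rough modern sketch (synthetic data) of the spatial analysis Lee describes:
# do populations that are geographically closer also resemble one another more
# in a measured trait? This is a Mantel-style matrix correlation.
import numpy as np

rng = np.random.default_rng(1)
n_pop = 200                                      # hundreds of sampled populations
coords = rng.uniform(0, 100, size=(n_pop, 2))    # hypothetical locations
# a trait with a smooth geographic gradient plus local noise
trait = 0.05 * coords[:, 0] + rng.normal(0, 1, n_pop)

# all-pairs geographic distances and trait differences
geo = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
diff = np.abs(trait[:, None] - trait[None, :])

# correlate the upper triangles of the two matrices (the core of a Mantel test)
iu = np.triu_indices(n_pop, k=1)
r = np.corrcoef(geo[iu], diff[iu])[0, 1]
print(f"{len(iu[0])} population pairs; distance-difference correlation r = {r:.2f}")
```

Two hundred populations already generate nearly twenty thousand pairwise comparisons, which is precisely the arithmetic burden she identifies.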
A third area – perhaps more speculative – concerns what we would now recognise as multivariate analysis of trait covariation. Many biological traits are correlated: individuals with long limbs tend to have long torsos, individuals with broad skulls tend to have deep skulls, and so forth. These correlations suggest that traits do not evolve independently but as coordinated suites. Understanding those coordination patterns would illuminate how selection operates on organisms as integrated wholes rather than collections of independent traits.
Analysing this properly requires constructing correlation matrices across dozens of traits, then performing eigenvalue decomposition to identify the principal axes of variation – the combinations of traits that vary together most strongly. This is standard practice in modern biology, but in my era, constructing even a modest correlation matrix by hand was enormously laborious, and eigenvalue decomposition was a graduate-level mathematical technique requiring days of calculation for even small matrices.
I attempted this once, for a six-trait analysis. Six traits require fifteen pairwise correlations. The eigenvalue problem for a six-by-six matrix took me nearly a week to solve by hand, and I was never entirely confident I had avoided arithmetic errors. The result was interesting – it suggested that cranial measurements could be largely explained by two underlying dimensions of variation, one related to overall size and one to shape. But extending this to twenty or thirty traits, which would have been scientifically valuable, was completely impractical.
So to answer your question directly, Mr. Coleman: what was left on the table was the quantitative analysis of inheritance, spatial variation, and trait covariation at the scale necessary to distinguish between competing biological theories. I had the theoretical frameworks, the statistical sophistication, and the biological questions. I lacked the computational capacity to execute the analysis. That gap between conception and execution was the defining frustration of my research career, and I suspect it was shared by many investigators of my generation. We could imagine questions we could not answer, not because we lacked ingenuity but because we lacked tools to implement our ideas at sufficient scale.
Emilia Virtanen, 28, Historian of Science, Helsinki, Finland
During the First World War, you calculated shell trajectories for weapons development. You described this as “instrumental” work rather than advancing knowledge for its own sake. But I wonder: did that applied mathematics teach you anything about the limits or possibilities of theoretical statistics that you brought back to your population biology work? Or did you experience those as entirely separate domains – pure science versus wartime pragmatism?
Miss Virtanen, you have identified something I had not fully articulated even to myself until you posed the question. Yes – the wartime ballistics work did teach me something profound about the relationship between theoretical mathematics and empirical reality, and I did carry those lessons back into my biological work, though the connection was not immediately obvious.
The essential insight was this: in applied mathematics, one learns very quickly when one’s models are wrong, because the shells either hit their targets or they do not. There is no ambiguity, no room for interpretation. If your trajectory calculations are incorrect, the evidence is immediate and undeniable. This is quite different from population biology, where the “correctness” of a model is often a matter of degree, of how well it fits data that are themselves imprecise and incomplete.
When calculating shell trajectories, one begins with the classical equations of projectile motion – Newtonian mechanics applied to bodies moving through air under gravitational acceleration. The basic physics is straightforward. But the devil, as they say, is in the particulars. Air resistance depends on velocity in complex, nonlinear ways. Wind affects different portions of the trajectory differently depending on the shell’s speed and orientation at each moment. Temperature affects air density, which affects resistance. The shell itself may not be perfectly symmetrical, introducing wobble or drift.
All of these factors must be accounted for, and the question becomes: which simplifications are acceptable and which introduce unacceptable error? One cannot model every microscopic detail – the equations would become insoluble. But one must include sufficient detail that the predictions remain accurate.
What I learned was to test simplifications empirically. We would calculate trajectories under various assumptions – assuming constant air density, or assuming resistance proportional to velocity squared, or incorporating more elaborate models. Then we would compare predictions to actual firing test data. If a simplified model produced predictions within acceptable tolerance of observed trajectories, we used it. If not, we added complexity until the model performed adequately.
This iterative process of model testing and refinement was revelatory for me. In my biological work, I had often constructed models based on theoretical considerations – what seemed mathematically elegant or conceptually plausible – without rigorously testing whether simpler models might perform equally well. The wartime work taught me to prioritise empirical fit over theoretical elegance.
I brought this lesson back to biometry in a specific way. After the war, when I returned to problems of inheritance and correlation, I became much more cautious about assuming complex models when simpler ones might suffice. For instance, when analysing the relationship between parental and offspring traits, one might hypothesise elaborate interactions – that the effect of the father’s trait on the offspring depends on the mother’s trait, and vice versa. Such interaction models are mathematically appealing and biologically plausible.
But the wartime work had taught me to ask: does including these interactions actually improve prediction, or am I adding parameters that fit noise rather than signal? I began calculating what we would now call likelihood ratios or information criteria – measures of whether a more complex model justified its additional parameters through meaningfully better fit to the data.
Often, the answer was no. The simpler additive model – where mother’s and father’s contributions sum independently – performed nearly as well as the complex interaction model. The difference in fit was smaller than the measurement error in the data. Including interactions added complexity without adding understanding. The ballistics work had trained me to recognise this and to favour parsimony.
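Her test of whether an interaction term earns its keep corresponds to a now-routine model comparison. Here is a hedged sketch (Python, simulated data, invented effect sizes) comparing an additive parental model with one that adds a mother×father interaction, scored with AIC as one of the information criteria she alludes to.

```python
# A sketch (simulated data) of the comparison Lee describes: does a
# mother x father interaction term improve the fit enough to justify itself?
import numpy as np

rng = np.random.default_rng(3)
n = 500
mother = rng.normal(0, 1, n)
father = rng.normal(0, 1, n)
# data generated from a purely additive model plus measurement noise
child = 0.5 * mother + 0.5 * father + rng.normal(0, 1, n)

def fit_aic(X, y):
    # ordinary least squares plus the Akaike information criterion
    beta, resid, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(resid[0]) if len(resid) else float(np.sum((y - X @ beta) ** 2))
    k = X.shape[1] + 1                    # parameters, counting the error variance
    return n * np.log(rss / n) + 2 * k

ones = np.ones(n)
additive    = np.column_stack([ones, mother, father])
interaction = np.column_stack([ones, mother, father, mother * father])
print(f"AIC, additive model:    {fit_aic(additive, child):.1f}")
print(f"AIC, interaction model: {fit_aic(interaction, child):.1f}")
# when the extra term only fits noise, the interaction model's AIC is no better
```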
There was another lesson, perhaps more philosophical. In ballistics, one is always aware that the model is an approximation. No one believes the equations capture every detail of physical reality – they are useful fictions that produce adequate predictions for practical purposes. There is a certain intellectual humility in that recognition.
In biological work, by contrast, there was sometimes a tendency to reify models – to treat them as though they represented biological reality rather than convenient mathematical summaries of observed patterns. Investigators would speak of “the correlation between parent and offspring” as though that correlation were a fixed natural constant, when in fact it was a statistical summary of data collected under particular conditions, subject to measurement error and sampling variation.
The applied work cured me of that tendency. I learned to think of models as tools rather than truths. A formula for estimating cranial capacity from external measurements is useful if it produces accurate estimates; it need not represent the actual biological process by which skulls grow. A correlation coefficient summarises a relationship observed in a sample; it is not a law of nature.
This distinction seems subtle, but it affected how I wrote about my findings and how I responded to criticism. When reviewers objected that my cranial capacity formulas did not incorporate biological mechanisms of skull development, I could respond: that was not their purpose. They were predictive tools, validated by their accuracy, not causal explanations requiring mechanistic grounding.
You asked whether I experienced pure and applied mathematics as separate domains – “pure science versus wartime pragmatism,” in your phrasing. Initially, yes. The ballistics work felt like a departure from my real intellectual interests, something I did out of patriotic duty rather than scientific curiosity. But in retrospect, the boundary was more porous than I recognised at the time.
Both domains required translating real-world complexity into tractable mathematics, making judicious simplifications, testing predictions against evidence, and refining models iteratively. Both required judgment about when precision mattered and when approximation sufficed. Both required balancing theoretical elegance against empirical adequacy.
The primary difference was time scale. In ballistics, the feedback loop between prediction and test was rapid – days or weeks. In evolutionary biology, one might propose a model and not have adequate data to test it for years. That difference in feedback speed affects research culture profoundly. Rapid feedback encourages iterative refinement and pragmatic adjustment. Slow feedback encourages theoretical elaboration and speculative model-building.
If I were to offer advice to young scientists, it would be this: seek opportunities to work on problems where feedback is rapid and consequences tangible. Not necessarily weapons development – any applied field will do. The discipline of confronting whether your predictions actually work, of refining models based on their performance rather than their conceptual appeal, is invaluable. Then carry that discipline back into theoretical work, where feedback is slower and the temptation to privilege elegance over accuracy is correspondingly greater.
The war was dreadful, and I would not wish its repetition for any reason. But the intellectual training it provided – the enforced confrontation between theory and empirical reality under high-stakes conditions – shaped my subsequent work in ways I have come to value. One learns humility when shells miss their targets because one’s equations were wrong.
Paulo Meirelles, 58, Medical Anthropologist and Bioethicist, Rio de Janeiro, Brazil
Miss Lee, you worked closely with some of the most prominent eugenicists of your era – Karl Pearson himself was deeply committed to eugenic theory. Your statistical methods became tools for eugenic research and population classification. I understand you’ve acknowledged your complicity in racial pseudoscience, but I want to ask more directly: did you believe in eugenics? Did you see your work as contributing to eugenic aims – the selective breeding of populations, the classification of “superior” and “inferior” groups? And if you did hold those beliefs then, do you hold them still, or has your understanding fundamentally shifted with time and historical perspective?
This matters because we in the Global South have lived with the consequences of eugenic thinking exported from Europe and North America. Forced sterilisations, population control programmes justified by scientific racism, entire communities marked as “unfit.” Your statistical rigour lent authority to those programmes. So I ask not as academic curiosity but as someone whose ancestors may have been targets of the very science you helped legitimise: can you account for that legacy, and can you honestly say whether you believe the eugenic premises of your era were simply mistaken, or whether you recognised their harm at the time and proceeded anyway?
Dr. Meirelles, your question is direct and demands an equally direct answer. Yes, I believed in eugenics. Not in every particular – I had reservations about certain policies, certain extremes – but in the fundamental premise that human populations could and should be improved through selective breeding guided by scientific knowledge, yes, I believed that. I thought it was progressive, rational, and humane.
I was wrong. Not merely mistaken in details, but fundamentally wrong about the entire enterprise. And no, I did not recognise the harm at the time, though I should have. The evidence was available. I chose not to see it.
Let me be specific about what I believed and when I believed it, because evasion serves no one.
I accepted that intelligence, moral character, and physical constitution were heritable traits subject to natural selection. I believed that modern medicine and social welfare, by allowing “less fit” individuals to survive and reproduce, were interfering with natural selection and causing racial degeneration. I believed that society had both the right and the responsibility to counteract this degeneration through deliberate reproductive control – encouraging the fit to reproduce, discouraging or preventing the unfit from doing so.
These were not fringe views in my social and intellectual circle. They were mainstream, respectable, advocated by people I regarded as the finest scientific minds of the age. Francis Galton, who coined the term “eugenics,” was Charles Darwin’s cousin and a polymath of considerable genius. Karl Pearson, with whom I worked daily, saw eugenics as applied rationality – the use of statistical science to guide social policy toward human betterment. These were not monsters. They were my colleagues, my mentors, people whose intellectual rigour I admired.
That is not exculpation. It is context for understanding how thoroughly I had internalised eugenic thinking.
Did I see my work as contributing to eugenic aims? Yes, though not always consciously or directly. The statistical methods I developed – correlation analysis, methods for measuring heredity, techniques for classifying populations – were explicitly intended to be useful for eugenic research. Karl’s laboratory existed in large part to provide scientific foundation for eugenic policy. I understood this. I participated willingly.
When I measured cranial capacity and demonstrated that skull size did not predict intelligence within populations, I believed I was refining eugenic science – making it more accurate, eliminating false correlates like skull size so that genuine markers of fitness could be identified. I did not conclude that the entire eugenic project was misguided. I concluded that we needed better measurements.
Similarly, when I contributed to research on heredity and variation, I understood that data was being used to argue for immigration restrictions, for marriage laws prohibiting unions between supposedly incompatible racial groups, for institutionalisation of people deemed mentally deficient. I told myself this was regrettable but necessary – that short-term individual hardship was justified by long-term collective improvement.
Your question asks whether I recognised the harm at the time and proceeded anyway, or whether I simply failed to perceive it. The answer is complicated and uncomfortable. I was aware of coercive policies – forced institutionalisation, marriage restrictions, immigration exclusions. I was not aware, at the time, of forced sterilisation programmes, though such programmes began in the United States and elsewhere during my active career. Had I known, would I have objected?
I want to say yes. I want to believe I would have drawn a line at such extremes. But I cannot honestly claim that with confidence, because I had already accepted lesser coercions as legitimate. If one accepts that the state may prevent “unfit” individuals from marrying or immigrating, the principle that justifies forced sterilisation is already in place. The difference is degree, not kind.
What I did not understand – what I should have understood but did not – was that eugenic classification always served existing power structures. The people deemed “unfit” were always the poor, the colonised, the marginalised. The people deemed “fit” were always those who already possessed wealth, education, political power. Eugenic science provided post-hoc rationalisations for inequalities that existed for entirely different reasons – economic exploitation, colonial domination, racial caste systems.
I should have noticed this. The pattern was obvious. But I told myself that correlation reflected causation – that marginalised populations were marginalised because they were genuinely less capable, not that they appeared less capable because they were marginalised and denied resources, education, opportunity.
When did my understanding shift? Not during my active career. I retired in 1916 still holding eugenic beliefs, though perhaps somewhat less fervently than in earlier years. The shift came gradually, across the 1920s and 1930s, as eugenic policies became increasingly extreme and their consequences increasingly visible.
The United States implemented forced sterilisation laws in numerous states. Tens of thousands of people – disproportionately poor, disproportionately Black and Indigenous, disproportionately women – were sterilised without consent or under coercion. Germany’s race laws in the 1930s drew explicitly on American eugenic precedents and on the statistical methods developed in laboratories like Karl’s. When those policies culminated in the Holocaust, the connection between eugenic theory and genocide became undeniable.
I cannot claim I immediately grasped the full horror or my complicity in it. But I could no longer maintain that eugenics was benign science serving human welfare. It was ideology masquerading as science, and it had enabled atrocity.
You state that your ancestors may have been targets of the science I helped legitimise. That is not a hypothetical concern, Dr. Meirelles. It is historical fact. Eugenic policies in Brazil, in other Latin American nations, in colonised territories across Africa and Asia – all drew on the statistical frameworks and racial classifications that I and my colleagues produced. Population control programmes, immigration restrictions justified by claims of racial incompatibility, sterilisation campaigns targeting Indigenous and African-descended populations – these employed our methods.
I cannot undo that. I cannot repair the harm my work enabled. What I can do is state clearly and without equivocation that the eugenic premises of my era were not merely mistaken but morally catastrophic. They were wrong scientifically – there is no biological basis for the racial hierarchies we constructed, no valid method for classifying humans into “fit” and “unfit” categories. And they were wrong ethically – no scientific claim, even if true, justifies coercive control over reproduction, bodily autonomy, or freedom of movement.
The fact that eugenic policies were advocated by respected scientists, that they were framed as progressive and rational, that they enjoyed broad support among educated classes – none of that diminishes the harm. Respectability does not confer righteousness. Consensus does not guarantee truth.
If there is anything useful I can contribute now, it is this warning: scientific authority is easily weaponised to justify oppression. When research consistently identifies the marginalised as inferior, question the research. When statistical methods produce results that conveniently align with existing power structures, scrutinise the methods. When experts claim neutrality whilst serving political agendas, reject the claim.
I failed to apply that scrutiny. I allowed my intellectual commitments and my social position to blind me to the human consequences of the work I performed. I prioritised elegant mathematics and satisfying my curiosity over asking who would be harmed by my findings and how those findings would be used.
That failure was not passive. It was active complicity. I chose not to interrogate eugenic premises because I benefited from the hierarchies those premises supported. I chose not to examine the consequences of my work because doing so would have required abandoning a research programme that gave me intellectual purpose and professional identity.
Those were choices, Dr. Meirelles. Not mistakes, not oversights – choices. And the responsibility for them is mine.
Reflection
Alice Lee died in October 1939, at the age of eighty-one. She left no personal papers, no memoirs, and relatively limited archival material. Her contributions to statistics and biometry were substantial but often attributed to Karl Pearson, even in her lifetime. It was not until decades after her death that historians began reconstructing her work, recognising her as one of the earliest women in Britain to earn a doctorate and as a pioneering statistician whose methods remained in use for generations.
This conversation with Alice Lee – necessarily fictional, necessarily constructed from historical fragments – offers something that the archival record cannot: her own voice reflecting on her work, her complicity, her regrets. What emerges is a figure far more complex than the celebratory narratives of “women in science” typically permit. She was neither unambiguous hero nor cautionary villain, but rather a person of genuine intellectual power who navigated systems of oppression with courage and ingenuity whilst simultaneously perpetuating other forms of harm. That tension is real, and it should not be smoothed away in service of inspirational storytelling.
What emerges most forcefully in her responses is not her scientific achievements – though those are substantial – but her willingness to acknowledge failure. She did not challenge racial science. She accepted eugenic premises without interrogating them. She allowed herself not to see the consequences of her work because seeing them would have required dismantling her professional identity. These are not minor failings. They implicate her in systems that caused genuine human suffering.
Yet the historical record is more complicated than even Alice Lee acknowledges here. Some contemporaries did challenge eugenic theory more forcefully than she did. Some did question racial classifications based on craniometry. The fact that she did not – that she prioritised other intellectual projects and benefited from her silence – represents a choice, as she states. But it also represents the immense difficulty of maintaining moral consistency within institutions built on oppressive foundations. The system constrained her options even as she benefited from it.
Where Alice Lee’s perspective departs most significantly from recorded accounts is in her candour about the dissertation obstruction and her refusal to recant under Galton’s pressure. Historical sources confirm the two-year delay and the controversy, but Lee’s own account reveals the personal dimensions – the explicit attempt to coerce her into abandoning her findings, the sense that her future hung in the balance. This narrative thread, woven through her responses, emphasises not passive exclusion but active resistance on her part. She was not merely overlooked; she fought to be heard.
Her discussion of computational labour similarly reframes historical understanding. Lee’s work as a “laboratory secretary” or “computer” was not simply undervalued – it was actively classified as a lower order of intellectual work, despite requiring doctoral-level expertise. Modern parallels abound: data scientists working under postdoctoral supervision, research technicians whose labour enables principal investigators’ publications, women in adjunct positions performing the intellectual work of tenure-track scholars without equivalent compensation or recognition. Alice Lee’s experience was particular to her era, but the underlying dynamics persist.
The statistical methods Lee developed – correlation analysis applied to biological variation, chi-squared distribution testing, regression formulas for inferring unmeasurable quantities from observable ones – became foundational to modern science. Modern genomics, epidemiology, clinical trials, and evolutionary biology rest, in part, on work she performed by hand in the 1890s. Yet textbooks and historical accounts often credit Pearson for developing biometry whilst marginalising Lee’s contributions. R.A. Fisher’s later work in quantitative genetics built explicitly on foundations she helped establish, yet receives vastly greater recognition. This erasure is not accidental; it reflects persistent patterns of attributing women’s intellectual labour to male supervisors and colleagues.
What Alice Lee’s story offers young women in STEM today is not uncomplicated inspiration but honest reckoning with the price of survival and the danger of selective critique. Yes, she persevered against institutional barriers. Yes, she developed rigorous methods that advanced knowledge. Yes, she challenged pseudoscience that justified gender-based oppression. But she did so whilst remaining embedded in systems that exploited other populations. She was both a pioneer and a participant in colonial science.
That complexity is instructive. It suggests that progress in one domain does not automatically translate to progress across all domains. It suggests that challenging the hierarchies that constrain oneself whilst ignoring or reinforcing hierarchies that advantage oneself is a common pattern requiring conscious interrogation. It suggests that intellectual rigour is necessary but insufficient – that rigour applied selectively, deployed to advance one’s own interests whilst sparing other oppressive systems, is complicity masquerading as principle.
Yet Alice Lee’s refusal to recant, her insistence on accuracy despite pressure to conform, her later acknowledgement of her own failures – these offer something valuable. They suggest that integrity is not a state one achieves and maintains permanently, but rather an ongoing practice of interrogation, of willingness to revise understanding, of accountability for complicity.
As we face contemporary challenges – persistent gender gaps in STEM, the reproduction of racial pseudoscience through modern neuroscience and genetics, the use of algorithmic methods to amplify existing inequalities – Alice Lee’s example cuts both ways. It demonstrates what rigorous science can accomplish when marshalled against injustice. It also demonstrates how easily science becomes weaponised in service of oppression when we fail to interrogate its political implications.
The measure of her life lies not in unblemished heroism but in the difficult truth that one can contribute genuinely to human knowledge whilst participating in human harm. That truth demands more of us than celebration. It demands that we learn from her example to maintain consistency in our scepticism, to question our own blind spots, and to refuse the comfort of selective solidarity.
Alice Lee measured skulls and calculated trajectories. But perhaps her most lasting contribution is this: the measurement of how difficult it is to live with integrity in systems designed to prevent it, and the insistence that we try anyway.
Who have we missed?
This series is all about recovering the voices history left behind – and I’d love your help finding the next one. If there’s a woman in STEM you think deserves to be interviewed in this way – whether a forgotten inventor, unsung technician, or overlooked researcher – please share her story.
Email me at voxmeditantis@gmail.com or leave a comment below with your suggestion – even just a name is a great start. Let’s keep uncovering the women who shaped science and innovation, one conversation at a time.
Editorial Note
This interview transcript is a dramatised reconstruction – a work of historical fiction grounded in archival evidence, biographical research, and documented facts about Alice Lee’s life and scientific contributions. It is not a historical record of an actual conversation. Alice Lee died in 1939, nearly ninety years ago. This interview represents an imaginative engagement with her work, her era, and the intellectual and personal challenges she faced, constructed through careful attention to historical sources but inevitably shaped by interpretive choices, narrative conventions, and the perspective of the present moment.
The basic facts about Alice Lee are established: her birth and death dates, her education at Bedford College, her work in Karl Pearson’s Biometric Laboratory, her 1901 dissertation challenging craniometry, her £90 annual salary, the two-year obstruction of her degree approval, her contributions to ballistics calculations during World War I, her eugenic complicity, and her eventual Civil List pension in 1923. Her statistical innovations – correlation analysis, chi-squared testing, formulas for estimating cranial capacity – are accurately represented. The quotations attributed to her derive from published sources where possible, though the extended responses to contemporary questioners are entirely imagined.
However, the tone, personality, reflective depth, and particular framings of her ideas in this conversation are interpretive constructions. We cannot know precisely how Alice Lee spoke, what she thought privately about her complicity in eugenic science, or how she would respond to questions posed from the perspective of 2025. The interview attempts to remain faithful to what historical evidence suggests about her beliefs, her intellectual sophistication, and her era, whilst acknowledging the fundamental impossibility of perfect historical recreation.
Readers should approach this text as a thoughtful engagement with Alice Lee’s legacy rather than as documentary truth. It is offered in the spirit of bringing historical figures into conversation with contemporary concerns, not as a substitute for rigorous historical scholarship or primary source research.
Bob Lynn | © 2025 Vox Meditantis. All rights reserved.