
Financial systems take a holiday

1 Share

Article URL: https://www.bitsaboutmoney.com/archive/financial-systems-take-a-holiday/

Comments URL: https://news.ycombinator.com/item?id=39553801

Points: 161

# Comments: 139

Read the whole story
1 day ago
Share this story



Article URL: https://ciechanow.ski/airfoil/

Comments URL: https://news.ycombinator.com/item?id=39526057

Points: 1525

# Comments: 159


This is not a good way to fight racism in America


You might think the image above is a joke, but it’s the actual output of Google’s new AI application, Gemini. A few days ago, a bunch of people realized that Gemini — which was released on February 8 — wouldn’t draw pictures of White people, no matter what the context. Much ridiculousness ensued. People asked the app to draw the original American revolutionaries, 17th-century French scientists, Vikings, the Pope, and so on; the resulting images almost never included White people, except occasionally as part of a much larger ensemble. The ultimate facepalm-worthy moment was when Gemini decided that Nazi soldiers were Black and Asian:

But to me, the funniest was when someone asked the app to draw the founders of Google.

Some people wondered how the app’s creators had managed to train it never to draw White people, but it turned out that they had done something much simpler. They were just automatically adding text to every image prompt, specifying that people in the image should be “diverse” — which the AI interpreted as meaning “nonwhite”.
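Mechanically, this kind of intervention is about the simplest thing one can do to a generative system: a thin wrapper rewrites every prompt before it reaches the image model. A minimal sketch of the idea (the function name and the injected wording are illustrative assumptions; Google's actual injected text is not public):

```python
def augment_prompt(user_prompt: str) -> str:
    """Append a blanket diversity instruction to every image request.

    This mirrors the reported behavior: the user never sees the extra
    text, but the model treats it as part of the request. The wording
    here is hypothetical -- the real injected text hasn't been disclosed.
    """
    return user_prompt + ", depicting a diverse range of ethnicities and genders"

# The suffix is applied regardless of context, which is why even
# historically specific prompts ended up altered.
print(augment_prompt("the original American revolutionaries"))
```

Because the wrapper has no notion of context, a prompt about 18th-century figures gets exactly the same treatment as one about a generic crowd scene, which is the failure mode described above.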

But that wasn’t the only weird thing that was going on with Gemini with regards to race. It was also trained to refuse explicit requests to draw White people, on the grounds that such images would perpetuate “harmful stereotypes” (despite demonstrably not having a problem depicting stereotypes of Native Americans and Asians). And it refused to draw a painting in the style of Norman Rockwell, on the grounds that Rockwell’s paintings presented too idealized a picture of 1940s America, and could thus “perpetuate harmful stereotypes”.

Embarrassed by the national media attention, Google employees hastily banned Gemini from drawing any pictures of people at all.

The main thing that everyone seemed to agree on was that this episode showcased the decline of Google’s prestige as a company. In two decades, the internet giant’s reputation has gone from that of a scrappy upstart, hiring the smartest nerds and shipping product after game-changing product at blinding speed, to that of a sleepy behemoth, quietly milking the profits of its gargantuan search ad monopoly and employing a vast army of highly paid entitled lifers who go home at 3 in the afternoon and view it as their corporate duty not to ship anything that works.

Obviously, that’s a huge generalization, and it’s only pockets of Google that are actually like that. But big companies with stable sources of monopoly profit do tend to become fairly predictably sclerotic — Intel being just one more recent example. The question of how to turn companies around once they go down this path is an important unsolved problem.

Gemini also provides an interesting example of Gary Becker’s theory of discrimination. Becker believed that when companies have a big profit cushion — whether from a natural monopoly, government support, or whatever — they have the latitude to indulge the personal biases of their managers. In the 1970s, that largely meant discriminating against Black and female employees. At Google in the 2020s, it means creating AI apps that refuse to draw White people in Hitler’s army. The theory predicts that only the ruthless pressure of market competition will force companies to stop discriminating. There’s actually some empirical evidence to support this. But Google’s search ad monopoly is probably so powerful that it can afford to goof around in the AI space without suffering real consequences — at least in the short term.

But beyond what it says about Google itself, the saga of Gemini also demonstrates some things about how educated professional Americans are trying to fight racism in the 2020s. I think what it shows is that there’s a hunger for quick shortcuts that ultimately turn out not to be effective.

The challenge of creating a multiracial society

Nations require norms and public goods in order to function well. We have to agree not to beat each other up, steal each other’s things, etc. We have to be OK paying taxes for a road or a school or an army that might benefit our neighbor more than it benefits us. This requires a certain psychological outlook — a lot of us have to believe, whether tacitly or explicitly, that most of our neighbors are part of our in-group.

This is inherently challenging in a multiracial society. How much of a difference it makes, though, is not clear. A number of papers have found that in America, more diverse cities tend to spend just as much or even more of their income on public goods. There is some evidence that diversity reduces certain types of social cohesion at the neighborhood level, but most measures of cohesion and trust are unaffected by diversity. Meanwhile, a consistent finding in social science is that extended, cooperative contact between different racial or ethnic groups leads to increased trust. In other words, Atticus Finch was right.

So the goal of creating a functional diverse society is achievable — it just takes a lot of work. And one especially difficult part is forging a shared sense of national identity between Americans of various races.

In a famous speech in 1852, Frederick Douglass said:

The blessings in which you, this day, rejoice, are not enjoyed in common. The rich inheritance of justice, liberty, prosperity and independence, bequeathed by your fathers, is shared by you, not by me. The sunlight that brought light and healing to you, has brought stripes and death to me. This Fourth July is yours, not mine.

Slavery still existed in 1852, of course, but the end of slavery didn’t exactly make it easy for Black people to embrace a shared national identity with White people — or vice versa. Segregation, official discrimination, and pervasive bigotry made Black people second-class citizens until the 1960s. Desegregation and civil rights were a big step toward enabling a shared national identity, but the weight of all that history of oppression lingered in people’s minds, reinforced by disparities that still existed on the ground. It’s surely easier in 2024 for Black and White Americans to think of themselves as one unified nation than it was in 1852, or in 1952. But that doesn’t mean it’s easy.

And a lesser form of the same problem applies to Americans of other races. The Chinese Exclusion Act and the Japanese Internment might not loom large in the minds of most White Americans, but they are definitely something that Asian Americans know about. Today, in 2024, can a 34-year-old Asian American man (the same age Frederick Douglass was in 1852) look up at a statue of George Washington in a New York City park and think, even in some generalized symbolic sense, that this is a statue of his predecessor? Or Alexander Hamilton? Or Teddy Roosevelt? Or FDR?

It is important to the future of our nation that he be able to do so. But it is not as easy as just reciting the Pledge of Allegiance or standing for the national anthem. It will take careful crafting of a national narrative that tells the story of why Chinese Americans are just as American as Dutch Americans or Irish Americans.

And this is where the idea of retroactive representation comes in. Normal representation — putting people of color in movies, TV, etc. — is intended to show Americans that they live in a diverse, integrated, multiracial society today — which is true. But that isn’t the same as showing Americans that their society was similarly diverse, integrated, and multiracial from the start. It was not. It has changed. And because many people feel a need to essentialize their own nation — to believe that it has been basically the same since the very start — it is in the service of our national identity in the present to make up some fantasies about our own past.

And so we have Hamilton. By casting people of color in the roles of America’s White founders, Lin-Manuel Miranda made the case that America might as well have been founded by the same races of people who live here today. Hamilton was an immigrant from the British West Indies of Scottish descent rather than a Puerto Rican one, but who cares? An immigrant is an immigrant, and what’s important is that they get the job done. Hamilton sent a message to every nonwhite American that it’s OK to imagine themselves as descended from America’s founders. It was a patriotic message, intended to bind diverse Americans into a sense of shared national heritage.

Ernest Renan, in his essay “What is a Nation?”, argues that intentional forgetting is an essential part of nationhood. Retroactive representation is intended to be a way of consciously, actively forgetting that America’s racial history is different from its present.

Google’s release of an AI app that forces users to see nonwhite people in place of White historical figures is, on some level, an attempt at something similar to what Hamilton tried to do. But Google’s attempt failed disastrously. Why? In my view, it was because the Google team tried to take a shortcut.

The 2010s made Americans look for shortcuts to integration

The 2010s changed America’s attitudes about race. At the start of 2013, most White and Black Americans thought race relations in their country were good; eight years later, most thought the opposite.

Source: Gallup

This was partly driven by the rise of social media, but it’s also just a cycle that America periodically goes through. In the 2010s, Americans — especially educated White Americans — gained a sense of extreme urgency about the need to eliminate racial disparities right now. That impatience created a demand for quick fixes — i.e., shortcuts.

Creating a multiracial nation is an inherently long and arduous process. This is only partly because of political opposition. Mostly, it’s that the things you have to do in order to create a widespread sense of equality and shared nationhood involve making a lot of very deep changes to society.

A prime example is the effort to advance diversity, equity, and inclusion within U.S. corporations and universities. The goal of teaching people how to respect, get along with, and work productively with a diverse set of coworkers is a laudable one. It’s the kind of thing that we don’t really know how to do yet; there’s no proven, effective method for corporate diversity training, so finding what works will inevitably involve a lot of experimentation and evidence-gathering. It’s the kind of task that requires patience, long-term commitment, open-mindedness, and empathy.

Instead, many corporations chose to outsource their DEI training to some opportunistic entrepreneurs. Robin DiAngelo and Tema Okun leveraged their fame to take advantage of the moment of urgency created by the unrest of 2020, selling their programs to companies and schools as a fix for racism. These programs often veered into the utterly ludicrous, characterizing useful work traits like hard work and punctuality as part of “white supremacy culture”. This approach probably added more racism than it subtracted. Meanwhile, there’s little evidence for any concrete benefits in the workplace, and even some diversity consultants now admit that these programs are far less effective than their creators have claimed.

In other words, corporations tried to take a shortcut to a racially inclusive workplace, and the shortcut failed.

A more harmful type of shortcut is when companies and universities actively discriminate against White employees and applicants in an attempt to correct for discrimination against people of color. Ibram Kendi, probably the leading scholar of the post-2020 antiracist movement, has explicitly advocated for this approach:

The only remedy to racist discrimination is antiracist discrimination. The only remedy to past discrimination is present discrimination. The only remedy to present discrimination is future discrimination.

This isn’t quite as crazy a proposition as it sounds. Chances are that a very large percentage of Americans engage in subtle forms of “antiracist discrimination” that most Americans would have little or no problem with. For example, any time you choose to mentor a Black employee, because you think they’re likely to come from a disadvantaged background, you’ve engaged in antiracist discrimination, because you’ve implicitly diverted your time and energy away from mentoring a White employee.

This kind of thing makes right-wingers mad, but most Americans are probably fine with it. There’s a pretty consistent pattern where Americans reject explicit and procedural racial discrimination, but accept tacit, implicit, quiet forms. A good example is affirmative action at colleges; most Americans oppose racial preferences in admissions, but most also support efforts to “increase the racial diversity of students on college campuses”.

In the years since 2014, however, and especially since 2020, the more explicit, formal, hard-edged discrimination has probably been on the rise. Lawsuits alleging anti-White discrimination have become more frequent, and courts have begun to strike down racially targeted government assistance programs. Some ex-Google employees allege that they have screenshots showing they were explicitly denied promotions because they were White.

A lot of corporate managers, university administrators, and so on seem to have forgotten that this sort of discrimination is against the law. Or perhaps they thought White employees would simply feel that it’s unseemly to sue over discrimination. But those who documented their discrimination in emails are in for an unpleasant surprise.

Discrimination against White employees in companies and universities is another kind of shortcut. It’s an attempt to circumvent the hard work of changing attitudes and prosecuting companies for discriminating against people of color, and instead simply leap to a solution by implementing discrimination in the opposite direction.

But it won’t work. In addition to the legal obstacles, it seems likely that the companies engaging in “antiracist discrimination” started out as the least racist companies, and thus were the ones least in need of intervention in the first place. There are definitely still plenty of organizations out there that discriminate against nonwhite people, but these are unlikely to be the ones that adopt anti-White discrimination in an attempt to compensate. Instead, each company or organization will simply have its own list of favored and disfavored races. This is why Kendi is wrong; racism and antiracism don’t cancel each other out like matter and antimatter.

So “antiracist discrimination” looks to some like a shortcut to a multiracial society, but it isn’t. Instead, it’s likely to have the opposite effect — pushing more White people into a bitter, defensive embrace of White racial identity in reaction to having their careers stymied. That will have a negative impact on the shared national identity that America needs in order to increase social trust and provide public goods. Academics may be able to convince themselves of a definition of the word “racism” in which institutionalized discrimination against White people can never be “racist”, but the general public has not yet been convinced of this definition, and is unlikely to ever be convinced.

History can be reimagined, but it can’t be revised

Which brings me back to Gemini. Google Senior VP Prabhakar Raghavan apologized earlier today, declaring that his team hadn’t intended Gemini to do what it did:

When we built this [image generation] feature in Gemini, we tuned it to ensure it doesn’t fall into some of the traps we’ve seen in the past with image generation technology…[B]ecause our users come from all over the world, we want it to work well for everyone. If you ask for a picture of football players, or someone walking a dog, you may want to receive a range of people. You probably don’t just want to only receive images of people of just one type of ethnicity (or any other characteristic)…

[But] our tuning to ensure that Gemini showed a range of people failed to account for cases that should clearly not show a range…This wasn’t what we intended. We did not want Gemini to refuse to create images of any particular group. And we did not want it to create inaccurate historical — or any other — images.

This is a good statement, but I don’t entirely believe it. First of all, explicitly adding a diversity requirement to every single image generation prompt does not constitute “tuning”. Second, when the issue became widespread, it appears that the Gemini team’s first reaction was simply to make its method explicit instead of hidden, by adding the word “diverse” to the chatbot’s answers.

That doesn’t look like the action of a team that’s worried about depicting “cases that should clearly not show a range”. This looks like doubling down on a strategy of injecting diversity into any and all depictions of human beings, including historical ones.

But third and most conclusively, the app itself explained why it was willing to depict a limited range of races in some contexts, but not in others.

Gemini explicitly says that the reason it depicts historical British monarchs as nonwhite is in order to “recognize the increasing diversity in present-day Britain”. It’s exactly the Hamilton strategy — try to make people more comfortable with the diversity of the present by backfilling it into our images of the past.

But where Hamilton was a smashing success, Gemini’s clumsy attempts were a P.R. disaster. Why? Because retroactive representation is an inherently tricky and delicate thing, and AI chatbots don’t have the subtlety to get it right.

Hamilton succeeded because the audience understood the subtlety of the message that was being conveyed. Everyone knows that Alexander Hamilton was not a Puerto Rican guy. They appreciate the casting choice because they understand the message it conveys. Lin-Manuel Miranda does not insult his audience’s intelligence.

Gemini is no Lin-Manuel Miranda (and neither are its creators). The app’s insistence on shoehorning diversity into depictions of the British monarchy is arrogant and didactic. Where Hamilton challenges the viewer to imagine America’s founders as Latino, Black, and Asian, Gemini commands the user to forget that British monarchs were White. One invites you to suspend disbelief, while the other orders you to accept a lie.

I believe that we need to modify the basic story we tell about America, in order to help Americans of all races embrace the country’s new diversity and forge a more unified national identity. That is a tricky and subtle task, and I expect it to take a long time. It’s tempting to believe we can take a shortcut, by simply commanding AI algorithms to remove White people from history. But like most shortcuts to an integrated multiracial society, this one is doomed to failure.




Microsoft Is Spying on Users of Its AI Tools


Microsoft announced that it caught Chinese, Russian, and Iranian hackers using its AI tools—presumably coding tools—to improve their hacking abilities.

From their report:

In collaboration with OpenAI, we are sharing threat intelligence showing detected state affiliated adversaries—tracked as Forest Blizzard, Emerald Sleet, Crimson Sandstorm, Charcoal Typhoon, and Salmon Typhoon—using LLMs to augment cyberoperations.

The only way Microsoft or OpenAI would know this would be to spy on chatbot sessions. I’m sure the terms of service—if I bothered to read them—give them that permission. And of course it’s no surprise that Microsoft and OpenAI (and, presumably, everyone else) are spying on our usage of AI, but this confirms it...


Anna (23) goes looking for herself in Thailand and gets a bite at the fifth temple


After finishing her bachelor’s in Communication, and after much deliberation, 23-year-old Anna from Zutphen decided it was time to go looking for herself. After more than two weeks of backpacking in Thailand, during which she had to sleep alongside other Westerners in filthy hostels and attended several Full Moon parties, she got a bite. At the fifth temple, she finally found herself.

“In the Netherlands I no longer knew who I was. At university, you really become just a number at some point. That’s when you lose yourself,” Anna says. “So I decided to buy a plane ticket for over 800 euros and truly live differently in another country. No more sandwiches. Suddenly I was eating rice, kissing three Spanish boys, and sleeping in one room with other people. On my way back to myself.”

Since coming home, Anna is a nose ring and a tattoo richer: “I had the coordinates of Koh Samui immortalized on my upper arm.”

Extended due to lack of success! Become a Vage Kennis now to make reliable news even more reliabler in these times. Not that fussed, but mainly after a free book? Then this month you’ll get the year-in-review ‘2023: A Year After All’ thrown in when you become a Vage Kennis. Go to speld.nl/abonnementen.



We’re entering a golden age of engineering biology


As you all know, I am a techno-optimist. The current decade truly seems like a time of marvels, with major simultaneous advances in energy technology, AI, biotech, space, and a number of other fields. But whereas I’m pretty well-equipped to understand some of those technologies, the biotech industry often feels like a completely separate world. Thus, I rely more on experts in the field to tell me what’s going on there.

Two such experts are Joshua March and Kasia Gora, founders of SCiFi Foods. Although their company does cultivated meat — which Josh wrote a Noahpinion guest post about in 2022 — they’re pretty knowledgeable about the state of the biotech industry in general. So when I was looking around for someone to write a guest post explaining why the 2020s are an exciting decade for biotech, they were a natural choice. I thought this post did a great job of summing up the importance of various disparate advances with one core concept: the transformation of biology from a scientific discipline to a field of engineering.

Financial disclosure: I have no financial interest of any kind in SCiFi Foods, and no plans to initiate one. But this post does discuss the promise of lab automation, so I should mention that I do have an investment in a company called Spaero Bio, which does lab automation.

“Where do I think the next amazing revolution is going to come? … There’s no question that digital biology is going to be it. For the very first time in our history, in human history, biology has the opportunity to be engineering, not science.” 

—Jensen Huang, CEO, NVIDIA

The field of biology has driven remarkable advancements in medicine, agriculture, and industry over the last half-century, despite facing a significant hurdle: The immense complexity of biological systems makes them incredibly difficult to predict. This lack of predictability means that any innovation in biology requires many expensive trial-and-error experiments, inflating costs and slowing down progress in a wide range of applications, from drug discovery to biomanufacturing. But we are now at a critical inflection point in our ability to predict and engineer complex biological systems—transforming biology from a wet and messy science into an engineering discipline. This is being driven by the convergence of three major innovations: advancements in deep learning, significant cost reductions for collecting biological data through lab automation, and the precision editing of DNA with CRISPR. 

Although we can trace the history of biology from ancient to modern times, until very recently, people had remarkably little understanding of the mechanistic basis of life. It wasn’t until the mid-19th century that scientists began to understand the nature of inheritance, and it wasn’t until 1944 that we discovered DNA was the heritable material. It took another decade for James Watson, Francis Crick, and Rosalind Franklin to work out the structure of DNA and how it encodes information. Once biologists understood that DNA codes for mRNA, which codes for proteins, it suddenly became possible to start manipulating DNA for our purposes!

The biologist Herb Boyer was at the cutting edge of this field, and in 1976 he and venture capitalist Robert Swanson founded Genentech, the world’s first biotechnology company. Genentech set out to produce human growth hormone (HGH) “recombinantly” in bacteria, replacing the expensive procedure of extracting HGH from human cadavers, which had been responsible for at least one disease outbreak. The rise of Genentech met a critical medical need for safe HGH and spawned the biopharmaceutical industry, changing the trajectory of human health and medicine forever. 

Today, biotech is a trillion-dollar industry, built on a foundation of 1970s recombinant DNA technology. It has been responsible for many huge wins, including the rapid development of novel mRNA vaccines to fight the COVID-19 pandemic, gene therapies against cancer, blockbuster weight loss drugs like Ozempic (already prescribed to a whopping 1.7% of the US population), and an ever-expanding pharmacopeia of drugs. And while human health applications garner the most attention, biotechnology also plays an increasingly significant role in agriculture and industrial production.

One of the most striking examples of the speed of progress in biology has been the exponentially decreasing cost of DNA sequencing. The Human Genome Project, considered a wildly ambitious venture at the time, started in 1990 and took 13 years and billions of dollars to sequence the genome of a single human. It’s worth noting this feat was also accomplished using 1970s technology, Sanger sequencing, though greatly improved through automation. By the 2000s, the aptly named next-generation sequencing (NGS) dramatically accelerated the rate of DNA sequencing while dropping the cost significantly. Costs fell from the billions spent on the Human Genome Project to ~$1M per genome in the mid-2000s, and today an entire human genome can be sequenced for ~$600, with the cost soon expected to go below $200! Because of this decreasing cost, genome sequencing is becoming much more common—just last year, the UK Biobank, one of the world's largest biomedical databases, released the complete genome sequences of half a million individuals. We’re just starting to scratch the surface of the insights this data will make possible.
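Taking the rough figures above at face value, the implied rate of improvement is easy to back out. A quick calculation (the anchor years below are illustrative assumptions; the dollar figures come from the text):

```python
import math

# Article's rough figures: ~$1,000,000 per genome in the mid-2000s
# (call it 2006) down to ~$600 today (call it 2023). The specific
# years are assumptions for illustration only.
cost_then, cost_now = 1_000_000, 600
years = 2023 - 2006

fold_reduction = cost_then / cost_now
# If cost decays exponentially, the halving time follows from
# years * ln(2) / ln(total fold reduction).
halving_time = years * math.log(2) / math.log(fold_reduction)

print(round(fold_reduction))   # ~1667-fold cheaper
print(round(halving_time, 2))  # cost halves roughly every ~1.6 years
```

A halving time well under two years means sequencing costs have fallen faster than Moore's-law-style hardware improvements over the same period, which is why the curve is so often cited.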

The vast complexity of biology 

Although we have made rapid progress in our ability to read DNA cheaply and quickly, we are still far from a comprehensive understanding of biology. For example, 20 years after the publication of the first human genome, we still don’t understand the molecular, cellular, or phenotypic function of many of our 20,000 genes, much less the complex interactions between these genes and the environment. What will happen if a particular gene mutates? What’s the impact on the cell, or the overall organism? And how can we apply all this genetic data to diagnose, treat, or prevent complex diseases like cancer, depression, or diabetes? These answers are hard to come by because biology is staggeringly complex, and this complexity is the characteristic feature of life. 

Take, for instance, the complexity of a human being. Each of us is composed of trillions of cells, each operating more or less independently. Each of those cells has its own copy of the genome encoding those 20,000 genes, plus about 300,000 mRNA molecules and about 40 million proteins floating around, interacting and doing various things. It’s little wonder that we don’t have models that are predictive across the different biological modalities (from DNA > RNA > protein > trait). Unlike an engineer designing a bridge, who can apply physical laws and some basic software models to predict whether a design will work with near certainty, biologists have no choice but to do a lot of expensive and time-consuming laboratory experiments. For example, a researcher looking for the next cancer drug has to test various natural compounds against a cancer cell line, which involves being hunched over in a biosafety cabinet for 5 hours a day for months, comparing the growth of the cell lines with and without the presence of a compound. And even then, what they discover is context-dependent, and may not work on another cancer cell line, much less an actual patient!

The impact of these laborious efforts is apparent in drug development. Because of the complexity of biology and our lack of predictive power, developing new drugs is an incredibly long and expensive endeavor—about $2-3B per drug. Historically, it has been impossible to predict the efficacy of a drug molecule in a system as complex as the human body, so traditional drug development entails screening thousands of molecules in a scaled-up and automated version of that cancer cell experiment. Any drug that looks effective in screening gets tested in animals, and ultimately humans, in long and expensive clinical trials with meager success rates (10-15%) and no guarantee of treating disease and improving patient outcomes. Effective drugs often fall out of the pipeline because of unexpected toxicity, which is very difficult to predict until the molecules are tested in large numbers of humans. 

We’re at an inflection point in our ability to engineer biology 

Several critical developments are now coming together to accelerate the progress of biotechnology exponentially, shifting us to a future where reliable predictive models enable us to engineer complex biological systems quickly and easily instead of relying on today’s brute force strategy of expensive wet lab experimentation. Three fundamental shifts are enabling this: First, advances in AI are making truly predictive models for biology possible; second, the rapidly decreasing cost of running biological experiments to generate data for those models, driven by innovations in lab automation and robotics; and third, our ability to quickly engineer animal and plant cells through technologies like CRISPR.

Deep Learning is now enabling truly predictive models of complex biological systems

Every Noahpinion reader will be familiar with ChatGPT as a blockbuster example of how AI is revolutionizing our world. ChatGPT is a type of deep learning model called a large language model (LLM) that can generate human-like text based on any written prompt. It is a foundation model that is pre-trained and versatile right out of the box and can be applied to diverse tasks like answering questions, summarizing documents, and—our favorite application—writing boring business emails. The evolution of LLMs has been driven primarily by two significant advancements: First, the introduction of transformer technology in 2017 enabled LLMs to process and understand the context of words within large blocks of text, a substantial improvement over earlier models that struggled to capture long-range dependencies; Second, training these models on large-scale data sets (about 1% of the internet) became feasible, albeit still very expensive, thanks to advances in GPU technology. The resulting model, ChatGPT, seems nothing short of magic, with some users mistaking it for general intelligence—but rest assured, it doesn’t know how to do math, physics, or biology.
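The transformer's core trick, self-attention, is what captures those long-range dependencies: each position builds its output from a weighted mix of every other position, with weights set by query-key similarity. A minimal single-head sketch in NumPy (the dimensions and random data are toy assumptions, not any real model's configuration):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention. Each position's output is a
    weighted average of all value vectors; the weights come from a
    softmax over how well its query matches every key. This all-pairs
    mixing is what lets transformers relate distant tokens in one step."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (n, n) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # (n, d) context-mixed vectors

rng = np.random.default_rng(0)
n, d = 5, 8                       # 5 tokens, 8-dimensional embeddings (arbitrary)
X = rng.normal(size=(n, d))
out = attention(X, X, X)          # self-attention: Q = K = V = token embeddings
print(out.shape)
```

Real LLMs stack many such heads and layers with learned projection matrices, but the mechanism above is the piece that replaced the recurrence of earlier architectures.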

We have, however, seen success in applying similar approaches to biology. For example, in 2021, Google’s DeepMind released its protein-folding prediction model AlphaFold2, a deep learning model that, like LLMs, is based on the transformer architecture. AlphaFold2 can predict the three-dimensional structure of a protein from its amino acid sequence to within 1.5 angstroms of accuracy—on par with high-quality crystallography. While a crystal structure is only a static snapshot of a protein that, in the cell, shifts dynamically between configurations, it is nonetheless a helpful representation, and AlphaFold2 predicts this static structure dramatically better than any other protein-folding model. This was only possible because of the combination of a transformer-based deep learning model and a large, expensive data set of 29,000 crystal structures that had been painstakingly collected over decades. This is excellent news for anyone needing information about protein structure, but bad news for the generation of grad students who spent their entire Ph.D. working out the crystal structure of a single protein!

Another excellent example of the predictive power of deep learning in biology comes from MIT, where a team recently used a deep learning model to discover novel antibiotics effective against the superbug MRSA. Their model was trained on an experimental data set of 39,000 compounds, which then allowed the team to computationally screen 12 million compounds and predict which ones were likely to have antimicrobial activity against MRSA. The team took advantage of other AI models to predict the toxicity of the compounds on various human cell types, narrowing down the list to 280 compounds, two of which were good candidates for new antibiotics. This is huge—the development of new antibiotics has essentially stalled in recent decades, while antibiotic resistance is one of the most significant threats to human health today. And these are just two of many examples of how AI is now being used to develop truly predictive models in biology.
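The screen-then-test loop behind results like this is conceptually simple: score every compound in a large library with a model built from known actives, and send only the top-ranked hits to the wet lab. Here is a minimal stand-in for that scoring step, using Tanimoto similarity to known actives in place of a trained deep model; the “fingerprints” are made-up bit sets, not real chemical fingerprints:

```python
# A minimal sketch of similarity-based virtual screening: rank a compound
# library by Tanimoto similarity to known active molecules. This stands in
# for a trained deep learning model; the fingerprints are toy bit sets.

def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity between two fingerprints stored as sets of 'on' bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def screen(library: dict[str, set[int]], actives: list[set[int]], top_n: int = 3):
    """Score each library compound by its best similarity to any known active."""
    scored = {name: max(tanimoto(fp, act) for act in actives)
              for name, fp in library.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Toy data: each bit marks the presence of a (hypothetical) substructure.
actives = [{1, 4, 7, 9}, {1, 4, 8}]
library = {
    "cmpd_A": {1, 4, 7},     # shares most bits with the first active
    "cmpd_B": {2, 3, 5},     # unrelated scaffold
    "cmpd_C": {1, 4, 8, 9},  # close to the second active
}
print(screen(library, actives))
```

The real pipeline replaces the similarity score with a learned activity prediction (and adds toxicity filters), but the economics are the same: computation triages millions of compounds so the expensive experiments run on a few hundred.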

Lab automation is enabling us to speed up biological data collection to build predictive models

While advances in deep learning are making predictive models possible, these models are only as good as the data they are trained on. And this is where building AI models for biology gets much more complex than building a foundational model for language. While OpenAI could train ChatGPT on a fraction of the internet, one can easily argue that we already have much more biological sequence data. Still, sequence data alone is insufficient because models need outputs like crystal structures or antibiotic efficacy to train on. And while DNA sequencing is now reasonably cheap, most other output data is still outrageously expensive to collect because of the difficulty of conducting biology experiments: you need a highly trained scientist and costly equipment and consumables to generate only a modicum of data (hence our joke about a PhD student spending their entire graduate career solving a single protein structure). Luckily, we are also at another inflection point in biology: automation now enables us to run biology experiments with much higher throughput and significantly less manual labor. 

We have already moved beyond the early stages of lab automation, where liquid-handling robots outproduced bench scientists by pipetting 96 samples instead of one at a time, toward droplet microfluidics, allowing the scale-down of those reactions to a single drop, increasing throughput to millions of samples from thousands. Today, supported by significant advancements in computer vision, it’s possible to extract massive amounts of data from high-resolution microscopic images of cells, adding another dimension to the available data sets. It’s also possible to use computer vision to train laboratory robots to automate almost anything a scientist can do in the lab, including cell culture. We are also developing new “omics” technologies to measure virtually all the molecular components of the cell, not just DNA. This data collection is further catalyzed by the evolution of laboratory information management systems, versions of which are readily commercially available to capture any data modality a scientist can dream up (and subsequently use to develop the next blockbuster model!). The combination of all of these improvements in lab automation means that we are rapidly increasing the amount (and types) of biological data that can be collected, opening the door for more predictive and accurate models. 
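To put that jump in scale into numbers, here is a back-of-envelope comparison. All of the rates are illustrative assumptions (one manual sample every ten seconds, a plate-scale liquid handler, a droplet generator running at roughly 1 kHz, an 8-hour day), not measured figures:

```python
# Back-of-envelope comparison of sample throughput at each stage of lab
# automation. All rates are illustrative assumptions, not measurements.

SECONDS_PER_DAY = 8 * 3600  # one 8-hour working day

rates_per_second = {
    "manual pipetting":       1 / 10,   # ~1 sample every 10 seconds
    "96-well liquid handler": 96 / 60,  # a plate's worth of transfers per minute
    "droplet microfluidics":  1000,     # ~kHz droplet generation
}

for method, rate in rates_per_second.items():
    print(f"{method:>22}: ~{rate * SECONDS_PER_DAY:,.0f} samples/day")
```

Even with these rough numbers, the progression runs from thousands of samples per day to tens of millions, which is the difference between hand-curated data sets and ones large enough to train deep models on.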

We now have the ability to easily edit the DNA of plants and animals, not just simple organisms

In the 1970s, Genentech introduced a single gene into a bacterium to produce HGH recombinantly, and since then, we’ve mastered several relatively simple microbes for use in industrial processes. Today, many essential agricultural products, including amino acids like methionine, lysine, and tryptophan, are made with highly edited microbes in large-scale industrial processes. 

The reason we’ve made so much progress in biomanufacturing in microbes is that they are easier to genetically engineer—a scientist can zap a fragment of DNA into yeast, for example, and it will incorporate that DNA into its genome. This doesn’t work, however, with animal or plant cells. While 1990s technologies such as zinc-finger nucleases and TALENs made it technically possible to edit the DNA of plants and animals, they were complicated and expensive to use, and none of these technologies came close to the potential of CRISPR Cas9. Cas9 is a protein that enables the precise cutting of DNA at a specific location, revolutionizing our ability to gene edit just about anything. It makes advanced genome editing of animal (including human) cells as easy as cloning microbes—and way more straightforward than the early Genentech HGH experiments. 
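Part of what makes Cas9 so usable is that its targeting rule is simple enough to sketch in code: the guide RNA base-pairs with a 20-nucleotide “protospacer” that must sit immediately upstream of an NGG PAM motif in the target DNA, and Cas9 cuts about 3 bp upstream of the PAM. A minimal forward-strand scanner for candidate sites follows; real guide-design tools also check the reverse strand, off-target matches, and GC content, and the sequence here is invented for illustration:

```python
# Scan the forward strand of a DNA sequence for candidate Cas9 target
# sites: a 20-nt protospacer immediately followed by an NGG PAM.
import re

def find_cas9_targets(dna: str) -> list[dict]:
    dna = dna.upper()
    targets = []
    # Lookahead finds every position where any base is followed by "GG",
    # including overlapping PAMs.
    for m in re.finditer(r"(?=([ACGT]GG))", dna):
        pam_start = m.start()
        if pam_start >= 20:  # need a full 20-nt protospacer upstream
            targets.append({
                "protospacer": dna[pam_start - 20:pam_start],
                "pam": dna[pam_start:pam_start + 3],
                "cut_site": pam_start - 3,  # Cas9 cuts ~3 bp upstream of the PAM
            })
    return targets

# Invented example sequence with one PAM far enough from the 5' end.
seq = "ATGCTAGCTAGGATCCGATCGATCGTACGTAGCAGGTT"
for t in find_cas9_targets(seq):
    print(t["protospacer"], t["pam"], t["cut_site"])
```

The 20-nt protospacer is what you would synthesize into the guide RNA; because PAM sites occur roughly every few bases in random DNA, almost any gene has many candidate cut sites to choose from.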

Although the methods to use CRISPR as a gene editing tool were first described in 2012, it takes time for discoveries in fundamental science to translate into industrial applications, and now we are at that point! Last December, the FDA approved the first CRISPR-based human gene editing therapy to treat sickle cell disease, and there are almost 100 CRISPR clinical trials in the pipeline. While Cas9 was revolutionary, many new CRISPR systems have since been discovered, as well as modifications of the original technology, all of which increase the precision and expand the range of applications. The technology works on all plant and animal cells, enabling the engineering of everything from crops and livestock to cultivated meat and designer dogs. Our emerging ability to build predictive models of biological systems, and then to change that biology at the genetic level with great precision, is a huge inflection point for humanity. 

Biology as an engineering discipline

A foundation model in biology would leverage the unique capabilities of deep learning (and future AI technologies) to efficiently process and model the complex nature of molecular systems. It would integrate massive amounts of data—spanning DNA sequences, RNA levels, protein expression, and environmental factors—to predict all the characteristics of complex systems such as humans or animals. And it would allow us to predict with precision what kind of drugs could best treat a disease or what genetic modifications could reasonably cure it.  

Ultimately, we don't know how much data—or what kind of data—is needed to build a foundation model in biology. Are current datasets like UK Biobank, with 500,000 genomes linked to health information and biomarkers, enough? Or do we need to include many more modalities to make it worthwhile? For years, the assumption in AI was that we would need much more sophisticated AI models for them to become useful. It turned out that a relatively simple shift to the transformer architecture, combined with a lot of data from the internet, was enough to create ChatGPT. The same could be true for biology, which means the right combination of model and data set could be imminent. But regardless of whether it’s tomorrow or years out, it is clear that the combination of AI technology and advancements in automation with easy genetic editing means that we are already at the point where we can engineer biology to an extent never before possible. 

For hundreds of years, physicists and chemists have been able to translate a mechanistic understanding of science into real-world use cases, while the immense complexity of biology has made it intensely challenging to translate fundamental scientific discoveries into real-world applications. But now, we stand on the threshold of not just understanding but truly mastering the intricacies of biology. This mastery promises to revolutionize how we approach human health, sustainability, agriculture, and industry, transforming the once elusive realms of biological science into powerful tools that will redefine our capabilities and expand the horizons of human potential.
