AI the Rabbi?

Musings on the 'hot topic' of the day

Jan 10, 2025

[NOTE: I wrote a version of this essay many months ago, in May of 2023, but as of December 2023 my predictions seem less likely to be the case. I am still going to post it here, because (1) many people still seem to be talking about this (2) I think the topic is interesting even without the AI-connection (3) it keeps coming up in the first 20 pages of Maseches Sanhedrin, which is the current Daf Yomi study (4) I started this Substack recently and feel the pressure to actually put things on here, even if I have nothing interesting to say. I think the whole point of a Substack is that I can publish things as semi-articles, which are not so polished and I’m not so confident in. But I’d love to discuss these ideas in the comments.]

[Image: A futuristic robot rabbi with a sleek metallic body, wearing a large traditional black rabbinic hat and no tallit. The robot's face is a blank white screen emitting a soft glow. It is holding an open Gemara (Talmud) book, with Hebrew text visible, in a gesture of teaching. The setting is a modern, minimalistic study room with a few bookshelves and warm lighting.]

Over the past few years, many people much greater and smarter than I have written and spoken about the use of AI for Torah learning and teaching.1 Here, I’d like to use this very modern question as a lens to reflect upon a very old question, which is the issue of how to balance creativity and fidelity to authoritative sources within a Torah context.2

Training Yeshiva Students and Artificial Intelligence Models

When I was in yeshiva, my teacher (who studied in Chaim Berlin for ten years) once told us that when he was a student learning under Rav Yitzhak Hutner, he was present when someone responded to Rav Hutner’s question on a Gemara or Rishon by citing an answer provided by Rav Elchanan Wasserman.3 Rav Hutner reacted by glaring at the boy and telling him sternly (in Yiddish), "The mind is like a closet. It has finite space. If you fill it up with other people's thoughts, you won't have any room for your own."

As a matter of psychological fact, Rav Hutner was probably incorrect; the mind of a neurologically healthy person stretches to accommodate more information (and thoughts) as necessary. But the basic idea is an intuitive one: the more a person is beholden to a pre-existing knowledge base, the less creative they will be. The corollary is that the more a person wants to develop as an ‘independent thinker’, the more they should remain ignorant of what other people have said. In my essay on Seforim (published here on my website)4 I quote a few rabbinic sources to this effect, such as the advice of R. Yitzhak Canpanton (d. 1463, Spain), who writes that when learning Talmud, the reader must first attempt to understand a given passage on his own before checking the commentaries of Rashi and the like.

In his published discourses, Rav Hutner writes eloquently about the need for creativity in generating genuinely new Torah.5 Rav Hutner and others before him (e.g. the author of the Beis Halevi) have even taken this a step further, arguing that the loss of authentic Torah traditions may itself be a boon for Torah study because it requires engaging with the sources afresh, partnering with God to generate new avenues of learning. R. Naftali Zvi Yehudah Berlin, another great advocate (and exemplar) of Torah creativity, has a powerful essay meditating upon this theme from a historical lens, demonstrating how it was precisely the ‘darkness’ of successive exiles and the increasing distance from the era of prophetic insight that allowed for a new type of Torah reasoning to flourish.6 Jorge Luis Borges dramatizes this same idea in the short story 'Funes the Memorious,' about a man who gains the ability to remember everything with perfect clarity. Rather than becoming a genius, Funes becomes overwhelmed by details, unable to abstract or truly think. 'To think is to forget differences, generalize, make abstractions.' Funes, with his perfect memory, loses the ability to generate original thought because he is so overburdened by knowledge; forgetting has its value.

In reality, it is hard to tell if increasing one’s knowledge does or does not come at the cost of creativity. It may not be fair to assess this question just by looking at the biggest giants of Jewish tradition, who are, by definition, unrepresentative, but some of our most ingenious rabbis were also among the most knowledgeable. Probably the most innovative Talmudic mind of the Andalusian tradition was R. Yehosef ibn Megas, and yet he wrote a responsum (which I discuss here) saying that a community should prioritize appointing a rabbi who has no capacity for independent study and instead slavishly follows the rulings of the earlier authorities. Ramban, the most brilliant commentator of Christian Spain, drew upon more of the earlier Talmudic/halakhic literature than nearly any of his predecessors and devoted a large proportion of his own works to explaining the positions of earlier authorities. Yet his writings prove that this vast library of information just served as more grist for his astonishingly fertile and creative intellectual mill. To bring back R. Yitzhak Canpanton, it is true that he advises coming up with your own solution to a Talmudic question first before looking at the commentaries, but then you are supposed to actually read the commentaries.

Interestingly, R. Yitzhak Canpanton’s model of how to train oneself to think like the classic commentators - come up with your own thoughts, check them against traditional works, assess how closely your ideas align with theirs, and adjust accordingly - is remarkably similar to how Artificial Intelligences (speaking here in particular about LLMs)7 may be “aligned” to human preferences through Reinforcement Learning from Human Feedback (RLHF), which, as one author recently explained, is:

where you make the computer say something and then you either pat it on the head and go “good computer!!” or you hit it with a stick and go “bad computer!!”

One may have thought, especially since the LLMs have access to more training data than any single human in history, that they would eventually perfect this ‘alignment’ and be able to reason exactly like the earlier commentators, leaving no room for creativity. In reality, we find the opposite to be the case.

The Story of the Lying LLMs

A recent Substack article showed that ChatGPT is excellent at generating well-reasoned “lomdus,” but with little to no relationship to the actual sources. Many former or current yeshiva students in particular may be astonished by the LLM’s reasoning powers, at its “lomdus,” which in so many contexts is the benchmark for the highest levels of Torah learning. After all, it is a very basic skill to just read a source and regurgitate it; “lomdus” is what takes real brains. Yes, the LLMs invent Talmudic sources, misquote nearly everything that they cite, and make up all kinds of false facts, but we may expect this to improve soon. Yeshiva students, tremble at this mighty power that can out-think any of your classmates!

I think that this is totally wrong, and to explain why, we should investigate why LLMs confabulate (make stuff up) in the first place.8 [But reader beware: I myself barely know what I’m talking about and have only vague and shallow understandings of any of this.] Broadly speaking, I think that there are two reasons for this, and both are important. One has to do with what, exactly, these AIs were trained to do, and the deeper reason is related to how these LLMs are constructed in the first place.

When a student learning Gemara checks their interpretation against Rashi or Tosafos, they're comparing their ideas against a fixed text. But when an LLM is "aligned" through RLHF, it's being trained to generate outputs that humans will judge as good responses - essentially, to be more convincing rather than more accurate. If the LLM ‘thinks’ you want a particular outcome, it will construct just such an outcome, and prioritize that goal over fidelity to its own sources. Even if you explicitly give the LLM the text of the Mishnah Berurah and ask it a halakhic question, saying, “only respond by citing the relevant passage from this text of the Mishnah Berurah that I have provided you,” unless your “question” is just the first half of a sentence that actually exists in the Mishnah Berurah, the LLM is more likely to fabricate a quotation, because that fabrication will be phrased so that it directly answers your question. That is what you wanted, isn’t it <evil smile>?

Surely, you may think, this is a bug, but a fixable one; newer and better models will be trained to not only answer what the user deems helpful but also to adhere more strictly to the texts that are present in the training dataset, to be more “true,” more faithful to the real world. This is possible, but it is not necessarily a given, for precisely the reason why Rav Hutner reprimanded his student who quoted Rav Elchanan Wasserman: when it comes to LLMs, there truly is a tradeoff between generative capability and what we might call ‘honesty.’

Let me share how I heard the “deep learning” origin myth, which goes something like this:9

In the primordial darkness, computer scientists sought to create artificial intelligence using sets of symbols and rules. "If you see X, then conclude Y," they would chant in endless iterations, carving countless lines of logic of increasing complexity into the minds of their machines. But no matter how many rules they wrote, their work bore little fruit; the reality of the world and human knowledge of it is simply too complex, too messy to be captured by even the most intricately detailed set of rules. A small few of these workers were attempting a different approach; they thought to create something in their own, more human image. Human children, they reasoned, do not learn the grammar of their first language by memorizing a set of arcane strictures of syntax; they observe, they listen, they create abstract patterns and then somehow learn to speak via a process that is still mostly mysterious. Despite the concept’s fuzziness and essential obscurity, when fed with vast amounts of data, the artificial neural networks built on this idea began to recognize relationships and make connections that their creators had never explicitly programmed. The greatest leap came with the magic of the Transformer,10 which could attend to patterns across vast distances of text, weaving together meanings in ways that seemed entirely miraculous. “The magic of deep learning is that it just works.”11 These brilliant creatures were released upon the world, but it quickly became clear that like all powerful magics, deep learning came with a price: the more these systems could spin new patterns from their training, the more likely they would be to espouse elaborate fictions.

The very mechanism that makes an LLM creative - its ability to generate novel combinations of ideas based on abstracting patterns from its training set - is inseparable from its tendency to confabulate. Because these models are not following explicitly programmed rules and are instead relying on their ability to make novel connections - essentially, by being creative with their training data - they cannot be “constrained” as such without compromising on their intelligence. Not only is this not a “bug” that will be fixed with more training, but there is good reason to believe that this phenomenon will actually get worse as models become more powerful. Increasingly, new models and updates are becoming more “intelligent” not by using more training data, which is finite (after all, there are only so many books and texts out there), but by better “reasoning,” that is, by improvements in their extrapolation abilities. Modern architectural improvements focus on making models better at inference and reasoning with less data, making them more likely to confabulate, not less.12

To be clear, I don’t think that this is necessarily going to be the case; I think that this is a likely but not definite scenario. After all, nearly all of the LLMs currently available to the public can be tuned to different ‘temperatures’ so as to make them more creative vs. more reliable. Models that are able to search the internet will be more reliable (in this sense) because they will use the extra step of internet search to build off more specifically relevant texts instead of generating their own. Returning to the question of Reinforcement Learning, new models could conceivably be put through more rigorous training and be rewarded for adhering more closely to the texts of the training data,13 and in certain cases, other workarounds have been shown to be effective.14 In fact, as I was working on this essay, reports came out of OpenAI’s o3 model, which blew past many benchmarks for both creativity and reliability. There is also the possibility that new paradigms for AI models will become available, which work based on entirely different principles.15 So far, however, I feel justified in expecting the LLMs to be more useful when provided with more constraints. Currently, an out-of-the-box LLM is going to be much better at generating perfectly plausible-sounding Talmudic analyses while completely fabricating sources and quotations.
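
For readers curious what “temperature” means mechanically, here is a minimal sketch in Python, with the usual caveat that I barely know what I’m talking about. It assumes the common textbook formulation, in which the model’s raw scores for each candidate next word are divided by the temperature before being turned into probabilities. The function name, the three candidates, and their scores are all made up for illustration; this is not how any particular product implements it.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a temperature (illustrative only)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the largest value for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical candidates and scores, invented for this example.
candidates = ["Rashi", "Tosafos", "an invented source"]
logits = [2.0, 1.0, 0.1]

for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    summary = ", ".join(f"{name}: {p:.2f}" for name, p in zip(candidates, probs))
    print(f"T={temperature}: {summary}")
```

At a low temperature nearly all of the probability lands on the highest-scoring candidate; at a high temperature the distribution flattens and the unlikely options get picked more often. Note that this only controls how adventurous the sampling is. It does nothing to make the highest-scoring candidate true.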

The issue of outright making up citations would not have been addressed directly by the medieval or modern commentators; it is axiomatic that forging rabbinic sources would be so completely beyond the pale of acceptability that this could be taken for granted.16 Instead, there are a dozen jokes and cautionary stories told among rabbinic students, meant to warn against veering too far away from the source material while learning Torah. Did you hear the one about the two study partners who opened the first page of Bava Kamma, which begins, “ארבעה אבות, there are four fathers…” and proceeded to spend their three-hour learning period debating who the fourth “forefather” would be, after Avraham, Yitzhak, and Yaakov? (The joke is that the “four fathers” of the Mishnah refers not to the biblical progenitors of the Jewish people, but to the four conceptual categories of tort law, which the students would have realized if they had bothered to read the next word.) Or how about the chassidishe rebbe who began his discourse, “Why does ‘Lech Lecha’ in the first verse of the parasha begin with two large letter ‘het’s?” A chassid responded, “But rebbe, the Torah actually does not spell those words with a het at all!” The rebbe replied, “Ah, that’s one good answer, but I have another one…”

Sinai, Storehouses, and Sophistry

If it cannot even quote a source correctly, what, then, is AI good for? These models may be helpful in interpreting a very specific text, editing a written devar Torah for greater clarity, or finding a connection between two specific ideas for the purpose of giving a shul ‘derashah’. But can it be used to support halachic reasoning and logical argumentation based on Torah sources (whether using “Brisker lomdus” or other methods), when it is not asked to “find” any new sources, which it is liable to invent? Can it apply old sources to new scenarios and new questions? And if it can, is such a tool even useful?

Already in the Talmud we find the question of the relative importance of studying the texts vs. being able to creatively interpret them, which it colorfully refers to as a tension between “Sinai” (the knowledge base), and “oker harim,” the ability to “uproot mountains,” i.e. analytical skill (Berachos 64a and Horayos 14a). In those instances, the Gemara concludes that the rabbi who is a “mountain” or a “storehouse” (the one who knows more) takes priority over the uprooter of mountains (cf. Shabbos 63b, Zevachim 96b, Gittin 74b).

Every few hundred years, some commentaries note that the nature of Torah study has changed sufficiently such that this is no longer the case, while other rabbis maintain that the Talmudic preference for gaining knowledge over skills is still in effect. Just twenty years ago, for example, there were plenty of articles published on whether or not we should reconsider these educational priorities now that computer databases allow searching through the entire bookshelf of halachic literature in seconds.17 One of the greatest of these “storehouses” from the past century, Rabbi Ovadiah Yosef, defends his focus on amassing knowledge by quoting from the 14th century authority R. Yitzhak b. Sheishes, or the “Rivash”:

And we have seen with our own eyes many scholars who were sharp in dialectics and brilliant in discussions, who could 'push an elephant through the eye of a needle' [i.e., make extremely subtle distinctions], and on every minor point would pile mountains of questions and answers. Yet for all their sharpness, they didn't arrive at the correct legal conclusion, declaring what is forbidden to be permitted and what is permitted to be forbidden. Already in the first chapter of Eruvin (13b), they said: 'It was revealed and known before the One who spoke and the world came into being that there was none in Rabbi Meir's generation like him, yet why wasn't the law established according to his view? Because his colleagues couldn't fully grasp his reasoning, as he would declare the impure pure and show reasons for it, and declare the pure impure and show reasons for it.' Similarly, the opinion of Beit Shammai when it contradicts Beit Hillel is not considered authoritative, even though Beit Shammai was sharper in analysis, as we say in the first chapter of Yevamot (15b). And it has already been decided at the end of Horayot that “Sinai” is preferable to an “uprooter of mountains.”

An intelligence that is capable of generating text that reasons its way towards a particular conclusion, without being able to ‘know’ the accuracy of its source material, is dangerous for precisely this reason; it will permit the prohibited and purify the impure. As discussed above, current models of AI are trained to be helpful, to provide the user with answers to their questions. If the LLM provides a certain answer to a question using plausible-enough reasoning, usually all a user needs to do is ask “Are you sure?”, and the LLM will generate text demonstrating why the exact opposite answer is also plausible. This is remarkably similar to how the Gemara in Eruvin (cited by Rivash, above) speaks about Rebbi Meir, and why the other rabbis felt that they could not rely on his rulings.

Aside from the Talmudic model of “Sinai” vs. “uprooter of mountains,” another potentially fruitful rabbinic source on this question would be an examination of the qualifications needed to render halachic judgements, where the Gemara indicates that a student must be both gamir and savir, both “learned” and “analytical” (Sanhedrin 5b, Horayos 2b), which seems like another way of saying that to some extent, the rabbi must be both a mountain and know how to uproot it. Even more germane to the present discussion, however, may be what the Gemara says a few pages later in the name of R. Yehudah quoting from Rav: “Someone may only be seated on the Sanhedrin if he knows how to render [the carcass of] a rodent [“sheretz,” creeping animal, the quintessential impure object] as pure by Torah law.” In other words, it seems like the Gemara is not just condoning, but requiring the precise skills that were so vociferously rejected by the Rivash!

Many of the commentaries on that Gemara, interestingly, seem to share the instinct that the rabbinic sages could not possibly be praising the skill of sophistry.18 To quote Tosafos, “why would we be interested in vacuous reasoning skills, to purify a sheretz which the Torah has made impure”? Such argumentation is not actually an asset in arriving at the truths of Torah. Even those who do take this Gemara at its face value nevertheless temper its seeming embrace of a lawyer’s ability to argue for whatever position he is paid to support.

Again, this is especially relevant when it comes to LLMs that are trained to generate plausible-sounding text, not to get at “the truth,” and certainly not the halachic truth. To echo Karl Popper regarding the philosophy of science, if an experiment will result in the same outcome whether a particular hypothesis is true or false, it is not actually a scientific experiment. If you ask ChatGPT to explain to you why using an electrical appliance is prohibited on Shabbos, it will do so. If you ask it to explain to you why using an electrical appliance is permitted on Shabbos, it will also do so. Like the scholar who could "push an elephant through the eye of a needle" but couldn't arrive at correct legal conclusions, LLMs can generate impressively sophisticated arguments while remaining fundamentally unmoored from truth. Ultimately (as ChatGPT is likely to “tell” you anyway), the human user needs to judge which side is more decisive.

In fact, all of this assumes that the LLM’s “reasoning” is even logically sound. The truth is, the text generated by an LLM more often has the illusion of being well thought out, because it is coherent and conforms to the conventions of logical speech, but often just a few extra seconds of thinking through the response will reveal that it is totally meaningless. Imagine having a study partner (chavrusa) who knows how to sound highly articulate, but is also a bit daft sometimes and has a tendency to make things up. For now, it can still be said about AI models that we maxed out intelligence but minimized wisdom. Returning to the earlier statement of the Gemara, this is precisely why any judge must be both gamir and savir: even with all the analysis in the world, the only way to actually determine the halakha is to be both knowledgeable and capable of reasoning. As one of the great geniuses of the Yeshiva world put it, commenting on the Gemara in Kiddushin (10b):

“Yehudah ben Beseira said to Yohanan ben Bag Bag, [I am shocked to hear] that you are an expert in all areas of Torah but you do not know how to teach halakhot based on a kal va-homer [logical inference]?” Really, what should a kal va-homer have to do with expertise in all areas of Torah? But my teacher and master Rabbi Chaim Halevi [Soloveitchik] zt”l [explained] that [although it may be true that] in order to derive a kal va-homer there is no need for expertise [“bekius”], to know that the kal va-homer is actually true and correct, and that there is no question as to its veracity, one must be an expert in all areas of Torah, because it is possible that a person makes a kal va-homer in the laws of Shabbat and there is a question on it from the laws of impurity or the like.19

Even if better models and better training help alleviate this problem in the future, the fundamental flaw will remain: LLMs are pattern-matching engines that can mimic the form of Torah analysis without possessing the essential combination of deep textual knowledge and analytical skill that the Talmud requires. They have the potential to be enormously helpful, but the user must keep in mind that the AI should be thought of more as a (sometimes unreliable, but articulate) chavrusa than a rabbi.

1. See “A Chavrusa in Latent Space” by Yitz; this collection of articles in “YUTorah To Go,” especially the one by Rabbi Netanel Wiederblank; “AI and Jewish Law” by Michael J Broyde; the articles by Ezra Brand at his Substack; and this article by Moshe Koppel and Avi Shmidman. Josh Waxman, who has an article in the linked “YUTorah To Go,” also has been developing these ideas in his Substack, linked here.

2. We should at least devote a footnote to acknowledging the fifty-ton elephant-gorilla-monster in the room, which is the fact that increasingly powerful AIs have a nonzero chance of completely destabilizing Western economies, polluting our already fragile media and political ecosystems to the point where they implode, and by *mumble mumble mumble* end up killing us all. But as of today, even if it sometimes makes scary noises (such as those documented here), the elephant-gorilla-monster is just peering at us benignly from its cage, so we’ll just nod to it and move on.

3. At least, I’m about 90% confident that the ‘quoted rabbi’ in question was R. Elchanan Wasserman; presumably Rav Hutner would have responded similarly no matter which 20th century rabbi it would have been.

4. Besides being self-recommending, as Tyler Cowen would say, that essay is suggested as background reading for the context of this current article, because many sources quoted there are relevant here as well but I prefer not to reproduce them all.

5. Pahad Yitzhak: Chanukah 4; cf. Chanukah 9, Shabbat 1.

6. Most extensively in Kidmas ha-Emek, published as the introduction(s) to his commentary on R. Ahai Gaon’s Sheiltos.

7. Specifically, this essay is about the widely available AI chatbot models, which include ChatGPT, Claude, Gemini, LLaMA, Mistral and others.

8. In both the popular discourse and even in much of the academic literature, the tendency of LLMs to confidently state false information is referred to as “hallucinating.” I strongly object to this term, because it implies a particular mental state that is inapplicable to AIs (or at least, proving that AIs do have such mental states has thus far eluded researchers). Instead, I prefer to use a term from Michael Gazzaniga’s neuropsychology research, “confabulate.” Half a minute of internet search revealed that at least some neuroscientists (see link) share my frustrations with this terminology.

9. Here’s one article on this story; I haven’t bothered trying to find a better one right now but the old “connectionist” vs. “symbolic” discussion is detailed in this academic paper from 2008, which in AI terms might as well be the iron age.

10. See link for the actual paper; a more easily digestible explanation can be found here.

11. This doesn’t really need a source, but the exact quote is from “SITUATIONAL AWARENESS: The Decade Ahead”. For a detailed “insider view” on just how little AI researchers actually understand their own creations, see this podcast with Dan Hendrycks.

12. This has been my intuition for a while, but a Sept 2024 white paper shows more rigorously how even with increasing the available training data, other mathematical limitations prevent generative LLMs from avoiding confabulations.

13. See ELK And The Problem Of Truthful AI.

14. The most impressive example that I have seen using LLMs is PaperQA, but the question of reliability has actually been fully solved in some very specific domains (for example, the AI known as AlphaFold informs the user how confident it is that its predictions are correct; these are extremely accurate as far as anyone has checked).

15. I still have some hope in alternative models that are likely to be better anchored in the real world (and safer) such as state-space models like Mamba, or those using neuro-symbolic reasoning such as CLEVRER, but right now the people, the money, and the hype all reside elsewhere.

16. Actually, they do discuss it in the context of rare instances where the Talmud implies (Pesachim 112a) that one is allowed to falsely quote a halakhic teaching in order to ensure that it be accepted by hoi polloi. Such statements deserve fuller treatment elsewhere (cf. Yabi’a Omer 2, Hoshen Mishpat 3), but here it suffices to note that, as far as I am aware, not a single medieval commentator ever admitted to (or accused a colleague of) fabricating a rabbinic source.

17. For example, see this 1996 article, מאגרי מידע ממוחשבים ושימושם בהוראה / משה זהבי (daat.ac.il)

18. As usual, the question of how to read this line in the Gemara and make sense of its particular context as well as the Gemara in Eruvin is not straightforward, especially because different editions of the Gemara have different names for the various people cited. See James Randi on the Sanhedrin? - by Joshua Waxman

19. Kovetz Shi’urim of R. Elhanan Wasserman, Kiddushin #84. Notably, in context, the very kal va-homer under discussion can, in fact, be rejected; see Tosafos and Ritva there (Kiddushin 10b).

Shmuel's Substack
