Stranger Apologies

Fair enough! I definitely think the "this is just replicating human biases" hypothesis is a live one, and maybe even a plausible one. But I do think the alternative is worth taking seriously, especially since we know it doesn't JUST give us back human-style responses—for example, it's grammar and clarity of writing is MUCH better than the average internet user. (As a professor, the different between 20-year-old college students' writing and ChatGPT is striking.)

Thanks for the pushback! Well taken.

Expand full comment

Mitchell Porter

Aug 23, 2023

Bard had some thoughts: https://pastebin.com/2PtVAaGt

Expand full comment

Brian Weatherson

Nice post!

I've been thinking about running some of the experiments that purport to show irrationality in auction participants in my classes, e.g., the first two experiments here: https://veconlab.econ.virginia.edu/auctions.php. I think it's reasonably well known how humans do behave in these cases, and how it differs from textbook rational behavior. I wonder if there's a way to test GPT against it, or whether getting it to participate in a group setting, like an auction, would be too tricky.

Expand full comment

Aug 22, 2023

Thanks! Interesting, yeah would be curious to see what it does. You could definitely run a version where you just tell it that there are other people also getting a noisy signal of the value (eg) and then give it its signal and ask it to bid. Always a question about how robust the behavior will be to variations; but I found pretty good consistency in my test runs on these problems so who knows? Worth a try! If you run some version of it, curious to hear what comes out!

Expand full comment

Robert J R

Aug 19, 2023

A very interesting post -- it's fascinating to see ChatGPT evaluated in this type of way. It's a thorough yet concise post, and very well-written.

I am a little confused at the claim "a trillion-parameter large language model that we know reasons in a fundamentally probabilistic way".

I know you address this briefly at the end, but your logic doesn't seem fully compelling there. Most internet users never provide any text relevant to a given coding problem, or LSAT question, or what-have-you-here. Sure there's a lot of babble in ChatGPT's training data -- there's also a lot of not-babble! The argument that ChatGPT > average internet user -> ChatGPT is doing beyond-human reasoning -> ChatGPT's reflection of these 'errors' should have us question these 'errors' thus doesn't seem to hold water for me. Even without strongly claiming otherwise, actually understanding how ChatGPT reasons is a big and very difficult project, and I don't think you can simply point to it being efficacious to draw conclusions from its style of reasoning.

I may be missing something in your argument. Would love to hear more on this. Thanks for the excellent posts!

Expand full comment

Thank you! Fair points. I definitely think this is speculative, and very far from an airtight argument.

I was thinking of this mostly as a negative argument *against* the heuristics-and-biases story, than a positive argument for resource-rationality. The thought is that the irrationalist narrative is an inference-to-the-best-explanation: "people make errors XYZ; that would make sense if they didn't maintain probability distributions over possibilities and instead reasoned with simple heuristics". Then GPT4 comes along and shows that a system that DOES maintain complex probability distributions (albeit over word-tokens) and seems to learn some of the best of human reasoning, and it also exhibits the same features. In this case, we KNOW the explanation is not that GPT4 can't do probabilistic reasoning and so needs to simplify, so we at least of an example of a fairly domain-general reasoner that doesn't fit the heuristics-and-biases picture.

Of course, there are competing explanations—like the ones people have been raising, that ChatGPT is just giving us back our biases. I don't think I have a knockdown argument against it, but just wanted to point out at the end of the post that that explanation risks overgeneralizing, and so we should consider alternatives.

Expand full comment

Herbie Bradley

Aug 19, 2023Edited

The problem with this reasoning, is that ChatGPT trained on the entire internet with no fine-tuning will babble, and will probably not score above the average internet user on many benchmarks. The ChatGPT we use every day is fine-tuned on extremely extensive and expensive datasets of high quality answers and ratings of math, reasoning, and code questions - basically any problem people are likely to use the chatbot for, high quality answers and preferences were collected. This is responsible for the vast majority of the capabilities the model exhibits when you interact with it.

So what we see is exactly what we would expect from a model trained on high quality human answers to reasoning questions, and that's why it exhibits the same flaws.

I think the idea that known, obvious flaws in reasoning exhibited by humans are actually in some way generally beneficial, is a nice story to tell ourselves which is sadly completely not true.

Expand full comment