Is ChatGPT Eroding Cognitive Ability?


ChatGPT makes us faster and more efficient, but is it making us less intelligent in the process?

Large language models clearly change performance, but not in one direction. In a randomized field experiment with 758 BCG consultants, generative AI lifted performance on creative and idea-generation tasks, yet hurt it when people used the tool outside its “jagged frontier” of competence, especially on complex analytical reasoning. In those cases, accuracy dropped and consultants over-trusted wrong answers.

Productivity effects are real, and uneven. In a study of 5,179 customer support agents, access to a ChatGPT-like assistant increased issues resolved per hour by 14 percent on average, with the biggest gains among novices and low performers, while experts benefited little. That pattern suggests skill compression and a risk of deskilling at the top if professionals lean on the tool for work they already do well. 

Over-reliance is a genuine cognitive risk. A 2024 systematic review on dialogue systems in education found that heavy reliance correlates with weaker decision quality and critical thinking, driven by automation bias, hallucinated content, and reduced effort. Reviews of automation bias in human–AI collaboration reach similar conclusions, and experiments show people will follow AI advice even when it conflicts with available context. These are classic mechanisms for atrophy of independent reasoning if use is uncritical. 

Education is where the signal is sharpest, and it cuts both ways. A 2025 meta-analysis of 51 studies found mixed effects on learning and higher-order thinking, with outcomes depending on how ChatGPT is used. Randomized studies highlight a key danger pattern: students who practiced with unrestricted AI performed better immediately, then did worse on later tests without AI, consistent with shallow processing and reduced retention. Other controlled studies show improvements when AI is used as a scaffold for debate, feedback, or step-by-step explanation. Tool use matters more than tool presence.

In professional writing and content work, AI assistance speeds drafting and raises average quality, but it can homogenize voice and weaken originality if people lean on it for generation rather than judgment. Experimental work in Science Advances found AI-seeded stories were rated more creative and polished, especially those by less creative writers, yet the outputs converged in style. Nature has warned that AI can improve the form of scientific prose while inviting misinterpretation if authors fail to verify claims and context.

Law is a cautionary tale about reasoning without verification. Courts have sanctioned attorneys for filing briefs with fabricated citations generated by chatbots, and judges in several countries have issued formal warnings. Separate evaluations from Stanford and others find pervasive legal hallucinations across popular models, underscoring that unverified AI prose can look authoritative while being wrong. The cognitive trap is misplaced confidence, not only the underlying error rate. 

Healthcare shows why domain boundaries matter. A 2024 Stanford report found that physicians using GPT-4 alongside conventional tools did not materially improve diagnostic reasoning relative to conventional tools alone, while recent reviews catalog how medical hallucinations can distort summaries and treatment implications. These findings point to value in structured, supervised use, and risk if clinicians cede reasoning to the model. 

Across groups, the pattern is consistent. Students gain speed and confidence but risk weaker retention when the model does too much of the thinking. Junior professionals benefit most on routine communication and ideation, mid-career knowledge workers improve on brainstorming and drafting but can stumble on intricate analysis if they accept outputs uncritically, and experts face the highest risk of subtle errors, since AI can be persuasive on topics where it is unreliable. This aligns with the jagged frontier result in consulting and the “novice boost, expert plateau” result in customer service.

What should we do about the deleterious effects? The literature points to a few practices that preserve cognition. Keep the human as the decider, and use AI for option generation and first drafts. Require explicit verification of facts, sources, and citations, especially in law, medicine, and finance. Teach people to recognize automation bias and to ask the model where it is uncertain or likely to err, which reduces blind trust and improves collaboration on complex tasks. Pilot within bounded task types where the tool is strong, and withhold it on tasks that demand unassisted reasoning, especially during training or assessment phases where durable learning is the goal.

The bottom line is that LLMs can raise the floor on writing and routine analysis, and they make novices look a lot more like pros. They can also flatten judgment when people stop doing the thinking themselves. The risk is not that ChatGPT makes us less intelligent by default; it is that uncritical, heavy use encourages cognitive offloading and overconfidence in fluent answers. The effects differ by segment because task type, expertise, and oversight differ. Use the model as a catalyst and a critic, not as the final arbiter, and you keep the upside while protecting the muscles you actually want to strengthen.

Written by the human, Curt Schwab, and edited by ChatGPT 5.

Research sources: Harvard Business School, Stanford Medicine, and Reuters.

