LLMs in Science


You've probably noticed that AI, particularly in the form of large language models (LLMs), is insinuating itself more and more into daily life. As AI becomes more accessible and more commonplace, researchers are increasingly using it both as a target of study and as a tool to aid their work. AI is fundamentally distinct from previous technologies, and it's important that researchers understand how to use it with appropriate methodology.


Our Paper


A while ago, the editor of Nature Computational Science reached out to our lab and told us they were putting together a special issue on AI in various disciplines. Since we had published some of the premier work on AI in social science, we were asked to contribute. In our paper, we provide guidelines and suggestions for how researchers should think about using LLMs as tools in social and political science contexts.


Summary of the Work


We start the paper by discussing what exactly the point of science is, which we define as the ability to infer something about the world from data. Scientists perform experiments under tightly controlled conditions that they hope will generalize to real-world scenarios. Our main contention is that researchers should make the target of inference (what they are hoping to learn about the world) clear early on and keep it in mind throughout the research process.


We discuss a number of other topics, including the importance of verifying that the LLM is doing what the researchers intend. Along the way, we cover several pitfalls to avoid and best practices for documenting LLM usage.


We finish the paper with a summary of our suggestions, which I've reproduced here in their entirety, as I feel they provide a good overview of our work:


"Before the start of a study using AI, researchers should:


  1. Clearly define the target of inference. Researchers should clearly state what they are hoping to make inferences about and what element of the LLM-empowered research design corresponds to or enables inference about that target.
  2. Identify one or more standards by which they will determine whether an LLM has successfully completed a needed task. These standards should be specific, tailored to the particular use case and comprehensive.
  3. Provide pre-registration showing either (1) evidence that a particular combination of model, prompt and parameters already sufficiently meets the identified standard of success, or (2) the standard and process by which the model, prompt and other parameters will be selected and deemed sufficient for inference.

In reporting post-study data, researchers should:


  1. Report the inferential target early in the paper so that this focus and objectives are clear to readers, reviewers and other social scientists.
  2. Provide confirmatory evidence that the LLM faithfully completed the particular study task(s) as deployed. This includes clearly stating (and justifying) validation metrics at the time of pre-registration and clear reporting using these metrics once the research has been completed.
  3. Report, ideally in the main text of the paper, the relevant details of LLM use, including exact models, dates of calls, mode of access, hyperparameters and prompts. If these details will not fit in the main text, they should be thoroughly summarized in detail and connected to stable, publicly accessible links to such information elsewhere.
  4. Report, in supplementary information, a summary of explorations of other models, prompts, settings or approaches taken in the process of seeking task success."
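To make the reporting recommendations above a bit more concrete, here's a minimal sketch (in Python, with file and field names I've chosen for illustration, not taken from the paper) of how one might log each LLM call so that the exact model, date of call, hyperparameters, prompt, and response can be reported later:

```python
import json
import datetime
from pathlib import Path

LOG_PATH = Path("llm_call_log.jsonl")  # hypothetical log file name


def log_llm_call(model, prompt, params, response_text):
    """Append one record per LLM call: model, prompt, hyperparameters,
    timestamp, and the raw response, so the details recommended above
    (exact model, date of call, parameters, prompt) are easy to report."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model": model,
        "prompt": prompt,
        "params": params,
        "response": response_text,
    }
    with LOG_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Example usage with placeholder values; swap in whatever client you use.
if __name__ == "__main__":
    model = "example-model-2024-01-01"                  # exact model identifier
    params = {"temperature": 0.0, "max_tokens": 256}    # hyperparameters used
    prompt = "Classify the sentiment of: 'The debate was disappointing.'"
    response_text = "negative"                          # stand-in for the real API response
    log_llm_call(model, prompt, params, response_text)
```

Appending one JSON record per call like this makes it straightforward to compile the main-text and supplementary details later, rather than trying to reconstruct them after the fact.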

Conclusion


While I usually write up my research in more detail, this work is more accessible, so I didn't feel the need to explain everything quite as much. If you're interested, I encourage you to read the whole paper (it's not that long). Academics often publish pre-prints: versions of their papers made publicly available before peer review. While the journal retains some rights to the eventual published paper (and access is sometimes gated), the pre-print remains available. You are welcome to read the pre-print (which is very similar to the final version) here.