Until 31 March 2026, the ULB Düsseldorf is licensing access to the Web of Science Research Assistant for a testing phase.
AI-assisted research
Discover. Understand. Try out.
Artificial intelligence is used in many areas of scientific work and is changing how research is organised and what results it produces. This development also affects the search for literature, information and data.
The number of AI-based research applications is growing, ranging from free applications available on the internet to paid products such as the AI-supported research assistant of the Web of Science database.
However, not only technology is changing, but also our research behaviour: both prompting and natural-language questions posed to research tools are currently establishing themselves as approaches to research.
Generative artificial intelligence in the form of so-called large language models generates a statistically probable sequence of words (‘output’) based on an input (‘prompt’). The major difference to traditional search engines is that queries can be formulated in natural language.
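The idea of generating a statistically probable next word can be illustrated with a toy sketch in Python. The word probabilities below are invented purely for illustration; a real language model estimates such probabilities over a huge vocabulary, conditioned on the entire prompt:

```python
import random

# Invented toy probabilities for the next word (not a real model):
# a language model assigns a probability to each candidate word
# and samples a likely continuation.
next_word_probs = {
    "literature": 0.5,
    "data": 0.3,
    "coffee": 0.2,
}

prompt = "Researchers search for"

# Sample the next word according to the probabilities above.
words = list(next_word_probs)
weights = list(next_word_probs.values())
next_word = random.choices(words, weights=weights, k=1)[0]
print(prompt, next_word)
```

Because the output is sampled from probabilities rather than looked up, the model most often continues with a plausible word, but it can also produce the improbable one.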
However, because of the way these large language models work, they pose considerable issues when searching for scientific information:
- The training data set of these models consists of a very large amount of data, including data that would not normally be used for searching scientific information. The exact scope and content of the training data set are unknown.
- The training data set has a cut-off date and limited coverage, meaning that certain time periods or topics, e.g. niche topics, are covered incompletely or not at all.
- The training data set consists of data that may be systematically biased, e.g. in terms of language or culture. For example, a chatbot’s answers could reproduce discriminatory views.
- AI language models are not designed to create a search result list of relevant documents, as search engines and subject databases do. Even though they may generate credible-looking citations when prompted, these may contain factual errors or be entirely fictitious.
However, there is a relatively new approach to developing AI applications that attempts to compensate for these disadvantages when searching for information.
This approach is called retrieval-augmented generation (RAG). At the moment, a large market is emerging for these applications, particularly in the scientific community.
Retrieval-augmented generation refers to a technology that makes it possible to compensate for problems that occur when using AI models to search for information. At present, this approach is implemented in almost all available AI search tools. Simply put, they work like this:
The AI language model is linked to a separate database on the basis of which it is to answer questions. This database can be expanded independently of the model’s training data, e.g. to cover the latest developments or publications. These can be websites, scientific publications in a subject database, a company’s documents or any other collections of documents.
After the user asks a question in natural language, the tool translates it into a search query. The system then uses a search algorithm to select relevant documents from the database. These documents (or extracts thereof) are passed to the model.
Then, the model generates an answer to the question posed and refers to the relevant documents. It can make references, usually in the form of links, to the documents used for the answer. This allows the generated answers to be verified. If the database is expanded, this new information can be taken into account.
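The steps above can be sketched in a few lines of Python. This is a deliberately simplified illustration with invented names: the document collection, the keyword-overlap ranking and the answer-composing function are stand-ins for a real subject database, a semantic search index and a large language model:

```python
import re

# Step 1: a toy document database; in a real tool this would be a
# subject database or another curated document collection, and it
# can be expanded independently of the model's training data.
documents = {
    "doc1": "RAG links a language model to an external database.",
    "doc2": "The database can be updated independently of the model.",
    "doc3": "Generated answers cite the documents they are based on.",
}

def tokenize(text):
    """Lowercase a text and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, docs, top_k=2):
    """Step 2: rank documents by keyword overlap with the query.
    Real systems typically use semantic (vector) search instead."""
    q_words = tokenize(query)
    scored = sorted(
        docs.items(),
        key=lambda item: len(q_words & tokenize(item[1])),
        reverse=True,
    )
    return scored[:top_k]

def generate_answer(query, retrieved):
    """Step 3: stand-in for the language model, which would phrase
    an answer from the retrieved passages and cite its sources."""
    context = " ".join(text for _, text in retrieved)
    sources = ", ".join(doc_id for doc_id, _ in retrieved)
    return f"{context} [sources: {sources}]"

hits = retrieve("How can the database be updated?", documents)
print(generate_answer("How can the database be updated?", hits))
```

Because the answer is composed only from the retrieved documents and names its sources, it can be verified against them; adding new documents to `documents` immediately makes them available to the search without retraining any model.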
These search tools automate a process that people normally carry out themselves when researching: they try to synthesise the information they find.
The following questions can help you to determine whether the use of an AI assistant makes sense for your search.
- Which database is used? Is it the right database for my subject or topic? Are there other tools relevant to my subject or topic that I should consider?
- Is the selection of the results transparent?
- How much prior knowledge do I have? Am I able to sufficiently evaluate the content of the generated information?
- Can I make my search reproducible for others?
- Are my search queries saved and used for further training of the system?
On their websites, many AI assistants provide tips for better prompting, i.e. for formulating search queries as effectively as possible. Beyond that, some generally applicable rules for prompting can be described. A particularly useful framework (CLEAR) was described in 2023 by Leo S. Lo, a librarian from the USA.
- Concise: brevity and clarity in prompts
- Logical: structured and coherent prompts
- Explicit: clear output specifications
- Adaptive: flexibility and customization in prompts
- Reflective: continuous evaluation and improvement of prompts
Detailed prompting examples can be found in the corresponding publication.