
Literature Search

A literature search is a systematic process of finding and evaluating sources relevant to a research topic. Its purpose is to map what has already been published on the topic, assess the quality of those publications, identify the languages in which they are available, and locate where they can be found. This helps avoid unnecessary duplication of research, uncover important connections, and reveal potential gaps in existing knowledge.

In both academia and professional practice, a literature search is the foundation of any serious work. Without it, it is impossible to define a topic properly, formulate a research question, or build arguments based on relevant sources.

A well-conducted literature search saves time, reduces the risk of errors, and improves the quality of decision-making. However, this is a skill that cannot be mastered without practice.

The term filter bubble was introduced around 2010 by Eli Pariser. It refers to the personalization of results based on a user’s previous online behaviour. The aim of such algorithms is to offer content that the user is likely to find interesting.

However, this tailoring means that we mostly see information and types of content that confirm our existing opinions, interests, or attitudes. Opposing or entirely new perspectives are pushed into the background or do not reach us at all. This effect is amplified in environments such as social media or AI-powered chat assistants.

When we are constantly exposed only to familiar viewpoints, we end up in what is known as an echo chamber. In such an environment, we hear the “echo” of what we already know over and over again, gradually distorting our view of the world.

Science and knowledge in general work in exactly the opposite way: they thrive on the unknown, on critical questioning, and on the search for new paths. Information bubbles are therefore not only undesirable in research, but they can also undermine the very foundations of knowledge.

How can we protect ourselves from bubbles? A few simple strategies can help:

  • actively seek out different perspectives and approaches,
  • use and combine different types of search engines and sources,
  • search in multiple languages.

When conducting a literature search, it’s important not only what we search for, but also how we formulate the search query. In professional e-resources, it is often advantageous to search in English. Even if an article or thesis is written in Czech, it usually includes an English translation of the title, abstract, and keywords. In library catalogues, however, the bibliographic record is often translated into Czech even if the document itself is in another language (for translations of technical terms, see the Polythematic Structured Subject Heading System), except for the title, which is left in the original.

In addition to the language of the document (English vs. Czech), there is also the "language of the search engine" – the syntax and format the search engine expects. Most modern search engines allow queries to be written in natural language, but for academic searches it is still more effective to use keyword and search operator language.

In both cases, the quality of the results strongly depends on how well we have thought through what we are actually looking for.

2.1 Keyword and Search Operator Language

Search engines do not search the entire internet directly – they search their own indexes, which are essentially databases of records, similar to library catalogues. This is why it is important to choose the right keywords, as they directly determine the results you will get. For example, adding the phrase “systematic review” will change the nature of results even on Google.
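
The index lookup itself can be pictured as a toy inverted index – a hypothetical sketch only; real engines add ranking, stemming, link analysis, and much more:

```python
# Toy inverted index: map each term to the set of pages containing it.
index = {
    "literature": {"pageA", "pageB"},
    "search": {"pageA", "pageC"},
    "review": {"pageB"},
}

def lookup(*terms):
    """Return pages containing ALL of the given terms (an implicit AND query)."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

print(sorted(lookup("literature", "search")))  # → ['pageA']
```

This is why keyword choice matters so much: only terms that appear in the index record can ever match.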

Search operators help refine queries. Some search engines offer these as part of advanced search, where operators are presented in a more user-friendly form. Information about the specific operators supported can usually be found in the search engine’s help section.

Each search engine has its own set of operators, so their exact form may vary – but the underlying principles are similar:

  • Boolean (logical) operators
    define the relationship between search terms: all must be present (AND), at least one must be present (OR), or certain terms must be excluded (NOT), 
  • exact phrase operators 
    enclose a word, phrase, or sentence in quotation marks (" ") to search for that exact sequence of words in that order,
  • wildcards 
    replace unknown letters, parts of words, or numbers (e.g., *, ?, %, #, ...),
  • other operators 
    e.g., parentheses ( ) group logical operations in complex queries, while proximity operators tell the search engine how close words should be to each other,
  • field or attribute search 
    limits a query to specific attributes of a web page (in Google, for example: allintitle: Schrödingerova zablešená kočka site:fysis.cz filetype:pdf) or to metadata fields in a bibliographic record (in EBSCO Discovery Service, for example: TI Schrödingerova zablešená kočka AND AU Kratochvíl AND IS 0009-0700).

Alongside operators, filters are an essential tool for narrowing results by document type, publication date, language, and other parameters.

Figure: Boolean operators

2.2 Natural Language and the Art of Prompting

With the rise of AI-powered tools, we increasingly encounter the option of asking questions in natural language. Such queries are called prompts.

While traditional search engines respond with a list of links, AI tools generate a coherent answer. The quality of that answer depends on how well the prompt is formulated. However, even a well-written prompt does not guarantee a correct answer (large language models "guess" their responses based on probability patterns, not on verified facts!).

Good prompt: clearly and concisely states what we want (topic, question), why we want it (purpose, level), in what form the output should be (style, format), and, if relevant, from what role the model should respond (e.g., expert, teacher, researcher). It is also worth considering whether to use English (or another language) and whether a complex task should be broken down into several smaller prompts.
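
As an illustration only (the wording and topic are invented, not an official template), a prompt following this structure might look like:

```text
Role: You are an experienced academic librarian.
Task: Suggest 5 keyword combinations for a literature search on urban noise pollution.
Purpose: I am a first-year engineering student preparing a bachelor's thesis.
Format: A bulleted list, each with a short note on when to use that combination.
```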

Prompting a prompt: we do not have to write the "perfect" prompt ourselves (some could take up several pages). Often, it is enough to create a prompt that instructs the model to generate the desired prompt for us, even specifying the language in which the final prompt should be written.

And because language models themselves are "pre-prompted" to please us (agreeing with us, praising us, or motivating us), a basic skill for any user is also prompting for critique.

Web search engines help us find information on the internet, but the results themselves are not neutral. They are influenced by how the search engine “reads” the web, what it knows about us, and what it wants or is required to show. When we enter a query, the search engine does not search the entire internet but only its index, a database of websites (or another engine’s index). The results are then ranked using complex algorithms that take into account the relevance of the query, the credibility of the source, personalization (what the engine knows about us), and any filters or restrictions (such as legislation, censorship, or promotion). Results are often accompanied by paid advertisements and special elements (such as answers, maps, images, and more).

Although Google Search has a dominant global market share, it faces competition from local alternatives in different regions—for example, Seznam in the Czech Republic, Baidu in China, and Yandex in Russia. Microsoft Bing has also recently become a strong player.

Each of these search engines has its own index and ranking rules, which differ depending on technical capabilities (e.g., index size, algorithms) and legal restrictions in a given country. As a result, search results can vary not only by the search engine used but also by who is searching and from where.

In addition to these major players, there are many other search engines.

Here are a few examples:

| Search Engine | Headquarters | Indexes | Privacy Protection (Search) | Privacy Protection (Click) | Notes |
|---|---|---|---|---|---|
| Brave Search | USA | own + Bing fallback (via API) | anonymous queries; optional index building | only in Brave browser | some features available only in Brave browser |
| DuckDuckGo | USA | mostly Bing | no tracking cookies | only in DDG browser or with DDG extension | relevance of results may vary for complex queries |
| Mojeek | United Kingdom | own (independent crawler) | strict non-profiling policy | no | less known; smaller but rapidly growing index |
| SearxNG | decentralised | combined (Bing, Google, Mojeek, etc.) | depends on instance | yes – most instances allow proxy access | for advanced users; quality depends on chosen instance |
| Startpage | Netherlands | Google + anonymous access | IP not shared with Google; no tracking cookies | yes – Anonymous View (proxy) | Google accuracy without tracking |
| Swisscows | Switzerland | mainly Bing + own semantic layer | no data or IP storage; no profiling | no | family-friendly (filters explicit violence, sexual content, etc.) |

3.1 Deep Web

The portion of the web indexed by standard search engines represents only a small fraction of the whole. This publicly accessible part is called the surface web. Most of the web remains invisible to these search engines. This hidden part, called the deep web, includes all content that requires a login or payment, is stored in file formats unreadable by web crawlers, is skipped due to indexing priorities or efficiency limits, or is deliberately excluded from indexing (for example, via a robots.txt file).
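
For instance, a site owner can ask crawlers to skip part of a site with a robots.txt file like this (the directory name is hypothetical; note that compliant crawlers honour it voluntarily):

```text
User-agent: *
Disallow: /internal/
```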

Within the deep web lies an even more anonymous segment known as the dark web. While it can be used to protect the privacy of whistleblowers, especially in repressive regimes, its use requires advanced technical knowledge. Even then, it can be extremely risky, as there is always a possibility of exposure due to increasingly sophisticated monitoring and surveillance technologies.

The internet is constantly changing. What was online yesterday may be gone today. Although the internet became publicly accessible in 1991, its real boom (and the rapid growth in the number of websites) began around 1995. This created a need for web archiving, and since 1996, the Internet Archive has been “harvesting” the web. Thanks to this, we can now access previously active but no longer existing pages, or track the development of current websites over time.

Not everything can be archived, due to technical limitations or legal restrictions.

In addition to the global Internet Archive initiative, there are also specialised and national projects, such as the Czech Webarchiv managed by the National Library of the Czech Republic, and many others.

In addition to general-purpose web search engines that “scan” the entire internet, there are also tools focused on specific types of content or fields of interest. In both academic and professional contexts, the most relevant are library catalogues, specialized databases, and repositories.

Searching each source individually is beyond human capacity. Instead, we use tools that make the process easier: union catalogues for library collections, and (web-scale) discovery systems for databases and repositories. However, it is important to remember that no tool covers everything.

Union catalogues only include records supplied by participating libraries, while the coverage of discovery systems depends on what and where they index. During a literature search, it’s useful to pay attention to where these tools lead – often to the original document in a specific database, library, or repository, where more targeted resources may be available.

5.1 Library Catalogues

For beginning students, the library collection in the VSB-TUO Central Library catalogue is usually sufficient. As research needs grow, however, it is worth exploring other catalogues, for example, the nearest regional research library (for the Moravian-Silesian Region, this is the Moravian-Silesian Research Library in Ostrava).

The collections of Czech libraries can be searched via the Union Catalogue of the Czech Republic, maintained by the National Library in Prague, or through the more user-friendly portal Knihovny.cz, operated by the Moravian Library in Brno.

If you want to narrow your search to technical fields, you can use the TECHNIKA Subject Gateway.

And if you need to check whether a translation of a document exists in a language you can understand, use the international catalogue WorldCat.

5.2 Databases and Repositories

Specialised databases are electronic systems focused on a specific field or topic. They provide access to journal articles, books, conference proceedings, research reports, standards, patents, and other materials essential for research, study, and professional practice. Our selection of both subscription-based and freely accessible databases can be found here.

Repositories are designed to store, manage, and provide access to research data and publications. At VSB-TUO, we use the VSB-TUO DSpace Repository for publications and Zenodo for research data. Many others exist as well, such as the Czech National Repository of Grey Literature, arXiv (preprints in physics, mathematics, computer science, economics, and related fields), and GitHub (source code, software, and documentation).

5.3 Discovery Systems

Discovery systems allow unified searching across multiple information sources at the same time – typically specialized databases, repositories, and sometimes even library catalogues. The VSB-TUO Central Library uses the EBSCO Discovery Service.

The aim of a literature search is not to find "as much as possible" but to identify sources that are truly worth reading, analysing, and using. Assessing the relevance and quality of sources is therefore a key part of any search process.

6.1 Initial Relevance Check

At first glance

Web search engines usually display only a title and a short snippet of text. These can be misleading – a title may be deceptive, the snippet may be taken out of context, and displayed links are often sponsored.

Specialised search tools work to meet academic standards and therefore display details such as bibliographic data (author, title, year of publication), subject headings (topics), keywords, and abstracts (author summaries).

At second glance

Some web search engines now offer short AI-generated summaries, but these are currently unsuitable and sometimes counterproductive for academic and professional use.

Specialised search tools often allow users to preview at least the table of contents and the list of references.

If full texts are accessible, it is worth checking the introduction, literature review, discussion, conclusion, or directly the chapter related to your topic. You can also search for keyword context (Ctrl+F or the “find on page” function), or in a printed book, the index can serve the same purpose.
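
The same "find on page" idea can even be scripted; a minimal sketch (the function name and window size are illustrative):

```python
def keyword_context(text, keyword, window=30):
    """Return each occurrence of keyword with `window` characters of context."""
    low, kw, hits, start = text.lower(), keyword.lower(), [], 0
    while (i := low.find(kw, start)) != -1:
        hits.append(text[max(0, i - window): i + len(kw) + window])
        start = i + 1
    return hits

print(keyword_context("A search engine searches its index.", "search", 10))
```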

6.2 Signs of Quality

Online content can be evaluated based on several indicators:

  • web address (domains such as .gov, .edu, or .ac tend to be more trustworthy than unknown blogs; unknown or suspicious-looking sites may pose risks of manipulation, technical threats, or fraud), 
  • public engagement (views, shares, likes, or comments can be indicative, but they are also easy to manipulate),
  • responsible author or institution (check who is behind the information and their level of expertise),
  • verification (use fact-checking initiatives, but be aware that they may have limited methodology or biases).

In professional and academic contexts, additional indicators include:

  • whether the publication has undergone peer review in a scientific journal or by a reputable publisher,
  • checking impact metrics in citation databases such as Scopus or Web of Science – for example, citation counts, journal impact factor, or author h-index,
  • and newly, reviewing the citation context of the publication and its references (Scite).

However, even a source that appears credible may not be high quality or factually correct. A peer-reviewed article can later be exposed as flawed or manipulative. High citation counts may result from criticism, self-citation, or predatory journals/publishers/conferences artificially inflating their metrics.

True confidence in quality only comes from reading the text, verifying the information where possible, and ideally testing it yourself. Even then, mistakes remain possible.

6.3 Peer-Reviewed vs. Grey Literature

Peer-reviewed literature undergoes expert assessment before publication. This process supports a higher level of academic quality, accuracy, and credibility. Typical examples include scholarly articles, conference proceedings, and monographs (books).

Grey literature refers to documents that have not gone through the standard peer-review process but can still be highly valuable. Examples include theses and dissertations, research reports, preprints, technical reports, and documents issued by government bodies or institutions. In the Czech Republic, grey literature is collected and archived by the National Repository of Grey Literature, maintained by the National Library of Technology. When it comes to preprints (unreviewed manuscripts), a list of global preprint repositories can be found in the Directory of Open Access Preprint Repositories.

Grey literature is often more up-to-date and/or more practically focused, but it should be evaluated with extra care in terms of quality and trustworthiness.

No library can hold every published document (whether physically in its collection or through online access) because it is limited by space, budget, and licensing agreements. It is therefore useful to know what other options exist for obtaining a document:

  • Is the publication older, and do I need it right now?
    Thanks to large-scale digitisation, many older books and journals are now available online in digital libraries, such as:
    • National Digital Library (NDK Czech Republic): includes both public domain works (70 years after the author’s death) and out-of-commerce works (20 years for books, 10 years for journals). Access is available to registered users of most Czech libraries (login via library account).
    • Internet Archive: contains digitised texts (including DAISY format for visually impaired users), films, music, and software from around the world. Registration and login are required for copyrighted works.
  • Is the publication neither in the library nor available online?
    If the document is not available in our library or accessible online, you can use the inter-library service (ILL). The library can request a physical book from another library or provide a copy of an article if possible.
  • Need access to more literature?  
    By registering with another library, you can expand your access to its physical collection and e-resources. Some libraries even allow remote registration for e-resources only (note: this does not apply to university libraries, which, for licensing reasons, only provide access to subscribed e-resources for their students and staff).
  • "Lucky Guess"
    Sometimes full-text articles are freely available elsewhere, and Google Scholar has found them, or they were once available and have been preserved by Internet Archive Scholar. It can be worth “trying your luck” in this way.

AI tools today often use so-called large language models (LLMs).

Warning: A preliminary study from MIT suggests that when people rely primarily on large language models (LLMs), their brain activity and their ability to remember may decrease. If you first think through the problem on your own and put it into context, however, artificial intelligence can help you refine the result more effectively.

8.1 How It (Does Not) Work

Language models do not operate on fixed rules (they are not “smart calculators”). Instead, they are trained on large text datasets, which may include content from websites, books, encyclopaedias, or discussion forums. During training, the model learns to recognise patterns in language, such as which words and sentences typically follow one another, what style fits a particular situation, or how the argumentative structure of an academic text looks. The model does not store entire texts; instead, it creates an internal statistical representation of language – a probabilistic model of what is most likely to come next in a given context.
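
The "most likely continuation" principle can be demonstrated with a toy bigram model – a deliberately tiny stand-in for what large language models do at enormous scale:

```python
from collections import Counter, defaultdict

# Tiny training corpus; real models train on billions of words.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which (bigram statistics).
model = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    model[w1][w2] += 1

def most_likely_next(word):
    """Predict the statistically most frequent continuation."""
    return model[word].most_common(1)[0][0]

print(most_likely_next("the"))  # → cat ("cat" follows "the" twice; "mat" and "fish" once)
```

Like an LLM, this model stores no texts, only statistics – and its "answer" is a probability-based guess, not a verified fact.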

Once pre-trained, the model is fine-tuned (e.g., using human feedback) and equipped with a system prompt that defines its role, behaviour, and boundaries. It can also be adapted for specific tasks such as conversation, text summarisation, code generation, or literature search. Some models also have access to the internet, databases, or other tools.

The user then interacts with this “ready-made” model in natural language via user prompts. The model responds by generating text that it predicts to be the most likely continuation in the given context.

8.2 Bias

The output of a language model is influenced by a combination of several factors: the patterns it learned during pre-training (from text datasets), the way it was fine-tuned, the system prompt, additional instructions from the provider or third parties, and, of course, the user prompt, the current context, and previous conversation history.

Each of these factors, individually and in combination, can lead to bias. The nature of the training data may unintentionally reinforce stereotypes. The system prompt or extra instructions may shape the style of responses or favour certain worldviews or products. And the user interaction itself can create a filter bubble effect.

For the average user, the source of a particular bias is often non-transparent, which increases the need for caution – not only when interpreting AI outputs, but also when deciding what to submit to the AI in the first place.

It is also important to remember that, like many other services, AI tools may store and analyse the inputs you provide. For this reason, the Rules and Recommendations for the Use of Artificial Intelligence at VSB-TUO (TUO_LEG_24_002) explicitly state:

"AI tools should not be used to process sensitive data, such as personal information or data included in contractual agreements."

8.3 User Moderation and Critical Review

Almost all AI tools now include some mechanisms to reduce bias, in various forms: from source transparency (links, citations), to feedback options (rating answers), to system-level interventions such as content filters or automatic moderation. However, this does not mean that users can rely on AI-generated outputs without further scrutiny.

AI-generated outputs can be regarded, to some extent, as a form of grey literature – with the important difference that they are produced not by human reasoning, but by statistical computation, whose sources and logic are often opaque. This calls for extra caution when working with such material.

How to use AI tools while avoiding hidden bias?

  1. Well-thought-out prompts and their variations

The way a task is formulated has a major influence on output quality. The more targeted the prompt, the more accurate (and useful) the response tends to be. On the other hand, even small changes in wording, tone, or structure can produce very different answers, sometimes leading to emergent behaviour that seems surprising or illogical. Helpful strategies include:

  • trying different variations of the same query,
  • changing the language (e.g., Czech, English, Spanish),
  • developing the conversation further, asking follow-up questions or clarifying.
  2. Critical evaluation and diversification of sources

Just like with traditional search, the golden rule applies here too: never rely on a single source, single model, or single tool. Always cross-check:

  • responses from other models,
  • information on the web (web search engines),
  • archived versions of websites (web archives),
  • scientific perspectives (specialized search engines).

These checks should be supported by critical evaluation, common sense, and awareness of the broader context.

  3. "Why do I think what I think?"

The rise of AI tools makes it even more important to think critically about our own thinking. In the flood of information (and generated content), it is easy to absorb opinions subconsciously until they quietly become part of our worldview. It is worth pausing now and then to ask:

  • "Why do I think what I think?"
  • "Do I really think this myself?"
  • "Why do I believe this is true?"
  • "What can I actually verify?"
  • "And how can I even think this at all?"

A Few Useful Tools (Not Only) for Literature Search

Keeping up with the rapidly evolving field of large language models is challenging. Below is a short selection of tools, (re)trained for specific tasks:

Scite: Although not primarily designed for beginners, Scite can help verify the quality of scholarly publications by clearly showing citations in context (supporting, disputing, or neutral, along with their position in the sentence). For inspiration, you can also try Scite Assistant. Available at VSB-TUO through our subscription.

Google NotebookLM can summarise the content of uploaded files, turn them into study guides, outlines, or even podcasts, and answer questions about their content. Although VSB-TUO does not currently subscribe to NotebookLM, the free version offers a generous token limit and many features useful for study and research.

Writefull Premium is a tool for language editing of academic texts in English. It helps when writing scholarly articles, theses, or abstracts by suggesting improvements to style, grammar, and scientific expression. It uses a combination of proprietary language models and third-party models. Available at VSB-TUO through our subscription.

GPTZero can estimate the likelihood that a text was generated by AI. Offers up to 10,000 words per month for free. For more frequent use, consider Originality.AI. Both are specialised AI tools (not conversational LLMs) that use algorithms and statistical models to estimate whether a text was AI-generated.