The principle of verifiability on Wikipedia mandates that all content must be verifiable by readers through reliable sources. Wikipedia insists on information being based on what has been previously reported in reputable publications, not the personal convictions or unpublished work of its contributors. If there are contrasting views from reliable sources, Wikipedia maintains an impartial stance by presenting each viewpoint proportionately.
All content in Wikipedia articles must be backed by reliable sources. Inline citations are required for all quotations and for any content that has been or is likely to be challenged. Without proper inline citations, such content is subject to removal.
Wikipedia emphasizes the importance of grounding its articles in dependable, autonomous sources known for their diligence in fact-checking and accuracy. Such sources should be published, which in Wikipedia’s context means they should be accessible to the public in any format. Wikipedia does not consider unpublished materials reliable. It is important to use sources that adequately support the content and are suitable for the statements being made, especially when dealing with sensitive topics like biographies of living individuals or medical information.
Given the vastness of the internet, which hosts over a billion websites, it is challenging for Wikipedia users to evaluate the reliability of each source individually. Some language editions of Wikipedia have specific guidelines detailing which sources may be deemed reliable. However, there is no complete list of websites that can be used in Wikipedia as reliable sources of information. Additionally, the reliability and reputation of a website can change over time and can depend on the language and subject matter, so such lists need frequent updates. A more comprehensive and current compilation of trusted sources would therefore benefit not only the editors who curate Wikipedia’s content but also the readers who rely on the encyclopedia for accurate information.
BestRef serves as a tool to evaluate the importance of information sources utilized in Wikipedia. It offers insights into the most significant sources of information across various language editions of Wikipedia, facilitating the assessment of the quality and credibility of the content presented within this vast online encyclopedia. This aids in ensuring that Wikipedia remains a trustworthy repository of knowledge.
The BestRef database currently contains assessment results for 3.8 million websites across more than 300 language versions of Wikipedia. An analysis of over 60 million Wikipedia articles in October 2023 yielded information about more than 330 million references, which made it possible to identify Wikipedia's best information sources using different assessment models. The table below shows the results of reference extraction for selected language versions, along with the number of unique websites (links lead to rankings of the best sources of information in the selected language versions):
|Wiki||Language version||Article count||Reference count||Unique websites|
BestRef assessed the importance of each website using three models, which were described in research published in 2020:
- F-model: based on the frequency (F) of source usage.
- PR-model: based on the cumulative pageviews (P) of the article in which the source appears, divided by the number of references (R) in that article.
- AR-model: based on the number of authors (A) of the article in which the source appears, divided by the number of references (R) in that article.
In the F-model, the frequency of source usage is the number of references that contain the analyzed domain in their URL. This method has been commonly used in earlier research. The F-model counts every appearance of a reference, i.e., if the same source is cited 3 times, the frequency equals 3. The F-model is calculated as

$$F(s) = \sum_{i=1}^{n} C_s(i)$$

where s is the source, n is the number of Wikipedia articles considered, and Cs(i) is the number of references using source s (e.g., the domain in the URL) in article i.
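The F-model count described above can be sketched in a few lines of Python. This is a minimal illustration, not BestRef's implementation: the helper names `source_of` and `f_model`, the sample URLs, and the rule of treating the lowercased host without a leading `www.` as the source are all assumptions made for the example.

```python
from collections import Counter
from urllib.parse import urlsplit


def source_of(url):
    """Return a simplified source key: the lowercased host without a leading 'www.'.

    Assumption for this sketch only; a real pipeline may normalize domains differently.
    """
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def f_model(articles):
    """F(s): total number of references citing source s, summed over all articles."""
    scores = Counter()
    for reference_urls in articles.values():
        for url in reference_urls:
            source = source_of(url)
            if source:  # skip URLs without a recognizable host
                scores[source] += 1
    return scores


# Hypothetical input: article title -> reference URLs found in that article.
articles = {
    "Article A": [
        "https://www.example.org/report",
        "https://www.example.org/data",
        "https://example.org/faq",
        "https://news.example.com/story",
    ],
    "Article B": ["http://news.example.com/other"],
}

f_scores = f_model(articles)
print(f_scores.most_common())
```

Every appearance is counted, so the three `example.org` references in "Article A" contribute 3 to that source's score even though they sit in a single article.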
The PR-model uses cumulative pageviews divided by the total number of references in the considered article. Compared to the previous model, it additionally takes into account the popularity of the Wikipedia article and the visibility of the references that use the analyzed source. This model assumes that, in general, the more references an article has, the less visible each specific reference is. The PR-model measure is calculated as

$$PR(s) = \sum_{i=1}^{n} \frac{V(i)}{C(i)} \cdot C_s(i)$$

where s is the source, n is the number of Wikipedia articles considered, C(i) is the total number of references in article i, Cs(i) is the number of references using source s (e.g., the domain in the URL) in article i, and V(i) is the cumulative pageviews value of article i. Note that abnormally inflated pageview values for some Wikipedia articles were reduced.
Because an article's pageviews value mostly reflects its readers, another important measure addresses popularity among authors, i.e., the number of users who decided to add content or make changes to the article. The AR-model follows the same assumptions as the previous model but relates to authors instead of readers. It is described by the equation

$$AR(s) = \sum_{i=1}^{n} \frac{E(i)}{C(i)} \cdot C_s(i)$$

where s is the source, n is the number of Wikipedia articles considered, C(i) is the total number of references in article i, Cs(i) is the number of references using source s (e.g., the domain in the URL) in article i, and E(i) is the total number of registered authors (non-bots) of article i.
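The PR- and AR-models share the same structure and differ only in the per-article weight (pageviews or authors), so both can be sketched with one helper. The Python below is a minimal illustration under stated assumptions, not BestRef's implementation: the function name `weighted_model`, the dictionary keys, and the sample numbers are hypothetical, and sources are assumed to be already reduced to domains.

```python
from collections import Counter, defaultdict


def weighted_model(articles, weight_key):
    """For every source s, sum weight(i) / C(i) * Cs(i) over all articles i."""
    scores = defaultdict(float)
    for article in articles:
        sources = article["sources"]      # one entry per reference in the article
        total_refs = len(sources)         # C(i)
        if total_refs == 0:
            continue                      # an article without references adds nothing
        weight = article[weight_key]      # V(i) for PR, E(i) for AR
        for source, count in Counter(sources).items():  # count is Cs(i)
            scores[source] += weight / total_refs * count
    return dict(scores)


# Hypothetical sample data: two articles with made-up pageviews and author counts.
articles = [
    {"pageviews": 1000, "authors": 20,
     "sources": ["example.org", "example.org", "example.com", "example.net"]},
    {"pageviews": 300, "authors": 6,
     "sources": ["example.org", "example.com", "example.com"]},
]

pr_scores = weighted_model(articles, "pageviews")  # PR-model
ar_scores = weighted_model(articles, "authors")    # AR-model
print(pr_scores)
print(ar_scores)
```

With this sample data, `example.org` scores 600 under the PR-model (1000/4 · 2 + 300/3 · 1) and 12 under the AR-model (20/4 · 2 + 6/3 · 1). Dividing by C(i) means that a reference in a heavily cited article carries less weight than one in a sparsely cited article with the same pageviews or author count.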
More detailed information on the use of these and other models can be found in relevant scientific publications:
- Companies in Multilingual Wikipedia: Articles Quality and Important Sources of Information (2023)
- Identification of Important Web Sources of Information on Wikipedia across various Topics and Languages (2022)
- Reliability in Time: Evaluating the Web Sources of Information on COVID-19 in Wikipedia across Various Language Editions from the Beginning of the Pandemic (2022)
- Identifying Reliable Sources of Information about Companies in Multilingual Wikipedia (2022)
- Modeling Popularity and Reliability of Sources in Multilingual Wikipedia (2020)