The Python package bibliometrics, with source code on GitHub and available for install from PyPI, is a command line utility implemented 100% in Python that extracts common bibliometrics (total citations, h-index, i10-index) from a researcher’s Google Scholar profile, calculates others (g-index, i100-index, i1000-index) from the first page of their profile, and generates an SVG summarizing the metrics, which can then be displayed, for example, on a list of publications on their website. Here is an example (colors are user-configurable) of what this produces when pointed at my Scholar profile:
The intended use-case is for a researcher to monitor their own publications. For example, I’m currently running this in a cron job twice monthly (once monthly is probably also sufficient). It’s designed with that use-case in mind. It is also designed to respect Google Scholar’s current robots.txt, which currently allows accessing the first page of a profile, while disallowing virtually everything else. It has no dependencies, and doesn’t use any of the existing Python libraries that collect Scholar data. This is not a tool for more generally scraping such data. If you are looking for more general scraping functionality, you can find several such Python libraries by searching PyPI.
This post is organized as follows:
Supported Citation Metrics
This utility supports the following bibliometrics:
- Total number of citations.
- Total number of citations in the past five years.
- Number of citations to the most-cited article.
- h-index: An h-index equal to h means that the researcher’s h most-cited articles have each been cited at least h times.
- g-index: A g-index equal to g means that the researcher’s g most-cited articles have been cited an average of at least g times each.
- i10-index: A researcher’s i10-index is the number of their articles cited at least 10 times.
- i100-index, i1000-index, i10000-index: These are like the i10-index, but instead are the numbers of articles cited at least 100 times, 1000 times, and 10000 times, respectively.
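The definitions above translate almost directly into code. The following is a minimal sketch, not the package's actual implementation; the function names `h_index` and `i_index` and the sample citation counts are made up for illustration.

```python
def h_index(citations):
    """Largest h such that the h most-cited articles each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    # In a descending list, "count >= rank" holds for exactly the first h ranks.
    return sum(1 for rank, count in enumerate(ranked, start=1) if count >= rank)


def i_index(citations, threshold=10):
    """Number of articles cited at least `threshold` times (i10, i100, ...)."""
    return sum(1 for count in citations if count >= threshold)


# Hypothetical citation counts, one entry per article.
sample = [25, 12, 10, 4, 3, 0]
print(h_index(sample))       # h-index
print(i_index(sample, 10))   # i10-index
print(i_index(sample, 100))  # i100-index
```

With the sample list, four articles have at least four citations each, three have at least ten, and none reaches one hundred.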
Why These Citation Metrics for This Utility?
Several of these can be extracted directly from the researcher’s Google Scholar profile while respecting Scholar’s robots.txt. The others (g-index, i100-index, i1000-index, i10000-index), likewise while respecting Scholar’s robots.txt, can be calculated using only the first page (top 100 publications) of the researcher’s Google Scholar profile (provided the metric is at most 100). For any of these that would require retrieving more than the first page of results to compute, the application simply skips them. For example, if the researcher’s g-index is actually 105, the application will not be able to compute this since it can only retrieve a list of the top 100 publications of that researcher without violating Scholar’s robots.txt, and thus the SVG that is produced simply will not show the g-index.
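To make the first-page limitation concrete, here is a rough sketch of a g-index calculation that gives up when the first page alone cannot settle the answer. It is an illustration under stated assumptions, not the package's code: the name `g_index_from_first_page`, the `PAGE_SIZE` constant, and the choice to return `None` once the result reaches the page size on a full page are all assumptions made for this example.

```python
from itertools import accumulate

PAGE_SIZE = 100  # publications visible on the first profile page, per the text above


def g_index_from_first_page(citations):
    """Return the g-index if the first page determines it, otherwise None.

    `citations` holds the citation counts visible on the first page.
    """
    ranked = sorted(citations, reverse=True)[:PAGE_SIZE]
    g = 0
    # g is the largest rank whose running citation total is at least rank squared.
    for rank, running_total in enumerate(accumulate(ranked), start=1):
        if running_total >= rank * rank:
            g = rank
    # Assumption: on a full page, a result equal to the page size may be an
    # undercount, because articles on later pages could raise it further.
    if len(ranked) == PAGE_SIZE and g == PAGE_SIZE:
        return None
    return g


print(g_index_from_first_page([10, 5, 3, 1]))  # small profile, fully determined
print(g_index_from_first_page([200] * 100))    # full page, cannot be determined
```

A caller that receives `None` would simply leave that metric out of the output, which mirrors the skipping behavior described above.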
How to Use
Installing from PyPI
To install from PyPI:
python3 -m pip install bibliometrics
Or on Windows:
py -m pip install bibliometrics
Configuring
You configure the utility with a JSON file. The JSON configuration file must be named .bibliometrics.config.json, and the leading . is not a typo. Its rationale is my own personal use-case, where I run this in a directory containing the contents of a GitHub Pages site, and GitHub Pages by default does not serve files with names beginning with a period. Here is an example of the configuration (explanation follows):
{
  "scholarID": "YOUR-SCHOLAR-ID-HERE",
  "jsonOutputFile": "bibliometrics.json",
  "svgConfig": [
    {
      "background": "#010409",
      "border": "rgba(56,139,253,0.4)",
      "filename": "images/bibliometrics2.svg",
      "text": "#c9d1d9",
      "title": "#58a6ff"
    },
    {
      "background": "#f6f8fa",
      "border": "rgba(84,174,255,0.4)",
      "filename": "images/bibliometrics.svg",
      "text": "#24292f",
      "title": "#0969da"
    }
  ]
}
The above example configures the bibliometrics utility to generate two SVG files, one of them with a light color theme, and the other with a dark color theme. The "svgConfig" field can be used to configure as many SVGs as you want to generate (all for the same Scholar ID). If you only want one SVG, just provide a list there with a single JSON object describing the various color properties. The fields text, title, border, and background can all be specified via any valid method of defining a color in an SVG, such as 6-digit hex colors (most of the colors in the example), 3-digit hex colors, rgba (see example), as well as named colors. If it is valid as a color in an SVG, you can use it. The bibliometrics utility simply inserts it for the color.
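The idea of inserting the configured colors verbatim can be sketched as follows. This is only an illustration: the function `render_svg`, its layout, the "Bibliometrics" heading, and the sample metric values are all invented here and do not reflect the SVG template the package actually uses. Only the four color field names come from the configuration above.

```python
from xml.sax.saxutils import escape


def render_svg(metrics, colors, width=300, row_height=24):
    """Build a tiny SVG, dropping each configured color string in unchanged."""
    rows = list(metrics.items())
    height = row_height * (len(rows) + 2)
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        f'<rect width="100%" height="100%" fill="{colors["background"]}" '
        f'stroke="{colors["border"]}"/>',
        f'<text x="12" y="{row_height}" fill="{colors["title"]}">Bibliometrics</text>',
    ]
    for i, (label, value) in enumerate(rows, start=2):
        parts.append(
            f'<text x="12" y="{row_height * i}" fill="{colors["text"]}">'
            f"{escape(str(label))}: {escape(str(value))}</text>"
        )
    parts.append("</svg>")
    return "\n".join(parts)


# One entry shaped like an item of "svgConfig" (minus the filename).
dark = {
    "background": "#010409",
    "border": "rgba(56,139,253,0.4)",
    "text": "#c9d1d9",
    "title": "#58a6ff",
}
# Hypothetical metric values, for illustration only.
print(render_svg({"h-index": 4, "i10-index": 3}, dark))
```

Because the color strings are passed through untouched, any value an SVG renderer accepts as a color works without the generator needing to understand it.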
The "jsonOutputFile" field is optional. If provided, then in addition to generating an SVG, a JSON file will also be generated containing the extracted and computed bibliometrics.
You can specify your Scholar ID in one of two ways. In the above example, the field "scholarID" is used. Alternatively, the bibliometrics utility will also check for an environment variable SCHOLAR_ID. A single Scholar ID is used no matter how many SVGs you are generating. The intention of this utility is for use by a researcher for their own bibliometrics, and among the design criteria was to make it inconvenient to use to extract bibliometrics for multiple researchers.
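A minimal sketch of how the two sources of the Scholar ID might be combined is shown below. The helper names `load_config` and `resolve_scholar_id` are hypothetical, and the precedence (configuration file first, environment variable second) is an assumption of this example; the text above does not say which one wins when both are set.

```python
import json
import os
from pathlib import Path

CONFIG_FILENAME = ".bibliometrics.config.json"  # fixed name, as described above


def load_config(directory="."):
    """Read the JSON configuration from the given directory."""
    with (Path(directory) / CONFIG_FILENAME).open(encoding="utf-8") as f:
        return json.load(f)


def resolve_scholar_id(config, environ=os.environ):
    """Return the Scholar ID, or None if neither source provides one.

    Assumption: the "scholarID" field is checked before SCHOLAR_ID.
    """
    return config.get("scholarID") or environ.get("SCHOLAR_ID")


# Usage with in-memory values (no file or real environment needed):
print(resolve_scholar_id({"scholarID": "YOUR-SCHOLAR-ID-HERE"}, {}))
print(resolve_scholar_id({}, {"SCHOLAR_ID": "ID-FROM-ENVIRONMENT"}))
```

Passing `environ` as a parameter keeps the lookup easy to exercise without touching the real environment.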
Running
Once you have completed configuration, change your working directory to the directory containing the .bibliometrics.config.json file, and execute the following:
python3 -m bibliometrics
Or on Windows:
py -m bibliometrics
Information for Potential Contributors
The bibliometrics package is licensed via the MIT license. Source code is maintained on GitHub here:
Summarize your Google Scholar bibliometrics in an SVG
This command line utility does the next:
- retrieves the first page of your Google Scholar profile;
- parses from that page your total citations, your five-year citation count, your h-index, your i10-index, and the number of citations of your most-cited paper;
- computes your g-index provided it is less than 100 (reason for limitation later);
- computes your i100-index, i1000-index, and i10000-index (doi:10.1007/s11192-020-03831-9), hiding any that are 0, and provided they are less than 100 (reason for limitation later);
- generates a JSON file summarizing these bibliometrics; and
- generates one or more SVG images summarizing these bibliometrics.
The intention of this utility is as a tool for a researcher to generate an SVG of their own
bibliometrics only. For example, I’m using it to generate and update such an SVG for my own
profile twice monthly…
If you are interested in submitting issues or contributing code, any proposed new features must be implementable while respecting Scholar’s robots.txt. This mostly means restricted to what can be extracted or computed from the first page of a profile (up to the first 100 publications). Additionally, proposed new features must not be solely for the purpose of making it easier to scrape multiple profiles. For example, the use of a configuration file (with a limit of one Scholar ID) rather than command line arguments deliberately makes it less convenient (although not impossible) to use within a script that processes multiple profiles. The name of the configuration file, and its location relative to the current working directory, are not configurable for that same reason.
Where You Can Find Me
You can find me on the web: https://www.cicirello.org.
Here on DEV:
On GitHub:
If you want to generate the equivalent to the above for your own GitHub profile, check out the cicirello/user-statistician GitHub Action.