About & Methodology

UTP Publication Lens — data sources, processing pipeline & limitations
Overview

UTP Publication Lens is an internal bibliometric tool that consolidates, cleans, and visualises Universiti Teknologi PETRONAS’ (UTP) research output. Data are sourced from two major indexing platforms — Scopus (via SciVal) and Web of Science (via InCites & JCR) — and enriched with CiteScore, MyCite, and a list of Scopus-discontinued journals. The dashboard supports internal reporting, benchmarking against world averages, and strategic research planning.

Data Sources

Scopus scopus.csv · 10,382 records · 45 cols

Raw export from Elsevier Scopus. Bibliographic metadata per publication.

Key fields: Authors, Author IDs, Title, Year, Source title, DOI, Affiliations, Cited by, Author Keywords
Join key: EID

SciVal scival.csv · 10,382 records · 70 cols

SciVal export matched 1-to-1 with Scopus by EID. Primary analytics source.

Key fields: FWCI, Scopus Author IDs, Publisher, Institutions, Collaboration Type, Topic Cluster & Topic name, and ASJC, QS, THE, and ANZSRC subject fields
Join key: EID

CiteScore citescore.csv · 31,138 journals · 10 cols

Elsevier CiteScore 2024 rankings. Assigns Q1–Q4 quartile per publication by source title match.

Key fields: Source title, CiteScore, Quartile, SNIP, SJR, % Cited, Publisher, 2021–24 Citations & Documents
Join key: Source title (fuzzy)

Web of Science wos.csv · 6,653 records · 95 cols

Clarivate WoS Core Collection export. Citation data and Open Access designations.

Key fields: Article Title, Authors, Source, Times Cited (WoS Core), Document Type, ISSN, eISSN, Open Access Designations, Research Area, Publication Year
Join key: Accession Number

InCites incites.csv · 6,653 records · 22 cols

Clarivate InCites export matched 1-to-1 with WoS. Normalised impact metrics.

Key fields: CNCI (Category Normalised Citation Impact), JNCI (Journal Normalised Citation Impact), Percentile in Subject Area, Collaboration Type, Research Area
Join key: Accession Number

JCR / JIF jif.csv · 29,270 journals · 18 cols

Clarivate JCR 2024. Assigns JIF value and Q1–Q4 quartile to WoS publications via ISSN/eISSN.

Key fields: JIF 2024, 5-Year JIF, JIF Quartile, ISSN, eISSN, Journal Name, Total Cites, Cited/Citing Half-Life
Join key: ISSN / eISSN

Discontinued discontinued.csv · 893 journals

Journals removed from Scopus indexing. Publications matched here are flagged is_discontinued = True.

Key fields: Source Title, ISSN, EISSN, Reason for Re-evaluation, Year
Join key: ISSN / EISSN (8-digit normalised)

MyCite mycite.csv · 416 journals

Malaysian Citation Centre (MCC) accredited journal list. Matched publications flagged is_malaysian = True.

Key fields: Title, ISSN, EISSN, Publisher, Bidang (field), Indexing, OA status
Join key: ISSN / EISSN (8-digit normalised)
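
The "8-digit normalised" join key used for the Discontinued and MyCite lists can be sketched as follows. This is a minimal illustration in plain Python, not the tool's actual code: the function names `normalise_issn` and `flag_publication` are hypothetical, and the rule of rejecting anything that is not exactly eight characters after clean-up is an assumption.

```python
import re

def normalise_issn(raw):
    """Return an ISSN as 8 characters (7 digits plus a digit or X), or None."""
    if raw is None:
        return None
    # Drop hyphens, spaces and any other separators; ISSN check digits may be "X".
    cleaned = re.sub(r"[^0-9Xx]", "", str(raw)).upper()
    # Assumption: anything that is not exactly 8 valid characters is unusable.
    return cleaned if re.fullmatch(r"\d{7}[\dX]", cleaned) else None

def flag_publication(pub, discontinued_issns, mycite_issns):
    """Set both status flags by checking the ISSN and EISSN against each list."""
    keys = {normalise_issn(pub.get("ISSN")), normalise_issn(pub.get("EISSN"))}
    keys.discard(None)
    return {
        "is_discontinued": bool(keys & discontinued_issns),
        "is_malaysian": bool(keys & mycite_issns),
    }

# Illustrative values only; these are not real journal identifiers.
flags = flag_publication(
    {"ISSN": "1234-5678", "EISSN": ""},
    discontinued_issns={"12345678"},
    mycite_issns={"87654321"},
)
```

A publication whose ISSN and EISSN are both malformed or missing produces an empty key set, so neither flag is raised.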

Data Processing Pipeline

Scopus Pipeline
1. Load & Normalise: scopus.csv and scival.csv are loaded; EID is normalised (lowercased, whitespace stripped).
2. Merge: SciVal is merged into Scopus on EID (left join). SciVal contributes FWCI, topics, collaboration type, and subject classifications.
3. Publisher standardisation: Names are mapped via the scopus_active_sources.csv master map plus keyword overrides (Elsevier, Springer Nature, MDPI, Wiley, IEEE, Taylor & Francis, etc.).
4. CiteScore quartile lookup: Source titles are normalised and matched to citescore.csv; a _quartile column (Q1–Q4 / No-Q) is added per publication.
5. Status flags: 8-digit ISSN/EISSN values are matched against discontinued.csv (is_discontinued) and mycite.csv (is_malaysian).
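
Steps 1 and 2 of the Scopus pipeline (EID normalisation followed by a left join) can be sketched in plain Python as below. The helper names `normalise_eid` and `left_join_on_eid` are hypothetical, the sample EIDs and FWCI value are invented, and the real pipeline may well use a dataframe library instead of dictionaries.

```python
def normalise_eid(eid):
    """Lowercase the EID and strip surrounding whitespace."""
    return str(eid).strip().lower()

def left_join_on_eid(scopus_rows, scival_rows, scival_fields):
    """Keep every Scopus row; copy the chosen SciVal fields where the EID matches."""
    scival_by_eid = {normalise_eid(r["EID"]): r for r in scival_rows}
    merged = []
    for row in scopus_rows:
        key = normalise_eid(row["EID"])
        match = scival_by_eid.get(key, {})
        out = dict(row, EID=key)
        for field in scival_fields:
            # None marks a Scopus record with no SciVal counterpart.
            out[field] = match.get(field)
        merged.append(out)
    return merged

# Invented sample rows for illustration.
scopus = [
    {"EID": " 2-S2.0-85000000001 ", "Title": "Paper A"},
    {"EID": "2-s2.0-85000000002", "Title": "Paper B"},
]
scival = [{"EID": "2-s2.0-85000000001", "FWCI": 1.25}]
merged = left_join_on_eid(scopus, scival, ["FWCI"])
```

Because the join is a left join, the Scopus record count is preserved even when a SciVal row is missing; the unmatched row simply carries empty SciVal fields.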
Web of Science Pipeline
1. Load: wos.csv (95 cols) is loaded; incites.csv shares the same 6,653 rows, keyed on Accession Number.
2. Citation metrics: Times Cited, WoS Core provides raw counts for C/P and % cited. InCites CNCI/JNCI give normalised impact.
3. Percentile indicators: InCites Percentile in Subject Area ≥ 99 → top-1%; ≥ 90 → top-10%; ≥ 75 → top-25%.
4. JIF quartile lookup: WoS ISSN and eISSN are matched against jif.csv; the JIF 2024 value and JIF Quartile (Q1–Q4) are assigned per publication.
5. Open Access flag: Any non-empty, non-zero Open Access Designations value is treated as OA.
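
Steps 3 and 5 of the Web of Science pipeline are simple rules that can be written down directly. The sketch below uses hypothetical function names, copies the percentile thresholds exactly as stated in step 3, and assumes the tiers are cumulative (a top-1% paper also counts toward top-10% and top-25%). The thresholds presuppose a scale on which larger values are better, so the direction of the Percentile in Subject Area column in a given export should be checked before reusing them.

```python
import math

def is_open_access(value):
    """True for any non-empty, non-zero Open Access Designations value."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return False
    text = str(value).strip()
    if not text:
        return False
    try:
        return float(text) != 0   # numeric zero such as "0" or "0.0" is not OA
    except ValueError:
        return True               # free-text designation, treated as OA

def percentile_flags(percentile):
    """Cumulative top-tier flags using the thresholds quoted in step 3."""
    if percentile is None:
        return {"top_1": False, "top_10": False, "top_25": False}
    return {
        "top_1": percentile >= 99,
        "top_10": percentile >= 90,
        "top_25": percentile >= 75,
    }
```

For example, `is_open_access("0")` and `is_open_access("")` are both false, while any text designation is true; `percentile_flags(92)` sets only the top-10% and top-25% flags.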
UTP Affiliation Detection

UTP authors are identified from the Author full names and Authors with affiliations columns in scopus.csv. Each semicolon-delimited author entry is checked for the keywords “Universiti Teknologi Petronas” or “PETRONAS” in the affiliation string. A regex extracts the numeric Scopus Author ID from the name field; the count of unique IDs becomes the UTP Authors metric.

This heuristic may under-count authors with abbreviated or missing affiliation strings, and may include visiting or dual-affiliation researchers who have UTP listed as a secondary affiliation.
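
The detection heuristic can be sketched as below. Several things here are assumptions rather than facts from the tool: that the two columns list authors in the same order so entries can be paired by position, that the author ID appears in parentheses after the full name, and that keyword matching is case-insensitive. The function names and sample IDs are hypothetical.

```python
import re

ID_PATTERN = re.compile(r"\((\d+)\)")   # assumed form: "Surname, Given (57200000001)"
UTP_KEYWORDS = ("universiti teknologi petronas", "petronas")

def utp_author_ids(full_names, authors_with_affiliations):
    """Return the Scopus Author IDs whose affiliation entry mentions UTP."""
    names = [n.strip() for n in full_names.split(";")]
    affils = [a.strip().lower() for a in authors_with_affiliations.split(";")]
    ids = set()
    # Assumption: entry i in both columns refers to the same author.
    for name, affil in zip(names, affils):
        if any(keyword in affil for keyword in UTP_KEYWORDS):
            found = ID_PATTERN.search(name)
            if found:
                ids.add(found.group(1))
    return ids

def count_utp_authors(rows):
    """Count unique UTP author IDs across all publications."""
    unique = set()
    for row in rows:
        unique |= utp_author_ids(row["Author full names"],
                                 row["Authors with affiliations"])
    return len(unique)

# Invented sample records for illustration.
rows = [
    {"Author full names": "Ahmad, Ali (57200000001); Tan, Wei (57200000002)",
     "Authors with affiliations": "Ahmad A., Universiti Teknologi PETRONAS, Malaysia; "
                                  "Tan W., Example University, Singapore"},
    {"Author full names": "Ahmad, Ali (57200000001); Lee, Mei (57200000003)",
     "Authors with affiliations": "Ahmad A., Universiti Teknologi PETRONAS, Malaysia; "
                                  "Lee M., Universiti Teknologi Petronas, Malaysia"},
]
utp_authors = count_utp_authors(rows)
```

Using a set means an author who appears on many papers is counted once, which is what makes the metric a count of unique IDs rather than of author mentions.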

Metric Definitions

Scopus / SciVal
FWCI: Field-Weighted Citation Impact. Actual citations ÷ expected citations for same type, year, and subject. Benchmark = 1.00.
C/P: Total citations ÷ total publications.
C/CP: Total citations ÷ number of cited publications only.
CiteScore Quartile: Q1–Q4 assigned per publication via source title match to CiteScore 2024. Q1 = top 25% in subject.
UTP Authors: Unique Scopus Author IDs with a UTP/PETRONAS affiliation string across all publications.
CAGR: Compound Annual Growth Rate of publication count, first to last year.
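
The ratio metrics above reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the CAGR formula assumes the standard definition, (last ÷ first) raised to 1 ÷ (number of years between them), minus 1.

```python
def citations_per_publication(cited_by):
    """C/P: total citations divided by all publications."""
    return sum(cited_by) / len(cited_by) if cited_by else 0.0

def citations_per_cited_publication(cited_by):
    """C/CP: total citations divided by publications with at least one citation."""
    cited = [c for c in cited_by if c > 0]
    return sum(cited) / len(cited) if cited else 0.0

def cagr(counts_by_year):
    """Compound annual growth rate between the first and last year, or None."""
    years = sorted(counts_by_year)
    if len(years) < 2:
        return None
    first, last = counts_by_year[years[0]], counts_by_year[years[-1]]
    if first <= 0:
        return None  # growth from zero is undefined
    periods = years[-1] - years[0]
    return (last / first) ** (1 / periods) - 1

# Invented numbers for illustration.
cited_by = [0, 4, 6, 0, 10]
cp = citations_per_publication(cited_by)          # 20 citations over 5 papers
ccp = citations_per_cited_publication(cited_by)   # 20 citations over 3 cited papers
growth = cagr({2020: 100, 2021: 105, 2022: 121})  # only the endpoints matter
```

C/CP is never lower than C/P, because uncited papers are removed from the denominator only. CAGR ignores the intermediate years, so a single unusual first or last year can dominate the result.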
WoS / InCites / JCR
CNCI: Category Normalised Citation Impact. Normalised by document type, year, and research area. Benchmark = 1.00.
JNCI: Journal Normalised Citation Impact. Normalised relative to the journal’s expected citation rate.
Top 1% / 10%: Papers ranking in the global top 1% or top 10% of their subject area (via InCites Percentile in Subject Area).
Avg JIF: Mean Journal Impact Factor (JCR 2024) for publications matched to a JIF-ranked journal via ISSN/eISSN.
JIF Quartile: Q1–Q4 per publication via ISSN/eISSN lookup in JCR 2024. Q1 = top 25%.
Open Access: Any publication with a non-empty, non-zero Open Access Designations value in the WoS export.
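
The JIF lookup and the Avg JIF metric can be sketched together, since the average is taken only over publications that found a match. The function names are hypothetical, the sample journals are invented, and two details are assumptions: that ISSN is tried before eISSN, and that non-numeric JIF entries are skipped when averaging.

```python
def clean_issn(raw):
    """Strip hyphens and whitespace so '1234-5678' and '12345678' compare equal."""
    return str(raw or "").replace("-", "").strip().upper()

def build_jif_index(jif_rows):
    """Index each journal row under both its ISSN and its eISSN."""
    index = {}
    for row in jif_rows:
        for field in ("ISSN", "eISSN"):
            key = clean_issn(row.get(field))
            if key:
                index.setdefault(key, row)
    return index

def lookup_jif(pub, index):
    """Return the matching journal row, trying ISSN first and then eISSN."""
    for field in ("ISSN", "eISSN"):
        key = clean_issn(pub.get(field))
        if key in index:
            return index[key]
    return None

def average_jif(pubs, index):
    """Mean JIF 2024 over matched publications only; None if nothing matched."""
    values = []
    for pub in pubs:
        match = lookup_jif(pub, index)
        if match is None:
            continue
        try:
            values.append(float(match["JIF 2024"]))
        except (TypeError, ValueError):
            pass  # assumption: non-numeric JIF entries are left out of the mean
    return sum(values) / len(values) if values else None

# Invented journals and identifiers for illustration.
jif_rows = [
    {"Journal Name": "Journal A", "ISSN": "1111-2222", "eISSN": "3333-4444",
     "JIF 2024": "4.0", "JIF Quartile": "Q1"},
    {"Journal Name": "Journal B", "ISSN": "5555-6666", "eISSN": "",
     "JIF 2024": "2.0", "JIF Quartile": "Q3"},
]
pubs = [
    {"ISSN": "", "eISSN": "33334444"},
    {"ISSN": "5555-6666", "eISSN": ""},
    {"ISSN": "9999-0000", "eISSN": ""},
]
index = build_jif_index(jif_rows)
avg = average_jif(pubs, index)
```

The third publication has no match and is left out of the mean, so Avg JIF describes only the matched subset rather than all WoS records.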
Limitations & Caveats
  • Scopus and WoS publication counts differ due to separate indexing scope — they are not directly comparable.
  • Author disambiguation relies on Scopus Author IDs; ID merges or name variants can cause under/over-counting.
  • Publisher standardisation uses a keyword map; unrecognised variants appear as separate publishers.
  • CiteScore quartile uses fuzzy title matching; journals with special characters or variant titles may be unmatched.
  • JIF assignment uses ISSN/eISSN lookup; publications in journals absent from JCR 2024 receive no JIF.
  • CNCI/JNCI values reflect the InCites export snapshot and change as citations are updated on the platform.
  • Discontinued and MyCite flags require valid 8-digit ISSNs; malformed or missing ISSNs will not be flagged.
  • All figures are for internal analysis only — users should verify critical numbers against primary platform dashboards.
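
The title-matching caveat above can be made concrete with a small sketch. It is not the tool's actual matcher: it uses `difflib.get_close_matches` from the Python standard library as a stand-in fuzzy technique, the normalisation rules and the 0.9 cutoff are assumptions, and the journal titles and quartiles are invented.

```python
import difflib
import re
import unicodedata

def normalise_title(title):
    """Lowercase, strip accents, spell out '&', and drop other punctuation."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().replace("&", " and ")
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return " ".join(text.split())

def match_quartile(source_title, quartile_by_title, cutoff=0.9):
    """Exact match on the normalised title first, then the closest fuzzy match."""
    key = normalise_title(source_title)
    if key in quartile_by_title:
        return quartile_by_title[key]
    close = difflib.get_close_matches(key, list(quartile_by_title), n=1, cutoff=cutoff)
    return quartile_by_title[close[0]] if close else "No-Q"

# Invented lookup table keyed by normalised title.
quartile_by_title = {
    normalise_title("Café Science & Engineering"): "Q1",
    normalise_title("Journal of Example Studies"): "Q2",
}
exact = match_quartile("Cafe Science and Engineering", quartile_by_title)
fuzzy = match_quartile("Journal of Exmple Studies", quartile_by_title)
missing = match_quartile("Unrelated Quarterly", quartile_by_title)
```

Normalising before matching handles accents and the "&" versus "and" variants, while the cutoff controls the trade-off between missed matches and false ones: a lower cutoff catches more variant titles but risks assigning a quartile from the wrong journal.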
v2.0.0 Last updated: 27 May 2026
Built with Flask · Plotly · Bootstrap 5 · DataTables · Font Awesome  |  Dev. by Aidi Ahmi