PersADE Statistics

Comprehensive analytics of associations between drugs and adverse events

Statistical Overview

PersADE is a newly developed online database that integrates comprehensive, multidimensional information on associations between drugs and adverse drug events (ADEs). It provides richly curated drug-ADE annotations, linking chemical structures, molecular targets, dosage records, and patient demographics.

Drug-Target-ADE associations

Drug-target-ADE associations: 194,874
Drug-target interactions: 108,677
ADE-target associations: 31,756
Proteins: 6,980

Personalized Drug-ADE associations

Personalized associations: 4,061,772
Disease-stratified associations: 1,277,009
Route-defined associations: 1,649,311
Formulation-dependent associations: 1,135,452

Target-centered associations

Pharmacological targets: 2,066
Drug-target interactions with affinity: 17,208
KEGG pathways: 348
MSigDB pathways: 6,191

DTA Confidence score framework

Drug-target-ADE associations: 194,874
Valid associations: 2,617
High-confidence associations: 10,668
Middle-confidence associations: 181,589

General Drug-ADE associations

General associations: 2,201,568
Drug-like compounds: 10,235
Drugs: 3,651
Adverse events: 19,991

ADE Confidence score framework

Patient-level records: 37,923,032
Patients: 11,500,749
High-confidence associations: 316,144
Middle-confidence associations: 1,339,808
Low-confidence associations: 4,056,728

Data Visualization

Drug Safety Data Flow

[Figure: flow diagram with the stages Data Source, Classification, Sub Class, Confidence Level, and Safety Level]

Patient Records: 37.9M
Patients: 11.5M
Associations: 6.2M
Drug-like Compounds: 10,235
Adverse Events: 19,991

Hierarchical Data Analysis

[Figure: interactive sunburst chart showing a three-level categorical hierarchy: Main Categories, Subcategories, Specific Types]

Overview of Outcome and Patient Age in the Reports

[Figures: Gender Distribution; Age Distribution; Rechallenge Reports (2017-2022); Dechallenge Reports (2017-2022)]

Confidence Score Framework

Quantitative assessment framework for evaluating the reliability and confidence of drug-ADE associations, integrating statistical significance, association strength, and clinical evidence robustness.

Core Confidence Formula

$$\text{Confidence Score} = \frac{S_{P} + S_{strength} + S_{case}}{3}$$ where each component is normalized to a [0,10] scale

This integrated formula combines statistical significance, association strength, and case evidence to generate quantitative confidence assessments for drug-ADE relationships.

Statistical Components

$$S_{strength} = \frac{\sqrt{ROR} + \sqrt{PRR}}{2}$$

ROR: Reporting Odds Ratio

PRR: Proportional Reporting Ratio

P-value: Statistical significance

Cases: Clinical evidence volume

Score Transformation & Normalization

P-value Score: $S_P = -\log_{10}(P_{adj})$ (truncated at threshold=3), with special handling for $P_{adj} = 0$ → Score=10

ROR Score: $S_{ROR} = \sqrt{ROR}$ (truncated at threshold=10)

PRR Score: $S_{PRR} = \sqrt{PRR}$ (truncated at threshold=10)

Case Score: $S_{case} = \sqrt{N_{cases}}$ (truncated at threshold=7)

Min-Max Normalization: Each component scaled to [0,10] range

Confidence Assessment Methodology

Raw Score Calculation

Transform input parameters using appropriate mathematical functions: $$S_{P}^{raw} = \begin{cases} 10 & \text{if } P_{adj} = 0 \\ -\log_{10}(P_{adj}) & \text{if } P_{adj} > 0 \end{cases}$$ $$S_{ROR}^{raw} = \sqrt{ROR}, \quad S_{PRR}^{raw} = \sqrt{PRR}, \quad S_{case}^{raw} = \sqrt{N_{cases}}$$

Threshold Truncation

Apply fixed thresholds to prevent extreme outliers from distorting the scoring: $$S_k^{trunc} = \min(S_k^{raw}, T_k)$$ where $T_P = 3$, $T_{ROR} = 10$, $T_{PRR} = 10$, $T_{case} = 7$

Association Strength Integration

Combine the truncated ROR and PRR scores into a unified association strength measure: $$S_{strength}^{trunc} = \frac{S_{ROR}^{trunc} + S_{PRR}^{trunc}}{2}$$ When only one of the two metrics is present, the available component is used on its own.

Min-Max Normalization

Normalize each component to a [0,10] scale for fair integration: $$S_k^{norm} = \frac{S_k^{trunc} - \min(S_k^{trunc})}{\max(S_k^{trunc}) - \min(S_k^{trunc})} \times 10$$ This ensures that each component contributes equal weight regardless of differences in original scale.

Final Confidence Score

Calculate the final confidence score as the arithmetic mean of normalized components: $$\text{Confidence Score} = \frac{S_P^{norm} + S_{strength}^{norm} + S_{case}^{norm}}{3}$$ Missing components are excluded from the averaging to maintain score validity.
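The scoring steps (raw score, threshold truncation, strength integration, min-max normalization, averaging) can be sketched end to end in plain Python. This is a minimal illustration rather than the PersADE implementation: the function names, the use of `None` for missing inputs, and the handling of a constant column are assumptions, and the minimum and maximum are assumed to be taken over all associations scored together.

```python
import math

# Truncation thresholds T_k as stated in the text.
T_P, T_ROR, T_PRR, T_CASE = 3.0, 10.0, 10.0, 7.0


def raw_p_score(p_adj):
    """S_P^raw: 10 when P_adj is exactly 0, otherwise -log10(P_adj)."""
    return 10.0 if p_adj == 0 else -math.log10(p_adj)


def truncated_components(p_adj, ror, prr, n_cases):
    """Return truncated (S_P, S_strength, S_case); None marks a missing input."""
    s_p = None if p_adj is None else min(raw_p_score(p_adj), T_P)
    s_ror = None if ror is None else min(math.sqrt(ror), T_ROR)
    s_prr = None if prr is None else min(math.sqrt(prr), T_PRR)
    s_case = None if n_cases is None else min(math.sqrt(n_cases), T_CASE)
    # Strength is the mean of whichever of ROR/PRR is available.
    parts = [s for s in (s_ror, s_prr) if s is not None]
    s_strength = sum(parts) / len(parts) if parts else None
    return s_p, s_strength, s_case


def min_max_scale(values):
    """Scale the non-missing values of one component to [0, 10]."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    lo, hi = min(present), max(present)
    if hi == lo:
        # Assumption: a constant column maps to 0 to avoid dividing by zero.
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - lo) / (hi - lo) * 10.0 for v in values]


def confidence_scores(records):
    """records: iterable of (p_adj, ror, prr, n_cases) tuples."""
    rows = [truncated_components(*r) for r in records]
    columns = [min_max_scale(col) for col in zip(*rows)]
    scores = []
    for comps in zip(*columns):
        # Missing components are left out of the average.
        present = [c for c in comps if c is not None]
        scores.append(sum(present) / len(present) if present else None)
    return scores


# Illustrative inputs only; these are not PersADE data.
example = [(0.0, 100.0, 100.0, 49), (0.05, 4.0, None, 4), (1.0, 1.44, 1.44, 1)]
scores = confidence_scores(example)
```

Because min-max normalization depends on the smallest and largest value in each column, the same association can receive a different score when it is scored within a different set of associations.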

Confidence Level Classification

$$\text{Confidence Level} = \begin{cases} \text{Low} & \text{if } 0 \leq \text{Score} < 3.33 \\ \text{Medium} & \text{if } 3.33 \leq \text{Score} < 6.67 \\ \text{High} & \text{if } 6.67 \leq \text{Score} \leq 10.00 \end{cases}$$

Low Confidence

0 ≤ Score < 3.33

Limited statistical evidence, requires further validation

Medium Confidence

3.33 ≤ Score < 6.67

Moderate evidence strength, suitable for hypothesis generation

High Confidence

6.67 ≤ Score ≤ 10.00

Strong statistical support, suitable for clinical decision support
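The three bands translate directly into a small lookup. The sketch below assumes a score already on the [0,10] scale; the function name `confidence_level` is illustrative.

```python
def confidence_level(score):
    """Map a confidence score in [0, 10] to Low / Medium / High."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("confidence score must lie in [0, 10]")
    if score < 3.33:
        return "Low"
    if score < 6.67:
        return "Medium"
    return "High"
```

Lower bounds are inclusive, so a score of exactly 3.33 is Medium and exactly 6.67 is High.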

Special Handling Rules & Quality Assurance

Zero P-value Rule: When $P_{adj} = 0$, the P-value score receives the maximum value (10)

Risk Signal Filtering: ROR ≤ 1 and PRR ≤ 1 associations are excluded (no risk signal)

Missing Data Handling: Final score calculated using available components only

Outlier Protection: Fixed thresholds prevent extreme values from distorting scores

Equal Weight Design: Each dimension contributes 33.3% to final confidence assessment

Vectorized Processing: High-performance computation for large-scale datasets

Mathematical Notation & Variables

$P_{adj}$: Adjusted p-value (FDR corrected)

$ROR$: Reporting Odds Ratio

$PRR$: Proportional Reporting Ratio

$N_{cases}$: Number of case reports

$S_k^{raw}$: Raw score for component k

$S_k^{trunc}$: Truncated score for component k

$S_k^{norm}$: Normalized score for component k

$T_k$: Truncation threshold for component k

Framework Performance & Validation

Processing Speed: ~2.45 ms per 1,000 associations (vectorized computation)

Memory Efficiency: Optimized for datasets with >6M drug-ADE pairs

Scalability: Linear complexity O(n) for dataset size n

Reproducibility: Deterministic scoring with fixed thresholds

Cross-validation: Validated against expert-curated reference standards

Clinical Relevance: Correlation with known drug safety profiles (r > 0.85)

ADE Severity Calculator

Advanced computational framework for quantitative assessment of adverse drug reaction severity, integrating patient outcome data, statistical analysis, and clinical risk stratification.

Core Severity Formula

$$\text{Severity Score} = \sum_{k}\frac{W_{k} \times P_{k}}{\log_{2}(\text{Pen}_{k} + 1) \times \left(1 + e^{-\log_{2}(\text{ROR})}\right)}$$

This comprehensive formula integrates outcome weights, probabilities, penalty factors, and reporting odds ratios to generate quantitative severity assessments.

Reporting Odds Ratio (ROR)

$$\text{ROR} = \frac{a \times d}{b \times c}$$

a: Drug + ADE reports

b: Drug + Non-ADE reports

c: Non-drug + ADE reports

d: Non-drug + Non-ADE reports
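As a quick illustration, the ROR follows directly from the four cell counts. The function name and the choice to raise an error on a zero denominator are assumptions; the text does not say how empty cells are handled.

```python
def reporting_odds_ratio(a, b, c, d):
    """ROR = (a * d) / (b * c) from the 2x2 contingency table cells."""
    if b == 0 or c == 0:
        # Assumption: treat the ratio as undefined rather than apply a correction.
        raise ValueError("ROR is undefined when b or c is zero")
    return (a * d) / (b * c)


# Illustrative counts only: 20 drug+ADE, 80 drug+non-ADE,
# 10 non-drug+ADE, 890 non-drug+non-ADE reports.
ror = reporting_odds_ratio(20, 80, 10, 890)
```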

Outcome Classification Parameters

RI (Required Intervention to Prevent Permanent Impairment/Damage): Penalty=5, Weight=1

HO (Hospitalization): Penalty=4, Weight=2

DS (Disability): Penalty=3, Weight=3

LT (Life-Threatening): Penalty=2, Weight=4

DE (Death): Penalty=1, Weight=5

Calculation Steps & Methodology

Data Preprocessing

Select most severe outcome per patient-drug-ADE-route combination and remove duplicate entries to ensure data integrity.
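One way to sketch this step is to keep, for each patient-drug-ADE-route key, only the outcome with the highest weight, which also collapses duplicate rows. The tuple layout and the function name are hypothetical; severity ordering is taken from the outcome weights listed above.

```python
# Severity rank follows the outcome weights (DE is the most severe).
OUTCOME_RANK = {"RI": 1, "HO": 2, "DS": 3, "LT": 4, "DE": 5}


def most_severe_outcomes(reports):
    """reports: iterable of (patient_id, drug, ade, route, outcome) tuples.

    Returns a dict mapping each (patient_id, drug, ade, route) key to its
    most severe outcome; duplicate rows collapse into a single entry.
    """
    best = {}
    for patient_id, drug, ade, route, outcome in reports:
        key = (patient_id, drug, ade, route)
        if key not in best or OUTCOME_RANK[outcome] > OUTCOME_RANK[best[key]]:
            best[key] = outcome
    return best
```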

Conditional Probability Calculation

Calculate conditional probability: $$P\left( \text{Outcome}_{k}|\text{Drug}, \text{ADE}, X \right) = \frac{N_{k}}{N_{\text{total}}}$$ to determine outcome probabilities given specific drug-ADE combinations.
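For a single drug-ADE stratum, the conditional probabilities are just relative frequencies of the outcome codes. A minimal sketch, assuming the input is the list of (already deduplicated) outcome codes for that stratum; the function name is illustrative.

```python
from collections import Counter


def outcome_probabilities(outcomes):
    """Return {outcome: N_k / N_total} for one drug-ADE stratum."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()}
```

For example, four reports with outcomes HO, HO, DE, and RI give probabilities 0.5, 0.25, and 0.25 for HO, DE, and RI respectively.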

ROR Computation

Apply the reporting odds ratio formula using contingency table analysis to quantify the association strength between drug and adverse event.

Total Severity Score

Integrate all parameters using the core formula to generate a comprehensive severity score that accounts for outcome weights, outcome probabilities, penalty factors, and the reporting odds ratio.
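The core formula can be evaluated with a few lines of Python. This sketch reads the denominator as $\log_2(\text{Pen}_k + 1)$ multiplied by $(1 + e^{-\log_2(\text{ROR})})$, which is an interpretation of the printed formula; the function name and the dictionary layout are assumptions.

```python
import math

# (weight W_k, penalty Pen_k) per outcome, from the outcome classification parameters.
OUTCOME_PARAMS = {
    "RI": (1, 5),
    "HO": (2, 4),
    "DS": (3, 3),
    "LT": (4, 2),
    "DE": (5, 1),
}


def severity_score(probabilities, ror):
    """probabilities: {outcome: P_k}; ror: reporting odds ratio (> 0)."""
    if ror <= 0:
        raise ValueError("ROR must be positive")
    # Assumed reading: the ROR term multiplies log2(Pen_k + 1) in the denominator.
    ror_factor = 1.0 + math.exp(-math.log2(ror))
    total = 0.0
    for outcome, p_k in probabilities.items():
        weight, penalty = OUTCOME_PARAMS[outcome]
        total += weight * p_k / (math.log2(penalty + 1) * ror_factor)
    return total
```

Under this reading, a stratum in which every report is a death (P = 1 for DE) scores 2.5 at ROR = 1 and approaches the upper bound of 5 as ROR grows, which agrees with the top of the Critical range.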

Grade Assignment

Map calculated scores to standardized severity grades using predefined thresholds for clinical interpretation and risk stratification.

Severity Grade Classification

$$\text{Grade} = \begin{cases} \text{Minimal} & \text{if } 0 \leq \text{Severity Score} < 0.387 \\ \text{Mild} & \text{if } 0.387 \leq \text{Severity Score} < 0.861 \\ \text{Moderate} & \text{if } 0.861 \leq \text{Severity Score} < 1.500 \\ \text{Severe} & \text{if } 1.500 \leq \text{Severity Score} < 2.524 \\ \text{Critical} & \text{if } 2.524 \leq \text{Severity Score} \leq 5.000 \end{cases}$$

Minimal

0 ≤ Score < 0.387

Mild

0.387 ≤ Score < 0.861

Moderate

0.861 ≤ Score < 1.500

Severe

1.500 ≤ Score < 2.524

Critical

2.524 ≤ Score ≤ 5.000
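The grade thresholds map naturally onto a sorted-boundary lookup. A minimal sketch using the standard-library `bisect` module; the names `GRADE_BOUNDS`, `GRADE_NAMES`, and `severity_grade` are illustrative.

```python
from bisect import bisect_right

# Lower bounds of Mild, Moderate, Severe, and Critical.
GRADE_BOUNDS = [0.387, 0.861, 1.500, 2.524]
GRADE_NAMES = ["Minimal", "Mild", "Moderate", "Severe", "Critical"]


def severity_grade(score):
    """Map a severity score in [0, 5] to its grade."""
    if not 0.0 <= score <= 5.0:
        raise ValueError("severity score must lie in [0, 5]")
    # bisect_right counts the bounds that are <= score, so lower bounds are inclusive.
    return GRADE_NAMES[bisect_right(GRADE_BOUNDS, score)]
```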

Mathematical Notation Reference

$k \in \{RI,HO,DS,LT,DE\}$: outcome types

$P_{k}$: probability of outcome k

$W_{k}$: weight for outcome k

$\text{Pen}_{k}$: penalty value for outcome k

$N_{k}$: count of outcome k

$N_{\text{total}}$: total report count

$a, b, c, d$: contingency table cells

$\text{ROR}$: reporting odds ratio

Data Sources

PersADE integrates data from multiple authoritative sources to provide a systematic and comprehensive view of adverse drug events, drug–target interactions, and patient-specific information. The database employs rigorous data cleaning, standardization, and validation procedures to ensure accuracy and reliability.

PubMed

Drug-ADE, ADE-protein, and drug-protein associations systematically mined from biomedical literature, enriching data coverage and evidential robustness for adverse drug event analysis and molecular mechanism elucidation.

115K+ References

2,750 Drug-ADE-Protein

FDA Adverse Event Reporting System (FAERS)

Contains information on adverse event and medication error reports submitted to FDA, used for safety surveillance of drug and therapeutic biologic products.

37.1M+ Records

2004-2023 Coverage

Canada Vigilance Adverse Reaction Online Database (CVAROD)

Health Canada's post-market surveillance database that collects reports of suspected adverse reactions to health products marketed in Canada.

840K+ Records

1974-2023 Coverage

PubChem and DrugBank

Comprehensive pharmacological data repository containing detailed drug profiles—chemical structures, therapeutic indications and classifications—used for structural elucidation and mechanism-of-action research.

10,300 Drug-like compounds

3,655 Drugs

Unified Medical Language System (UMLS)

Biomedical terminology system offering unified concept identifiers and semantic mappings for standardized representation and classification of adverse events, facilitating cross-dataset comparability and computational analysis.

23,026 ADE Terms

4,861 Tree numbers

UniProt

Comprehensive protein sequence and functional information database providing detailed annotations for drug targets, including protein structure, function, and interaction data essential for therapeutic target identification and drug discovery research.

7,138 Proteins

4,445 3D structures

Data Analysis Methodology

Our approach to analyzing and processing adverse drug event data involves multiple computational and statistical methods to ensure data quality, reliability, and clinical relevance.

Data Collection & Integration

Raw reports are retrieved via APIs from multiple sources such as FAERS and CVAROD. Drug–ADE, ADE–protein, and drug–protein associations are extracted through large-scale text mining from the PubMed literature database.

Data Cleaning & Standardization

Drugs are mapped to InChIKey identifiers, and adverse events are aligned to UMLS Concept Unique Identifiers (CUIs). Duplicate reports and entries with incomplete or structurally inconsistent information are excluded.

Statistical Signal Detection

Drug-ADE associations are evaluated using multiple disproportionality analysis methods, including:

Reporting Odds Ratio (ROR)
Proportional Reporting Ratio (PRR)
P-values (association significance)

Severity scores are then assigned to each ADE based on patient outcome data.
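The page defines the ROR explicitly but not the PRR. The sketch below uses the standard definition, PRR = [a / (a + b)] / [c / (c + d)], with the same contingency cells a, b, c, d used for the ROR; whether PersADE applies any correction for empty cells is not stated, so the error handling here is an assumption.

```python
def proportional_reporting_ratio(a, b, c, d):
    """PRR = [a / (a + b)] / [c / (c + d)] (standard definition)."""
    if a + b == 0 or c + d == 0 or c == 0:
        # Assumption: treat the ratio as undefined rather than apply a correction.
        raise ValueError("PRR is undefined for these cell counts")
    return (a / (a + b)) / (c / (c + d))


# Illustrative counts only: 20 drug+ADE, 80 drug+non-ADE,
# 10 non-drug+ADE, 890 non-drug+non-ADE reports.
prr = proportional_reporting_ratio(20, 80, 10, 890)
```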

Validation & Quality Control

The identified associations are cross-validated against the latest clinical studies, guidelines, and expert opinions in the field. Confidence intervals are calculated and reported to quantify the reliability of the results.
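The page does not say how its confidence intervals are computed. One common choice for the ROR is the log-scale normal approximation, exp(ln ROR ± z × sqrt(1/a + 1/b + 1/c + 1/d)); the sketch below shows that method as an assumption, not as the PersADE procedure.

```python
import math


def ror_confidence_interval(a, b, c, d, z=1.96):
    """Approximate CI for the ROR on the log scale (z=1.96 gives about 95%)."""
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this approximation")
    log_ror = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_ror - z * se), math.exp(log_ror + z * se)


# Illustrative counts only; these are not PersADE data.
lower, upper = ror_confidence_interval(20, 80, 10, 890)
```

A lower bound above 1 is a frequently used criterion for a disproportionality signal, though the thresholds PersADE applies are those given in its own confidence framework.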