PersADE Statistics

Comprehensive analytics of associations between drugs and adverse events

Statistical Overview

PersADE is a newly developed online database that integrates comprehensive, multidimensional information on associations between drugs and adverse drug events (ADEs). It provides richly curated drug-ADE annotations, linking chemical structures, molecular targets, dosage records, and patient demographics.

Drug-Target-ADE associations

  • Drug-target-ADE associations: 194,874
  • Drug-target interactions: 108,677
  • ADE-target associations: 31,756
  • Proteins: 6,980

Personalized Drug-ADE associations

  • Personalized associations: 4,061,772
  • Disease-stratified associations: 1,277,009
  • Route-defined associations: 1,649,311
  • Formulation-dependent associations: 1,135,452

Target-centered associations

  • Pharmacological targets: 2,066
  • Drug-target interactions with affinity: 17,208
  • KEGG pathways: 348
  • MSigDB pathways: 6,191

DTA Confidence score framework

  • Drug-target-ADE associations: 194,874
  • Valid associations: 2,617
  • High-confidence associations: 10,668
  • Medium-confidence associations: 181,589

General Drug-ADE associations

  • General associations: 2,201,568
  • Drug-like compounds: 10,235
  • Drugs: 3,651
  • Adverse events: 19,991

ADE Confidence score framework

  • Patient-level records: 37,923,032
  • Patients: 11,500,749
  • High-confidence associations: 316,144
  • Medium-confidence associations: 1,339,808
  • Low-confidence associations: 4,056,728

Data Visualization

Drug Safety Data Flow

[Figure: flow diagram with five stages: Data Source, Classification, Sub Class, Confidence Level, Safety Level]

  • Patient records: 37.9M
  • Patients: 11.5M
  • Associations: 6.2M
  • Drug-like compounds: 10,235
  • Adverse events: 19,991

Hierarchical Data Analysis

Interactive Sunburst Chart showing three-level categorical hierarchy

  • Main categories: 10
  • Subcategories: 24
  • Specific types: 79

Overview of Outcome and Patient Age in the Reports

[Charts: Gender Distribution; Age Distribution; Rechallenge Reports (2017-2022); Dechallenge Reports (2017-2022)]

Confidence Score Framework

A quantitative framework for assessing the reliability of drug-ADE associations. It integrates statistical significance, association strength, and the volume of supporting case reports.

Core Confidence Formula

$$\text{Confidence Score} = \frac{S_{P} + S_{strength} + S_{case}}{3}$$ where each component is normalized to the [0,10] scale

This integrated formula combines statistical significance, association strength, and case evidence to generate quantitative confidence assessments for drug-ADE relationships.

Statistical Components

$$S_{strength} = \frac{\sqrt{ROR} + \sqrt{PRR}}{2}$$
ROR: Reporting Odds Ratio
PRR: Proportional Reporting Ratio
P-value: Statistical significance
Cases: Clinical evidence volume

Score Transformation & Normalization

P-value Score: $S_P = -\log_{10}(P_{adj})$, with $P_{adj} = 0$ assigned a raw score of 10 (truncated at threshold=3)
ROR Score: $S_{ROR} = \sqrt{ROR}$ (truncated at threshold=10)
PRR Score: $S_{PRR} = \sqrt{PRR}$ (truncated at threshold=10)
Case Score: $S_{case} = \sqrt{N_{cases}}$ (truncated at threshold=7)
Min-Max Normalization: Each component scaled to [0,10] range

Confidence Assessment Methodology

1. Raw Score Calculation

Transform the input parameters into raw scores: $$S_{P}^{raw} = \begin{cases} 10 & \text{if } P_{adj} = 0 \\ -\log_{10}(P_{adj}) & \text{if } P_{adj} > 0 \end{cases}$$ $$S_{ROR}^{raw} = \sqrt{ROR}, \quad S_{PRR}^{raw} = \sqrt{PRR}, \quad S_{case}^{raw} = \sqrt{N_{cases}}$$

2. Threshold Truncation

Apply fixed thresholds to the raw scores so that extreme outliers do not distort the scoring: $$S_k^{trunc} = \min(S_k^{raw}, T_k)$$ where $T_P = 3$, $T_{ROR} = 10$, $T_{PRR} = 10$, $T_{case} = 7$

3. Association Strength Integration

Combine the truncated ROR and PRR scores into a single association strength measure: $$S_{strength}^{trunc} = \frac{S_{ROR}^{trunc} + S_{PRR}^{trunc}}{2}$$ When only one of the two metrics is available, it is used on its own.

4. Min-Max Normalization

Normalize each component to the [0,10] scale for fair integration: $$S_k^{norm} = \frac{S_k^{trunc} - \min(S_k^{trunc})}{\max(S_k^{trunc}) - \min(S_k^{trunc})} \times 10$$ This gives each component equal weight regardless of its original scale.

5. Final Confidence Score

Calculate the final confidence score as the arithmetic mean of the normalized components: $$\text{Confidence Score} = \frac{S_P^{norm} + S_{strength}^{norm} + S_{case}^{norm}}{3}$$ Missing components are excluded from the average.
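
The five steps can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the PersADE implementation: the function names and the dict-based record layout (`p_adj`, `ror`, `prr`, `n_cases`) are hypothetical, the minimum and maximum are taken over the batch being scored, and a constant component is mapped to 0 to avoid dividing by zero. Truncation is applied before strength integration and normalization, because both formulas are written in terms of truncated scores.

```python
import math

# Truncation thresholds T_k as stated in the methodology.
THRESHOLDS = {"p": 3.0, "ror": 10.0, "prr": 10.0, "case": 7.0}


def raw_p_score(p_adj):
    """Raw P score: 10 when P_adj is exactly 0, else -log10(P_adj)."""
    return 10.0 if p_adj == 0 else -math.log10(p_adj)


def truncated_scores(rec):
    """Truncated P, strength and case scores for one association.

    `rec` is a hypothetical dict; a value of None marks a missing input.
    """
    out = {"p": None, "strength": None, "case": None}
    if rec.get("p_adj") is not None:
        out["p"] = min(raw_p_score(rec["p_adj"]), THRESHOLDS["p"])
    parts = [
        min(math.sqrt(rec[key]), THRESHOLDS[key])
        for key in ("ror", "prr")
        if rec.get(key) is not None
    ]
    if parts:  # average of whichever of ROR / PRR is present
        out["strength"] = sum(parts) / len(parts)
    if rec.get("n_cases") is not None:
        out["case"] = min(math.sqrt(rec["n_cases"]), THRESHOLDS["case"])
    return out


def min_max_normalize(values):
    """Scale the non-missing values to [0, 10]; missing values stay None."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    lo, hi = min(present), max(present)
    if hi == lo:  # assumption: a constant column maps to 0 to avoid 0/0
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - lo) / (hi - lo) * 10 for v in values]


def confidence_scores(records):
    """Mean of the available normalized components for each record."""
    trunc = [truncated_scores(r) for r in records]
    columns = {
        name: min_max_normalize([t[name] for t in trunc])
        for name in ("p", "strength", "case")
    }
    scores = []
    for i in range(len(records)):
        parts = [columns[n][i] for n in columns if columns[n][i] is not None]
        scores.append(sum(parts) / len(parts) if parts else None)
    return scores
```

Because normalization depends on the batch minimum and maximum, the same association can score differently when scored alongside a different set of associations; a production pipeline would presumably fix those bounds over the full dataset.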

Confidence Level Classification

$$\text{Confidence Level} = \begin{cases} \text{Low} & \text{if } 0 \leq \text{Score} < 3.33 \\ \text{Medium} & \text{if } 3.33 \leq \text{Score} < 6.67 \\ \text{High} & \text{if } 6.67 \leq \text{Score} \leq 10.00 \end{cases}$$
  • Low Confidence (0 ≤ Score < 3.33): Limited statistical evidence, requires further validation
  • Medium Confidence (3.33 ≤ Score < 6.67): Moderate evidence strength, suitable for hypothesis generation
  • High Confidence (6.67 ≤ Score ≤ 10.00): Strong statistical support, suitable for clinical decision support
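
The classification rule translates directly into a small function. The sketch below is illustrative only; the function name is hypothetical, and rejecting scores outside [0, 10] is an assumption about how invalid input should be handled.

```python
def confidence_level(score):
    """Map a confidence score in [0, 10] to Low / Medium / High."""
    if not 0 <= score <= 10:
        raise ValueError("score must lie in [0, 10]")
    if score < 3.33:
        return "Low"
    if score < 6.67:
        return "Medium"
    return "High"
```

Each lower bound is inclusive, so a score of exactly 3.33 is Medium and a score of exactly 6.67 is High.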

Special Handling Rules & Quality Assurance

Zero P-value Rule: When the adjusted p-value ($P_{adj}$) is 0, the P-score receives the maximum value (10)
Risk Signal Filtering: Associations with ROR ≤ 1 and PRR ≤ 1 are excluded (no risk signal)
Missing Data Handling: Final score calculated using available components only
Outlier Protection: Fixed thresholds prevent extreme values from distorting scores
Equal Weight Design: Each dimension contributes 33.3% to final confidence assessment
Vectorized Processing: High-performance computation for large-scale datasets

Mathematical Notation & Variables

$P_{adj}$: Adjusted p-value (FDR corrected)
$ROR$: Reporting Odds Ratio
$PRR$: Proportional Reporting Ratio
$N_{cases}$: Number of case reports
$S_k^{raw}$: Raw score for component k
$S_k^{trunc}$: Truncated score for component k
$S_k^{norm}$: Normalized score for component k
$T_k$: Truncation threshold for component k
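
The notation lists $P_{adj}$ as FDR corrected but does not name the procedure. As an illustration only, the sketch below assumes the Benjamini-Hochberg procedure, a common choice for FDR control; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

An adjusted value can only reach exactly 0 when the raw p-value is 0 (typically through floating-point underflow), which is the case the zero P-value rule covers.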

Framework Performance & Validation

Processing Speed: ~2.45ms per 1,000 associations (vectorized computation)
Memory Efficiency: Optimized for datasets with >6M drug-ADE pairs
Scalability: Linear complexity O(n) for dataset size n
Reproducibility: Deterministic scoring with fixed thresholds
Cross-validation: Validated against expert-curated reference standards
Clinical Relevance: Correlation with known drug safety profiles (r > 0.85)

ADE Severity Calculator

Advanced computational framework for quantitative assessment of adverse drug reaction severity, integrating patient outcome data, statistical analysis, and clinical risk stratification.

Core Severity Formula

$$\text{Severity Score} = \sum_{k} \frac{W_{k} \times P_{k}}{\log_{2}(\text{Pen}_{k} + 1) \times \left(1 + e^{-\log_{2}(\text{ROR})}\right)}$$

This comprehensive formula integrates outcome weights, probabilities, penalty factors, and reporting odds ratios to generate quantitative severity assessments.

Reporting Odds Ratio (ROR)

$$\text{ROR} = \frac{a \times d}{b \times c}$$
a: Drug + ADE reports
b: Drug + Non-ADE reports
c: Non-drug + ADE reports
d: Non-drug + Non-ADE reports
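
As a minimal sketch, both disproportionality measures can be computed from the four cells. The ROR follows the formula above. The PRR is not spelled out in the text, so the standard definition, [a / (a + b)] / [c / (c + d)], is assumed here. Function names are hypothetical, and raising an error on a zero denominator is one possible policy among several.

```python
def reporting_odds_ratio(a, b, c, d):
    """ROR = (a * d) / (b * c) for a 2x2 contingency table."""
    if b == 0 or c == 0:
        raise ValueError("ROR is undefined when b or c is zero")
    return (a * d) / (b * c)


def proportional_reporting_ratio(a, b, c, d):
    """PRR = [a / (a + b)] / [c / (c + d)] (standard definition, assumed)."""
    if a + b == 0 or c + d == 0 or c == 0:
        raise ValueError("PRR is undefined for this table")
    return (a / (a + b)) / (c / (c + d))
```

For example, a table with a=20, b=80, c=10, d=890 gives ROR = 17800 / 800 = 22.25 and PRR = 0.2 / (10 / 900) = 18.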

Outcome Classification Parameters

RI (Required Intervention to Prevent Permanent Impairment/Damage): Penalty=5, Weight=1
HO (Hospitalization): Penalty=4, Weight=2
DS (Disability): Penalty=3, Weight=3
LT (Life-Threatening): Penalty=2, Weight=4
DE (Death): Penalty=1, Weight=5

Calculation Steps & Methodology

1. Data Preprocessing

Select the most severe outcome for each patient-drug-ADE-route combination and remove duplicate entries to ensure data integrity.

2. Conditional Probability Calculation

Calculate the conditional probability of each outcome given a specific drug-ADE combination: $$P\left( \text{Outcome}_{k}|\text{Drug}, \text{ADE}, X \right) = \frac{N_{k}}{N_{\text{total}}}$$

3. ROR Computation

Apply the reporting odds ratio formula to the 2×2 contingency table to quantify the strength of the association between the drug and the adverse event.

4. Total Severity Score

Integrate all parameters using the core formula to generate a severity score that accounts for outcome weights, outcome probabilities, penalty factors, and the reporting odds ratio.

5. Grade Assignment

Map calculated scores to standardized severity grades using predefined thresholds for clinical interpretation and risk stratification.
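
Steps 1, 2, and 4 can be sketched as follows. This is an illustration under stated assumptions rather than the PersADE code: function names are hypothetical, "most severe" is taken to mean the outcome with the highest weight, and the denominator is read as the product of $\log_2(\text{Pen}_k + 1)$ and $1 + e^{-\log_2(\text{ROR})}$.

```python
import math
from collections import Counter

# (penalty, weight) per outcome code, from the outcome parameter list.
OUTCOMES = {"RI": (5, 1), "HO": (4, 2), "DS": (3, 3), "LT": (2, 4), "DE": (1, 5)}


def most_severe_outcome(outcomes):
    """Pick the reported outcome with the highest weight (assumed rule)."""
    return max(outcomes, key=lambda k: OUTCOMES[k][1])


def outcome_probabilities(per_report_outcomes):
    """P(outcome_k | drug, ADE, X) = N_k / N_total over deduplicated reports."""
    counts = Counter(per_report_outcomes)
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()}


def severity_score(probabilities, ror):
    """Sum over k of W_k * P_k / (log2(Pen_k + 1) * (1 + exp(-log2(ROR))))."""
    if ror <= 0:
        raise ValueError("ROR must be positive")
    damping = 1 + math.exp(-math.log2(ror))
    return sum(
        OUTCOMES[k][1] * p / (math.log2(OUTCOMES[k][0] + 1) * damping)
        for k, p in probabilities.items()
    )
```

With ROR = 1 the factor $1 + e^{-\log_2(\text{ROR})}$ equals 2, so a pair whose reports all end in death scores 5 / (1 × 2) = 2.5. As the ROR grows the factor approaches 1 and the same pair approaches the maximum score of 5.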

Severity Grade Classification

$$\text{Grade} = \begin{cases} \text{Minimal} & \text{if } 0 \leq \text{Severity Score} < 0.387 \\ \text{Mild} & \text{if } 0.387 \leq \text{Severity Score} < 0.861 \\ \text{Moderate} & \text{if } 0.861 \leq \text{Severity Score} < 1.500 \\ \text{Severe} & \text{if } 1.500 \leq \text{Severity Score} < 2.524 \\ \text{Critical} & \text{if } 2.524 \leq \text{Severity Score} \leq 5.000 \end{cases}$$
  • Minimal: 0 ≤ Score < 0.387
  • Mild: 0.387 ≤ Score < 0.861
  • Moderate: 0.861 ≤ Score < 1.500
  • Severe: 1.500 ≤ Score < 2.524
  • Critical: 2.524 ≤ Score ≤ 5.000
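
Grade assignment is a threshold lookup. The sketch below uses hypothetical names and treats scores outside [0, 5] as invalid, which is an assumption. It also checks an observation that the text does not state explicitly: each boundary appears to equal $W_k / \log_2(\text{Pen}_k + 1)$, the largest contribution a single outcome can make (when $P_k = 1$ and the ROR is very large), rounded to three decimals.

```python
import bisect
import math

GRADE_BOUNDS = [0.387, 0.861, 1.500, 2.524]
GRADES = ["Minimal", "Mild", "Moderate", "Severe", "Critical"]


def severity_grade(score):
    """Map a severity score in [0, 5] to its grade (lower bounds inclusive)."""
    if not 0 <= score <= 5:
        raise ValueError("score must lie in [0, 5]")
    return GRADES[bisect.bisect_right(GRADE_BOUNDS, score)]


# (penalty, weight) for RI, HO, DS, LT; DE gives the upper limit 5 / log2(2) = 5.
single_outcome_max = [
    round(w / math.log2(pen + 1), 3) for pen, w in [(5, 1), (4, 2), (3, 3), (2, 4)]
]
```

Under that reading, a grade of Severe or above can only be reached when life-threatening or fatal outcomes contribute to the score.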

Mathematical Notation Reference

$k \in \{RI,HO,DS,LT,DE\}$: outcome types
$P_{k}$: probability of outcome k
$W_{k}$: weight for outcome k
$\text{Pen}_{k}$: penalty value for outcome k
$N_{k}$: count of outcome k
$N_{\text{total}}$: total report count
$a, b, c, d$: contingency table cells
$\text{ROR}$: reporting odds ratio

Data Sources

PersADE integrates data from multiple authoritative sources to provide a systematic and comprehensive view of adverse drug events, drug–target interactions, and patient-specific information. The database employs rigorous data cleaning, standardization, and validation procedures to ensure accuracy and reliability.

PubMed

Drug-ADE, ADE-protein, and drug-protein associations systematically mined from biomedical literature, enriching data coverage and evidential robustness for adverse drug event analysis and molecular mechanism elucidation.

115K+ References
2,750 Drug-ADE-Protein

FDA Adverse Event Reporting System (FAERS)

Contains information on adverse event and medication error reports submitted to FDA, used for safety surveillance of drug and therapeutic biologic products.

37.1M+ Records
2004-2023 Coverage

Canada Vigilance Adverse Reaction Online Database (CVAROD)

Health Canada's post-market surveillance database that collects reports of suspected adverse reactions to health products marketed in Canada.

840K+ Records
1974-2023 Coverage

PubChem and DrugBank

Comprehensive chemical and pharmacological data repositories containing detailed drug profiles (chemical structures, therapeutic indications, and classifications), used for structural elucidation and mechanism-of-action research.

10,300 Drug-like compounds
3,655 Drugs

Unified Medical Language System (UMLS)

Biomedical terminology system offering unified concept identifiers and semantic mappings for standardized representation and classification of adverse events, facilitating cross-dataset comparability and computational analysis.

23,026 ADE Terms
4,861 Tree numbers

UniProt

Comprehensive protein sequence and functional information database providing detailed annotations for drug targets, including protein structure, function, and interaction data essential for therapeutic target identification and drug discovery research.

7,138 Proteins
4,445 3D structures

Data Analysis Methodology

Our approach to analyzing and processing adverse drug event data involves multiple computational and statistical methods to ensure data quality, reliability, and clinical relevance.

1. Data Collection & Integration

Raw reports are retrieved via APIs from multiple sources, including FAERS and CVAROD. Drug–ADE, ADE–protein, and drug–protein associations are extracted from the PubMed literature database through large-scale text mining.

2. Data Cleaning & Standardization

Drugs are mapped to InChIKeys, and adverse events are aligned to UMLS Concept Unique Identifiers (CUIs). Duplicate reports and entries with incomplete or structurally inconsistent information are excluded.

3. Statistical Signal Detection

Drug-ADE associations are evaluated using multiple disproportionality measures, including:

  • Reporting Odds Ratio (ROR)
  • Proportional Reporting Ratio (PRR)
  • P-values (association significance)

A severity score is then assigned to each ADE based on patient outcome data.

4. Validation & Quality Control

The identified associations are cross-validated against the latest clinical studies, guidelines, and expert opinions in the field. Confidence intervals are calculated and reported to quantify the reliability of the results.
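
The text does not say how the confidence intervals are computed. One widely used option for the ROR is the log-normal (Woolf) approximation, in which the standard error of ln(ROR) is $\sqrt{1/a + 1/b + 1/c + 1/d}$. The sketch below assumes that method; the function name and the default z = 1.96 (a 95% interval) are illustrative choices.

```python
import math


def ror_confidence_interval(a, b, c, d, z=1.96):
    """Return (ROR, lower, upper) using the log-normal approximation."""
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    ror = (a * d) / (b * c)
    # Standard error of ln(ROR) from the four cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return ror, ror * math.exp(-z * se), ror * math.exp(z * se)
```

A lower bound above 1 is commonly read as a disproportionality signal, which is in line with the rule that excludes associations with ROR ≤ 1.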