Abstract
Bioinformatic techniques for gene expression data require dedicated analysis pipelines to study properties, adaptation, and disease outcomes in a sample population. The present investigation compared the results of four numerical experiments modeling survival rates from bladder cancer genetic profiles. A sequence of two discretization phases produced remarkable results compared with the classic approach, which applies a single discretization to gene expression data. The two-phase analysis consisted either of a primary discretizer followed by refinement, or of pre-binning the input values before the main discretization scheme. Among all tests, the best model comprises a data transformation to compensate for skewness, a data discretization phase using the class-attribute interdependence maximization algorithm, and final classification by voting feature intervals, a classifier that also provides discrete interval optimization.
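The pre-binning variant described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a log(1 + x) transform for skewness, equal-width pre-binning, and a simplified greedy selection of cut points scored by the CAIM criterion (the full CAIM algorithm has additional stopping rules). All function names and the toy data are hypothetical, and the voting feature intervals classifier is not shown.

```python
import math
from collections import Counter

def log_transform(values):
    """Compress right-skewed, non-negative values with log(1 + x) (assumed transform)."""
    return [math.log1p(v) for v in values]

def equal_width_cuts(values, n_bins):
    """Phase 1 (pre-binning): interior cut points of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + step * i for i in range(1, n_bins)]

def assign_bins(values, cuts):
    """Map each value to the index of its interval (number of cuts below it)."""
    return [sum(v > c for c in cuts) for v in values]

def caim(values, labels, cuts):
    """CAIM criterion: mean over intervals of (majority class count)^2 / interval size."""
    bins = assign_bins(values, cuts)
    total = 0.0
    for r in range(len(cuts) + 1):
        counts = Counter(l for b, l in zip(bins, labels) if b == r)
        if counts:  # empty intervals contribute nothing
            total += max(counts.values()) ** 2 / sum(counts.values())
    return total / (len(cuts) + 1)

def greedy_caim_cuts(values, labels, candidates):
    """Phase 2 (simplified): add candidate cuts while the CAIM criterion improves."""
    chosen, best = [], caim(values, labels, [])
    remaining = list(candidates)
    while remaining:
        score, cut = max((caim(values, labels, chosen + [c]), c) for c in remaining)
        if score <= best:
            break
        chosen.append(cut)
        remaining.remove(cut)
        best = score
    return sorted(chosen)

# Made-up toy data: one skewed expression feature and a binary outcome label.
expression = [1, 2, 3, 4, 100, 200, 300, 400]
outcome = [0, 0, 0, 0, 1, 1, 1, 1]

x = log_transform(expression)
candidates = equal_width_cuts(x, n_bins=4)       # phase 1: coarse candidate boundaries
cuts = greedy_caim_cuts(x, outcome, candidates)  # phase 2: class-aware selection
discrete = assign_bins(x, cuts)
```

On this toy data the second phase keeps a single cut that separates the two outcome groups. Restricting the class-aware search to the pre-binned boundaries is the design choice being illustrated: it shrinks the set of candidate cut points the main discretizer has to evaluate.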
| Original language | English |
|---|---|
| Pages (from-to) | 29-47 |
| Number of pages | 19 |
| Journal | Communications in Applied and Industrial Mathematics |
| Volume | 12 |
| Issue number | 1 |
| DOIs | |
| Publication status | Published - 1 Jan 2021 |
Keywords
- Bladder cancer
- Data-driven biomarker research
- Discretization
- Genetic expression
- Machine learning
- Survival rate modeling