Abstract
Triple Negative Breast Cancer (TNBC) is a type of breast cancer with a very poor prognosis. Predicting the histological grade (HG) and lymph node metastasis is crucial for developing more suitable treatment strategies.
We present the main clinical and pathological variables for predicting the histological grade and lymph node metastasis via novel machine learning techniques. These variables are currently used for prognosis and treatment in medical practice. The analysis was performed on a database of 102 Caucasian women diagnosed with TNBC, and the results were cross-validated using random simulations of this dataset.
HG was predicted with an accuracy of 93.8% using a list of 6 prognostic variables with significant implications: Ki67 expression, use of Oral contraceptives, Col11A1 expression, Col11A1 score, E-cad truncated and Tumor size. Lymph node metastasis was predicted with an accuracy of almost 85% using only 6 prognostic variables: Vascular invasion, Tumor size, Perineural invasion, Age at diagnosis, Ki67 expression, and Col11A1 score. This analysis also served to establish the median signatures of the groups with and without lymph node metastasis, and revealed a class of small tumors (around 2.15 cm) with lymph node metastasis that show neither vascular nor perineural invasion and have a higher Col11A1 protein score. Besides, these signatures proved to be very stable.
The additional information conveyed by the prognostic variables found in these two classification problems provides new insight into the genesis and progression of this disease and can be used in medical practice to improve decisions in patient diagnosis and further treatment.
Copyright © 2018 Cernea Ana, et al.
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Competing interests: The authors have declared that no competing interests exist.
Results
The aim of this analysis is to establish the discriminatory power of the immuno-histochemical, pathological and clinical variables for HG prediction. For that purpose, we did not use any of the three pathological variables involved in the Scarff-Bloom-Richardson definition: Mitotic count, Nuclear pleomorphism and Tubule formation. This analysis established the optimum variable networks for HG prediction, and showed how the clinical and pathological variables influence the development of the disease, particularly the patients' daily habits (oral contraceptives intake, tobacco smoking (or tobacco consumption) and alcohol consumption).

We had at our disposal the histological grade of 96 TNBC samples: 21 samples in HG [...] The variables used in this classification problem are presented in [...]

Besides, we provide a simple linear regression formula to perform a fast and useful estimation of the histological grading: [...] This regression formula has a low RMS error of 0.2, that is, estimated histological grades lower than [...]

TP = true positive; TN = true negative; FP = false positive; FN = false negative; med = median; IQR = interquartile range; std = standard deviation.

The samples of the TP group (HG3 correctly predicted), compared to the TN group (HG2 correctly predicted), present higher median Ki67 expression (3.0 vs 1.0), higher Col11A1 score and Col11A1 expression (2.0 vs 0.5) and larger tumor size (2.10 vs 1.50). Besides, no sample in the TN group reports Oral contraceptives intake. On the other hand, the main differences between FP (HG2 samples incorrectly assigned to the HG3 class) and TP are: lower values of Ki67 (2.0 vs 3.0), no contraceptive intake for FP, lower Col11A1 score and expression (1 vs 2) and smaller Tumor size (1.0 vs 2.10). Finally, compared to the TN group, the FN group (HG3 samples incorrectly predicted as HG2) shows higher Ki67 expression (2.5 vs 1), higher expression of the Col11A1 protein (1.5 vs 0.50), and a much larger tumor size (3.35 cm vs 1.50 cm).
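As a rough illustration of how a linear regression estimate of the histological grade and its RMS error could be obtained, the following Python sketch fits ordinary least squares through the normal equations. It is not the authors' formula: the two predictors, the toy values and the helper names (`fit_linear`, `rms_error`) are hypothetical and serve only to make the computation concrete.

```python
import math


def solve_linear_system(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_linear(features, targets):
    """Least-squares coefficients [intercept, w1, w2, ...]."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, feature_row):
    return coeffs[0] + sum(w * v for w, v in zip(coeffs[1:], feature_row))


def rms_error(coeffs, features, targets):
    residuals = [predict(coeffs, f) - t for f, t in zip(features, targets)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


# Hypothetical toy samples: (Ki67 expression, tumor size in cm) -> grade 2 or 3.
features = [(1, 1.5), (1, 1.0), (2, 1.2), (2, 2.5),
            (3, 2.1), (3, 3.0), (3, 1.8), (2, 3.4)]
targets = [2, 2, 2, 3, 3, 3, 3, 3]

coeffs = fit_linear(features, targets)
print("coefficients:", [round(c, 3) for c in coeffs])
print("RMS error:", round(rms_error(coeffs, features, targets), 3))
```

The normal equations are adequate for a handful of well-conditioned predictors; with many correlated variables a QR or SVD based solver would be a safer choice.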
This classification problem tries to predict the presence or absence of lymph node metastasis without making use of the HG variable or any of the pathological variables involved in the Nottingham score, thereby unraveling other available prognostic variables that could be linked to this important problem in TNBC prognosis. In this case, we had at our disposal 72 samples, 27 of which had one or two lymph nodes.

TP = true positive; TN = true negative; FP = false positive; FN = false negative; med = median; IQR = interquartile range; std = standard deviation.

The classifier misclassified 11 samples: 5 were FP and the other 6 were FN. The three main differences between the TP and TN groups are a positive Vascular invasion in the TP group, a higher median Tumor size of 3 cm (versus 1.5 cm in the TN group), and a lower median Age at diagnosis of 55 years in the TP group (versus 58.50 in the TN group). The main difference between the TP and FP groups is the Age at diagnosis, which is much higher in the FP group (67 years vs 55). Finally, [...]
| Variable | Mean HG2 | Std HG2 | Mean HG3 | Std HG3 | FR | Accuracy (%) |
|---|---|---|---|---|---|---|
| Ki67 expression | 1.67 | 0.80 | 2.71 | 0.46 | 1.28 | 72.9 |
| AR expression | 0.76 | 0.44 | 0.17 | 0.38 | 1.03 | 81.2 |
| Oral contraceptives | 0.00 | 0.00 | 0.33 | 0.47 | 0.50 | 85.4 |
| Bcl2 expression | 0.29 | 0.64 | 0.80 | 0.77 | 0.26 | 84.4 |
| CK14 expression | 0.24 | 0.54 | 0.72 | 0.78 | 0.26 | 82.3 |
| Col11A1 score | 1.33 | 1.71 | 2.73 | 2.50 | 0.21 | 84.4 |
| Col11A1 intensity | 0.67 | 0.73 | 1.16 | 0.84 | 0.20 | 84.4 |
| E-cad truncated | 0.14 | 0.36 | 0.41 | 0.50 | 0.20 | 90.6 |
| Age at diagnosis | 66.57 | 13.80 | 57.69 | 14.64 | 0.19 | 79.2 |
| Tumor Size | 1.65 | 0.92 | 2.32 | 1.34 | 0.17 | 81.3 |
| Col11A1 expression | 1.00 | 1.10 | 1.56 | 1.21 | 0.12 | 80.2 |
| Lactation | 0.95 | 0.22 | 0.80 | 0.40 | 0.11 | 79.2 |
| Necrosis | 1.00 | 0.84 | 1.37 | 0.78 | 0.11 | 80.2 |
| Pregnancies | 2.29 | 1.42 | 1.71 | 1.10 | 0.10 | 78.1 |
| Tobacco Smoking | 0.19 | 0.40 | 0.36 | 0.48 | 0.07 | 78.1 |
| Perineural invasion | 0.05 | 0.22 | 0.13 | 0.34 | 0.04 | 78.1 |
| Age at Menarche | 12.90 | 1.26 | 12.53 | 1.47 | 0.04 | 76.0 |
| Vascular invasion | 0.14 | 0.36 | 0.23 | 0.42 | 0.02 | 77.1 |
| Family History (BOE) | 0.71 | 0.46 | 0.61 | 0.49 | 0.02 | 78.1 |
| CK5/6 expression | 0.81 | 0.75 | 0.95 | 0.82 | 0.01 | 79.2 |
| N | 0.24 | 0.44 | 0.31 | 0.49 | 0.01 | 77.1 |
| Alcohol consumption | 0.10 | 0.30 | 0.12 | 0.33 | <0.01 | 77.1 |
| Age First Child | 25.10 | 3.11 | 24.95 | 3.39 | <0.01 | 76.0 |
| Menopause | 0.95 | 0.22 | 0.95 | 0.23 | <0.01 | 76.0 |
| p53 expression | 0.71 | 0.46 | 0.72 | 0.45 | <0.01 | 77.0 |
| Family History (Cancer) | 0.81 | 0.40 | 0.81 | 0.39 | <0.01 | 75.0 |
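The FR (Fisher's ratio) values listed for the HG variables can be approximately recovered from the class means and standard deviations. The sketch below assumes the common two-class definition FR = (mean1 − mean2)² / (std1² + std2²); the text does not state the formula, but this definition agrees with the tabulated values to within rounding. The function name `fisher_ratio` and the choice of four rows are illustrative.

```python
def fisher_ratio(mean_a, std_a, mean_b, std_b):
    """Fisher's ratio of one variable between two classes.

    Assumed definition: squared difference of the class means divided by
    the sum of the class variances.
    """
    denom = std_a ** 2 + std_b ** 2
    if denom == 0:
        return float("inf") if mean_a != mean_b else 0.0
    return (mean_a - mean_b) ** 2 / denom


# Summary statistics copied from the HG variables table:
# (mean HG2, std HG2, mean HG3, std HG3)
hg_stats = {
    "Ki67 expression": (1.67, 0.80, 2.71, 0.46),
    "AR expression": (0.76, 0.44, 0.17, 0.38),
    "Oral contraceptives": (0.00, 0.00, 0.33, 0.47),
    "Tumor Size": (1.65, 0.92, 2.32, 1.34),
}

# Rank the variables by decreasing discriminatory power.
ranking = sorted(
    ((name, fisher_ratio(*stats)) for name, stats in hg_stats.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, fr in ranking:
    print(f"{name}: FR = {fr:.2f}")
```

Because the ratio only needs per-class means and standard deviations, it is a cheap univariate filter; it does not capture interactions between variables, which is presumably why the accuracy of a variable list is reported separately.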
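Equivalent signatures for HG prediction (accuracies 93.8 % and 92.7 %)

| Variable | Signatures including it (of 3) |
|---|---|
| Ki67 expression | 3 |
| Oral contraceptives | 3 |
| Tumor Size | 3 |
| E-cad truncated | 2 |
| Col11A1 expression | 2 |
| Col11A1 score | 1 |
| Age at diagnosis | 1 |
| Perineural Inv. | 1 |
| p53 expression | 1 |

| | Signature 1 | Signature 2 | Signature 3 |
|---|---|---|---|
| Classifier's stability (%): Median | 91.7 | 91.7 | 91.7 |
| Classifier's stability (%): Mean | 91.6 | 90.2 | 89.7 |
| Classifier's stability (%): IQR | 8.3 | 8.3 | 4.2 |
| Classifier's stability (%): Std | 5.5 | 5.7 | 5.6 |
| ROC analysis (%): Sensitivity | 97 | 96 | 96 |
| ROC analysis (%): Specificity | 76 | 81 | 76 |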
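The stability figures (median, mean, IQR and standard deviation of the accuracy) summarise how the accuracy varies over repeated random simulations. A minimal sketch of such a summary is given below. Everything specific in it is an assumption made for illustration: the nearest-centroid classifier, the 25 % hold-out fraction, the number of runs and the synthetic two-class data are hypothetical stand-ins, not the classifier or dataset used in the study.

```python
import random
import statistics


def nearest_centroid_predict(train_x, train_y, sample):
    """Return the label whose training centroid is closest to `sample`."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def holdout_accuracies(x, y, n_runs=100, holdout_fraction=0.25, seed=0):
    """Accuracy (%) on a random hold-out set, repeated n_runs times."""
    rng = random.Random(seed)
    n_test = max(1, round(holdout_fraction * len(x)))
    accuracies = []
    for _ in range(n_runs):
        order = list(range(len(x)))
        rng.shuffle(order)
        test_idx, train_idx = order[:n_test], order[n_test:]
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        hits = sum(
            nearest_centroid_predict(train_x, train_y, x[i]) == y[i]
            for i in test_idx
        )
        accuracies.append(100.0 * hits / n_test)
    return accuracies


def stability_summary(accuracies):
    """Median, mean, interquartile range and standard deviation."""
    q1, _, q3 = statistics.quantiles(accuracies, n=4)
    return {
        "median": statistics.median(accuracies),
        "mean": statistics.mean(accuracies),
        "iqr": q3 - q1,
        "std": statistics.stdev(accuracies),
    }


# Synthetic two-class data (hypothetical, not the TNBC dataset).
data_rng = random.Random(1)
x = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(48)]
x += [[data_rng.gauss(3.0, 1.0), data_rng.gauss(3.0, 1.0)] for _ in range(48)]
y = [0] * 48 + [1] * 48

summary = stability_summary(holdout_accuracies(x, y))
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Reporting the median and IQR alongside the mean and standard deviation makes the summary less sensitive to a few unlucky splits, which matters with test sets of only a few dozen samples.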
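Optimum Signature

| Variable | TP med | TP mean | TP IQR | TP std | TN med | TN mean | TN IQR | TN std | FP med | FP mean | FP IQR | FP std | FN med | FN mean | FN IQR | FN std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ki67 expression | 3.00 | 2.71 | 1.00 | 0.46 | 1.00 | 1.44 | 1.00 | 0.72 | 2.00 | 2.40 | 1.00 | 0.55 | 2.50 | 2.50 | 1.00 | 0.70 |
| Oral contraceptives | 0.00 | 0.34 | 1.00 | 0.48 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Col11A1 score | 2.00 | 2.77 | 6.00 | 2.51 | 0.50 | 1.12 | 2.00 | 1.41 | 1.00 | 2.00 | 3.75 | 2.55 | 1.50 | 1.50 | 3.00 | 2.12 |
| E-cad truncated | 0.00 | 0.42 | 1.00 | 0.50 | 0.00 | 0.06 | 0.00 | 0.25 | 0.00 | 0.40 | 1.00 | 0.55 | 0.00 | 0.00 | 0.00 | 0.00 |
| Tumor size | 2.10 | 2.29 | 1.60 | 1.34 | 1.50 | 1.68 | 1.60 | 0.95 | 1.00 | 1.54 | 1.07 | 0.92 | 3.35 | 3.35 | 0.30 | 0.21 |
| Col11A1 expression | 2.00 | 1.56 | 3.00 | 1.20 | 0.50 | 0.88 | 2.00 | 0.95 | 1.00 | 1.40 | 3.00 | 1.52 | 1.50 | 1.50 | 3.00 | 2.12 |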
| Variable | Mean C1 | Std C1 | Mean C2 | Std C2 | FR | Accuracy (%) |
|---|---|---|---|---|---|---|
| Vascular invasion | 0.48 | 0.51 | 0.09 | 0.29 | 0.45 | 75.0 |
| Tumor Size | 2.74 | 1.30 | 1.92 | 1.36 | 0.19 | 66.7 |
| Perineural invasion | 0.22 | 0.42 | 0.04 | 0.21 | 0.14 | 70.8 |
| Age First Child | 25.78 | 4.40 | 24.62 | 3.02 | 0.05 | 72.2 |
| CK14 expression | 0.78 | 0.75 | 0.58 | 0.72 | 0.04 | 69.4 |
| CK5/6 expression | 1.04 | 0.85 | 0.84 | 0.82 | 0.03 | 72.2 |
| E-cad expression | 1.00 | 0.00 | 0.98 | 0.15 | 0.02 | 73.6 |
| Family History Cancer | 0.89 | 0.32 | 0.82 | 0.39 | 0.02 | 73.6 |
| Tobacco consumption | 0.37 | 0.49 | 0.29 | 0.46 | 0.01 | 68.1 |
| Necrosis | 1.26 | 0.90 | 1.40 | 0.75 | 0.01 | 70.8 |
| Pregnancies | 1.93 | 1.27 | 2.11 | 0.98 | 0.01 | 65.3 |
| Age at diagnosis | 58.56 | 14.65 | 60.47 | 13.42 | 0.01 | 65.3 |
| Bcl2 expression | 0.63 | 0.74 | 0.73 | 0.81 | 0.01 | 63.9 |
| Age at Menarche | 12.48 | 1.28 | 12.62 | 1.25 | 0.01 | 66.7 |
| Col11A1 intensity | 0.89 | 0.85 | 0.96 | 0.82 | 0.00 | 65.3 |
| Ki67 expression | 2.56 | 0.70 | 2.51 | 0.63 | 0.00 | 65.3 |
| Lactation | 0.89 | 0.32 | 0.87 | 0.34 | 0.00 | 65.3 |
| Col11A1 expression | 1.26 | 1.23 | 1.20 | 1.10 | 0.00 | 65.3 |
| Family History BEO | 0.67 | 0.48 | 0.69 | 0.47 | 0.00 | 65.3 |
| E-cad truncated | 0.33 | 0.48 | 0.31 | 0.47 | 0.00 | 65.3 |
| Menopause | 0.96 | 0.19 | 0.96 | 0.21 | 0.00 | 65.3 |
| Col11A1 score | 2.04 | 2.38 | 1.96 | 2.15 | 0.00 | 62.3 |
| Alcohol consumption | 0.16 | 0.36 | 0.16 | 0.37 | 0.00 | 62.5 |
| AR expression | 0.26 | 0.45 | 0.27 | 0.45 | 0.00 | 62.5 |
| p53 expression | 0.70 | 0.47 | 0.71 | 0.46 | 0.00 | 62.53 |
| Oral contraceptives | 0.30 | 0.47 | 0.29 | 0.46 | 0.00 | 61.1 |
Equivalent signatures for lymph node metastasis prediction (accuracies 84.7 % and 83.3 %)

| Variable | Signatures including it (of 4) |
|---|---|
| Vascular invasion | 4 |
| Tumor Size | 4 |
| Col11A1 score | 4 |
| Perineural Inv. | 3 |
| Necrosis | 3 |
| Age at diagnosis | 2 |
| Ki67 expression | 2 |
| AR expression | 2 |
| p53 expression | 2 |
| Family History Cancer | 1 |
| Alcohol consumption | 1 |

| | Signature 1 | Signature 2 | Signature 3 | Signature 4 |
|---|---|---|---|---|
| Classifier's stability (%): med | 83.3 | 80.6 | 77.8 | 77.8 |
| Classifier's stability (%): mean | 80.6 | 80.4 | 79.3 | 79.5 |
| Classifier's stability (%): IQR | 7.6 | 5.6 | 11.1 | 11.1 |
| Classifier's stability (%): std | 5.6 | 7.1 | 7.4 | 7.9 |
| ROC analysis (%): Sensitivity | 78 | 81 | 78 | 81 |
| ROC analysis (%): Specificity | 89 | 84 | 87 | 84 |
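The accuracy, sensitivity and specificity reported for the lymph node metastasis classifier can be cross-checked against the error counts given in the text (72 samples, 27 of them in the positive group, 5 FP and 6 FN). The short sketch below does this arithmetic; it assumes that the 27 samples form the metastasis-positive class, and the function name `classification_metrics` is illustrative.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity (in %) from a confusion matrix."""
    total = tp + tn + fp + fn
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),  # true positive rate
        "specificity": 100.0 * tn / (tn + fp),  # true negative rate
    }


n_samples = 72
n_positive = 27   # assumed: samples with lymph node metastasis
fp, fn = 5, 6     # misclassified samples reported in the text

tp = n_positive - fn                  # positives correctly predicted
tn = (n_samples - n_positive) - fp    # negatives correctly predicted

metrics = classification_metrics(tp, tn, fp, fn)
for name, value in metrics.items():
    print(f"{name}: {value:.1f} %")
```

Under that assumption the counts are 21 TP and 40 TN, and the resulting percentages round to the reported 84.7 % accuracy, 78 % sensitivity and 89 % specificity.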
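Optimum Signature

| Variable | TP med | TP mean | TP IQR | TP std | TN med | TN mean | TN IQR | TN std | FP med | FP mean | FP IQR | FP std | FN med | FN mean | FN IQR | FN std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Vascular invasion | 1.00 | 0.57 | 1.00 | 0.50 | 0.00 | 0.07 | 0.00 | 0.26 | 0.00 | 0.20 | 0.25 | 0.44 | 0.00 | 1.17 | 0.00 | 0.40 |
| Tumor size | 3.00 | 2.85 | 0.97 | 1.35 | 1.50 | 1.89 | 1.55 | 1.30 | 1.50 | 2.22 | 1.50 | 1.90 | 2.15 | 2.35 | 0.60 | 1.09 |
| Perineural invasion | 0.00 | 0.28 | 1.00 | 0.46 | 0.00 | 0.05 | 0.00 | 0.22 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Family history | 1.00 | 0.95 | 0.00 | 0.21 | 1.00 | 0.80 | 0.00 | 0.40 | 1.00 | 1.00 | 0.00 | 0.00 | 1.00 | 0.67 | 1.00 | 0.51 |
| Age at diagnosis | 55.00 | 58.29 | 24.00 | 16.03 | 58.50 | 59.50 | 21.50 | 13.60 | 67.00 | 68.20 | 16.00 | 9.36 | 57.50 | 59.50 | 12.00 | 9.28 |
| Ki67 expression | 3.00 | 2.57 | 1.00 | 0.75 | 3.00 | 2.55 | 1.00 | 0.59 | 2.00 | 2.20 | 1.25 | 0.84 | 2.50 | 2.50 | 1.00 | 0.54 |
| Col11A1 score | 1.00 | 2.05 | 4.50 | 2.48 | 1.00 | 1.87 | 4.00 | 2.15 | 1.00 | 2.60 | 3.50 | 2.30 | 2.00 | 2.00 | 2.00 | 2.19 |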
Discussion
Regarding the most discriminatory prognostic variables of the histological grade, it is interesting to note that women in the HG2 group did not have any Oral contraceptives intake. Population studies exploring associations between oral contraceptive use and cancer risk have shown that the risks of endometrial and ovarian cancer appear to be reduced with the use of oral contraceptives, whereas the risks of breast, cervical, and liver cancer appear to be increased.

The best prediction of the HG (disregarding the Nottingham grading system) was achieved by a list of only 6 prognostic variables: Ki67 expression, Oral contraceptives, Col11A1 score, E-cad truncated, Tumor Size, and Col11A1 expression, with a very stable accuracy (93.8%), sensitivity (97.0%) and specificity (76.0%). Once again, the importance of Oral contraceptives in the HG prediction is highlighted. All these variables are crucial for breast cancer diagnosis and treatment.

The correlation network shows two main branches connecting Ki67 expression to Tumor size and AR expression, both with low correlation coefficients. Two branches start from AR through CK14 expression and E-cad truncated, both weakly correlated to the AR node with negative coefficients. In the tumor size branch, all the variables seem to be related to habits and clinical features: Age at diagnosis, Menopause, Tobacco smoking, Oral contraceptives, etc. The low correlation among all these variables implies that they should be considered as independent prognostic factors. This graphic also confirms the strong correlation between the three representations of the Col11A1 protein. The role of the Androgen Receptor in breast cancer has been reviewed by [...]

In the case of lymph node metastasis, the most important variables are Vascular invasion, Tumor size, Perineural invasion, Family history, Age at diagnosis, Ki67 expression and Col11A1 score, with a high predictive accuracy (84.7%), sensitivity (78.0%) and specificity (89.0%).
The samples presenting metastasis show a higher rate of positive Vascular invasion (mean 0.48 vs 0.09 in the non-metastasis group), a higher mean Tumor size of 2.74 cm (vs 1.92 cm), more frequent Perineural invasion (0.22 vs 0.04), a higher age at first child (25.78 vs 24.62) and higher CK14 and CK5/6 expressions. The analysis of the equivalent networks with accuracies higher than 83% shows high stability and good diagnostic ability. All these signatures share Vascular invasion and Tumor Size as leading prognostic variables. Likewise, Col11A1 score, Perineural invasion and/or Necrosis also appear in these networks.

The ROC analysis established Vascular invasion and Tumor size as the main differences between the true positive (TP) and true negative (TN) groups, and also showed the existence of a group of TNBC cancers without Vascular or Perineural invasion that nevertheless presents lymph node metastasis (FN group). This kind of cancer has a lower median Tumor size (around 2.15 cm) than the TP group (3.00 cm), and a median Col11A1 score of 2. This knowledge is very important to improve the prediction of lymph node metastasis at diagnosis.

The correlation network shows one main branch starting from Vascular invasion and linking to Alcohol Consumption and other personal habits (Tobacco consumption) and clinical features (Age at First Child, and Tumor Size). Again, the correlation coefficients among these variables are very low. Interestingly, the immuno-histochemical variables appear at the base of the tree, indicating their lower importance in the metastasis prediction.

Finally, an interesting remark is that the HG and lymph node metastasis predictions share Tumor size, Ki67 expression, and Col11A1 score as highly discriminatory prognostic variables, confirming a certain link between both problems. Besides, Col11A1 score has a much higher predictive power than the other two representations of this protein.
The relationships with vascular and perineural invasion, as well as with tumor size and Ki67 expression, are not surprising, but this analysis reveals novel relationships with the expression of the Col11A1 protein and with the patient's age.
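How the correlation networks were built is not described in this section. One plausible construction, given here purely as an assumption, is a spanning tree grown from the most discriminatory variable by repeatedly attaching the unconnected variable with the strongest absolute Pearson correlation to the tree (Prim's algorithm). The sketch below illustrates that idea; the data values are hypothetical and `correlation_tree` is an illustrative name.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant variable carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def correlation_tree(columns, root):
    """Grow a tree from `root`, always adding the strongest |r| edge (Prim)."""
    in_tree = [root]
    remaining = [name for name in columns if name != root]
    edges = []
    while remaining:
        parent, child, r = max(
            ((p, c, pearson(columns[p], columns[c]))
             for p in in_tree for c in remaining),
            key=lambda edge: abs(edge[2]),
        )
        edges.append((parent, child, r))
        in_tree.append(child)
        remaining.remove(child)
    return edges


# Hypothetical values for eight patients (not taken from the study).
columns = {
    "Vascular invasion": [1, 0, 1, 0, 1, 0, 0, 1],
    "Tumor Size": [3.0, 1.5, 2.8, 1.2, 2.0, 1.9, 1.4, 3.2],
    "Alcohol consumption": [1, 0, 1, 0, 0, 0, 0, 1],
    "Ki67 expression": [3, 3, 2, 3, 2, 3, 2, 3],
}

edges = correlation_tree(columns, root="Vascular invasion")
for parent, child, r in edges:
    print(f"{parent} -> {child}: r = {r:+.2f}")
```

In such a tree, variables that attach late and with small coefficients end up far from the root, which would match the reading that weakly correlated variables act as largely independent prognostic factors.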
Conclusion
This study was dedicated to the prediction of the HG and of lymph node metastasis, which is crucial for developing more suitable treatment strategies. As a result, we present the main clinical and pathological variables and their correlation networks for both prediction problems, obtained via novel machine learning techniques. These variables are currently used for prognosis and treatment in medical practice. HG was predicted with an accuracy of 93.8% using a list of 6 prognostic variables with significant implications: Ki67 expression, use of Oral contraceptives, Col11A1 expression, Col11A1 score, E-cad truncated and Tumor size. Lymph node metastasis was predicted with an accuracy of almost 85% using only 6 prognostic variables: Vascular invasion, Tumor size, Perineural invasion, Age at diagnosis, Ki67 expression, and Col11A1 score. This analysis also served to establish the median signatures of the groups with and without lymph node metastasis, and revealed a class of small tumors (around 2.15 cm) with lymph node metastasis that show neither vascular nor perineural invasion and have a higher Col11A1 protein score. Besides, these signatures proved to be very stable. The additional information conveyed by the prognostic variables found in these two classification problems provides new insight into the genesis and progression of this disease and can be used in medical practice to improve decisions in patient diagnosis and further treatment. We expect that the conclusions of this analysis will help improve the understanding, diagnosis and prognosis of this important type of heterogeneous cancer.
This methodology could also be used to predict treatment response when this kind of information is available, as we have shown in the case of Hodgkin Lymphoma [...]

TNBC, Triple Negative Breast Cancer; HG, histological grade; ER, Estrogen Receptors; PR, Progesterone Receptors; HER2, Human Epidermal Growth factor 2 receptors; AR, androgen receptor; EMT, epithelial–mesenchymal transition; MC, Mitotic Count; Necr, necrosis; NP, Nuclear Pleomorphism; PI, Perineural invasion; TF, Tubular formation; TS, Tumor size; VI, Vascular invasion; HUCA, Hospital Universitario Central de Asturias; TP, true positive; TN, true negative; IQR, interquartile range; LOOCV, Leave-One-Out Cross-Validation; ROC, Receiver Operating Characteristic; FR, Fisher's ratio.