Journal of Medical Informatics and Decision Making

Current Issue Volume No: 1 Issue No: 1

Research Article, Open Access, Peer Reviewed

Analysis Of Clinical Prognostic Variables For Triple Negative Breast Cancer Histological Grading And Lymph Node Metastasis

    1 Group of Inverse Problems, Optimization and Machine Learning. Department of Mathematics, Universidad de Oviedo, Oviedo, Asturias, Spain 

    2 Department of Informatics, Universidad de Oviedo, Oviedo, Asturias, Spain 

    3 Institut fur Pathologie. University of Bern, Switzerland 

    4 Servicio de Anatomia Patologica, Hospital Universitario de Asturias, Oviedo, Asturias 

    Abstract

    Background:

    Triple Negative Breast Cancer (TNBC) is a type of breast cancer with a very poor prognosis. Predicting the histological grade (HG) and lymph node metastasis is crucial for developing more suitable treatment strategies.

    Methods:

    We present the main clinical and pathological variables to predict the histological grade and lymph node metastasis via novel machine learning techniques. These variables are currently used for prognosis and treatment in medical practice. The analysis was performed on a database of 102 Caucasian women diagnosed with TNBC. The results were cross-validated using random simulations of this dataset.

    Results:

    HG was predicted with an accuracy of 93.8% using a list of 6 prognostic variables with significant implications: Ki67 expression, use of Oral contraceptives, Col11A1 expression, Col11A1 score, E-cad truncated and Tumor size. Lymph node metastasis was predicted with an accuracy of almost 85% using only 7 prognostic variables: Vascular invasion, Tumor size, Perineural invasion, Family history, Age at diagnosis, Ki67 expression, and Col11A1 score. This analysis also served to establish the median signatures of the groups with and without lymph node metastasis, and revealed the existence of a kind of small-size tumor (around 2.15 cm) with lymph node metastasis but showing no vascular or perineural invasion and a higher Col11A1 protein score. Besides, these signatures proved to be very stable.

    Conclusions:

    The additional information conveyed by the prognostic variables found in these two classification problems provides new insight about the genesis and progression of this disease and can be used in medical practice to improve decisions in patient diagnosis and further treatment.

    Author Contributions
    Received Nov 21, 2018     Accepted Dec 04, 2018     Published Dec 13, 2018

    Copyright© 2018 Cernea Ana, et al.
    License
    Creative Commons License   This work is licensed under a Creative Commons Attribution 4.0 International License. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

    Competing interests

    The authors have declared that no competing interests exist.

    Funding Interests:

    Citation:

    Cernea Ana, Luis Fernandez-Martinez Juan, J. deAndres-Galiana Enrique, A. Galvan Jose, Garcia Pravia Carmen et al. (2018) Analysis Of Clinical Prognostic Variables For Triple Negative Breast Cancer Histological Grading And Lymph Node Metastasis. Journal of Medical Informatics and Decision Making 1(1):14-36
    DOI 10.14302/issn.2641-5526.jmid-18-2488

    Results

    Histological Grade Prediction

    The aim of this analysis is to establish the discriminatory power of the immunohistochemical, pathological and clinical variables for HG prediction. For that purpose, we did not use any of the three pathological variables involved in the Scarff-Bloom-Richardson definition: Mitotic count, Nuclear pleomorphism and Tubule formation. This analysis established the optimum networks of variables for HG prediction, and showed how the clinical and pathological variables influence disease development, particularly the patients’ daily habits (oral contraceptive intake, tobacco smoking and alcohol consumption). The histological grade was available for 96 TNBC samples: 21 samples in HG2 and 75 samples in HG3.

    The variables used in this classification problem are presented in Table 4, ranked in decreasing order of discriminatory power as given by their Fisher’s ratios. The maximum Fisher’s ratio (FR) is 1.28 and corresponds to Ki67 expression, followed by AR expression with a Fisher’s ratio of 1.03, and Oral contraceptives with 0.50. The rest of the variables have a lower FR and can only expand high-frequency details of the classification problem [37]. Using only the most discriminatory variable (Ki67 expression), we obtained a LOOCV predictive accuracy of 72.9%. The accuracy increases to 81.2% when the second most discriminatory variable (AR expression) is added, and to 85.4% when Oral contraceptives is added. The maximum accuracy (90.6%) is obtained using the list containing the first 8 prognostic variables, which is the minimum-size list in this case. The table also shows the mean and standard deviation of each variable within each class (HG2 and HG3) and the LOOCV predictive accuracies of the corresponding ranked lists of prognostic variables, as explained in the description of the machine learning algorithm. Fisher’s ratio can be interpreted as the prior discriminatory power of the variables considered individually, while the LOOCV accuracy is the posterior discriminatory power of these variables working in synergy.

    Table 4. Histological grade (HG) prediction: ranked list of prognostic variables according to their Fisher’s ratio (FR). Means and standard deviations are given for the two classes of histological grade, HG2 and HG3. Accuracy is the LOOCV predictive accuracy of the ranked list ending at that variable.
    Variable | Mean HG2 | Std HG2 | Mean HG3 | Std HG3 | FR | Accuracy (%)
    Ki67 expression | 1.67 | 0.80 | 2.71 | 0.46 | 1.28 | 72.9
    AR expression | 0.76 | 0.44 | 0.17 | 0.38 | 1.03 | 81.2
    Oral contraceptives | 0.00 | 0.00 | 0.33 | 0.47 | 0.50 | 85.4
    Bcl2 expression | 0.29 | 0.64 | 0.80 | 0.77 | 0.26 | 84.4
    CK14 expression | 0.24 | 0.54 | 0.72 | 0.78 | 0.26 | 82.3
    Col11A1 score | 1.33 | 1.71 | 2.73 | 2.50 | 0.21 | 84.4
    Col11A1 intensity | 0.67 | 0.73 | 1.16 | 0.84 | 0.20 | 84.4
    E-cad truncated | 0.14 | 0.36 | 0.41 | 0.50 | 0.20 | 90.6
    Age at diagnosis | 66.57 | 13.80 | 57.69 | 14.64 | 0.19 | 79.2
    Tumor Size | 1.65 | 0.92 | 2.32 | 1.34 | 0.17 | 81.3
    Col11A1 expression | 1.00 | 1.10 | 1.56 | 1.21 | 0.12 | 80.2
    Lactation | 0.95 | 0.22 | 0.80 | 0.40 | 0.11 | 79.2
    Necrosis | 1.00 | 0.84 | 1.37 | 0.78 | 0.11 | 80.2
    Pregnancies | 2.29 | 1.42 | 1.71 | 1.10 | 0.10 | 78.1
    Tobacco Smoking | 0.19 | 0.40 | 0.36 | 0.48 | 0.07 | 78.1
    Perineural invasion | 0.05 | 0.22 | 0.13 | 0.34 | 0.04 | 78.1
    Age at Menarche | 12.90 | 1.26 | 12.53 | 1.47 | 0.04 | 76.0
    Vascular invasion | 0.14 | 0.36 | 0.23 | 0.42 | 0.02 | 77.1
    Family History (BOE) | 0.71 | 0.46 | 0.61 | 0.49 | 0.02 | 78.1
    CK5/6 expression | 0.81 | 0.75 | 0.95 | 0.82 | 0.01 | 79.2
    N | 0.24 | 0.44 | 0.31 | 0.49 | 0.01 | 77.1
    Alcohol consumption | 0.10 | 0.30 | 0.12 | 0.33 | <0.01 | 77.1
    Age First Child | 25.10 | 3.11 | 24.95 | 3.39 | <0.01 | 76.0
    Menopause | 0.95 | 0.22 | 0.95 | 0.23 | <0.01 | 76.0
    p53 expression | 0.71 | 0.46 | 0.72 | 0.45 | <0.01 | 77.0
    Family History (Cancer) | 0.81 | 0.40 | 0.81 | 0.39 | <0.01 | 75.0
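
    As a rough check on the ranking, the Fisher's ratios in the table can be recomputed from the class means and standard deviations. The sketch below assumes the common two-class definition, FR = (mean1 - mean2)^2 / (std1^2 + std2^2); this formula is an assumption that is not stated in this section, although it reproduces the tabulated FR values for the first three variables up to rounding. The function name `fisher_ratio` is illustrative.

```python
def fisher_ratio(mean_1, std_1, mean_2, std_2):
    """Two-class Fisher's ratio of a single variable (assumed definition)."""
    return (mean_1 - mean_2) ** 2 / (std_1 ** 2 + std_2 ** 2)


# (mean HG2, std HG2, mean HG3, std HG3), copied from the table above.
table_rows = {
    "Ki67 expression": (1.67, 0.80, 2.71, 0.46),
    "AR expression": (0.76, 0.44, 0.17, 0.38),
    "Oral contraceptives": (0.00, 0.00, 0.33, 0.47),
}

# Compute the ratio for each variable and rank in decreasing order.
ratios = {name: fisher_ratio(*stats) for name, stats in table_rows.items()}
ranked = sorted(ratios, key=ratios.get, reverse=True)

for name in ranked:
    print(f"{name}: FR = {ratios[name]:.2f}")
```

    Small differences from the published values are expected, because the inputs here are already rounded to two decimals.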

    Table 5 shows the optimum classifier found by the random sampler, with an accuracy of 93.8% using a list of only 6 prognostic variables: Ki67 expression, Oral contraceptives, Col11A1 score, E-cad truncated, Tumor Size, and Col11A1 expression. It also shows other networks of highly discriminatory prognostic variables with a LOOCV predictive accuracy higher than 92%, together with their corresponding stability and ROC analyses. These highly predictive classifiers are very stable, with median accuracies of 91.7%, slightly lower mean accuracies, a low inter-quartile range (8.3 for the optimum classifier) and a low standard deviation (5.5) of the predictive accuracy. The ROC analysis of the optimum classifier shows a very high sensitivity (97%) and a specificity of 76%.

    Table 5. HG prediction. Optimum classifier and other highly discriminatory networks with LOOCV predictive accuracies higher than 92%, with their corresponding stability and ROC analysis.
    Accuracy | 93.8% | 92.7% | 92.7%
    Variables | Ki67 expression | Ki67 expression | Ki67 expression
     | Oral contraceptives | Oral contraceptives | Oral contraceptives
     | Col11A1 score | Age at diagnosis | E-cad truncated
     | E-cad truncated | Tumor Size | Tumor Size
     | Tumor Size | Perineural Inv. | Col11A1 expression
     | Col11A1 expression | p53 expression |
    Classifier's stability (%)
    Median | 91.7 | 91.7 | 91.7
    Mean | 91.6 | 90.2 | 89.7
    IQR | 8.3 | 8.3 | 4.2
    Std | 5.5 | 5.7 | 5.6
    ROC analysis (%)
    Sensitivity | 97 | 96 | 96
    Specificity | 76 | 81 | 76
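
    To make the notion of a LOOCV predictive accuracy for a list of variables concrete, the following is a minimal sketch. It assumes a nearest-centroid classifier with Euclidean distance, which is an illustrative choice and not necessarily the classifier used in this study. The toy data and the names `nearest_centroid_predict` and `loocv_accuracy` are invented for the example.

```python
import math


def nearest_centroid_predict(train_x, train_y, sample):
    """Assign `sample` to the class whose mean vector is closest (Euclidean)."""
    best_label, best_dist = None, math.inf
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = math.dist(centroid, sample)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(data, labels, columns):
    """Leave-one-out accuracy (%) using only the selected variable columns."""
    reduced = [[row[c] for c in columns] for row in data]
    hits = 0
    for i, (sample, truth) in enumerate(zip(reduced, labels)):
        # Train on every sample except the i-th, then predict the i-th.
        train_x = reduced[:i] + reduced[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += nearest_centroid_predict(train_x, train_y, sample) == truth
    return 100.0 * hits / len(labels)


# Toy data (invented): column 0 separates the classes, column 1 is mostly noise.
toy_data = [[1.0, 5.0], [1.2, 1.0], [0.8, 3.0], [3.0, 4.0], [3.1, 1.5], [2.9, 2.5]]
toy_labels = ["HG2", "HG2", "HG2", "HG3", "HG3", "HG3"]

# Accuracy of the ranked lists of length 1 and 2.
for k in (1, 2):
    print(k, loocv_accuracy(toy_data, toy_labels, list(range(k))))
```

    In this toy example, adding the noisy second variable lowers the accuracy. This mirrors the way accuracies in the ranked lists do not grow monotonically with list length, and it is why a smaller, well-chosen list can outperform a longer one.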

    Besides, we provide a simple linear regression formula to perform a fast and useful estimation of the histological grading:

    This regression formula has a low RMS error of 0.2; that is, estimated histological grades lower than 2.3 almost surely belong to HG2. This method complements the HG assessment provided by the Nottingham grading system in appraising this important decision problem concerning patient treatment and prognosis.

    Table 6 shows the main statistical results (median, mean, interquartile range, and standard deviation) of each predictive variables of the optimum classifier, calculated in the different groups of the confusion matrix (TP, TN, FP, and FN). The confusion matrix corresponding to the optimum classifier is

    Table 6. HG prediction. Median, mean, IQR, and standard deviation of the signatures of the most predictive variables in the different groups of the confusion matrix (TP, TN, FP and FN).
    Optimum signature | TP (med, mean, IQR, std) | TN (med, mean, IQR, std) | FP (med, mean, IQR, std) | FN (med, mean, IQR, std)
    Ki67 expression | 3.00, 2.71, 1.00, 0.46 | 1.00, 1.44, 1.00, 0.72 | 2.00, 2.40, 1.00, 0.55 | 2.50, 2.50, 1.00, 0.70
    Oral contraceptives | 0.00, 0.34, 1.00, 0.48 | 0.00, 0.00, 0.00, 0.00 | 0.00, 0.00, 0.00, 0.00 | 0.00, 0.00, 0.00, 0.00
    Col11A1 score | 2.00, 2.77, 6.00, 2.51 | 0.50, 1.12, 2.00, 1.41 | 1.00, 2.00, 3.75, 2.55 | 1.50, 1.50, 3.00, 2.12
    E-cad truncated | 0.00, 0.42, 1.00, 0.50 | 0.00, 0.06, 0.00, 0.25 | 0.00, 0.40, 1.00, 0.55 | 0.00, 0.00, 0.00, 0.00
    Tumor size | 2.10, 2.29, 1.60, 1.34 | 1.50, 1.68, 1.60, 0.95 | 1.00, 1.54, 1.07, 0.92 | 3.35, 3.35, 0.30, 0.21
    Col11A1 expression | 2.00, 1.56, 3.00, 1.20 | 0.50, 0.88, 2.00, 0.95 | 1.00, 1.40, 3.00, 1.52 | 1.50, 1.50, 3.00, 2.12
    Compared with the TN group (HG2 correctly predicted), the samples of the TP group (HG3 correctly predicted) present a higher median Ki67 expression (3.0 vs 1.0), higher median Col11A1 score and Col11A1 expression (2.0 vs 0.5) and a larger median tumor size (2.10 cm vs 1.50 cm). Besides, no sample in the TN group had any Oral contraceptives intake. On the other hand, the main differences between FP (HG2 samples incorrectly assigned to the HG3 class) and TP are: lower Ki67 expression (2.0 vs 3.0), no contraceptive intake in the FP group, lower Col11A1 score and expression (1.0 vs 2.0) and smaller Tumor size (1.0 cm vs 2.10 cm). Finally, the comparison between FN (HG3 samples incorrectly assigned to the HG2 class) and TN shows higher Ki67 expression (2.5 vs 1.0), higher Col11A1 score and expression (1.5 vs 0.5), and a much larger tumor size (3.35 cm vs 1.50 cm) in the FN group.

    Figure 2 shows the correlation network for the HG prediction problem and serves to provide the relationships between the most discriminatory variables.

    Figure 2. Histological grade prediction. Correlation network among the most discriminatory prognostic variables.

    Lymph Nodes Metastasis Prediction

    This classification problem aims to predict the presence or absence of lymph node metastasis without making use of the HG variable or any of the pathological variables involved in the Nottingham score, thereby unraveling other available prognostic variables that could be linked to this important problem in TNBC prognosis. In this case, 72 samples were available, 27 of which had one or two affected lymph nodes. Table 7 shows the ranked list of prognostic variables used in the lymph node metastasis prediction problem. The maximum Fisher’s ratio is 0.45 and corresponds to Vascular invasion, followed by Tumor Size (0.19) and Perineural invasion (0.14), while the rest of the variables show a very low FR (close to zero). Because of these low Fisher’s ratios, a high predictive accuracy is expected to be harder to achieve in this classification problem. The maximum accuracy (75%) is provided by Vascular invasion alone. The LOOCV accuracy is 73.6% for the list of the first seven most discriminatory variables (Vascular invasion, Tumor Size, Perineural invasion, Age First Child, CK14 expression, CK5/6 expression, and E-cad expression) and remains the same when Family history is added to the list.

    Table 7. Lymph node metastasis prediction: ranked list of prognostic variables according to their Fisher’s ratio (FR). C1 and C2 represent the two classes of metastasis prediction, C1: positive number of lymph nodes, C2: no lymph nodes.
    Variable | Mean C1 | Std C1 | Mean C2 | Std C2 | FR | Accuracy (%)
    Vascular invasion | 0.48 | 0.51 | 0.09 | 0.29 | 0.45 | 75.0
    Tumor Size | 2.74 | 1.30 | 1.92 | 1.36 | 0.19 | 66.7
    Perineural invasion | 0.22 | 0.42 | 0.04 | 0.21 | 0.14 | 70.8
    Age First Child | 25.78 | 4.40 | 24.62 | 3.02 | 0.05 | 72.2
    CK14 expression | 0.78 | 0.75 | 0.58 | 0.72 | 0.04 | 69.4
    CK5/6 expression | 1.04 | 0.85 | 0.84 | 0.82 | 0.03 | 72.2
    E-cad expression | 1.00 | 0.00 | 0.98 | 0.15 | 0.02 | 73.6
    Family History Cancer | 0.89 | 0.32 | 0.82 | 0.39 | 0.02 | 73.6
    Tobacco consumption | 0.37 | 0.49 | 0.29 | 0.46 | 0.01 | 68.1
    Necrosis | 1.26 | 0.90 | 1.40 | 0.75 | 0.01 | 70.8
    Pregnancies | 1.93 | 1.27 | 2.11 | 0.98 | 0.01 | 65.3
    Age at diagnosis | 58.56 | 14.65 | 60.47 | 13.42 | 0.01 | 65.3
    Bcl2 expression | 0.63 | 0.74 | 0.73 | 0.81 | 0.01 | 63.9
    Age at Menarche | 12.48 | 1.28 | 12.62 | 1.25 | 0.01 | 66.7
    Col11A1 intensity | 0.89 | 0.85 | 0.96 | 0.82 | 0.00 | 65.3
    Ki67 expression | 2.56 | 0.70 | 2.51 | 0.63 | 0.00 | 65.3
    Lactation | 0.89 | 0.32 | 0.87 | 0.34 | 0.00 | 65.3
    Col11A1 expression | 1.26 | 1.23 | 1.20 | 1.10 | 0.00 | 65.3
    Family History BEO | 0.67 | 0.48 | 0.69 | 0.47 | 0.00 | 65.3
    E-cad truncated | 0.33 | 0.48 | 0.31 | 0.47 | 0.00 | 65.3
    Menopause | 0.96 | 0.19 | 0.96 | 0.21 | 0.00 | 65.3
    Col11A1 score | 2.04 | 2.38 | 1.96 | 2.15 | 0.00 | 62.5
    Alcohol consumption | 0.16 | 0.36 | 0.16 | 0.37 | 0.00 | 62.5
    AR expression | 0.26 | 0.45 | 0.27 | 0.45 | 0.00 | 62.5
    p53 expression | 0.70 | 0.47 | 0.71 | 0.46 | 0.00 | 62.5
    Oral contraceptives | 0.30 | 0.47 | 0.29 | 0.46 | 0.00 | 61.1

    Table 8 presents the optimum classifier found by the random sampler, with an accuracy of 84.72% using a list of seven variables: Vascular invasion, Tumor Size, Perineural invasion, Family history, Age at diagnosis, Ki67 expression, and Col11A1 score. It also presents other networks of highly discriminatory prognostic variables with a LOOCV predictive accuracy higher than 83%. Their stability analysis shows median accuracies from 77.8% to 83.3%, mean accuracies from 79.3% to 80.6%, inter-quartile ranges from 5.6 to 11.1 and standard deviations from 5.6 to 7.9. In addition, the ROC rates indicate a good diagnostic ability for all the classifiers, with sensitivities between 78% and 81% and specificities between 84% and 89%.

    Table 8. Lymph node metastasis prediction. Optimum classifier and other highly discriminatory networks of prognostic variables with predictive accuracies greater than 83%, and their respective stability and ROC analysis.
    Accuracy | 84.7% | 83.3% | 83.3% | 83.3%
    Variables | Vascular Inv. | Vascular Inv. | Vascular Inv. | Vascular invasion
     | Tumor Size | Tumor Size | Tumor Size | Tumor Size
     | Perineural Inv. | Perineural Inv. | Perineural Inv. | Necrosis
     | Family History Cancer | Necrosis | Necrosis | Col11A1 score
     | Age at diagnosis | Age at diagnosis | Col11A1 score | Alcohol consumption
     | Ki67 expression | Ki67 expression | AR expression | AR expression
     | Col11A1 score | Col11A1 score | p53 expression | p53 expression
    Classifier's stability (%)
    Median | 83.3 | 80.6 | 77.8 | 77.8
    Mean | 80.6 | 80.4 | 79.3 | 79.5
    IQR | 7.6 | 5.6 | 11.1 | 11.1
    Std | 5.6 | 7.1 | 7.4 | 7.9
    ROC analysis (%)
    Sensitivity | 78 | 81 | 78 | 81
    Specificity | 89 | 84 | 87 | 84

    Table 9 shows the median, mean, interquartile range (IQR) and the standard deviation of the predictive variables of the optimum classifier in the different groups of the confusion matrix. The confusion matrix of the optimum classifier is:

    Table 9. Lymph node metastasis prediction. Median, mean, IQR, and standard deviation of the signatures of the most predictive variables in the different groups of the confusion matrix (TP, TN, FP and FN).
    Optimum signature | TP (med, mean, IQR, std) | TN (med, mean, IQR, std) | FP (med, mean, IQR, std) | FN (med, mean, IQR, std)
    Vascular invasion | 1.00, 0.57, 1.00, 0.50 | 0.00, 0.07, 0.00, 0.26 | 0.00, 0.20, 0.25, 0.44 | 0.00, 0.17, 0.00, 0.40
    Tumor size | 3.00, 2.85, 0.97, 1.35 | 1.50, 1.89, 1.55, 1.30 | 1.50, 2.22, 1.50, 1.90 | 2.15, 2.35, 0.60, 1.09
    Perineural invasion | 0.00, 0.28, 1.00, 0.46 | 0.00, 0.05, 0.00, 0.22 | 0.00, 0.00, 0.00, 0.00 | 0.00, 0.00, 0.00, 0.00
    Family history | 1.00, 0.95, 0.00, 0.21 | 1.00, 0.80, 0.00, 0.40 | 1.00, 1.00, 0.00, 0.00 | 1.00, 0.67, 1.00, 0.51
    Age at diagnosis | 55.00, 58.29, 24.00, 16.03 | 58.50, 59.50, 21.50, 13.60 | 67.00, 68.20, 16.00, 9.36 | 57.50, 59.50, 12.00, 9.28
    Ki67 expression | 3.00, 2.57, 1.00, 0.75 | 3.00, 2.55, 1.00, 0.59 | 2.00, 2.20, 1.25, 0.84 | 2.50, 2.50, 1.00, 0.54
    Col11A1 score | 1.00, 2.05, 4.50, 2.48 | 1.00, 1.87, 4.00, 2.15 | 1.00, 2.60, 3.50, 2.30 | 2.00, 2.00, 2.00, 2.19

    The classifier misclassified 11 samples: 5 were FP and the other 6 were FN. The three main differences between the TP and TN groups are a positive median Vascular invasion in the TP group, a higher median Tumor size of 3 cm (versus 1.5 cm in the TN group), and a lower median Age at diagnosis of 55 years in the TP group (versus 58.5 in the TN group). The main difference between the TP and FP groups is the Age at diagnosis, which is much higher in the FP group (67 years vs 55). Finally, Figure 3 shows the correlation network for the lymph node metastasis prediction problem, illustrating the relationships between the most discriminatory variables.
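
    The counts quoted in this section are enough to rebuild the confusion matrix of the optimum lymph node classifier and to check the reported rates. The short sketch below uses only figures stated in the text (72 samples, 27 with lymph node metastasis, 5 FP and 6 FN); the variable names are illustrative.

```python
# Figures stated in the text for the lymph node metastasis problem.
total, positives = 72, 27   # samples available, samples with metastasis
fp, fn = 5, 6               # misclassified samples (11 in total)

# Remaining cells of the confusion matrix follow by subtraction.
tp = positives - fn
tn = (total - positives) - fp

accuracy = 100.0 * (tp + tn) / total
sensitivity = 100.0 * tp / (tp + fn)
specificity = 100.0 * tn / (tn + fp)

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"accuracy={accuracy:.1f}% sensitivity={sensitivity:.1f}% specificity={specificity:.1f}%")
```

    The resulting values (TP = 21, TN = 40) agree with the reported accuracy of 84.7%, sensitivity of 78% and specificity of 89% after rounding.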

    Figure 3. Lymph node metastasis prediction. Correlation network among the most discriminatory prognostic variables.

    Discussion

    Regarding the most discriminatory prognostic variables of the histological grade, it is interesting to note that women in the HG2 group did not have any Oral contraceptives intake. Population studies exploring associations between oral contraceptive use and cancer risk have shown that the risks of endometrial and ovarian cancer appear to be reduced with the use of oral contraceptives, whereas the risks of breast, cervical, and liver cancer appear to be increased [30]. Compared with the HG3 group, patients in the HG2 group show higher Age at diagnosis, Lactation, and number of Pregnancies (an average of 2.3 children for women in the HG2 group vs 1.7 in the HG3 group); lower Tumor size and Tobacco smoking; and lower values of the immunohistochemical variables, except for AR (Androgen Receptor) expression. These results provide new insights into the clinical features and habits that might be associated with a better prognosis.

    The best prediction of the HG (disregarding the Nottingham grading system) was achieved by a list of only 6 prognostic variables: Ki67 expression, Oral contraceptives, Col11A1 score, E-cad truncated, Tumor Size, and Col11A1 expression, with a very stable accuracy (93.8%), sensitivity (97.0%) and specificity (76.0%). Once again, the importance of Oral contraceptives in HG prediction is highlighted. All these variables are crucial for breast cancer diagnosis and treatment [11, 12, 13, 14], but their combination has never been explored for HG assignment. The analysis of other equivalent networks has confirmed that Tumor size, Ki67 expression, Oral contraceptives, E-cad truncated, Col11A1 expression, p53 expression and Age at diagnosis are the most important prognostic variables in this prediction problem, and should be routinely monitored to support this important medical decision. The role of Ki67 expression as a prognostic marker in breast cancer has also been outlined in [39], a large cohort study concluding that it is associated with common histopathological parameters and is an additional independent prognostic factor for disease-free and overall survival. The relationships with the epithelial-mesenchymal transition (EMT), expressed by the presence of Col11A1 and truncated E-cadherin, and with oral contraceptive intake are the two main novelties of this analysis, since all the samples in the HG2 group had null Oral contraceptives intake. Obviously, these values only provide general trends due to the possible presence of behavioral outliers.

    The correlation network shows two main branches connecting Ki67 expression to Tumor size and AR expression, both with low correlation coefficients. Two branches start from AR through CK14 expression and E-cad truncated, both weakly correlated to the AR node with negative coefficients. In the tumor size branch, all the variables seem to be related to habits and clinical features: Age at diagnosis, Menopause, Tobacco smoking, Oral contraceptives, etc. The low correlation among all these variables implies that they should be considered as independent prognostic factors. This graph also confirms the strong correlation between the three representations of the Col11A1 protein. The role of the Androgen Receptor in breast cancer has been reviewed in [40], which concludes that AR expression might play a role during tumor progression. Although histologic grading has become widely accepted as a powerful indicator of prognosis in breast cancer, no relevant connections with other biomarkers have been established. In our opinion, this is one of the major findings of this research and will serve to improve current methods of prognosis.

    In the case of lymph node metastasis, the most important variables are Vascular invasion, Tumor size, Perineural invasion, Family history, Age at diagnosis, Ki67 expression and Col11A1 score, with a high predictive accuracy (84.7%), sensitivity (78.0%) and specificity (89.0%). Compared with the non-metastasis group, the samples presenting metastasis show more frequent Vascular invasion (mean 0.48 vs 0.09), a higher mean Tumor size of 2.74 cm (vs 1.92 cm), more frequent Perineural invasion (mean 0.22 vs 0.04), a higher age at first child (25.78 vs 24.62) and higher CK14 and CK5/6 expressions. The analysis of the equivalent networks with accuracies higher than 83% shows high stability and a good diagnostic ability. All these signatures share Vascular invasion and Tumor Size as leading prognostic variables. Likewise, Col11A1 score, Perineural invasion and/or Necrosis also appear in these networks. The ROC analysis established Vascular invasion and Tumor size as the main differences between the true positive (TP) and true negative (TN) groups, and also showed the existence of a group of TNBC cancers with lymph node metastasis whose median signature shows no Vascular or Perineural invasion (FN group). This kind of cancer has a lower median Tumor size (around 2.15 cm) than the TP group (3.0 cm), and a median Col11A1 score of 2. This knowledge is very important for improving the prediction of lymph node metastasis at diagnosis. The correlation network shows one main branch starting from Vascular invasion and linking to Alcohol consumption, other personal habits (Tobacco consumption) and clinical features (Age at First Child and Tumor Size). Again, the correlation coefficients among these variables are very low. Interestingly, the immunohistochemical variables appear at the base of the tree, indicating their lower importance in the metastasis prediction.

    Finally, it is worth noting that the HG and lymph node metastasis predictions share Tumor size, Ki67 expression, and Col11A1 score as highly discriminatory prognostic variables, which suggests a certain link between both problems. Besides, Col11A1 score has a much higher predictive power than the other two representations of this protein. The relationships with vascular and perineural invasion, as well as with tumor size and Ki67 expression, are not surprising, but this analysis provides novel relationships with the expression of the Col11A1 protein and also with the patient's age.

    Conclusions

    This study was dedicated to the prediction of the HG and of lymph node metastasis, which is crucial for developing more suitable treatment strategies. As a result, we present the main clinical and pathological variables and their correlation networks for both prediction problems, obtained via novel machine learning techniques. These variables are currently used for prognosis and treatment in medical practice. HG was predicted with an accuracy of 93.8% using a list of 6 prognostic variables with significant implications: Ki67 expression, use of Oral contraceptives, Col11A1 expression, Col11A1 score, E-cad truncated and Tumor size. Lymph node metastasis was predicted with an accuracy of almost 85% using only 7 prognostic variables: Vascular invasion, Tumor size, Perineural invasion, Family history, Age at diagnosis, Ki67 expression, and Col11A1 score. This analysis also served to establish the median signatures of the groups with and without lymph node metastasis, and revealed the existence of a kind of small-size tumor (around 2.15 cm) with lymph node metastasis but showing no vascular or perineural invasion and a higher Col11A1 protein score. Besides, these signatures proved to be very stable. The additional information conveyed by the prognostic variables found in these two classification problems provides new insight into the genesis and progression of this disease and can be used in medical practice to improve decisions in patient diagnosis and further treatment.

    We expect that the conclusions of this analysis will help to improve the understanding, diagnosis and prognosis of this important type of heterogeneous cancer. This methodology could also be used to predict treatment response when this kind of information is available, as we have shown in the case of Hodgkin Lymphoma [18].

    List of Abbreviations

    TNBC, Triple Negative Breast Cancer; HG, histological grade; ER, Estrogen Receptors; PR, Progesterone Receptors; HER2, Human Epidermal Growth Factor Receptor 2; AR, androgen receptor; EMT, epithelial–mesenchymal transition; MC, Mitotic Count; Necr, necrosis; NP, Nuclear Pleomorphism; PI, Perineural invasion; TF, Tubular formation; TS, Tumor size; VI, Vascular invasion; HUCA, Hospital Universitario Central de Asturias; TP, true positive; TN, true negative; FP, false positive; FN, false negative; IQR, interquartile range; LOOCV, Leave-One-Out Cross-Validation; ROC, Receiver Operating Characteristic; FR, Fisher’s ratio.

    Affiliations: