How is your model doing?


A quick glance at your most important metrics.

Last 0 Evaluations

PPV: 0.14 (minimum threshold: 0.7)
The proportion of correctly predicted positive instances among all instances predicted as positive. Also known as precision.

NPV: 0.93 (minimum threshold: 0.7)
The proportion of correctly predicted negative instances among all instances predicted as negative.

Sensitivity: 0.82 (minimum threshold: 0.7)
The proportion of actual positive instances that are correctly predicted. Also known as recall or true positive rate.

Specificity: 0.32 (minimum threshold: 0.7)
The proportion of actual negative instances that are correctly predicted.
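Each of these four metrics is a ratio of confusion-matrix counts. A minimal sketch of the definitions (the counts below are hypothetical placeholders, not values from this report's evaluation):

```python
# Confusion-matrix metrics: tp/fp/tn/fn are counts of true positives,
# false positives, true negatives, and false negatives.

def ppv(tp, fp):
    """Positive predictive value (precision): tp / (tp + fp)."""
    return tp / (tp + fp)

def npv(tn, fn):
    """Negative predictive value: tn / (tn + fn)."""
    return tn / (tn + fn)

def sensitivity(tp, fn):
    """Recall / true positive rate: tp / (tp + fn)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: tn / (tn + fp)."""
    return tn / (tn + fp)

# Hypothetical counts, chosen only to illustrate the formulas.
tp, fp, tn, fn = 14, 86, 93, 7
print(round(ppv(tp, fp), 2))  # 0.14
print(round(npv(tn, fn), 2))  # 0.93
```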


How is your model doing over time?


See how your model is performing over several metrics and subgroups over time.

Multi-plot Selection:

Metrics

  • Moving average: the moving average of all data points.
  • Standard deviation: a measure of how dispersed the data points are in relation to the mean.
  • PPV: the proportion of correctly predicted positive instances among all instances predicted as positive. Also known as precision.
  • NPV: the proportion of correctly predicted negative instances among all instances predicted as negative.
  • Sensitivity: the proportion of actual positive instances that are correctly predicted. Also known as recall or true positive rate.
  • Specificity: the proportion of actual negative instances that are correctly predicted.
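The first two options summarize how a metric behaves over successive evaluations. A minimal sketch of the assumed semantics, using only the standard library (the scores below are hypothetical, not taken from this report):

```python
from statistics import mean, stdev

def moving_average(values, window=3):
    """Average of each sliding window of `window` consecutive points."""
    return [mean(values[i:i + window])
            for i in range(len(values) - window + 1)]

# Hypothetical PPV scores over five evaluations.
scores = [0.70, 0.72, 0.68, 0.74, 0.71]
print([round(v, 3) for v in moving_average(scores)])  # [0.7, 0.713, 0.71]
print(round(stdev(scores), 3))  # dispersion of the points around the mean
```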

Subgroups

  • Patient Age
  • Pathology
  • Patient Gender

Datasets


Graphics

Quantitative Analysis


PPV: 0.14 (minimum threshold: 0.7)
The proportion of correctly predicted positive instances among all instances predicted as positive. Also known as precision.

NPV: 0.93 (minimum threshold: 0.7)
The proportion of correctly predicted negative instances among all instances predicted as negative.

Sensitivity: 0.82 (minimum threshold: 0.7)
The proportion of actual positive instances that are correctly predicted. Also known as recall or true positive rate.

Specificity: 0.32 (minimum threshold: 0.7)
The proportion of actual negative instances that are correctly predicted.
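The report flags each metric against its minimum threshold of 0.7. A minimal sketch of that pass/fail comparison, using the metric values from the cards above:

```python
# Metric values as reported above, all sharing a minimum threshold of 0.7.
metrics = {"PPV": 0.14, "NPV": 0.93, "Sensitivity": 0.82, "Specificity": 0.32}
THRESHOLD = 0.7

# Any metric below its minimum threshold should be flagged for review.
failing = sorted(name for name, value in metrics.items() if value < THRESHOLD)
print(failing)  # ['PPV', 'Specificity']
```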

Model Details


Description

This model is a DenseNet121 model trained on the NIH Chest X-Ray dataset, which contains 112,120 frontal-view X-ray images of 30,805 unique patients with the fourteen text-mined disease labels from the associated radiological reports. The labels are Atelectasis, Cardiomegaly, Effusion, Infiltration, Mass, Nodule, Pneumonia, Pneumothorax, Consolidation, Edema, Emphysema, Fibrosis, Pleural Thickening, and Hernia. The model was trained on 80% of the data and evaluated on the remaining 20%.

Owners

  • Name: Machine Learning and Medicine Lab
    Contact: mlmed.org
    Email: joseph@josephpcohen.com

Citations

  • @inproceedings{Cohen2022xrv,
      title = {{TorchXRayVision: A library of chest X-ray datasets and models}},
      author = {Cohen, Joseph Paul and Viviano, Joseph D. and Bertin, Paul and Morrison, Paul and Torabian, Parsa and Guarrera, Matteo and Lungren, Matthew P and Chaudhari, Akshay and Brooks, Rupert and Hashir, Mohammad and Bertrand, Hadrien},
      booktitle = {Medical Imaging with Deep Learning},
      url = {https://github.com/mlmed/torchxrayvision},
      arxivId = {2111.00595},
      year = {2022}
    }
  • @inproceedings{cohen2020limits,
      title = {On the limits of cross-domain generalization in automated X-ray prediction},
      author = {Cohen, Joseph Paul and Hashir, Mohammad and Brooks, Rupert and Bertrand, Hadrien},
      booktitle = {Medical Imaging with Deep Learning},
      year = {2020},
      url = {https://arxiv.org/abs/2002.02497}
    }

Name

NIH Chest X-Ray Multi-label Classification Model

Considerations


Users

  • Radiologists
  • Data Scientists

Use Cases

  • The model can be used to predict the presence of 14 pathologies in chest X-ray images.
    Kind: primary

Fairness Assessment

  • Affected Group: Patients with rare pathologies
    Benefits: The model can help radiologists to detect pathologies in chest X-ray images.
    Harms: The model may not generalize well to populations that are not well-represented in the training data.
    Mitigation: Ensure that the training data is diverse and representative of the population.

Ethical Considerations

  • Risk: One ethical risk of the model is that it may not generalize well to populations that are not well-represented in the training data, such as patients from different geographic regions or with different demographics.
    Mitigation: Ensure that the training data is diverse and representative of the population the model will be used on; regularly evaluate and update the model so that it continues to perform well on diverse populations; and use the model in conjunction with human expertise so that any biases or limitations are identified and addressed.

Limitations

  • The limitations of this model include its inability to detect pathologies that are not included in the 14 labels of the NIH Chest X-Ray dataset. Additionally, the model may not perform well on images that are of poor quality or that contain artifacts. Finally, the model may not generalize well to populations that are not well-represented in the training data, such as patients from different geographic regions or with different demographics.

Tradeoffs

  • The model can help radiologists to detect pathologies in chest X-ray images, but it may not generalize well to populations that are not well-represented in the training data.