This page provides pre-trained taxonomic classifiers that can be used with the q2-feature-classifier plugin (Bokulich et al., 2018).
Classifiers are specific to versions of scikit-learn (Pedregosa et al., 2011), a dependency of q2-feature-classifier, and are therefore categorized below by the range of QIIME 2 versions that they work with.
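As a rough illustration of how the version ranges below relate to scikit-learn versions, the following Python sketch looks up which scikit-learn version the classifiers on this page were trained with for a given QIIME 2 release. The names `RANGES`, `parse_release`, and `sklearn_version_for` are hypothetical; the version numbers are transcribed from the section headings and "Sklearn Version" fields on this page, and "Present" is treated as an open-ended upper bound.

```python
from typing import Optional, Tuple

Release = Tuple[int, int]  # (year, month), e.g. (2024, 5) for QIIME 2 2024.5

# (first release, last release or None for "Present", scikit-learn version),
# copied from the headings and "Sklearn Version" fields on this page.
RANGES = [
    ((2024, 5), None, "1.4.2"),
    ((2021, 4), (2024, 2), "0.24.1"),
    ((2020, 6), (2021, 2), "0.23.1"),
    ((2020, 2), (2020, 2), "0.22.1"),
]


def parse_release(text: str) -> Release:
    """Turn a release string such as '2023.9' into a comparable (year, month) tuple."""
    year, month = text.strip().split(".")[:2]
    return int(year), int(month)


def sklearn_version_for(qiime2_release: str) -> Optional[str]:
    """Return the scikit-learn version listed for this release, or None if not covered."""
    release = parse_release(qiime2_release)
    for first, last, sklearn_version in RANGES:
        if release >= first and (last is None or release <= last):
            return sklearn_version
    return None


if __name__ == "__main__":
    for release in ("2024.10", "2023.9", "2020.11", "2019.10"):
        print(release, "->", sklearn_version_for(release))
```

A result of `None` means that this page lists no classifier for that release, so a classifier would need to be trained or obtained elsewhere.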
QIIME 2 2024.5 - Present
Naive Bayes Classifiers
Silva 138 99% OTUs full-length sequences
Download: Silva 138 99% OTUs full-length sequences
UUID: 70b4b5f4-8fce-40bd-b508-afacbc12a5ed
SHA256: c08a1aa4d56b449b511f7215543a43249ae9c54b57491428a7e5548a62613616
Sklearn Version: 1.4.2
Date Trained: 2024-05-30
Notes: Silva species taxonomy may be unreliable
Citations: Robeson et al. (2020), Bokulich et al. (2018), Silva
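Each entry on this page lists a SHA256 value that can be used to check that a download completed intact. A minimal sketch of that check in Python, using only the standard library, is shown here. The local filename is hypothetical and should be replaced with the path of the file you actually downloaded; the expected digest is the one listed for the Silva 138 full-length classifier above.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's digest to the published one, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Hypothetical local filename; substitute the path of your download.
    classifier = Path("silva-138-99-nb-classifier.qza")
    expected = "c08a1aa4d56b449b511f7215543a43249ae9c54b57491428a7e5548a62613616"
    if classifier.exists():
        ok = matches_checksum(classifier, expected)
        print("checksum OK" if ok else "checksum MISMATCH")
    else:
        print(f"{classifier} not found; download it first")
```

Reading in chunks keeps memory use small, which matters because these classifier files can be large.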
EXPERIMENTAL: diverse weighted Silva 138 99% OTUs full-length sequences
Download: diverse weighted Silva 138 99% OTUs full-length sequences
UUID: eff4efb9-d90d-43ce-acb3-53e04583323a
SHA256: decfae408061fab8ff2fec7dac1fe2a2e0041581589715062cc789bd4f9933db
Sklearn Version: 1.4.2
Date Trained: 2024-07-04
Notes: Silva species taxonomy may be unreliable. These classifiers were created with 14 diverse environments: “Sediment (non-saline)”, “Plant corpus”, “Animal secretion”, “Sediment (saline)”, “Animal surface”, “Surface (saline)”, “Plant rhizosphere”, “Soil (non-saline)”, “Animal distal gut”, “Water (saline)”, “Animal proximal gut”, “Water (non-saline)”, “Animal corpus”, “Plant surface”. More information can be found at: https://www.nature.com/articles/s41467-019-12669-6
Citations: Robeson et al. (2020), Bokulich et al. (2018), Kaehler et al. (2019), Silva
EXPERIMENTAL: human stool weighted Silva 138 99% OTUs full-length sequences
Download: experimental human stool weighted Silva 138 99% OTUs full-length sequences
UUID: 529fdee1-778e-4a1e-acd2-b1b78fcc0048
SHA256: db9e3c0105b1b9173deaa8bd828113b422c467443587cc8be3aed2e6f7cc995f
Sklearn Version: 1.4.2
Date Trained: 2024-05-30
Notes: Silva species taxonomy may be unreliable
Citations: Robeson et al. (2020), Bokulich et al. (2018), Kaehler et al. (2019), Silva
GTDB r220 full-length sequences
Download: GTDB r220 full-length sequences
UUID: 5d5461cc-6a51-434b-90ab-040f388e4221
SHA256: 07aadcf7472d9cc6f853f6b4615348619f1a3eceb56c1fb1b6d8dbb20554765f
Sklearn Version: 1.4.2
Date Trained: 2024-05-30
Citations: Parks et al. (2021), Parks et al. (2020), Parks et al. (2018), Rinke et al. (2021)
EXPERIMENTAL: diverse weighted GTDB r220 full-length sequences
Download: diverse weighted GTDB r220 full-length sequences
UUID: 5381b6fc-9f93-4844-8104-5263d5eae3f0
SHA256: 232e0360f5e12f5158f2891db8d50de7e9cc035e1ea13672d9f87582ce10ee0f
Sklearn Version: 1.4.2
Date Trained: 2024-07-09
Notes: These classifiers were created with 14 diverse environments: “Sediment (non-saline)”, “Plant corpus”, “Animal secretion”, “Sediment (saline)”, “Animal surface”, “Surface (saline)”, “Plant rhizosphere”, “Soil (non-saline)”, “Animal distal gut”, “Water (saline)”, “Animal proximal gut”, “Water (non-saline)”, “Animal corpus”, “Plant surface”. More information can be found at: https://www.nature.com/articles/s41467-019-12669-6
Citations: Parks et al. (2021), Parks et al. (2020), Parks et al. (2018), Rinke et al. (2021), Kaehler et al. (2019)
EXPERIMENTAL: human stool weighted GTDB r220 full-length sequences
Download: experimental human stool weighted GTDB r220 full-length sequences
UUID: 4410ca00-2484-49ef-bad5-039e82be10b9
SHA256: ec15dec8adc9f0bd45b315117df968a551651aef495a6079541a8bb29225d522
Sklearn Version: 1.4.2
Date Trained: 2024-05-30
Notes: GTDB human stool weights provide unreliable results for Bacteroides taxa
Citations: Parks et al. (2021), Parks et al. (2020), Parks et al. (2018), Rinke et al. (2021), Kaehler et al. (2019)
Greengenes2 2022.10 full length sequences
Download: Greengenes2 2022.10 full length sequences
UUID: 1df1eab1-2ee4-4d92-917e-ba1f6c937269
SHA256: 4f7ab05b57f85a76b12049c057c363759112fe17b7337f69f1ab2db0ec668024
Sklearn Version: 1.4.2
Date Trained: 2024-05-15
Forum Submitter: wasade
Notes: Greengenes2 has succeeded Greengenes 13_8
Citations: McDonald et al. (2023), Bokulich et al. (2018)
Greengenes2 2022.10 from 515F/806R region of sequences
Download: Greengenes2 2022.10 from 515F/806R region of sequences
UUID: c0a78f62-6d0f-4f4a-b9c1-ec34c4b4763b
SHA256: 5784f50a4ce8f004019ae7495e83c170aecdb4126aa13f0bb1b2e78d1f5cb024
Sklearn Version: 1.4.2
Date Trained: 2024-05-15
Forum Submitter: wasade
Notes: Greengenes2 has succeeded Greengenes 13_8
Citations: McDonald et al. (2023), Bokulich et al. (2018)
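The UUID listed with each classifier can also be compared against the downloaded file. The sketch below assumes the usual QIIME 2 archive layout, in which a `.qza` file is a zip archive whose members all sit under a single root directory named after the artifact's UUID. The function name `artifact_uuid` and the local filename are hypothetical.

```python
import zipfile
from pathlib import Path


def artifact_uuid(qza_path):
    """Return the single top-level directory name of a .qza archive.

    Assumption: the archive follows the usual QIIME 2 layout, where every
    member lives under one root directory named after the artifact's UUID.
    """
    with zipfile.ZipFile(qza_path) as archive:
        roots = {name.split("/", 1)[0] for name in archive.namelist() if name}
    if len(roots) != 1:
        raise ValueError(f"expected one root directory, found {sorted(roots)}")
    return roots.pop()


if __name__ == "__main__":
    # Hypothetical local filename; substitute the path of your download.
    classifier = Path("gg2-2022.10-515f-806r-nb-classifier.qza")
    listed_uuid = "c0a78f62-6d0f-4f4a-b9c1-ec34c4b4763b"
    if classifier.exists():
        found = artifact_uuid(classifier)
        print("UUID matches" if found == listed_uuid else f"UUID differs: {found}")
    else:
        print(f"{classifier} not found; download it first")
```

Within a QIIME 2 environment, `qiime tools peek` reports the same UUID without any custom code.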
QIIME 2 2021.4-2024.2
Naive Bayes Classifiers
Silva 138 99% OTUs full-length sequences
Download: Silva 138 99% OTUs full-length sequences
UUID: 2bbe61fa-7f78-4913-a6a7-b42e6fff2279
SHA256: bb5870fcf084e82a9ee6ca806f1b4f9e78b2e299eac608ff1dca4b2f28fb1b36
Sklearn Version: 0.24.1
Date Trained: 2021-05-09
Notes: Silva species taxonomy may be unreliable
Citations: Robeson et al. (2020), Bokulich et al. (2018), Silva
Silva 138 99% OTUs from 515F/806R region of sequences
Download: Silva 138 99% OTUs from 515F/806R region of sequences
UUID: 9b9c290b-2297-4711-bafe-7cc603f3b990
SHA256: 5b08f1c272b16208830b2b712a682824b82671c8b6c5fe325ccd7feabfc498ba
Sklearn Version: 0.24.1
Date Trained: 2021-05-07
Notes: Silva species taxonomy may be unreliable
Citations: Robeson et al. (2020), Bokulich et al. (2018), Silva
Greengenes2 2022.10 full length sequences
Download: Greengenes2 2022.10 full length sequences
UUID: 3e819633-6888-42f9-ab66-fe5214e57d72
SHA256: f48c1e2cc7b997d3dee953e1869c2492de03f31d99886f06a50b2536136ad5cf
Sklearn Version: 0.24.1
Date Trained: 2022-12-30
Forum Submitter: wasade
Notes: Greengenes2 has succeeded Greengenes 13_8
Citations: McDonald et al. (2023), Bokulich et al. (2018)
Greengenes2 2022.10 from 515F/806R region of sequences
Download: Greengenes2 2022.10 from 515F/806R region of sequences
UUID: 32489596-075f-44ff-a0ad-0a5c43a80b2c
SHA256: 643fd395ada320140838f12c1d395fc88ae950128a83d3a3ac55625e1d21f337
Sklearn Version: 0.24.1
Date Trained: 2022-10-20
Forum Submitter: wasade
Notes: Greengenes2 has succeeded Greengenes 13_8
Citations: McDonald et al. (2023), Bokulich et al. (2018)
Greengenes 13_8 99% OTUs full length sequences
Download: Greengenes 13_8 99% OTUs full length sequences
UUID: aacdcb16-5bad-48f7-ac7d-3f30c35b0d67
SHA256: b49f6e28e4b3195b39efb4787cd3c07d4c7a2fd5ba07f5699f95c3120de7a6b5
Sklearn Version: 0.24.1
Date Trained: 2021-04-21
Notes: N/A
Citations: Robeson et al. (2020), Bokulich et al. (2018)
Greengenes 13_8 99% OTUs from 515F/806R region of sequences
Download: Greengenes 13_8 99% OTUs from 515F/806R region of sequences
UUID: 4b2a57b7-1e5a-4a4d-8201-99551ab50858
SHA256: 526a122e7599f542f6b76840097c3e5dbf71a13aed7e06fee595efce43578544
Sklearn Version: 0.24.1
Date Trained: 2021-04-21
Notes: N/A
Citations: Robeson et al. (2020), Bokulich et al. (2018)
Weighted Taxonomic Classifiers
These 16S rRNA gene classifiers were trained with weights that take into account the fact that not all species are equally likely to be observed. If your sample comes from any of the 14 habitat types we tested, these weighted classifiers should give you superior classification precision. If your sample doesn’t come from one of those habitats, they might still help. If you have the time, training with weights specific to your habitat should help even more. Weights for a range of habitats are available here.
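To make the idea of weighting concrete, here is a toy Python sketch of how taxon weights can act as prior probabilities in a naive Bayes decision. It is not the q2-feature-classifier implementation: the taxon names, likelihoods, and weights are all invented for illustration, and the only point is that the same sequence evidence can yield a different call once some taxa are treated as more likely to be observed than others.

```python
import math


def classify(log_likelihoods, priors):
    """Pick the taxon maximising log prior + log likelihood (naive Bayes decision rule)."""
    scores = {
        taxon: math.log(priors[taxon]) + log_likelihoods[taxon]
        for taxon in log_likelihoods
    }
    return max(scores, key=scores.get)


# Made-up log-likelihoods of one query sequence under two hypothetical taxa.
log_likelihoods = {"taxon_A": -10.0, "taxon_B": -10.5}

# Uniform weights: every taxon is assumed equally likely to be observed.
uniform = {"taxon_A": 0.5, "taxon_B": 0.5}

# Habitat-specific weights (invented numbers): taxon_B is far more common here.
weighted = {"taxon_A": 0.1, "taxon_B": 0.9}

print(classify(log_likelihoods, uniform))   # taxon_A
print(classify(log_likelihoods, weighted))  # taxon_B
```

With uniform weights the slightly better sequence match wins; with habitat weights the much more common taxon wins, which is the behaviour the weighted classifiers are designed to exploit.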
Weighted Silva 138 99% OTUs full-length sequences
Download: Weighted Silva 138 99% OTUs full-length sequences
UUID: 4df224fc-d4ba-44b6-9bef-cb0747673864
SHA256: b0be3d168e7292f3f7d6d4a299e8a0f36416db013668e89f8e614e0bd648b452
Sklearn Version: 0.24.1
Date Trained: 2021-05-06
Notes: Silva species taxonomy may be unreliable
Citations: Robeson et al. (2020), Kaehler et al. (2019), Bokulich et al. (2018), Silva
Weighted Greengenes 13_8 full length sequences
Download: Weighted Greengenes 13_8 full length sequences
UUID: 60645f71-cb57-41e3-8ed8-c3257a630cb7
SHA256: 1629124485da77f5fadea213db4e5ba1361077df8b1fc1d37eafabc500251eca
Sklearn Version: 0.24.1
Date Trained: 2021-04-21
Notes: N/A
Citations: Kaehler et al. (2019), Bokulich et al. (2018)
Weighted Greengenes 13_8 from 515F/806R region of sequences
Download: Weighted Greengenes 13_8 from 515F/806R region of sequences
UUID: b607509d-cc1b-4fe7-863e-1359be7f34f3
SHA256: eb4c11d7b3cf3d1d1f0ae40b47dcf2cd0c853f0b9f9e7594d257342ceb09103a
Sklearn Version: 0.24.1
Date Trained: 2021-04-21
Notes: N/A
Citations: Kaehler et al. (2019), Bokulich et al. (2018)
QIIME 2 2020.6-2021.2
Naive Bayes Classifiers
Silva 138 99% OTUs full-length sequences
Download: Silva 138 99% OTUs full-length sequences
UUID: 50d540a8-272f-4012-86d3-31254047b46b
SHA256: def48c9f9c8c3444f42b13dbeaf5f6376efff3e8e81994788dc3493fe02aaedc
Sklearn Version: 0.23.1
Date Trained: 2020-06-18
Notes: Silva species taxonomy may be unreliable
Citations: Robeson et al. (2020), Bokulich et al. (2018), Silva
Silva 138 99% OTUs from 515F/806R region of sequences
Download: Silva 138 99% OTUs from 515F/806R region of sequences
UUID: 981abf7d-2d85-41f5-963f-06e36c6ae5c5
SHA256: 850d449c7b0b6833cf7d7d631fb4a462e72b21fbd41ac6e6b07f159c07f64c16
Sklearn Version: 0.23.1
Date Trained: 2020-06-17
Notes: Silva species taxonomy may be unreliable
Citations: Robeson et al. (2020), Bokulich et al. (2018), Silva
Greengenes 13_8 99% OTUs full-length sequences
Download: Greengenes 13_8 99% OTUs full-length sequences
UUID: 8390eae6-498b-410d-a042-4a997ceab50d
SHA256: bf3604186bd7bde518bbe78478db3dd28b5ce383ae969d0efd7f8acdbd619734
Sklearn Version: 0.23.1
Date Trained: 2020-06-17
Notes: N/A
Citations: McDonald et al. (2023), Bokulich et al. (2018)
Greengenes 13_8 99% OTUs from 515F/806R region of sequences
Download: Greengenes 13_8 99% OTUs from 515F/806R region of sequences
UUID: 0ad07fee-e9e8-48fa-8e35-9f689e324245
SHA256: 2dd6f94a3614d5a1b8de6b6b1661a1e1bbb8778e53cbfcc47eb32989b5582895
Sklearn Version: 0.23.1
Date Trained: 2020-06-17
Notes: N/A
Citations: McDonald et al. (2023), Bokulich et al. (2018)
QIIME 2 2020.2
Naive Bayes Classifiers
Silva 132 99% OTUs full-length sequences
Download: Silva 132 99% OTUs full-length sequences
UUID: 50d540a8-272f-4012-86d3-31254047b46b
SHA256: 6a78f2a6a026c4a7b7b69f87ddec765d8ff6d933fc7681badeaac9338c439658
Sklearn Version: 0.22.1
Date Trained: 2020-02-18
Notes: Silva species taxonomy may be unreliable
Citations: Robeson et al. (2020), Bokulich et al. (2018), Silva
Silva 132 99% OTUs from 515F/806R region of sequences
Download: Silva 132 99% OTUs from 515F/806R region of sequences
UUID: 981abf7d-2d85-41f5-963f-06e36c6ae5c5
SHA256: c541fe3087f2b1a2391082ab608256f6467022e04e54d3c07e28c1d51cb51f75
Sklearn Version: 0.22.1
Date Trained: 2020-02-18
Notes: Silva species taxonomy may be unreliable
Citations: Robeson et al. (2020), Bokulich et al. (2018), Silva
Greengenes 13_8 99% OTUs full-length sequences
Download: Greengenes 13_8 99% OTUs full-length sequences
UUID: 2475458d-db2d-46d8-938e-269d5c548225
SHA256: a106451cc4719cde56141a7124219da6ba6e44eab15e02db306e486063c85a35
Sklearn Version: 0.22.1
Date Trained: 2020-02-17
Notes: N/A
Citations: McDonald et al. (2023), Bokulich et al. (2018)
Greengenes 13_8 99% OTUs from 515F/806R region of sequences
Download: Greengenes 13_8 99% OTUs from 515F/806R region of sequences
UUID: 18d34bd0-8e7e-4af6-ba34-a31c03fceb70
SHA256: 3d64bc343c5d364302b6440d6d426a18583297edf17dc144ca21ca2c4f23ce18
Sklearn Version: 0.22.1
Date Trained: 2020-02-17
Notes: N/A
Citations: McDonald et al. (2023), Bokulich et al. (2018)
- Bokulich, N. A., Kaehler, B. D., Rideout, J. R., Dillon, M., Bolyen, E., Knight, R., Huttley, G. A., & Caporaso, J. G. (2018). Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin. Microbiome, 6(1).
- Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
- Robeson, M. S., II, O’Rourke, D. R., Kaehler, B. D., Ziemski, M., Dillon, M. R., Foster, J. T., & Bokulich, N. A. (2020). RESCRIPt: Reproducible sequence taxonomy reference database management for the masses. bioRxiv.
- Kaehler, B. D., Bokulich, N. A., McDonald, D., Knight, R., Caporaso, J. G., & Huttley, G. A. (2019). Species abundance information improves sequence taxonomy classification accuracy. Nat. Commun., 10(1), 4643.
- Parks, D. H., Chuvochina, M., Rinke, C., Mussig, A. J., Chaumeil, P.-A., & Hugenholtz, P. (2021). GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Research, 50(D1), D785–D794. doi:10.1093/nar/gkab776