Talk:Biomedical text mining

Untitled

Greetings, fellow Wikipedians! I noticed that this stub didn't have a discussion page, even though several people have contributed to the article. I'd love to get to know whoever else is interested in the subject. The references so far are all centered around Hoffman/Valencia et al., and I'm surprised nobody has brought up iHOP yet, so I added it as an example in a new section. Since I am currently writing a thesis on biomedical text mining, I expect to be able to give a much more complete view of the subject and eventually lift the stub status of this article. My edits so far have only been a warm-up. Ste1n 19:05, 17 April 2006 (UTC)

Examples, please?

Shouldn't this article have one or two specific examples where text mining advanced research, helped with drug discovery, or established the etiology of a disease? Are there any such examples in medicine or health? I doubt it (I have looked, e.g., on PubMed). The use of word clouds has been questioned; text mining produces nice stats and graphs, but does it tell us anything new? BTW, I am not talking about plagiarism detection... Sleuth21 (talk) 07:30, 30 May 2011 (UTC)

External links modified

Hello fellow Wikipedians,

I have just modified one external link on Biomedical text mining. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

Cheers. InternetArchiveBot 23:40, 2 November 2016 (UTC)

Addition of references for BioNLP Shared Tasks

Information to be added or removed: In the section "Availability of annotated text data", I would like to add a mention of the BioNLP shared tasks following the mention of the Informatics for Integrating Biology and the Bedside (i2b2) challenges. The first paragraph of this section would then read as follows:

Large annotated corpora used in the development and training of general purpose text mining methods (e.g., sets of movie dialogue,[1] product reviews,[2] or Wikipedia article text) are not specific to biomedical language. While they may provide evidence of general text properties such as parts of speech, they rarely contain concepts of interest to biologists or clinicians. Development of new methods to identify features specific to biomedical documents therefore requires assembly of specialized corpora.[3] Resources designed to aid in building new biomedical text mining methods have been developed through the Informatics for Integrating Biology and the Bedside (i2b2) challenges,[4][5][6] BioNLP shared tasks,[7][8][9][10][11][12][13][14] and biomedical informatics researchers.[15][16] Text mining researchers frequently combine these corpora with the controlled vocabularies and ontologies available through the National Library of Medicine's Unified Medical Language System (UMLS) and Medical Subject Headings (MeSH).


Explanation of issue: The BioNLP shared tasks (and the corpora created as part of them) represent important community efforts and resources for the biomedical text mining community. The tasks and resources were created by various members of the community, including my own group. I tried to add this directly, but it was removed as an "Apparent COI cite". However, this represents not only the work of my group, but also the work of others. Apologies if I have done something incorrectly; I do not have a great deal of experience in editing Wikipedia pages.

References supporting change: Supporting references included in the changes shown above

References

  1. ^ Danescu-Niculescu-Mizil C, Lee L (2011). Chameleons in Imagined Conversations: A New Approach to Understanding Coordination of Linguistic Style in Dialogs. pp. 76–87. arXiv:1106.3077. Bibcode:2011arXiv1106.3077D. ISBN 978-1-932432-95-4.
  2. ^ McAuley J, Leskovec J (2013-10-12). Hidden factors and hidden topics: understanding rating dimensions with review text. ACM. pp. 165–172. doi:10.1145/2507157.2507163. ISBN 978-1-4503-2409-0.
  3. ^ Ohno-Machado L, Nadkarni P, Johnson K (2013). "Natural language processing: algorithms and tools to extract computable information from EHRs and from the biomedical literature". Journal of the American Medical Informatics Association. 20 (5): 805. doi:10.1136/amiajnl-2013-002214. PMC 3756279. PMID 23935077.
  4. ^ Uzuner Ö, South BR, Shen S, DuVall SL (2011). "2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text". Journal of the American Medical Informatics Association. 18 (5): 552–6. doi:10.1136/amiajnl-2011-000203. PMC 3168320. PMID 21685143.
  5. ^ Sun W, Rumshisky A, Uzuner O (2013). "Evaluating temporal relations in clinical text: 2012 i2b2 Challenge". Journal of the American Medical Informatics Association. 20 (5): 806–13. doi:10.1136/amiajnl-2013-001628. PMC 3756273. PMID 23564629.
  6. ^ Stubbs A, Kotfila C, Uzuner Ö (December 2015). "Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task Track 1". Journal of Biomedical Informatics. 58 Suppl: S11–9. doi:10.1016/j.jbi.2015.06.007. PMC 4989908. PMID 26225918.
  7. ^ Kim, JD, Ohta, T, Pyysalo, S, Kano, Y, Tsujii, J (2011). "Extracting Bio-Molecular Events From Literature - The BioNLP'09 Shared Task". Computational Intelligence. 27 (4): 513–540. doi:10.1111/j.1467-8640.2011.00398.x.
  8. ^ Kim, JD, Nguyen, N, Wang, Y, Tsujii, J, Takagi, T, Yonezawa, A (2012). "The Genia Event and Protein Coreference tasks of the BioNLP Shared Task 2011". BMC Bioinformatics. 13: S1. doi:10.1186/1471-2105-13-S11-S1.
  9. ^ Pyysalo, S, Ohta, T, Rak, R, Sullivan, D, Mao, C, Wang, C, Sobral, B, Tsujii, J, Ananiadou, S (2012). "Overview of the ID, EPI and REL tasks of BioNLP Shared Task 2011". BMC Bioinformatics. 13: S2. doi:10.1186/1471-2105-13-S11-S2.
  10. ^ Bossy, R, Jourde, J, Manine, AP, Veber, P, Alphonse, E, van de Guchte M, Bessières, P, Nédellec, C (2012). "BioNLP Shared Task - The Bacteria Track". BMC Bioinformatics. 13: S3. doi:10.1186/1471-2105-13-S11-S3.
  11. ^ Bossy, R, Golik, W, Ratkovic, Z, Valsamou, D, Bessières, P, Nédellec, C (2015). "Overview of the gene regulation network and the bacteria biotope tasks in BioNLP'13 shared task". BMC Bioinformatics. 16: S1. doi:10.1186/1471-2105-16-S10-S1.
  12. ^ Pyysalo, S, Ohta, T, Rak, R, Rowley, A, Chun, HW, Jung, SJ, Choi, SP, Tsujii, J, Ananiadou, S (2015). "Overview of the Cancer Genetics and Pathway Curation tasks of BioNLP Shared Task 2013". BMC Bioinformatics. 16: S2. doi:10.1186/1471-2105-16-S10-S2.
  13. ^ Chaix, E; Dubreucq, B; Fatihi, A; Valsamou, D; Bossy, R; Ba, M; Deléger, L; Zweigenbaum, P; Bessières, P; Lepiniec, L; Nédellec, C (2016). "Overview of the Regulatory Network of Plant Seed Development (SeeDev) Task at the BioNLP Shared Task 2016". Proceedings of the 4th BioNLP Shared Task Workshop. pp. 1–11. doi:10.18653/v1/W16-3001.
  14. ^ Deléger, L; Bossy, R; Chaix, E; Ba, M; Ferré, A; Bessières, P; Nédellec, C (2016). "Overview of the Bacteria Biotope Task at BioNLP Shared Task 2016". Proceedings of the 4th BioNLP Shared Task Workshop. pp. 12–22. doi:10.18653/v1/W16-3002.
  15. ^ Albright D, Lanfranchi A, Fredriksen A, Styler WF, Warner C, Hwang JD, Choi JD, Dligach D, Nielsen RD, Martin J, Ward W, Palmer M, Savova GK (2013). "Towards comprehensive syntactic and semantic annotations of the clinical narrative". Journal of the American Medical Informatics Association. 20 (5): 922–30. doi:10.1136/amiajnl-2012-001317. PMC 3756257. PMID 23355458.
  16. ^ Bada M, Eckert M, Evans D, Garcia K, Shipley K, Sitnikov D, Baumgartner WA, Cohen KB, Verspoor K, Blake JA, Hunter LE (July 2012). "Concept annotation in the CRAFT corpus". BMC Bioinformatics. 13 (1): 161. doi:10.1186/1471-2105-13-161. PMC 3476437. PMID 22776079.

Daisylagata (talk) 14:38, 27 August 2019 (UTC)

Reply 27-AUG-2019

   Specification requested  

  1. Of the provided sources, 50% of them contain |page= parameters covering 4 or more cited pages of text. It is highly unlikely that the information contained in five sentences results from all 96 pages of this cited text. Thus, the request should specify which particular page the information is contained upon in sources containing multiple cited |pages=.
  2. The grouping of eight separate references to source only three words suggests WP:TOOMANYREFS.
  3. The COI editor is invited to redraft their proposal incorporating exact page numbers, and is asked to make use of only the minimum references needed.

Regards,  Spintendo  15:16, 27 August 2019 (UTC)

Reply 28-AUG-2019

I have reduced the number of BioNLP references to four. There have been four BioNLP shared tasks, held in different years. There is now a link to an overview paper, or to the conference proceedings, for each of these tasks. I hope that this is more acceptable. Please see below. Daisylagata (talk) 10:35, 28 August 2019 (UTC)

Large annotated corpora used in the development and training of general purpose text mining methods (e.g., sets of movie dialogue,[1] product reviews,[2] or Wikipedia article text) are not specific to biomedical language. While they may provide evidence of general text properties such as parts of speech, they rarely contain concepts of interest to biologists or clinicians. Development of new methods to identify features specific to biomedical documents therefore requires assembly of specialized corpora.[3] Resources designed to aid in building new biomedical text mining methods have been developed through the Informatics for Integrating Biology and the Bedside (i2b2) challenges,[4][5][6] BioNLP shared tasks,[7][8][9][10] and biomedical informatics researchers.[11][12] Text mining researchers frequently combine these corpora with the controlled vocabularies and ontologies available through the National Library of Medicine's Unified Medical Language System (UMLS) and Medical Subject Headings (MeSH).

  1. ^ Danescu-Niculescu-Mizil C, Lee L (2011). Chameleons in Imagined Conversations: A New Approach to Understanding Coordination of Linguistic Style in Dialogs. pp. 76–87. arXiv:1106.3077. Bibcode:2011arXiv1106.3077D. ISBN 978-1-932432-95-4.
  2. ^ McAuley J, Leskovec J (2013-10-12). Hidden factors and hidden topics: understanding rating dimensions with review text. ACM. pp. 165–172. doi:10.1145/2507157.2507163. ISBN 978-1-4503-2409-0.
  3. ^ Ohno-Machado L, Nadkarni P, Johnson K (2013). "Natural language processing: algorithms and tools to extract computable information from EHRs and from the biomedical literature". Journal of the American Medical Informatics Association. 20 (5): 805. doi:10.1136/amiajnl-2013-002214. PMC 3756279. PMID 23935077.
  4. ^ Uzuner Ö, South BR, Shen S, DuVall SL (2011). "2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text". Journal of the American Medical Informatics Association. 18 (5): 552–6. doi:10.1136/amiajnl-2011-000203. PMC 3168320. PMID 21685143.
  5. ^ Sun W, Rumshisky A, Uzuner O (2013). "Evaluating temporal relations in clinical text: 2012 i2b2 Challenge". Journal of the American Medical Informatics Association. 20 (5): 806–13. doi:10.1136/amiajnl-2013-001628. PMC 3756273. PMID 23564629.
  6. ^ Stubbs A, Kotfila C, Uzuner Ö (December 2015). "Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task Track 1". Journal of Biomedical Informatics. 58 Suppl: S11–9. doi:10.1016/j.jbi.2015.06.007. PMC 4989908. PMID 26225918.
  7. ^ Kim, JD, Ohta, T, Pyysalo, S, Kano, Y, Tsujii, J (2011). "Extracting Bio-Molecular Events From Literature - The BioNLP'09 Shared Task". Computational Intelligence. 27 (4): 513–540. doi:10.1111/j.1467-8640.2011.00398.x.
  8. ^ Kim, JD; Pyysalo, S; Nédellec, C; Ananiadou, S; Tsujii, J, eds. (2012). "Selected articles from the BioNLP Shared Task 2011". BMC Bioinformatics. 13.
  9. ^ Nédellec, C; Bossy, R; Kim, JD; Kim, JJ; Ohta, T; Pyysalo, S; Zweigenbaum, P, eds. (2012). Proceedings of the BioNLP Shared Task 2013 Workshop.
  10. ^ Nédellec, C; Bossy, R; Kim, JD, eds. (2016). Proceedings of the 4th BioNLP Shared Task Workshop.
  11. ^ Albright D, Lanfranchi A, Fredriksen A, Styler WF, Warner C, Hwang JD, Choi JD, Dligach D, Nielsen RD, Martin J, Ward W, Palmer M, Savova GK (2013). "Towards comprehensive syntactic and semantic annotations of the clinical narrative". Journal of the American Medical Informatics Association. 20 (5): 922–30. doi:10.1136/amiajnl-2012-001317. PMC 3756257. PMID 23355458.
  12. ^ Bada M, Eckert M, Evans D, Garcia K, Shipley K, Sitnikov D, Baumgartner WA, Cohen KB, Verspoor K, Blake JA, Hunter LE (July 2012). "Concept annotation in the CRAFT corpus". BMC Bioinformatics. 13 (1): 161. doi:10.1186/1471-2105-13-161. PMC 3476437. PMID 22776079.