Text Analysis: http://libguides.uta.edu/textanalysis
Acknowledgement
This guide is adapted by permission from Angela Zoss, Data Visualization Coordinator, Duke University.
TEXT AND DATA MINING (TDM) is the computational analysis of vast quantities of digital information, whether free-form natural language text or structured data. Using specialized software, researchers can extract data, identify trends, look for patterns and better understand the relationships of terms within and between documents.
Analysis might focus on word frequency, words that frequently appear near each other, contextual information for key words, common phrases and other patterns. Materials to be analyzed range from websites (such as publicly available Facebook posts) and 16th-century manuscripts to DNA sequences and old newspapers.
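As a rough illustration of the kinds of analysis listed above, the following Python sketch counts word frequencies, counts adjacent word pairs, and prints each occurrence of a key word with a few words of context. It uses only the standard library. The function names, the sample sentence, and the simple tokenizing rule are assumptions made for this example; dedicated text analysis tools handle tokenizing far more carefully.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (letters, optional apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_frequencies(tokens, top_n=5):
    """Return the top_n most common words with their counts."""
    return Counter(tokens).most_common(top_n)


def bigram_frequencies(tokens, top_n=5):
    """Return the top_n most common pairs of adjacent words."""
    return Counter(zip(tokens, tokens[1:])).most_common(top_n)


def keyword_in_context(tokens, keyword, window=3):
    """Return each occurrence of keyword with up to `window` words on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Hypothetical sample text, invented for this illustration.
sample = (
    "The whale surfaced. The white whale dived, "
    "and the crew watched the white whale vanish."
)
tokens = tokenize(sample)

print(word_frequencies(tokens, 3))
print(bigram_frequencies(tokens, 2))
for line in keyword_in_context(tokens, "whale"):
    print(line)
```

On a real corpus the same three views (frequent words, frequent word pairs, key words in context) are usually computed after removing very common words, which otherwise dominate the counts.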
Introduction to Text Analysis
"Text analysis" is a broad term covering the various processes by which text and natural language documents are transformed so that they can be organized and described.
This guide collects resources for several phases of the text analysis process, including text collection, text parsing and cleaning, text summary and analysis methods, and text visualization.
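The parsing and cleaning phase can be sketched in a few lines of Python. This is a minimal example under stated assumptions, not a recommended pipeline: the stop-word list is a tiny invented sample, the function names are hypothetical, and the rule of keeping only runs of letters is one simple choice among many.

```python
import re
import unicodedata

# A tiny illustrative stop-word list (an assumption for this sketch);
# real projects typically use a much longer, language-specific list.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def clean_text(raw):
    """Normalize Unicode, collapse runs of whitespace, and lowercase the text."""
    text = unicodedata.normalize("NFKC", raw)
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


def parse_tokens(text):
    """Split text into tokens made only of letters (digits and punctuation are dropped)."""
    return re.findall(r"[^\W\d_]+", text)


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop very common words that carry little meaning on their own."""
    return [t for t in tokens if t not in stop_words]


# Hypothetical raw input with irregular spacing, mixed case, and a line break.
raw = "  The  History of the\nENGLISH   Language, in 3 Parts.  "
cleaned = clean_text(raw)
tokens = remove_stop_words(parse_tokens(cleaned))

print(cleaned)
print(tokens)
```

Whether to drop numbers, punctuation, or stop words depends on the research question, so each of these steps is best treated as an explicit, documented decision.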
Overviews/summaries:
- Ted Underwood – Seven ways humanists are using computers to understand text
- Tooling Up for Digital Humanities – Text Analysis
- Ryan Shaw – Text Mining
- O'Connor, Bamman, & Smith (2011) – Computational Text Analysis for Social Science
- Ben Schmidt – Comparing Corpuses by Word Use
Web Scraping
Web scraping (also called web harvesting or web data extraction) is a software technique for extracting information from websites. Source: https://en.wikipedia.org/wiki/Web_scraping
- wget
- NVivo (with NCapture add-in)
- School of Data: Scraping Resources
- School of Data: Course on Web Scraping
- Table of screen scrapers (Google spreadsheet)
- ScraperWiki
- Web Scraping Using PHP and jQuery
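To show what the extraction step of web scraping looks like in practice, here is a minimal Python sketch that pulls link text and link targets out of an HTML page using only the standard library. The page content, the `LinkExtractor` class name, and the `https://example.org/` address are placeholders invented for this example. The page is supplied as a string so the sketch runs without network access; in a real project the HTML would first be downloaded (for example with wget) in line with the site's terms of use.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect (link text, absolute URL) pairs from <a href="..."> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page address.
                self._current_href = urljoin(self.base_url, href)
                self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            # Collapse line breaks and repeated spaces in the link text.
            text = " ".join("".join(self._text_parts).split())
            self.links.append((text, self._current_href))
            self._current_href = None


# Hypothetical page content, invented for this illustration.
page = """
<html><body>
  <h1>Scraping resources</h1>
  <ul>
    <li><a href="/guides/wget">wget</a></li>
    <li><a href="https://example.org/course">Course on
        web scraping</a></li>
  </ul>
</body></html>
"""

extractor = LinkExtractor("https://example.org/")
extractor.feed(page)
for text, url in extractor.links:
    print(f"{text} -> {url}")
```

The tools listed above cover the same ground with far more features, such as following links across pages, handling logins, and exporting results to spreadsheets.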
· Last Updated: Jan 2, 2018 12:58 PM
· URL: https://libguides.uta.edu/textanalysis
Subjects: Communication, English, History, Linguistics, Philosophy & Humanities
Except where otherwise noted, this work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. For details and exceptions, see the Library Copyright Statement.
© 2016 The University of Texas at Arlington.
University of Texas Arlington Libraries
702 Planetarium Place · Arlington, TX 76019 · 817-272-3000