World Scientific
https://doi.org/10.1142/9781800614079_0015
Cited by: 1 (Source: Crossref)
Abstract:

Collecting and making use of publicly available data is not always straightforward, particularly for interdisciplinary researchers, who often lack the skills to deal with the technical issues that arise during the process. This chapter gives an overview of the challenges involved in identifying and collecting materials, and outlines a general technical framework for building effective and sustainable computer programmes that scrape, process and store online open-source materials as structured datasets for research purposes. We also discuss the data licensing process, which is essential for experiment reproducibility, along with the ethical considerations that arise when working with the data, in order to protect both researchers and the general population. As a case study, we demonstrate how we collect and handle cybercrime and extremist resources at the Cambridge Cybercrime Centre, an interdisciplinary initiative combining diverse expertise at the University of Cambridge.