TOOLS THAT EASE DATA COLLECTION FROM THE WEB

Bibliographic Details
Published in: Nature (London) 2020-09, Vol.585 (7826), p.621-622
Main Authors: DeVito, Nicholas J, Richards, Georgia C, Inglesby, Peter
Format: Article
Language: English
Description
Summary: The structure and content of a web page are encoded in Hypertext Markup Language (HTML), which you can see using your browser's 'view source' or 'inspect element' function. A common scraping task involves iterating over every possible URL from www.example.com/data/1 to www.example.com/data/100 (sometimes called 'crawling') and storing what you need from each page without the risk of human error during extraction. [...] be advised: depending on the number of pages, your Internet connection and the website's server, a scraping job could still take days.
ISSN: 0028-0836, 1476-4687
DOI: 10.1038/d41586-020-02558-0
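
The crawling task described in the summary (visiting www.example.com/data/1 through www.example.com/data/100 and keeping each page) might be sketched in Python roughly as follows. This is a minimal illustration using only the standard library, not the article's own code: the function names build_urls, fetch_html and crawl, the https:// scheme, the UTF-8 decoding and the one-second pause between requests are all assumptions made for the example.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable, List


def build_urls(base: str, first: int, last: int) -> List[str]:
    """Return base/first through base/last, inclusive (hypothetical helper)."""
    return [f"{base}/{n}" for n in range(first, last + 1)]


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download one page and return its HTML as text (UTF-8 is assumed)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_html,
    delay_seconds: float = 1.0,
) -> Dict[str, str]:
    """Fetch each URL in turn and store the HTML keyed by URL.

    Pages that fail to load are reported and skipped, so one bad page
    does not stop the whole job. The pause after every request is an
    assumed courtesy to the server, not a requirement from the article.
    """
    pages: Dict[str, str] = {}
    for url in urls:
        try:
            pages[url] = fetch(url)
        except OSError as error:  # urllib's URLError and timeouts are OSErrors
            print(f"skipped {url}: {error}")
        time.sleep(delay_seconds)
    return pages
```

Calling crawl(build_urls("https://www.example.com/data", 1, 100)) would request the hundred pages one after another. With a one-second pause per page the waiting alone adds up quickly for larger ranges, which is consistent with the summary's caution that a scraping job can still take days.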