Diffbot: Crawling with Visual Machine Learning

Amir Hameed | Technology

Have you ever wondered how social networks do URL previews so well when you share links? How do they know which images to grab, whom to cite as an author, or which tags to attach to the preview? Is it all crawling with complex regexes over source code? Actually, more often than not, it isn’t. Meta information defined in the source can be unreliable, and sites with a less-than-stellar reputation often use it as a keyword carrier, attempting to get search engines to rank them higher. Isn’t what we humans see in front of us what matters anyway?
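To make the "meta information defined in the source" part concrete, here is a minimal sketch of the source-only approach, using nothing but Python's standard library. It is not how Diffbot or any particular social network builds previews. The sample page, the `MetaPreviewParser` class, and the `declared_preview` helper are hypothetical names made up for illustration. The sketch only reads Open Graph and named `<meta>` tags and reports whatever the page claims about itself.

```python
from html.parser import HTMLParser


class MetaPreviewParser(HTMLParser):
    """Collects values from <meta> tags declared in a page's source."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Open Graph tags use property="og:..."; older tags use name="...".
        key = attributes.get("property") or attributes.get("name")
        content = attributes.get("content")
        if key and content:
            self.meta[key.lower()] = content.strip()


def declared_preview(html):
    """Return the preview fields a page claims for itself; nothing is verified."""
    parser = MetaPreviewParser()
    parser.feed(html)
    parser.close()
    meta = parser.meta
    keywords = [k.strip() for k in meta.get("keywords", "").split(",") if k.strip()]
    return {
        "title": meta.get("og:title"),
        "image": meta.get("og:image"),
        "author": meta.get("author"),
        "keywords": keywords,
    }


# Hypothetical page source, invented for this example.
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Ten Tips for Faster Crawling">
<meta property="og:image" content="https://example.com/cover.png">
<meta name="author" content="Jane Doe">
<meta name="keywords" content="cheap flights, free phones, crawling">
</head><body><h1>Ten Tips for Faster Crawling</h1></body></html>
"""

preview = declared_preview(SAMPLE_HTML)
print(preview)
```

Notice that the keywords come back exactly as declared, including the ones that have nothing to do with the visible article. A parser like this has no way to tell honest metadata from keyword stuffing, which is the weakness the paragraph above points at.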

via Diffbot: Crawling with Visual Machine Learning.
