Transform Meta Info

Display Name
Mirror: External links found
Transform Name
WebsiteToWebsite_Mirror
Short Description
This transform uses Gary's Ruby website mirror to spider the site and extract links
Owner
Paterva
Author
Roelof Temmingh (roelof@paterva.com), Gary Oleary-Steele (garyo@sec-1.com)
Input
Website
Output
Website

Description

This transform makes a (partial) mirror of the website and extracts all external links found on it; these are returned as website entities. The slider plays a big role in this transform, as it sets the time-out for the mirroring process. The higher (further to the right) the slider is set, the deeper the mirroring process will go and, hopefully, the more results you'll get. The process runs via a caching server that is local to the box, which means that running the transform again won't transfer the same data from the site twice, except of course for any part of the site that the first round did not manage to fetch. Also keep in mind that not all sites are mirror friendly: Flash-based sites will give problems, as will sites with exotic JavaScript menus and redirects.
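
The behaviour described above can be illustrated with a rough sketch. This is not the transform's actual implementation (which uses a Ruby mirroring tool and a caching server); it is a minimal Python approximation that assumes a breadth-first crawl bounded by a time budget, standing in for the slider's time-out. The names mirror_external_sites, LinkCollector, fetch and time_budget_s are hypothetical and chosen for illustration only.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def mirror_external_sites(start_url, fetch, time_budget_s):
    """Crawl pages on start_url's host until the time budget runs out.

    fetch is a hypothetical callable: URL -> HTML string, or None on failure.
    Returns the sorted host names of external sites linked from crawled pages.
    """
    home = urlsplit(start_url).hostname
    deadline = time.monotonic() + time_budget_s  # stands in for the slider
    queue = deque([start_url])
    seen = {start_url}
    external = set()

    while queue and time.monotonic() < deadline:
        page_url = queue.popleft()
        html = fetch(page_url)
        if html is None:
            continue
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            # Resolve relative links and drop any fragment part.
            absolute = urljoin(page_url, href).split("#")[0]
            parts = urlsplit(absolute)
            if parts.scheme not in ("http", "https") or not parts.hostname:
                continue  # skip mailto:, javascript: and similar links
            if parts.hostname == home:
                # Internal link: queue it so the mirror goes deeper.
                if absolute not in seen:
                    seen.add(absolute)
                    queue.append(absolute)
            else:
                # External link: record the site it points to.
                external.add(parts.hostname)
    return sorted(external)
```

A larger time budget lets the loop work through more of the queue, which is why a higher slider setting tends to return more results. The sketch compares host names exactly, so www.example.com and example.com would count as different sites; whether the real transform does the same is not stated here. It also only sees links present in the HTML, which matches the caveat about Flash and JavaScript-driven navigation.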

Typical Use Case

URL --> Website ==> Externally linked websites

==> Mirror: External links found

--> Related Transform

Example

Starting with the URL for our homepage, we can convert the URL entity into a website entity. From the website entity we can run this transform to find the external websites that we link to.
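
The first step of this example, turning a URL into a website, amounts to taking the host name out of the URL. A minimal sketch in Python follows, assuming that a website entity is identified by its host name; url_to_website is a hypothetical helper, not part of the product.

```python
from urllib.parse import urlsplit


def url_to_website(url):
    """Return the host name of a URL, or None if the URL has no host."""
    # urlsplit lower-cases the host name and strips any port number.
    return urlsplit(url).hostname
```

The resulting host name is what this transform would then take as its Website input.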