Overview
The Wayback Machine is a digital archive of the World Wide Web, founded by the Internet Archive, a non-profit organization based in San Francisco. It allows the user to go “back in time” and see what websites looked like in the past.
The Wayback Machine Transforms allow you to browse snapshots and other archived content from hundreds of billions of web pages, going back many years. Use them to uncover deleted pages, hidden files, changed content, and more.
To read more, visit the Wayback Machine Data Hub item page on Maltego's website.
The Wayback Machine uses the Memento Protocol.
API Documentation
- https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server
API Endpoints
- CDX API
- https://web.archive.org/cdx/search/cdx?query_params
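To make the endpoint above more concrete, here is a minimal sketch of building a CDX query and turning its JSON response into snapshot records. The query parameter names (`url`, `matchType`, `output`, `limit`) and the header-row layout of `output=json` responses follow the wayback-cdx-server README linked above. The helper names `build_cdx_query` and `parse_cdx_json` are hypothetical, and this is not how the Transforms themselves are implemented.

```python
import json
from urllib.parse import urlencode

# CDX endpoint listed above.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
# The four match types described in the table below.
MATCH_TYPES = {"exact", "prefix", "host", "domain"}


def build_cdx_query(url, match_type="exact", limit=None):
    """Return a CDX query URL requesting JSON output (hypothetical helper)."""
    if match_type not in MATCH_TYPES:
        raise ValueError(f"unknown matchType: {match_type}")
    params = {"url": url, "matchType": match_type, "output": "json"}
    if limit is not None:
        params["limit"] = str(limit)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(body):
    """Turn a JSON CDX response into dicts, adding a playback URL to each.

    The first row of an output=json response is a header naming the fields,
    so records are zipped against it instead of relying on column order.
    """
    rows = json.loads(body)
    if not rows:
        return []
    header, *records = rows
    snapshots = []
    for record in records:
        entry = dict(zip(header, record))
        entry["playback"] = (
            f"https://web.archive.org/web/{entry['timestamp']}/{entry['original']}"
        )
        snapshots.append(entry)
    return snapshots
```

For example, `build_cdx_query("foobar.com", match_type="domain", limit=5)` produces `https://web.archive.org/cdx/search/cdx?url=foobar.com&matchType=domain&output=json&limit=5`. Fetching that URL with any HTTP client (for example `urllib.request.urlopen`) and passing the decoded body to `parse_cdx_json` yields one dict per snapshot.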
The Wayback Machine uses the following match types to determine which snapshots are returned:
Match Type | Description | Example |
Exact | Snapshots of the given URL are returned | Search URL: foobar.com; snapshots are returned for: foobar.com |
Prefix | Snapshots of all resources that have the given URL as a prefix are returned | Search URL: foobar.com/blog; snapshots are returned for: foobar.com/blog and anything under it, e.g. foobar.com/blog/post1 |
Host | Snapshots of all resources under the given hostname are returned | Search URL: foobar.com; snapshots are returned for: foobar.com and anything under foobar.com |
Domain | Snapshots of all resources under the given domain name are returned | Search URL: foobar.com; snapshots are returned for: foobar.com, anything under foobar.com, and all sub-hosts *.foobar.com |
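The difference between the four match types can be approximated in a few lines. This is a simplified sketch, not the CDX server's logic: the real server compares canonicalized (SURT) keys, while here `matches` is a hypothetical helper that compares hostnames and paths directly and ignores query strings.

```python
from urllib.parse import urlsplit


def _host_and_path(url):
    """Split a URL (scheme optional) into a lowercase host and a path without trailing '/'."""
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    return parts.hostname or "", parts.path.rstrip("/")


def matches(search_url, candidate_url, match_type):
    """Approximate whether candidate_url falls under search_url for a match type.

    Simplifications: query strings are ignored, and "prefix" is a plain
    string prefix on the path, so /blog also matches /blogging.
    """
    s_host, s_path = _host_and_path(search_url)
    c_host, c_path = _host_and_path(candidate_url)
    if match_type == "exact":
        # Same host and same path only.
        return (c_host, c_path) == (s_host, s_path)
    if match_type == "prefix":
        # Same host, path starts with the searched path.
        return c_host == s_host and c_path.startswith(s_path)
    if match_type == "host":
        # Anything on exactly this host.
        return c_host == s_host
    if match_type == "domain":
        # This host plus any sub-host (*.foobar.com).
        return c_host == s_host or c_host.endswith("." + s_host)
    raise ValueError(f"unknown match type: {match_type}")
```

With this sketch, `matches("foobar.com", "mail.foobar.com/inbox", "domain")` is true, while the same call with `"host"` is false, mirroring the Host and Domain rows of the table.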
To Snapshots
Transform Meta Info
Transform Inputs
Setting Name | Setting Type | Default Value | Optional | Popup | Display | Auth | Comment |
httpTimeout | String | 90000 | true | false | HTTP timeout (ms) | false | This timeout determines how long a Transform runs before it terminates. When the timeout is reached, the Transform server stops paginating and returns all results collected so far. This is useful for large queries. |
statusCode | String | 20.* | true | false | HTTP status code when the snapshot was taken (set 0 for any status code) | false | The Wayback Machine can filter snapshots by the HTTP status code the URL returned when it was archived. Accepted values are regular expressions in the form Wayback accepts, such as 20.*, or any standard status code from 100 to 599. |
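The CDX API expresses status-code filtering through its `filter` parameter, using the form `statuscode:<regex>` described in the wayback-cdx-server README. A minimal sketch of translating the statusCode setting into that value follows; the `status_filter` name and its validation rules are assumptions based on the setting's comment, not the Transform's actual code. Note that Python's `re` is only used as a sanity check here, and the CDX server's own regex dialect may differ in edge cases.

```python
import re


def status_filter(status_code="20.*"):
    """Translate a statusCode setting into a CDX filter value (hypothetical helper).

    Returns None when no filter should be sent, since "0" means any status code.
    """
    value = status_code.strip()
    if value == "0":
        return None
    if value.isdigit():
        # Plain codes must be standard HTTP status codes.
        if not 100 <= int(value) <= 599:
            raise ValueError(f"status code out of range: {value}")
    else:
        # Anything else is treated as a regex; re.error is raised if it is malformed.
        re.compile(value)
    return f"statuscode:{value}"
```

The returned string would be sent as the `filter` query parameter, for example `filter=statuscode:20.*` with the default setting.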
Description
This Transform returns the archived snapshots for a given Entity. It uses the following Wayback Machine match type for each input Entity:
Entity | Wayback Machine Match Type |
Document | Exact |
Image | Exact |
File | Exact |
Domain | Domain |
Website | Host |
URL | Prefix |
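The table above is effectively a lookup from input Entity type to CDX `matchType`. A short sketch of that lookup follows; the Entity type names come from this page's Input Entity(s) lists, while `cdx_params_for` is a hypothetical helper that only assembles the query parameters.

```python
# Mirrors the Entity to match type table above.
ENTITY_MATCH_TYPE = {
    "maltego.Document": "exact",
    "maltego.Image": "exact",
    "maltego.File": "exact",
    "maltego.Domain": "domain",
    "maltego.Website": "host",
    "maltego.URL": "prefix",
}


def cdx_params_for(entity_type, value):
    """Return CDX query parameters for an input Entity (hypothetical helper)."""
    try:
        match_type = ENTITY_MATCH_TYPE[entity_type]
    except KeyError:
        raise ValueError(f"unsupported input Entity: {entity_type}") from None
    return {"url": value, "matchType": match_type}
```

For a Website Entity with the value foobar.com, this yields `{"url": "foobar.com", "matchType": "host"}`.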
Use Case
This Transform can be used to view all archived snapshot versions of web documents, web files, web images, domains, websites, and URLs.
To Snapshots between Dates
Transform Meta Info
Display Name | To Snapshots between Dates |
Transform Names | wayback.DocumentToSnapshotsInTimeframe wayback.ImageToSnapshotsInSnapshotsInTimeframe wayback.FileToSnapshotsInSnapshotsInTimeframe wayback.DomainToSnapshotsInSnapshotsInTimeframe wayback.WebsiteToSnapshotsInSnapshotsInTimeframe wayback.URLToSnapshotsInSnapshotsInTimeframe |
Short Description | Returns the given Entity’s archived snapshots within a specific time frame. |
Data Source | Wayback Machine |
Owner | <Maltego Technologies GmbH> |
Author | <dev@maltego.com> |
Input Entity(s) | maltego.Document maltego.Image maltego.File maltego.Domain maltego.Website maltego.URL |
Output Entity(s) | maltego.wayback.DocumentSnapshot, maltego.wayback.ImageSnapshot, maltego.wayback.FileSnapshot, maltego.wayback.Snapshot |
Transform Inputs
Setting Name | Setting Type | Default Value | Optional | Popup | Display | Auth | Comment |
httpTimeout | String | 90000 | true | false | HTTP timeout (ms) | false | This timeout determines how long a Transform runs before it terminates. When the timeout is reached, the Transform server stops paginating and returns all results collected so far. This is useful for large queries. |
statusCode | String | 20.* | true | false | HTTP status code when the snapshot was taken (set 0 for any status code) | false | The Wayback Machine can filter snapshots by the HTTP status code the URL returned when it was archived. Accepted values are regular expressions in the form Wayback accepts, such as 20.*, or any standard status code from 100 to 599. |
beginDate | String | 20100101 | false | true | Begin Date - YYYYMMDD | false | Begin date. Allowed formats: minimum YYYY (e.g. 2020), maximum YYYYMMDDHHMMSS (e.g. 20200101010101). |
endDate | String | 20201231 | false | true | End Date - YYYYMMDD | false | End date. Allowed formats: minimum YYYY (e.g. 2020), maximum YYYYMMDDHHMMSS (e.g. 20200101010101). |
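The beginDate and endDate settings line up with the CDX API's `from` and `to` query parameters, which accept timestamps of one to fourteen digits in yyyyMMddhhmmss form. A minimal sketch of validating the settings and mapping them onto those parameters follows. The helper names are hypothetical, and requiring whole date components (4, 6, 8, 10, 12, or 14 digits) is an assumption drawn from the "Min YYYY, Max YYYYMMDDHHMMSS" comment, not documented Transform behavior. The check does not validate calendar dates.

```python
# Whole components from YYYY up to YYYYMMDDHHMMSS (assumed rule).
_ALLOWED_LENGTHS = {4, 6, 8, 10, 12, 14}


def _check_timestamp(name, value):
    """Reject values that are not digit strings of an allowed length."""
    if not (value.isdigit() and len(value) in _ALLOWED_LENGTHS):
        raise ValueError(f"{name} must be YYYY to YYYYMMDDHHMMSS, got {value!r}")
    return value


def date_range_params(begin_date="20100101", end_date="20201231"):
    """Map beginDate/endDate onto CDX from/to parameters (hypothetical helper)."""
    return {
        "from": _check_timestamp("beginDate", begin_date),
        "to": _check_timestamp("endDate", end_date),
    }
```

With the default settings this returns `{"from": "20100101", "to": "20201231"}`, which could be merged into the other query parameters before encoding.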
Description
This Transform returns the archived snapshots for a given Entity within a specified time frame. It uses the following Wayback Machine match type for each input Entity:
Entity | Wayback Machine Match Type |
Document | Exact |
Image | Exact |
File | Exact |
Domain | Domain |
Website | Host |
URL | Prefix |
Use Case
This Transform can be used to view all archived snapshot versions of web documents, web files, web images, domains, websites, and URLs within a specific time frame.
To Snapshots (Exact)
Transform Meta Info
Display Name | To Snapshots (Exact) |
Transform Names | wayback.URLToSnapshotsExact |
Short Description | Returns the given URL’s exact archived snapshots. |
Data Source | Wayback Machine |
Owner | <Maltego Technologies GmbH> |
Author | <dev@maltego.com> |
Input Entity(s) | maltego.URL |
Output Entity(s) | maltego.wayback.Snapshot |
Transform Inputs
Setting Name | Setting Type | Default Value | Optional | Popup | Display | Auth | Comment |
httpTimeout | String | 90000 | true | false | HTTP timeout (ms) | false | This timeout determines how long a Transform runs before it terminates. When the timeout is reached, the Transform server stops paginating and returns all results collected so far. This is useful for large queries. |
statusCode | String | 20.* | true | false | HTTP status code when the snapshot was taken (set 0 for any status code) | false | The Wayback Machine can filter snapshots by the HTTP status code the URL returned when it was archived. Accepted values are regular expressions in the form Wayback accepts, such as 20.*, or any standard status code from 100 to 599. |
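The httpTimeout comment describes a deadline-bounded pagination loop: keep requesting pages until the results run out or the timeout passes, then return whatever has been collected. The sketch below shows that pattern under stated assumptions. `fetch_page` is a hypothetical caller-supplied function returning `(rows, next_resume_key)`; against the CDX API it would typically wrap the `showResumeKey` and `resumeKey` parameters described in the wayback-cdx-server README. This is an illustration of the idea, not the Transform server's implementation.

```python
import time


def collect_until_timeout(fetch_page, http_timeout_ms=90000):
    """Gather pages until exhausted or until the timeout elapses.

    fetch_page(resume_key) must return (rows, next_resume_key), with
    next_resume_key set to None on the last page. The deadline is checked
    between pages, so an in-flight request is never interrupted, and rows
    gathered before the deadline are always returned.
    """
    deadline = time.monotonic() + http_timeout_ms / 1000.0
    results, resume_key = [], None
    while True:
        rows, resume_key = fetch_page(resume_key)
        results.extend(rows)
        if resume_key is None or time.monotonic() >= deadline:
            return results
```

Checking the deadline only between pages keeps every returned page complete, at the cost of possibly overrunning the timeout by one request.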
Description
This Transform returns the archived snapshots that exactly match the given URL.
Use Case
This Transform can be used to view every snapshot version of exactly the given URL.
To Original
Transform Meta Info
Display Name | To Original |
Transform Names | waybackDocumentToOriginalDocument waybackImageToSnapshotsToOriginalImage waybackFileToSnapshotsToOriginalFile waybackSnapshotToSnapshotsToOriginalUrl |
Short Description | Returns the original URL, file, image, or document of the given snapshot Entity. |
Data Source | Wayback Machine |
Owner | <Maltego Technologies GmbH> |
Author | <dev@maltego.com> |
Input Entity(s) | maltego.wayback.DocumentSnapshot maltego.wayback.ImageSnapshot maltego.wayback.FileSnapshot |
Output Entity(s) | maltego.Document maltego.Image maltego.File |
Transform Inputs
Not applicable
Description
This Transform returns the original URL, file, image, or document of the archived snapshot.
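Wayback playback URLs embed the original URL after a timestamp, so the core of this pivot can be sketched with a regular expression. The sketch assumes the common `https://web.archive.org/web/<timestamp>[modifier]/<original>` layout, where an optional two-letter modifier such as `id_` may follow the timestamp. The `to_original` name and the regex are illustrative and not the Transform's code.

```python
import re

# Assumed playback URL layout: /web/<1-14 digit timestamp>[xx_]/<original URL>
_SNAPSHOT_RE = re.compile(
    r"^https?://web\.archive\.org/web/"
    r"(?P<timestamp>\d{1,14})(?P<modifier>[a-z]{2}_)?/(?P<original>.+)$"
)


def to_original(snapshot_url):
    """Return (original_url, timestamp) for a Wayback playback URL (hypothetical helper)."""
    match = _SNAPSHOT_RE.match(snapshot_url)
    if match is None:
        raise ValueError(f"not a Wayback snapshot URL: {snapshot_url}")
    return match.group("original"), match.group("timestamp")
```

For example, `to_original("https://web.archive.org/web/20200101010101/https://foobar.com/report.pdf")` returns `("https://foobar.com/report.pdf", "20200101010101")`.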
Use Case
This Transform can be used to pivot from an archived link and investigate the original URL of a snapshot.