Wayback Machine

Modified on: Tue, 8 Oct, 2024 at 8:35 AM

Overview

The Wayback Machine is a digital archive of the World Wide Web, founded by the Internet Archive, a non-profit organization based in San Francisco. It allows the user to go “back in time” and see what websites looked like in the past.

 

The Wayback Machine Transforms allow you to browse snapshots and archived content of hundreds of billions of websites, going back for years. Uncover deleted pages, hidden files, changed content and more.

 

To read more, visit the Wayback Machine Data Hub item page on the Maltego website.

 

Wayback Machine uses the Memento Protocol.
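In the Memento Protocol (RFC 7089), a client asks for the version of a resource closest to a point in time by sending an Accept-Datetime request header. The sketch below only builds that header. The helper name is hypothetical, and this article does not state whether the Transforms send this header themselves.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(moment):
    """Build the Accept-Datetime header for a timezone-aware datetime."""
    # Memento expects an HTTP-date expressed in GMT, so normalise to UTC first.
    utc_moment = moment.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc_moment, usegmt=True)}


print(accept_datetime_header(datetime(2020, 1, 1, tzinfo=timezone.utc)))
```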

 

API Documentation

  • https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server

 

API Endpoints

  • CDX API
    • https://web.archive.org/cdx/search/cdx?query_params
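As a minimal sketch of how the query_params part of the endpoint might be filled in, the helper below assembles a CDX query URL with the standard library. The function name build_cdx_url is hypothetical; url, output and limit are parameters described in the CDX server documentation linked above.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(url, **params):
    """Return a CDX API query URL for the given target URL."""
    query = {"url": url}
    # Skip parameters left as None so the query string stays minimal.
    query.update({key: value for key, value in params.items() if value is not None})
    return f"{CDX_ENDPOINT}?{urlencode(query)}"


print(build_cdx_url("foobar.com", output="json", limit=5))
```

The function only builds the URL; sending the request and handling errors are left to whichever HTTP client you prefer.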

 

Wayback Machine uses the following matching levels to determine the snapshot results:

 

| Match Type | Description | Example |
|---|---|---|
| Exact | Snapshots of the given URL are returned. | Search URL: foobar.com. Snapshots are returned for: foobar.com |
| Prefix | Snapshots of all resources that have the given URL as a prefix are returned. | Search URL: foobar.com. Snapshots are returned for: foobar.com and any URL that begins with foobar.com |
| Host | Snapshots of all resources under the given hostname are returned. | Search URL: foobar.com. Snapshots are returned for: foobar.com and anything under foobar.com |
| Domain | Snapshots of all resources under the given domain name are returned. | Search URL: foobar.com. Snapshots are returned for: foobar.com, anything under foobar.com, and all subhosts *.foobar.com |
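To make the four matching levels concrete, here is a simplified local predicate that decides whether a candidate URL would fall under a search URL at each level. It is an illustration under stated assumptions, not the Wayback Machine's own logic: the real service canonicalizes URLs before comparing them, and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def _host_and_path(url):
    """Split a full or scheme-less URL into (lowercased host, path)."""
    parts = urlsplit(url if "//" in url else "//" + url)
    return (parts.hostname or "").lower(), parts.path or "/"


def matches(match_type, search_url, candidate):
    """Return True if candidate falls under search_url at the given level."""
    s_host, s_path = _host_and_path(search_url)
    c_host, c_path = _host_and_path(candidate)
    if match_type == "exact":
        return (s_host, s_path) == (c_host, c_path)
    if match_type == "prefix":
        return s_host == c_host and c_path.startswith(s_path)
    if match_type == "host":
        return s_host == c_host
    if match_type == "domain":
        # The host itself, or any subhost such as blog.foobar.com.
        return c_host == s_host or c_host.endswith("." + s_host)
    raise ValueError(f"unknown match type: {match_type}")


print(matches("domain", "foobar.com", "blog.foobar.com/post"))
```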

 

To Snapshots

Transform Meta Info

Transform Inputs

 

| Setting Name | Setting Type | Default Value | Optional | Popup | Display | Auth | Comment |
|---|---|---|---|---|---|---|---|
| httpTimeout | String | 90000 | true | false | HTTP timeout(ms) | false | The maximum time, in milliseconds, that a Transform runs before it terminates. When the timeout is reached, the Transform server stops paginating and returns the results gathered so far. This is useful for large queries. |
| statusCode | String | 20.* | true | false | HTTP status code when the snapshot was taken (set 0 for any status code) | false | The Wayback Machine can filter snapshots by the status code the URL returned when it was archived. Accepted values are a regular expression in the form Wayback accepts, such as 20.*, or any standard status code from 100 to 599. |
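The statusCode setting lines up naturally with the CDX server's filter parameter, which takes a field:regex pair such as statuscode:20.*. The sketch below shows one way the setting could be translated. The mapping, including treating 0 as "no filter", is an assumption based on the setting's description, and the function name is hypothetical.

```python
def status_filter(status_code="20.*"):
    """Translate the statusCode setting into a CDX filter value.

    Returns None when the setting is "0", meaning any status code.
    """
    value = status_code.strip()
    if value == "0":
        return None  # assumption: 0 disables status filtering entirely
    return f"statuscode:{value}"


print(status_filter())
print(status_filter("404"))
print(status_filter("0"))
```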

 

Description

This Transform returns links to the archived snapshots of a given Entity. It uses the following Wayback Machine Match Type for each input Entity:

 

| Entity | Wayback Machine Match Type |
|---|---|
| Document | Exact |
| Image | Exact |
| File | Exact |
| Domain | Domain |
| Website | Host |
| URL | Prefix |
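The Entity-to-Match-Type table above can be expressed as a simple lookup that feeds the CDX server's matchType parameter. This is a sketch under assumptions: the dictionary and function names are hypothetical, and the article does not show how the Transforms build their queries internally.

```python
from urllib.parse import urlencode

# Mirrors the Entity / Match Type table above.
MATCH_TYPE_BY_ENTITY = {
    "maltego.Document": "exact",
    "maltego.Image": "exact",
    "maltego.File": "exact",
    "maltego.Domain": "domain",
    "maltego.Website": "host",
    "maltego.URL": "prefix",
}


def cdx_params(entity_type, value):
    """Return CDX query parameters for an input Entity type and its value."""
    return {"url": value, "matchType": MATCH_TYPE_BY_ENTITY[entity_type]}


print(urlencode(cdx_params("maltego.Website", "foobar.com")))
```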

 

Use Case

This Transform can be used to view all the various snapshot versions for web documents, web files, web images, domains, websites and URLs. 
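For readers who want to reproduce snapshot links outside Maltego, the sketch below turns a CDX response requested with output=json into Wayback snapshot URLs. In that format the first row is a header naming the fields. The sample rows are made up for illustration, and the function name is hypothetical.

```python
def snapshot_links(cdx_rows):
    """Build Wayback snapshot URLs from a CDX JSON response (header row first)."""
    if not cdx_rows:
        return []
    header, *rows = cdx_rows
    ts = header.index("timestamp")
    original = header.index("original")
    return [f"https://web.archive.org/web/{row[ts]}/{row[original]}" for row in rows]


# Made-up sample data in the shape of a CDX JSON response.
sample = [
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
    ["com,foobar)/", "20200101010101", "http://foobar.com/", "text/html", "200", "SAMPLEDIGEST", "1234"],
]
print(snapshot_links(sample))
```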

 

To Snapshots between Dates

 

Transform Meta Info

| Display Name | To Snapshots between Dates |
| Transform Names | wayback.DocumentToSnapshotsInTimeframe, wayback.ImageToSnapshotsInSnapshotsInTimeframe, wayback.FileToSnapshotsInSnapshotsInTimeframe, wayback.DomainToSnapshotsInSnapshotsInTimeframe, wayback.WebsiteToSnapshotsInSnapshotsInTimeframe, wayback.URLToSnapshotsInSnapshotsInTimeframe |
| Short Description | Returns the given Entity’s archived snapshots within a specific time frame. |
| Data Source | Wayback Machine |
| Owner | <Maltego Technologies GmbH> |
| Author | <dev@maltego.com> |
| Input Entity(s) | maltego.Document, maltego.Image, maltego.File, maltego.Domain, maltego.Website, maltego.URL |
| Output Entity(s) | maltego.wayback.DocumentSnapshot, maltego.wayback.ImageSnapshot, maltego.wayback.FileSnapshot, maltego.wayback.Snapshot |

 

Transform Inputs

 

| Setting Name | Setting Type | Default Value | Optional | Popup | Display | Auth | Comment |
|---|---|---|---|---|---|---|---|
| httpTimeout | String | 90000 | true | false | HTTP timeout(ms) | false | The maximum time, in milliseconds, that a Transform runs before it terminates. When the timeout is reached, the Transform server stops paginating and returns the results gathered so far. This is useful for large queries. |
| statusCode | String | 20.* | true | false | HTTP status code when the snapshot was taken (set 0 for any status code) | false | The Wayback Machine can filter snapshots by the status code the URL returned when it was archived. Accepted values are a regular expression in the form Wayback accepts, such as 20.*, or any standard status code from 100 to 599. |
| beginDate | String | 20100101 | false | true | Begin Date - YYYYMMDD | false | Begin date. Allowed formats range from YYYY (minimum, e.g. 2020) to YYYYMMDDHHMMSS (maximum, e.g. 20200101010101). |
| endDate | String | 20201231 | false | true | End Date - YYYYMMDD | false | End date. Allowed formats range from YYYY (minimum, e.g. 2020) to YYYYMMDDHHMMSS (maximum, e.g. 20200101010101). |
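The beginDate and endDate formats can be checked before a query is sent. The sketch below accepts any precision from YYYY up to YYYYMMDDHHMMSS and returns the values as the CDX server's from and to parameters. Accepting only even-length prefixes (YYYY, YYYYMM, and so on) is an assumption, and the function names are hypothetical.

```python
from datetime import datetime

# Each allowed length maps to the matching prefix of YYYYMMDDHHMMSS.
_FORMAT_BY_LENGTH = {
    4: "%Y", 6: "%Y%m", 8: "%Y%m%d",
    10: "%Y%m%d%H", 12: "%Y%m%d%H%M", 14: "%Y%m%d%H%M%S",
}


def parse_wayback_date(value):
    """Parse a date given at any precision from YYYY to YYYYMMDDHHMMSS."""
    fmt = _FORMAT_BY_LENGTH.get(len(value))
    if fmt is None or not value.isdigit():
        raise ValueError(f"expected YYYY up to YYYYMMDDHHMMSS, got {value!r}")
    return datetime.strptime(value, fmt)


def date_range_params(begin="20100101", end="20201231"):
    """Validate the range and return it as CDX from/to parameters."""
    if parse_wayback_date(begin) > parse_wayback_date(end):
        raise ValueError("beginDate must not be later than endDate")
    return {"from": begin, "to": end}


print(date_range_params())
print(parse_wayback_date("2020"))
```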

 

 

Description

This Transform returns links to the archived snapshots of a given Entity within a specified time frame. It uses the following Wayback Machine Match Type for each input Entity:

 

| Entity | Wayback Machine Match Type |
|---|---|
| Document | Exact |
| Image | Exact |
| File | Exact |
| Domain | Domain |
| Website | Host |
| URL | Prefix |

 

Use Case

This Transform can be used to view all the snapshot versions of web documents, web files, web images, domains, websites and URLs captured within a specific time frame.

 

To Snapshots (Exact)

Transform Meta Info

| Display Name | To Snapshots (Exact) |
| Transform Names | wayback.URLToSnapshotsExact |
| Short Description | Returns the given URL’s exact archived snapshots. |
| Data Source | Wayback Machine |
| Owner | <Maltego Technologies GmbH> |
| Author | <dev@maltego.com> |
| Input Entity(s) | maltego.URL |
| Output Entity(s) | maltego.wayback.Snapshot |

 

Transform Inputs

| Setting Name | Setting Type | Default Value | Optional | Popup | Display | Auth | Comment |
|---|---|---|---|---|---|---|---|
| httpTimeout | String | 90000 | true | false | HTTP timeout(ms) | false | The maximum time, in milliseconds, that a Transform runs before it terminates. When the timeout is reached, the Transform server stops paginating and returns the results gathered so far. This is useful for large queries. |
| statusCode | String | 20.* | true | false | HTTP status code when the snapshot was taken (set 0 for any status code) | false | The Wayback Machine can filter snapshots by the status code the URL returned when it was archived. Accepted values are a regular expression in the form Wayback accepts, such as 20.*, or any standard status code from 100 to 599. |
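The httpTimeout behaviour described above, where pagination stops at the deadline and whatever has been gathered is returned, can be sketched as follows. This is an illustration of the idea only, not the Transform server's implementation. The function name is hypothetical, and pages stands for any iterable that yields one list of results per page.

```python
import time


def collect_until_timeout(pages, timeout_ms="90000", clock=time.monotonic):
    """Gather results page by page, stopping once the time budget is spent."""
    # The setting is a String holding milliseconds, so convert before use.
    deadline = clock() + int(timeout_ms) / 1000
    results = []
    for page in pages:
        results.extend(page)
        if clock() >= deadline:
            break  # budget exhausted: return what has been gathered so far
    return results


print(collect_until_timeout([["snapshot-1", "snapshot-2"], ["snapshot-3"]]))
```

Passing the clock as a parameter keeps the function easy to test with a fake time source.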

 

Description

This Transform returns links to the archived snapshots that exactly match a given URL Entity.

 

Use Case

This Transform can be used to view all the exact snapshot versions for a URL.

 

To Original

Transform Meta Info

 

| Display Name | To Original |
| Transform Names | waybackDocumentToOriginalDocument, waybackImageToSnapshotsToOriginalImage, waybackFileToSnapshotsToOriginalFile, waybackSnapshotToSnapshotsToOriginalUrl |
| Short Description | Returns the original URL/File/Image/Document of the given snapshot Entity. |
| Data Source | Wayback Machine |
| Owner | <Maltego Technologies GmbH> |
| Author | <dev@maltego.com> |
| Input Entity(s) | maltego.wayback.DocumentSnapshot, maltego.wayback.ImageSnapshot, maltego.wayback.FileSnapshot |
| Output Entity(s) | maltego.Document, maltego.Image, maltego.File |

 

Transform Inputs

Not applicable

 

Description

This Transform returns the original URL/File/Image/Document of the archived snapshot.

 

Use Case

This Transform can be used to pivot from an archived link and investigate the original URL of a snapshot.
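Wayback snapshot links embed the capture timestamp and the original address in the form https://web.archive.org/web/<timestamp>/<original URL>, so the pivot can be approximated locally. The sketch below is one way to do it, not necessarily how the Transform works, and the function name is hypothetical.

```python
import re

# A timestamp of up to 14 digits, an optional modifier such as "id_" or "im_",
# and then the original URL.
_SNAPSHOT_RE = re.compile(
    r"^https?://web\.archive\.org/web/(\d{1,14})(?:[a-z]{2}_)?/(.+)$"
)


def original_from_snapshot(snapshot_url):
    """Return (timestamp, original_url) extracted from a Wayback snapshot URL."""
    match = _SNAPSHOT_RE.match(snapshot_url)
    if match is None:
        raise ValueError(f"not a Wayback snapshot URL: {snapshot_url!r}")
    return match.group(1), match.group(2)


print(original_from_snapshot(
    "https://web.archive.org/web/20200101010101/http://foobar.com/about"
))
```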
