Migrate from Amplitude to PostHog

Prior to starting a historical data migration, ensure you do the following:

  1. Create a project on our US or EU Cloud.
  2. Sign up to a paid product analytics plan on the billing page (historical imports are free, but a paid plan unlocks the necessary features).
  3. Raise an in-app support request with the Data pipelines topic detailing where you are sending events from, how, the total volume, and the speed. For example, "we are migrating 30M events from a self-hosted instance to EU Cloud using the migration scripts at 10k events per minute."
  4. Wait for the OK from our team before starting the migration process to ensure that it completes successfully and is not rate limited.
  5. Set the historical_migration option to true when capturing events in the migration.

Migrating from Amplitude is a two-step process:

  1. Export your data from Amplitude using the organization settings export, Amplitude Export API, or the S3 export.

  2. Import data into PostHog using PostHog's Python SDK or batch API with the historical_migration option set to true. Other libraries don't support historical migrations yet.
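If you use the batch API rather than the Python SDK, each request carries many events at once and the historical_migration flag is set on the request body. The sketch below assumes the batch endpoint is /batch/ on your cloud host and that the body uses the api_key, historical_migration, and batch fields, with distinct_id inside each event's properties; check these against PostHog's API reference before relying on them. The helpers build_batch_payload and send_batch are hypothetical.

```python
import json
import urllib.request

# Assumed batch endpoint; use https://eu.i.posthog.com/batch/ for EU Cloud.
BATCH_URL = "https://us.i.posthog.com/batch/"


def build_batch_payload(api_key, events):
    """Build a JSON body for the batch API (hypothetical helper).

    Each event is a dict with "event", "distinct_id", optional "properties",
    and a timezone-aware datetime under "timestamp".
    """
    batch = []
    for e in events:
        props = dict(e.get("properties") or {})
        props["distinct_id"] = e["distinct_id"]  # distinct_id travels in properties
        batch.append({
            "event": e["event"],
            "properties": props,
            "timestamp": e["timestamp"].isoformat(),  # ISO 8601
        })
    return {
        "api_key": api_key,
        "historical_migration": True,  # marks the whole request as historical
        "batch": batch,
    }


def send_batch(body, url=BATCH_URL):
    """POST one batch as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Keep batches to a moderate size and pace requests at the rate agreed with the PostHog team in your support request.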

Exporting data from Amplitude

There are three ways to export data from Amplitude.

1. Organization settings export

The simplest way is to go to your project in your organization settings and click the Export Data button.

2. Export API

To export data using Amplitude's Export API, start by getting the API key and secret key for your project from your organization settings.

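You can then use these in a request to the Export API. The start and end parameters are hours in YYYYMMDDTHH format (for example, 20240101T00), and the response is a zip archive, so save it to a file:

```shell
curl --location --request GET 'https://amplitude.com/api/2/export?start=<starttime>&end=<endtime>' \
  -u '{api_key}:{secret_key}' \
  --output amplitude-export.zip
```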
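The same download can be scripted in Python, which makes it easier to export a long history in smaller chunks (for example, one day per request). This sketch assumes the Export API's start and end parameters use the YYYYMMDDTHH hour format and that the response is a zip archive; the function names and the default output path are hypothetical.

```python
import base64
import urllib.request

EXPORT_URL = "https://amplitude.com/api/2/export"


def format_hour(dt):
    """Format a datetime as YYYYMMDDTHH (assumed Export API hour format)."""
    return dt.strftime("%Y%m%dT%H")


def build_export_url(start, end):
    """Build the Export API URL for the hours from start to end."""
    return f"{EXPORT_URL}?start={format_hour(start)}&end={format_hour(end)}"


def download_export(api_key, secret_key, start, end, out_path="amplitude-export.zip"):
    """Download raw events between start and end and save the zip to out_path."""
    # HTTP basic auth, equivalent to curl's -u api_key:secret_key
    token = base64.b64encode(f"{api_key}:{secret_key}".encode()).decode()
    req = urllib.request.Request(
        build_export_url(start, end),
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as out:
        # Stream the archive to disk in 1 MiB chunks instead of holding it in memory
        while chunk := resp.read(1 << 20):
            out.write(chunk)
    return out_path
```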

3. S3 export

If your data exceeds the Export API's size limit, use Amplitude's Amazon S3 export instead.

Importing Amplitude data into PostHog

Amplitude exports data as a zip archive of gzipped JSON files, with one event per line. To get this data into PostHog, you need to:

  1. Unzip and read the data
  2. Convert the events from Amplitude's schema to PostHog's
  3. Capture the events into PostHog using the historical_migration option
  4. Alias device IDs to user IDs
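Step 1 can be done without extracting the archive to disk first. This sketch assumes the zip contains .json.gz files (possibly inside a project folder) holding newline-delimited JSON; iter_amplitude_events is a hypothetical helper name.

```python
import gzip
import json
import zipfile


def iter_amplitude_events(zip_path):
    """Yield Amplitude events (dicts) from an export zip, one per JSON line."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in sorted(archive.namelist()):
            if not name.endswith(".json.gz"):
                continue  # skip folders and any non-event files
            # Decompress each gzipped member as text while reading it from the zip
            with archive.open(name) as raw, gzip.open(raw, "rt", encoding="utf-8") as lines:
                for line in lines:
                    line = line.strip()
                    if line:  # ignore blank lines
                        yield json.loads(line)
```

Because it is a generator, it reads one event at a time, so large exports don't need to fit in memory.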

Steps 1, 3, and 4 are relatively straightforward, but step 2 requires more explanation.

Converting Amplitude events

Although Amplitude and PostHog events have a similar structure, many event names and property keys differ, so you need to convert them to PostHog's schema. For example, PostHog's autocaptured events and properties usually start with $.

You can see Amplitude's event structure in their Export API documentation and PostHog's autocapture event structure in our autocapture docs.

Some conversions needed include:

  • Changing event names like [Amplitude] Page Viewed to $pageview
  • Changing event property keys like [Amplitude] Page Location to $current_url
  • Translating EMPTY values in user_properties to null
  • Changing event_time to an ISO 8601 formatted timestamp
  • Using $set and $set_once for person properties
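The conversions above can be sketched as small, separately testable helpers. The two mapping tables are illustrative subsets built from the examples in the list, not complete mappings, and the helper names are hypothetical. The sketch also assumes Amplitude's event_time is a UTC timestamp like 2024-01-01 12:30:45.123000, with or without fractional seconds.

```python
from datetime import datetime, timezone

# Illustrative subsets; extend these with the events and keys your project uses
EVENT_NAME_MAP = {"[Amplitude] Page Viewed": "$pageview"}
PROPERTY_KEY_MAP = {"[Amplitude] Page Location": "$current_url"}


def convert_event_name(name):
    """Map an Amplitude event name to PostHog's, leaving custom events unchanged."""
    return EVENT_NAME_MAP.get(name, name)


def convert_properties(props):
    """Rename known Amplitude property keys to PostHog keys."""
    return {PROPERTY_KEY_MAP.get(k, k): v for k, v in (props or {}).items()}


def convert_event_time(event_time):
    """Parse Amplitude's event_time (assumed UTC) into an ISO 8601 string."""
    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"):
        try:
            parsed = datetime.strptime(event_time, fmt)
        except ValueError:
            continue
        return parsed.replace(tzinfo=timezone.utc).isoformat()
    raise ValueError(f"Unrecognized event_time: {event_time!r}")


def person_updates(user_properties, set_once_keys):
    """Turn "EMPTY" into None and split user properties into $set and $set_once."""
    cleaned = {k: (None if v == "EMPTY" else v) for k, v in (user_properties or {}).items()}
    return {
        "$set": {k: v for k, v in cleaned.items() if k not in set_once_keys},
        "$set_once": {k: v for k, v in cleaned.items() if k in set_once_keys},
    }
```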

Converting the data ensures it matches what PostHog captures natively, so imported events work alongside new events in your analysis.

Example Amplitude migration script

Below is a script that gets Amplitude data from the export folder, unzips it, converts the data to PostHog's schema, and then captures it in PostHog. It gives you a start, but likely needs to be modified to fit your infrastructure and data structure.

```python
from datetime import datetime, timezone
import gzip
import json
import os

from posthog import Posthog

# PostHog Python client; use host='https://eu.i.posthog.com' for EU Cloud
posthog = Posthog(
    '<ph_project_api_key>',
    host='https://us.i.posthog.com',
    debug=True,
    historical_migration=True,
)

INITIAL_KEYS = ("referrer", "referring_domain", "utm_source",
                "utm_medium", "utm_campaign", "utm_content")


def clean(value):
    # Amplitude stores missing user property values as the string "EMPTY"
    return None if value == "EMPTY" else value


def parse_version(value):
    # int("17.0") raises, so go through float; keep unparseable values as-is
    try:
        return int(float(value))
    except (TypeError, ValueError):
        return value


def parse_event_time(value):
    # Amplitude's event_time is UTC, with or without fractional seconds
    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized event_time: {value!r}")


# Convert and capture a single Amplitude event
def capture_entry(entry):
    distinct_id = entry.get("user_id") or entry.get("device_id")
    event_name = entry["event_type"]
    if not distinct_id or event_name == "session_start":
        return
    if event_name == "[Amplitude] Page Viewed":
        event_name = "$pageview"
    elif event_name in ("[Amplitude] Element Clicked", "[Amplitude] Element Changed"):
        event_name = "$autocapture"

    # Either field can be missing, so fall back to empty dicts
    event_props = entry.get("event_properties") or {}
    user_props = entry.get("user_properties") or {}

    device_type = entry.get("device_type")
    if device_type in ("Windows", "Linux"):
        device_type = "Desktop"
    elif device_type in ("iOS", "Android"):
        device_type = "Mobile"
    else:
        device_type = None

    # Properties sent on the event and also set on the person
    shared = {
        "$os": entry.get("device_type"),
        "$browser": entry.get("os_name"),
        "$browser_version": parse_version(entry.get("os_version")),
        "$device_type": device_type,
        "$current_url": event_props.get("[Amplitude] Page URL"),
        "$pathname": event_props.get("[Amplitude] Page Path"),
        "$referrer": event_props.get("referrer"),
        "$referring_domain": event_props.get("referring_domain"),
        "$geoip_city_name": entry.get("city"),
        "$geoip_subdivision_1_name": entry.get("region"),
        "$geoip_country_name": entry.get("country"),
    }
    properties = {
        **shared,
        "$host": event_props.get("[Amplitude] Page Domain"),
        "$viewport_height": event_props.get("[Amplitude] Viewport Height"),
        "$viewport_width": event_props.get("[Amplitude] Viewport Width"),
        "$device_id": entry.get("device_id"),
        "$ip": entry.get("ip_address"),
        # $set_once keeps a person's first value; $set overwrites with the latest
        "$set_once": {
            f"$initial_{key}": clean(user_props.get(f"initial_{key}"))
            for key in INITIAL_KEYS
        },
        "$set": dict(shared),
    }

    posthog.capture(
        event=event_name,
        distinct_id=distinct_id,
        properties=properties,
        timestamp=parse_event_time(entry["event_time"]),
    )


# Read each gzipped JSON file in the unzipped export folder and capture its events
def get_entries_from_folder_and_capture(folder_name, limit=None):
    count = 0
    for filename in sorted(os.listdir(folder_name)):
        if not filename.endswith('.json.gz'):
            continue
        file_path = os.path.join(folder_name, filename)
        with gzip.open(file_path, 'rt', encoding='utf-8') as f:
            for line in f:
                if not line.strip():
                    continue
                capture_entry(json.loads(line))
                count += 1
                # Optional cap for test runs; limit=None imports everything
                if limit is not None and count >= limit:
                    return count
    return count


# Folder containing the unzipped Amplitude export
folder_name = '609539'
get_entries_from_folder_and_capture(folder_name)
# Flush queued events before the script exits
posthog.shutdown()
```

Adjust the mappings to match the structure of your Amplitude data, and test the script on a small sample of events before running it on your full export.

Why are my event and DAU counts lower in PostHog than in Amplitude?

PostHog blocks bot traffic by default, while Amplitude doesn't. You can see a full list of the bots PostHog blocks in our docs.

Aliasing device IDs to user IDs

In addition to capturing the events, you need to merge anonymous and identified users. In Amplitude, events before identification carry only a device ID, while events after identification carry both a user ID and the device ID:

| Event | User ID | Device ID |
|---|---|---|
| Application installed | null | 551dc114-7604-430c-a42f-cf81a3059d2b |
| Login | 123 | 551dc114-7604-430c-a42f-cf81a3059d2b |
| Purchase | 123 | 551dc114-7604-430c-a42f-cf81a3059d2b |

To attribute "Application installed" to the user with ID 123, you also need to call alias with both the device ID and the user ID:

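```python
from posthog import Posthog

posthog = Posthog(
    '<ph_project_api_key>',
    host='https://us.i.posthog.com',
    debug=True,
    historical_migration=True
)

# Example values from the table above
device_id = '551dc114-7604-430c-a42f-cf81a3059d2b'
user_id = '123'
posthog.alias(previous_id=device_id, distinct_id=user_id)
# Flush queued calls before the script exits
posthog.shutdown()
```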
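To find which IDs to alias, you can scan the exported events for device IDs that later appear alongside a user ID, as in the table above. This is a minimal sketch with a hypothetical helper name; it assumes events are dicts with Amplitude's device_id and user_id fields.

```python
def collect_alias_pairs(entries):
    """Return sorted, unique (device_id, user_id) pairs found in Amplitude events.

    Events without a user_id (sent before identification) contribute nothing;
    events with both IDs link the anonymous device to the identified user.
    """
    pairs = set()
    for entry in entries:
        device_id = entry.get("device_id")
        user_id = entry.get("user_id")
        if device_id and user_id and device_id != user_id:
            pairs.add((device_id, user_id))
    return sorted(pairs)
```

Each returned pair maps directly onto posthog.alias(previous_id=device_id, distinct_id=user_id).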

Since you only need to do this once per user, ideally you'd keep a record (for example, a SQL table) of which users you've already aliased in PostHog, so you don't send the same alias calls multiple times when you rerun the import.
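Here is a minimal sketch of that record using SQLite from Python's standard library. The table name, file path, and helper names are hypothetical, and send_alias stands in for whatever function makes the call, for example lambda d, u: posthog.alias(previous_id=d, distinct_id=u).

```python
import sqlite3


def open_alias_log(path="aliases.db"):
    """Open (or create) a SQLite table of aliases that were already sent."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sent_aliases ("
        " device_id TEXT NOT NULL,"
        " user_id TEXT NOT NULL,"
        " PRIMARY KEY (device_id, user_id))"
    )
    return conn


def alias_once(conn, send_alias, device_id, user_id):
    """Call send_alias(device_id, user_id) unless this pair is already recorded.

    Returns True if the alias was sent now, False if it was sent on an earlier run.
    """
    already_sent = conn.execute(
        "SELECT 1 FROM sent_aliases WHERE device_id = ? AND user_id = ?",
        (device_id, user_id),
    ).fetchone()
    if already_sent:
        return False
    send_alias(device_id, user_id)
    # Record the pair only after send_alias returns without raising
    with conn:  # commits the insert
        conn.execute(
            "INSERT INTO sent_aliases (device_id, user_id) VALUES (?, ?)",
            (device_id, user_id),
        )
    return True
```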