Import messages from a Matrix room and output as HTML, TXT, JSON

Matrix Archive Tools

Import messages from a Matrix room, for research, archival, and preservation.

Use this responsibly and ethically. Don't re-publish people's messages without their knowledge and consent.

Based on https://github.com/osteele/matrix-archive

Setup

  • Install Pipenv. Run pipenv install.
  • Copy dot-env-sample to .env and edit as needed.

Authentication

You can authenticate in two ways:

  1. Token-based authentication (recommended): Set MATRIX_TOKEN to your access token.

    • The token is validated automatically before use.
    • If you first log in with a username and password, the resulting token is saved to your .env file for future runs.
    • This avoids repeated password logins.
  2. Password authentication: Set MATRIX_USER and MATRIX_PASSWORD.

    • Used as a fallback if the token is not set or is invalid.
    • After a successful login, the token is saved to .env automatically.
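
The order of precedence described above can be sketched as follows. This is only an illustration, not the project's actual code: the function name choose_auth and the token_is_valid callback are hypothetical, and the real validation presumably queries the homeserver.

```python
from typing import Callable, Mapping, Tuple


def choose_auth(
    env: Mapping[str, str],
    token_is_valid: Callable[[str], bool] = lambda token: True,
) -> Tuple[str, str]:
    """Pick an auth method: token first, password as the fallback.

    Hypothetical helper. `token_is_valid` stands in for whatever check
    the real tool performs against the homeserver.
    """
    token = env.get("MATRIX_TOKEN", "").strip()
    if token and token_is_valid(token):
        return ("token", token)

    user = env.get("MATRIX_USER", "").strip()
    password = env.get("MATRIX_PASSWORD", "")
    if user and password:
        # The caller would log in here and save the new token to .env.
        return ("password", user)

    raise ValueError("Set MATRIX_TOKEN, or MATRIX_USER and MATRIX_PASSWORD")
```

With both a stale token and a password configured, this sketch falls through to the password branch, which matches the fallback behaviour described above.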

Also set:

  • MATRIX_ROOM_IDS: comma-separated list of Matrix room IDs (or a single ID). Run pipenv run list to list the room IDs.
  • MATRIX_HOST: your homeserver URL (defaults to https://matrix.org if not set)
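
As a rough sketch of how these two settings might be read (the helper names parse_room_ids and homeserver_url are assumptions for illustration, not functions from this repository):

```python
from typing import List, Mapping


def parse_room_ids(raw: str) -> List[str]:
    """Split a comma-separated MATRIX_ROOM_IDS value, ignoring blanks."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def homeserver_url(env: Mapping[str, str]) -> str:
    """Return MATRIX_HOST, or the documented default when it is unset."""
    return (env.get("MATRIX_HOST") or "https://matrix.org").rstrip("/")


if __name__ == "__main__":
    print(parse_room_ids("!roomid1:server.org, !roomid2:server.org"))
    print(homeserver_url({}))
```

A single room ID with no commas yields a one-element list, so the same code path covers both cases.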

Database

Set MONGODB_URI to a MongoDB connection URL, or install a local MongoDB instance.

Example .env file

MATRIX_TOKEN=your_access_token_here
MATRIX_HOST=https://xentonix.net
MATRIX_ROOM_IDS=!roomid1:server.org,!roomid2:server.org
MONGODB_URI=mongodb://localhost:27017/matrix_archive

Or with password authentication:

MATRIX_USER=@username:server.org
MATRIX_PASSWORD=your_password
MATRIX_HOST=https://xentonix.net
MATRIX_ROOM_IDS=!roomid1:server.org,!roomid2:server.org
MONGODB_URI=mongodb://localhost:27017/matrix_archive

Usage

Import Messages

pipenv run import imports the messages into the database.

Room Upgrades: The import process automatically detects when a room has been upgraded, which is signalled by a tombstone event in the old room. When an upgrade is detected:

  • The upgrade relationship is saved to the database
  • The importer attempts to join the new room automatically
  • Messages from the new room are imported automatically
  • During export, messages from all rooms in the upgrade chain are automatically merged in chronological order
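
A minimal sketch of the chain-and-merge idea, assuming upgrades are stored as an old-room to new-room mapping and each message carries a millisecond timestamp field. Both the storage shape and the field name are assumptions; the actual schema may differ.

```python
from typing import Dict, List


def upgrade_chain(room_id: str, upgrades: Dict[str, str]) -> List[str]:
    """Follow old -> new links from room_id until no upgrade remains."""
    chain = [room_id]
    seen = {room_id}
    while chain[-1] in upgrades:
        successor = upgrades[chain[-1]]
        if successor in seen:  # guard against a malformed, cyclic chain
            break
        chain.append(successor)
        seen.add(successor)
    return chain


def merge_messages(chain: List[str], by_room: Dict[str, List[dict]]) -> List[dict]:
    """Combine messages from every room in the chain, oldest first."""
    merged: List[dict] = []
    for room_id in chain:
        merged.extend(by_room.get(room_id, []))
    # sorted() is stable, so equal timestamps keep their chain order.
    return sorted(merged, key=lambda message: message["timestamp"])
```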

Finding Upgraded Rooms: If you're having trouble with a room alias that can't be joined, use the helper script:

pipenv run python find_upgraded_room.py '#kubuntu-devel:ubuntu.com'

This will:

  • Resolve the alias to a room ID
  • Check for upgrade tombstone events
  • Display the new room ID
  • Save the upgrade to the database

You can then add the new room ID to your MATRIX_ROOM_IDS in .env.
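
For reference, the two lookups involved can be sketched like this. The endpoint path and the replacement_room field come from the Matrix client-server API; the helper names are hypothetical, and the sketch only builds the request URL and reads a tombstone's content rather than contacting a server.

```python
from typing import Optional
from urllib.parse import quote


def alias_lookup_url(homeserver: str, alias: str) -> str:
    """URL that resolves a room alias to a room ID.

    The alias is percent-encoded because '#' and ':' are not safe in a
    URL path segment.
    """
    base = homeserver.rstrip("/")
    return f"{base}/_matrix/client/v3/directory/room/{quote(alias, safe='')}"


def replacement_room(tombstone_content: dict) -> Optional[str]:
    """Return the new room ID from m.room.tombstone content, if present."""
    return tombstone_content.get("replacement_room") or None


if __name__ == "__main__":
    print(alias_lookup_url("https://matrix.org", "#kubuntu-devel:ubuntu.com"))
```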

Export Messages

The export command supports two modes: exporting all rooms (the default) or a single room selected with --room-id.

1. Export All Rooms (Default)

Run pipenv run export to export all rooms in all formats:

pipenv run export
# Exports all rooms to out/ directory in all formats (txt, html, json, yaml)

Options:

  • -f, --format [txt|html|json|yaml|all] - Export format (default: all formats)
  • -o, --output-dir DIR - Output directory (default: out)
  • --room-id ROOM_ID - Export only a specific room
  • --local-images / --no-local-images - Use local image references (default: false)
  • --copy-images / --no-copy-images - Copy images to output directory for web deployment (default: true)
  • --split-by-month / --no-split-by-month - Create monthly subfolders (default: true)

Examples:

pipenv run export                      # Export all rooms in all formats
pipenv run export -f txt               # Export all rooms as TXT only
pipenv run export -f html              # Export all rooms as HTML only
pipenv run export -o archives          # Export to archives/ directory
pipenv run export --room-id '!abc:...' # Export specific room in all formats
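
To make the defaults concrete, here is the same option set modelled with the standard library's argparse (Python 3.9+ for BooleanOptionalAction). This is a stand-in for illustration only; the project's real command-line parsing is not shown here and may use a different library.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="export")
    parser.add_argument("-f", "--format", default="all",
                        choices=["txt", "html", "json", "yaml", "all"])
    parser.add_argument("-o", "--output-dir", default="out")
    parser.add_argument("--room-id", default=None)
    # BooleanOptionalAction generates the paired --flag / --no-flag forms.
    parser.add_argument("--local-images", default=False,
                        action=argparse.BooleanOptionalAction)
    parser.add_argument("--copy-images", default=True,
                        action=argparse.BooleanOptionalAction)
    parser.add_argument("--split-by-month", default=True,
                        action=argparse.BooleanOptionalAction)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args(["-f", "html", "--no-split-by-month"]))
```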

Text exports use an IRC-style format for easy reading:

[2024-12-26 14:23:01] <username> Message text here
[2024-12-26 14:24:15] * username performs an action
  
  --- Code Block ---
    code content here
  --- End Code ---
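
A line in this format could be produced roughly as follows. The function name is hypothetical, and rendering timestamps in UTC is an assumption (the exporter may use local time). Matrix reports event times in milliseconds since the epoch, and actions use the m.emote message type.

```python
from datetime import datetime, timezone


def format_line(sender: str, body: str, ts_ms: int, msgtype: str = "m.text") -> str:
    """Render one message as an IRC-style line (UTC is an assumption)."""
    when = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    stamp = when.strftime("%Y-%m-%d %H:%M:%S")
    if msgtype == "m.emote":
        return f"[{stamp}] * {sender} {body}"
    return f"[{stamp}] <{sender}> {body}"


if __name__ == "__main__":
    print(format_line("username", "Message text here", 1735222981000))
    # [2024-12-26 14:23:01] <username> Message text here
```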

HTML exports include embedded CSS styling and support multiple message types (text, images, files, video, audio, emotes, notices).

Monthly archives: By default (--split-by-month), the export creates additional subfolders organized by year and month (e.g., 2024/01/, 2024/02/), each containing the messages from that month. Use --no-split-by-month to only create the full archive file.
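
The monthly split amounts to bucketing messages by year and month. A small sketch, assuming each message has a millisecond timestamp field and that months are computed in UTC (both are assumptions, as is the function name):

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def group_by_month(messages: Iterable[dict]) -> Dict[str, List[dict]]:
    """Map 'YYYY/MM' subfolder names to the messages sent in that month."""
    buckets: Dict[str, List[dict]] = defaultdict(list)
    for message in messages:
        when = datetime.fromtimestamp(message["timestamp"] / 1000, tz=timezone.utc)
        buckets[f"{when:%Y/%m}"].append(message)
    return dict(buckets)
```

Each key can then be joined onto the output directory to get the subfolder for that month's file.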

Download Images

pipenv run download_images.py downloads all the thumbnail images in the database into a download directory (default thumbnails), skipping images that have already been downloaded.

Use the --no-thumbnails option to download full-size images instead of thumbnails. In this case, the default directory is images rather than thumbnails.

Note: This uses the authenticated media endpoints (/_matrix/client/v1/media/), which require your access token. These endpoints were introduced in Matrix 1.11, so media downloads require a homeserver that supports Matrix 1.11 or later.
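
As an illustration of what an authenticated media request looks like, the sketch below turns an mxc:// URI into a request URL plus an Authorization header. The endpoint paths follow the Matrix client-server API; the function name and the thumbnail dimensions are assumptions, and no network call is made.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode


def media_request(homeserver: str, mxc_uri: str, token: str,
                  thumbnail: bool = True,
                  width: int = 320, height: int = 240) -> Tuple[str, Dict[str, str]]:
    """Build (url, headers) for an authenticated media fetch."""
    prefix = "mxc://"
    if not mxc_uri.startswith(prefix):
        raise ValueError(f"not an mxc URI: {mxc_uri!r}")
    server_name, _, media_id = mxc_uri[len(prefix):].partition("/")
    if not server_name or not media_id:
        raise ValueError(f"malformed mxc URI: {mxc_uri!r}")

    base = homeserver.rstrip("/") + "/_matrix/client/v1/media"
    if thumbnail:
        # width/height values here are illustrative defaults only.
        query = urlencode({"width": width, "height": height, "method": "scale"})
        url = f"{base}/thumbnail/{server_name}/{media_id}?{query}"
    else:
        url = f"{base}/download/{server_name}/{media_id}"
    return url, {"Authorization": f"Bearer {token}"}
```

The returned pair can be handed to any HTTP client; skipping files that already exist locally would be a check on the destination path before making the request.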

Web Deploy

To deploy your archives to a web server with images:

  1. Export with local images and copy them to the output directory:

    pipenv run export --local-images --copy-images -o out
    
  2. Upload the entire out/ directory to your web server. The images will be included at out/thumbnails/.

  3. Your HTML files will reference images with relative paths that work correctly:

    • out/Room.html → images at thumbnails/image.jpg
    • out/2025/03/Room.html → images at ../../thumbnails/image.jpg

Example: If you deploy out/ to https://yoursite.com/archive/, the structure will be:

https://yoursite.com/archive/Room_Name.html
https://yoursite.com/archive/2025/03/Room_Name.html
https://yoursite.com/archive/thumbnails/image1.jpg
https://yoursite.com/archive/thumbnails/image2.jpg
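
The relative image path for any page in this layout can be derived from the two file locations. A small sketch using the standard library (the function name is hypothetical, and the exporter may compute its paths differently):

```python
import posixpath


def image_href(html_path: str, image_path: str) -> str:
    """Relative URL from an HTML file to an image, using '/' separators."""
    return posixpath.relpath(image_path, start=posixpath.dirname(html_path))


if __name__ == "__main__":
    print(image_href("out/Room.html", "out/thumbnails/image.jpg"))
    # thumbnails/image.jpg
    print(image_href("out/2025/03/Room.html", "out/thumbnails/image.jpg"))
    # ../../thumbnails/image.jpg
```

Because the paths are relative, the same HTML works wherever the out/ directory is mounted on the web server.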

License

MIT