Matrix Archive Tools
Import messages from a Matrix room, for research, archival, and preservation.
Use this responsibly and ethically. Don't re-publish people's messages without their knowledge and consent.
Based on https://github.com/osteele/matrix-archive
Setup
- Install Pipenv. Run pipenv install.
- Copy dot-env-sample to .env and edit as needed.
Authentication
You can authenticate in two ways:
- Token-based authentication (recommended): Set MATRIX_TOKEN to your access token.
  - The token will be automatically validated before use
  - If you initially provide username/password, the token will be automatically saved to your .env file for future use
  - This avoids repeated password logins
- Password authentication: Set MATRIX_USER and MATRIX_PASSWORD
  - Used as fallback if token is not set or invalid
  - After successful login, the token is automatically saved to .env
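The token-first, password-fallback order described above can be sketched as follows. This is a minimal illustration, not the project's actual code: choose_auth and the token_is_valid callback are hypothetical names, and the real validation step would contact the homeserver.

```python
from typing import Callable, Mapping, Tuple


def choose_auth(
    env: Mapping[str, str],
    token_is_valid: Callable[[str], bool] = lambda token: True,
) -> Tuple[str, str]:
    """Return ("token", token) or ("password", user), preferring a valid token.

    Hypothetical helper: token_is_valid stands in for a real check
    against the homeserver.
    """
    token = env.get("MATRIX_TOKEN", "").strip()
    if token and token_is_valid(token):
        return ("token", token)

    # Fall back to password login when the token is missing or rejected.
    user = env.get("MATRIX_USER", "").strip()
    password = env.get("MATRIX_PASSWORD", "")
    if user and password:
        return ("password", user)

    raise ValueError("Set MATRIX_TOKEN, or MATRIX_USER and MATRIX_PASSWORD")
```

Passing the validity check in as a callback keeps the decision logic testable without a network connection.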
Also set:
- MATRIX_ROOM_IDS: comma-separated list of Matrix room IDs (or a single ID). Run pipenv run list to list the room IDs.
- MATRIX_HOST: your homeserver URL (defaults to https://matrix.org if not set)
Database
Set MONGODB_URI to a MongoDB connection URL, or install a local MongoDB instance.
Example .env file
MATRIX_TOKEN=your_access_token_here
MATRIX_HOST=https://xentonix.net
MATRIX_ROOM_IDS=!roomid1:server.org,!roomid2:server.org
MONGODB_URI=mongodb://localhost:27017/matrix_archive
Or with password authentication:
MATRIX_USER=@username:server.org
MATRIX_PASSWORD=your_password
MATRIX_HOST=https://xentonix.net
MATRIX_ROOM_IDS=!roomid1:server.org,!roomid2:server.org
MONGODB_URI=mongodb://localhost:27017/matrix_archive
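As a rough sketch of how these two settings might be read, the snippet below splits MATRIX_ROOM_IDS on commas and applies the documented default for MATRIX_HOST. The function name read_room_settings is an assumption made for illustration; the project's own loading code may differ.

```python
from typing import List, Mapping, Tuple

DEFAULT_HOST = "https://matrix.org"  # default stated in this README


def read_room_settings(env: Mapping[str, str]) -> Tuple[str, List[str]]:
    """Return (homeserver URL, list of room IDs) from an environment mapping."""
    host = env.get("MATRIX_HOST", "").strip() or DEFAULT_HOST
    raw_ids = env.get("MATRIX_ROOM_IDS", "")
    # Tolerate stray spaces and trailing commas; a single ID yields a one-item list.
    room_ids = [part.strip() for part in raw_ids.split(",") if part.strip()]
    return host, room_ids
```

In practice the mapping would be os.environ after the .env file has been loaded.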
Usage
Import Messages
pipenv run import imports the messages into the database.
Room Upgrades: The import process automatically detects when a room has been upgraded (signaled by a tombstone event). When an upgrade is detected:
- The upgrade relationship is saved to the database
- The system attempts to join the new room automatically
- Messages from the new room are imported automatically
- During export, messages from all rooms in the upgrade chain are merged in chronological order
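One way to picture the chronological merge across an upgrade chain is a k-way merge of per-room message lists. The sketch below assumes each room's messages are already sorted and carry a "timestamp" key; both the function name and the message shape are illustrative, not taken from the project's schema.

```python
import heapq
from operator import itemgetter
from typing import Dict, Iterable, List


def merge_upgrade_chain(rooms: Iterable[List[Dict]]) -> List[Dict]:
    """Merge per-room message lists, each already sorted by timestamp."""
    # heapq.merge keeps the combined stream ordered without re-sorting everything.
    return list(heapq.merge(*rooms, key=itemgetter("timestamp")))


old_room = [{"timestamp": 1, "body": "a"}, {"timestamp": 3, "body": "c"}]
new_room = [{"timestamp": 2, "body": "b"}, {"timestamp": 4, "body": "d"}]
merged = merge_upgrade_chain([old_room, new_room])
```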
Finding Upgraded Rooms: If you're having trouble with a room alias that can't be joined, use the helper script:
pipenv run python find_upgraded_room.py '#kubuntu-devel:ubuntu.com'
This will:
- Resolve the alias to a room ID
- Check for upgrade tombstone events
- Display the new room ID
- Save the upgrade to the database
You can then add the new room ID to your MATRIX_ROOM_IDS in .env.
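The tombstone check and chain lookup can be sketched as below. In the Matrix specification a room upgrade is marked by an m.room.tombstone state event whose content carries a replacement_room field; the two helper names and the plain dict used for stored upgrades are assumptions made for this example.

```python
from typing import Dict, List, Optional


def replacement_room(event: Dict) -> Optional[str]:
    """Return the successor room ID if the event is a tombstone, else None."""
    if event.get("type") != "m.room.tombstone":
        return None
    return event.get("content", {}).get("replacement_room")


def upgrade_chain(room_id: str, upgrades: Dict[str, str]) -> List[str]:
    """Follow old -> new links, stopping at the newest room or on a cycle."""
    chain = [room_id]
    while chain[-1] in upgrades and upgrades[chain[-1]] not in chain:
        chain.append(upgrades[chain[-1]])
    return chain
```

The cycle guard is defensive: a malformed set of stored upgrades should not cause an endless loop.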
Export Messages
The export command supports two modes:
1. Export All Rooms (Default)
Simply run pipenv run export to export all rooms in ALL formats:
pipenv run export
# Exports all rooms to out/ directory in all formats (txt, html, json, yaml)
Options:
- -f, --format [txt|html|json|yaml|all] - Export format (default: all formats)
- -o, --output-dir DIR - Output directory (default: out)
- --room-id ROOM_ID - Export only a specific room
- --local-images / --no-local-images - Use local image references (default: false)
- --copy-images / --no-copy-images - Copy images to output directory for web deployment (default: true)
- --split-by-month / --no-split-by-month - Create monthly subfolders (default: true)
Examples:
pipenv run export # Export all rooms in all formats
pipenv run export -f txt # Export all rooms as TXT only
pipenv run export -f html # Export all rooms as HTML only
pipenv run export -o archives # Export to archives/ directory
pipenv run export --room-id '!abc:...' # Export specific room in all formats
Text exports use an IRC-style format for easy reading:
[2024-12-26 14:23:01] <username> Message text here
[2024-12-26 14:24:15] * username performs an action
--- Code Block ---
code content here
--- End Code ---
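A line in this format could be produced roughly as follows. This is a sketch of the layout shown above, with format_line as a hypothetical name; it covers only regular messages and emotes, not code blocks.

```python
from datetime import datetime


def format_line(timestamp: datetime, sender: str, body: str, emote: bool = False) -> str:
    """Render one message in the IRC-style layout shown above."""
    stamp = timestamp.strftime("%Y-%m-%d %H:%M:%S")
    if emote:
        # Emotes ("/me" actions) use "* sender action" instead of "<sender> text".
        return f"[{stamp}] * {sender} {body}"
    return f"[{stamp}] <{sender}> {body}"
```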
HTML exports include embedded CSS styling and support multiple message types (text, images, files, video, audio, emotes, notices).
Monthly archives: By default (--split-by-month), the export creates additional
subfolders organized by year and month (e.g., 2024/01/, 2024/02/), each containing
the messages from that month. Use --no-split-by-month to only create the full archive file.
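The monthly split amounts to bucketing messages by a year/month key. The sketch below assumes messages are (datetime, text) pairs and uses a hypothetical group_by_month helper; the exporter's real data model is not shown in this README.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple


def group_by_month(messages: List[Tuple[datetime, str]]) -> Dict[str, List[str]]:
    """Map "YYYY/MM" subfolder names to the message bodies sent in that month."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    # Sort first so each bucket ends up in chronological order.
    for sent_at, body in sorted(messages, key=lambda m: m[0]):
        buckets[sent_at.strftime("%Y/%m")].append(body)
    return dict(buckets)
```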
Download Images
pipenv run download_images.py downloads all the thumbnail images in the
database into a download directory (default thumbnails), skipping images that
have already been downloaded.
Use the --no-thumbnails option to download full size images instead of
thumbnails. In this case, the default directory is images instead of
thumbnails.
Note: This uses authenticated media endpoints (/_matrix/client/v1/media/)
which require your access token. Media downloads work with Matrix 1.11+ homeservers.
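For orientation, the sketch below builds an authenticated media URL from an mxc:// URI and the matching Authorization header. The endpoint paths and Bearer header follow the Matrix Client-Server specification; the function names, the 320-pixel default, and the "scale" method are illustrative choices and not necessarily what download_images.py uses.

```python
from typing import Dict
from urllib.parse import urlencode, urlparse


def media_url(host: str, mxc_uri: str, thumbnail: bool = True, size: int = 320) -> str:
    """Turn mxc://server/media_id into an authenticated-media HTTP URL."""
    parsed = urlparse(mxc_uri)
    media_id = parsed.path.strip("/")
    if parsed.scheme != "mxc" or not parsed.netloc or not media_id:
        raise ValueError(f"not an mxc:// URI: {mxc_uri!r}")
    base = f"{host.rstrip('/')}/_matrix/client/v1/media"
    if thumbnail:
        # The thumbnail endpoint requires width and height query parameters.
        query = urlencode({"width": size, "height": size, "method": "scale"})
        return f"{base}/thumbnail/{parsed.netloc}/{media_id}?{query}"
    return f"{base}/download/{parsed.netloc}/{media_id}"


def auth_headers(access_token: str) -> Dict[str, str]:
    """Header that authenticated media requests must carry."""
    return {"Authorization": f"Bearer {access_token}"}
```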
Web Deploy
To deploy your archives to a web server with images:
- Export with local images and copy them to the output directory:
  pipenv run export --local-images --copy-images -o out
- Upload the entire out/ directory to your web server. The images will be included at out/thumbnails/.
- Your HTML files will reference images with relative paths that work correctly:
  - out/Room.html → images at thumbnails/image.jpg
  - out/2025/03/Room.html → images at ../../thumbnails/image.jpg
Example: If you deploy out/ to https://yoursite.com/archive/, the structure will be:
https://yoursite.com/archive/Room_Name.html
https://yoursite.com/archive/2025/03/Room_Name.html
https://yoursite.com/archive/thumbnails/image1.jpg
https://yoursite.com/archive/thumbnails/image2.jpg
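The relative image paths for this layout can be checked with the standard library. The sketch below uses a hypothetical image_href helper and assumes, as in the structure above, that thumbnails/ sits directly inside the output directory.

```python
import posixpath


def image_href(page_path: str, image_path: str) -> str:
    """Relative URL from an HTML page to an image, using forward slashes."""
    return posixpath.relpath(image_path, start=posixpath.dirname(page_path))


top_level = image_href("out/Room.html", "out/thumbnails/image.jpg")
monthly = image_href("out/2025/03/Room.html", "out/thumbnails/image.jpg")
```

Here top_level is thumbnails/image.jpg and monthly is ../../thumbnails/image.jpg, since a monthly page sits two directories below the output root.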
License
MIT