Attention! With the update to layer 167, the JSON output format got some important changes; check them out.
Exports messages (as JSON) and media from specified dialogs, groups and channels.
It fetches only new messages (the ones after those already fetched) and resumes interrupted file downloads.
It works as a Telegram client. So yes, you will have to enter your phone number, confirmation code and password (if any).
It will not fetch channel comments. If you need them, you should join the channel's discussion group.
go install github.com/3bl3gamer/tg_history_dumper@latest
tg_history_dumper [args]
Or
git clone https://github.com/3bl3gamer/tg_history_dumper
cd tg_history_dumper
go build
./tg_history_dumper [args]
app_id and app_hash must be obtained from Telegram. More info at https://core.telegram.org/api/obtaining_api_id#obtaining-api-id
tg_history_dumper -app-id=12345 -app-hash=abcdefg
It will ask for credentials, save the session to the tg.session file and download all dialogs (without media) to the history folder.
...is read from config.json; a different file may be provided via the -config argument.
Format:
{
  "app_id": 12345,
  "app_hash": "abcdefg",
  "socks5_proxy_addr": "127.0.0.1:9050",
  "socks5_proxy_user": "hackyhack",
  "socks5_proxy_password": "passw0rd",
  "request_interval_ms": 1000,
  "session_file_path": "tg.session",
  "out_dir_path": "history",
  "history": [
    "all",
    {"exclude": {"type": "channel"}},
    {"username": "my_channel"}
  ],
  "stories": "none",
  "media": [
    {"type": "user"},
    {"username": "my_channel"}
  ],
  "history_limit": {
    "5000": [
      "all",
      {"exclude": {"type": "user"}}
    ]
  },
  "dump_account": "off",
  "dump_contacts": "off",
  "dump_sessions": "off"
}
- app_id and app_hash: see Preparing;
- socks5_proxy_addr: (optional) address:port of the SOCKS5 proxy;
- socks5_proxy_user: (optional) username for the SOCKS5 proxy (if auth is required);
- socks5_proxy_password: (optional) password for the SOCKS5 proxy (if auth is required);
- request_interval_ms: (optional, default is 1000) interval between requests for history message chunks (may be decreased, though it is unlikely to speed up the process, since TG has query rate limits);
- session_file_path: (optional, default is tg.session) session file location (you will not have to log in next time if it is present);
- out_dir_path: (optional, default is history) folder for saved messages and media;
- history: (optional, default is {"type": "user"}) chat filtering rules;
- stories: (optional, default is "none") stories filtering rules;
- media: (optional, default is "none") chat media filtering rules, only applied to chats matched by history rules and to stories matched by stories rules;
- history_limit: (optional, default is {}) new chat history limiting rules;
- dump_account: (optional, default is "off", use "write" to enable dump) dumps basic account information to a file, does not apply when -list-chats is enabled;
- dump_contacts: (optional, default is "off", use "write" to enable dump) dumps contacts information to a file, does not apply when -list-chats is enabled;
- dump_sessions: (optional, default is "off", use "write" to enable dump) dumps active sessions to a file, does not apply when -list-chats is enabled.
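For illustration, the fields above could be mirrored in Go roughly like this. It is a hypothetical sketch, not the dumper's own config type: Config and parseConfig are made-up names, and the rule fields are kept as raw JSON. The documented defaults are pre-filled first, so json.Unmarshal only overwrites keys that are actually present in the file.

```go
package main

import "encoding/json"

// Config is a hypothetical Go mirror of config.json (illustration only).
// Rule-valued fields are kept as raw JSON instead of being parsed.
type Config struct {
	AppID               int                        `json:"app_id"`
	AppHash             string                     `json:"app_hash"`
	Socks5ProxyAddr     string                     `json:"socks5_proxy_addr"`
	Socks5ProxyUser     string                     `json:"socks5_proxy_user"`
	Socks5ProxyPassword string                     `json:"socks5_proxy_password"`
	RequestIntervalMS   int                        `json:"request_interval_ms"`
	SessionFilePath     string                     `json:"session_file_path"`
	OutDirPath          string                     `json:"out_dir_path"`
	History             json.RawMessage            `json:"history"`
	Stories             json.RawMessage            `json:"stories"`
	Media               json.RawMessage            `json:"media"`
	HistoryLimit        map[string]json.RawMessage `json:"history_limit"`
	DumpAccount         string                     `json:"dump_account"`
	DumpContacts        string                     `json:"dump_contacts"`
	DumpSessions        string                     `json:"dump_sessions"`
}

// parseConfig pre-fills the documented defaults, then lets json.Unmarshal
// overwrite only the keys present in data; missing keys keep their defaults.
func parseConfig(data []byte) (Config, error) {
	cfg := Config{
		RequestIntervalMS: 1000,
		SessionFilePath:   "tg.session",
		OutDirPath:        "history",
		History:           json.RawMessage(`{"type": "user"}`),
		Stories:           json.RawMessage(`"none"`),
		Media:             json.RawMessage(`"none"`),
		HistoryLimit:      map[string]json.RawMessage{},
		DumpAccount:       "off",
		DumpContacts:      "off",
		DumpSessions:      "off",
	}
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}
```

With a minimal file such as {"app_id": 12345, "app_hash": "abcdefg"}, every other field in this sketch keeps its default.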
If the config has non-empty app_id and app_hash, the dump may be updated by just running tg_history_dumper (without arguments).
Currently, stories are saved from the user's/channel's public "posts" tab and (if accessible) from the stories archive. Recent stories with the "Post to My Profile" switch turned off will not be saved.
Dumping stories is relatively slow, so there is a -skip-stories flag that skips saving stories as if config.stories were set to "none".
History limits do not affect stories.
Limits define how many messages will be dumped for a chat when it is dumped for the first time. They are configured as "limit_count": rules. If a chat matches more than one rule, the lowest limit is applied. If a chat does not match any rule, all messages are dumped. If there are already messages from a previous dump of the chat, its limits are ignored.
For example, this config sets the limit to 5000 for groups and 10000 for channels, while dialogs remain unlimited:
"history_limit": {
  "5000": {"type": "group"},
  "10000": {"type": "channel"}
}
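A minimal Go sketch of the limit resolution described above, assuming rules have already been parsed into plain match functions (Chat, Matcher and resolveLimit are hypothetical names, not the dumper's API). It picks the lowest matching limit, treats "no rule matched" as unlimited and ignores limits for chats that already have dumped messages.

```go
package main

import (
	"math"
	"strconv"
)

// Chat is a hypothetical, minimal view of a chat used only in this sketch.
type Chat struct {
	ID    int64
	Type  string // "user", "group" or "channel"
	Title string
}

// Matcher stands in for an already parsed rule (rule parsing is out of scope).
type Matcher func(Chat) bool

// resolveLimit returns the message limit for a chat's first dump.
// The bool is false when the chat is unlimited: either no rule matched,
// or the chat already has messages from a previous dump.
func resolveLimit(limits map[string]Matcher, chat Chat, alreadyDumped bool) (int, bool, error) {
	if alreadyDumped {
		return 0, false, nil // limits only apply to the first dump of a chat
	}
	best, found := math.MaxInt, false
	for countStr, matches := range limits {
		count, err := strconv.Atoi(countStr) // keys are limit counts like "5000"
		if err != nil {
			return 0, false, err
		}
		if matches(chat) && count < best {
			best, found = count, true // keep the lowest matching limit
		}
	}
	if !found {
		return 0, false, nil // no rule matched: dump everything
	}
	return best, true, nil
}
```

With matchers equivalent to the config above, a group resolves to 5000, a channel to 10000 and a dialog stays unlimited.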
Rules are used to accept/reject specific chats (or media in these chats). A chat/file is accepted if it matches some rule and is not later excluded by another one. Everything is rejected by default.
For example, this rule accepts only dialogs:
"history": {"type": "user"}
Accepts all chats except channels:
"history": [
  "all",
  {"exclude": {"type": "channel"}}
]
Accepts all media from dialogs and group chats, but group media size is limited to 500 MiB:
"media": [
  {"type": "user"},
  {"type": "group", "media_max_size": "500M"}
]
Accepts all media from dialogs and two groups; group media size is limited to 500 MiB:
"media": [
  {"type": "user"},
  {"only": [
    {"title": "Group A"},
    {"title": "Group B"}
  ], "with": {"media_max_size": "500M"}}
]
The only-rule above may be rewritten as:
{"title": "Group A", "media_max_size": "500M"},
{"title": "Group B", "media_max_size": "500M"}
{
  "id": 123,
  "title": "Name",
  "username": "uname",
  "type": "user",
  "media_max_size": "500M"
}
Matches chat/file by all provided attributes.
- id can be obtained from the chats list (see -list-chats);
- title for users is "FirstName LastName";
- type may be "user", "group" or "channel";
- media_max_size is only used in config.media and must be in the form "500M", "500K" or "500" (for bytes).
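One possible way to turn a media_max_size value into a byte count, as a sketch. It assumes K and M are binary units (KiB and MiB), which matches the "500 MiB" wording in the examples above; parseMediaMaxSize is a hypothetical helper, and the dumper's own parsing may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMediaMaxSize converts "500M", "500K" or "500" into bytes.
// Assumption: K = 1024 bytes and M = 1024*1024 bytes.
func parseMediaMaxSize(s string) (int64, error) {
	digits, multiplier := s, int64(1) // no suffix means plain bytes
	switch {
	case strings.HasSuffix(s, "M"):
		digits, multiplier = strings.TrimSuffix(s, "M"), 1024*1024
	case strings.HasSuffix(s, "K"):
		digits, multiplier = strings.TrimSuffix(s, "K"), 1024
	}
	n, err := strconv.ParseInt(digits, 10, 64)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid media_max_size %q", s)
	}
	return n * multiplier, nil
}
```

Under these assumptions, "500M" becomes 524288000 bytes, while an unsupported suffix such as "500G" is rejected.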
{"exclude": "inner rule"}
Excludes chats/files from match even if they matched some previous rule.
["rule0", "rule1", "more rules"]
Applies inner rules one by one.
{"only": "only-rule", "with": "with-rule"}
Tries with-rule only if only-rule matched.
"all"
Matches everything.
"none"
Matches nothing.
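To make the accept/reject semantics more concrete, here is a hypothetical Go model of the rule kinds above; it is not the dumper's actual implementation. Each rule takes the current verdict and returns an updated one: simple rules and "all" can only accept, exclude can only reject, lists apply their rules in order, and only/with applies the with-rule only when the only-rule matches. media_max_size is not modeled, and a zero id or empty string is treated as "attribute not provided", so a rule with no attributes matches everything.

```go
package main

// Chat is a hypothetical, minimal view of a chat for this sketch.
type Chat struct {
	ID       int64
	Title    string
	Username string
	Type     string
}

// Rule takes the current verdict for a chat and returns the updated one.
type Rule interface {
	Apply(c Chat, accepted bool) bool
}

// Attrs is a simple rule: it matches by all provided (non-zero) attributes.
type Attrs struct {
	ID       int64
	Title    string
	Username string
	Type     string
}

func (a Attrs) matches(c Chat) bool {
	return (a.ID == 0 || a.ID == c.ID) &&
		(a.Title == "" || a.Title == c.Title) &&
		(a.Username == "" || a.Username == c.Username) &&
		(a.Type == "" || a.Type == c.Type)
}

// Apply accepts on match and otherwise keeps the previous verdict.
func (a Attrs) Apply(c Chat, accepted bool) bool { return accepted || a.matches(c) }

// All matches everything ("all").
type All struct{}

func (All) Apply(Chat, bool) bool { return true }

// None matches nothing ("none"), so the verdict is unchanged.
type None struct{}

func (None) Apply(_ Chat, accepted bool) bool { return accepted }

// Exclude rejects chats that its inner rule would accept on its own.
type Exclude struct{ Inner Rule }

func (e Exclude) Apply(c Chat, accepted bool) bool {
	if e.Inner.Apply(c, false) {
		return false
	}
	return accepted
}

// List applies inner rules one by one.
type List []Rule

func (l List) Apply(c Chat, accepted bool) bool {
	for _, r := range l {
		accepted = r.Apply(c, accepted)
	}
	return accepted
}

// OnlyWith tries With only if Only matched.
type OnlyWith struct{ Only, With Rule }

func (o OnlyWith) Apply(c Chat, accepted bool) bool {
	if !o.Only.Apply(c, false) {
		return accepted
	}
	return o.With.Apply(c, accepted)
}

// accepts evaluates a rule starting from the default "rejected" state.
func accepts(r Rule, c Chat) bool { return r.Apply(c, false) }
```

For example, List{All{}, Exclude{Attrs{Type: "channel"}}} accepts every chat except channels, like the second history example above.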
tg_history_dumper -list-chats
Outputs chats in the format <type> <id> <limit> <title> (<username>).
Title for users is FirstName LastName.
If a chat does not match config.history rules, its line is grayed out.
Some arguments override values from the config. For example, -chat='Some Chat' may be used to override config.history and update messages only from Some Chat.
$ tg_history_dumper --help
Usage of tg_history_dumper:
  -app-hash string
        app hash
  -app-id int
        app id
  -chat string
        title of the chat to dump, overrides config.history
  -config string
        path to config file (default "config.json")
  -debug
        show debug log messages
  -debug-tg
        show debug TGClient log messages
  -dump-account string
        enable basic user information dump, use 'write' to enable dump, overrides config.dump_account
  -dump-contacts string
        enable contacts dump, use 'write' to enable dump, overrides config.dump_contacts
  -dump-sessions string
        enable active sessions dump, use 'write' to enable dump, overrides config.dump_sessions
  -list-chats
        list all available chats, do not dump anything
  -logout
        logout and remove session file, do not dump anything
  -out string
        output directory path, overrides config.out_dir_path
  -preview-http string
        HTTP service address to browse through the dump
  -session string
        session file path, overrides config.session_file_path
  -skip-stories
        do not dump stories, overrides config.stories
  -socks5 string
        socks5 proxy address:port, overrides config.socks5_proxy_addr
  -socks5-password string
        socks5 proxy password, overrides config.socks5_proxy_password
  -socks5-user string
        socks5 proxy username, overrides config.socks5_proxy_user
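The "overrides config" behaviour can be expressed with Go's standard flag package by copying over only the flags that were actually passed. The sketch below is an assumption about how such overriding might work, not the dumper's code; it models just -out and -session, and applyOverrides is a hypothetical name.

```go
package main

import "flag"

// applyOverrides parses args and replaces config values only for flags that
// were explicitly set, so omitted flags keep the values from config.json.
func applyOverrides(args []string, outDir, sessionPath string) (string, string, error) {
	fs := flag.NewFlagSet("tg_history_dumper", flag.ContinueOnError)
	out := fs.String("out", "", "output directory path, overrides config.out_dir_path")
	session := fs.String("session", "", "session file path, overrides config.session_file_path")
	if err := fs.Parse(args); err != nil {
		return "", "", err
	}
	// Visit walks only the flags that were set on the command line.
	fs.Visit(func(f *flag.Flag) {
		switch f.Name {
		case "out":
			outDir = *out
		case "session":
			sessionPath = *session
		}
	})
	return outDir, sessionPath, nil
}
```

Using Visit instead of checking for non-empty values also lets an explicitly passed empty string count as an override.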
All messages are saved as JSON Lines (aka jsonl) to the file history/<id>_<title>. The dumper searches directories by id only and renames the folder when the title changes.
Each JSON object has a special field "_" with the type name. Outermost objects have one more special field, "_TL_LAYER", with the layer number (API version). For example:
{"Date":1601491406,"Message":"Hello World!","PeerID":{"ChannelID":1261507434,"_":"TL_peerChannel"},"_":"TL_message","_TL_LAYER":119}
(some message fields were removed for readability)
Related users and chats (aka peers) are saved to history/users and history/chats respectively. Each file is JSON Lines with some basic user/chat data like id, username, first/last name, title, etc.
Lines are added not only when a new peer is encountered but also when an existing peer's data (title, for example) has changed compared to the previous dump, so the same users/chats may appear there multiple times. The last record for each id is the most recent one.
This applies only to users'/chats' own fields (name, phone, etc.). History messages are saved only once; edits and deletions are not detected.