Attachment server
The documentation states
The benefits of using "Archive" over "Export" and "Dump" are:
• it is well documented (see slackdump help chunk );
• it can be converted to other formats, including the native Slack Export
However upon using archive, it generates a folder of json.gz files. It is not possible to use a folder as input to the convert command
How can we use archive, set to output a native Slack export? I am trying to use Slackord which works with native Slack format, but cannot achieve that format here
Hey @jpfleischer , thanks for asking.
- It is possible to convert archive to Slack Export format, but not Dump format (not yet):
Here's the example on my toy workspace: First, archive --
$ slackdump archive
2025/01/09 22:13:59 INFO stream result=<CHM82GF99>
2025/01/09 22:14:00 INFO stream result=<CHY5HUESG>
2025/01/09 22:14:00 INFO stream result=<CHYLGDP0D>
2025/01/09 22:14:01 INFO stream result=<C011D885FP0>
2025/01/09 22:14:03 INFO stream result=<C045TUGSSTW>
2025/01/09 22:14:04 INFO stream result=<C04BJATRQRL>
2025/01/09 22:14:05 INFO stream result=<C07V963QS7K>
2025/01/09 22:14:05 INFO stream result=<Thread[C07V963QS7K:1730798743.474859]>
2025/01/09 22:14:06 INFO stream result=<D03MW5QR8R3>
2025/01/09 22:14:07 INFO stream result=<D034LJA178B>
2025/01/09 22:14:08 INFO stream result=<D015RNCFNRG>
2025/01/09 22:14:09 INFO stream result=<DNC8P5L69>
2025/01/09 22:14:10 INFO stream result=<DL98HT3QA>
2025/01/09 22:14:11 INFO stream result=<DHYNUJ00Y>
2025/01/09 22:14:11 INFO stream result=<Thread[DHYNUJ00Y:1710145284.728069]>
2025/01/09 22:14:12 INFO stream result=<Thread[DHYNUJ00Y:1710144976.814909]>
2025/01/09 22:14:12 INFO stream result=<Thread[DHYNUJ00Y:1665917454.731419]>
2025/01/09 22:14:12 INFO stream result=<DHMAB25DY>
2025/01/09 22:14:12 INFO stream result=<Thread[DHMAB25DY:1710063528.879959]>
2025/01/09 22:14:12 INFO Recorded workspace data filename=slackdump_20250109_221358 took=14.31128438s
Next — convert. No flags specified, converts to export by default --
$ slackdump convert slackdump_20250109_221358
2025/01/09 22:14:28 INFO converting input_format=chunk source=slackdump_20250109_221358 output_format=export output=slackdump_20250109_221428.zip
2025/01/09 22:14:28 WARN skipping file=F047E154GDN error="invalid file mode \"hidden_by_limit\""
2025/01/09 22:14:28 WARN skipping file=F046MB9M29K error="invalid file mode \"hidden_by_limit\""
2025/01/09 22:14:28 WARN skipping file=F06P7HCJF7B error="invalid file mode \"hidden_by_limit\""
2025/01/09 22:14:28 WARN skipping file=F06PU0ZAN2Z error="invalid file mode \"hidden_by_limit\""
2025/01/09 22:14:28 WARN skipping file=F06PU0ZJR9T error="invalid file mode \"hidden_by_limit\""
2025/01/09 22:14:28 WARN skipping file=F06QKMU57SL error="invalid file mode \"hidden_by_limit\""
2025/01/09 22:14:28 WARN skipping file=F06PZC3LB1A error="invalid file mode \"hidden_by_limit\""
2025/01/09 22:14:28 WARN skipping file=F06QKNJJ2F2 error="invalid file mode \"hidden_by_limit\""
2025/01/09 22:14:28 WARN skipping file=F06PWUTV3B4 error="invalid file mode \"hidden_by_limit\""
2025/01/09 22:14:28 INFO completed took=396.510631ms
- You can use slackdump view to view any of the generated formats, for example to view the archive that was just generated:
$ slackdump view slackdump_20250109_221358
2025/01/09 22:19:18 INFO listening on addr=localhost:8080
<...>
In order to generate an export for Slackord, just run slackdump convert <dir name>
It is not possible to use a folder as input to the convert command
Where did you encounter that limitation?
I see, thank you for your prompt response.
I notice that in your pasted terminal input/output, you do not slackdump view the newly generated zip file but rather the folder from before. When I do that, I can see all the channels listed on the left hand side, but clicking on any of the channels or direct messages gives an empty output in the right.
Here is what happens when I try to run it on the zip file. Clicking on any of the channels listed does nothing:
$ ./slackdump view slackdump_20250109_112936.zip
2025/01/09 11:32:22 INFO listening on addr=localhost:8080
2025/01/09 11:32:22 "GET http://localhost:8080/ HTTP/1.1" from 127.0.0.1:6264 - 200 9248B in 1.6589ms
2025/01/09 11:32:34 ERROR AllMessages in=channelHandler channel=CQKALJEJE error="AllMessages: walk: file does not exist: general"
2025/01/09 11:32:34 "GET http://localhost:8080/archives/CQKALJEJE HTTP/1.1" from 127.0.0.1:6264 - 500 48B in 3.2734ms
2025/01/09 11:33:27 ERROR AllMessages in=channelHandler channel=CQY47JGG0 error="AllMessages: walk: file does not exist: random"
2025/01/09 11:33:27 "GET http://localhost:8080/archives/CQY47JGG0 HTTP/1.1" from 127.0.0.1:6264 - 500 47B in 0s
2025/01/09 11:33:29 ERROR AllMessages in=channelHandler channel=D01AT204YAJ error="AllMessages: walk: file does not exist: D01AT204YAJ"
2025/01/09 11:33:29 "GET http://localhost:8080/archives/D01AT204YAJ HTTP/1.1" from 127.0.0.1:6264 - 500 52B in 0s
2025/01/09 11:33:30 ERROR AllMessages in=channelHandler channel=D01AT204YAJ error="AllMessages: walk: file does not exist: D01AT204YAJ"
Here is what happens when I run it on the folder, clicking the channels changes the header but has no messages
$ ./slackdump view slackdump_20250109_044827/
2025/01/09 11:34:19 INFO listening on addr=localhost:8080
2025/01/09 11:34:20 "GET http://localhost:8080/ HTTP/1.1" from 127.0.0.1:6274 - 200 9235B in 1.0603ms
2025/01/09 11:34:26 "GET http://localhost:8080/archives/C0187EFDFFY HTTP/1.1" from 127.0.0.1:6274 - 200 909B in 1.4098ms
2025/01/09 11:34:39 "GET http://localhost:8080/archives/C01B5TWD6TS HTTP/1.1" from 127.0.0.1:6274 - 200 915B in 1.0809ms
Interesting, to list the contents of the folder up to the file attachment id and output their sizes, would you mind running this against the archive folder?
find slackdump_20250109_044827 -depth 1 -exec ls -lgo {} + > chunk_contents.txt
Note the output is redirected to chunk_contents.txt
And then for the export zip (lists all files except attachment names):
unzip -l slackdump_20250109_221428.zip | grep -v '__uploads\/F[0-9A-Z]\+\/.\+$' > archive_contents.txt
and upload these files?
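For readers without unzip and grep at hand (for example on Windows), the zip listing above could be approximated in Python. This is only a sketch and not part of slackdump: the function names list_export and write_listing are made up for illustration, and the pattern is copied from the grep filter above, so it assumes attachments live under __uploads/<file id>/<file name>.

```python
import re
import zipfile

# Same filter as the grep above: skip attachment files stored under
# __uploads/<file id>/<file name>.
ATTACHMENT = re.compile(r"__uploads/F[0-9A-Z]+/.+$")


def list_export(zip_path):
    """Return (size, name) pairs for every zip entry that is not an attachment."""
    with zipfile.ZipFile(zip_path) as zf:
        return [
            (info.file_size, info.filename)
            for info in zf.infolist()
            if not ATTACHMENT.search(info.filename)
        ]


def write_listing(zip_path, out_path="archive_contents.txt"):
    """Write the filtered listing to a text file, one entry per line."""
    with open(out_path, "w", encoding="utf-8") as out:
        for size, name in list_export(zip_path):
            out.write(f"{size:>12}  {name}\n")
```

Calling write_listing("slackdump_20250109_221428.zip") would then produce archive_contents.txt in the current directory.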
If there are sensitive channel names that you'd rather not share in a public issue, you can gpg-encrypt the listing with my public key like this:
find slackdump_20250109_044827 -depth 1 -exec ls -lgo {} + | slackdump tools encrypt > chunk_contents.gpg
Also: did you specify a time range when archiving?
If you have jq installed, could you also run the following command to give me an idea of the number of chunks of each type in the #general channel (hopefully there are some):
gzcat CQKALJEJE.json.gz | jq '.t' | awk '{count[$1]++}END{for(t in count)print t,count[t]}' > counts.txt
and upload counts.txt as well?
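If gzcat, jq, or awk are not available (for example on Windows), the same count could be sketched in Python. This assumes the json.gz file holds one JSON object per line, each with a "t" field as in the jq filter above; count_chunk_types and write_counts are illustrative names, not slackdump commands.

```python
import gzip
import json
from collections import Counter


def count_chunk_types(path):
    """Count how many chunks of each type ("t" field) a json.gz file contains."""
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                counts[json.loads(line).get("t")] += 1
    return counts


def write_counts(path, out_path="counts.txt"):
    """Write "<type> <count>" lines, mirroring the awk output format."""
    with open(out_path, "w", encoding="utf-8") as out:
        for chunk_type, n in count_chunk_types(path).items():
            out.write(f"{chunk_type} {n}\n")
```

For the channel above, that would be write_counts("CQKALJEJE.json.gz").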
As a note, i deleted the exported files and reran it again to see if it was a fluke (the issue still persists) and that is why the folder name is different. chunk_contents.txt archive_contents.txt
(why is there only one channel? that's the slackbot channel but I want others too)
I did not specify a time range since I want absolutely everything. counts.txt
P.S. i am on windows 10
Hey, thanks for posting the files. I'm looking at the #general channel counts, and it seems that it doesn't contain any messages. There are only 3 chunks in total: (1) type 0, channel messages, but judging by the size of the json.gz it's most likely a terminal empty slice, (2) type 5, which is channel information, and (3) type 7, which is channel users.
Could you confirm: if you open this channel (#general) in the Slack client, are there any messages that are not hidden by Slack's 90-day paywall?
Out of all chunks, it seems that only the conversation ID DQY47J7DW has messages.
I see. The messages are all hidden by the paywall, is this tool not able to circumvent that? I can see the messages are there but they are just blurry.
The only way to work around the paywall is to pay them. The API doesn't allow access to the data behind the paywall either.
If you really need that data, I suggest the following masterplan:
- deactivate all users to minimise the sub cost
- then get a Pro subscription
- download everything, and
- cancel the sub.
I.e. this is the price for my toy workspace (1 active user):
I guess you already have it figured out with Discord, but here's the tip of the day - did you know that you can have your own Slack with blackjack and hookers: https://github.com/mattermost/mattermost
this is great, thanks for your patience. I paid the subscription fee but some files are giving
INFO Recorded workspace data filename=slackdump_20250110_001853 took=53m10.9868448s
2025/01/10 01:14:48 ERROR WithRetry maxAttempts=3 error="download to \"__uploads\\\\somestring\\\\20200913_213541.mp4\" failed, [src=https://files.slack.com/files-pri/someOtherString-somestring/download/20200913_213541.mp4]: unexpected EOF" attempt=1
any way to retry only those that failed?
- I see that this happened after the main archive processor has finished. Did this happen only after that or was there EOF errors before that as well?
- If the number of failed files is only a handful, you could run slackdump search files "filename", then it would try and redownload it.
- Looks like Slack terminates the connection, could you try and locate this file and see if it exists and you can play it through the Slack client? The easiest way would be to use the Slack search files feature and search by file name.
Hey @jpfleischer , I submitted #400 to address this type of error and add the ability to manually run slackdump against the archive to redownload any missing files. See the slackdump tools redownload <archive_directory> command, and run slackdump tools help redownload to get a detailed description (which you can read in the PR as well :) )
Once merged into master, you can use it if you compile it from sources. I'll include it in v3.0.3, but before I release, I'd like to add the long-awaited channel canvas support.
@jpfleischer did it work for you?
Hi rusq, i did end up using the search command for the two files that failed, and it worked.
So, as my end goal was to use Slackord after using the convert command to get native slack format, I did try to upload everything to a discord server.
Maybe my problem becomes less of a slackdump problem and more of a slackdump-slackord-collaboration problem, but while every message was indeed uploaded, the uploaded attachments were not the actual files (such as an MP3 file), but files carrying the attachment's name (example.mp3) that were actually HTML pages pointing to a slack.com site.
The HTML looks like output.txt. Maybe there is a way to fix this in the slackdump code? Otherwise, I made a Slackord issue at https://github.com/thomasloupe/Slackord/issues/109
But nonetheless, you have been immense help that has helped me accomplish my end goal of saving everything, because the attachments are still available on my local computer, and my direct messages as well. Thank you for being very responsive and for writing code in response to the errors I identified.
Thanks for your feedback, I'm glad that it worked.
For attachments, you could try this solution: https://github.com/rusq/slackdump/issues/371#issuecomment-2529901705 which was built by @codeallthethingz to import a Slackdump-generated export into Slack. It updates the file links within the Slack Export archive and starts up a proxy to serve the files on request of the target system, which is probably what the reborn Slackord expects.
I'll reopen this; it looks like many would benefit if this were a built-in feature.
I started by looking at https://gist.github.com/codeallthethingz/38e340d15b26dc0b75e455aae37df8e2 which wasn't quite what I needed for Slack->Discord migration.
For anyone having this issue I've successfully used the attached script rewrite_slackdump_urls.py - just drop it in the export folder and run it.
It will rewrite the JSON files to use a new host/port - I'm just setting it to localhost and run a simple web server on the computer running Slackord2 to serve them up.
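The attached script is not reproduced in this thread, so the following is only a rough sketch of the idea and not the script itself. It assumes the export is unpacked into a folder with one subfolder per channel holding JSON lists of messages, that each entry in a message's "files" list carries "id", "name", "url_private" and "url_private_download" fields, and that attachments sit under __uploads/<file id>/<file name> as seen in the logs above. The function names and the default base URL are made up for illustration.

```python
import json
from pathlib import Path
from urllib.parse import quote

# Assumed names of the URL fields on each file entry.
URL_FIELDS = ("url_private", "url_private_download")


def local_url(file_obj, base_url):
    """Build the URL a local web server would serve this attachment from."""
    return f"{base_url}/__uploads/{quote(file_obj['id'])}/{quote(file_obj['name'])}"


def rewrite_message_files(messages, base_url):
    """Point attachment URLs at base_url; return how many files were changed."""
    changed = 0
    for msg in messages:
        if not isinstance(msg, dict):
            continue
        for f in msg.get("files") or []:
            if "id" not in f or "name" not in f:
                continue  # not enough information to locate the local copy
            touched = False
            for field in URL_FIELDS:
                if field in f:
                    f[field] = local_url(f, base_url)
                    touched = True
            changed += touched
    return changed


def rewrite_export(export_dir, base_url="http://localhost:8000"):
    """Rewrite every <channel>/<date>.json file under export_dir in place."""
    total = 0
    for path in Path(export_dir).glob("*/*.json"):
        messages = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(messages, list):
            continue
        n = rewrite_message_files(messages, base_url)
        if n:
            path.write_text(
                json.dumps(messages, ensure_ascii=False, indent=2),
                encoding="utf-8",
            )
            total += n
    return total
```

With the URLs rewritten, running python -m http.server 8000 from the export folder is one simple way to serve the __uploads files on localhost. Work on a copy of the export, since the JSON files are modified in place.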