Commit da22642

Wsine and PRESIDENT810 authored
Batch download implementation (#125)
* [Feat] Add the ability to batch download all documents in a folder (#121)
* Initial support
* Basically working
* Use concurrency
* Clean up
* Integrate batch download into download functionality
* Whoops
* refactor: download documents in batch
* format: tidy the code
* update: batch download guideline

---------

Co-authored-by: Jacket <[email protected]>
1 parent 86373fe commit da22642

15 files changed: 244 additions and 79 deletions

README.md

Lines changed: 37 additions & 10 deletions
@@ -1,4 +1,4 @@
-# Feishu2Md
+# feishu2md
 
 [![Golang - feishu2md](https://img.shields.io/github/go-mod/go-version/wsine/feishu2md?color=%2376e1fe&logo=go)](https://go.dev/)
 [![Unittest](https://github.com/Wsine/feishu2md/actions/workflows/unittest.yaml/badge.svg)](https://github.com/Wsine/feishu2md/actions/workflows/unittest.yaml)
@@ -20,13 +20,13 @@
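 The configuration file requires the APP ID and APP SECRET; see the [official Feishu documentation](https://open.feishu.cn/document/ukTMukTMukTM/ukDNz4SO0MjL5QzM/get-) to obtain them. The recommended setup is:
 
 - Open the Feishu [developer console](https://open.feishu.cn/app)
-- Create a custom enterprise app; fill in the details however you like
-- Under test enterprises and members, create a test enterprise, bind the app, and switch to the test version
-- (Important) Open permission management, go to cloud documents, and enable all read-only permissions
-  - "View, comment on, and export documents" permission `docs:doc:readonly`
-  - "View DocX documents" permission `docx:document:readonly`
-  - "View, comment on, and download all files in cloud space" permission `drive:drive:readonly`
-  - "View and download files in cloud space" permission `drive:file:readonly`
+- Create a custom enterprise app (personal edition); fill in the details however you like
+- (Important) Open permission management and enable the following required permissions (follow the links below and see API debugging console -> permission configuration field)
+  - [Get basic document information](https://open.feishu.cn/document/server-docs/docs/docs/docx-v1/document/get), "View new-version documents" permission `docx:document:readonly`
+  - [Get all blocks of a document](https://open.feishu.cn/document/server-docs/docs/docs/docx-v1/document/list), "View new-version documents" permission `docx:document:readonly`
+  - [Download media](https://open.feishu.cn/document/server-docs/docs/drive-v1/media/download), "Download images and attachments in cloud documents" permission `docs:document.media:download`
+  - [List files in a folder](https://open.feishu.cn/document/server-docs/docs/drive-v1/folder/list), "View, comment on, edit, and manage all files in cloud space" permission `drive:file:readonly`
+  - [Get wiki space node information](https://open.feishu.cn/document/server-docs/docs/wiki-v2/space-node/get_node), "View wiki" permission `wiki:wiki:readonly`
 - Open Credentials & Basic Info to obtain the App ID and App Secret
 
 ## How to use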
@@ -71,6 +71,20 @@
    --appId value      Set app id for the OPEN API
    --appSecret value  Set app secret for the OPEN API
    --help, -h         show help (default: false)
+
+$ feishu2md dl -h
+NAME:
+   feishu2md download - Download feishu/larksuite document to markdown file
+
+USAGE:
+   feishu2md download [command options] <url>
+
+OPTIONS:
+   --output value, -o value  Specify the output directory for the markdown files (default: "./")
+   --dump                    Dump json response of the OPEN API (default: false)
+   --batch                   Download all documents under a folder (default: false)
+   --help, -h                show help (default: false)
+
 ```
 
 **Generate the configuration file**
@@ -81,15 +95,28 @@
 
 For more configuration options, open the configuration file and edit it manually.
 
-**Download as Markdown**
+**Download a single document as Markdown**
 
-Download directly with `feishu2md dl <your feishu docx url>`. You can get the document link via **Share > Enable link sharing > Copy link**.
+Download directly with `feishu2md dl <your feishu docx url>`. You can get the document link via **Share > Enable link sharing > Anyone on the internet with the link can view > Copy link**.
 
 Example:
 
 ```bash
 $ feishu2md dl "https://domain.feishu.cn/docx/docxtoken"
 ```
+
+**Batch download all documents in a folder as Markdown**
+
+This feature is not yet supported in the Docker version.
+
+Download directly with `feishu2md dl --batch <your feishu folder url>`. You can get the folder link via **Share > Enable link sharing > Anyone on the internet with the link can view > Copy link**.
+
+Example:
+
+```bash
+$ feishu2md dl --batch -o output_directory "https://domain.feishu.cn/drive/folder/foldertoken"
+```
+
 </details>
 
 <details>
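
The folder URL in the README example carries the folder token as its last path segment. As a rough illustration of what validating such a URL could look like, here is a minimal sketch. `extractFolderToken` is a hypothetical helper written for this note, not the project's `utils.ValidateFolderURL` (whose implementation is not shown in this diff), and it assumes the `/drive/folder/<token>` shape from the example.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// extractFolderToken is a hypothetical helper (not the project's
// utils.ValidateFolderURL). It assumes the folder URL looks like
// https://<domain>/drive/folder/<token>, as in the README example.
func extractFolderToken(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	// Split the path into its segments, ignoring leading/trailing slashes.
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	if len(parts) != 3 || parts[0] != "drive" || parts[1] != "folder" || parts[2] == "" {
		return "", fmt.Errorf("not a folder url: %q", raw)
	}
	return parts[2], nil
}

func main() {
	token, err := extractFolderToken("https://domain.feishu.cn/drive/folder/foldertoken")
	fmt.Println(token, err)
}
```

A document URL such as `https://domain.feishu.cn/docx/docxtoken` is rejected by this sketch, which mirrors the split between single-document and `--batch` mode described above.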

cmd/config.go

Lines changed: 10 additions & 8 deletions
@@ -15,13 +15,15 @@ type ConfigOpts struct {
 
 var configOpts = ConfigOpts{}
 
-func handleConfigCommand(opts *ConfigOpts) error {
+func handleConfigCommand() error {
 	configPath, err := core.GetConfigFilePath()
-	utils.CheckErr(err)
+	if err != nil {
+		return err
+	}
 
 	fmt.Println("Configuration file on: " + configPath)
 	if _, err := os.Stat(configPath); os.IsNotExist(err) {
-		config := core.NewConfig(opts.appId, opts.appSecret)
+		config := core.NewConfig(configOpts.appId, configOpts.appSecret)
 		if err = config.WriteConfig2File(configPath); err != nil {
 			return err
 		}
@@ -31,13 +33,13 @@ func handleConfigCommand(opts *ConfigOpts) error {
 		if err != nil {
 			return err
 		}
-		if opts.appId != "" {
-			config.Feishu.AppId = opts.appId
+		if configOpts.appId != "" {
+			config.Feishu.AppId = configOpts.appId
 		}
-		if opts.appSecret != "" {
-			config.Feishu.AppSecret = opts.appSecret
+		if configOpts.appSecret != "" {
+			config.Feishu.AppSecret = configOpts.appSecret
 		}
-		if opts.appId != "" || opts.appSecret != "" {
+		if configOpts.appId != "" || configOpts.appSecret != "" {
 			if err = config.WriteConfig2File(configPath); err != nil {
 				return err
 			}

cmd/download.go

Lines changed: 98 additions & 24 deletions
@@ -6,6 +6,7 @@ import (
 	"os"
 	"path/filepath"
 	"strings"
+	"sync"
 
 	"github.com/88250/lute"
 	"github.com/Wsine/feishu2md/core"
@@ -17,29 +18,20 @@ import (
 type DownloadOpts struct {
 	outputDir string
 	dump      bool
+	batch     bool
 }
 
-var downloadOpts = DownloadOpts{}
+var dlOpts = DownloadOpts{}
+var dlConfig core.Config
 
-func handleDownloadCommand(url string, opts *DownloadOpts) error {
+func downloadDocument(client *core.Client, ctx context.Context, url string, opts *DownloadOpts) error {
 	// Validate the url to download
-	docType, docToken, err := utils.ValidateDownloadURL(url)
-	utils.CheckErr(err)
+	docType, docToken, err := utils.ValidateDocumentURL(url)
+	if err != nil {
+		return err
+	}
 	fmt.Println("Captured document token:", docToken)
 
-	// Load config
-	configPath, err := core.GetConfigFilePath()
-	utils.CheckErr(err)
-	config, err := core.ReadConfigFromFile(configPath)
-	utils.CheckErr(err)
-
-	// Create client with context
-	ctx := context.WithValue(context.Background(), "output", config.Output)
-
-	client := core.NewClient(
-		config.Feishu.AppId, config.Feishu.AppSecret,
-	)
-
 	// for a wiki page, we need to renew docType and docToken first
 	if docType == "wiki" {
 		node, err := client.GetWikiNodeInfo(ctx, docToken)
@@ -48,24 +40,28 @@ func handleDownloadCommand(url string, opts *DownloadOpts) error {
 		docToken = node.ObjToken
 	}
 	if docType == "docs" {
-		return errors.Errorf("Feishu Docs is no longer supported. Please refer to the Readme/Release for v1_support.")
+		return errors.Errorf(
+			`Feishu Docs is no longer supported. ` +
+				`Please refer to the Readme/Release for v1_support.`)
 	}
 
 	// Process the download
 	docx, blocks, err := client.GetDocxContent(ctx, docToken)
 	utils.CheckErr(err)
 
-	parser := core.NewParser(ctx)
+	parser := core.NewParser(dlConfig.Output)
 
 	title := docx.Title
 	markdown := parser.ParseDocxContent(docx, blocks)
 
-	if !config.Output.SkipImgDownload {
+	if !dlConfig.Output.SkipImgDownload {
 		for _, imgToken := range parser.ImgTokens {
 			localLink, err := client.DownloadImage(
-				ctx, imgToken, filepath.Join(opts.outputDir, config.Output.ImageDir),
+				ctx, imgToken, filepath.Join(opts.outputDir, dlConfig.Output.ImageDir),
 			)
-			utils.CheckErr(err)
+			if utils.CheckErr(err) != nil {
+				return err
+			}
 			markdown = strings.Replace(markdown, imgToken, localLink, 1)
 		}
 	}
@@ -83,7 +79,7 @@ func handleDownloadCommand(url string, opts *DownloadOpts) error {
 		}
 	}
 
-	if opts.dump {
+	if dlOpts.dump {
 		jsonName := fmt.Sprintf("%s.json", docToken)
 		outputPath := filepath.Join(opts.outputDir, jsonName)
 		data := struct {
@@ -103,7 +99,7 @@ func handleDownloadCommand(url string, opts *DownloadOpts) error {
 
 	// Write to markdown file
 	mdName := fmt.Sprintf("%s.md", docToken)
-	if config.Output.TitleAsFilename {
+	if dlConfig.Output.TitleAsFilename {
 		mdName = fmt.Sprintf("%s.md", title)
 	}
 	outputPath := filepath.Join(opts.outputDir, mdName)
@@ -114,3 +110,81 @@ func handleDownloadCommand(url string, opts *DownloadOpts) error {
 
 	return nil
 }
+
+func downloadDocuments(client *core.Client, ctx context.Context, url string) error {
+	// Validate the url to download
+	folderToken, err := utils.ValidateFolderURL(url)
+	if err != nil {
+		return err
+	}
+	fmt.Println("Captured folder token:", folderToken)
+
+	// Error channel and wait group
+	errChan := make(chan error)
+	wg := sync.WaitGroup{}
+
+	// Recursively go through the folder and download the documents
+	var processFolder func(ctx context.Context, folderPath, folderToken string) error
+	processFolder = func(ctx context.Context, folderPath, folderToken string) error {
+		files, err := client.GetDriveFolderFileList(ctx, nil, &folderToken)
+		if err != nil {
+			return err
+		}
+		opts := DownloadOpts{outputDir: folderPath, dump: dlOpts.dump, batch: false}
+		for _, file := range files {
+			if file.Type == "folder" {
+				_folderPath := filepath.Join(folderPath, file.Name)
+				if err := processFolder(ctx, _folderPath, file.Token); err != nil {
+					return err
+				}
+			} else if file.Type == "docx" {
+				// concurrently download the document
+				wg.Add(1)
+				go func(_url string) {
+					if err := downloadDocument(client, ctx, _url, &opts); err != nil {
+						errChan <- err
+					}
+					wg.Done()
+				}(file.URL)
+			}
+		}
+		return nil
+	}
+	if err := processFolder(ctx, dlOpts.outputDir, folderToken); err != nil {
+		return err
+	}
+
+	// Wait for all the downloads to finish
+	go func() {
+		wg.Wait()
+		close(errChan)
+	}()
+	for err := range errChan {
+		return err
+	}
+	return nil
+}
+
+func handleDownloadCommand(url string) error {
+	// Load config
+	configPath, err := core.GetConfigFilePath()
+	if err != nil {
+		return err
+	}
+	dlConfig, err := core.ReadConfigFromFile(configPath)
+	if err != nil {
+		return err
+	}
+
+	// Instantiate the client
+	client := core.NewClient(
+		dlConfig.Feishu.AppId, dlConfig.Feishu.AppSecret,
+	)
+	ctx := context.Background()
+
+	if dlOpts.batch {
+		return downloadDocuments(client, ctx, url)
+	}
+
+	return downloadDocument(client, ctx, url, &dlOpts)
+}

cmd/main.go

Lines changed: 11 additions & 5 deletions
@@ -38,7 +38,7 @@ func main() {
 					},
 				},
 				Action: func(ctx *cli.Context) error {
-					return handleConfigCommand(&configOpts)
+					return handleConfigCommand()
 				},
 			},
 			{
@@ -51,22 +51,28 @@ func main() {
 						Aliases:     []string{"o"},
 						Value:       "./",
 						Usage:       "Specify the output directory for the markdown files",
-						Destination: &downloadOpts.outputDir,
+						Destination: &dlOpts.outputDir,
 					},
 					&cli.BoolFlag{
 						Name:        "dump",
 						Value:       false,
 						Usage:       "Dump json response of the OPEN API",
-						Destination: &downloadOpts.dump,
+						Destination: &dlOpts.dump,
+					},
+					&cli.BoolFlag{
+						Name:        "batch",
+						Value:       false,
+						Usage:       "Download all documents under a folder",
+						Destination: &dlOpts.batch,
 					},
 				},
 				ArgsUsage: "<url>",
 				Action: func(ctx *cli.Context) error {
 					if ctx.NArg() == 0 {
-						return cli.Exit("Please specify the document url", 1)
+						return cli.Exit("Please specify the document/folder url", 1)
 					} else {
 						url := ctx.Args().First()
-						return handleDownloadCommand(url, &downloadOpts)
+						return handleDownloadCommand(url)
 					}
 				},
 			},
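
With this change the handlers take no options argument and read package-level variables (`configOpts`, `dlOpts`, `dlConfig`) instead. One Go detail is worth keeping in mind with that pattern: `:=` inside a function declares a new local variable whenever the name is not already declared in that scope, even if a package-level variable of the same name exists. The line `dlConfig, err := core.ReadConfigFromFile(configPath)` in `handleDownloadCommand` looks like this case, so the package-level `dlConfig` read by `downloadDocument` may stay at its zero value. The sketch below illustrates the behaviour in isolation; `Config`, `loadConfig`, and `cfg` are hypothetical stand-ins, not the project's types.

```go
package main

import "fmt"

// Config and loadConfig are hypothetical stand-ins for illustration only.
type Config struct{ ImageDir string }

func loadConfig() (Config, error) { return Config{ImageDir: "static"}, nil }

// cfg plays the role of a package-level variable shared by handlers.
var cfg Config

// loadShadowed uses :=, which declares a new local cfg; the
// package-level cfg is left untouched.
func loadShadowed() error {
	cfg, err := loadConfig()
	if err != nil {
		return err
	}
	_ = cfg // only the local copy was set
	return nil
}

// loadAssigned declares err separately and uses =, so the
// package-level cfg is the variable that gets written.
func loadAssigned() error {
	var err error
	cfg, err = loadConfig()
	return err
}

func main() {
	_ = loadShadowed()
	fmt.Printf("after shadowed load: %q\n", cfg.ImageDir)
	_ = loadAssigned()
	fmt.Printf("after assigned load: %q\n", cfg.ImageDir)
}
```

Passing the loaded configuration as an argument, as the pre-change code did with `opts`, sidesteps the question entirely.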

core/client.go

Lines changed: 26 additions & 0 deletions
@@ -10,6 +10,7 @@ import (
 	"time"
 
 	"github.com/chyroc/lark"
+	"github.com/chyroc/lark_rate_limiter"
 )
 
 type Client struct {

1516
type Client struct {
@@ -21,6 +22,7 @@ func NewClient(appID, appSecret string) *Client {
 		larkClient: lark.New(
 			lark.WithAppCredential(appID, appSecret),
 			lark.WithTimeout(60*time.Second),
+			lark.WithApiMiddleware(lark_rate_limiter.Wait(5, 5)),
 		),
 	}
 }
@@ -104,3 +106,27 @@ func (c *Client) GetWikiNodeInfo(ctx context.Context, token string) (*lark.GetWi
 	}
 	return resp.Node, nil
 }
+
+func (c *Client) GetDriveFolderFileList(ctx context.Context, pageToken *string, folderToken *string) ([]*lark.GetDriveFileListRespFile, error) {
+	resp, _, err := c.larkClient.Drive.GetDriveFileList(ctx, &lark.GetDriveFileListReq{
+		PageSize:    nil,
+		PageToken:   pageToken,
+		FolderToken: folderToken,
+	})
+	if err != nil {
+		return nil, err
+	}
+	files := resp.Files
+	for resp.HasMore {
+		resp, _, err = c.larkClient.Drive.GetDriveFileList(ctx, &lark.GetDriveFileListReq{
+			PageSize:    nil,
+			PageToken:   &resp.NextPageToken,
+			FolderToken: folderToken,
+		})
+		if err != nil {
+			return nil, err
+		}
+		files = append(files, resp.Files...)
+	}
+	return files, nil
+}
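
`GetDriveFolderFileList` follows the usual cursor pagination shape: request a page, append its files, and repeat with `NextPageToken` while `HasMore` is set. The same idea can be written as a single loop so the request is built in one place. This is a minimal sketch against a fake backend; `page`, `fetchPage`, and `listAll` are hypothetical names for illustration and are not part of the lark SDK.

```go
package main

import "fmt"

// page and fetchPage are hypothetical stand-ins for a paged list API;
// they are not part of the lark SDK.
type page struct {
	Items         []string
	HasMore       bool
	NextPageToken string
}

type fetchPage func(pageToken string) (page, error)

// listAll keeps requesting pages until the API reports no more results.
func listAll(fetch fetchPage) ([]string, error) {
	var all []string
	token := "" // an empty token asks for the first page
	for {
		p, err := fetch(token)
		if err != nil {
			return nil, err
		}
		all = append(all, p.Items...)
		if !p.HasMore {
			return all, nil
		}
		token = p.NextPageToken
	}
}

func main() {
	// Fake backend with two pages keyed by page token.
	pages := map[string]page{
		"":   {Items: []string{"a", "b"}, HasMore: true, NextPageToken: "p2"},
		"p2": {Items: []string{"c"}},
	}
	files, err := listAll(func(tok string) (page, error) { return pages[tok], nil })
	fmt.Println(files, err)
}
```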
