Feishu Document Reader
This skill enables reading and extracting content from Feishu (Lark) documents using the official Feishu Open API.
Configuration
Set Up the Skill
- Create the configuration file at ./reference/feishu_config.json with your Feishu app credentials:
{
"app_id": "your_feishu_app_id_here",
"app_secret": "your_feishu_app_secret_here"
}
- Make sure the scripts are executable:
chmod +x scripts/read_doc.sh
chmod +x scripts/get_blocks.sh
Security Note: The configuration file should be kept secure and not committed to version control. Consider using proper file permissions (chmod 600 ./reference/feishu_config.json).
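As a minimal sketch of how a script might consume this file, the snippet below reads the JSON, checks that both keys are present, and reports whether the file is closed to group and other users. The helper names `load_config` and `is_private` are illustrative assumptions, not functions shipped with this skill.

```python
import json
import os
import stat

REQUIRED_KEYS = ("app_id", "app_secret")


def load_config(path="./reference/feishu_config.json"):
    """Hypothetical helper: load credentials and check that both keys are set."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"missing config keys: {', '.join(missing)}")
    return config


def is_private(path):
    """Return True when the file has no group/other permission bits (e.g. mode 600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

A caller could refuse to start, or print a warning, when `is_private` returns False. The permission check assumes a POSIX file system.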
Usage
Basic Document Reading
To read a Feishu document, you need the document token (found in the URL: https://example.feishu.cn/docx/DOC_TOKEN).
Using the shell script (recommended):
# Make sure credentials are configured first (config file or environment variables)
./scripts/read_doc.sh "your_doc_token_here"
# Or specify document type explicitly
./scripts/read_doc.sh "docx_token" "doc"
./scripts/read_doc.sh "sheet_token" "sheet"
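If you only have the document URL, the token can be pulled out of the path. The sketch below assumes the URL layout shown above (`/<type>/<token>`); the `extract_doc_token` helper is hypothetical and does not handle other layouts.

```python
from urllib.parse import urlparse


def extract_doc_token(url):
    """Hypothetical helper: return (type, token) from a URL shaped like /<type>/<token>."""
    parts = [part for part in urlparse(url).path.split("/") if part]
    if len(parts) < 2:
        raise ValueError(f"no document token found in {url!r}")
    # The last two path segments are taken as the document type and the token.
    return parts[-2], parts[-1]
```

For example, `extract_doc_token("https://example.feishu.cn/docx/DOC_TOKEN")` yields the pair `("docx", "DOC_TOKEN")`, and the second element can be passed to `read_doc.sh`.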
Get Detailed Document Blocks (NEW)
For complete document structure with all blocks, use the dedicated blocks script:
# Get full document blocks structure
./scripts/get_blocks.sh "docx_AbCdEfGhIjKlMnOpQrStUv"
# Get specific block by ID
./scripts/get_blocks.sh "docx_token" "block_id"
# Get blocks with children
./scripts/get_blocks.sh "docx_token" "" "true"
Using Python directly for blocks:
python scripts/get_feishu_doc_blocks.py --doc-token "your_doc_token_here"
python scripts/get_feishu_doc_blocks.py --doc-token "docx_token" --block-id "block_id"
python scripts/get_feishu_doc_blocks.py --doc-token "docx_token" --include-children
Supported Document Types
- Docx documents (new Feishu docs): Full content extraction with blocks, metadata, and structure
- Doc documents (legacy): Basic metadata and limited content
- Sheets: Full spreadsheet data extraction with sheet navigation
- Slides: Basic metadata (content extraction requires additional permissions)
Features
Enhanced Content Extraction
- Structured output: Clean JSON with document metadata, content blocks, and hierarchy
- Complete blocks access: Full access to all document blocks including text, tables, images, headings, lists, etc.
- Block hierarchy: Proper parent-child relationships between blocks
- Text extraction: Automatic text extraction from complex block structures
- Table support: Proper table parsing with row/column structure
- Image handling: Image URLs and metadata extraction
- Link resolution: Internal and external link extraction
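To illustrate how block hierarchy and text extraction can fit together, here is a minimal sketch. It assumes each block is a dict with `block_id` and `parent_id`, and that text lives under a type-specific key holding `elements` with `text_run.content`. That shape is an assumption modeled on docx block JSON and may not match the scripts' actual output; the function names are hypothetical.

```python
def build_tree(blocks):
    """Group a flat block list into {parent_id: [child blocks]}, keeping order."""
    children = {}
    for block in blocks:
        children.setdefault(block.get("parent_id", ""), []).append(block)
    return children


def block_text(block):
    """Join the text_run contents found under any payload key that has 'elements'."""
    pieces = []
    for value in block.values():
        if isinstance(value, dict):
            for element in value.get("elements", []):
                run = element.get("text_run")
                if run:
                    pieces.append(run.get("content", ""))
    return "".join(pieces)


def render(blocks, root_parent=""):
    """Return indented text lines, one per non-empty block, in tree order."""
    children = build_tree(blocks)
    lines = []

    def walk(parent_id, depth):
        for block in children.get(parent_id, []):
            text = block_text(block)
            if text:
                lines.append("  " * depth + text)
            walk(block["block_id"], depth + 1)

    walk(root_parent, 0)
    return lines
```

Indentation in the result mirrors nesting depth, so a nested bullet appears one level deeper than its parent.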
Block Types Supported
- text: Plain text and rich text content
- heading1/2/3: Document headings with proper hierarchy
- bullet/ordered: List items with nesting support
- table: Complete table structures with cells and formatting
- image: Image blocks with tokens and metadata
- quote: Block quotes
- code: Code blocks with language detection
- equation: Mathematical equations
- divider: Horizontal dividers
- page: Page breaks (in multi-page documents)
Error Handling & Diagnostics
- Detailed error messages: Clear explanations for common issues
- Permission validation: Checks required permissions before making requests
- Token validation: Validates document tokens before processing
- Retry logic: Automatic retries for transient network errors
- Rate limiting: Handles API rate limits gracefully
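The retry behaviour can be pictured as exponential backoff around a single call. This is a sketch of the general technique under stated assumptions, not the scripts' actual implementation: `TransientError` and `with_retries` are hypothetical names, and the delays are illustrative.

```python
import time


class TransientError(Exception):
    """Stand-in for a timeout or rate-limit response (hypothetical)."""


def with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on TransientError wait base_delay * 2**attempt and try again."""
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.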
Security Features
- Secure credential storage: Supports both environment variables and secure file storage
- No credential logging: Credentials never appear in logs or output
- Minimal permissions: Uses only required API permissions
- Access token caching: Efficient token reuse to minimize API calls
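Access token caching can be sketched as a small wrapper that reuses a token until shortly before it expires. The `TokenCache` class below is a hypothetical illustration: it assumes a `fetch` callable that returns a `(token, lifetime_seconds)` pair, and the 60-second safety margin is an arbitrary choice.

```python
import time


class TokenCache:
    """Hypothetical cache: reuse a token until it is close to expiry."""

    def __init__(self, fetch, clock=time.monotonic, margin=60):
        self._fetch = fetch      # callable returning (token, lifetime_seconds)
        self._clock = clock      # injectable clock, useful for tests
        self._margin = margin    # refresh this many seconds before expiry
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

With a two-hour token lifetime, repeated `get()` calls within that window trigger only one fetch.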
Command Line Options
Main Document Reader
# Python script options
python scripts/read_feishu_doc.py --help
# Shell script usage
./scripts/read_doc.sh <doc_token> [doc|sheet|slide]
Blocks Reader (NEW)
# Get full document blocks
./scripts/get_blocks.sh <doc_token>
# Get specific block
./scripts/get_blocks.sh <doc_token> <block_id>
# Include children blocks
./scripts/get_blocks.sh <doc_token> "" true
# Python options
python scripts/get_feishu_doc_blocks.py --help
API Permissions Required
Your Feishu app needs the following permissions:
- docx:document:readonly - Read document content
- doc:document:readonly - Read legacy document content
- sheets:spreadsheet:readonly - Read spreadsheet content
Error Handling
Common errors and solutions:
- 403 Forbidden: Check app permissions and document sharing settings
- 404 Not Found: Verify document token is correct and document exists
- Token expired: Access tokens are valid for 2 hours; refresh them as needed
- App ID/Secret invalid: Double-check your credentials in Feishu Open Platform
- Insufficient permissions: Ensure your app has the required API permissions
- 99991663: Application doesn't have permission to access the document
- 99991664: Document doesn't exist or has been deleted
- 99991668: Token expired, need to refresh
Examples
Extract document with full structure
# Read document
./scripts/read_doc.sh "docx_AbCdEfGhIjKlMnOpQrStUv"
Get complete document blocks (NEW)
# Get all blocks with full structure
./scripts/get_blocks.sh "docx_AbCdEfGhIjKlMnOpQrStUv"
# Get specific block details
./scripts/get_blocks.sh "docx_AbCdEfGhIjKlMnOpQrStUv" "blk_xxxxxxxxxxxxxx"
Process spreadsheet data
./scripts/read_doc.sh "sheet_XyZ123AbCdEfGhIj" "sheet"
Extract only text content (Python script)
python scripts/read_feishu_doc.py --doc-token "docx_token" --extract-text-only
Security Notes
- Never commit credentials: Keep app secrets out of version control
- Use minimal permissions: Only request permissions your use case requires
- Secure file permissions: Set proper file permissions on secret files (chmod 600)
- Environment isolation: Use separate apps for development and production
- Audit access: Regularly review which documents your app can access
Troubleshooting
Authentication Issues
- Verify your App ID and App Secret in Feishu Open Platform
- Ensure the app has been published with required permissions
- Check that environment variables or config files are properly set
- Test with the test_auth.py script to verify credentials
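When debugging credentials by hand, it can help to see what a token request looks like. The sketch below is not the `test_auth.py` script. It assumes the tenant access token endpoint for self-built apps and a response carrying `code`, `tenant_access_token`, and `expire`; verify both against the Feishu Open Platform documentation. The function names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint for self-built apps; confirm in the Feishu Open Platform docs.
TOKEN_URL = "https://open.feishu.cn/open-apis/auth/v3/tenant_access_token/internal"


def build_token_request(app_id, app_secret):
    """Build (but do not send) the POST request that exchanges credentials for a token."""
    body = json.dumps({"app_id": app_id, "app_secret": app_secret}).encode("utf-8")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def parse_token_response(payload):
    """Return (token, lifetime_seconds), or raise if the API reported an error code."""
    if payload.get("code") != 0:
        raise RuntimeError(
            f"auth failed: code={payload.get('code')} msg={payload.get('msg')}"
        )
    return payload["tenant_access_token"], payload["expire"]
```

Sending the request with `urllib.request.urlopen` and decoding the JSON body would give the `payload` passed to `parse_token_response`.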
Document Access Issues
- Ensure the document is shared with your app or in an accessible space
- Verify the document token format (should start with docx_, doc_, or sheet_)
- Check if the document requires additional sharing permissions
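The token format check can be done before any API call. The sketch below assumes the prefixes listed above; `infer_doc_type` is a hypothetical helper and returns None for tokens that match none of them.

```python
PREFIX_TO_TYPE = {"docx_": "docx", "doc_": "doc", "sheet_": "sheet"}


def infer_doc_type(token):
    """Hypothetical helper: map a token prefix to a document type, or None if unknown."""
    for prefix, kind in PREFIX_TO_TYPE.items():
        if token.startswith(prefix):
            return kind
    return None
```

A None result is a cue to recheck the token copied from the document URL.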
Network Issues
- Ensure your server can reach open.feishu.cn
- Check firewall rules if running in restricted environments
- The script includes retry logic for transient network failures
Blocks-Specific Issues
- Empty blocks response: Document might be empty or have no accessible blocks
- Missing block types: Some block types require additional permissions
- Incomplete hierarchy: Use the --include-children flag for the complete block tree