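remove(instagram): remove the Instagram channel

Instagram's anti-scraping crackdown has broken all open-source tools (instaloader and others); they no longer work with or without cookies.

- Delete the instagram.py channel file
- Remove search-instagram, configure instagram-cookies, and related commands from the CLI
- Remove the instaloader dependency check from setup/doctor
- Update README, docs, SKILL.md, and pyproject.toml

Upstream issues: instaloader#2585, instaloader#2648
Relates to: #13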
parent c3a9813b1c
commit f70711e75e
11 changed files with 21 additions and 370 deletions
21
CHANGELOG.md
@@ -10,13 +10,10 @@ All notable changes to this project will be documented in this file.
 
 ### 🆕 New Channels / 新增渠道
 
-#### 📷 Instagram
-- Read public posts and profiles via [instaloader](https://github.com/instaloader/instaloader)
-- Search via Exa (free, no API key)
-- Optional cookie login for private content
-- 通过 instaloader 读取公开帖子和 Profile
-- 搜索通过 Exa(免费,无需 API Key)
-- 可选 Cookie 登录解锁私密内容
+#### ~~📷 Instagram~~ (removed — upstream blocked)
+- ~~Read public posts and profiles via [instaloader](https://github.com/instaloader/instaloader)~~
+- **Removed:** Instagram's aggressive anti-scraping measures broke all available open-source tools (instaloader, etc.). See [instaloader#2585](https://github.com/instaloader/instaloader/issues/2585). Will re-add when upstream recovers.
+- **已移除:** Instagram 反爬封杀导致所有开源工具(instaloader 等)失效。上游恢复后会重新加回。
 
 #### 💼 LinkedIn
 - Read person profiles, company pages, and job details via [linkedin-scraper-mcp](https://github.com/stickerdaniel/linkedin-mcp-server)
@@ -38,12 +35,12 @@ All notable changes to this project will be documented in this file.
 
 - Channel count: 9 → 12
 - `agent-reach doctor` now detects all 12 channels
-- CLI: added `search-instagram`, `search-linkedin`, `search-bosszhipin` subcommands
+- CLI: added `search-linkedin`, `search-bosszhipin` subcommands
 - Updated install guide with setup instructions for new channels
-- 渠道数量:9 → 12
-- `agent-reach doctor` 现在检测全部 12 个渠道
-- CLI:新增 `search-instagram`、`search-linkedin`、`search-bosszhipin` 子命令
-- 安装指南新增三个渠道的配置说明
+- 渠道数量:9 → 11
+- `agent-reach doctor` 现在检测全部 11 个渠道
+- CLI:新增 `search-linkedin`、`search-bosszhipin` 子命令
+- 安装指南新增渠道配置说明
 
 ---
 
|
|||
|
|
@ -69,7 +69,6 @@ AI Agent 已经能帮你写代码、改文档、管项目——但你让它去
|
|||
| 📺 **B站** | 本地:字幕提取 + 搜索 | 服务器也能用 | 告诉 Agent「帮我配代理」 |
|
||||
| 📖 **Reddit** | 搜索(通过 Exa 免费) | 读帖子和评论 | 告诉 Agent「帮我配代理」 |
|
||||
| 📕 **小红书** | — | 阅读、搜索、发帖、评论、点赞 | 告诉 Agent「帮我配小红书」 |
|
||||
| 📷 **Instagram** | 搜索(通过 Exa 免费) | 读取帖子和 Profile | 告诉 Agent「帮我配 Instagram」 |
|
||||
| 💼 **LinkedIn** | Jina Reader 读公开页面 | Profile 详情、公司页面、职位搜索 | 告诉 Agent「帮我配 LinkedIn」 |
|
||||
| 🏢 **Boss直聘** | Jina Reader 读职位页 | 搜索职位、向 HR 打招呼 | 告诉 Agent「帮我配 Boss直聘」 |
|
||||
|
||||
|
|
@ -148,7 +147,6 @@ channels/
|
|||
├── bilibili.py → yt-dlp ← 可以换成 bilibili-api……
|
||||
├── reddit.py → JSON API + Exa ← 可以换成 PRAW、Pushshift……
|
||||
├── xiaohongshu.py → mcporter MCP ← 可以换成其他 XHS 工具……
|
||||
├── instagram.py → instaloader ← 可以换成 instagrapi、官方 API……
|
||||
├── linkedin.py → linkedin-mcp ← 可以换成 LinkedIn API……
|
||||
├── bosszhipin.py → mcp-bosszp ← 可以换成其他招聘工具……
|
||||
├── rss.py → feedparser ← 可以换成 atoma……
|
||||
|
|
@ -167,7 +165,6 @@ channels/
|
|||
| GitHub | [gh CLI](https://cli.github.com) | 官方工具,认证后完整 API 能力 |
|
||||
| 读 RSS | [feedparser](https://github.com/kurtmckee/feedparser) | Python 生态标准选择,2.3K Star |
|
||||
| 小红书 | [xiaohongshu-mcp](https://github.com/xpzouying/xiaohongshu-mcp) | ⭐9K+,Go 语言,Docker 一键部署 |
|
||||
| Instagram | [instaloader](https://github.com/instaloader/instaloader) | ⭐9.8K,Python CLI,Cookie 登录,免费 |
|
||||
| LinkedIn | [linkedin-scraper-mcp](https://github.com/stickerdaniel/linkedin-mcp-server) | ⭐900+,MCP 服务,浏览器自动化 |
|
||||
| Boss直聘 | [mcp-bosszp](https://github.com/mucsbr/mcp-bosszp) | MCP 服务,支持职位搜索和打招呼 |
|
||||
|
||||
|
|
@@ -189,7 +186,7 @@ Agent Reach 在设计上重视安全:
 
 ### 🍪 Cookie 安全建议
 
-需要 Cookie 的平台(Twitter、小红书、Instagram)建议使用**专用小号**,不要用主账号。Cookie 等同于完整登录权限,用小号可以在凭据泄露时限制影响范围。
+需要 Cookie 的平台(Twitter、小红书)建议使用**专用小号**,不要用主账号。Cookie 等同于完整登录权限,用小号可以在凭据泄露时限制影响范围。
 
 ### 📦 安装方式
 
@@ -268,14 +265,14 @@ Yes! Agent Reach is a standard CLI tool — any AI coding agent that can run she
 <details>
 <summary><strong>Is this free? Any API costs?</strong></summary>
 
-100% free. All backends are open-source tools (bird CLI, yt-dlp, Jina Reader, instaloader, Exa, etc.) that don't require paid API keys. The only optional cost is a residential proxy (~$1/month) if you need Reddit/Bilibili access from a server.
+100% free. All backends are open-source tools (bird CLI, yt-dlp, Jina Reader, Exa, etc.) that don't require paid API keys. The only optional cost is a residential proxy (~$1/month) if you need Reddit/Bilibili access from a server.
 </details>
 
 ---
 
 ## 致谢
 
-[Jina Reader](https://github.com/jina-ai/reader) · [yt-dlp](https://github.com/yt-dlp/yt-dlp) · [bird](https://www.npmjs.com/package/@steipete/bird) · [Exa](https://exa.ai) · [mcporter](https://github.com/steipete/mcporter) · [feedparser](https://github.com/kurtmckee/feedparser) · [xiaohongshu-mcp](https://github.com/xpzouying/xiaohongshu-mcp) · [instaloader](https://github.com/instaloader/instaloader) · [linkedin-scraper-mcp](https://github.com/stickerdaniel/linkedin-mcp-server) · [mcp-bosszp](https://github.com/mucsbr/mcp-bosszp)
+[Jina Reader](https://github.com/jina-ai/reader) · [yt-dlp](https://github.com/yt-dlp/yt-dlp) · [bird](https://www.npmjs.com/package/@steipete/bird) · [Exa](https://exa.ai) · [mcporter](https://github.com/steipete/mcporter) · [feedparser](https://github.com/kurtmckee/feedparser) · [xiaohongshu-mcp](https://github.com/xpzouying/xiaohongshu-mcp) · [linkedin-scraper-mcp](https://github.com/stickerdaniel/linkedin-mcp-server) · [mcp-bosszp](https://github.com/mucsbr/mcp-bosszp)
 
 ## License
 
|
|||
|
|
@ -20,7 +20,6 @@ from .rss import RSSChannel
|
|||
from .bilibili import BilibiliChannel
|
||||
from .exa_search import ExaSearchChannel
|
||||
from .xiaohongshu import XiaoHongShuChannel
|
||||
from .instagram import InstagramChannel
|
||||
from .linkedin import LinkedInChannel
|
||||
from .bosszhipin import BossZhipinChannel
|
||||
|
||||
|
|
@ -33,7 +32,6 @@ ALL_CHANNELS: List[Channel] = [
|
|||
RedditChannel(),
|
||||
BilibiliChannel(),
|
||||
XiaoHongShuChannel(),
|
||||
InstagramChannel(),
|
||||
LinkedInChannel(),
|
||||
BossZhipinChannel(),
|
||||
RSSChannel(),
|
||||
|
|
|
|||
|
|
@ -1,248 +0,0 @@
|
|||
# -*- coding: utf-8 -*-
|
||||
"""Instagram — via instaloader (free, open source).
|
||||
|
||||
Backend: instaloader (9.8K stars, Python CLI + library)
|
||||
Swap to: any Instagram access tool
|
||||
"""
|
||||
|
||||
import re
|
||||
import shutil
|
||||
import subprocess
|
||||
from pathlib import Path
|
||||
from urllib.parse import urlparse
|
||||
from .base import Channel, ReadResult, SearchResult
|
||||
from typing import List
|
||||
|
||||
|
||||
class InstagramChannel(Channel):
|
||||
name = "instagram"
|
||||
description = "Instagram 帖子和 Profile"
|
||||
backends = ["instaloader"]
|
||||
tier = 2 # Needs login for full access
|
||||
|
||||
def can_handle(self, url: str) -> bool:
|
||||
domain = urlparse(url).netloc.lower()
|
||||
return "instagram.com" in domain or "instagr.am" in domain
|
||||
|
||||
def check(self, config=None):
|
||||
# Check both CLI and Python module
|
||||
has_cli = shutil.which("instaloader")
|
||||
has_module = False
|
||||
try:
|
||||
import instaloader
|
||||
has_module = True
|
||||
except ImportError:
|
||||
pass
|
||||
|
||||
if not has_cli and not has_module:
|
||||
return "off", (
|
||||
"需要安装 instaloader:pip install instaloader\n"
|
||||
" 安装后可读取 Instagram 帖子和 Profile\n"
|
||||
" 登录: agent-reach configure instagram-cookies \"sessionid=xxx; csrftoken=yyy; ...\""
|
||||
)
|
||||
|
||||
# Check if cookies are configured
|
||||
cookie_file = Path.home() / ".agent-reach" / "instagram-cookies.txt"
|
||||
if cookie_file.exists():
|
||||
return "ok", "已登录,可读取 Instagram 帖子和 Profile"
|
||||
return "ok", "可读取公开帖子和 Profile。登录可访问更多内容:\n agent-reach configure instagram-cookies \"sessionid=xxx; csrftoken=yyy; ...\""
|
||||
|
||||
async def read(self, url: str, config=None) -> ReadResult:
|
||||
# Try instaloader (module or CLI)
|
||||
try:
|
||||
import instaloader
|
||||
return await self._read_instaloader(url, config)
|
||||
except ImportError:
|
||||
pass
|
||||
# Fallback: Jina Reader
|
||||
return await self._read_jina(url)
|
||||
|
||||
async def _read_instaloader(self, url: str, config=None) -> ReadResult:
|
||||
"""Read Instagram content using instaloader Python API."""
|
||||
import asyncio
|
||||
import concurrent.futures
|
||||
|
||||
def _sync_read():
|
||||
import instaloader
|
||||
L = instaloader.Instaloader(
|
||||
download_pictures=False,
|
||||
download_videos=False,
|
||||
download_video_thumbnails=False,
|
||||
download_geotags=False,
|
||||
download_comments=False,
|
||||
save_metadata=False,
|
||||
compress_json=False,
|
||||
max_connection_attempts=1, # Don't retry on rate limit
|
||||
)
|
||||
|
||||
# Try to load session: cookie file > saved session
|
||||
cookie_file = Path.home() / ".agent-reach" / "instagram-cookies.txt"
|
||||
if cookie_file.exists():
|
||||
try:
|
||||
cookie_str = cookie_file.read_text().strip()
|
||||
cookies = {}
|
||||
for part in cookie_str.split(";"):
|
||||
part = part.strip()
|
||||
if "=" in part:
|
||||
k, v = part.split("=", 1)
|
||||
cookies[k.strip()] = v.strip()
|
||||
if "sessionid" in cookies and "csrftoken" in cookies:
|
||||
# Extract username from ds_user_id or use generic
|
||||
username = cookies.get("ds_user_id", "user")
|
||||
L.context.load_session(username, cookies)
|
||||
except Exception:
|
||||
pass
|
||||
elif config and config.get("instagram_username"):
|
||||
try:
|
||||
L.load_session_from_file(config.get("instagram_username"))
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
path = urlparse(url).path.strip("/")
|
||||
|
||||
if "/p/" in url or "/reel/" in url:
|
||||
return self._read_post_sync(L, url, path)
|
||||
else:
|
||||
return self._read_profile_sync(L, url, path)
|
||||
|
||||
try:
|
||||
# Run with 15s timeout to avoid instaloader's 30-min retry
|
||||
loop = asyncio.get_event_loop()
|
||||
with concurrent.futures.ThreadPoolExecutor() as pool:
|
||||
result = await asyncio.wait_for(
|
||||
loop.run_in_executor(pool, _sync_read),
|
||||
timeout=15,
|
||||
)
|
||||
return result
|
||||
except (asyncio.TimeoutError, Exception):
|
||||
# Any error or timeout → Jina fallback
|
||||
return await self._read_jina(url)
|
||||
|
||||
def _read_post_sync(self, L, url: str, path: str) -> ReadResult:
|
||||
"""Read a single Instagram post (sync, runs in executor)."""
|
||||
import instaloader
|
||||
|
||||
# Extract shortcode from URL
|
||||
match = re.search(r"/(?:p|reel)/([A-Za-z0-9_-]+)", url)
|
||||
if not match:
|
||||
raise ValueError("Cannot extract shortcode from URL")
|
||||
|
||||
shortcode = match.group(1)
|
||||
try:
|
||||
post = instaloader.Post.from_shortcode(L.context, shortcode)
|
||||
|
||||
lines = []
|
||||
if post.caption:
|
||||
lines.append(post.caption)
|
||||
lines.append("")
|
||||
lines.append(f"👤 @{post.owner_username}")
|
||||
lines.append(f"❤️ {post.likes} likes")
|
||||
if post.comments:
|
||||
lines.append(f"💬 {post.comments} comments")
|
||||
lines.append(f"📅 {post.date_utc.strftime('%Y-%m-%d %H:%M')}")
|
||||
if post.location:
|
||||
lines.append(f"📍 {post.location}")
|
||||
if post.hashtags:
|
||||
lines.append(f"#️⃣ {' '.join('#' + h for h in post.hashtags)}")
|
||||
|
||||
return ReadResult(
|
||||
title=f"@{post.owner_username}: {(post.caption or '')[:80]}",
|
||||
content="\n".join(lines),
|
||||
url=url,
|
||||
author=f"@{post.owner_username}",
|
||||
date=post.date_utc.strftime("%Y-%m-%d"),
|
||||
platform="instagram",
|
||||
extra={"likes": post.likes, "comments": post.comments},
|
||||
)
|
||||
except Exception:
|
||||
raise # Let executor timeout handle fallback
|
||||
|
||||
def _read_profile_sync(self, L, url: str, path: str) -> ReadResult:
|
||||
"""Read an Instagram profile (sync, runs in executor)."""
|
||||
import instaloader
|
||||
|
||||
# Extract username from path
|
||||
username = path.split("/")[0] if path else ""
|
||||
if not username or username in ("p", "reel", "stories", "explore"):
|
||||
raise ValueError("Cannot extract username from URL")
|
||||
|
||||
try:
|
||||
profile = instaloader.Profile.from_username(L.context, username)
|
||||
|
||||
lines = []
|
||||
lines.append(f"👤 {profile.full_name} (@{profile.username})")
|
||||
if profile.biography:
|
||||
lines.append(f"📝 {profile.biography}")
|
||||
if profile.external_url:
|
||||
lines.append(f"🔗 {profile.external_url}")
|
||||
lines.append("")
|
||||
lines.append(f"📊 {profile.mediacount} posts · "
|
||||
f"{profile.followers} followers · "
|
||||
f"{profile.followees} following")
|
||||
if profile.is_verified:
|
||||
lines.append("✅ Verified")
|
||||
if profile.is_business_account and profile.business_category_name:
|
||||
lines.append(f"🏢 {profile.business_category_name}")
|
||||
|
||||
# Get recent posts (up to 5)
|
||||
lines.append("")
|
||||
lines.append("📸 Recent posts:")
|
||||
count = 0
|
||||
for post in profile.get_posts():
|
||||
if count >= 5:
|
||||
break
|
||||
caption = (post.caption or "")[:100].replace("\n", " ")
|
||||
lines.append(f" • ❤️{post.likes} | {post.date_utc.strftime('%m-%d')} | {caption}")
|
||||
count += 1
|
||||
|
||||
return ReadResult(
|
||||
title=f"{profile.full_name} (@{profile.username}) - Instagram",
|
||||
content="\n".join(lines),
|
||||
url=url,
|
||||
author=f"@{profile.username}",
|
||||
platform="instagram",
|
||||
extra={
|
||||
"followers": profile.followers,
|
||||
"posts": profile.mediacount,
|
||||
},
|
||||
)
|
||||
except Exception:
|
||||
raise # Let executor timeout handle fallback
|
||||
|
||||
async def _read_jina(self, url: str) -> ReadResult:
|
||||
"""Fallback: use Jina Reader."""
|
||||
import requests
|
||||
try:
|
||||
resp = requests.get(
|
||||
f"https://r.jina.ai/{url}",
|
||||
headers={"Accept": "text/markdown"},
|
||||
timeout=15,
|
||||
)
|
||||
resp.raise_for_status()
|
||||
text = resp.text
|
||||
return ReadResult(
|
||||
title=text[:100] if text else url,
|
||||
content=text,
|
||||
url=url,
|
||||
platform="instagram",
|
||||
)
|
||||
except Exception:
|
||||
return ReadResult(
|
||||
title="Instagram",
|
||||
content=(
|
||||
f"⚠️ 无法读取此 Instagram 内容: {url}\n\n"
|
||||
"提示:\n"
|
||||
"- 确保 URL 正确\n"
|
||||
"- 安装 instaloader: pip install instaloader\n"
|
||||
"- 登录以访问更多内容: instaloader --login YOUR_USERNAME"
|
||||
),
|
||||
url=url,
|
||||
platform="instagram",
|
||||
)
|
||||
|
||||
async def search(self, query: str, config=None, **kwargs) -> List[SearchResult]:
|
||||
"""Search Instagram via Exa."""
|
||||
limit = kwargs.get("limit", 10)
|
||||
from agent_reach.channels.exa_search import ExaSearchChannel
|
||||
exa = ExaSearchChannel()
|
||||
return await exa.search(f"site:instagram.com {query}", config=config, limit=limit)
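Review note: the removed `_read_instaloader` ran a blocking instaloader call in a thread pool under a 15-second timeout and fell back to Jina Reader on any failure. If the channel is re-added once upstream recovers, the same pattern could be sketched as below. This is a minimal sketch under stated assumptions, not the project's code: `run_with_fallback`, `blocking_fn`, `fallback`, and `slow_backend` are hypothetical names. It uses the event loop's default executor instead of a `with ThreadPoolExecutor()` block, because leaving that block waits for the worker thread to finish even after the timeout has fired.

```python
import asyncio
import time
from typing import Callable, TypeVar

T = TypeVar("T")


async def run_with_fallback(
    blocking_fn: Callable[[], T],
    fallback: Callable[[], T],
    timeout: float = 15.0,
) -> T:
    """Run blocking_fn in a worker thread; use fallback on timeout or error."""
    loop = asyncio.get_running_loop()
    try:
        # Passing None selects the event loop's default thread pool.
        future = loop.run_in_executor(None, blocking_fn)
        return await asyncio.wait_for(future, timeout=timeout)
    except Exception:
        # Covers asyncio.TimeoutError and anything blocking_fn raises.
        # The worker thread is not killed; it finishes in the background.
        return fallback()


def slow_backend() -> str:
    """Hypothetical stand-in for a backend stuck in a long retry loop."""
    time.sleep(0.3)
    return "primary"


if __name__ == "__main__":
    print(asyncio.run(run_with_fallback(slow_backend, lambda: "fallback", timeout=0.05)))
```

The timeout only stops the caller from waiting. The worker thread itself keeps running until the blocking call returns, so the backend should still be configured not to retry for long (the removed code passed `max_connection_attempts=1` for that reason).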
|
||||
|
|
@ -89,11 +89,6 @@ def main():
|
|||
p_sx.add_argument("query", nargs="+", help="Search query")
|
||||
p_sx.add_argument("-n", "--num", type=int, default=10, help="Number of results")
|
||||
|
||||
# ── search-instagram ──
|
||||
p_si = sub.add_parser("search-instagram", help="Search Instagram")
|
||||
p_si.add_argument("query", nargs="+", help="Search query")
|
||||
p_si.add_argument("-n", "--num", type=int, default=10, help="Number of results")
|
||||
|
||||
# ── search-linkedin ──
|
||||
p_sl = sub.add_parser("search-linkedin", help="Search LinkedIn")
|
||||
p_sl.add_argument("query", nargs="+", help="Search query")
|
||||
|
|
@ -122,8 +117,7 @@ def main():
|
|||
p_conf = sub.add_parser("configure", help="Set a config value or auto-extract from browser")
|
||||
p_conf.add_argument("key", nargs="?", default=None,
|
||||
choices=["proxy", "github-token", "groq-key",
|
||||
"twitter-cookies", "youtube-cookies",
|
||||
"instagram-cookies"],
|
||||
"twitter-cookies", "youtube-cookies"],
|
||||
help="What to configure (omit if using --from-browser)")
|
||||
p_conf.add_argument("value", nargs="*", help="The value(s) to set")
|
||||
p_conf.add_argument("--from-browser", metavar="BROWSER",
|
||||
|
|
@ -436,23 +430,6 @@ def _install_system_deps():
|
|||
except Exception:
|
||||
print(" ⬜ undici install failed (optional — bird may not work behind proxies)")
|
||||
|
||||
# ── instaloader (for Instagram) ──
|
||||
if shutil.which("instaloader"):
|
||||
print(" ✅ instaloader already installed")
|
||||
else:
|
||||
print(" 📥 Installing instaloader...")
|
||||
try:
|
||||
subprocess.run(
|
||||
[sys.executable, "-m", "pip", "install", "instaloader"],
|
||||
capture_output=True, text=True, timeout=120,
|
||||
)
|
||||
if shutil.which("instaloader"):
|
||||
print(" ✅ instaloader installed (Instagram reading)")
|
||||
else:
|
||||
print(" ⬜ instaloader install failed (optional — try: pip install instaloader)")
|
||||
except Exception:
|
||||
print(" ⬜ instaloader install failed (optional — try: pip install instaloader)")
|
||||
|
||||
|
||||
def _install_system_deps_safe():
|
||||
"""Safe mode: check what's installed, print instructions for what's missing."""
|
||||
|
|
@ -464,7 +441,6 @@ def _install_system_deps_safe():
|
|||
("gh", ["gh"], "GitHub CLI", "https://cli.github.com — or: apt install gh / brew install gh"),
|
||||
("node", ["node", "npm"], "Node.js", "https://nodejs.org — or: apt install nodejs npm"),
|
||||
("bird", ["bird", "birdx"], "bird CLI (Twitter)", "npm install -g @steipete/bird"),
|
||||
("instaloader", ["instaloader"], "instaloader (Instagram)", "pip install instaloader"),
|
||||
]
|
||||
|
||||
missing = []
|
||||
|
|
@ -495,7 +471,6 @@ def _install_system_deps_dryrun():
|
|||
("gh CLI", ["gh"], "apt install gh / brew install gh"),
|
||||
("Node.js", ["node"], "curl NodeSource setup | bash + apt install nodejs"),
|
||||
("bird CLI", ["bird", "birdx"], "npm install -g @steipete/bird"),
|
||||
("instaloader", ["instaloader"], "pip install instaloader"),
|
||||
]
|
||||
|
||||
for label, binaries, method in checks:
|
||||
|
|
@ -764,9 +739,6 @@ def _cmd_configure(args):
|
|||
config.set("groq_api_key", value)
|
||||
print(f"✅ Groq key configured!")
|
||||
|
||||
elif args.key == "instagram-cookies":
|
||||
_configure_instagram_cookies(value)
|
||||
|
||||
|
||||
def _cmd_doctor():
|
||||
from agent_reach.config import Config
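Review note: because this commit also drops `instagram-cookies` from the `choices` list of the `configure` subcommand, argparse itself should reject the old key before `_cmd_configure` is ever reached. The sketch below rebuilds only that part of the parser to illustrate the behaviour. `build_parser` is a hypothetical helper, and the `--from-browser` option and all other subcommands are left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Trimmed-down parser containing only the `configure` subcommand."""
    parser = argparse.ArgumentParser(prog="agent-reach")
    sub = parser.add_subparsers(dest="command")
    p_conf = sub.add_parser("configure", help="Set a config value")
    p_conf.add_argument(
        "key",
        nargs="?",
        default=None,
        # Key list as it stands after this commit (no "instagram-cookies").
        choices=["proxy", "github-token", "groq-key",
                 "twitter-cookies", "youtube-cookies"],
        help="What to configure",
    )
    p_conf.add_argument("value", nargs="*", help="The value(s) to set")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["configure", "proxy", "http://127.0.0.1:8080"])
    print(args.key, args.value)
```

With this parser, `configure instagram-cookies ...` ends in an argparse usage error (exit status 2) rather than silently doing nothing, which is probably the friendlier failure mode for users following older instructions.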
|
||||
|
|
@ -787,30 +759,6 @@ def _parse_cookie_header(cookie_str: str) -> dict:
|
|||
return cookies
|
||||
|
||||
|
||||
def _configure_instagram_cookies(value: str):
|
||||
"""Save Instagram cookies from Cookie-Editor Header String."""
|
||||
from pathlib import Path
|
||||
|
||||
cookies = _parse_cookie_header(value)
|
||||
if "sessionid" not in cookies:
|
||||
print("❌ Cookie 里缺少 sessionid。")
|
||||
print(" 确保你已登录 Instagram,然后用 Cookie-Editor 导出 Header String。")
|
||||
print(' 格式: agent-reach configure instagram-cookies "sessionid=xxx; csrftoken=yyy; ..."')
|
||||
return
|
||||
|
||||
cookie_dir = Path.home() / ".agent-reach"
|
||||
cookie_dir.mkdir(parents=True, exist_ok=True)
|
||||
cookie_file = cookie_dir / "instagram-cookies.txt"
|
||||
cookie_file.write_text(value.strip())
|
||||
cookie_file.chmod(0o600)
|
||||
|
||||
print(f"✅ Instagram cookies 已保存!")
|
||||
print(f" sessionid: {cookies['sessionid'][:8]}...")
|
||||
if "csrftoken" in cookies:
|
||||
print(f" csrftoken: ✅")
|
||||
if "ds_user_id" in cookies:
|
||||
print(f" ds_user_id: {cookies['ds_user_id']}")
|
||||
print(f" 文件: {cookie_file}")
|
||||
|
||||
|
||||
def _cmd_setup():
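Review note: the removed `_configure_instagram_cookies` parsed a Cookie-Editor header string, required `sessionid`, and wrote the raw string to a file that it then restricted with `chmod(0o600)`. For the cookie-based channels that remain, the same flow can be sketched as follows. This is an illustrative sketch only: the body of the project's `_parse_cookie_header` is not shown in this diff, so `parse_cookie_header` mirrors the loop from the removed `instagram.py`, and `save_secret` is a hypothetical helper that creates the file with owner-only permissions up front instead of tightening them after the write.

```python
import os
from pathlib import Path
from typing import Dict


def parse_cookie_header(cookie_str: str) -> Dict[str, str]:
    """Split "k1=v1; k2=v2" into a dict, skipping parts without "="."""
    cookies: Dict[str, str] = {}
    for part in cookie_str.split(";"):
        part = part.strip()
        if "=" in part:
            key, value = part.split("=", 1)  # values may themselves contain "="
            cookies[key.strip()] = value.strip()
    return cookies


def save_secret(path: Path, text: str) -> None:
    """Write text to path with owner-only (0600) permissions."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The mode argument only applies when the file is newly created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text.strip())
    # Tighten permissions for a pre-existing file as well.
    os.chmod(path, 0o600)
```

Creating the file with mode `0o600` closes the short window in which a write-then-chmod sequence leaves the cookie readable under the default umask. The permission bits behave as described on POSIX systems; on Windows `os.chmod` has only limited effect.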
|
||||
|
|
@ -952,8 +900,6 @@ async def _cmd_search(args):
|
|||
results = await eyes.search_bilibili(query, limit=num)
|
||||
elif args.command == "search-xhs":
|
||||
results = await eyes.search_xhs(query, limit=num)
|
||||
elif args.command == "search-instagram":
|
||||
results = await eyes.search_instagram(query, limit=num)
|
||||
elif args.command == "search-linkedin":
|
||||
results = await eyes.search_linkedin(query, limit=num)
|
||||
elif args.command == "search-bosszhipin":
|
||||
|
|
|
|||
|
|
@ -101,12 +101,6 @@ class AgentReach:
|
|||
results = await ch.search(query, config=self.config, limit=limit)
|
||||
return [r.to_dict() for r in results]
|
||||
|
||||
async def search_instagram(self, query: str, limit: int = 10) -> List[Dict[str, Any]]:
|
||||
"""Search Instagram via Exa."""
|
||||
ch = get_channel("instagram")
|
||||
results = await ch.search(query, config=self.config, limit=limit)
|
||||
return [r.to_dict() for r in results]
|
||||
|
||||
async def search_linkedin(self, query: str, limit: int = 10) -> List[Dict[str, Any]]:
|
||||
"""Search LinkedIn via MCP or Exa."""
|
||||
ch = get_channel("linkedin")
|
||||
|
|
|
|||
|
|
@ -2,11 +2,11 @@
|
|||
name: agent-reach
|
||||
description: >
|
||||
Give your AI agent eyes to see the entire internet. Read and search across
|
||||
Twitter/X, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu, Instagram, LinkedIn,
|
||||
Twitter/X, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu, LinkedIn,
|
||||
Boss直聘, RSS, and any web page — all from a single CLI.
|
||||
Use when: (1) reading content from URLs (tweets, Reddit posts, articles, videos),
|
||||
(2) searching across platforms (web, Twitter, Reddit, GitHub, YouTube, Bilibili,
|
||||
XiaoHongShu, Instagram, LinkedIn, Boss直聘),
|
||||
XiaoHongShu, LinkedIn, Boss直聘),
|
||||
(3) user asks to configure/enable a platform channel,
|
||||
(4) checking channel health or updating Agent Reach.
|
||||
Triggers: "search Twitter/Reddit/YouTube", "read this URL", "find posts about",
|
||||
|
|
@@ -31,7 +31,7 @@ pip install https://github.com/Panniantong/agent-reach/archive/main.zip
 agent-reach install --env=auto
 ```
 
-`install` auto-detects your environment and installs core dependencies (Node.js, mcporter, bird CLI, gh CLI, instaloader). Read the output and run `agent-reach doctor` to see what's active.
+`install` auto-detects your environment and installs core dependencies (Node.js, mcporter, bird CLI, gh CLI). Read the output and run `agent-reach doctor` to see what's active.
 
 ## Commands
 
@@ -40,7 +40,7 @@ agent-reach install --env=auto
 agent-reach read <url>
 agent-reach read <url> --json # structured output
 ```
-Handles: tweets, Reddit posts, articles, YouTube/Bilibili (transcripts), GitHub repos, Instagram posts, LinkedIn profiles, Boss直聘 jobs, XiaoHongShu notes, RSS feeds, and any web page.
+Handles: tweets, Reddit posts, articles, YouTube/Bilibili (transcripts), GitHub repos, LinkedIn profiles, Boss直聘 jobs, XiaoHongShu notes, RSS feeds, and any web page.
 
 ### Search
 
@ -52,7 +52,6 @@ agent-reach search-github "query" # GitHub (--lang <language>)
|
|||
agent-reach search-youtube "query" # YouTube
|
||||
agent-reach search-bilibili "query" # Bilibili (B站)
|
||||
agent-reach search-xhs "query" # XiaoHongShu (小红书)
|
||||
agent-reach search-instagram "query" # Instagram
|
||||
agent-reach search-linkedin "query" # LinkedIn
|
||||
agent-reach search-bosszhipin "query" # Boss直聘
|
||||
```
|
||||
|
|
@ -71,7 +70,6 @@ agent-reach check-update # check for new versions
|
|||
|
||||
```bash
|
||||
agent-reach configure twitter-cookies "auth_token=xxx; ct0=yyy"
|
||||
agent-reach configure instagram-cookies "sessionid=xxx; csrftoken=yyy; ..."
|
||||
agent-reach configure proxy http://user:pass@ip:port
|
||||
agent-reach configure --from-browser chrome # auto-extract cookies from local browser
|
||||
```
|
||||
|
|
|
|||
|
|
@ -58,7 +58,6 @@ Copy that to your Agent. A few minutes later, it can read tweets, search Reddit,
|
|||
| 🌐 **Web** | Read | Zero config | Any URL → clean Markdown ([Jina Reader](https://github.com/jina-ai/reader) ⭐9.8K) |
|
||||
| 🐦 **Twitter/X** | Read · Search | Zero config / Cookie | Single tweets readable out of the box. Cookie unlocks search, timeline, posting ([bird](https://github.com/steipete/bird)) |
|
||||
| 📕 **XiaoHongShu** | Read · Search · **Post · Comment · Like** | mcporter | Via [xiaohongshu-mcp](https://github.com/user/xiaohongshu-mcp) internal API, install and go |
|
||||
| 📷 **Instagram** | Search (via Exa) | Read posts and profiles | Tell your Agent "help me set up Instagram" |
|
||||
| 💼 **LinkedIn** | Jina Reader (public pages) | Full profiles, companies, job search | Tell your Agent "help me set up LinkedIn" |
|
||||
| 🏢 **Boss直聘** | Jina Reader (job pages) | Job search, greet recruiters | Tell your Agent "help me set up Boss直聘" |
|
||||
| 🔍 **Web Search** | Search | Auto-configured | Auto-configured during install, free, no API key ([Exa](https://exa.ai) via [mcporter](https://github.com/nicepkg/mcporter)) |
|
||||
|
|
@ -185,7 +184,6 @@ channels/
|
|||
├── bilibili.py → yt-dlp ← swap to bilibili-api…
|
||||
├── reddit.py → JSON API + Exa ← swap to PRAW, Pushshift…
|
||||
├── xiaohongshu.py → mcporter MCP ← swap to other XHS tools…
|
||||
├── instagram.py → instaloader ← swap to instagrapi, official API…
|
||||
├── linkedin.py → linkedin-mcp ← swap to LinkedIn API…
|
||||
├── bosszhipin.py → mcp-bosszp ← swap to other job tools…
|
||||
├── rss.py → feedparser ← swap to atoma…
|
||||
|
|
@ -204,7 +202,6 @@ channels/
|
|||
| GitHub | [gh CLI](https://cli.github.com) | Official tool, full API after auth |
|
||||
| Read RSS | [feedparser](https://github.com/kurtmckee/feedparser) | Python ecosystem standard, 2.3K stars |
|
||||
| XiaoHongShu | [xiaohongshu-mcp](https://github.com/user/xiaohongshu-mcp) | Internal API, bypasses anti-bot |
|
||||
| Instagram | [instaloader](https://github.com/instaloader/instaloader) | 9.8K stars, Python CLI, cookie auth, free |
|
||||
| LinkedIn | [linkedin-scraper-mcp](https://github.com/stickerdaniel/linkedin-mcp-server) | 900+ stars, MCP server, browser automation |
|
||||
| Boss直聘 | [mcp-bosszp](https://github.com/mucsbr/mcp-bosszp) | MCP server, job search + recruiter greeting |
|
||||
|
||||
|
|
@@ -253,7 +250,7 @@ Yes! Agent Reach is a standard CLI tool. Any AI coding agent that can execute sh
 <details>
 <summary><strong>Is Agent Reach free? Any API costs?</strong></summary>
 
-100% free and open source. All backends (bird CLI, yt-dlp, Jina Reader, instaloader, Exa) are free tools that don't require paid API keys. The only optional cost is a residential proxy (~$1/month) if you need Reddit/Bilibili access from a server.
+100% free and open source. All backends (bird CLI, yt-dlp, Jina Reader, Exa) are free tools that don't require paid API keys. The only optional cost is a residential proxy (~$1/month) if you need Reddit/Bilibili access from a server.
 </details>
 
 <details>
@@ -272,7 +269,7 @@ Agent Reach integrates with xiaohongshu-mcp (runs in Docker). After setup, use `
 
 ## Credits
 
-[Jina Reader](https://github.com/jina-ai/reader) · [yt-dlp](https://github.com/yt-dlp/yt-dlp) · [bird](https://github.com/steipete/bird) · [Exa](https://exa.ai) · [feedparser](https://github.com/kurtmckee/feedparser) · [instaloader](https://github.com/instaloader/instaloader) · [linkedin-scraper-mcp](https://github.com/stickerdaniel/linkedin-mcp-server) · [mcp-bosszp](https://github.com/mucsbr/mcp-bosszp)
+[Jina Reader](https://github.com/jina-ai/reader) · [yt-dlp](https://github.com/yt-dlp/yt-dlp) · [bird](https://github.com/steipete/bird) · [Exa](https://exa.ai) · [feedparser](https://github.com/kurtmckee/feedparser) · [linkedin-scraper-mcp](https://github.com/stickerdaniel/linkedin-mcp-server) · [mcp-bosszp](https://github.com/mucsbr/mcp-bosszp)
 
 ## License
 
|
|||
|
|
@@ -80,7 +80,7 @@ Only ask the user when you genuinely need their input (credentials, permissions,
 
 Some channels need credentials only the user can provide. Based on the doctor output, ask for what's missing:
 
-> 🔒 **Security tip:** For platforms that need cookies (Twitter, XiaoHongShu, Instagram), we recommend using a **dedicated/secondary account** rather than your main account. Cookie-based auth grants full account access — using a separate account limits the blast radius if credentials are ever compromised.
+> 🔒 **Security tip:** For platforms that need cookies (Twitter, XiaoHongShu), we recommend using a **dedicated/secondary account** rather than your main account. Cookie-based auth grants full account access — using a separate account limits the blast radius if credentials are ever compromised.
 
 **Twitter search & posting (server users):**
 > "To unlock Twitter search, I need your Twitter cookies. Install the Cookie-Editor Chrome extension, go to x.com/twitter.com, click the extension → Export → Header String, and paste it to me."
@ -125,20 +125,6 @@ mcporter config add xiaohongshu http://localhost:18060/mcp
|
|||
> - **本地电脑(有浏览器):** 打开 http://localhost:18060 扫码登录即可。
|
||||
> - **服务器(无 UI 界面):** 服务器上通常没有浏览器,无法直接扫码。最方便的方式是在自己的电脑上用浏览器登录小红书,然后用 [Cookie-Editor](https://chromewebstore.google.com/detail/cookie-editor/hlkenndednhfkekhgcdicdfddnkalmdm) 插件导出 Cookie(Header String 格式),发给 Agent 即可完成配置。详见 [Cookie 导出指南](cookie-export.md)。
|
||||
|
||||
**Instagram (需要 instaloader):**
|
||||
> "Instagram 需要 instaloader。我来帮你安装。"
|
||||
|
||||
```bash
|
||||
pip install instaloader
|
||||
```
|
||||
|
||||
> **登录方式(解锁私密内容):**
|
||||
> - **方法 1(推荐):Cookie-Editor 导入:** 在浏览器登录 Instagram → 用 [Cookie-Editor](https://chromewebstore.google.com/detail/cookie-editor/hlkenndednhfkekhgcdicdfddnkalmdm) 导出 Header String → 粘贴:
|
||||
> ```bash
|
||||
> agent-reach configure instagram-cookies "sessionid=xxx; csrftoken=yyy; ..."
|
||||
> ```
|
||||
> - **方法 2:instaloader 命令行登录:** `instaloader --login YOUR_USERNAME`(需要输密码,有 2FA 的话还要输验证码)
|
||||
|
||||
**LinkedIn (可选 — linkedin-scraper-mcp):**
|
||||
> "LinkedIn 基本内容可通过 Jina Reader 读取。完整功能(Profile 详情、职位搜索)需要 linkedin-scraper-mcp。"
|
||||
|
||||
|
|
@ -250,6 +236,5 @@ If the user wants a different agent to handle it, let them choose.
|
|||
| `agent-reach search-youtube "query"` | Search YouTube |
|
||||
| `agent-reach search-bilibili "query"` | Search Bilibili |
|
||||
| `agent-reach search-xhs "query"` | Search XiaoHongShu |
|
||||
| `agent-reach search-instagram "query"` | Search Instagram |
|
||||
| `agent-reach search-linkedin "query"` | Search LinkedIn |
|
||||
| `agent-reach search-bosszhipin "query"` | Search Boss直聘 |
|
||||
|
|
|
|||
|
|
@ -68,16 +68,3 @@ bird search "test"
|
|||
**解决方案:**
|
||||
- **本地电脑:** 正常使用,一般不会被拦
|
||||
- **服务器:** 使用 Jina Reader 读取职位页面 + Exa 搜索职位信息作为替代
|
||||
|
||||
---
|
||||
|
||||
## Instagram: Checkpoint / 安全验证
|
||||
|
||||
**症状:** `instaloader --login` 触发 Instagram 安全验证
|
||||
|
||||
**原因:** Instagram 检测到从未见过的设备/位置登录。
|
||||
|
||||
**解决方案:**
|
||||
1. 在自己的浏览器登录 Instagram
|
||||
2. 用 Cookie-Editor 导出 Cookie
|
||||
3. 配置:`agent-reach configure instagram-cookies "sessionid=xxx; csrftoken=yyy; ..."`
|
||||
|
|
|
|||
|
|
@ -10,7 +10,7 @@ keywords = [
|
|||
"ai-agent", "llm-tools", "agent-infrastructure", "mcp",
|
||||
"web-reader", "web-scraper", "search",
|
||||
"twitter-scraper", "reddit-scraper", "youtube-transcript",
|
||||
"bilibili", "xiaohongshu", "instagram",
|
||||
"bilibili", "xiaohongshu",
|
||||
"ai-search", "cli", "automation",
|
||||
"claude-code", "cursor", "openai",
|
||||
"free-api", "no-api-key",
|
||||
|
|
|
|||