From f70711e75e52286abe09f1bf0e451d367daad14d Mon Sep 17 00:00:00 2001
From: Panniantong
Date: Thu, 26 Feb 2026 07:20:13 +0100
Subject: [PATCH] remove(instagram): remove the Instagram channel
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Instagram's anti-scraping blocking has broken all open-source tools
(instaloader, etc.); they no longer work with or without cookies.

- Delete the instagram.py channel file
- Remove the search-instagram, configure instagram-cookies, and related CLI commands
- Remove the instaloader dependency check from setup/doctor
- Update README, docs, SKILL.md, and pyproject.toml

Upstream issues: instaloader#2585, instaloader#2648
Relates to: #13
---
 CHANGELOG.md                      |  21 ++-
 README.md                         |   9 +-
 agent_reach/channels/__init__.py  |   2 -
 agent_reach/channels/instagram.py | 248 ------------------------------
 agent_reach/cli.py                |  56 +------
 agent_reach/core.py               |   6 -
 agent_reach/skill/SKILL.md        |  10 +-
 docs/README_en.md                 |   7 +-
 docs/install.md                   |  17 +-
 docs/troubleshooting.md           |  13 --
 pyproject.toml                    |   2 +-
 11 files changed, 21 insertions(+), 370 deletions(-)
 delete mode 100644 agent_reach/channels/instagram.py

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1fa1840..aa77912 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,13 +10,10 @@ All notable changes to this project will be documented in this file.
 
 ### 🆕 New Channels / 新增渠道
 
-#### 📷 Instagram
-- Read public posts and profiles via [instaloader](https://github.com/instaloader/instaloader)
-- Search via Exa (free, no API key)
-- Optional cookie login for private content
-- 通过 instaloader 读取公开帖子和 Profile
-- 搜索通过 Exa(免费,无需 API Key)
-- 可选 Cookie 登录解锁私密内容
+#### ~~📷 Instagram~~ (removed — upstream blocked)
+- ~~Read public posts and profiles via [instaloader](https://github.com/instaloader/instaloader)~~
+- **Removed:** Instagram's aggressive anti-scraping measures broke all available open-source tools (instaloader, etc.). See [instaloader#2585](https://github.com/instaloader/instaloader/issues/2585). Will re-add when upstream recovers.
+- **已移除:** Instagram 反爬封杀导致所有开源工具(instaloader 等)失效。上游恢复后会重新加回。
 
 #### 💼 LinkedIn
 - Read person profiles, company pages, and job details via [linkedin-scraper-mcp](https://github.com/stickerdaniel/linkedin-mcp-server)
@@ -38,12 +35,12 @@ All notable changes to this project will be documented in this file.
 
 - Channel count: 9 → 12
 - `agent-reach doctor` now detects all 12 channels
-- CLI: added `search-instagram`, `search-linkedin`, `search-bosszhipin` subcommands
+- CLI: added `search-linkedin`, `search-bosszhipin` subcommands
 - Updated install guide with setup instructions for new channels
-- 渠道数量:9 → 12
-- `agent-reach doctor` 现在检测全部 12 个渠道
-- CLI:新增 `search-instagram`、`search-linkedin`、`search-bosszhipin` 子命令
-- 安装指南新增三个渠道的配置说明
+- 渠道数量:9 → 11
+- `agent-reach doctor` 现在检测全部 11 个渠道
+- CLI:新增 `search-linkedin`、`search-bosszhipin` 子命令
+- 安装指南新增渠道配置说明
 
 ---
 
diff --git a/README.md b/README.md
index 6cb1327..0f428b6 100644
--- a/README.md
+++ b/README.md
@@ -69,7 +69,6 @@ AI Agent 已经能帮你写代码、改文档、管项目——但你让它去
 | 📺 **B站** | 本地:字幕提取 + 搜索 | 服务器也能用 | 告诉 Agent「帮我配代理」 |
 | 📖 **Reddit** | 搜索(通过 Exa 免费) | 读帖子和评论 | 告诉 Agent「帮我配代理」 |
 | 📕 **小红书** | — | 阅读、搜索、发帖、评论、点赞 | 告诉 Agent「帮我配小红书」 |
-| 📷 **Instagram** | 搜索(通过 Exa 免费) | 读取帖子和 Profile | 告诉 Agent「帮我配 Instagram」 |
 | 💼 **LinkedIn** | Jina Reader 读公开页面 | Profile 详情、公司页面、职位搜索 | 告诉 Agent「帮我配 LinkedIn」 |
 | 🏢 **Boss直聘** | Jina Reader 读职位页 | 搜索职位、向 HR 打招呼 | 告诉 Agent「帮我配 Boss直聘」 |
 
@@ -148,7 +147,6 @@ channels/
 ├── bilibili.py → yt-dlp ← 可以换成 bilibili-api……
 ├── reddit.py → JSON API + Exa ← 可以换成 PRAW、Pushshift……
 ├── xiaohongshu.py → mcporter MCP ← 可以换成其他 XHS 工具……
-├── instagram.py → instaloader ← 可以换成 instagrapi、官方 API……
 ├── linkedin.py → linkedin-mcp ← 可以换成 LinkedIn API……
 ├── bosszhipin.py → mcp-bosszp ← 可以换成其他招聘工具……
 ├── rss.py → feedparser ← 可以换成 atoma……
@@ -167,7 +165,6 @@ channels/
 | GitHub | [gh CLI](https://cli.github.com) | 官方工具,认证后完整 API 能力 |
 | 读 RSS | [feedparser](https://github.com/kurtmckee/feedparser) | Python 生态标准选择,2.3K Star |
 | 小红书 | 
[xiaohongshu-mcp](https://github.com/xpzouying/xiaohongshu-mcp) | ⭐9K+,Go 语言,Docker 一键部署 | -| Instagram | [instaloader](https://github.com/instaloader/instaloader) | ⭐9.8K,Python CLI,Cookie 登录,免费 | | LinkedIn | [linkedin-scraper-mcp](https://github.com/stickerdaniel/linkedin-mcp-server) | ⭐900+,MCP 服务,浏览器自动化 | | Boss直聘 | [mcp-bosszp](https://github.com/mucsbr/mcp-bosszp) | MCP 服务,支持职位搜索和打招呼 | @@ -189,7 +186,7 @@ Agent Reach 在设计上重视安全: ### 🍪 Cookie 安全建议 -需要 Cookie 的平台(Twitter、小红书、Instagram)建议使用**专用小号**,不要用主账号。Cookie 等同于完整登录权限,用小号可以在凭据泄露时限制影响范围。 +需要 Cookie 的平台(Twitter、小红书)建议使用**专用小号**,不要用主账号。Cookie 等同于完整登录权限,用小号可以在凭据泄露时限制影响范围。 ### 📦 安装方式 @@ -268,14 +265,14 @@ Yes! Agent Reach is a standard CLI tool — any AI coding agent that can run she
Is this free? Any API costs? -100% free. All backends are open-source tools (bird CLI, yt-dlp, Jina Reader, instaloader, Exa, etc.) that don't require paid API keys. The only optional cost is a residential proxy (~$1/month) if you need Reddit/Bilibili access from a server. +100% free. All backends are open-source tools (bird CLI, yt-dlp, Jina Reader, Exa, etc.) that don't require paid API keys. The only optional cost is a residential proxy (~$1/month) if you need Reddit/Bilibili access from a server.
--- ## 致谢 -[Jina Reader](https://github.com/jina-ai/reader) · [yt-dlp](https://github.com/yt-dlp/yt-dlp) · [bird](https://www.npmjs.com/package/@steipete/bird) · [Exa](https://exa.ai) · [mcporter](https://github.com/steipete/mcporter) · [feedparser](https://github.com/kurtmckee/feedparser) · [xiaohongshu-mcp](https://github.com/xpzouying/xiaohongshu-mcp) · [instaloader](https://github.com/instaloader/instaloader) · [linkedin-scraper-mcp](https://github.com/stickerdaniel/linkedin-mcp-server) · [mcp-bosszp](https://github.com/mucsbr/mcp-bosszp) +[Jina Reader](https://github.com/jina-ai/reader) · [yt-dlp](https://github.com/yt-dlp/yt-dlp) · [bird](https://www.npmjs.com/package/@steipete/bird) · [Exa](https://exa.ai) · [mcporter](https://github.com/steipete/mcporter) · [feedparser](https://github.com/kurtmckee/feedparser) · [xiaohongshu-mcp](https://github.com/xpzouying/xiaohongshu-mcp) · [linkedin-scraper-mcp](https://github.com/stickerdaniel/linkedin-mcp-server) · [mcp-bosszp](https://github.com/mucsbr/mcp-bosszp) ## License diff --git a/agent_reach/channels/__init__.py b/agent_reach/channels/__init__.py index f9aee55..2971a13 100644 --- a/agent_reach/channels/__init__.py +++ b/agent_reach/channels/__init__.py @@ -20,7 +20,6 @@ from .rss import RSSChannel from .bilibili import BilibiliChannel from .exa_search import ExaSearchChannel from .xiaohongshu import XiaoHongShuChannel -from .instagram import InstagramChannel from .linkedin import LinkedInChannel from .bosszhipin import BossZhipinChannel @@ -33,7 +32,6 @@ ALL_CHANNELS: List[Channel] = [ RedditChannel(), BilibiliChannel(), XiaoHongShuChannel(), - InstagramChannel(), LinkedInChannel(), BossZhipinChannel(), RSSChannel(), diff --git a/agent_reach/channels/instagram.py b/agent_reach/channels/instagram.py deleted file mode 100644 index ce18bba..0000000 --- a/agent_reach/channels/instagram.py +++ /dev/null @@ -1,248 +0,0 @@ -# -*- coding: utf-8 -*- -"""Instagram — via instaloader (free, open source). 
- -Backend: instaloader (9.8K stars, Python CLI + library) -Swap to: any Instagram access tool -""" - -import re -import shutil -import subprocess -from pathlib import Path -from urllib.parse import urlparse -from .base import Channel, ReadResult, SearchResult -from typing import List - - -class InstagramChannel(Channel): - name = "instagram" - description = "Instagram 帖子和 Profile" - backends = ["instaloader"] - tier = 2 # Needs login for full access - - def can_handle(self, url: str) -> bool: - domain = urlparse(url).netloc.lower() - return "instagram.com" in domain or "instagr.am" in domain - - def check(self, config=None): - # Check both CLI and Python module - has_cli = shutil.which("instaloader") - has_module = False - try: - import instaloader - has_module = True - except ImportError: - pass - - if not has_cli and not has_module: - return "off", ( - "需要安装 instaloader:pip install instaloader\n" - " 安装后可读取 Instagram 帖子和 Profile\n" - " 登录: agent-reach configure instagram-cookies \"sessionid=xxx; csrftoken=yyy; ...\"" - ) - - # Check if cookies are configured - cookie_file = Path.home() / ".agent-reach" / "instagram-cookies.txt" - if cookie_file.exists(): - return "ok", "已登录,可读取 Instagram 帖子和 Profile" - return "ok", "可读取公开帖子和 Profile。登录可访问更多内容:\n agent-reach configure instagram-cookies \"sessionid=xxx; csrftoken=yyy; ...\"" - - async def read(self, url: str, config=None) -> ReadResult: - # Try instaloader (module or CLI) - try: - import instaloader - return await self._read_instaloader(url, config) - except ImportError: - pass - # Fallback: Jina Reader - return await self._read_jina(url) - - async def _read_instaloader(self, url: str, config=None) -> ReadResult: - """Read Instagram content using instaloader Python API.""" - import asyncio - import concurrent.futures - - def _sync_read(): - import instaloader - L = instaloader.Instaloader( - download_pictures=False, - download_videos=False, - download_video_thumbnails=False, - download_geotags=False, - 
download_comments=False, - save_metadata=False, - compress_json=False, - max_connection_attempts=1, # Don't retry on rate limit - ) - - # Try to load session: cookie file > saved session - cookie_file = Path.home() / ".agent-reach" / "instagram-cookies.txt" - if cookie_file.exists(): - try: - cookie_str = cookie_file.read_text().strip() - cookies = {} - for part in cookie_str.split(";"): - part = part.strip() - if "=" in part: - k, v = part.split("=", 1) - cookies[k.strip()] = v.strip() - if "sessionid" in cookies and "csrftoken" in cookies: - # Extract username from ds_user_id or use generic - username = cookies.get("ds_user_id", "user") - L.context.load_session(username, cookies) - except Exception: - pass - elif config and config.get("instagram_username"): - try: - L.load_session_from_file(config.get("instagram_username")) - except Exception: - pass - - path = urlparse(url).path.strip("/") - - if "/p/" in url or "/reel/" in url: - return self._read_post_sync(L, url, path) - else: - return self._read_profile_sync(L, url, path) - - try: - # Run with 15s timeout to avoid instaloader's 30-min retry - loop = asyncio.get_event_loop() - with concurrent.futures.ThreadPoolExecutor() as pool: - result = await asyncio.wait_for( - loop.run_in_executor(pool, _sync_read), - timeout=15, - ) - return result - except (asyncio.TimeoutError, Exception): - # Any error or timeout → Jina fallback - return await self._read_jina(url) - - def _read_post_sync(self, L, url: str, path: str) -> ReadResult: - """Read a single Instagram post (sync, runs in executor).""" - import instaloader - - # Extract shortcode from URL - match = re.search(r"/(?:p|reel)/([A-Za-z0-9_-]+)", url) - if not match: - raise ValueError("Cannot extract shortcode from URL") - - shortcode = match.group(1) - try: - post = instaloader.Post.from_shortcode(L.context, shortcode) - - lines = [] - if post.caption: - lines.append(post.caption) - lines.append("") - lines.append(f"👤 @{post.owner_username}") - lines.append(f"❤️ 
{post.likes} likes") - if post.comments: - lines.append(f"💬 {post.comments} comments") - lines.append(f"📅 {post.date_utc.strftime('%Y-%m-%d %H:%M')}") - if post.location: - lines.append(f"📍 {post.location}") - if post.hashtags: - lines.append(f"#️⃣ {' '.join('#' + h for h in post.hashtags)}") - - return ReadResult( - title=f"@{post.owner_username}: {(post.caption or '')[:80]}", - content="\n".join(lines), - url=url, - author=f"@{post.owner_username}", - date=post.date_utc.strftime("%Y-%m-%d"), - platform="instagram", - extra={"likes": post.likes, "comments": post.comments}, - ) - except Exception: - raise # Let executor timeout handle fallback - - def _read_profile_sync(self, L, url: str, path: str) -> ReadResult: - """Read an Instagram profile (sync, runs in executor).""" - import instaloader - - # Extract username from path - username = path.split("/")[0] if path else "" - if not username or username in ("p", "reel", "stories", "explore"): - raise ValueError("Cannot extract username from URL") - - try: - profile = instaloader.Profile.from_username(L.context, username) - - lines = [] - lines.append(f"👤 {profile.full_name} (@{profile.username})") - if profile.biography: - lines.append(f"📝 {profile.biography}") - if profile.external_url: - lines.append(f"🔗 {profile.external_url}") - lines.append("") - lines.append(f"📊 {profile.mediacount} posts · " - f"{profile.followers} followers · " - f"{profile.followees} following") - if profile.is_verified: - lines.append("✅ Verified") - if profile.is_business_account and profile.business_category_name: - lines.append(f"🏢 {profile.business_category_name}") - - # Get recent posts (up to 5) - lines.append("") - lines.append("📸 Recent posts:") - count = 0 - for post in profile.get_posts(): - if count >= 5: - break - caption = (post.caption or "")[:100].replace("\n", " ") - lines.append(f" • ❤️{post.likes} | {post.date_utc.strftime('%m-%d')} | {caption}") - count += 1 - - return ReadResult( - title=f"{profile.full_name} 
(@{profile.username}) - Instagram", - content="\n".join(lines), - url=url, - author=f"@{profile.username}", - platform="instagram", - extra={ - "followers": profile.followers, - "posts": profile.mediacount, - }, - ) - except Exception: - raise # Let executor timeout handle fallback - - async def _read_jina(self, url: str) -> ReadResult: - """Fallback: use Jina Reader.""" - import requests - try: - resp = requests.get( - f"https://r.jina.ai/{url}", - headers={"Accept": "text/markdown"}, - timeout=15, - ) - resp.raise_for_status() - text = resp.text - return ReadResult( - title=text[:100] if text else url, - content=text, - url=url, - platform="instagram", - ) - except Exception: - return ReadResult( - title="Instagram", - content=( - f"⚠️ 无法读取此 Instagram 内容: {url}\n\n" - "提示:\n" - "- 确保 URL 正确\n" - "- 安装 instaloader: pip install instaloader\n" - "- 登录以访问更多内容: instaloader --login YOUR_USERNAME" - ), - url=url, - platform="instagram", - ) - - async def search(self, query: str, config=None, **kwargs) -> List[SearchResult]: - """Search Instagram via Exa.""" - limit = kwargs.get("limit", 10) - from agent_reach.channels.exa_search import ExaSearchChannel - exa = ExaSearchChannel() - return await exa.search(f"site:instagram.com {query}", config=config, limit=limit) diff --git a/agent_reach/cli.py b/agent_reach/cli.py index 9d7d89e..8e1f36e 100644 --- a/agent_reach/cli.py +++ b/agent_reach/cli.py @@ -89,11 +89,6 @@ def main(): p_sx.add_argument("query", nargs="+", help="Search query") p_sx.add_argument("-n", "--num", type=int, default=10, help="Number of results") - # ── search-instagram ── - p_si = sub.add_parser("search-instagram", help="Search Instagram") - p_si.add_argument("query", nargs="+", help="Search query") - p_si.add_argument("-n", "--num", type=int, default=10, help="Number of results") - # ── search-linkedin ── p_sl = sub.add_parser("search-linkedin", help="Search LinkedIn") p_sl.add_argument("query", nargs="+", help="Search query") @@ -122,8 +117,7 @@ def 
main(): p_conf = sub.add_parser("configure", help="Set a config value or auto-extract from browser") p_conf.add_argument("key", nargs="?", default=None, choices=["proxy", "github-token", "groq-key", - "twitter-cookies", "youtube-cookies", - "instagram-cookies"], + "twitter-cookies", "youtube-cookies"], help="What to configure (omit if using --from-browser)") p_conf.add_argument("value", nargs="*", help="The value(s) to set") p_conf.add_argument("--from-browser", metavar="BROWSER", @@ -436,23 +430,6 @@ def _install_system_deps(): except Exception: print(" ⬜ undici install failed (optional — bird may not work behind proxies)") - # ── instaloader (for Instagram) ── - if shutil.which("instaloader"): - print(" ✅ instaloader already installed") - else: - print(" 📥 Installing instaloader...") - try: - subprocess.run( - [sys.executable, "-m", "pip", "install", "instaloader"], - capture_output=True, text=True, timeout=120, - ) - if shutil.which("instaloader"): - print(" ✅ instaloader installed (Instagram reading)") - else: - print(" ⬜ instaloader install failed (optional — try: pip install instaloader)") - except Exception: - print(" ⬜ instaloader install failed (optional — try: pip install instaloader)") - def _install_system_deps_safe(): """Safe mode: check what's installed, print instructions for what's missing.""" @@ -464,7 +441,6 @@ def _install_system_deps_safe(): ("gh", ["gh"], "GitHub CLI", "https://cli.github.com — or: apt install gh / brew install gh"), ("node", ["node", "npm"], "Node.js", "https://nodejs.org — or: apt install nodejs npm"), ("bird", ["bird", "birdx"], "bird CLI (Twitter)", "npm install -g @steipete/bird"), - ("instaloader", ["instaloader"], "instaloader (Instagram)", "pip install instaloader"), ] missing = [] @@ -495,7 +471,6 @@ def _install_system_deps_dryrun(): ("gh CLI", ["gh"], "apt install gh / brew install gh"), ("Node.js", ["node"], "curl NodeSource setup | bash + apt install nodejs"), ("bird CLI", ["bird", "birdx"], "npm install -g 
@steipete/bird"), - ("instaloader", ["instaloader"], "pip install instaloader"), ] for label, binaries, method in checks: @@ -764,9 +739,6 @@ def _cmd_configure(args): config.set("groq_api_key", value) print(f"✅ Groq key configured!") - elif args.key == "instagram-cookies": - _configure_instagram_cookies(value) - def _cmd_doctor(): from agent_reach.config import Config @@ -787,30 +759,6 @@ def _parse_cookie_header(cookie_str: str) -> dict: return cookies -def _configure_instagram_cookies(value: str): - """Save Instagram cookies from Cookie-Editor Header String.""" - from pathlib import Path - - cookies = _parse_cookie_header(value) - if "sessionid" not in cookies: - print("❌ Cookie 里缺少 sessionid。") - print(" 确保你已登录 Instagram,然后用 Cookie-Editor 导出 Header String。") - print(' 格式: agent-reach configure instagram-cookies "sessionid=xxx; csrftoken=yyy; ..."') - return - - cookie_dir = Path.home() / ".agent-reach" - cookie_dir.mkdir(parents=True, exist_ok=True) - cookie_file = cookie_dir / "instagram-cookies.txt" - cookie_file.write_text(value.strip()) - cookie_file.chmod(0o600) - - print(f"✅ Instagram cookies 已保存!") - print(f" sessionid: {cookies['sessionid'][:8]}...") - if "csrftoken" in cookies: - print(f" csrftoken: ✅") - if "ds_user_id" in cookies: - print(f" ds_user_id: {cookies['ds_user_id']}") - print(f" 文件: {cookie_file}") def _cmd_setup(): @@ -952,8 +900,6 @@ async def _cmd_search(args): results = await eyes.search_bilibili(query, limit=num) elif args.command == "search-xhs": results = await eyes.search_xhs(query, limit=num) - elif args.command == "search-instagram": - results = await eyes.search_instagram(query, limit=num) elif args.command == "search-linkedin": results = await eyes.search_linkedin(query, limit=num) elif args.command == "search-bosszhipin": diff --git a/agent_reach/core.py b/agent_reach/core.py index 3128677..c5e8a3c 100644 --- a/agent_reach/core.py +++ b/agent_reach/core.py @@ -101,12 +101,6 @@ class AgentReach: results = await ch.search(query, 
config=self.config, limit=limit) return [r.to_dict() for r in results] - async def search_instagram(self, query: str, limit: int = 10) -> List[Dict[str, Any]]: - """Search Instagram via Exa.""" - ch = get_channel("instagram") - results = await ch.search(query, config=self.config, limit=limit) - return [r.to_dict() for r in results] - async def search_linkedin(self, query: str, limit: int = 10) -> List[Dict[str, Any]]: """Search LinkedIn via MCP or Exa.""" ch = get_channel("linkedin") diff --git a/agent_reach/skill/SKILL.md b/agent_reach/skill/SKILL.md index e9e6028..b7af5af 100644 --- a/agent_reach/skill/SKILL.md +++ b/agent_reach/skill/SKILL.md @@ -2,11 +2,11 @@ name: agent-reach description: > Give your AI agent eyes to see the entire internet. Read and search across - Twitter/X, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu, Instagram, LinkedIn, + Twitter/X, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu, LinkedIn, Boss直聘, RSS, and any web page — all from a single CLI. Use when: (1) reading content from URLs (tweets, Reddit posts, articles, videos), (2) searching across platforms (web, Twitter, Reddit, GitHub, YouTube, Bilibili, - XiaoHongShu, Instagram, LinkedIn, Boss直聘), + XiaoHongShu, LinkedIn, Boss直聘), (3) user asks to configure/enable a platform channel, (4) checking channel health or updating Agent Reach. Triggers: "search Twitter/Reddit/YouTube", "read this URL", "find posts about", @@ -31,7 +31,7 @@ pip install https://github.com/Panniantong/agent-reach/archive/main.zip agent-reach install --env=auto ``` -`install` auto-detects your environment and installs core dependencies (Node.js, mcporter, bird CLI, gh CLI, instaloader). Read the output and run `agent-reach doctor` to see what's active. +`install` auto-detects your environment and installs core dependencies (Node.js, mcporter, bird CLI, gh CLI). Read the output and run `agent-reach doctor` to see what's active. 
## Commands @@ -40,7 +40,7 @@ agent-reach install --env=auto agent-reach read agent-reach read --json # structured output ``` -Handles: tweets, Reddit posts, articles, YouTube/Bilibili (transcripts), GitHub repos, Instagram posts, LinkedIn profiles, Boss直聘 jobs, XiaoHongShu notes, RSS feeds, and any web page. +Handles: tweets, Reddit posts, articles, YouTube/Bilibili (transcripts), GitHub repos, LinkedIn profiles, Boss直聘 jobs, XiaoHongShu notes, RSS feeds, and any web page. ### Search @@ -52,7 +52,6 @@ agent-reach search-github "query" # GitHub (--lang ) agent-reach search-youtube "query" # YouTube agent-reach search-bilibili "query" # Bilibili (B站) agent-reach search-xhs "query" # XiaoHongShu (小红书) -agent-reach search-instagram "query" # Instagram agent-reach search-linkedin "query" # LinkedIn agent-reach search-bosszhipin "query" # Boss直聘 ``` @@ -71,7 +70,6 @@ agent-reach check-update # check for new versions ```bash agent-reach configure twitter-cookies "auth_token=xxx; ct0=yyy" -agent-reach configure instagram-cookies "sessionid=xxx; csrftoken=yyy; ..." agent-reach configure proxy http://user:pass@ip:port agent-reach configure --from-browser chrome # auto-extract cookies from local browser ``` diff --git a/docs/README_en.md b/docs/README_en.md index 1ba8a40..d636e16 100644 --- a/docs/README_en.md +++ b/docs/README_en.md @@ -58,7 +58,6 @@ Copy that to your Agent. A few minutes later, it can read tweets, search Reddit, | 🌐 **Web** | Read | Zero config | Any URL → clean Markdown ([Jina Reader](https://github.com/jina-ai/reader) ⭐9.8K) | | 🐦 **Twitter/X** | Read · Search | Zero config / Cookie | Single tweets readable out of the box. 
Cookie unlocks search, timeline, posting ([bird](https://github.com/steipete/bird)) | | 📕 **XiaoHongShu** | Read · Search · **Post · Comment · Like** | mcporter | Via [xiaohongshu-mcp](https://github.com/user/xiaohongshu-mcp) internal API, install and go | -| 📷 **Instagram** | Search (via Exa) | Read posts and profiles | Tell your Agent "help me set up Instagram" | | 💼 **LinkedIn** | Jina Reader (public pages) | Full profiles, companies, job search | Tell your Agent "help me set up LinkedIn" | | 🏢 **Boss直聘** | Jina Reader (job pages) | Job search, greet recruiters | Tell your Agent "help me set up Boss直聘" | | 🔍 **Web Search** | Search | Auto-configured | Auto-configured during install, free, no API key ([Exa](https://exa.ai) via [mcporter](https://github.com/nicepkg/mcporter)) | @@ -185,7 +184,6 @@ channels/ ├── bilibili.py → yt-dlp ← swap to bilibili-api… ├── reddit.py → JSON API + Exa ← swap to PRAW, Pushshift… ├── xiaohongshu.py → mcporter MCP ← swap to other XHS tools… -├── instagram.py → instaloader ← swap to instagrapi, official API… ├── linkedin.py → linkedin-mcp ← swap to LinkedIn API… ├── bosszhipin.py → mcp-bosszp ← swap to other job tools… ├── rss.py → feedparser ← swap to atoma… @@ -204,7 +202,6 @@ channels/ | GitHub | [gh CLI](https://cli.github.com) | Official tool, full API after auth | | Read RSS | [feedparser](https://github.com/kurtmckee/feedparser) | Python ecosystem standard, 2.3K stars | | XiaoHongShu | [xiaohongshu-mcp](https://github.com/user/xiaohongshu-mcp) | Internal API, bypasses anti-bot | -| Instagram | [instaloader](https://github.com/instaloader/instaloader) | 9.8K stars, Python CLI, cookie auth, free | | LinkedIn | [linkedin-scraper-mcp](https://github.com/stickerdaniel/linkedin-mcp-server) | 900+ stars, MCP server, browser automation | | Boss直聘 | [mcp-bosszp](https://github.com/mucsbr/mcp-bosszp) | MCP server, job search + recruiter greeting | @@ -253,7 +250,7 @@ Yes! Agent Reach is a standard CLI tool. 
Any AI coding agent that can execute sh
Is Agent Reach free? Any API costs? -100% free and open source. All backends (bird CLI, yt-dlp, Jina Reader, instaloader, Exa) are free tools that don't require paid API keys. The only optional cost is a residential proxy (~$1/month) if you need Reddit/Bilibili access from a server. +100% free and open source. All backends (bird CLI, yt-dlp, Jina Reader, Exa) are free tools that don't require paid API keys. The only optional cost is a residential proxy (~$1/month) if you need Reddit/Bilibili access from a server.
@@ -272,7 +269,7 @@ Agent Reach integrates with xiaohongshu-mcp (runs in Docker). After setup, use ` ## Credits -[Jina Reader](https://github.com/jina-ai/reader) · [yt-dlp](https://github.com/yt-dlp/yt-dlp) · [bird](https://github.com/steipete/bird) · [Exa](https://exa.ai) · [feedparser](https://github.com/kurtmckee/feedparser) · [instaloader](https://github.com/instaloader/instaloader) · [linkedin-scraper-mcp](https://github.com/stickerdaniel/linkedin-mcp-server) · [mcp-bosszp](https://github.com/mucsbr/mcp-bosszp) +[Jina Reader](https://github.com/jina-ai/reader) · [yt-dlp](https://github.com/yt-dlp/yt-dlp) · [bird](https://github.com/steipete/bird) · [Exa](https://exa.ai) · [feedparser](https://github.com/kurtmckee/feedparser) · [linkedin-scraper-mcp](https://github.com/stickerdaniel/linkedin-mcp-server) · [mcp-bosszp](https://github.com/mucsbr/mcp-bosszp) ## License diff --git a/docs/install.md b/docs/install.md index 499302b..bab8c04 100644 --- a/docs/install.md +++ b/docs/install.md @@ -80,7 +80,7 @@ Only ask the user when you genuinely need their input (credentials, permissions, Some channels need credentials only the user can provide. Based on the doctor output, ask for what's missing: -> 🔒 **Security tip:** For platforms that need cookies (Twitter, XiaoHongShu, Instagram), we recommend using a **dedicated/secondary account** rather than your main account. Cookie-based auth grants full account access — using a separate account limits the blast radius if credentials are ever compromised. +> 🔒 **Security tip:** For platforms that need cookies (Twitter, XiaoHongShu), we recommend using a **dedicated/secondary account** rather than your main account. Cookie-based auth grants full account access — using a separate account limits the blast radius if credentials are ever compromised. **Twitter search & posting (server users):** > "To unlock Twitter search, I need your Twitter cookies. 
Install the Cookie-Editor Chrome extension, go to x.com/twitter.com, click the extension → Export → Header String, and paste it to me." @@ -125,20 +125,6 @@ mcporter config add xiaohongshu http://localhost:18060/mcp > - **本地电脑(有浏览器):** 打开 http://localhost:18060 扫码登录即可。 > - **服务器(无 UI 界面):** 服务器上通常没有浏览器,无法直接扫码。最方便的方式是在自己的电脑上用浏览器登录小红书,然后用 [Cookie-Editor](https://chromewebstore.google.com/detail/cookie-editor/hlkenndednhfkekhgcdicdfddnkalmdm) 插件导出 Cookie(Header String 格式),发给 Agent 即可完成配置。详见 [Cookie 导出指南](cookie-export.md)。 -**Instagram (需要 instaloader):** -> "Instagram 需要 instaloader。我来帮你安装。" - -```bash -pip install instaloader -``` - -> **登录方式(解锁私密内容):** -> - **方法 1(推荐):Cookie-Editor 导入:** 在浏览器登录 Instagram → 用 [Cookie-Editor](https://chromewebstore.google.com/detail/cookie-editor/hlkenndednhfkekhgcdicdfddnkalmdm) 导出 Header String → 粘贴: -> ```bash -> agent-reach configure instagram-cookies "sessionid=xxx; csrftoken=yyy; ..." -> ``` -> - **方法 2:instaloader 命令行登录:** `instaloader --login YOUR_USERNAME`(需要输密码,有 2FA 的话还要输验证码) - **LinkedIn (可选 — linkedin-scraper-mcp):** > "LinkedIn 基本内容可通过 Jina Reader 读取。完整功能(Profile 详情、职位搜索)需要 linkedin-scraper-mcp。" @@ -250,6 +236,5 @@ If the user wants a different agent to handle it, let them choose. 
 | `agent-reach search-youtube "query"` | Search YouTube |
 | `agent-reach search-bilibili "query"` | Search Bilibili |
 | `agent-reach search-xhs "query"` | Search XiaoHongShu |
-| `agent-reach search-instagram "query"` | Search Instagram |
 | `agent-reach search-linkedin "query"` | Search LinkedIn |
 | `agent-reach search-bosszhipin "query"` | Search Boss直聘 |
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
index 880c7a0..d69e8b8 100644
--- a/docs/troubleshooting.md
+++ b/docs/troubleshooting.md
@@ -68,16 +68,3 @@ bird search "test"
 **解决方案:**
 - **本地电脑:** 正常使用,一般不会被拦
 - **服务器:** 使用 Jina Reader 读取职位页面 + Exa 搜索职位信息作为替代
-
----
-
-## Instagram: Checkpoint / 安全验证
-
-**症状:** `instaloader --login` 触发 Instagram 安全验证
-
-**原因:** Instagram 检测到从未见过的设备/位置登录。
-
-**解决方案:**
-1. 在自己的浏览器登录 Instagram
-2. 用 Cookie-Editor 导出 Cookie
-3. 配置:`agent-reach configure instagram-cookies "sessionid=xxx; csrftoken=yyy; ..."`
diff --git a/pyproject.toml b/pyproject.toml
index 34c7f68..f0e0e9f 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -10,7 +10,7 @@ keywords = [
 "ai-agent", "llm-tools", "agent-infrastructure", "mcp",
 "web-reader", "web-scraper", "search",
 "twitter-scraper", "reddit-scraper", "youtube-transcript",
-"bilibili", "xiaohongshu", "instagram",
+"bilibili", "xiaohongshu",
 "ai-search", "cli", "automation",
 "claude-code", "cursor", "openai",
 "free-api", "no-api-key",