Jay Chi
SUMMARY
WORK EXPERIENCE
- Built and maintained large-scale web crawling systems for multimodal AI training data and business data demands, covering recruitment, resume, headhunter, and competitor intelligence scenarios
- Provided real-time crawling APIs for internal business workflows, including certificate verification, web search, and on-demand data retrieval
- Developed backend services and crawler task workflows using Python, Java, FastAPI, Spring Boot, Kafka, MongoDB, and Redis
- Designed distributed crawling pipelines with coroutine-based concurrency, proxy management, task scheduling, retry handling, state management, and structured data parsing
- Performed anti-bot and risk-control analysis, covering device/browser fingerprinting, TLS/HTTP/2 fingerprinting, network traffic capture, captcha solving, and customized patched/stealth Playwright runtimes
- Reverse-engineered JavaScript, Android, and WeChat Mini Program workflows to analyze request signatures, encryption logic, authentication flows, and anti-crawling mechanisms
- Applied cryptographic analysis including AES, RSA, and message digest algorithms to reproduce protected request parameters and verify data integrity
- Designed the architecture of an AI-agent-assisted crawling platform, integrating Model Context Protocol, context management, SSE streaming, and tool orchestration to support crawler configuration and debugging
PROJECT
- Built and maintained large-scale crawling workflows for multimodal AI training data collection, covering text, image, video, document, and structured web data from platforms including YouTube, Zhihu, Baidu Wenku, and other high-risk web sources
- Contributed to a data supply system with up to 5 PB/year collection capacity for AI training and business data delivery
- Developed distributed crawling pipelines with coroutine-based concurrency, proxy management, retry/backoff strategies, task state management, and structured data parsing to ensure stable high-throughput delivery
- Analyzed platform anti-bot mechanisms, including request behavior limits, access-control triggers, fingerprinting signals, Android/Web API constraints, and JS-rendering barriers
- Built and maintained multiple business-intelligence crawling systems covering headhunter resumes, headhunter jobs, competitor job postings, corporate landscape data, ad promotion data, marketing balance, and transaction records
- Supported the Purple project for headhunter-platform resume crawling, including account/session handling, resume fetching, structured parsing, task state management, and callback-based result delivery
- Supported the Quake project for headhunter-platform job crawling, including platform login flows, job list/detail retrieval, position parsing, session management, and crawler stability improvements
- Built competitor job crawling workflows to track job posting updates, online/offline status changes, and market signals from competitor platforms; supported downstream CRM analysis to clean and classify data into two business categories
- Maintained corporate landscape crawling workflows for company profiles, licenses, qualification records, and related corporate metadata, with primary ownership of Hong Kong company data sources
- Implemented authenticated crawling workflows for ad promotion, account-level marketing metrics, balance information, promotion records, and transaction details to support financial and marketing data reconciliation
- Improved crawler robustness through proxy management, browser automation, captcha handling, cookie/session management, retry/backoff strategies, structured error handling, and reverse engineering of request signatures and anti-crawling mechanisms
- Designed the architecture of an AI-agent-assisted crawling platform for crawler configuration, field parsing, seed management, and debugging workflows
- Integrated Model Context Protocol, context management, SSE streaming, and tool orchestration to connect LLM reasoning with crawler tools and browser automation
- Built agent workflows for request/response analysis, parsing-field recommendation, seed browsing, and crawler configuration assistance
- Improved multi-turn agent reliability by handling context consistency, tool-call state, MCP service lifecycle, and streaming response behavior
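The SSE streaming used to deliver agent output can be sketched with a small stdlib-only frame serializer. This is an illustrative assumption about how such a layer might look, not the platform's actual implementation; `sse_event` and `stream_tokens` are hypothetical names, and the event types (`token`, `done`) are made up for the example. The frame layout follows the standard EventSource wire format: optional `id:` and `event:` fields, one `data:` line per payload line, terminated by a blank line.

```python
import json
from typing import Iterable, Iterator, Optional

def sse_event(data: dict, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one Server-Sent Events frame in the EventSource text format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    payload = json.dumps(data, ensure_ascii=False)
    # A data payload spanning multiple lines becomes multiple `data:` fields.
    for part in payload.splitlines() or [""]:
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"  # blank line terminates the frame

def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each model token as a `token` event, then a terminal `done` event."""
    for i, tok in enumerate(tokens):
        yield sse_event({"token": tok}, event="token", event_id=str(i))
    yield sse_event({"finished": True}, event="done")

frames = list(stream_tokens(["Hel", "lo"]))
```

In a FastAPI service a generator like `stream_tokens` would typically back a streaming response, with the `id:` field letting a reconnecting client resume from its last received event.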
EDUCATION
SKILLS
Python·Java·FastAPI·Spring Boot·Asyncio·Concurrency·Socket Programming·Distributed Systems·Kafka·MongoDB·MySQL·Redis
Web Scraping·Anti-bot & Risk Control Analysis·Device & Browser Fingerprinting·Captcha Solving·JavaScript / Android / WeChat Mini Program Reverse Engineering·Cryptography: AES, RSA, Message Digest Algorithms·TLS / HTTP/2 Fingerprinting·Browser Automation: Playwright
Model Context Protocol·Context Management·SSE Streaming·Agent Orchestration
Docker Compose·Linux·Git·Nginx·Proxy Networking
Mandarin Chinese (Native)·English (Professional Working Proficiency)