Available for consulting

Daniel Otero

AI Engineer · Data Scientist · Applied Researcher

I build production systems across three fronts — agentic AI (LangGraph, RAG, multi-agent), NLP and text analytics (semantic search, embeddings, bibliometric networks), and applied data science (clustering, dashboards, pipelines) — for research and product teams across Latin America.

Bridging social science and AI

I'm an economist and computer-science engineer (M.Sc.) working across three fronts: agentic AI (LLM orchestration, RAG, multi-agent systems), NLP and text analytics (semantic search, embeddings, bibliometric networks), and applied data science (clustering, statistical modeling, dashboards). My path moves between them — sometimes within a single project.

That breadth means I do the technical work and understand the social, organizational, and research context behind it. I've shipped conversational agents serving hundreds of users monthly across Latin America, ML clustering pipelines for survey research, RAG systems with vector search, and 6 monitoring dashboards across 4 countries for data-capture and impact-evaluation processes.

Currently leading data science and AI at Estudio Plural — designing LLM-based tools for behavioral research, knowledge retrieval, and organizational intelligence. I publish peer-reviewed work on bibliometric NLP, teach, and consult on applied research projects when there's a good fit.

700+ active WhatsApp bot users / month
8 countries reached with data systems
104K nodes in bibliometric citation network
4 peer-reviewed publications

What I work with

Agentic AI & LLMs
LangChain · LangGraph · RAG · Multi-agent · Prompt Engineering · Fine-tuning · Hugging Face · OpenRouter
NLP & Text Analytics
Embeddings · Semantic Search · Text Classification · Sentiment Analysis · Network Analysis · Bibliometrics
Data Science & Stats
Python · R · Pandas · scikit-learn · Plotly · Clustering · PCA · Statistical Modeling
Agentic Coding Systems
Claude Code · Codex · OpenCode
Infrastructure & Storage
FastAPI · Streamlit · Next.js · Docker · GitHub Actions · n8n · Twilio · PostgreSQL · MongoDB · Qdrant · Neo4j

Selected projects

Production systems across three fronts — agentic AI, NLP, and data science — built for research and product teams.

Data Science
Cali Electoral Map
Live

Interactive map of Cali's 339 neighborhoods with the 2026 first-round presidential results. Official Registraduría tally (216 polling stations, 5,158 tables) geolocated to neighborhood level by cross-referencing IDESC's WFS school layer + OpenStreetMap geocoding. Leaflet, static site on Vercel.

339 neighborhoods · 1M+ official votes mapped
Leaflet · GeoJSON · Python · Shapely · Open Data
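The geolocation step described above comes down to deciding which neighborhood polygon contains a given point. A minimal sketch of that idea, not the project's code: the project lists Shapely, while this version swaps in a plain ray-casting test from the standard library so it runs without dependencies. The neighborhood names and coordinates are made-up toy values.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]          # (lon, lat)
Polygon = List[Point]                # ring of vertices, closed implicitly


def point_in_polygon(lon: float, lat: float, polygon: Polygon) -> bool:
    """Ray casting: count how many edges a ray going east from the point crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_neighborhood(point: Point, neighborhoods: Dict[str, Polygon]) -> Optional[str]:
    """Return the first neighborhood whose polygon contains the point, if any."""
    for name, polygon in neighborhoods.items():
        if point_in_polygon(point[0], point[1], polygon):
            return name
    return None


# Hypothetical toy data: two adjacent square "neighborhoods".
neighborhoods = {
    "Barrio A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "Barrio B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}

print(assign_neighborhood((1, 1), neighborhoods))
print(assign_neighborhood((3, 1), neighborhoods))
print(assign_neighborhood((5, 5), neighborhoods))
```

Points that fall exactly on a shared border are a known weak spot of ray casting, and real neighborhood layers often have holes and multi-part shapes. Those cases are a good reason to rely on a geometry library in practice.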
Data Science · NLP
Narrativas 2026
Live

Social-listening platform mapping the public X/Twitter conversation around Colombia's 2026 presidential race. Apify ingestion → LLM classification of emotion, framing and stance → embedding-based narrative clustering (fastembed + HDBSCAN) → a directed interaction graph with coalition alignment. Next.js 16 static site on Vercel.

Navigable network graph · sentiment over time
Next.js · NLP · Embeddings · networkx · Apify
Data Science · NLP
LIN — Social Listening
Live

Digital social-listening pilot on the Spanish-speaking online conversation (TikTok focus), built for an Estudio Plural × Camino proposal. 742 unique videos over four months, each AI-classified as signal vs noise — only ~35% of keyword matches are genuine, so raw volume overstates the topic roughly 3×. Hashtag-and-creator network, geography, engagement by theme, and a video-by-video corpus explorer. Next.js on Vercel.

742 videos classified · raw volume overstates the topic ~3×
Next.js · NLP · Apify · OpenRouter · Recharts
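The "~3×" figure follows directly from the ~35% share of genuine matches: if only a fraction p of keyword hits is real signal, raw volume is 1/p times the real volume. A small worked sketch, assuming the 742 classified videos are the keyword-matched set (the function name is illustrative, not taken from the project):

```python
def overstatement_factor(precision: float) -> float:
    """How many times raw keyword volume exceeds the genuine volume."""
    if not 0 < precision <= 1:
        raise ValueError("precision must be in (0, 1]")
    return 1 / precision


total_videos = 742   # unique videos classified in the pilot
precision = 0.35     # approximate share of genuine keyword matches

genuine_estimate = round(total_videos * precision)
factor = overstatement_factor(precision)

print(f"~{genuine_estimate} genuine videos; raw volume is ~{factor:.1f}x the real signal")
```

With these inputs the estimate is about 260 genuine videos and a factor of about 2.9, which rounds to the "roughly 3×" quoted above.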
Data Science
SGR Dashboard
Live

Interactive dashboard for Colombia's General Royalties System (SGR). Real-time data from datos.gov.co via Socrata API, dynamic filters, choropleth maps, and Excel export. Deployed on Streamlit Cloud.

Saves the project formulation team 8 hours / week
Streamlit · pydeck · Plotly · Socrata API · GeoJSON
Data Science · Agentic
AMA Survey Pipeline
Live

Survey-processing pipeline for AMA's social field research across multiple cities. KoboToolbox ingestion, validation and deduplication feeding an interactive endline results dashboard — auto-generated charts, cross-tabs, and LLM-written report sections. Deployed on Streamlit Cloud.

Multi-city field survey · live endline dashboard
Python · pandas · KoboToolbox · Streamlit · LLM
Data Science
AMA Bot Monitoring
Live

Monitoring dashboard for the AMA WhatsApp bot. Tracks user activity, sessions, and engagement across deployments. Streamlit + Supabase backend with Plotly visualizations and Excel exports.

Real-time tracking of bot activity for the research team
Streamlit · Supabase · SQLAlchemy · Plotly · Python
Agentic · Data Science
archetypeSuite
Code

End-to-end ML pipeline for archetype discovery. LangGraph orchestrates ingestion → profiling → preprocessing → algorithm selection → clustering → LLM-generated narrative. 33 automated tests passing.

Cuts survey analysis time by 50%
Python · LangGraph · scikit-learn · Streamlit · OpenRouter
Agentic · NLP
Aly — WhatsApp AI Agent
MVP

Multilingual bot (ES/EN/PT) for Equimundo's A+P Manual. 5 sequential LLM agents: language detection → intent classification → specialized response (factual, planning, ideation, sensitive topics). Built with FastAPI + LangGraph.

Active across 4 countries · ~400 users / month
FastAPI · LangGraph · Text Classification · MongoDB · Twilio
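The flow described above (language detection, then intent classification, then a specialized responder) can be illustrated with a plain-Python router. This is a sketch of the routing pattern only, not the deployed agent: the keyword rules stand in for LLM calls, LangGraph is not used here, and every function name and keyword list is a hypothetical placeholder.

```python
from typing import Callable, Dict


def detect_language(text: str) -> str:
    """Stub for the language-detection step (a real system would call a model)."""
    lowered = text.lower()
    if any(w in lowered for w in ("você", "obrigado", "não")):
        return "pt"
    if any(w in lowered for w in ("¿", "qué", "cómo", "gracias")):
        return "es"
    return "en"


def classify_intent(text: str) -> str:
    """Stub for intent classification into the four response types."""
    lowered = text.lower()
    if any(w in lowered for w in ("violence", "abuse", "violencia")):
        return "sensitive"
    if any(w in lowered for w in ("plan", "schedule", "agenda")):
        return "planning"
    if any(w in lowered for w in ("idea", "brainstorm")):
        return "ideation"
    return "factual"


def make_handler(intent: str) -> Callable[[str, str], str]:
    """Build a placeholder responder; a real one would prompt a specialized agent."""
    def handler(text: str, lang: str) -> str:
        return f"[{lang}/{intent}] reply to: {text}"
    return handler


HANDLERS: Dict[str, Callable[[str, str], str]] = {
    intent: make_handler(intent)
    for intent in ("factual", "planning", "ideation", "sensitive")
}


def route(text: str) -> str:
    """Run the steps in sequence and dispatch to the matching responder."""
    lang = detect_language(text)
    intent = classify_intent(text)
    return HANDLERS[intent](text, lang)


print(route("¿Cómo planeo una sesión?"))
print(route("What is the manual about?"))
```

Checking for sensitive topics before the other intents is a deliberate choice in this sketch, so that a message mixing planning and a sensitive subject reaches the more cautious responder.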
Agentic · NLP
convocatorias-bot
Production

Automated daily scanner of 15+ funding and grant sources. Claude AI filters by organizational relevance, deduplicates results, and sends curated alerts to Slack. Runs on GitHub Actions every morning.

Saves the project formulation team 10 hours / week
Python · BeautifulSoup · Claude AI · Slack API · GitHub Actions
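The deduplication step mentioned above can be sketched as fingerprinting each result by a normalized title and URL. This shows one common approach, not the bot's actual logic: the field names, the normalization rules, and the example.org URLs are all assumptions made for the example.

```python
import hashlib
from typing import Dict, List, Optional, Set
from urllib.parse import urlsplit


def fingerprint(title: str, url: str) -> str:
    """Hash a normalized title + URL so trivial variations collapse together."""
    parts = urlsplit(url)
    # Ignore scheme, query string (tracking parameters) and trailing slashes.
    normalized_url = f"{parts.netloc.lower()}{parts.path.rstrip('/')}"
    normalized_title = " ".join(title.lower().split())
    raw = f"{normalized_title}|{normalized_url}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def deduplicate(items: List[Dict[str, str]], seen: Optional[Set[str]] = None) -> List[Dict[str, str]]:
    """Keep the first occurrence of each call; `seen` can carry state across runs."""
    seen = set() if seen is None else seen
    fresh = []
    for item in items:
        key = fingerprint(item["title"], item["url"])
        if key not in seen:
            seen.add(key)
            fresh.append(item)
    return fresh


# Hypothetical scraped results: the first two are the same call.
results = [
    {"title": "Open Call  2026", "url": "https://Example.org/grants/123/?utm_source=x"},
    {"title": "open call 2026", "url": "https://example.org/grants/123"},
    {"title": "Research Fund", "url": "https://example.org/grants/456"},
]

unique = deduplicate(results)
print(len(unique))
```

Passing a persisted `seen` set between daily runs would also suppress alerts that were already sent on a previous day.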
Data Science
Aly Dashboard
Code

Operational monitoring dashboard for the Aly (Apapáchar) WhatsApp bot. KPIs with sparklines and deltas, geographic visualization, alert flags with Excel export and review-status toggle, and a leaderboard with drill-down. Multi-page Streamlit app with custom navigation and i18n.

Real-time bot monitoring for the research team
Streamlit · Supabase · Plotly · PostgreSQL · Python
Data Science · Agentic
AMA Lineabase 2026
Active

Field-survey validation pipeline for the AMA program in Leticia (Colombia) and Cobija (Bolivia). KoboToolbox QC, ID validation, duration outlier detection per classroom, attendance crosschecks vs Google Forms, school-level Excel reports, and LLM-generated narrative summaries via OpenRouter.

2 cities across 2 countries · automated QC + reporting
Python · pandas · KoboToolbox · OpenRouter · CLI
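Duration outlier detection per classroom can be illustrated with a robust, median-based rule. This is a sketch under assumptions, not the pipeline's code: it uses the modified z-score (0.6745 × deviation / MAD) with the conventional 3.5 cutoff and only the standard library, and the record fields and sample values are invented.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List


def flag_duration_outliers(records: List[Dict], threshold: float = 3.5) -> List[str]:
    """Return ids of interviews whose duration is unusual within their classroom."""
    by_classroom = defaultdict(list)
    for rec in records:
        by_classroom[rec["classroom"]].append(rec)

    flagged = []
    for recs in by_classroom.values():
        durations = [r["minutes"] for r in recs]
        med = median(durations)
        # Median absolute deviation: robust to the very outliers we look for.
        mad = median(abs(d - med) for d in durations)
        if mad == 0:
            continue  # no spread in this classroom, nothing to compare against
        for r in recs:
            score = 0.6745 * abs(r["minutes"] - med) / mad
            if score > threshold:
                flagged.append(r["id"])
    return flagged


# Hypothetical survey durations in minutes.
records = [
    {"id": "a1", "classroom": "A", "minutes": 20},
    {"id": "a2", "classroom": "A", "minutes": 22},
    {"id": "a3", "classroom": "A", "minutes": 21},
    {"id": "a4", "classroom": "A", "minutes": 19},
    {"id": "a5", "classroom": "A", "minutes": 3},   # suspiciously fast
    {"id": "b1", "classroom": "B", "minutes": 30},
    {"id": "b2", "classroom": "B", "minutes": 31},
    {"id": "b3", "classroom": "B", "minutes": 29},
    {"id": "b4", "classroom": "B", "minutes": 32},
]

print(flag_duration_outliers(records))
```

Grouping by classroom matters because a duration that is normal in one group can be a red flag in another, and the median-based score keeps a single extreme value from masking itself.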
Agentic
agentChatBuilder
In Dev

No-code SaaS platform for building multi-agent chatbots with multi-channel deployment (WhatsApp, Telegram, Web). Full UI in Next.js + shadcn/ui; FastAPI backend with MongoDB Atlas and Supabase auth.

Next.js · TypeScript · FastAPI · MongoDB · Supabase

Where I've worked

Estudio Plural
  • Data Science & AI Specialist Jan 2025 – Present
  • Behavioral Research & Analytics Lead Jun 2024 – Dec 2024
  • Data Analytics Consultant Dec 2023 – May 2024
  • Data Analytics Consultant Jul 2023 – Aug 2023
  • Built 2 conversational agents with LangGraph: first deployed across 4 countries (CO/EC/PE/BO) with ~400 users/month; second active in CO & MX with ~300 users/month.
  • Real-time monitoring dashboards with leaderboards and automatic report generation, giving researchers immediate access to bot usage metrics with no manual extraction.
  • Automation flows in n8n and Zapier for admin and accounting → 80% time saved on repetitive tasks.
  • AI system for automatic detection of funding opportunities → 10 hours/week saved for the project formulation team.
  • Python data pipelines connecting KoboToolbox & Typeform to Supabase dashboards → ~90% reduction in field-data monitoring time.
  • Multi-agent processing system for clustering and behavioral narrative generation over survey data → 50% reduction in analysis time.
Glasswing International
  • AI & Automation Consultant May 2025 – Jul 2025
  • Designed and shipped an end-to-end accounting-automation system that replaces the accountant's manual forwarding and review of supplier invoices → ~12 hours/week of manual work freed. The routing worker (Python, Gmail API + SQLite) classifies each incoming invoice and routes it to the right coordinators from a matrix of 323 suppliers and ~1,900 assignments, running in production on a VPS (Docker Compose behind Tailscale) with Telegram alerts.
  • Built the automatic payment-package verification module: it extracts and cross-checks multiple documents (invoice, RUT, chamber-of-commerce, bank certification, transfer/reimbursement request) with Claude via OpenRouter (text + vision for scans), robust to how attachments arrive (merged or separate). It flags amount, NIT and beneficiary mismatches and replies approving or returning with corrections → each review cut from ~15 min to under 1 min.
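The cross-checking logic in the verification module can be pictured as comparing fields extracted from each document against the invoice. A minimal sketch under stated assumptions: extraction itself (the Claude step) is out of scope here, and the data class, field names, normalization rules, and sample values are hypothetical rather than taken from the production system.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional


@dataclass
class ExtractedDoc:
    kind: str                          # e.g. "invoice", "bank_certification"
    nit: str
    beneficiary: str
    amount: Optional[Decimal] = None   # not every document states an amount


def normalize_nit(nit: str) -> str:
    """Keep digits only, so '900.123.456-7' and '900123456-7' compare equal."""
    return "".join(ch for ch in nit if ch.isdigit())


def normalize_name(name: str) -> str:
    """Uppercase and collapse whitespace before comparing beneficiary names."""
    return " ".join(name.upper().split())


def find_mismatches(reference: ExtractedDoc, others: List[ExtractedDoc]) -> List[str]:
    """Compare each supporting document against the reference invoice."""
    issues = []
    for doc in others:
        if normalize_nit(doc.nit) != normalize_nit(reference.nit):
            issues.append(f"{doc.kind}: NIT {doc.nit} != {reference.nit}")
        if normalize_name(doc.beneficiary) != normalize_name(reference.beneficiary):
            issues.append(f"{doc.kind}: beneficiary {doc.beneficiary} != {reference.beneficiary}")
        if doc.amount is not None and reference.amount is not None and doc.amount != reference.amount:
            issues.append(f"{doc.kind}: amount {doc.amount} != {reference.amount}")
    return issues


# Hypothetical payment package.
invoice = ExtractedDoc("invoice", "900.123.456-7", "Acme SAS", Decimal("1500000"))
package = [
    ExtractedDoc("bank_certification", "900123456-7", "ACME  SAS"),
    ExtractedDoc("transfer_request", "900.123.456-7", "Acme SAS", Decimal("1505000")),
]

issues = find_mismatches(invoice, package)
print(issues)
```

An empty list would correspond to an approval reply, while any entries would be returned to the sender as corrections. Using `Decimal` avoids float rounding when comparing monetary amounts.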
Octopus Force
  • Project Analyst Jan 2025 – Mar 2026
  • Research Leader Jul 2023 – Dec 2024
  • Built SGR (General Royalties System) monitoring dashboard integrated with the datos.gov.co Open Data API → 8 hours/week saved for the project formulation team.
  • Developed a prompt library for technology surveillance → research time per report cut from 8 to 3 days (-63%), applied across ~20 reports for companies in Valle del Cauca.
  • Deployed intelligent agents for information synthesis and organization across research, search, and project formulation in public and corporate contexts.
  • Built MVP of a multi-agent assistant for document management, focused on classification and efficient access to technical and administrative documents.
Universidad del Valle, CIDSE
  • Data Analysis Consultant Nov 2024 – Dec 2024
  • Data Analytics & Experimental Design Consultant Oct 2023 – Dec 2023
  • Shiny Developer Oct 2020 – Dec 2020
  • Sample design, construction and deployment of experimental surveys in oTree, with results processing using clustering algorithms for computational social science projects.
  • Narrative and social-network analysis using NLP and text mining in R.
  • Interactive Shiny dashboards for non-technical research teams.
Tell Business Storytelling
  • Data Analytics Consultant Mar 2024 – Jun 2024
  • Data Scientist Mar 2020 – Dec 2021
  • Designed and analyzed an end-to-end Typeform survey including final report → 50% time reduction vs. previous process.
  • Automated survey processing and report generation across 6 countries (CL, SV, CO, MX, UY, PE) → 90% time saved for the research team.
  • Built data capture system via Twitter API + Google Trends/News → weekly collection cut from 2 days to 20 minutes.
  • NLP and text mining pipelines for sentiment analysis, clustering, and user-persona construction over the captured data.
Universidad del Valle, CINARA Institute
  • Quantitative Analytics Lead · PUDA2022 Project Sep 2021 – Jun 2023
  • Sample design and construction of socio-environmental surveys; processed results applying PCA and clustering for data characterization in water and sanitation contexts.
  • Network models and fuzzy logic systems applied to complex socio-environmental systems.
Fundación Univalle
  • Advisor Sep 2020 – Nov 2020
  • Applied clustering and PCA over SISBEN IV data for categorization and georeferencing of vulnerable population, with technical reporting.
Directrix Analytics
  • Data Scientist · HORIZONT Project Nov 2018 – Apr 2019
  • Automated industrial sensor data capture from a kiln at the ARGOS Yumbo (Valle del Cauca) plant, enabling continuous monitoring of process variables.
  • PCA and predictive models on the captured data; visualization dashboards for plant teams.
Universidad del Valle, CIDSE
  • Research Assistant Jan 2017 – Jul 2020
  • Automated download of scientific citation data from the RePEc API, building a network of 104,589 nodes → 6 months of manual work saved.
  • Built semantic models and citation networks in R and Python for bibliometric and influence analysis in economics.
  • Co-author of 4 peer-reviewed publications in international journals (see Publications).

Publications

Bibliometric NLP and citation-network analysis applied to economic discourse — 104K+ nodes across four peer-reviewed studies.

The Drifting Influence of Hall's Random Walk Hypothesis on Consumption Modeling
García, C., Otero, D. & Salazar, B. · History of Political Economy, 55(1), 103–143 · 2023
doi.org/10.1215/00182702-10213653
A Tale of a Tool: The Impact of Sims's Vector Autoregressions on Macroeconometrics
Salazar, B. & Otero, D. · History of Political Economy, 51(3), 557–578 · 2019
doi.org/10.1215/00182702-7551924
La revolución empírica en economía
Salazar, B. & Otero, D. · Apuntes del CENES, 38(68) · 2019
doi.org/10.19053/01203053.v38.n68.2019.8792
La revolución de los nuevos clásicos: redes, influencia y metodología
Salazar, B. & Otero, D. · Revista de Economía Institucional, 17(32), 39–69 · 2015
doi.org/10.18601/01245996.v17n32.02

How I can help

Hire me to take an idea from prototype to production — across conversational AI, data systems, and applied research.

Conversational AI & Agents

Production chatbots and multi-agent assistants — multilingual, with RAG over your documents and conversation memory. Deployed where your users already are.

WhatsApp · Web · RAG · LangGraph · Multi-agent

Data Pipelines & Dashboards

From messy field data to decisions your team can act on: ingestion, automated validation and QC, and live dashboards they'll actually use.

KoboToolbox · Streamlit · PostgreSQL · Automation

NLP & Applied Research

Research-grade text and data analysis — semantic search, classification, sentiment, clustering and network analysis over surveys, documents and organizational data.

Embeddings · Semantic Search · Clustering · Networks

Let's work together

Open to consulting, research collaborations, and new projects — especially where AI, data, and social impact intersect.