Building a Local Chatbot with TypeScript, Express, LangChain, and Ollama (Qwen)

In the world of AI-powered applications, large language models (LLMs) like OpenAI’s GPT models get most of the attention. However, running your own model locally offers several advantages: no API costs, no rate limits, and complete control over your data. In this article, we'll walk through building a local chatbot API in TypeScript using Express and LangChain, with Ollama serving a Qwen3 model.

The result will be a flexible, local-first chatbot you can integrate into any frontend or automation pipeline.


Section 1: Installing Dependencies and Initial Project Setup

We start by creating our Node.js project and installing the required dependencies.

Make sure you have the latest LTS version of Node.js and Yarn installed. I use nvm to manage this.

nvm install lts/*
npm install -g strip-json-comments-cli
npm install -g yarn

Now, create the directory for the chatbot and run yarn init:

mkdir chatbot-express-example
cd chatbot-express-example
node --version > .nvmrc
yarn init -y

This will give you a package.json like:

{
  "name": "chatbot-express-example",
  "version": "0.1.0",
  "main": "index.js",
  "author": "Mark C Allen <mark@markcallen.com>",
  "license": "MIT"
}

Now set up Express, TypeScript, and the LangChain packages.

yarn add express
yarn add -D typescript @types/node @types/express tsx rimraf
yarn add langchain @langchain/core @langchain/community

And configure tsconfig.json (the cleanup step below pipes the generated file through the strip-json-comments CLI installed earlier and jq, so make sure jq is available):

npx tsc --init --rootDir src --outDir dist \
  --esModuleInterop --target es2020 --module commonjs \
  --verbatimModuleSyntax false --allowJs true --noImplicitAny true
# clean up tsconfig
cat tsconfig.json \
  | strip-json-comments --no-whitespace \
  | jq -r . > tsconfig.pretty.json && mv tsconfig.pretty.json tsconfig.json
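
After the cleanup, the resulting tsconfig.json should look roughly like this; exact defaults such as strict and skipLibCheck vary by TypeScript version, so treat it as a sketch rather than a byte-for-byte match:

{
  "compilerOptions": {
    "target": "es2020",
    "module": "commonjs",
    "rootDir": "./src",
    "outDir": "./dist",
    "esModuleInterop": true,
    "verbatimModuleSyntax": false,
    "allowJs": true,
    "noImplicitAny": true,
    "strict": true,
    "skipLibCheck": true
  }
}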

Here’s what’s happening:

  • express — Minimal web framework for building the API.
  • typescript, tsx, @types/node, @types/express — TypeScript support and types.
  • rimraf — Utility to clear the build folder.
  • langchain, @langchain/core & @langchain/community — Abstractions for prompt management, model calls, and streaming with Ollama.

Section 2: Writing the Express API with LangChain

We now create our chatbot server at src/index.ts:

import express from "express";
import type { Request, Response } from "express";
import { ChatPromptTemplate, MessagesPlaceholder } from "@langchain/core/prompts";
import { RunnableSequence } from "@langchain/core/runnables";
import { ChatOllama } from "@langchain/community/chat_models/ollama";
import { HumanMessage, SystemMessage } from "@langchain/core/messages";

const app = express();
app.use(express.json());

const OLLAMA_URL = process.env.OLLAMA_URL ?? "http://localhost:11434";
const MODEL = process.env.OLLAMA_MODEL ?? "qwen3:0.6b";

const llm = new ChatOllama({
  baseUrl: OLLAMA_URL,
  model: MODEL,
  temperature: 0.7,
});

const system = new SystemMessage(
  "You are everyday devops bot, a concise DevOps assistant. Answer directly, with examples when useful."
);

const prompt = ChatPromptTemplate.fromMessages([
  system,
  new MessagesPlaceholder("messages"),
]);

const chain = RunnableSequence.from([
  prompt,
  llm,
]);

app.post("/chat", async (req: Request, res: Response) => {
  const { question } = req.body ?? {};
  if (!question) return res.status(400).json({ error: "Missing 'question'." });

  const aiMsg = await chain.invoke({
    messages: [new HumanMessage(question)],
  });
  const content =
    typeof aiMsg.content === "string"
      ? aiMsg.content
      : aiMsg.content.map((c: any) => c?.text ?? "").join("");

  res.json({ answer: content });
});

app.post("/chat/stream", async (req: Request, res: Response) => {
  const { question } = req.body ?? {};
  if (!question) return res.status(400).json({ error: "Missing 'question'." });

  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");

  const stream = await chain.stream({
    messages: [new HumanMessage(question)],
  });

  for await (const chunk of stream) {
    const piece =
      typeof chunk.content === "string"
        ? chunk.content
        : Array.isArray(chunk.content)
        ? chunk.content.map((c: any) => c?.text ?? "").join("")
        : "";
    if (piece) res.write(`data: ${JSON.stringify(piece)}\n\n`);
  }
  res.end();
});

const port = Number(process.env.PORT ?? 3000);
app.listen(port, () => {
  console.log(`API listening on :${port}`);
});

What’s happening here:

  • SystemMessage defines the assistant’s tone and scope.
  • ChatPromptTemplate structures the input for the model.
  • ChatOllama connects LangChain to the local Ollama server.
  • /chat handles single-response interactions.
  • /chat/stream handles streamed responses.

To build and run locally, add scripts for dev, build, and start:

npm pkg set "scripts.dev"="tsx watch src/index.ts"
npm pkg set "scripts.build"="rimraf ./dist && tsc"
npm pkg set "scripts.start"="node dist/index.js"

Section 3: Ollama

It's best to run Ollama directly on the host OS (rather than in a container) so it can utilize your GPUs. See Supercharge Your Local AI for more details on how to do this.

Start the server

ollama serve

Pull and run the model

ollama pull qwen3:0.6b
ollama run qwen3:0.6b
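
Note that ollama run drops you into an interactive prompt, which is handy for a quick sanity check; the API itself only needs the model pulled and the server running. You can confirm the model is available with either the CLI or the HTTP API our server will use:

ollama list
curl http://localhost:11434/api/tags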

Section 4: Building and Running

To build and start:

yarn build
yarn start
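
The server reads OLLAMA_URL, OLLAMA_MODEL, and PORT from the environment, so if Ollama runs on another machine or you want to try a different model, you can override the defaults at start time. For example (the host and port here are placeholders):

OLLAMA_URL=http://192.168.1.50:11434 OLLAMA_MODEL=qwen3:0.6b PORT=8080 yarn start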

Test the chatbot:

curl -sS -X POST http://localhost:3000/chat \
  -H "Content-Type: application/json" \
  -d '{"question":"Give me one CI/CD best practice."}'
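
The endpoint returns a single JSON object with an answer field, so the response will have this shape (the text itself varies from run to run):

{"answer":"<model response text>"}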

Streaming test:

curl -N -X POST http://localhost:3000/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"question":"List three feature flag tips."}'
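
The stream endpoint emits Server-Sent-Events-style framing: each chunk arrives as a data: line containing a JSON-encoded string, with events separated by blank lines. A minimal Node 18+ consumer sketch (this stream-client.ts file is hypothetical and not part of the repo) could look like:

// stream-client.ts - consume the /chat/stream endpoint and print tokens as they arrive
async function main() {
  const res = await fetch("http://localhost:3000/chat/stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question: "List three feature flag tips." }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffered = "";

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, { stream: true });

    // Events are separated by a blank line; keep any partial event in the buffer.
    const events = buffered.split("\n\n");
    buffered = events.pop() ?? "";

    for (const event of events) {
      if (event.startsWith("data: ")) {
        // Each payload is a JSON-encoded string written by res.write() on the server.
        process.stdout.write(JSON.parse(event.slice(6)));
      }
    }
  }
  process.stdout.write("\n");
}

main().catch(console.error);

With the server up, you can run it using tsx stream-client.ts and watch the answer print incrementally.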

Add everything to git:

git init

cat << EOF > .gitignore
.env
yarn-error.log
dist/
node_modules/
EOF

git add .
git commit -m "First checkin" -a

I've published this to GitHub at: https://github.com/markcallen/chatbot-express-example


Next Steps

The next steps are to add linting, Docker, and dev containers to streamline your development process.


Conclusion

We’ve built a local-first chatbot API that uses LangChain to manage prompts and connect to Ollama, running the Qwen model locally.
By combining TypeScript, Express, and LangChain, we gain:

  • Cost-free inference (no API tokens)
  • Total control over model updates and behaviour
  • Easy extensibility for new endpoints, memory, and tools

You can now integrate this backend into a web UI, CLI, or automation workflow, keeping everything under your control while still leveraging powerful LLM capabilities.