LLMs and JavaScript: practical approaches

In recent years, generative models based on deep neural networks have revolutionized the tech landscape. Initially developed for academic purposes, they have now become an integral part of everyday applications, from automated content generation to multilingual translation and advanced decision support.
The speed at which these technologies are being adopted and integrated into existing systems is remarkable. To further accelerate this process and make complex concepts—such as model usage, training, and deployment—more accessible, it is essential to adopt practical approaches that align with how we design and develop technological solutions.
In this article, we will explore the integration of language models in both client-side and server-side architectures using Xenova’s Transformers.js library. We will leverage advanced AI capabilities both within a standalone web app—without requiring external servers—and in a backend server via API endpoints, based on our open-source project @volcanicminds/backend.
While Python remains the dominant language in the AI ecosystem, JavaScript offers unique advantages that make it a compelling choice in certain contexts: it runs natively in every browser, so inference can happen directly on the user's device; it lets teams share a single language across frontend and backend; and it integrates naturally with the existing web tooling and npm ecosystem.
This versatility makes JavaScript a strategic choice for those looking to integrate generative models into modern web applications, offering both in-browser AI execution and scalable server-side solutions.
Transformers.js is a JavaScript port of Hugging Face’s widely used Transformers library. This version allows developers to run pretrained models directly in the browser or in Node.js, reuse the same familiar pipeline API as the Python library, and perform inference locally via ONNX Runtime, without a Python backend or an external inference service.
The combination of these features makes Transformers.js a powerful tool for building innovative AI applications in both client-side and server-side environments.
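To give a first taste of the API before diving into the demo projects, the sketch below shows roughly what a minimal text-generation call with @xenova/transformers looks like in Node.js. It is only illustrative: the prompt is made up, and the model name and generation option simply mirror the ones used later in the article.

// minimal-generate.mjs — illustrative sketch, not part of the demo projects
import { pipeline } from '@xenova/transformers'

// Create a text-generation pipeline; the model is downloaded and cached on first use
// (Xenova/distilgpt2 is the lightweight model used in the backend example below).
const generator = await pipeline('text-generation', 'Xenova/distilgpt2')

// Run inference locally; no external API is involved.
const output = await generator('JavaScript and LLMs are', { max_new_tokens: 30 })

console.log(output[0].generated_text)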
The demo web app uses Transformers.js to run inference directly in the browser. Once installed and launched, the application is accessible at http://localhost:5173. The core code that interacts with the model is located in src/transformers/worker.js:
import { pipeline, env } from '@xenova/transformers'

env.allowLocalModels = false

/**
 * This class uses the Singleton pattern to ensure that only one instance of the pipeline is loaded.
 */
class SamplePipeline {
  static task = 'text-generation'
  static model = 'Xenova/gpt2'
  static instance = null

  static async getInstance(progress_callback = null) {
    if (this.instance === null) {
      this.instance = pipeline(this.task, this.model, { progress_callback })
    }
    return this.instance
  }
}

// Listen for messages from the main thread
self.addEventListener('message', async (event) => {
  const { model, task, text, ...rest } = event.data

  if (SamplePipeline.model !== model || SamplePipeline.task !== task) {
    // Invalidate the cached pipeline if the requested model or task differs
    SamplePipeline.model = model || SamplePipeline.model
    SamplePipeline.task = task || SamplePipeline.task
    if (SamplePipeline.instance !== null) {
      ;(await SamplePipeline.getInstance()).dispose()
      SamplePipeline.instance = null
    }
  }

  // Retrieve the text-generation pipeline. When called for the first time,
  // this will load the pipeline and save it for future use.
  const generator = await SamplePipeline.getInstance((x) => {
    // We also add a progress callback to the pipeline so that we can
    // track model loading.
    self.postMessage(x)
  })

  // Actually perform the text generation
  let output = await generator(text, {
    ...rest,
    // Allows for partial (streamed) output
    callback_function: (x) => {
      self.postMessage({
        status: 'update',
        output: generator.tokenizer.decode(x[0].output_token_ids, { skip_special_tokens: true })
      })
    }
  })

  // Send the output back to the main thread
  self.postMessage({
    status: 'complete',
    output: output
  })
})
[Screenshot: LLM response preview]
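The worker above only receives messages and posts results back; the sketch below shows how the main thread of the Vite-based app might spawn it and consume the streamed updates. The file path, prompt, and parameters are assumptions for illustration, and the actual UI wiring in the demo may differ.

// main.js — hypothetical main-thread wiring for the worker above
const worker = new Worker(new URL('./transformers/worker.js', import.meta.url), {
  type: 'module'
})

// Receive progress events, partial updates, and the final output from the worker
worker.addEventListener('message', (event) => {
  const { status, output } = event.data
  if (status === 'update') {
    console.log('partial:', output) // text streamed so far
  } else if (status === 'complete') {
    console.log('final:', output[0].generated_text)
  }
})

// Ask the worker to generate text with the model defined in worker.js
worker.postMessage({
  model: 'Xenova/gpt2',
  task: 'text-generation',
  text: 'Once upon a time',
  max_new_tokens: 50
})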
This approach is ideal for applications that require fast responses without relying on external servers. However, using lightweight models such as GPT-2 or distilgpt2 comes with some limitations in terms of output quality.
The backend project employs an architecture based on Fastify and TypeORM to provide an API endpoint leveraging Transformers.js. The text generation logic is located in src/api/pipeline/controller/TextGenerationPipeline.ts:
export class TextGenerationPipeline {
  static task = 'text-generation'
  static model = 'Xenova/distilgpt2'
  static instance: any = null

  static async getInstance(progress_callback = null) {
    if (this.instance === null) {
      // Dynamic import via Function() keeps the ESM-only package loadable
      // from a CommonJS/TypeScript build.
      const { pipeline } = await Function('return import("@xenova/transformers")')()
      this.instance = pipeline(this.task, this.model, { progress_callback })
    }
    return this.instance
  }

  static async execute({ text }) {
    const executor: any = await this.getInstance()
    let answer = await executor(text, {
      temperature: 2,
      max_new_tokens: 50,
      repetition_penalty: 5,
      no_repeat_ngram_size: 2,
      num_beams: 2,
      num_return_sequences: 1
    })
    // The pipeline returns an array of results; keep the first one
    answer = answer?.length > 0 ? answer[0] : answer
    return answer?.generated_text || answer
  }
}
[Screenshot: Postman response]
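In @volcanicminds/backend, the route wiring follows the project's own controller conventions. As a rough illustration of what the endpoint does, here is a plain-Fastify sketch; the route path, port, and payload shape are assumptions, not the project's actual API.

// server-sketch.js — plain Fastify illustration, not the @volcanicminds/backend wiring
import Fastify from 'fastify'
import { TextGenerationPipeline } from './TextGenerationPipeline.js'

const app = Fastify({ logger: true })

// Hypothetical endpoint: POST /pipeline/generate with body { "text": "..." }
app.post('/pipeline/generate', async (request, reply) => {
  const { text } = request.body
  if (!text) {
    return reply.code(400).send({ error: 'Missing "text" in request body' })
  }
  const generated_text = await TextGenerationPipeline.execute({ text })
  return { generated_text }
})

app.listen({ port: 3000 }).catch((err) => {
  app.log.error(err)
  process.exit(1)
})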
This approach is ideal for centralized applications, where computational load can be managed on more powerful servers. For instance, more complex and high-performance models can be integrated to improve response quality.
These two approaches offer complementary advantages: running the model in the browser removes the dependency on external servers and keeps user data on the device, while the server-side API centralizes the computational load and makes it easier to adopt larger, higher-quality models.
A hybrid solution, where tasks are distributed between client and server, could offer an optimal balance but requires careful planning to manage communication and workload distribution.
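One way to picture such a hybrid setup is a client-side helper that prefers the in-browser worker but falls back to the backend endpoint when local inference is unavailable. The sketch below is purely illustrative; the function name, endpoint URL, and message shapes are assumptions based on the two demos above.

// hybrid-generate.js — hypothetical client-side fallback logic
export async function generateText(text, { worker, apiUrl = '/pipeline/generate' } = {}) {
  // Prefer local inference when a worker instance is available in this environment
  if (worker && typeof Worker !== 'undefined') {
    return new Promise((resolve, reject) => {
      const onMessage = (event) => {
        if (event.data.status === 'complete') {
          worker.removeEventListener('message', onMessage)
          resolve(event.data.output[0].generated_text)
        }
      }
      worker.addEventListener('message', onMessage)
      worker.addEventListener('error', reject, { once: true })
      worker.postMessage({ task: 'text-generation', model: 'Xenova/gpt2', text, max_new_tokens: 50 })
    })
  }

  // Otherwise, delegate the work to the server-side endpoint
  const response = await fetch(apiUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text })
  })
  const { generated_text } = await response.json()
  return generated_text
}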
The demo projects illustrate how to integrate generative AI capabilities using Transformers.js in both a fully client-side environment and a server-side architecture. While not production-ready, these examples serve as a starting point for developers interested in exploring generative models in real-world applications.
For further information, refer to the Transformers.js documentation, the Hugging Face model hub, and the @volcanicminds/backend repository.
With the continuous evolution of generative models and supporting libraries, the possibilities for innovation are nearly limitless. Whether building a lightweight web app or a robust backend, the key is to start experimenting and contributing to the growing AI ecosystem.
Publication date: January 15, 2025
Last updated: January 16, 2025