LLMs and JavaScript:
practical guides
The generative model revolution: a use case with Transformers.js
In recent years, generative models based on deep neural networks have revolutionized the tech landscape. Initially developed for academic purposes, they have now become an integral part of everyday applications, from automated content generation to multilingual translation and advanced decision support.
The speed at which these technologies are being adopted and integrated into existing systems is remarkable. To further accelerate this process and make complex concepts—such as model usage, training, and deployment—more accessible, it is essential to adopt practical approaches that align with how we design and develop technological solutions.
In this article, we will explore the integration of language models in both client-side and server-side architectures using Xenova’s Transformers.js library. We will leverage advanced AI capabilities both within a standalone web app—without requiring external servers—and in a backend server via API endpoints, based on our open-source project @volcanicminds/backend.
Why JavaScript?
While Python remains the dominant language in the AI ecosystem, JavaScript offers unique advantages that make it a compelling choice in certain contexts:
- direct execution in the browser: enables AI models to run on the client side, eliminating the need for external servers and reducing response latency;
- backend flexibility: with technologies like Node.js, JavaScript allows the creation of efficient APIs to manage more complex models on a server;
- seamless integration: easily adapts to existing infrastructures, simplifying the incorporation of generative models into web applications;
- rapid development: facilitates prototyping thanks to compatibility with modern browsers and libraries like Transformers.js.
This versatility makes JavaScript a strategic choice for those looking to integrate generative models into modern web applications, offering both in-browser AI execution and scalable server-side solutions.
The Transformers.js library
Transformers.js is a JavaScript port of Hugging Face’s widely used Transformers library. This version allows developers to:
- perform in-browser inference without external servers;
- integrate pre-trained models into existing applications;
- leverage an active community that provides models, tools, and development support.
The combination of these features makes Transformers.js a powerful tool for building innovative AI applications in both client-side and server-side environments.
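As a minimal sketch of what inference with the library looks like, the snippet below wraps the `pipeline` function of `@xenova/transformers` in a small helper. The helper name `completeText`, the injectable `loadPipeline` parameter and the limit of 20 new tokens are illustrative assumptions, not part of the library. The loader is injectable only so the helper can be exercised without downloading a model.

```javascript
// Default loader: resolves the real library lazily, only when first called.
// Assumes @xenova/transformers is installed in the project.
async function defaultLoadPipeline(task, model) {
  const { pipeline } = await import('@xenova/transformers')
  return pipeline(task, model)
}

// Hypothetical helper: returns the generated text for a prompt.
async function completeText(prompt, loadPipeline = defaultLoadPipeline) {
  const generator = await loadPipeline('text-generation', 'Xenova/distilgpt2')
  const output = await generator(prompt, { max_new_tokens: 20 })
  // text-generation pipelines return an array of { generated_text } objects
  return output[0].generated_text
}
```

With the library installed, `completeText('Once upon a time').then(console.log)` would download the model on first use and print the continuation.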
Practical use case: standalone web app
The demo web app uses Transformers.js to run inference directly in the browser. Once installed and launched, the application is accessible at http://localhost:5173. The core code that interacts with the model is located in src/transformers/worker.js:
import { pipeline, env } from '@xenova/transformers'
// Always fetch models from the Hugging Face Hub instead of looking for local files
env.allowLocalModels = false
/**
 * This class uses the Singleton pattern to ensure that only one instance of the pipeline is loaded.
 */
class SamplePipeline {
  static task = 'text-generation'
  static model = 'Xenova/gpt2'
  static instance = null
  static async getInstance(progress_callback = null) {
    if (this.instance === null) {
      // Store the promise itself so that concurrent callers share a single load
      this.instance = pipeline(this.task, this.model, { progress_callback })
    }
    return this.instance
  }
}
// Listen for messages from the main thread
self.addEventListener('message', async (event) => {
  const { model, task, text, ...rest } = event.data
  if (SamplePipeline.model !== model || SamplePipeline.task !== task) {
    // Invalidate model if different
    SamplePipeline.model = model || SamplePipeline.model
    SamplePipeline.task = task || SamplePipeline.task
    if (SamplePipeline.instance !== null) {
      ;(await SamplePipeline.getInstance()).dispose()
      SamplePipeline.instance = null
    }
  }
  // Retrieve the text-generation pipeline. When called for the first time,
  // this will load the pipeline and save it for future use.
  const generator = await SamplePipeline.getInstance((x) => {
    // We also add a progress callback to the pipeline so that we can
    // track model loading.
    self.postMessage(x)
  })
  // Actually perform the text generation
  let output = await generator(text, {
    ...rest,
    // Allows for partial output
    callback_function: (x) => {
      self.postMessage({
        status: 'update',
        output: generator.tokenizer.decode(x[0].output_token_ids, { skip_special_tokens: true })
      })
    }
  })
  // Send the output back to the main thread
  self.postMessage({
    status: 'complete',
    output: output
  })
})
Workflow
- the user submits a message through the web app interface;
- the message is passed to a web worker;
- the web worker processes the request using a predefined language model (Xenova/distilgpt2 in this demo; note that the worker listing defaults to Xenova/gpt2 unless the message names another model);
- the model generates a response, which is displayed in the interface.
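The main-thread side of this workflow can be sketched as a small reducer that folds each worker message into the interface state. The `update` and `complete` shapes come from the worker listing above. The `progress` and `ready` events, the state fields, and the relative worker path in the wiring comments are assumptions made for illustration and should be checked against the actual demo.

```javascript
// Pure reducer: folds one worker message into the UI state.
function applyWorkerMessage(state, message) {
  switch (message.status) {
    case 'progress':
      // Assumed shape of a loading event: { status: 'progress', file, progress }
      return { ...state, loading: { ...state.loading, [message.file]: message.progress } }
    case 'ready':
      // Assumed event emitted once the pipeline has finished loading
      return { ...state, ready: true }
    case 'update':
      // Partial text streamed while generation is still running
      return { ...state, text: message.output, busy: true }
    case 'complete': {
      // Final pipeline result: an array of { generated_text } objects
      const first = Array.isArray(message.output) ? message.output[0] : message.output
      return { ...state, text: first?.generated_text ?? state.text, busy: false }
    }
    default:
      // Unknown events leave the state untouched
      return state
  }
}

const initialState = { ready: false, busy: false, loading: {}, text: '' }

// Browser wiring (not executed here; the relative path is an assumption):
// const worker = new Worker(new URL('./transformers/worker.js', import.meta.url), { type: 'module' })
// let state = initialState
// worker.addEventListener('message', (e) => { state = applyWorkerMessage(state, e.data) })
// worker.postMessage({ task: 'text-generation', model: 'Xenova/distilgpt2', text: 'Hello', max_new_tokens: 50 })
```

Keeping the reducer free of DOM and Worker references makes it easy to test outside the browser and to plug into whichever UI framework the app uses.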
LLM response preview
This approach is ideal for applications that require fast responses without relying on external servers. However, using a lightweight model like distilgpt2 comes with some limitations in terms of output quality.
Practical use case: backend API
The backend project uses an architecture based on Fastify and TypeORM to expose an API endpoint that leverages Transformers.js. The text generation logic is located in src/api/pipeline/controller/TextGenerationPipeline.ts:
export class TextGenerationPipeline {
  static task = 'text-generation'
  static model = 'Xenova/distilgpt2'
  static instance = null
  // Lazily create the pipeline once and reuse it across requests
  static async getInstance(progress_callback = null) {
    if (this.instance === null) {
      // Wrapping import() in Function likely keeps the TypeScript compiler from
      // rewriting it to require(), so the ES module package can still be loaded
      const { pipeline } = await Function('return import("@xenova/transformers")')()
      this.instance = pipeline(this.task, this.model, { progress_callback })
    }
    return this.instance
  }
  static async execute({ text }) {
    const executor: any = await this.getInstance()
    let answer = await executor(text, {
      temperature: 2,
      max_new_tokens: 50,
      repetition_penalty: 5,
      no_repeat_ngram_size: 2,
      num_beams: 2,
      num_return_sequences: 1
    })
    // The pipeline returns an array of results; keep the first one
    answer = answer?.length > 0 ? answer[0] : answer
    return answer?.generated_text || answer
  }
}
Workflow
- a request is sent to the API endpoint with a text input;
- the API processes the input using a predefined model (again, Xenova/distilgpt2 for simplicity);
- the generated response is returned to the client.
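The three steps above can be sketched as a framework-independent handler. Everything named here is hypothetical: the function `handleTextGeneration`, the response shape, and the `/generate` route in the wiring comments are assumptions, and @volcanicminds/backend may register routes differently. The `generate` argument stands in for `TextGenerationPipeline.execute`.

```javascript
// Hypothetical handler: validates the input, runs generation, maps errors to status codes.
async function handleTextGeneration(body, generate) {
  const text = typeof body?.text === 'string' ? body.text.trim() : ''
  if (text.length === 0) {
    // Reject requests without usable input before touching the model
    return { statusCode: 400, payload: { error: 'The "text" field is required' } }
  }
  try {
    const answer = await generate({ text })
    return { statusCode: 200, payload: { answer } }
  } catch (err) {
    // Hide internal details from the client
    return { statusCode: 500, payload: { error: 'Text generation failed' } }
  }
}

// Possible Fastify wiring (route path is an assumption, not executed here):
// fastify.post('/generate', async (request, reply) => {
//   const { statusCode, payload } = await handleTextGeneration(
//     request.body,
//     (input) => TextGenerationPipeline.execute(input)
//   )
//   return reply.code(statusCode).send(payload)
// })
```

Injecting the generator keeps the handler testable with a stub, so validation and error mapping can be checked without loading a model.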
Postman response
This approach is ideal for centralized applications, where computational load can be managed on more powerful servers. For instance, more complex and high-performance models can be integrated to improve response quality.
Comparison: web app vs backend
These two approaches offer complementary advantages:
- standalone web app: eliminates the need for a server, allowing greater client-side autonomy. However, the browser’s limited resources can be a constraint for complex models;
- backend API: centralizes processing, enabling the use of more advanced models. It is particularly suitable for scenarios where hardware resources are not a limiting factor.
A hybrid solution, where tasks are distributed between client and server, could offer an optimal balance but requires careful planning to manage communication and workload distribution.
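One way to picture such a hybrid setup is a small policy function that decides, per request, where generation should run. The function name, its inputs, and every threshold below are illustrative assumptions rather than measured values or part of either demo project.

```javascript
// Hypothetical policy: returns 'client' or 'server' for a generation request.
function chooseRuntime({ online, deviceMemoryGb, promptLength, modelCached }) {
  // Offline: the browser is the only place the request can run
  if (!online) return 'client'
  // Long prompts are assumed to be too slow for a small in-browser model
  if (promptLength > 500) return 'server'
  // Low-memory devices (when the value is known) are sent to the server
  if (deviceMemoryGb !== undefined && deviceMemoryGb < 4) return 'server'
  // Avoid a large first download when the model is not cached yet
  return modelCached ? 'client' : 'server'
}
```

In a browser, the inputs could come from `navigator.onLine` and, where available, `navigator.deviceMemory` (not exposed by every browser, hence the `undefined` check).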
Conclusion
The demo projects illustrate how to integrate generative AI capabilities using Transformers.js in both a fully client-side environment and a server-side architecture. While not production-ready, these examples serve as a starting point for developers interested in exploring generative models in real-world applications.
For further information, refer to the following resources:
With the continuous evolution of generative models and supporting libraries, the possibilities for innovation are nearly limitless. Whether building a lightweight web app or a robust backend, the key is to start experimenting and contributing to the growing AI ecosystem.
Tag: Technology, AI
Publication date: January 15, 2025
Latest revision: January 16, 2025

