With speech interfaces becoming more of a thing, it’s worth exploring some of the things we can do with speech interactions. Like, what if we could say something and have that transcribed and pumped out as a downloadable PDF?
Well, spoiler alert: we absolutely can do that! There are libraries and frameworks we can cobble together to make it happen, and that’s what we’re going to do together in this article.
These are the tools we’re using
First off, these are the two big players: Next.js and Express.js.
Next.js tacks on additional functionalities to React, including key features for building static sites. It’s a go-to for many developers because of what it offers right out of the box, like dynamic routing, image optimization, built-in domain and subdomain routing, fast refreshes, file system routing, and API routes… among many, many other things.
In our case, we definitely need Next.js for its API routes on our client server. We want a route that takes a text file, converts it to PDF, writes it to our filesystem, then sends a response to the client.
Express.js allows us to get a little Node.js app going with routing, HTTP helpers, and templating. It’s a server for our own API, which is what we’ll need as we pass and parse data between things.
We have some other dependencies we’ll be putting to use:
- react-speech-recognition: A library for converting speech to text, making it available to React components.
- regenerator-runtime: A library for troubleshooting the “regeneratorRuntime is not defined” error that shows up in Next.js when using react-speech-recognition
- html-pdf-node: A library for converting an HTML page or public URL into a PDF
- axios: A library for making HTTP requests in both the browser and Node.js
- cors: A library that allows cross-origin resource sharing
Setting up
The first thing we want to do is create two project folders, one for the client and one for the server. Name them whatever you’d like. I’m naming mine audio-to-pdf-client and audio-to-pdf-server, respectively.
The fastest way to get started with Next.js on the client side is to bootstrap it with create-next-app. So, open your terminal and run the following command from your client project folder:
npx create-next-app client
Now we need our Express server. We can get it by cd-ing into the server project folder and running the npm init command. A package.json file will be created in the server project folder once it’s done.
We still need to actually install Express, so let’s do that now with npm install express. Now we can create a new index.js file in the server project folder and drop this code in there:
const express = require("express");
const app = express();

app.listen(4000, () => console.log("Server is running on port 4000"));
Ready to run the server?
node index.js
We’re going to need a couple more folders and another file to move forward:
- Create a components folder in the client project folder.
- Create a SpeechToText.jsx file in the components subfolder.
Before we go any further, we have a little cleanup to do. Specifically, we need to replace the default code in the pages/index.js file with this:
import Head from "next/head";
import SpeechToText from "../components/SpeechToText";

export default function Home() {
  return (
    <div className="home">
      <Head>
        <title>Audio To PDF</title>
        <meta
          name="description"
          content="An app that converts audio to pdf in the browser"
        />
        <link rel="icon" href="https://css-tricks.com/favicon.ico" />
      </Head>
      <h1>Convert your speech to pdf</h1>
      <main>
        <SpeechToText />
      </main>
    </div>
  );
}
The imported SpeechToText component will eventually be exported from components/SpeechToText.jsx.
Let’s install the other dependencies
Alright, we have the initial setup for our app out of the way. Now we can install the libraries that handle the data that’s passed around.
We can install our client dependencies with:
npm install react-speech-recognition regenerator-runtime axios
Our Express server dependencies are up next, so let’s cd into the server project folder and install those:
npm install html-pdf-node cors
Probably a good time to pause and make sure the files in our project folders are intact. Here’s what you should have in the client project folder at this point:
/audio-to-pdf-web-client
├─ /components
|  └── SpeechToText.jsx
├─ /pages
|  ├─ _app.js
|  └── index.js
└── /styles
    ├─ globals.css
    └── Home.module.css
And here’s what you should have in the server project folder:
/audio-to-pdf-server
└── index.js
Building the UI
Well, our speech-to-PDF wouldn’t be all that great if there’s no way to interact with it, so let’s make a React component for it that we can call <SpeechToText>.
You can totally use your own markup. Here’s what I’ve got to give you an idea of the pieces we’re putting together:
import React from "react";

const SpeechToText = () => {
  return (
    <>
      <section>
        <div className="button-container">
          <button type="button" style={{ "--bgColor": "blue" }}>
            Start
          </button>
          <button type="button" style={{ "--bgColor": "orange" }}>
            Stop
          </button>
        </div>
        <div
          className="words"
          contentEditable
          suppressContentEditableWarning={true}
        ></div>
        <div className="button-container">
          <button type="button" style={{ "--bgColor": "red" }}>
            Reset
          </button>
          <button type="button" style={{ "--bgColor": "green" }}>
            Convert to pdf
          </button>
        </div>
      </section>
    </>
  );
};

export default SpeechToText;
This component returns a React fragment that contains an HTML <section> element that contains three divs:
- .button-container contains two buttons that will be used to start and stop speech recognition.
- .words has contentEditable and suppressContentEditableWarning attributes to make this element editable and suppress any warnings from React.
- Another .button-container holds two more buttons that will be used to reset and convert speech to PDF, respectively.
Styling is another thing altogether. I won’t go into it here, but you’re welcome to use some styles I wrote as a starting point for your own styles/globals.css file.
html,
body {
padding: 0;
margin: 0;
font-family: -apple-system, BlinkMacSystemFont, Segoe UI, Roboto, Oxygen,
Ubuntu, Cantarell, Fira Sans, Droid Sans, Helvetica Neue, sans-serif;
}
a {
color: inherit;
text-decoration: none;
}
* {
box-sizing: border-box;
}
.home {
background-color: #333;
min-height: 100%;
padding: 0 1rem;
padding-bottom: 3rem;
}
h1 {
width: 100%;
max-width: 400px;
margin: auto;
padding: 2rem 0;
text-align: center;
text-transform: capitalize;
color: white;
font-size: 1rem;
}
.button-container {
text-align: center;
display: flex;
justify-content: center;
gap: 3rem;
}
button {
color: white;
background-color: var(--bgColor);
font-size: 1.2rem;
padding: 0.5rem 1.5rem;
border: none;
border-radius: 20px;
cursor: pointer;
}
button:hover {
opacity: 0.9;
}
button:active {
transform: scale(0.99);
}
.words {
max-width: 700px;
margin: 50px auto;
height: 50vh;
border-radius: 5px;
padding: 1rem 2rem 1rem 5rem;
background-image: -webkit-gradient(
linear,
0 0,
0 100%,
from(#d9eaf3),
color-stop(4%, #fff)
) 0 4px;
background-size: 100% 3rem;
background-attachment: scroll;
position: relative;
line-height: 3rem;
overflow-y: auto;
}
.success,
.error {
background-color: #fff;
margin: 1rem auto;
padding: 0.5rem 1rem;
border-radius: 5px;
width: max-content;
text-align: center;
display: block;
}
.success {
color: green;
}
.error {
color: red;
}
The CSS variables in there are being used to control the background color of the buttons.
Let’s see the latest changes! Run npm run dev in the terminal and check them out.
You should see this in the browser when you visit http://localhost:3000:
Our first speech to text conversion!
The first action to take is to import the necessary dependencies into our <SpeechToText> component:
import React, { useRef, useState } from "react";
import SpeechRecognition, {
useSpeechRecognition,
} from "react-speech-recognition";
import axios from "axios";
Then we check if speech recognition is supported by the browser and render a notice if not supported:
const speechRecognitionSupported =
  SpeechRecognition.browserSupportsSpeechRecognition();

if (!speechRecognitionSupported) {
  return <div>Your browser does not support speech recognition.</div>;
}
Next up, let’s extract transcript and resetTranscript from the useSpeechRecognition() hook:
const { transcript, resetTranscript } = useSpeechRecognition();
This is what we need for the state that handles listening:
const [listening, setListening] = useState(false);
We also need a ref for the div with the contentEditable attribute, then we need to add the ref attribute to it and pass transcript as children:
const textBodyRef = useRef(null);
…and:
<div
  className="words"
  contentEditable
  ref={textBodyRef}
  suppressContentEditableWarning={true}
>
  {transcript}
</div>
The last thing we need here is a function that triggers speech recognition, tied to the onClick event listener of our button. The button sets listening to true and makes recognition run continuously. We’ll disable the button while it’s in that state to prevent us from firing off additional events.
const startListening = () => {
  setListening(true);
  SpeechRecognition.startListening({
    continuous: true,
  });
};
…and:
<button
  type="button"
  onClick={startListening}
  style={{ "--bgColor": "blue" }}
  disabled={listening}
>
  Start
</button>
Clicking on the button should now start up the transcription.
More functions
OK, so we have a component that can start listening. But now we need it to do a few other things as well, like stopListening, resetText and handleConversion. Let’s make those functions.
const stopListening = () => {
  setListening(false);
  SpeechRecognition.stopListening();
};

const resetText = () => {
  stopListening();
  resetTranscript();
  textBodyRef.current.innerText = "";
};

const handleConversion = async () => {};
Each of the functions will be added to an onClick event listener on the appropriate buttons:
<button
  type="button"
  onClick={stopListening}
  style={{ "--bgColor": "orange" }}
  disabled={listening === false}
>
  Stop
</button>
<div className="button-container">
  <button
    type="button"
    onClick={resetText}
    style={{ "--bgColor": "red" }}
  >
    Reset
  </button>
  <button
    type="button"
    style={{ "--bgColor": "green" }}
    onClick={handleConversion}
  >
    Convert to pdf
  </button>
</div>
The handleConversion function is asynchronous because we will eventually be making an API request. The “Stop” button has the disabled attribute, which is applied when listening is false.
If we restart the server and refresh the browser, we can now start, stop, and reset our speech transcription in the browser.
Now what we need is for the app to transcribe that recognized speech by converting it to a PDF file. For that, we need the server-side path from Express.js.
Setting up the API route
The purpose of this route is to take a text file, convert it to a PDF, write that PDF to our filesystem, then send a response to the client.
To set up, we’d open the server/index.js file and import the html-pdf-node and fs dependencies that will be used to write and open our filesystem.
const HTMLToPDF = require("html-pdf-node");
const fs = require("fs");
const cors = require("cors");
Next, we will set up our route:
app.use(cors());
app.use(express.json());

app.post("/", (req, res) => {
  // etc.
});
We then proceed to define the options required in order to use html-pdf-node inside the route:
let options = { format: "A4" };
let file = {
  content: `<html><body><pre style="font-size: 1.2rem">${req.body.text}</pre></body></html>`,
};
The options object accepts a value to set the paper size and style. Paper sizes follow a much different system than the sizing units we typically use on the web. For example, A4 is the standard 210 × 297 mm sheet used in most of the world, slightly narrower and taller than US Letter.
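To make the gap between paper units and web units concrete, here is a small sketch that converts A4’s millimetre dimensions into CSS pixels and PDF points. The helper names (mmToPx, mmToPt) are made up for illustration and are not part of html-pdf-node; the conversion factors are the standard 96 CSS pixels and 72 points per inch.

```javascript
// Convert paper dimensions from millimetres to CSS pixels and PDF points.
const MM_PER_INCH = 25.4;
const CSS_PX_PER_INCH = 96; // CSS reference pixel density
const PT_PER_INCH = 72; // points used by PDF and print tooling

// Hypothetical helpers, named here for illustration only.
const mmToPx = (mm) => (mm / MM_PER_INCH) * CSS_PX_PER_INCH;
const mmToPt = (mm) => (mm / MM_PER_INCH) * PT_PER_INCH;

// A4 is defined as 210 mm x 297 mm.
const A4_MM = { width: 210, height: 297 };

const a4InPx = {
  width: Math.round(mmToPx(A4_MM.width)),
  height: Math.round(mmToPx(A4_MM.height)),
};

console.log(a4InPx); // { width: 794, height: 1123 }
```

So an A4 page is roughly 794 by 1123 CSS pixels, which is handy to keep in mind when styling the markup that ends up in the PDF.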
The file object accepts either the URL of a public website or HTML markup. In order to generate our HTML page, we will use the html, body, and pre HTML tags and the text from req.body.
You can apply any styling of your choice.
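Because the text from req.body is interpolated straight into the markup, characters like < or & in a transcript would be read as HTML. Here is a minimal sketch of escaping them first. Both escapeHtml and buildHtml are hypothetical helpers written for this example; neither comes from html-pdf-node.

```javascript
// Hypothetical helper: escape characters that have special meaning in HTML.
// The ampersand is replaced first so later entities are not double-escaped.
const escapeHtml = (text) =>
  String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");

// Hypothetical helper: build the markup that would be used as file.content.
const buildHtml = (text) =>
  `<html><body><pre style="font-size: 1.2rem">${escapeHtml(text)}</pre></body></html>`;

console.log(buildHtml('5 < 6 & "quotes"'));
```

Inside the route, the content value could then be buildHtml(req.body.text) instead of the raw template string.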
Next, we will add a try/catch to handle any errors that might pop up along the way:
try {
} catch (error) {
  console.log(error);
  res.status(500).send(error);
}
Next, we will use generatePdf from the html-pdf-node library to generate a pdfBuffer (the raw PDF file) from our file and create a unique pdfName:
HTMLToPDF.generatePdf(file, options).then((pdfBuffer) => {
  // console.log("PDF Buffer:-", pdfBuffer);
  const pdfName = "./data/speech" + Date.now() + ".pdf";

  // Next code here
});
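One thing to watch for: fs.writeFile does not create missing folders, so the data folder has to exist before the first PDF is written. Here is a sketch of a helper that creates the folder if needed and returns a timestamped path. The name makePdfPath and the temporary demo folder are assumptions made for this example.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: ensure the output folder exists, then return a
// timestamped file path inside it.
const makePdfPath = (dir, now = Date.now()) => {
  fs.mkdirSync(dir, { recursive: true }); // does nothing if the folder exists
  return path.join(dir, `speech${now}.pdf`);
};

// Demo in a temporary folder so the sketch leaves the project tree alone.
const demoDir = path.join(os.tmpdir(), "audio-to-pdf-demo", "data");
const pdfPath = makePdfPath(demoDir, 1700000000000);

console.log(path.basename(pdfPath)); // speech1700000000000.pdf
```

In the route, something like const pdfName = makePdfPath("./data") would stand in for the string concatenation.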
From there, we use the filesystem module to write, read and (yes, finally!) send a response to the client app:
fs.writeFile(pdfName, pdfBuffer, function (writeError) {
  if (writeError) {
    return res
      .status(500)
      .json({ message: "Unable to write file. Try again." });
  }

  fs.readFile(pdfName, function (readError, readData) {
    if (!readError && readData) {
      // console.log({ readData });
      res.setHeader("Content-Type", "application/pdf");
      res.setHeader("Content-Disposition", "attachment");
      res.send(readData);
      return;
    }

    return res
      .status(500)
      .json({ message: "Unable to read file. Try again." });
  });
});
Let’s break that down a bit:
- The writeFile filesystem method accepts a file name, data and a callback function that can return an error message if there’s an issue writing to the file. If you’re working with a CDN that provides error endpoints, you could use those instead.
- The readFile filesystem method accepts a file name and a callback function that is capable of returning a read error as well as the read data. Once we have no read error and the read data is present, we will construct and send a response to the client. Again, this can be replaced with your CDN’s endpoints if you have them.
- The res.setHeader("Content-Type", "application/pdf"); tells the browser that we are sending a PDF file.
- The res.setHeader("Content-Disposition", "attachment"); tells the browser to make the received data downloadable.
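The Content-Disposition header can also carry a suggested file name. Here is a small sketch of a helper that builds both headers as a plain object. The name pdfDownloadHeaders is hypothetical, and the sketch assumes a simple ASCII file name.

```javascript
// Hypothetical helper: headers for a PDF download with a suggested file name.
const pdfDownloadHeaders = (fileName) => ({
  "Content-Type": "application/pdf",
  // The filename parameter suggests a name for the saved file.
  "Content-Disposition": `attachment; filename="${fileName}"`,
});

const headers = pdfDownloadHeaders("speech.pdf");
console.log(headers["Content-Disposition"]); // attachment; filename="speech.pdf"
```

Express’s res.set() accepts an object of header fields, so res.set(pdfDownloadHeaders("speech.pdf")) could replace the two setHeader calls. In this app the client picks the file name through the download attribute anyway, so the header mostly matters if the route is ever requested directly.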
Since the API route is ready, we can use it in our app at http://localhost:4000. We can then proceed to the client part of our application to complete the handleConversion function.
Handling the conversion
Before we can start working on the handleConversion function, we need to create a state that handles our API requests for loading, error, success, and other messages. We’re going to use React’s useState hook to set that up:
const [response, setResponse] = useState({
loading: false,
message: "",
error: false,
success: false,
});
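Every update to this state sets all four fields at once, so one option is to describe each situation with a small function that returns a complete object. The helper names below (loadingState, successState, errorState) are made up for this sketch and are not part of the original app.

```javascript
// Same shape as the object passed to useState above.
const initialResponse = {
  loading: false,
  message: "",
  error: false,
  success: false,
};

// Hypothetical helpers: each returns a full state object for one situation.
const loadingState = () => ({ ...initialResponse, loading: true });
const successState = (message) => ({ ...initialResponse, message, success: true });
const errorState = (message) => ({ ...initialResponse, message, error: true });

console.log(errorState("Text not converted."));
```

With these, a call like setResponse(loadingState()) says the same thing as spelling out all four fields by hand.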
In the handleConversion function, we will check that the web page has been loaded before running our code and make sure the div with the contentEditable attribute is not empty:
if (typeof window !== "undefined") {
  const userText = textBodyRef.current.innerText;
  // console.log(textBodyRef.current.innerText);

  if (!userText) {
    alert("Please speak or write some text.");
    return;
  }
}
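The !userText check only catches a completely empty string, and an editable div can easily end up holding nothing but spaces or line breaks. Here is a sketch of a stricter check. hasUsableText is a hypothetical helper name.

```javascript
// Hypothetical helper: true only when the text has non-whitespace characters.
const hasUsableText = (text) =>
  typeof text === "string" && text.trim().length > 0;

console.log(hasUsableText("  \n ")); // false
console.log(hasUsableText("hello")); // true
```

The guard in handleConversion could then read if (!hasUsableText(userText)) instead of if (!userText).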
We proceed by wrapping our eventual API request in a try/catch, handling any error that may arise, and updating the response state:
try {
} catch (error) {
  setResponse({
    ...response,
    loading: false,
    error: true,
    message:
      "An unexpected error occurred. Text not converted. Please try again",
    success: false,
  });
}
Next, we set some values for the response state, set the config for axios, and make a POST request to the server:
setResponse({
  ...response,
  loading: true,
  message: "",
  error: false,
  success: false,
});

const config = {
  headers: {
    "Content-Type": "application/json",
  },
  responseType: "blob",
};

const res = await axios.post(
  "http://localhost:4000",
  {
    text: textBodyRef.current.innerText,
  },
  config
);
Once we have gotten a successful response, we set the response state with the appropriate values and instruct the browser to download the received PDF:
setResponse({
  ...response,
  loading: false,
  error: false,
  message:
    "Conversion was successful. Your download will start soon...",
  success: true,
});

// convert the received data to a file
const url = window.URL.createObjectURL(new Blob([res.data]));
// create an anchor element
const link = document.createElement("a");
// set the href of the created anchor element
link.href = url;
// add the download attribute, give the downloaded file a name
link.setAttribute("download", "yourfile.pdf");
// add the created anchor tag to the DOM
document.body.appendChild(link);
// force a click on the link to start a simulated download
link.click();
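The snippet above leaves the temporary anchor in the DOM and never releases the object URL. Here is a sketch of the same steps wrapped in a function that cleans up after itself. downloadBlob is a hypothetical helper; its doc and urlApi parameters default to the browser globals and exist only so the function can be exercised outside a browser. Some browsers may need the revoke deferred slightly, so treat the immediate cleanup as an assumption.

```javascript
// Hypothetical helper: start a download for a Blob, then clean up.
const downloadBlob = (blob, fileName, doc = document, urlApi = URL) => {
  // Turn the received data into a temporary URL.
  const url = urlApi.createObjectURL(blob);

  // Create an anchor that points at the URL and carries the file name.
  const link = doc.createElement("a");
  link.href = url;
  link.download = fileName;

  // The anchor is attached to the DOM before the simulated click.
  doc.body.appendChild(link);
  link.click();

  // Remove the temporary anchor and release the object URL.
  doc.body.removeChild(link);
  urlApi.revokeObjectURL(url);

  return url;
};
```

In handleConversion, the anchor-handling lines could then become downloadBlob(new Blob([res.data]), "yourfile.pdf").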
And we can use the following below the contentEditable div for displaying messages:
<div>
{response.success && <i className="success">{response.message}</i>}
{response.error && <i className="error">{response.message}</i>}
</div>
Final code
I’ve packaged everything up on GitHub so you can check out the full source code for both the server and the client.