Environment Setup
A step-by-step tutorial on how to make Spark NLP work on your local computer
Apache Spark is an open-source framework for fast and general-purpose data processing. It provides a unified engine that can run complex analytics, including Machine Learning, in a fast and distributed way.
Spark NLP is an Apache Spark module that provides advanced Natural Language Processing (NLP) capabilities to Spark applications. It can be used to build complex text processing pipelines, including tokenization, sentence splitting, part-of-speech tagging, parsing, and named entity recognition.
Although the documentation describing how to install Spark NLP is quite clear, sometimes you can get stuck while installing it. For this reason, in this article I describe a step-by-step procedure to make Spark NLP work on your computer.
To install Spark NLP, you need to install the following tools:
- Python
- Java
- Scala
- Apache Spark
- PySpark
- Spark NLP
You may have already installed Python following the procedure described in the technical requirements section. So, we can start installing the software from the second step, Java.
Spark NLP is built on top of Apache Spark, which can be installed on any OS that supports Java 8. Check whether you have Java 8 by running the following command in a terminal:
java -version
If Java is already installed, you should see the following output:
openjdk version "1.8.0_322"
OpenJDK Runtime Environment (build 1.8.0_322-bre_2022_02_28_15_01-b00)
OpenJDK 64-Bit Server VM (build 25.322-b00, mixed mode)
If Java 8 is not installed, you can download Java 8 from this link and follow the wizard.
On Ubuntu, you can install openjdk-8 through the package manager:
sudo apt-get install openjdk-8-jre
On macOS, you can install openjdk-8 through brew:
brew install openjdk@8
If you have another version of Java installed, you can download Java 8, as previously described, and then set the JAVA_HOME environment variable to the path of the Java 8 directory.
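For example, you could set the variables as follows. The path below is only an illustration (it is a typical Ubuntu location, but it depends on where Java 8 lives on your system):

```shell
# Point JAVA_HOME at your Java 8 installation directory
# (example path; adjust it to your system):
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# Put that Java first on the PATH so `java -version` picks it up:
export PATH="$JAVA_HOME/bin:$PATH"
```

On macOS you can find the right path with `/usr/libexec/java_home -v 1.8`.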
Apache Spark requires Scala 2.12 or 2.13 to work properly. You can install Scala 2.12.15 following the procedure described here.
Once installed, you can verify that Scala works properly by running the following command:
scala -version
You can download Apache Spark from its official website, available here. There are many versions of Apache Spark. Personally, I have installed version 3.1.2, which is available here.
Once you have downloaded the package, you can extract it and place it wherever you want in your filesystem. Then, you need to add the path to the bin directory contained in your Spark directory to the PATH environment variable. In Unix, you can export the PATH variable:
export PATH=$PATH:/path/to/spark/bin
Then, you export the SPARK_HOME environment variable with the path to your Spark directory. In Unix, you can export the SPARK_HOME variable as follows:
export SPARK_HOME="/path/to/spark"
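Note that variables exported in a terminal are lost when you close it. To make them persistent, you can append them to your shell profile. This sketch assumes a bash shell and uses ~/.bashrc (use ~/.zshrc or equivalent for other shells, and your real Spark path instead of the placeholder):

```shell
# Persist the Spark variables across terminal sessions:
echo 'export SPARK_HOME="/path/to/spark"' >> ~/.bashrc
echo 'export PATH="$PATH:$SPARK_HOME/bin"' >> ~/.bashrc
# Reload the profile in the current terminal:
source ~/.bashrc
```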
To check if Apache Spark is installed properly, you can run the following command:
spark-shell
A shell should open:
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.2
      /_/

Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_322)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
To exit the shell, you can use Ctrl+C.
PySpark and Spark NLP are two Python libraries that you can install through pip:
pip install pyspark
pip install spark-nlp
Now Spark NLP should be ready on your computer!
Congratulations! You have just installed Spark NLP on your computer! You have installed Java, Scala, Apache Spark, Spark NLP, and PySpark!
Now it's time to play with Spark NLP. There are many tutorials available on the Web. I suggest you start from the following notebooks:
You can also check this tutorial, which explains how to integrate Spark NLP with Comet, a platform used to monitor Machine Learning experiments.
If you have read this far, that is already a lot for today. Thanks! You can read my trending articles at this link.