Introduction
While learning Python or reading someone else's code, you may have encountered the ‘u’ and ‘r’ prefixes and raw string literals. But what do these terms mean? How do they affect our Python code? In this article, we'll attempt to demystify these concepts and understand their usage in Python.
String Literals in Python
A string literal in Python is a sequence of characters enclosed in quotes. We can use either single quotes (' ') or double quotes (" ") to define a string.
my_string = 'Hey, StackAbuse readers!'
print(my_string)
my_string = "Hey, StackAbuse readers!"
print(my_string)
Running this code will give you the following:
$ python string_example.py
Hey, StackAbuse readers!
Hey, StackAbuse readers!
Fairly simple, right? In my opinion, the thing that confuses most people is the "literal" part. We're used to calling them just "strings", so when you hear one being called a "string literal", it sounds like something more complicated. A string literal is simply a string value written out directly in the source code.
Python also provides other ways to define strings. We can prefix our string literals with certain characters to change their behavior. This is where the ‘u’ and ‘r’ prefixes come in, which we'll talk about later.
Python also supports triple quotes (''' ''' or """ """) to define strings. These are especially useful when we want to define a string that spans multiple lines.
Here's an example of a multi-line string:
my_string = """
Hey,
StackAbuse readers!
"""
print(my_string)
Running this code will output the following:
$ python multiline_string_example.py
Hey,
StackAbuse readers!
Notice the newlines in the output? That's because a triple-quoted string keeps the line breaks typed inside it!
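To see exactly which newline characters a triple-quoted literal contains, it can help to look at its repr(). This is only a small sketch; the name same_string is introduced here for illustration.

```python
# A triple-quoted literal keeps every line break typed between the quotes,
# including the one right after the opening quotes.
my_string = """
Hey,
StackAbuse readers!
"""

# repr() shows the newlines as \n escape sequences instead of printing them.
print(repr(my_string))

# The same value written as a one-line literal with explicit \n escapes.
same_string = "\nHey,\nStackAbuse readers!\n"
print(my_string == same_string)  # True
```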
What are ‘u’ and ‘r’ String Prefixes?
In Python, string literals can have optional prefixes that change how the literal is interpreted. Two of these prefixes are ‘u’ and ‘r’, and they're placed directly before the opening quote. The ‘u’ prefix stands for Unicode, and the ‘r’ prefix stands for raw.
Now, you may be wondering what Unicode and raw strings are. Well, let's break them down one by one, starting with the ‘u’ prefix.
The ‘u’ String Prefix
The ‘u’ prefix in Python stands for Unicode. It's used to define a Unicode string. But what is a Unicode string?
Unicode is an international encoding standard that provides a unique number for every character, no matter the platform, program, or language. This makes it possible to use and display text from multiple languages and symbol sets in your Python programs.
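The "unique number for every character" idea can be checked directly in Python 3 with the built-in ord() and chr() functions. The sample text below is just an illustrative choice.

```python
# ord() maps a character to its Unicode code point; chr() maps it back.
text = "A你好"
code_points = [ord(char) for char in text]
print(code_points)  # [65, 20320, 22909]

# The same numbers in the usual U+XXXX notation.
print([f"U+{cp:04X}" for cp in code_points])  # ['U+0041', 'U+4F60', 'U+597D']

# Converting the code points back yields the original text.
round_trip = "".join(chr(cp) for cp in code_points)
print(round_trip == text)  # True
```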
In Python 3.x, all strings are Unicode by default. However, in Python 2.x, you need to use the ‘u’ prefix to define a Unicode string.
For instance, if you want to create a string with Chinese characters in Python 2.x, you would need to use the ‘u’ prefix like so:
# -*- coding: utf-8 -*-  (Python 2 needs this declaration for non-ASCII source)
chinese_string = u'你好'
print(chinese_string)
When you run this code, you'll get the output:
你好
Which is "Hello" in Chinese.
Note: In Python 3.x, you can still use the ‘u’ prefix, but it's not necessary because all strings are Unicode by default.
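A quick way to confirm this in Python 3 is to compare a prefixed and an unprefixed literal. The variable names here are illustrative only.

```python
# In Python 3 the 'u' prefix is accepted but changes nothing.
plain = 'Hello'
prefixed = u'Hello'

print(type(plain), type(prefixed))  # <class 'str'> <class 'str'>
print(plain == prefixed)            # True
```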
So, that's the ‘u’ prefix. It helps you work with international text in your Python programs, especially if you're using Python 2.x. But what about the ‘r’ prefix? We'll dive into that in the next section.
The ‘r’ String Prefix
The ‘r’ prefix in Python denotes a raw string literal. When you prefix a string with ‘r’, it tells Python to take the characters exactly as written and not to interpret backslashes as the start of escape sequences such as \t or \n.
Consider this code:
normal_string = "\tTab character"
print(normal_string)
Output:
    Tab character
Here, \t is interpreted as a tab character. But if we prefix this string with ‘r’:
raw_string = r"\tTab character"
print(raw_string)
Output:
\tTab character
You can see that \t is no longer interpreted as a tab character. It's treated as two separate characters: a backslash and ‘t’.
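One way to verify the "two separate characters" claim is to compare string lengths. This is a minimal sketch with shortened example strings.

```python
normal_string = "\tTab"
raw_string = r"\tTab"

print(len(normal_string))  # 4: tab, T, a, b
print(len(raw_string))     # 5: backslash, t, T, a, b

# A raw string is equivalent to a regular one with the backslash escaped.
print(raw_string == "\\tTab")  # True
```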
This is particularly useful when dealing with regular expressions, or when you need to include a lot of backslashes in your string.
Working with ‘u’ and ‘r’ Prefixes in Python 2.x
Now, let's talk about Python 2.x. In Python 2.x, the ‘u’ prefix was used to denote a Unicode string, while the ‘r’ prefix was used to denote a raw string, just like in Python 3.x.
However, the difference lies in the default string type. In Python 3.x, all strings are Unicode by default. But in Python 2.x, an unprefixed string literal was a byte string, with ASCII as the default encoding. So, if you needed to work with Unicode strings in Python 2.x, you had to prefix them with ‘u’.
unicode_string = u"Hey, world!"
print(unicode_string)
Output:
Hey, world!
But what if you needed a string to be both Unicode and raw in Python 2.x? You could use both ‘u’ and ‘r’ prefixes together, like this:
unicode_raw_string = ur"\tHello, world!"
print(unicode_raw_string)
Output:
\tHello, world!
Note: The ‘ur’ syntax is not supported in Python 3.x. If you need a string to be both raw and Unicode in Python 3.x, you can use the ‘r’ prefix alone, because all strings are Unicode by default.
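The following sketch checks which prefixes Python 3 accepts by compiling small snippets of source text. The helper is_valid_literal is a hypothetical name written for this example, not a built-in.

```python
def is_valid_literal(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

print(is_valid_literal('r"text"'))   # True
print(is_valid_literal('u"text"'))   # True
print(is_valid_literal('ur"text"'))  # False on Python 3

# In Python 3, a raw literal is already a Unicode str.
print(type(r"\tHello, world!"))  # <class 'str'>
```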
The key point here is that the ‘u’ prefix was more important in Python 2.x because of the ASCII default. In Python 3.x, all strings are Unicode by default, so the ‘u’ prefix is not as essential. However, the ‘r’ prefix is still very useful for working with raw strings in both versions.
Using Raw String Literals
Now that we understand what raw string literals are, let's look at more examples of how we can use them in our Python code.
One of the most common uses for raw string literals is in regular expressions. Regular expressions often include backslashes, which can lead to issues if not handled correctly. By using a raw string literal, we can more easily avoid these problems.
Another common use case for raw string literals is when working with Windows file paths. As you may know, Windows uses backslashes in its file paths, which can cause issues in Python because of the backslash's role as an escape character. By using a raw string literal, we can sidestep these issues.
Here's an example:
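# Regular string: \t becomes a tab and \f becomes a form feed
path = "C:\path\to\file"
print(path)
# Raw string: every backslash is kept as written
path = r"C:\path\to\file"
print(path)
The raw string literal represents the file path correctly, while the standard string silently turns \t and \f into control characters.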
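The difference is easy to measure. The sketch below uses a hypothetical path, C:\temp\new, chosen because both \t and \n are valid escape sequences in a regular string.

```python
normal_path = "C:\temp\new"      # \t -> tab, \n -> newline
raw_path = r"C:\temp\new"        # backslashes kept as written
escaped_path = "C:\\temp\\new"   # regular string with escaped backslashes

print(len(normal_path))  # 9
print(len(raw_path))     # 11
print("\\" in normal_path)       # False: no backslash survived
print(raw_path == escaped_path)  # True
print(raw_path)          # C:\temp\new
```

Doubling every backslash in a regular string gives the same value as the raw literal; the raw form is simply easier to read.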
Common Mistakes and How to Avoid Them
When working with ‘u’ and ‘r’ string prefixes and raw string literals in Python, there are a number of common mistakes that developers often make. Let's go through some of them and see how you can avoid them.
First, one common mistake is using the ‘u’ prefix in Python 3.x. Remember, the ‘u’ prefix is not needed in Python 3.x as strings are Unicode by default in this version. Using it won't cause an error, but it's redundant and could potentially confuse other developers reading your code.
u_string = u'Hey, World!'
Second, forgetting to use the ‘r’ prefix when working with regular expressions can lead to unexpected results due to escape sequences. As a rule of thumb, use the ‘r’ prefix for regular expression patterns in Python.
# Without 'r', \b is a backspace character, not a word boundary
regex = '\bword\b'
# With 'r', the regex engine sees \b and treats it as a word boundary
regex = r'\bword\b'
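To see the practical effect, here is a small sketch using the standard re module. The sample sentence is made up for illustration.

```python
import re

text = "a word in a sentence"

# Without the raw prefix, "\b" is the backspace character (U+0008),
# so the pattern looks for backspace + "word" + backspace.
no_match = re.search('\bword\b', text)
print(no_match)  # None

# With the raw prefix, the regex engine receives \b and treats it
# as a word boundary.
match = re.search(r'\bword\b', text)
print(match.group())  # word
```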
Last, assuming that a raw string literal ignores backslashes completely can lead to errors. For instance, if you're trying to include a literal backslash at the end of a raw string, you can run into issues, because Python still treats a backslash before the closing quote as escaping that quote, so the literal never terminates. A raw string cannot end in an odd number of backslashes. Doubling it, as in r'C:\path\\', is valid syntax but leaves both backslashes in the string, so a common workaround is to append the trailing backslash from a regular string.
# SyntaxError: the final backslash escapes the closing quote
# raw_string = r'C:\path\'
# One correct way: add the trailing backslash separately
raw_string = r'C:\path' + '\\'
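The behavior of trailing backslashes in raw strings can be checked with a short experiment. This is a sketch only; it compiles the problematic literal from source text so that the rest of the script still runs.

```python
# Source text of a raw literal that ends in a single backslash.
bad_source = r"r'C:\path\'"
try:
    compile(bad_source, "<example>", "eval")
    ends_in_backslash_ok = True
except SyntaxError:
    ends_in_backslash_ok = False
print(ends_in_backslash_ok)  # False

# Doubling the backslash is valid, but both characters stay in the string.
doubled = r'C:\path\\'
# Concatenating a regular-string backslash gives exactly one.
with_concat = r'C:\path' + '\\'

print(doubled)      # C:\path\\
print(with_concat)  # C:\path\
print(len(doubled), len(with_concat))  # 9 8
```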
Conclusion
In this article, we've explored the ‘u’ and ‘r’ string prefixes in Python, as well as raw string literals. We've learned that the ‘u’ prefix is used to denote Unicode strings, while the ‘r’ prefix is used for raw strings, which treat backslashes as literal characters rather than escape characters. We also delved into common mistakes when using these prefixes and raw string literals, and how to avoid them.