A Practical Guide to Regular Expressions – Learn RegEx with Real Life Examples

Tasnim Ferdous

What are Regular Expressions?

Regular expressions, also known as regex, work by defining patterns that you can use to search for certain characters or words inside strings.

Once you define the pattern you want to use, you can make edits, delete certain characters or words, substitute one thing for another, extract relevant information from a file or any string that contains that particular pattern, and so on.

Why Should You Learn Regex?

Regex lets you process text in a way that can save you a lot of time. It can also add some fun to the process.

Using regex can make locating information much easier. Once you find your target, you can batch edit, replace, delete, or do whatever other processing you need.

Some practical examples of using regex are batch file renaming, parsing logs, validating forms, making mass edits in a codebase, and recursive search.

In this tutorial, we're going to cover regex basics with the help of the RegExr site. Later on, I will introduce some regex challenges that you'll solve using Python. I'll also show you how to use tools like sed and grep with regex.

Like many things in life, regular expressions are one of those things that you can only truly understand by doing. I encourage you to play around with regex as you are going through this article.

Table of Contents

  • Exact match
  • Match ranges in regex
  • Match any character not in the set
  • Character classes
  • Quantifiers
  • How to use logical or in regex
  • How to reference capture groups
  • How to name capture groups
  • Recursive regex search with grep
  • Substitution with sed
  • Lookbehinds
  • Logs parsing
  • Bulk file renaming
  • Email validation
  • Password constraints
  • Final words

Regex Basics

A regular expression is nothing but a sequence of characters that defines a pattern. Besides literal characters (like 'abc'), there are some meta characters (*, +, ? and so on) which have special purposes. There are also features like character classes which can help you simplify your regular expressions.

Before writing any regex, you'll need to learn about all the basic cases and edge cases for the pattern you are looking for.

For instance, if you want to match 'Hello World', do you want the line to start with 'Hello' or can it start with anything? Do you want exactly one space between 'Hello' and 'World' or can there be more? Can other characters come after 'World' or should the line end there? Do you care about case sensitivity? And so on.

These are the kind of questions you must have the answer to before you sit down to write your regex.

The most basic form of regex involves matching a sequence of characters in a similar way as you can do with Ctrl-F in a text editor.

exact_match

On the top you can see the number of matches, and on the bottom an explanation is provided for what the regex matches character by character.

Character set

Regex character sets allow you to match any one character from a group of characters. The group is surrounded by square brackets [].

For example, t[ah]i matches "tai" and "thi". Here 't' and 'i' are fixed but between them can occur 'a' or 'h'.

match_set

Sometimes you may want to match a group of characters which are sequential in nature, such as any uppercase English letter. But writing all 26 letters would be quite tedious.

Regex solves this issue with ranges. The "-" acts as a range operator. Some valid ranges are shown below:

You can also specify partial ranges, such as [b-e] to match any of the letters 'bcde' or [3-6] to match any of the numbers '3456'.

match_set_ranges

You are not limited to specifying only one range inside a character set. You can use multiple ranges and also combine them with any other additional character(s). Here, [3-6u-w;] will match any of '3456uvw' or semicolon ';'.

match_set_ranges_multi

If you prefix the set with a '^', the inverse operation will be performed. For example, [^A-Z0-9] will match anything except uppercase letters and digits.
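The set, range, and negation syntax described above can be tried out in a few lines of Python. This is only a small sketch using the built-in re module, and the sample strings are made up for illustration:

```python
import re

# A set matches exactly one character from the listed options.
set_matches = re.findall(r"t[ah]i", "tai thi tbi")

# Ranges and single characters can be mixed inside one set.
range_matches = re.findall(r"[3-6u-w;]", "2 3 7 u x ;")

# A leading ^ negates the set: anything except uppercase letters and digits.
negated_matches = re.findall(r"[^A-Z0-9]", "A1b2-C")

print(set_matches)      # ['tai', 'thi']
print(range_matches)    # ['3', 'u', ';']
print(negated_matches)  # ['b', '-']
```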

match_set_not

While writing regex, you'll need to match certain groups such as digits quite often and multiple times in the same expression as well.

So for example, how would you match a pattern like 'letter-digit-letter-digit'?

With what you've learned up until now, you can come up with [a-zA-Z]-[0-9]-[a-zA-Z]-[0-9] . This works, but you can see how the expression gets quite messy as the pattern grows longer.

To make the expression simpler, classes have been assigned to well-defined character groups such as digits. The following table shows these classes and their equivalent expression with character sets:

Character classes are quite handy and make your expressions much cleaner. We will use them extensively throughout this tutorial, so you can use this table as a reference point and come back here if you forget any of the classes.
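As a rough illustration, here is how the 'letter-digit-letter-digit' pattern looks in Python with and without the \d class. The sample strings are invented, and the equivalences shown hold for plain ASCII input (Python's \d and \w also accept non-ASCII digits and letters by default):

```python
import re

sample = "a-1-b-2"

# Written out with character sets ...
long_form = re.fullmatch(r"[a-zA-Z]-[0-9]-[a-zA-Z]-[0-9]", sample)

# ... and with the \d shorthand in place of [0-9].
short_form = re.fullmatch(r"[a-zA-Z]-\d-[a-zA-Z]-\d", sample)

print(bool(long_form), bool(short_form))  # True True

# \w covers letters, digits and underscore; \s covers whitespace.
word_chars = re.findall(r"\w", "a_1 !")
space_chars = re.findall(r"\s", "a b\tc")
print(word_chars)   # ['a', '_', '1']
print(space_chars)  # [' ', '\t']
```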

Most of the time, we won't care about all the positions in a pattern. The "." class saves us from writing all possible characters in a set.

For example, t.. matches 't' followed by any two characters (by default, "." matches any character except a newline). This may remind you of the SQL LIKE operator, which would use t__ for the same purpose, since "_" stands for exactly one character there.

match_any

The words "pattern" and "repetition" go hand in hand. If you want to match a 3 digit number you can use \d\d\d . But what if you need to match 11 digits? You could write '\d' 11 times, but a general rule of thumb while writing regex, or doing any kind of programming, is that if you find yourself repeating something more than twice, you are probably unaware of some feature.

In regex, you can use quantifiers for this purpose. To match 11 digits, you can simply write the expression \d{11} .

The table below lists the quantifiers you can use in regex:

In this example, the expression can\s+write matches can followed by 1 or more whitespaces followed by write . But you can see 'canwrite' is not matched as \s+ means at least one whitespace needs to be matched. This is useful when you are searching through text which is not trimmed.
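The two quantifier examples above, \d{11} and can\s+write, can be checked with a short Python sketch. The test strings are made up for illustration:

```python
import re

# {11} repeats the preceding token exactly 11 times.
eleven_digits = re.fullmatch(r"\d{11}", "01234567890")
ten_digits = re.fullmatch(r"\d{11}", "0123456789")

# \s+ needs at least one whitespace character between the two words.
spaced = re.findall(r"can\s+write", "I can write, can   write, canwrite")

print(bool(eleven_digits), bool(ten_digits))  # True False
print(spaced)  # ['can write', 'can   write']
```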

match_multi_whitespaces

Can you guess what can\s?write will match?

Capture groups

Capture groups are sub-expressions enclosed in parentheses (). You can have any number of capture groups, and even nested capture groups.

The expression (The ){2} matches 'The ' twice. But without a capture group, the expression The {2} would match 'The' followed by 2 spaces, as the quantifier will be applied on the space character and not on 'The ' as a group.

capture_this

You can match any pattern inside capture groups as you would with any valid regex. Here (is\s+){2} matches if it finds 'is' followed by 1 or more spaces twice.
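To see the difference a group makes to a quantifier, here is a minimal Python sketch comparing (The ){2} with The {2}. The sample text is an assumption chosen so that both patterns find something:

```python
import re

text = "The The  end"  # "The The" followed by two spaces

# The quantifier applies to the whole group "The ".
grouped = re.search(r"(The ){2}", text)

# Without the group, {2} applies only to the space character.
ungrouped = re.search(r"The {2}", text)

print(repr(grouped.group(0)))    # 'The The '
print(repr(ungrouped.group(0)))  # 'The  '
```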

capture_is

You can use "|" to match multiple patterns. This is (good|bad|sweet) matches 'This is ' followed by any of 'good' or 'bad' or 'sweet'.

or

Again, you must understand the importance of capture groups here. Think about what the expression This is good|bad|sweet would match?

or_no_capture

With a capture group, good|bad|sweet is isolated from This is . But if it's not inside a capture group, the entire regex is only one group. So the expression This is good|bad|sweet will match if the string contains 'This is good' or 'bad' or 'sweet'.
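The effect of leaving out the group can be observed directly. This sketch compares both expressions on two made-up inputs:

```python
import re

with_group = re.compile(r"This is (good|bad|sweet)")
without_group = re.compile(r"This is good|bad|sweet")

# With the group, "This is " is required in front of every alternative.
grouped_on_bad = with_group.search("bad")
grouped_on_sentence = with_group.search("This is bad")

# Without it, "bad" and "sweet" are complete alternatives on their own.
ungrouped_on_bad = without_group.search("bad")
ungrouped_on_sentence = without_group.search("This is bad")

print(grouped_on_bad)                  # None
print(grouped_on_sentence.group(0))    # This is bad
print(ungrouped_on_bad.group(0))       # bad
print(ungrouped_on_sentence.group(0))  # bad
```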

Capture groups can be referenced in the same expression or while performing replacements as you can see on the Replacement tab.

Most tools and languages allow you to reference the nth captured group with '\n'. On this site, '$n' is used when referencing a group in a replacement. The syntax for replacement will vary depending on the tool or language you're using. For JavaScript, for example, it's '$n', while for Python it's '\n'.

In the expression (This) is \1 power , 'This' is captured and then referenced with '\1', effectively matching This is This power .
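In Python, the same backreference works inside the pattern, and '\n' also works in the replacement string of re.sub. The second example below, which swaps two words, is an illustration that is not part of the original article:

```python
import re

pattern = r"(This) is \1 power"

# \1 inside the pattern must match the same text that group 1 captured.
match = re.search(pattern, "This is This power")
print(match.group(0))  # This is This power
print(match.group(1))  # This

# In re.sub, \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
print(swapped)  # world hello
```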

refer_capture

You can name your capture groups with the syntax (?<name>pattern) and backreference them in the same expression with \k<name> .

On replacement, referencing is done by $<name> . This is the syntax for JavaScript and can vary among languages. You can learn about the differences here . Also note that this feature might not be available in some languages.

In the expression (?<lang>[\w+]+) is the best but \k<lang> .* , the pattern [\w+]+ is captured with the name 'lang' and backreferenced with \k<lang> . This pattern will match any word character or '+' character 1 or more times. The .* at the end of the regex matches any character 0 or more times. And finally on replacement, the referencing is done by $<lang> .
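As an example of how the syntax varies, Python's re module spells named groups as (?P<name>pattern), backreferences them with (?P=name), and uses \g<name> in replacements. The sketch below adapts the expression above to that syntax. The input sentence and the replacement text are assumptions made for illustration:

```python
import re

# Python spelling of: (?<lang>[\w+]+) is the best but \k<lang> .*
pattern = r"(?P<lang>[\w+]+) is the best but (?P=lang) .*"
text = "C++ is the best but C++ is hard"

match = re.search(pattern, text)
print(match.group("lang"))  # C++

# \g<lang> inserts the text captured by the group named 'lang'.
replaced = re.sub(pattern, r"\g<lang> wins", text)
print(replaced)  # C++ wins
```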

named_capture

How to Use Regex with Command Line Tools

There are good CLI tools available that let you run regex from your terminal. These tools save you even more time, as you can easily test different regex without writing code in some language and then compiling or interpreting it.

Some of the well-known tools are grep, sed, and awk. Let's look at a few examples to give you some ideas on how you can leverage these tools.

You can harness the power of regex through grep. Grep can search for patterns in a file or perform a recursive search.

If you are on Windows, you can install grep using winget. Run this command in PowerShell:

I will show you the solution to a challenge I created for a CTF competition at my university.

The file attached to the challenge is a zip file that contains multiple levels of directories and a lot of files in it. The name of the competition was Coderush with flag format coderush{flag is here} . So you have to search for the pattern coderush{.*} which will match the flag format coderush{any character here} .

Unzip the file with unzip ripG.zip and cd into it with cd ripG .

huge_files

There are 358 directories and 8731 files. Instead of searching the pattern in the files one by one, you can employ grep like this:

The "-R" flag enables recursive search.
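If you would rather stay in Python, a recursive search can be sketched with os.walk and the re module. This is only an approximation of what a recursive grep does, not a replacement for it. The helper name find_flags, the tiny directory tree, and the flag text are all made up for the demonstration:

```python
import os
import re
import tempfile

def find_flags(root, pattern=r"coderush\{.*\}"):
    """Walk root recursively and return (path, matched text) pairs."""
    regex = re.compile(pattern)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # errors="ignore" drops bytes that are not valid UTF-8 text.
            with open(path, encoding="utf-8", errors="ignore") as handle:
                for line in handle:
                    found = regex.search(line)
                    if found:
                        hits.append((path, found.group(0)))
    return hits

# Build a tiny, made-up directory tree to try the helper on.
with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "a", "b")
    os.makedirs(nested)
    with open(os.path.join(root, "a", "noise.txt"), "w", encoding="utf-8") as f:
        f.write("nothing to see here\n")
    with open(os.path.join(nested, "secret.txt"), "w", encoding="utf-8") as f:
        f.write("junk coderush{example_flag} junk\n")
    results = find_flags(root)

found_flags = [flag for _path, flag in results]
print(found_flags)  # ['coderush{example_flag}']
```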

recursive search with grep

You can learn more about grep and its command line options here

You can use sed to perform insertion, deletion, and substitution on text files by specifying a regex. If you are on Windows, you can get sed from here . Or if you use WSL, tools like grep and sed will already be available.

This is the most common usage of sed:

Here, the option "g" is specified to replace all occurrences.

Some other useful options are -n, which suppresses the default behaviour of printing every line, and the p flag (used in place of g), which prints only the lines affected by the substitution.

Let's take a look at the content of texts.txt .

Our task is replacing Henlo number with Hello number only in the lines where "GREP" is present. So, we are searching for the pattern Henlo ([0-9]+) which will match 'Henlo ' followed by 1 or more digits and all the digits are captured. Then our replacement string will be Hello \1 – the '\1' is referencing the capture group containing the digits.

One way to accomplish that would be to use grep to select the lines which have "GREP" present and then perform the replacement with sed.

The "-E" option enables extended regex without which you would need to escape the parentheses.

grep_sed

Or you could just use sed. Use /pattern/ to restrict substitution on only the lines where pattern is present.
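For comparison, the same restricted substitution can be sketched in Python with re.sub, which replaces every occurrence by default. The sample lines below are invented, since the contents of texts.txt are not reproduced here:

```python
import re

# Made-up sample lines standing in for texts.txt.
lines = [
    "GREP Henlo 1",
    "Henlo 2",
    "GREP Henlo 33",
]

# Substitute only on lines that contain "GREP"; leave the others untouched.
fixed = [
    re.sub(r"Henlo ([0-9]+)", r"Hello \1", line) if "GREP" in line else line
    for line in lines
]
print(fixed)  # ['GREP Hello 1', 'Henlo 2', 'GREP Hello 33']
```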

Advanced Regex: Lookarounds

Lookaheads and Lookbehinds (together known as lookarounds) are features of regex that allow you to check the existence of a pattern without including it in the match.

You can think of them as zero width assertions – they assert the existence of a pattern but do not consume any characters in the match. These are very powerful features, but they're also computationally expensive. So make sure you keep an eye on performance if you are using them often.

Let's say you want to match the word 'linux', but you have 2 conditions.

  • The word 'GNU' must occur before 'linux' occurs. If a line contains 'linux' but doesn't have 'GNU' before it, we want to discard that line.
  • We want to match only linux and nothing else.

We already know how to satisfy the 1st condition. GNU.* will match 'GNU' followed by any number of characters. Then finally we match the word linux . This will match all of GNU-any-characters-linux .

GNU_LINUX

But how do we prevent matching GNU.* while still maintaining the 1st condition?

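That's where a positive lookbehind comes in. You can turn a group into a positive lookbehind by starting it with ?<= . In this example, the expression becomes (?<=GNU.*)linux . Keep in mind that engines differ on whether a lookbehind may contain a variable-length pattern such as .* (JavaScript allows it).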

positive_lookbehind

Now only linux is matched and nothing else.

Note that the expressions (?<=GNU.*)linux and linux(?<=GNU.*) will behave exactly the same. In the 2nd expression, although linux is before the lookbehind, there is .* after 'GNU' which matches linux . This means it satisfies the lookbehind.

To make it simpler, think about the pattern without the lookbehind. The pattern GNU.* will match 'GNU' and anything after it, in our case matching linux .

Now we can derive a generalized statement that the expression (?<=C)X will match the pattern X – only if pattern C came before X (and C must not be included in the match).

You can also reverse the 1st condition: match lines that contain the word linux only if GNU never comes before it. This is called a negative lookbehind. The prefix in this case is ?<! . The inverse of the previous expression would be (?<!GNU.*)linux .
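If you try these lookbehinds in Python, note that the built-in re module only accepts fixed-width lookbehinds, so the .* versions are rejected there. The sketch below shows that, and then uses a fixed-width variant. The 'GNU/' prefix and the sample sentence are assumptions chosen for the demonstration:

```python
import re

# Python's built-in re module rejects variable-length lookbehinds.
try:
    re.compile(r"(?<=GNU.*)linux")
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)  # False

text = "GNU/linux and plain linux"

# Fixed-width lookbehind: match 'linux' only directly after 'GNU/'.
positive = [m.start() for m in re.finditer(r"(?<=GNU/)linux", text)]

# Negative version: match 'linux' only when 'GNU/' is not directly before it.
negative = [m.start() for m in re.finditer(r"(?<!GNU/)linux", text)]

print(positive)  # [4]
print(negative)  # [20]
```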

negative_lookbehind

Lookaheads are assertions too, just like the lookbehinds you saw in the previous examples. The only difference is that a lookbehind makes its assertion about what comes before, while a lookahead makes its assertion about what comes after.

Let's say you have these two conditions:

  • Match Hello only if World comes somewhere after it.
  • Match only Hello and nothing else.

The prefix for a positive lookahead is ?= . The expression Hello(?=.*World) will meet both conditions. This is similar to Hello.*World except that only Hello will be matched whereas Hello.*World will match 'Hello', 'World' and anything in between.

postive_lookahead

Similar to the example in a positive lookbehind, the expressions Hello(?=.*World) and (?=.*World)Hello are equivalent, because the .* before 'World' matches Hello , satisfying the 1st condition.

A negative lookahead is just the complement of a positive lookahead. You can use it by prefixing the group with ?! . Hello(?!.*World) will match Hello only if there is no World anywhere after it. The .* matters here: (?!World)Hello only asserts that the text at the current position isn't 'World', which is always true at a place where Hello begins.
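Both kinds of lookahead can be tried in Python, which places no width restriction on them. In this sketch the negative form is written as Hello(?!.*World), and the test sentences are made up:

```python
import re

positive = re.compile(r"Hello(?=.*World)")
negative = re.compile(r"Hello(?!.*World)")

pos_hit = positive.search("Hello big World")
pos_miss = positive.search("Hello there")
neg_hit = negative.search("Hello there")
neg_miss = negative.search("Hello big World")

# Only 'Hello' is part of the match; the lookahead consumes nothing.
print(pos_hit.group(0))  # Hello
print(pos_miss)          # None
print(neg_hit.group(0))  # Hello
print(neg_miss)          # None
```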

negative_lookahead

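Here is a summary of the syntax for lookarounds when you want to match the pattern X with assertion C.

  • Positive lookbehind: (?<=C)X matches X only if it is preceded by C.
  • Negative lookbehind: (?<!C)X matches X only if it is not preceded by C.
  • Positive lookahead: X(?=C) matches X only if it is followed by C.
  • Negative lookahead: X(?!C) matches X only if it is not followed by C.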

Practical Examples of Regex

In this log file , these are the lines which we care about:

Our task is to extract the training loss and validation loss for purposes such as plotting loss over the epochs. We need to extract the training loss values like 11.30368, 0.96180, 0.04051 and put them in an array.

All the relevant values are prefixed with ' Train loss: ', so we can use this in our regex as it is. To match the float numbers we have to match some digits followed by a " . " and then followed by more digits. You can do this with \d+\.\d+ . Because we want to keep track of these numbers, they should be inside a capture group.

As "." has a special purpose in regex, you have to escape it with a backslash when you want to match a literal "." character. This applies to all characters with a special purpose. But you don't have to escape "." inside a character set.

Putting it all together, the expression for extracting training loss is Train loss: (\d+\.\d+) . We can use the same logic to extract validation loss with Valid loss: (\d+\.\d+) .

Here is one way to extract this information using Python:

When there is one capture group, re.findall searches all the lines and returns the values inside the capture group in a list.

Regex functions return strings, so the values are converted to floats before being printed. You can then use them directly in another Python script as a list of floats.
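A minimal sketch of that approach might look like the following. The log lines are hypothetical: only the 'Train loss: ' and 'Valid loss: ' prefixes, the [500/10000] style counter, and the three training values come from the description above, while the rest of the layout and the validation values are assumptions:

```python
import re

# Hypothetical log excerpt; the real file's exact layout is an assumption.
log_text = """\
[1/10000] Train loss: 11.30368, Valid loss: 10.79852
[500/10000] Train loss: 0.96180, Valid loss: 1.02447
[1000/10000] Train loss: 0.04051, Valid loss: 0.31563
"""

# With one capture group, findall returns just the captured text as strings.
train_losses = [float(v) for v in re.findall(r"Train loss: (\d+\.\d+)", log_text)]
valid_losses = [float(v) for v in re.findall(r"Valid loss: (\d+\.\d+)", log_text)]

print(train_losses)  # [11.30368, 0.9618, 0.04051]
print(valid_losses)  # [10.79852, 1.02447, 0.31563]
```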

This is the result:

extract_loss

You could also use sed, save the output in train_losses.txt, and read from the file. First we use '/Train/' to target only the lines with 'Train' present, then we apply the same regex as before.

".*" is added at the start and end so that sed matches the contents of all the relevant lines. Then the entire line is replaced by the value of the capture group. The tee command is used to redirect the output of sed into train_losses.txt while also printing the contents in the terminal.

extract_loss_sed

Take a moment to think about what you would need to extract the epochs. You have to extract 500 from [500/10000] for all such lines. The array should look like [1, 500, 1000, 1500, ...]. You can follow the same approach as we used for the previous example.

Note that if you want to match " [ " or " ] ", you have to escape it. The answer is given here .

You have these files with some random values as prefixes. You have to rename all files as 1.mp4, 2.mp4 and so on. This is how the files were generated.

create_files

This is a common scenario where you have a list of files which have their sequence number in the name but there are also some other characters that you don't want.

The pattern has to match anything up to Episode then an underscore and then the number and .mp4 at the end.

The relevant value is the number before '.mp4' which we will put inside a capture group. .*Episode_ will match everything up to the number. Then we can capture the number with ([0-9]+) and also match .mp4 with \.mp4 .

So the final regex is .*Episode_([0-9]+)\.mp4 . As we want to keep the .mp4 the replacement string will be \1.mp4 .

This is one way to solve it using sed.

First the new name is saved in a variable and then the mv command is used to rename the file.

bulk_rename

Could we have just used .* in place of .*Episode_ ? In this example, yes. But there might be filenames such as Steins_Gate0.mp4 where the 0 is part of the movie name and you didn't really want to rename this file. So it's always better to be as specific as possible.

What if some files were named as "Random_Episode6.mp4"? The difference being, there is no underscore after Episode. What change will you need to make?

The answer is that you'll need to add a "?" after the "_" to make it optional. The regex will be .*Episode_?([0-9]+)\.mp4 .
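The renaming logic can also be sketched in Python. To keep the example safe to run, it only computes the new names and does not touch any files (an actual rename could then be done with os.rename). The helper name new_name and the sample file names are made up for illustration:

```python
import re

PATTERN = re.compile(r".*Episode_?([0-9]+)\.mp4")

def new_name(filename):
    """Return the cleaned-up name, or None if the file should be left alone."""
    match = PATTERN.fullmatch(filename)
    return f"{match.group(1)}.mp4" if match else None

# Made-up file names in the spirit of the example.
names = ["x7f_Episode_1.mp4", "Random_Episode6.mp4", "Steins_Gate0.mp4"]
renames = {name: new_name(name) for name in names}

for old, new in renames.items():
    print(old, "->", new)
# x7f_Episode_1.mp4 -> 1.mp4
# Random_Episode6.mp4 -> 6.mp4
# Steins_Gate0.mp4 -> None
```

Because the pattern insists on Episode, the Steins_Gate0.mp4 file is skipped instead of being renamed to 0.mp4.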

There are all sorts of complicated regex for validating email.

Here is a simple one: ^[^@ ]+@[^@.]+\.\w+$ . It matches the format name@domain.tld

The table below breaks down this pattern into smaller pieces:

email_validation

On the RegExr site, you can enable the multiline flag from the Flags tab in the upper right corner. The 'gm' at the end indicates that the multiline flag is enabled.

We can see that lines 2, 3, 5, and 6 didn't match. Can you find out the reason and which part of the regex is responsible for disqualifying each of them?

The answer is given here
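To experiment with this pattern outside the browser, a small Python check could look like this. The candidate addresses are made up and are not the lines from the screenshot:

```python
import re

EMAIL = re.compile(r"^[^@ ]+@[^@.]+\.\w+$")

# Made-up candidates: one valid address and three that break a rule.
candidates = [
    "user@example.com",       # fits the pattern
    "user@example",           # no dot after the domain
    "user name@example.com",  # space before the @
    "user@mail.example.com",  # second dot is not allowed by [^@.]+\.\w+
]
results = {address: bool(EMAIL.match(address)) for address in candidates}

for address, ok in results.items():
    print(address, ok)
```

Only the first address is accepted, which shows how strict this simple pattern is about subdomains.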

You can also use regex to impose constraints. Here we will uncover the power of positive lookaheads.

Let's say we want to accept a string only if there is a digit in it. You already know how to find a digit with the '\d' class. To accomplish that, we can use [^\d]*\d . This will match any non-digit character 0 or more times and then match a digit.

We can also use the expression .*\d to find a digit anywhere in the string. Placed inside a lookahead, it becomes a constraint: if there is no digit in the string, the lookahead fails and none of the characters of that string are matched.

When we are using a programming language, we can check whether the regex produced a match and, if it didn't, conclude that the constraints are not satisfied.

We will create a regex which imposes the following criteria:

  • Minimum 8 characters and maximum 16 characters.
  • At least one lower case letter.
  • At least one upper case letter.
  • At least one number.

To achieve this, you can use positive lookaheads. This is the regex:

^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,16}$

The table below explains which part of the regex imposes which constraint:

pass_constraints

What modification would you need to require at least 5 upper case letters?

You may think (?=.*[A-Z]{5,}) will do the job. But this expression requires all the 5 letters to be together. A string like rand-ABCDE-rand will be matched but 0AxBCDxE0 will not be matched even though it has 5 upper case letters (as they are not adjacent).

Yet again, capture groups come to the rescue. We want to match 5 uppercase letters anywhere in the string. We already know that we can match 1 uppercase letter with .*[A-Z] . Now we will put that inside a capture group and attach a quantifier with a minimum of 5. The expression will be (.*[A-Z]){5,} .

Here is the final answer:

In place of (?=.*[A-Z]) you will need (?=(.*[A-Z]){5,}) . The expression becomes ^(?=.*[a-z])(?=(.*[A-Z]){5,})(?=.*\d).{8,16}$ .
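Both password expressions can be exercised in Python as a quick sanity check. The sample passwords below are made up, apart from 0AxBCDxE0 which is taken from the discussion above:

```python
import re

BASIC = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,16}$")
FIVE_UPPER = re.compile(r"^(?=.*[a-z])(?=(.*[A-Z]){5,})(?=.*\d).{8,16}$")

basic_ok = bool(BASIC.match("Passw0rdX"))        # lower, upper, digit, 9 chars
basic_no_upper = bool(BASIC.match("password1"))  # no uppercase letter
basic_short = bool(BASIC.match("Sh0rt"))         # fewer than 8 characters

five_ok = bool(FIVE_UPPER.match("0AxBCDxE0"))    # 5 uppercase, not adjacent
five_too_few = bool(FIVE_UPPER.match("Passw0rdX"))  # only 2 uppercase

print(basic_ok, basic_no_upper, basic_short)  # True False False
print(five_ok, five_too_few)                  # True False
```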

pass_5_upper

You could also require that the password not contain certain words to enforce stronger passwords.

For example, we want to reject the password if it contains pass or 1234 . A negative lookahead is the tool for this job. The regex would be ^(?!.*(pass|1234)).*$ .

restrict_words-1

In this regex, we put pass and 1234 inside a capture group and joined them with the logical OR operator. This capture group is nested inside another group that starts with ?! , which makes it a negative lookahead. The .* inside it lets the check look anywhere in the string, so the lookahead fails if pass or 1234 is present. The .* after the lookahead then matches the rest of the string; you could replace it with .{8,} to also require at least 8 characters.
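Here is a short Python check of the ^(?!.*(pass|1234)).*$ expression, using made-up passwords:

```python
import re

NO_WEAK_WORDS = re.compile(r"^(?!.*(pass|1234)).*$")

clean = bool(NO_WEAK_WORDS.match("correct-horse"))
has_pass = bool(NO_WEAK_WORDS.match("mypassword"))
has_1234 = bool(NO_WEAK_WORDS.match("abc12345"))

# Only the string without 'pass' or '1234' is accepted.
print(clean, has_pass, has_1234)  # True False False
```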

Final Words

I hope you got a good amount of practice while going through this article. It's ok if you forget some syntax. What's important is understanding the core concepts and having a good idea of what's possible with regex. Then, if you forget a pattern, you can just google it or reference a cheatsheet.

The more you practice, the more you will get by without outside help. Eventually you will be able to write super complex and effective regex completely offline.

There are already some good regex cheatsheets out there, so I wanted to create something more in-depth here that you can reference for the core concepts and common use cases.

If you're looking for a cheatsheet, the one from QuickRef is helpful. It's a good place to recall the syntax and they also provide some basic overview of regex related functions in various programming languages.

Most regex techniques are the same in all programming languages and tools – but certain tools might offer additional features. So do some research on the tool you are using to pick the best one for you.

My final suggestion would be not to force using regex just because you can. A lot of the time a regular string.find() is enough to get the job done. But if you live in the terminal, you really can do a lot with regex alone.

If you like this type of article, you may keep an eye on my blog or twitter.

Final year computer science student | CTF Player | Software Engineer

Assignment operators

An assignment operator assigns a value to its left operand based on the value of its right operand.

The basic assignment operator is equal ( = ), which assigns the value of its right operand to its left operand. That is, x = y assigns the value of y to x . The other assignment operators are usually shorthand for standard operations, as shown in the following definitions and examples.

Simple assignment operator which assigns a value to a variable. The assignment operation evaluates to the assigned value. Chaining the assignment operator is possible in order to assign a single value to multiple variables. See the example.

Addition assignment

The addition assignment operator adds the value of the right operand to a variable and assigns the result to the variable. The types of the two operands determine the behavior of the addition assignment operator. Addition or concatenation is possible. See the addition operator for more details.

Subtraction assignment

The subtraction assignment operator subtracts the value of the right operand from a variable and assigns the result to the variable. See the subtraction operator for more details.

Multiplication assignment

The multiplication assignment operator multiplies a variable by the value of the right operand and assigns the result to the variable. See the multiplication operator for more details.

Division assignment

The division assignment operator divides a variable by the value of the right operand and assigns the result to the variable. See the division operator for more details.

Remainder assignment

The remainder assignment operator divides a variable by the value of the right operand and assigns the remainder to the variable. See the remainder operator for more details.

Exponentiation assignment

This is an experimental technology, part of the ECMAScript 2016 (ES7) proposal. Because this technology's specification has not stabilized, check the compatibility table for usage in various browsers. Also note that the syntax and behavior of an experimental technology is subject to change in future version of browsers as the spec changes.

The exponentiation assignment operator raises the value of a variable to the power of the right operand and assigns the result to the variable. See the exponentiation operator for more details.

Left shift assignment

The left shift assignment operator moves the specified amount of bits to the left and assigns the result to the variable. See the left shift operator for more details.

Right shift assignment

The right shift assignment operator moves the specified amount of bits to the right and assigns the result to the variable. See the right shift operator for more details.

Unsigned right shift assignment

The unsigned right shift assignment operator moves the specified amount of bits to the right and assigns the result to the variable. See the unsigned right shift operator for more details.

Bitwise AND assignment

The bitwise AND assignment operator uses the binary representation of both operands, does a bitwise AND operation on them and assigns the result to the variable. See the bitwise AND operator for more details.

Bitwise XOR assignment

The bitwise XOR assignment operator uses the binary representation of both operands, does a bitwise XOR operation on them and assigns the result to the variable. See the bitwise XOR operator for more details.

Bitwise OR assignment

The bitwise OR assignment operator uses the binary representation of both operands, does a bitwise OR operation on them and assigns the result to the variable. See the bitwise OR operator for more details.

Left operand with another assignment operator

In unusual situations, an assignment operator (e.g. x += y ) is not identical to its expanded form (here x = x + y ). When the left operand of an assignment operator itself contains an assignment operator, the left operand is evaluated only once. For example:



This action is not available.

Engineering LibreTexts

17.3: Regex Syntax


We will now have a closer look at the syntax of regular expressions as supported by the Regex package.

The simplest regular expression is a single character. It matches exactly that character. A sequence of characters matches a string with exactly the same sequence of characters:

Operators are applied to regular expressions to produce more complex regular expressions. Sequencing (placing expressions one after another) as an operator is, in a certain sense, invisible — yet it is arguably the most common.

We have already seen the Kleene star (*) and the + operator. A regular expression followed by an asterisk matches any number (including 0) of matches of the original expression. For example:

The Kleene star has higher precedence than sequencing. A star applies to the shortest possible subexpression that precedes it. For example, ab* means a followed by zero or more occurrences of b , not zero or more occurrences of ab :

To obtain a regex that matches zero or more occurrences of ab , we must enclose ab in parentheses:
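As a rough illustration of sequencing, the star, and grouping, here is a minimal sketch using Python's re module as a stand-in (an assumption: the Regex package's own API is not shown in this text). re.fullmatch is used so that the whole string has to match the pattern.

```python
import re

# Sequencing: a literal sequence matches exactly that sequence.
literal_ok = re.fullmatch(r"abc", "abc") is not None

# ab* is "a followed by zero or more b's": the star binds to b only.
star_binds_to_b = [bool(re.fullmatch(r"ab*", s)) for s in ("a", "abbb", "abab")]

# (ab)* is "zero or more repetitions of ab": the star binds to the group.
star_binds_to_group = [bool(re.fullmatch(r"(ab)*", s)) for s in ("", "abab", "abb")]

print(literal_ok)           # True
print(star_binds_to_b)      # [True, True, False]
print(star_binds_to_group)  # [True, True, False]
```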

Two other useful operators similar to * are + and ?. + matches one or more instances of the regex it modifies, and ? will match zero or one instance.
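The difference between + and ? can be sketched the same way, again with Python's re module standing in for the Regex package (an assumption):

```python
import re

# b+ requires at least one b after the a.
one_or_more = [bool(re.fullmatch(r"ab+", s)) for s in ("a", "ab", "abbb")]

# b? allows at most one b after the a.
zero_or_one = [bool(re.fullmatch(r"ab?", s)) for s in ("a", "ab", "abb")]

print(one_or_more)  # [False, True, True]
print(zero_or_one)  # [True, True, False]
```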

As we have seen, the characters *, +, ?, (, and ) have special meaning within regular expressions. To match any of them literally, escape it by preceding it with a backslash \ . The backslash is therefore also a special character, and must itself be escaped for a literal match. The same holds for all further special characters we will see.
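A small sketch of escaping, using Python's re module as a stand-in (an assumption). The sample string "(a+b)*" is made up for illustration. re.escape is a Python convenience that adds the backslashes for you.

```python
import re

# Every special character is preceded by a backslash to match it literally.
escaped_by_hand = r"\(a\+b\)\*"
matches_literal = re.fullmatch(escaped_by_hand, "(a+b)*") is not None

# Unescaped, the same characters act as operators, so the literal text fails.
matches_unescaped = re.fullmatch(r"(a+b)*", "(a+b)*") is not None

# A backslash is matched literally by escaping it with another backslash.
matches_backslash = re.fullmatch(r"\\", "\\") is not None

# Python can build the escaped pattern automatically.
escaped_by_library = re.escape("(a+b)*")

print(matches_literal, matches_unescaped, matches_backslash)  # True False True
```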

The last operator is | , which expresses choice between two subexpressions. It matches a string if either of the two subexpressions matches the string. It has the lowest precedence — even lower than sequencing. For example, ab*|ba* means a followed by any number of b’s, or b followed by any number of a’s :

A bit more complex example is the expression c(a|d)+r , which matches the name of any of the Lisp-style car, cdr, caar, cadr, ... functions:
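Both the choice operator and the Lisp-style accessor pattern can be tried out with a short sketch. Python's re module is used as a stand-in for the Regex package (an assumption), and the test strings are made up.

```python
import re

# ab*|ba* is "(a then b's) or (b then a's)", because | has the lowest precedence.
choice = [bool(re.fullmatch(r"ab*|ba*", s)) for s in ("abb", "baa", "abba")]

# c(a|d)+r requires at least one a or d between c and r.
lisp_names = ("car", "cdr", "cadr", "cr", "cat")
lisp = [bool(re.fullmatch(r"c(a|d)+r", s)) for s in lisp_names]

print(choice)  # [True, True, False]
print(lisp)    # [True, True, True, False, False]
```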

It is possible to write an expression that matches an empty string, for example the expression a| matches an empty string. However, it is an error to apply *, +, or ? to such an expression: (a|)* is invalid.

So far, we have used only characters as the smallest components of regular expressions. There are other, more interesting, components. A character set is a string of characters enclosed in square brackets. It matches any single character that appears between the brackets. For example, [01] matches either 0 or 1:

Using the plus operator, we can build the following binary number recognizer:

If the first character after the opening bracket is ^ , the set is inverted: it matches any single character not appearing between the brackets:

For convenience, a set may include ranges: pairs of characters separated by a hyphen (-). This is equivalent to listing all characters in between: '[0-9]' is the same as '[0123456789]' . Special characters within a set are ^ , - , and ] , which closes the set. Below are examples of how to match them literally in a set:

Thus, empty and universal sets cannot be specified.
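The set features above can be sketched with Python's re module as a stand-in (an assumption). The rules for placing ^ , - and ] literally inside a set follow Python's conventions here and may differ in detail from the Regex package.

```python
import re

# A set with + gives a binary number recognizer.
binary = [bool(re.fullmatch(r"[01]+", s)) for s in ("10010100", "10001210")]

# A leading ^ inverts the set.
inverted = [bool(re.fullmatch(r"[^01]", s)) for s in ("0", "3")]

# A range is shorthand for listing every character in between.
digits = re.fullmatch(r"[0-9]+", "2024") is not None

# Literal special characters in a set (Python conventions):
# ^ anywhere but first, - at the end, ] escaped with a backslash.
specials = [
    re.fullmatch(r"[01^]", "^") is not None,
    re.fullmatch(r"[01-]", "-") is not None,
    re.fullmatch(r"[01\]]", "]") is not None,
]

print(binary, inverted, digits, specials)
```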

Character classes

Regular expressions can also include the following backslash escapes to refer to popular classes of characters: \w to match alphanumeric characters, \d to match digits, and \s to match whitespace. Their upper-case variants, \W , \D and \S , match the complementary characters (non-alphanumerics, non-digits and non-whitespace). Here is a summary of the syntax seen so far:

As mentioned in the introduction, regular expressions are especially useful for validating user input, and character classes turn out to be especially useful for defining such regexes. For example, non-negative numbers can be matched with the regex \d+ :

Better yet, we might want to specify that non-zero numbers should not start with the digit 0:

We can check for negative and positive numbers as well:

Floating point numbers should require at least one digit after the dot:

For dessert, here is a recognizer for a general number format: anything like 999 , or 999.999 , or -999.999e+21 .
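The progression of number recognizers described above might look like the following sketch. The patterns are my own reconstruction in Python's re syntax, not the ones from the original text, and the constant names and is_match helper are hypothetical.

```python
import re

NON_NEGATIVE = r"\d+"                                # any run of digits
NO_LEADING_ZERO = r"0|[1-9]\d*"                      # 0, or digits not starting with 0
SIGNED = r"0|[+-]?[1-9]\d*"                          # optional sign on non-zero numbers
FLOAT = r"(0|[+-]?[1-9]\d*)(\.\d+)?"                 # optional fraction, digit required after the dot
GENERAL = r"(0|[+-]?[1-9]\d*)(\.\d+)?(e[+-]?\d+)?"   # optional exponent


def is_match(pattern, text):
    """Return True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


print(is_match(NON_NEGATIVE, "007"))         # True: leading zeros are accepted
print(is_match(NO_LEADING_ZERO, "007"))      # False
print(is_match(SIGNED, "-42"))               # True
print(is_match(FLOAT, "3."))                 # False: no digit after the dot
print(is_match(GENERAL, "-999.999e+21"))     # True
```

Note that these sketch patterns reject a signed number with a zero integer part such as -0.5 ; extending them is left open.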

Character classes can also include the following grep(1)-compatible elements:

Note that these elements are components of the character classes, i.e. , they have to be enclosed in an extra set of square brackets to form a valid regular expression. For example, a non-empty string of digits would be represented as [[:digit:]]+ . The above primitive expressions and operators are common to many implementations of regular expressions.

Special character classes

The next primitive expression is unique to this Smalltalk implementation. A sequence of characters between colons is treated as a unary selector which is supposed to be understood by characters. A character matches such an expression if it answers true to a message with that selector. This allows a more readable and efficient way of specifying character classes. For example, [0-9] is equivalent to :isDigit: , but the latter is more efficient. Analogously to character sets, character classes can be negated: :^isDigit: matches a character that answers false to isDigit , and is therefore equivalent to [^0-9] .

So far we have seen the following equivalent ways to write a regular expression that matches a non-empty string of digits: [0-9]+ , \d+ , [\d]+ , [[:digit:]]+ , :isDigit:+ .

Matching boundaries

The last group of special primitive expressions shown next is used to match boundaries of strings.


PEP 572 – Assignment Expressions

The importance of real code, exceptional cases, scope of the target, relative precedence of :=, change to evaluation order, differences between assignment expressions and assignment statements, specification changes during implementation, _pydecimal.py, datetime.py, sysconfig.py, simplifying list comprehensions, capturing condition values, changing the scope rules for comprehensions, alternative spellings, special-casing conditional statements, special-casing comprehensions, lowering operator precedence, allowing commas to the right, always requiring parentheses, why not just turn existing assignment into an expression, with assignment expressions, why bother with assignment statements, why not use a sublocal scope and prevent namespace pollution, style guide recommendations, acknowledgements, a numeric example, appendix b: rough code translations for comprehensions, appendix c: no changes to scope semantics.

This is a proposal for creating a way to assign to variables within an expression using the notation NAME := expr .

As part of this change, there is also an update to dictionary comprehension evaluation order to ensure key expressions are executed before value expressions (allowing the key to be bound to a name and then re-used as part of calculating the corresponding value).

During discussion of this PEP, the operator became informally known as “the walrus operator”. The construct’s formal name is “Assignment Expressions” (as per the PEP title), but they may also be referred to as “Named Expressions” (e.g. the CPython reference implementation uses that name internally).

Naming the result of an expression is an important part of programming, allowing a descriptive name to be used in place of a longer expression, and permitting reuse. Currently, this feature is available only in statement form, making it unavailable in list comprehensions and other expression contexts.

Additionally, naming sub-parts of a large expression can assist an interactive debugger, providing useful display hooks and partial results. Without a way to capture sub-expressions inline, this would require refactoring of the original code; with assignment expressions, this merely requires the insertion of a few name := markers. Removing the need to refactor reduces the likelihood that the code be inadvertently changed as part of debugging (a common cause of Heisenbugs), and is easier to dictate to another programmer.

During the development of this PEP many people (supporters and critics both) have had a tendency to focus on toy examples on the one hand, and on overly complex examples on the other.

The danger of toy examples is twofold: they are often too abstract to make anyone go “ooh, that’s compelling”, and they are easily refuted with “I would never write it that way anyway”.

The danger of overly complex examples is that they provide a convenient strawman for critics of the proposal to shoot down (“that’s obfuscated”).

Yet there is some use for both extremely simple and extremely complex examples: they are helpful to clarify the intended semantics. Therefore, there will be some of each below.

However, in order to be compelling , examples should be rooted in real code, i.e. code that was written without any thought of this PEP, as part of a useful application, however large or small. Tim Peters has been extremely helpful by going over his own personal code repository and picking examples of code he had written that (in his view) would have been clearer if rewritten with (sparing) use of assignment expressions. His conclusion: the current proposal would have allowed a modest but clear improvement in quite a few bits of code.

Another use of real code is to observe indirectly how much value programmers place on compactness. Guido van Rossum searched through a Dropbox code base and discovered some evidence that programmers value writing fewer lines over shorter lines.

Case in point: Guido found several examples where a programmer repeated a subexpression, slowing down the program, in order to save one line of code, e.g. instead of writing:

they would write:

Another example illustrates that programmers sometimes do more work to save an extra level of indentation:

This code tries to match pattern2 even if pattern1 has a match (in which case the match on pattern2 is never used). The more efficient rewrite would have been:
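To make the trade-off concrete, here is a sketch of the nested statement-only rewrite next to a flat version that uses assignment expressions. The patterns and the data string are hypothetical, chosen only so the code runs; Python 3.8 or later is assumed.

```python
import re

# Hypothetical patterns and input, for illustration only.
pattern1 = re.compile(r"id=(\d+)")
pattern2 = re.compile(r"(name)=(\w+)")
data = "name=alice"

# Statement-only version: pattern2 is tried only when pattern1 fails,
# at the cost of an extra level of indentation.
match1 = pattern1.match(data)
if match1:
    result_nested = match1.group(1)
else:
    match2 = pattern2.match(data)
    if match2:
        result_nested = match2.group(2)
    else:
        result_nested = None

# With assignment expressions the same logic stays flat.
if match1 := pattern1.match(data):
    result_flat = match1.group(1)
elif match2 := pattern2.match(data):
    result_flat = match2.group(2)
else:
    result_flat = None

print(result_nested, result_flat)  # alice alice
```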

Syntax and semantics

In most contexts where arbitrary Python expressions can be used, a named expression can appear. This is of the form NAME := expr where expr is any valid Python expression other than an unparenthesized tuple, and NAME is an identifier.

The value of such a named expression is the same as the incorporated expression, with the additional side-effect that the target is assigned that value:
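A minimal sketch of this "value plus binding" behaviour, assuming Python 3.8 or later. The sample text, the in-memory stream, and the helper f are made up for illustration.

```python
import io
import re

# The named expression yields the match object and binds it to `match`.
if (match := re.search(r"\d+", "order 66 shipped")) is not None:
    number = int(match.group())

# Read fixed-size chunks until read() returns an empty string.
stream = io.StringIO("abcdefgh")
chunks = []
while chunk := stream.read(3):
    chunks.append(chunk)


def f(x):
    """Hypothetical stand-in for an expensive computation."""
    return x + 1


# Compute f(2) once and reuse the result in the same list display.
powers = [y := f(2), y**2, y**3]

print(number, chunks, powers)  # 66 ['abc', 'def', 'gh'] [3, 9, 27]
```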

There are a few places where assignment expressions are not allowed, in order to avoid ambiguities or user confusion:

This rule is included to simplify the choice for the user between an assignment statement and an assignment expression – there is no syntactic position where both are valid.

Again, this rule is included to avoid two visually similar ways of saying the same thing.

This rule is included to disallow excessively confusing code, and because parsing keyword arguments is complex enough already.

This rule is included to discourage side effects in a position whose exact semantics are already confusing to many users (cf. the common style recommendation against mutable default values), and also to echo the similar prohibition in calls (the previous bullet).

The reasoning here is similar to the two previous cases; this ungrouped assortment of symbols and operators composed of : and = is hard to read correctly.

This allows lambda to always bind less tightly than := ; having a name binding at the top level inside a lambda function is unlikely to be of value, as there is no way to make use of it. In cases where the name will be used more than once, the expression is likely to need parenthesizing anyway, so this prohibition will rarely affect code.

This shows that what looks like an assignment operator in an f-string is not always an assignment operator. The f-string parser uses : to indicate formatting options. To preserve backwards compatibility, assignment operator usage inside of f-strings must be parenthesized. As noted above, this usage of the assignment operator is not recommended.
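The restrictions discussed above can be probed without running any of the code by handing small source strings to the built-in compile() and checking for SyntaxError. This is a sketch assuming Python 3.8 or later; the is_valid helper and the snippets are illustrative, not taken from the PEP text as preserved here.

```python
def is_valid(source):
    """Return True if `source` compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


# Each pair is (rejected spelling, accepted parenthesized spelling).
pairs = [
    ("y := f(x)", "(y := f(x))"),                    # top level of a statement
    ("y0 = y1 := f(x)", "y0 = (y1 := f(x))"),        # right-hand side of =
    ("foo(x = y := f(x))", "foo(x=(y := f(x)))"),    # keyword argument value
    ("def foo(answer = p := 42): pass",
     "def foo(answer=(p := 42)): pass"),             # default value
    ("lambda: x := 1", "lambda: (x := 1)"),          # body of a lambda
]

for rejected, accepted in pairs:
    print(f"{rejected!r}: {is_valid(rejected)}   {accepted!r}: {is_valid(accepted)}")
```

Every rejected form becomes valid once the assignment expression is parenthesized, although the PEP does not recommend most of these uses.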

An assignment expression does not introduce a new scope. In most cases the scope in which the target will be bound is self-explanatory: it is the current scope. If this scope contains a nonlocal or global declaration for the target, the assignment expression honors that. A lambda (being an explicit, if anonymous, function definition) counts as a scope for this purpose.

There is one special case: an assignment expression occurring in a list, set or dict comprehension or in a generator expression (below collectively referred to as “comprehensions”) binds the target in the containing scope, honoring a nonlocal or global declaration for the target in that scope, if one exists. For the purpose of this rule the containing scope of a nested comprehension is the scope that contains the outermost comprehension. A lambda counts as a containing scope.

The motivation for this special case is twofold. First, it allows us to conveniently capture a “witness” for an any() expression, or a counterexample for all() , for example:

Second, it allows a compact way of updating mutable state from a comprehension, for example:
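Both motivations can be sketched in a few lines, assuming Python 3.8 or later. The sample lines and values are invented for illustration.

```python
lines = ["x = 1", "# note", "y = 2"]

# any() stops at the first true item, and `comment` keeps that witness
# because the target is bound in the containing scope.
if any((comment := line).startswith("#") for line in lines):
    first_comment = comment
else:
    first_comment = None

# all() stops at the first counterexample, which `nonblank` captures.
blank_check = ["", "  ", "data", ""]
all_blank = all((nonblank := line).strip() == "" for line in blank_check)

# Updating state from a comprehension: running totals.
values = [1, 2, 3, 4]
total = 0
partial_sums = [total := total + v for v in values]

print(first_comment, all_blank, nonblank, partial_sums, total)
```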

However, an assignment expression target name cannot be the same as a for -target name appearing in any comprehension containing the assignment expression. The latter names are local to the comprehension in which they appear, so it would be contradictory for a contained use of the same name to refer to the scope containing the outermost comprehension instead.

For example, [i := i+1 for i in range(5)] is invalid: the for i part establishes that i is local to the comprehension, but the i := part insists that i is not local to the comprehension. The same reason makes these examples invalid too:

While it’s technically possible to assign consistent semantics to these cases, it’s difficult to determine whether those semantics actually make sense in the absence of real use cases. Accordingly, the reference implementation [1] will ensure that such cases raise SyntaxError , rather than executing with implementation defined behaviour.

This restriction applies even if the assignment expression is never executed:

For the comprehension body (the part before the first “for” keyword) and the filter expression (the part after “if” and before any nested “for”), this restriction applies solely to target names that are also used as iteration variables in the comprehension. Lambda expressions appearing in these positions introduce a new explicit function scope, and hence may use assignment expressions with no additional restrictions.

Due to design constraints in the reference implementation (the symbol table analyser cannot easily detect when names are re-used between the leftmost comprehension iterable expression and the rest of the comprehension), named expressions are disallowed entirely as part of comprehension iterable expressions (the part after each “in”, and before any subsequent “if” or “for” keyword):

A further exception applies when an assignment expression occurs in a comprehension whose containing scope is a class scope. If the rules above were to result in the target being assigned in that class’s scope, the assignment expression is expressly invalid. This case also raises SyntaxError :

(The reason for the latter exception is the implicit function scope created for comprehensions – there is currently no runtime mechanism for a function to refer to a variable in the containing class scope, and we do not want to add such a mechanism. If this issue ever gets resolved this special case may be removed from the specification of assignment expressions. Note that the problem already exists for using a variable defined in the class scope from a comprehension.)
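The comprehension restrictions above are all enforced at compile time, so they can be checked with compile() without executing anything. A sketch assuming Python 3.8 or later; the is_valid helper is illustrative, and the names stuff and Example are placeholders.

```python
def is_valid(source):
    """Return True if `source` compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


invalid_sources = [
    # Target reuses an iteration variable of a containing comprehension.
    "[i := i + 1 for i in range(5)]",
    "[[(j := j) for i in range(5)] for j in range(5)]",
    "[i := 0 for i, j in stuff]",
    "[i + 1 for i in (i := stuff)]",
    # Rejected even though the assignment would never execute.
    "[False and (i := 0) for i, j in stuff]",
    # Named expressions are disallowed in the iterable expression.
    "[i + 1 for i in (j := stuff)]",
    # Comprehension whose containing scope is a class body.
    "class Example:\n    [(j := i) for i in range(5)]",
]

# The same comprehension is fine when the containing scope is a module.
valid_source = "[(j := i) for i in range(5)]"

print([is_valid(src) for src in invalid_sources], is_valid(valid_source))
```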

See Appendix B for some examples of how the rules for targets in comprehensions translate to equivalent code.

The := operator groups more tightly than a comma in all syntactic positions where it is legal, but less tightly than all other operators, including or , and , not , and conditional expressions ( A if C else B ). As follows from section “Exceptional cases” above, it is never allowed at the same level as = . In case a different grouping is desired, parentheses should be used.

The := operator may be used directly in a positional function call argument; however it is invalid directly in a keyword argument.
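A short sketch of these precedence rules, assuming Python 3.8 or later. The variable names and the show helper are made up; each line is parenthesized only as far as the grammar requires.

```python
# := binds more tightly than a comma: this is ((a := 1), 2).
t = (a := 1, 2)

# := binds less tightly than `or`: c receives the value of (0 or 5).
b = (c := 0 or 5)

# := binds less tightly than a conditional expression.
f = (e := 1 if False else 2)


def show(*args):
    """Hypothetical helper that returns its positional arguments."""
    return args


# Allowed directly as a positional argument; d is bound to 3.
r = show(d := 3, 4)

print(t, a)  # (1, 2) 1
print(b, c)  # 5 5
print(f, e)  # 2 2
print(r, d)  # (3, 4) 3
```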

Some examples to clarify what’s technically valid or invalid:

Most of the “valid” examples above are not recommended, since human readers of Python source code who are quickly glancing at some code may miss the distinction. But simple cases are not objectionable:

This PEP recommends always putting spaces around := , similar to PEP 8’s recommendation for = when used for assignment, whereas the latter disallows spaces around = used for keyword arguments.

In order to have precisely defined semantics, the proposal requires evaluation order to be well-defined. This is technically not a new requirement, as function calls may already have side effects. Python already has a rule that subexpressions are generally evaluated from left to right. However, assignment expressions make these side effects more visible, and we propose a single change to the current evaluation order:

  • In a dict comprehension {X: Y for ...} , Y is currently evaluated before X . We propose to change this so that X is evaluated before Y . (In a dict display like {X: Y} this is already the case, and also in dict((X, Y) for ...) which should clearly be equivalent to the dict comprehension.)
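The changed order can be observed by recording when each side is evaluated. This sketch assumes Python 3.8 or later, where the change is in effect; the note helper and the sample data are illustrative.

```python
order = []


def note(label, value):
    """Record that `label` was evaluated, then pass `value` through."""
    order.append(label)
    return value


# The key expression runs before the value expression.
d = {note("key", k): note("value", k * 2) for k in [1]}

# This ordering lets a name bound in the key be reused in the value.
lengths = {(word := raw.strip()): len(word) for raw in [" a ", "bb "]}

print(order)    # ['key', 'value']
print(lengths)  # {'a': 1, 'bb': 2}
```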

Most importantly, since := is an expression, it can be used in contexts where statements are illegal, including lambda functions and comprehensions.

Conversely, assignment expressions don’t support the advanced features found in assignment statements:

  • Multiple targets are not directly supported:
        x = y = z = 0  # Equivalent: (z := (y := (x := 0)))
  • Single assignment targets other than a single NAME are not supported:
        # No equivalent
        a[i] = x
        self.rest = []
  • Priority around commas is different:
        x = 1, 2  # Sets x to (1, 2)
        (x := 1, 2)  # Sets x to 1
  • Iterable packing and unpacking (both regular or extended forms) are not supported:
        # Equivalent needs extra parentheses
        loc = x, y  # Use (loc := (x, y))
        info = name, phone, *rest  # Use (info := (name, phone, *rest))

        # No equivalent
        px, py, pz = position
        name, phone, email, *other_info = contact
  • Inline type annotations are not supported:
        # Closest equivalent is "p: Optional[int]" as a separate declaration
        p: Optional[int] = None
  • Augmented assignment is not supported:
        total += tax  # Equivalent: (total := total + tax)

The following changes have been made based on implementation experience and additional review after the PEP was first accepted and before Python 3.8 was released:

  • for consistency with other similar exceptions, and to avoid locking in an exception name that is not necessarily going to improve clarity for end users, the originally proposed TargetScopeError subclass of SyntaxError was dropped in favour of just raising SyntaxError directly. [3]
  • due to a limitation in CPython’s symbol table analysis process, the reference implementation raises SyntaxError for all uses of named expressions inside comprehension iterable expressions, rather than only raising them when the named expression target conflicts with one of the iteration variables in the comprehension. This could be revisited given sufficiently compelling examples, but the extra complexity needed to implement the more selective restriction doesn’t seem worthwhile for purely hypothetical use cases.

Examples from the Python standard library

env_base is only used on these lines; putting its assignment on the if makes it the “header” of the block.

  • Current:
        env_base = os.environ.get("PYTHONUSERBASE", None)
        if env_base:
            return env_base
  • Improved:
        if env_base := os.environ.get("PYTHONUSERBASE", None):
            return env_base

Avoid nested if and remove one indentation level.

  • Current:
        if self._is_special:
            ans = self._check_nans(context=context)
            if ans:
                return ans
  • Improved:
        if self._is_special and (ans := self._check_nans(context=context)):
            return ans

The code looks more regular and avoids multiple nested ifs. (See Appendix A for the origin of this example.)

  • Current:
        reductor = dispatch_table.get(cls)
        if reductor:
            rv = reductor(x)
        else:
            reductor = getattr(x, "__reduce_ex__", None)
            if reductor:
                rv = reductor(4)
            else:
                reductor = getattr(x, "__reduce__", None)
                if reductor:
                    rv = reductor()
                else:
                    raise Error("un(deep)copyable object of type %s" % cls)
  • Improved:
        if reductor := dispatch_table.get(cls):
            rv = reductor(x)
        elif reductor := getattr(x, "__reduce_ex__", None):
            rv = reductor(4)
        elif reductor := getattr(x, "__reduce__", None):
            rv = reductor()
        else:
            raise Error("un(deep)copyable object of type %s" % cls)

tz is only used for s += tz ; moving its assignment inside the if helps to show its scope.

  • Current:
        s = _format_time(self._hour, self._minute,
                         self._second, self._microsecond,
                         timespec)
        tz = self._tzstr()
        if tz:
            s += tz
        return s
  • Improved:
        s = _format_time(self._hour, self._minute,
                         self._second, self._microsecond,
                         timespec)
        if tz := self._tzstr():
            s += tz
        return s

Calling fp.readline() in the while condition and calling .match() on the if lines make the code more compact without making it harder to understand.

  • Current:
        while True:
            line = fp.readline()
            if not line:
                break
            m = define_rx.match(line)
            if m:
                n, v = m.group(1, 2)
                try:
                    v = int(v)
                except ValueError:
                    pass
                vars[n] = v
            else:
                m = undef_rx.match(line)
                if m:
                    vars[m.group(1)] = 0
  • Improved:
        while line := fp.readline():
            if m := define_rx.match(line):
                n, v = m.group(1, 2)
                try:
                    v = int(v)
                except ValueError:
                    pass
                vars[n] = v
            elif m := undef_rx.match(line):
                vars[m.group(1)] = 0

A list comprehension can map and filter efficiently by capturing the condition:

Similarly, a subexpression can be reused within the main expression, by giving it a name on first use:

Note that in both cases the variable y is bound in the containing scope (i.e. at the same level as results or stuff ).
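Both comprehension patterns can be sketched as follows, assuming Python 3.8 or later. The function f and the input data are hypothetical.

```python
def f(x):
    """Hypothetical computation whose result is both tested and reused."""
    return x - 2


input_data = [1, 2, 3, 4]

# Map and filter in one pass: f(x) is computed once per item.
results = [(x, y, x / y) for x in input_data if (y := f(x)) > 0]

# Name a subexpression on first use and reuse it in the same element.
stuff = [[y := f(x), x / y] for x in [3, 4]]

print(results)  # [(3, 1, 3.0), (4, 2, 2.0)]
print(stuff)    # [[1, 3.0], [2, 2.0]]
print(y)        # 2: y is bound in the containing scope
```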

Assignment expressions can be used to good effect in the header of an if or while statement:

Particularly with the while loop, this can remove the need to have an infinite loop, an assignment, and a condition. It also creates a smooth parallel between a loop which simply uses a function call as its condition, and one which uses that as its condition but also uses the actual value.
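A sketch of the loop shape being described, assuming Python 3.8 or later. An iterator of made-up commands stands in for interactive input so that the example runs unattended.

```python
# Hypothetical command source; a real program might read from input().
commands = iter(["status", "help", "quit", "ignored"])

# Infinite-loop form: loop forever, assign, test, break.
seen_classic = []
while True:
    command = next(commands)
    if command == "quit":
        break
    seen_classic.append(command)

# Assignment-expression form: fetch and test in the loop header.
commands = iter(["status", "help", "quit", "ignored"])
seen_walrus = []
while (command := next(commands)) != "quit":
    seen_walrus.append(command)

print(seen_classic, seen_walrus)
```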

An example from the low-level UNIX world:

Rejected alternative proposals

Proposals broadly similar to this one have come up frequently on python-ideas. Below are a number of alternative syntaxes, some of them specific to comprehensions, which have been rejected in favour of the one given above.

A previous version of this PEP proposed subtle changes to the scope rules for comprehensions, to make them more usable in class scope and to unify the scope of the “outermost iterable” and the rest of the comprehension. However, this part of the proposal would have caused backwards incompatibilities, and has been withdrawn so the PEP can focus on assignment expressions.

Broadly the same semantics as the current proposal, but spelled differently.

Since EXPR as NAME already has meaning in import , except and with statements (with different semantics), this would create unnecessary confusion or require special-casing (e.g. to forbid assignment within the headers of these statements).

(Note that with EXPR as VAR does not simply assign the value of EXPR to VAR – it calls EXPR.__enter__() and assigns the result of that to VAR .)

Additional reasons to prefer := over this spelling include:

  • In if f(x) as y the assignment target doesn’t jump out at you – it just reads like if f x blah blah and it is too similar visually to if f(x) and y .
  • In the other places where an as clause is allowed, the clause is tied to the keyword that starts the line: import foo as bar , except Exc as var , with ctxmgr() as var . By contrast, the assignment expression does not belong to the if or while that starts the line, and we intentionally allow assignment expressions in other contexts as well.
  • The parallel between NAME = EXPR and if NAME := EXPR reinforces the visual recognition of assignment expressions.

This syntax is inspired by languages such as R and Haskell, and some programmable calculators. (Note that a left-facing arrow y <- f(x) is not possible in Python, as it would be interpreted as less-than and unary minus.) This syntax has a slight advantage over ‘as’ in that it does not conflict with with , except and import , but otherwise is equivalent. But it is entirely unrelated to Python’s other use of -> (function return type annotations), and compared to := (which dates back to Algol-58) it has a much weaker tradition.

This has the advantage that leaked usage can be readily detected, removing some forms of syntactic ambiguity. However, this would be the only place in Python where a variable’s scope is encoded into its name, making refactoring harder.

Execution order is inverted (the indented body is performed first, followed by the “header”). This requires a new keyword, unless an existing keyword is repurposed (most likely with: ). See PEP 3150 for prior discussion on this subject (with the proposed keyword being given: ).

This syntax has fewer conflicts than as does (conflicting only with the raise Exc from Exc notation), but is otherwise comparable to it. Instead of paralleling with expr as target: (which can be useful but can also be confusing), this has no parallels, but is evocative.

One of the most popular use-cases is if and while statements. Instead of a more general solution, this proposal enhances the syntax of these two statements to add a means of capturing the compared value:

This works beautifully if and ONLY if the desired condition is based on the truthiness of the captured value. It is thus effective for specific use-cases (regex matches, socket reads that return '' when done), and completely useless in more complicated cases (e.g. where the condition is f(x) < 0 and you want to capture the value of f(x) ). It also has no benefit to list comprehensions.

Advantages: No syntactic ambiguities. Disadvantages: Answers only a fraction of possible use-cases, even in if / while statements.

Another common use-case is comprehensions (list/set/dict, and genexps). As above, proposals have been made for comprehension-specific solutions.

This brings the subexpression to a location in between the ‘for’ loop and the expression. It introduces an additional language keyword, which creates conflicts. Of the three, where reads the most cleanly, but also has the greatest potential for conflict (e.g. SQLAlchemy and numpy have where methods, as does tkinter.dnd.Icon in the standard library).

As above, but reusing the with keyword. Doesn’t read too badly, and needs no additional language keyword. Is restricted to comprehensions, though, and cannot as easily be transformed into “longhand” for-loop syntax. Has the C problem that an equals sign in an expression can now create a name binding, rather than performing a comparison. Would raise the question of why “with NAME = EXPR:” cannot be used as a statement on its own.

As per option 2, but using as rather than an equals sign. Aligns syntactically with other uses of as for name binding, but a simple transformation to for-loop longhand would create drastically different semantics; the meaning of with inside a comprehension would be completely different from the meaning as a stand-alone statement, while retaining identical syntax.

Regardless of the spelling chosen, this introduces a stark difference between comprehensions and the equivalent unrolled long-hand form of the loop. It is no longer possible to unwrap the loop into statement form without reworking any name bindings. The only keyword that can be repurposed to this task is with , thus giving it sneakily different semantics in a comprehension than in a statement; alternatively, a new keyword is needed, with all the costs therein.

There are two logical precedences for the := operator. Either it should bind as loosely as possible, as does statement-assignment; or it should bind more tightly than comparison operators. Placing its precedence between the comparison and arithmetic operators (to be precise: just lower than bitwise OR) allows most uses inside while and if conditions to be spelled without parentheses, as it is most likely that you wish to capture the value of something, then perform a comparison on it:

Once find() returns -1, the loop terminates. If := binds as loosely as = does, this would capture the result of the comparison (generally either True or False), which is less useful.

While this behaviour would be convenient in many situations, it is also harder to explain than “the := operator behaves just like the assignment statement”, and as such, the precedence for := has been made as close as possible to that of = (with the exception that it binds tighter than comma).
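A minimal sketch of the capture-then-compare loop discussed here, written for Python 3.8 or later. The function name find_all and the sample strings are illustrative assumptions, not part of the proposal. Because := ends up binding about as loosely as =, the assignment expression has to be parenthesized before the comparison:

```python
def find_all(buffer: str, search_term: str) -> list:
    """Collect every index at which search_term occurs in buffer."""
    positions = []
    pos = -1
    # The parentheses make := capture the result of find(), not of the comparison.
    while (pos := buffer.find(search_term, pos + 1)) >= 0:
        positions.append(pos)
    return positions


print(find_all("abcabcabc", "bc"))  # [1, 4, 7]
```

Without the parentheses, pos would be bound to the boolean result of the comparison, which is the less useful behaviour the text describes.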

Some critics have claimed that the assignment expressions should allow unparenthesized tuples on the right, so that these two would be equivalent:

(With the current version of the proposal, the latter would be equivalent to ((point := x), y).)

However, adopting this stance would logically lead to the conclusion that when used in a function call, assignment expressions also bind less tight than comma, so we’d have the following confusing equivalence:

The less confusing option is to make := bind more tightly than comma.
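The consequence of this choice can be checked directly in Python 3.8 or later. The sketch below uses made-up values for x and y; it shows that an unparenthesized tuple on the right of := is not captured as a whole.

```python
x, y = 1, 2

# Assignment statement: the whole tuple is bound to the name.
point = x, y
statement_point = point              # (1, 2)

# Assignment expression: := binds tighter than the comma, so only x is bound.
expression_value = (point := x, y)   # the expression as a whole is (1, 2)
walrus_point = point                 # 1

# Parenthesizing the tuple restores the meaning of the statement form.
(point := (x, y))

print(statement_point, expression_value, walrus_point, point)
```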

It’s been proposed to just always require parentheses around an assignment expression. This would resolve many ambiguities, and indeed parentheses will frequently be needed to extract the desired subexpression. But in the following cases the extra parentheses feel redundant:

Frequently Raised Objections

C and its derivatives define the = operator as an expression, rather than a statement as is Python’s way. This allows assignments in more contexts, including contexts where comparisons are more common. The syntactic similarity between if (x == y) and if (x = y) belies their drastically different semantics. Thus this proposal uses := to clarify the distinction.

The two forms have different flexibilities. The := operator can be used inside a larger expression; the = statement can be augmented to += and its friends, can be chained, and can assign to attributes and subscripts.
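These differences can be probed with the built-in compile() function, which raises SyntaxError for forms the grammar rejects. This is only a sketch for Python 3.8 or later; the helper name is_valid and the sample snippets are assumptions made for illustration.

```python
def is_valid(source: str) -> bool:
    """Return True if the source text compiles, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# Things the = statement can do.
print(is_valid("x = y = 0"))               # chained assignment
print(is_valid("total = 0\ntotal += 1"))   # augmented assignment
print(is_valid("obj.attr = 1"))            # attribute target
print(is_valid("items[0] = 1"))            # subscript target

# The same shapes are rejected for the := operator.
print(is_valid("(x := y := 0)"))
print(is_valid("(total +:= 1)"))
print(is_valid("(obj.attr := 1)"))
print(is_valid("(items[0] := 1)"))

# What := can do instead: appear inside a larger expression.
print(is_valid("print((n := 10) + 1)"))
```

The first four calls report True, the next four report False, and the last one reports True.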

Previous revisions of this proposal involved sublocal scope (restricted to a single statement), preventing name leakage and namespace pollution. While a definite advantage in a number of situations, this increases complexity in many others, and the costs are not justified by the benefits. In the interests of language simplicity, the name bindings created here are exactly equivalent to any other name bindings, including that usage at class or module scope will create externally-visible names. This is no different from for loops or other constructs, and can be solved the same way: del the name once it is no longer needed, or prefix it with an underscore.

(The author wishes to thank Guido van Rossum and Christoph Groth for their suggestions to move the proposal in this direction. [2] )

As expression assignments can sometimes be used equivalently to statement assignments, the question of which should be preferred will arise. For the benefit of style guides such as PEP 8, two recommendations are suggested.

  • If either assignment statements or assignment expressions can be used, prefer statements; they are a clear declaration of intent.
  • If using assignment expressions would lead to ambiguity about execution order, restructure it to use statements instead.

The authors wish to thank Alyssa Coghlan and Steven D’Aprano for their considerable contributions to this proposal, and members of the core-mentorship mailing list for assistance with implementation.

Appendix A: Tim Peters’s findings

Here’s a brief essay Tim Peters wrote on the topic.

I dislike “busy” lines of code, and also dislike putting conceptually unrelated logic on a single line. So, for example, instead of:

instead. So I suspected I’d find few places I’d want to use assignment expressions. I didn’t even consider them for lines already stretching halfway across the screen. In other cases, “unrelated” ruled:

is a vast improvement over the briefer:

The original two statements are doing entirely different conceptual things, and slamming them together is conceptually insane.

In other cases, combining related logic made it harder to understand, such as rewriting:

as the briefer:

The while test there is too subtle, crucially relying on strict left-to-right evaluation in a non-short-circuiting or method-chaining context. My brain isn’t wired that way.

But cases like that were rare. Name binding is very frequent, and “sparse is better than dense” does not mean “almost empty is better than sparse”. For example, I have many functions that return None or 0 to communicate “I have nothing useful to return in this case, but since that’s expected often I’m not going to annoy you with an exception”. This is essentially the same as regular expression search functions returning None when there is no match. So there was lots of code of the form:

I find that clearer, and certainly a bit less typing and pattern-matching reading, as:

It’s also nice to trade away a small amount of horizontal whitespace to get another _line_ of surrounding code on screen. I didn’t give much weight to this at first, but it was so very frequent it added up, and I soon enough became annoyed that I couldn’t actually run the briefer code. That surprised me!
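The essay's comparison with regular expression searches suggests a concrete illustration. The sketch below is not Tim Peters's code; the function names and the pattern are assumptions chosen to show the two shapes side by side (Python 3.8 or later).

```python
import re


def first_number_longhand(text):
    """Bind the name in one statement, then test it in the next."""
    match = re.search(r"\d+", text)
    if match:
        return int(match.group())
    return None


def first_number(text):
    """Bind and test in a single line with an assignment expression."""
    if match := re.search(r"\d+", text):
        return int(match.group())
    return None


print(first_number_longhand("order 42 shipped"))  # 42
print(first_number("order 42 shipped"))           # 42
print(first_number("no digits here"))             # None
```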

There are other cases where assignment expressions really shine. Rather than pick another from my code, Kirill Balunov gave a lovely example from the standard library’s copy() function in copy.py:

The ever-increasing indentation is semantically misleading: the logic is conceptually flat, “the first test that succeeds wins”:

Using easy assignment expressions allows the visual structure of the code to emphasize the conceptual flatness of the logic; ever-increasing indentation obscured it.
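The "first test that succeeds wins" shape can be sketched with a small regex-based classifier. This is not the copy.py code the essay refers to; the function parse_token and its patterns are hypothetical and only illustrate how an if/elif chain stays flat when each branch binds its own match (Python 3.8 or later).

```python
import re


def parse_token(text):
    """Return a (kind, value) pair for the first pattern that matches."""
    if m := re.fullmatch(r"\d+", text):
        return "int", int(m.group())
    elif m := re.fullmatch(r"\d+\.\d+", text):
        return "float", float(m.group())
    elif m := re.fullmatch(r"[A-Za-z_]\w*", text):
        return "name", m.group()
    else:
        return "unknown", text


print(parse_token("42"))      # ('int', 42)
print(parse_token("spam_1"))  # ('name', 'spam_1')
```

Written without assignment expressions, each failed match would push the next attempt one indentation level deeper.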

A smaller example from my code delighted me, both allowing to put inherently related logic in a single line, and allowing to remove an annoying “artificial” indentation level:

That if is about as long as I want my lines to get, but remains easy to follow.

So, in all, in most lines binding a name, I wouldn’t use assignment expressions, but because that construct is so very frequent, that leaves many places I would. In most of the latter, I found a small win that adds up due to how often it occurs, and in the rest I found a moderate to major win. I’d certainly use it more often than ternary if, but significantly less often than augmented assignment.

I have another example that quite impressed me at the time.

Where all variables are positive integers, and a is at least as large as the n’th root of x, this algorithm returns the floor of the n’th root of x (and roughly doubling the number of accurate bits per iteration):

It’s not obvious why that works, but is no more obvious in the “loop and a half” form. It’s hard to prove correctness without building on the right insight (the “arithmetic mean - geometric mean inequality”), and knowing some non-trivial things about how nested floor functions behave. That is, the challenges are in the math, not really in the coding.

If you do know all that, then the assignment-expression form is easily read as “while the current guess is too large, get a smaller guess”, where the “too large?” test and the new guess share an expensive sub-expression.

To my eyes, the original form is harder to understand:
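Both forms of the integer root iteration can be sketched as follows. The function names are illustrative assumptions; the loop bodies follow the description above, where a is an initial guess at least as large as the n'th root of x (Python 3.8 or later).

```python
def iroot_walrus(x: int, n: int, a: int) -> int:
    """Floor of the n'th root of x, using an assignment expression."""
    # "While the current guess is too large, get a smaller guess."
    while a > (d := x // a ** (n - 1)):
        a = ((n - 1) * a + d) // n
    return a


def iroot_loop_and_a_half(x: int, n: int, a: int) -> int:
    """The same algorithm in the 'loop and a half' form."""
    while True:
        d = x // a ** (n - 1)
        if a <= d:
            break
        a = ((n - 1) * a + d) // n
    return a


print(iroot_walrus(27, 3, 27))           # 3
print(iroot_loop_and_a_half(26, 3, 26))  # 2
```

In both versions the quotient d is the expensive sub-expression shared by the "too large?" test and the computation of the next guess.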

This appendix attempts to clarify (though not specify) the rules when a target occurs in a comprehension or in a generator expression. For a number of illustrative examples we show the original code, containing a comprehension, and the translation, where the comprehension has been replaced by an equivalent generator function plus some scaffolding.

Since [x for ...] is equivalent to list(x for ...) these examples all use list comprehensions without loss of generality. And since these examples are meant to clarify edge cases of the rules, they aren’t trying to look like real code.

Note: comprehensions are already implemented via synthesizing nested generator functions like those in this appendix. The new part is adding appropriate declarations to establish the intended scope of assignment expression targets (the same scope they resolve to as if the assignment were performed in the block containing the outermost comprehension). For type inference purposes, these illustrative expansions do not imply that assignment expression targets are always Optional (but they do indicate the target binding scope).

Let’s start with a reminder of what code is generated for a generator expression without assignment expression.

  • Original code (EXPR usually references VAR):

        def f():
            a = [EXPR for VAR in ITERABLE]

  • Translation (let’s not worry about name conflicts):

        def f():
            def genexpr(iterator):
                for VAR in iterator:
                    yield EXPR
            a = list(genexpr(iter(ITERABLE)))

Let’s add a simple assignment expression.

  • Original code:

        def f():
            a = [TARGET := EXPR for VAR in ITERABLE]

  • Translation:

        def f():
            if False:
                TARGET = None  # Dead code to ensure TARGET is a local variable
            def genexpr(iterator):
                nonlocal TARGET
                for VAR in iterator:
                    TARGET = EXPR
                    yield TARGET
            a = list(genexpr(iter(ITERABLE)))
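To make the placeholders concrete, here is a runnable instance of the same pair for Python 3.8 or later. The names last and n and the expression n * 2 are assumptions standing in for TARGET, VAR and EXPR; both functions should produce the same list and leave the same final binding.

```python
def with_comprehension():
    a = [last := n * 2 for n in range(3)]
    return a, last


def with_translation():
    if False:
        last = None  # dead code that makes `last` local to this function

    def genexpr(iterator):
        nonlocal last
        for n in iterator:
            last = n * 2
            yield last

    a = list(genexpr(iter(range(3))))
    return a, last


print(with_comprehension())  # ([0, 2, 4], 4)
print(with_translation())    # ([0, 2, 4], 4)
```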

Let’s add a global TARGET declaration in f().

  • Original code:

        def f():
            global TARGET
            a = [TARGET := EXPR for VAR in ITERABLE]

  • Translation:

        def f():
            global TARGET
            def genexpr(iterator):
                global TARGET
                for VAR in iterator:
                    TARGET = EXPR
                    yield TARGET
            a = list(genexpr(iter(ITERABLE)))

Or instead let’s add a nonlocal TARGET declaration in f().

  • Original code:

        def g():
            TARGET = ...
            def f():
                nonlocal TARGET
                a = [TARGET := EXPR for VAR in ITERABLE]

  • Translation:

        def g():
            TARGET = ...
            def f():
                nonlocal TARGET
                def genexpr(iterator):
                    nonlocal TARGET
                    for VAR in iterator:
                        TARGET = EXPR
                        yield TARGET
                a = list(genexpr(iter(ITERABLE)))

Finally, let’s nest two comprehensions.

  • Original code:

        def f():
            a = [[TARGET := i for i in range(3)] for j in range(2)]
            # I.e., a = [[0, 1, 2], [0, 1, 2]]
            print(TARGET)  # prints 2

  • Translation:

        def f():
            if False:
                TARGET = None
            def outer_genexpr(outer_iterator):
                nonlocal TARGET
                def inner_generator(inner_iterator):
                    nonlocal TARGET
                    for i in inner_iterator:
                        TARGET = i
                        yield i
                for j in outer_iterator:
                    yield list(inner_generator(range(3)))
            a = list(outer_genexpr(range(2)))
            print(TARGET)

Because it has been a point of confusion, note that nothing about Python’s scoping semantics is changed. Function-local scopes continue to be resolved at compile time, and to have indefinite temporal extent at run time (“full closures”). Example:
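A minimal sketch of this point, assuming Python 3.8 or later; it is an illustration written for this text rather than the proposal's own example. The name a is local to f() because the generator expression assigns to it, yet it stays unbound until the generator actually runs, and the module-level a is never touched.

```python
a = 42  # module-level name; f() never rebinds it


def f():
    # Creating the generator expression does not run its body yet.
    gen = ((a := i) for i in range(3))
    try:
        current = a  # `a` is local to f() but not yet bound
    except UnboundLocalError:
        state_before = "unbound"
    else:
        state_before = "bound"
    values = list(gen)  # running the generator binds `a` in f()'s scope
    return state_before, values, a


result = f()
print(result)  # ('unbound', [0, 1, 2], 2)
print(a)       # 42
```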

This document has been placed in the public domain.

Source: https://github.com/python/peps/blob/main/peps/pep-0572.rst

Last modified: 2023-10-11 12:05:51 GMT


In Java, regular expressions (regex for short) are supported through an API for defining string patterns that can be used to search, manipulate, and edit strings. Email validation and password constraints are two areas where regex is widely used. Regular expressions in Java are provided by the java.util.regex package, which consists of three classes and one interface.

Regex Classes and Interfaces

Regex in Java provides 3 classes and 1 interface, which are as follows:

  • Pattern Class
  • Matcher Class
  • PatternSyntaxException Class
  • MatchResult Interface

Pattern Class

The Pattern class is a compiled representation of a regular expression. It provides no public constructors; instead, you create a Pattern by calling the static compile() method, which accepts a regular expression as its first argument and returns the compiled pattern.

Example: Pattern class 

Matcher class

A Matcher object performs match operations on an input string by interpreting a pattern. Like Pattern, it defines no public constructors; you obtain a Matcher by calling the matcher() method on a Pattern object.

Note: Pattern.matches() checks whether the whole text matches a pattern. Other methods (demonstrated below) are mainly used to find multiple occurrences of a pattern in the text.

Let us discuss a few sample programs, as we did for the Pattern class. The following Java programs demonstrate the workings of compile(), find(), start(), end(), and split() to give a better understanding of the Matcher class.

Example 1: Pattern Searching 

Regex Character classes

Below is the implementation of the above topic:

Regex Metacharacters

Below is the implementation of Regex Metacharacters:

Java Regex Finder Example

Below is the implementation of the Java Regex Finder:

Lastly, let us review some of the important observations from the above article:

  • We create a Pattern object by calling Pattern.compile(); there is no public constructor. compile() is a static method in the Pattern class.
  • Similarly, we create a Matcher object by calling matcher() on an object of the Pattern class.
  • Pattern.matches() is also a static method; it checks whether a given text as a whole matches the pattern.
  • find() is used to find multiple occurrences of a pattern in the text.
  • We can split a text based on a delimiter pattern using the split() method.

FAQs in Java Regex

Q1. What are regular expressions in Java?

Regular expressions in Java define string patterns that can be used for searching, manipulating, and editing strings.

Q2. What is a simple example of regular expression in Java?

A simple example of a regular expression in Java is shown below:

    // Java Program to check on Regex
    import java.io.*;
    import java.util.regex.*;

    // Driver class
    class GFG {
        // Main function
        public static void main(String[] args)
        {
            // Checks if the string matches with the regex
            // Should be single character a to z
            System.out.println(Pattern.matches("[a-z]", "g"));

            // Check if all the elements are non-numbers
            System.out.println(Pattern.matches("\\D+", "Gfg"));

            // Check if all the elements are non-spaces
            System.out.println(Pattern.matches("\\S+", "gfg"));
        }
    }

Output

    true
    true
    true


Learn Regular Expressions by Building a Spam Filter - Step 5

Tell us what’s happening:

I'm confused. I've tried with just = and += and it won't pass.

Your code so far

Your browser information:

User Agent is: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36


An assignment operator assigns a value to its left operand based on the value of its right operand.


The basic assignment operator is equal (=), which assigns the value of its right operand to its left operand. That is, x = y assigns the value of y to x. The other assignment operators are usually shorthand for standard operations, as shown in the following definitions and examples.

Simple assignment operator is used to assign a value to a variable. The assignment operation evaluates to the assigned value. Chaining the assignment operator is possible in order to assign a single value to multiple variables. See the example.

Addition assignment

The addition assignment operator adds the value of the right operand to a variable and assigns the result to the variable. The types of the two operands determine the behavior of the addition assignment operator. Addition or concatenation is possible. See the addition operator for more details.

Subtraction assignment

The subtraction assignment operator subtracts the value of the right operand from a variable and assigns the result to the variable. See the subtraction operator for more details.

Multiplication assignment

The multiplication assignment operator multiplies a variable by the value of the right operand and assigns the result to the variable. See the multiplication operator for more details.

Division assignment

The division assignment operator divides a variable by the value of the right operand and assigns the result to the variable. See the division operator for more details.

Remainder assignment

The remainder assignment operator divides a variable by the value of the right operand and assigns the remainder to the variable. See the remainder operator for more details.

Exponentiation assignment

The exponentiation assignment operator raises the value of a variable to the power of the right operand and assigns the result to the variable. See the exponentiation operator for more details.

Left shift assignment

The left shift assignment operator moves the specified number of bits to the left and assigns the result to the variable. See the left shift operator for more details.

Right shift assignment

The right shift assignment operator moves the specified number of bits to the right and assigns the result to the variable. See the right shift operator for more details.

Unsigned right shift assignment

The unsigned right shift assignment operator moves the specified number of bits to the right and assigns the result to the variable. See the unsigned right shift operator for more details.

Bitwise AND assignment

The bitwise AND assignment operator uses the binary representation of both operands, does a bitwise AND operation on them and assigns the result to the variable. See the bitwise AND operator for more details.

Bitwise XOR assignment

The bitwise XOR assignment operator uses the binary representation of both operands, does a bitwise XOR operation on them and assigns the result to the variable. See the bitwise XOR operator for more details.

Bitwise OR assignment

The bitwise OR assignment operator uses the binary representation of both operands, does a bitwise OR operation on them and assigns the result to the variable. See the bitwise OR operator for more details.

Left operand with another assignment operator

In unusual situations, a compound assignment (e.g. x += y) is not identical to its expanded form (here x = x + y). When the left operand of an assignment operator itself contains an assignment operator, the left operand is evaluated only once. For example:

  • Arithmetic operators

  • Lexical grammar
  • Enumerability and ownership of properties
  • Iteration protocols
  • Transitioning to strict mode
  • Template literals
  • Deprecated features
  • ECMAScript 2015 support in Mozilla
  • ECMAScript 5 support in Mozilla
  • Firefox JavaScript changelog
  • New in JavaScript 1.1
  • New in JavaScript 1.2
  • New in JavaScript 1.3
  • New in JavaScript 1.4
  • New in JavaScript 1.5
  • New in JavaScript 1.6
  • New in JavaScript 1.7
  • New in JavaScript 1.8
  • New in JavaScript 1.8.1
  • New in JavaScript 1.8.5
  • Documentation:
  • All pages index
  • Methods index
  • Properties index
  • Pages tagged "JavaScript"
  • JavaScript doc status
  • The MDN project

Learn the best of web development

Get the latest and greatest from MDN delivered straight to your inbox.

Thanks! Please check your inbox to confirm your subscription.

If you haven’t previously confirmed a subscription to a Mozilla-related newsletter you may have to do so. Please check your inbox or your spam filter for an email from us.

  • précédent |
  • Python »
  • 3.14.0a0 Documentation »
  • La bibliothèque standard »
  • Services de Manipulation de Texte »
  • re --- Regular expression operations
  • Theme Auto Light Dark |

re --- Regular expression operations

Source code: Lib/re/

This module provides regular expression matching operations similar to those found in Perl.

Both patterns and strings to be searched can be Unicode strings ( str ) as well as 8-bit strings ( bytes ). However, Unicode strings and 8-bit strings cannot be mixed: that is, you cannot match a Unicode string with a bytes pattern or vice-versa; similarly, when asking for a substitution, the replacement string must be of the same type as both the pattern and the search string.

Regular expressions use the backslash character ( '\' ) to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python's usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\ , and each backslash must be expressed as \\ inside a regular Python string literal. Also, please note that any invalid escape sequences in Python's usage of the backslash in string literals now generate a SyntaxWarning and in the future this will become a SyntaxError . This behaviour will happen even if it is a valid escape sequence for a regular expression.

The solution is to use Python's raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
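The difference between raw and regular string literals can be checked directly. This is a minimal sketch; the variable names and the sample path are illustrative assumptions.

```python
import re

# A raw string keeps the backslash; a regular string literal interprets it.
raw_newline = r"\n"     # two characters: a backslash and 'n'
plain_newline = "\n"    # one character: a newline

# To match one literal backslash, the regex itself must be \\ .
path = "C:\\temp"       # the text C:\temp, containing a single backslash
with_raw = re.search(r"\\", path)      # raw string: write the regex as-is
with_plain = re.search("\\\\", path)   # regular string: double every backslash

print(len(raw_newline), len(plain_newline))
print(with_raw.span(), with_plain.span())
```

Both searches find the same single backslash; the raw form is simply easier to read.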

It is important to note that most regular expression operations are available as module-level functions and as methods on compiled regular expressions. The functions are shortcuts that don't require you to compile a regex object first, but they lack some fine-tuning parameters.

See also: the third-party regex module, which has an API compatible with the standard library re module but offers additional functionality and more thorough Unicode support.

Regular Expression Syntax

A regular expression (or RE) specifies a set of strings that matches it; the functions in this module let you check if a particular string matches a given regular expression (or if a given regular expression matches a particular string, which comes down to the same thing).

Regular expressions can be concatenated to form new regular expressions; if A and B are both regular expressions, then AB is also a regular expression. In general, if a string p matches A and another string q matches B, the string pq will match AB. This holds unless A or B contain low precedence operations, boundary conditions between A and B, or numbered group references. Thus, complex expressions can easily be constructed from simpler primitive expressions like the ones described here. For details of the theory and implementation of regular expressions, consult the Friedl book [Frie09], or almost any textbook about compiler construction.

A brief explanation of the format of regular expressions follows. For further information and a gentler presentation, consult the Regular Expression HOWTO.

Regular expressions can contain both special and ordinary characters. Most ordinary characters, like 'A', 'a', or '0', are the simplest regular expressions; they simply match themselves. You can concatenate ordinary characters, so last matches the string 'last'. (In the rest of this section, we'll write REs in this special style, usually without quotes, and strings to be matched 'in single quotes'.)

Some characters, like '|' or '(', are special. Special characters either stand for classes of ordinary characters, or affect how the regular expressions around them are interpreted.

Repetition operators or quantifiers (*, +, ?, {m,n}, etc.) cannot be directly nested. This avoids ambiguity with the non-greedy modifier suffix ?, and with other modifiers in other implementations. To apply a second repetition to an inner repetition, parentheses may be used. For example, the expression (?:a{6})* matches any multiple of six 'a' characters.
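The rule about nested quantifiers can be seen in a short sketch. The variable names are illustrative; the only assumption is that directly stacking * on {6} is rejected with re.error, which is how the re module reports invalid patterns.

```python
import re

# Wrapping the inner repetition in a non-capturing group makes it repeatable.
multiple_of_six = re.compile(r"(?:a{6})*")
twelve = multiple_of_six.fullmatch("a" * 12)   # 12 is a multiple of six
seven = multiple_of_six.fullmatch("a" * 7)     # 7 is not

# Stacking the second quantifier directly on the first is not accepted.
try:
    re.compile(r"a{6}*")
    nested_allowed = True
except re.error:
    nested_allowed = False

print(bool(twelve), bool(seven), nested_allowed)
```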

The special characters are:

.

(Dot.) In the default mode, this matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline.

^

(Caret.) Matches the start of the string, and in MULTILINE mode also matches immediately after each newline.

$

Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline. foo matches both 'foo' and 'foobar', while the regular expression foo$ matches only 'foo'. More interestingly, searching for foo.$ in 'foo1\nfoo2\n' matches 'foo2' normally, but 'foo1' in MULTILINE mode; searching for a single $ in 'foo\n' will find two (empty) matches: one just before the newline, and one at the end of the string.
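The behaviour of $ described above can be reproduced with a few calls. This is a small sketch using the same sample strings; the variable names are illustrative.

```python
import re

text = "foo1\nfoo2\n"

# Default mode: $ matches at the very end, or just before a trailing newline.
default_match = re.search(r"foo.$", text).group()

# MULTILINE mode: $ also matches before every newline, so the first line wins.
multiline_match = re.search(r"foo.$", text, re.MULTILINE).group()

# A bare $ finds two empty matches in 'foo\n'.
dollar_positions = [m.start() for m in re.finditer(r"$", "foo\n")]

print(default_match, multiline_match, dollar_positions)
```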

*

Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible. ab* will match 'a', 'ab', or 'a' followed by any number of 'b's.

+

Causes the resulting RE to match 1 or more repetitions of the preceding RE. ab+ will match 'a' followed by any non-zero number of 'b's; it will not match just 'a'.

?

Causes the resulting RE to match 0 or 1 repetitions of the preceding RE. ab? will match either 'a' or 'ab'.

*?, +?, ??

The '*', '+', and '?' quantifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn't desired; if the RE <.*> is matched against '<a> b <c>', it will match the entire string, and not just '<a>'. Adding ? after the quantifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using the RE <.*?> will match only '<a>'.
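Greedy and non-greedy matching can be compared side by side. A minimal sketch on the same sample string; the variable names are illustrative.

```python
import re

text = "<a> b <c>"

greedy = re.search(r"<.*>", text).group()    # runs to the last '>'
lazy = re.search(r"<.*?>", text).group()     # stops at the first '>'
all_tags = re.findall(r"<.*?>", text)        # every short match in turn

print(greedy)
print(lazy)
print(all_tags)
```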

*+, ++, ?+

Like the '*', '+', and '?' quantifiers, those where '+' is appended also match as many times as possible. However, unlike the true greedy quantifiers, these do not allow back-tracking when the expression following them fails to match. These are known as possessive quantifiers. For example, a*a will match 'aaaa' because the a* will match all 4 'a's, but, when the final 'a' is encountered, the expression is backtracked so that in the end the a* ends up matching 3 'a's total, and the fourth 'a' is matched by the final 'a'. However, when a*+a is used to match 'aaaa', the a*+ will match all 4 'a's, but when the final 'a' fails to find any more characters to match, the expression cannot be backtracked and will thus fail to match. x*+, x++ and x?+ are equivalent to (?>x*), (?>x+) and (?>x?) correspondingly.

Added in version 3.11.

{m}

Specifies that exactly m copies of the previous RE should be matched; fewer matches cause the entire RE not to match. For example, a{6} will match exactly six 'a' characters, but not five.

{m,n}

Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as many repetitions as possible. For example, a{3,5} will match from 3 to 5 'a' characters. Omitting m specifies a lower bound of zero, and omitting n specifies an infinite upper bound. As an example, a{4,}b will match 'aaaab' or a thousand 'a' characters followed by a 'b', but not 'aaab'. The comma may not be omitted, or the modifier would be confused with the previously described form.

{m,n}?

Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as few repetitions as possible. This is the non-greedy version of the previous quantifier. For example, on the 6-character string 'aaaaaa', a{3,5} will match 5 'a' characters, while a{3,5}? will only match 3 characters.
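The counted quantifiers can be checked against the examples in the text. A minimal sketch; the variable names are illustrative.

```python
import re

six_a = "aaaaaa"

greedy_count = len(re.match(r"a{3,5}", six_a).group())   # takes as many as allowed
lazy_count = len(re.match(r"a{3,5}?", six_a).group())    # takes as few as allowed

# An open upper bound: at least four 'a' characters, then 'b'.
four_ok = re.fullmatch(r"a{4,}b", "aaaab")
three_fails = re.fullmatch(r"a{4,}b", "aaab")
many_ok = re.fullmatch(r"a{4,}b", "a" * 1000 + "b")

print(greedy_count, lazy_count)
```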

{m,n}+

Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as many repetitions as possible without establishing any backtracking points. This is the possessive version of the quantifier above. For example, on the 6-character string 'aaaaaa', a{3,5}+aa attempts to match 5 'a' characters, then, requiring 2 more 'a's, will need more characters than available and thus fail, while a{3,5}aa will match with a{3,5} capturing 5, then 4 'a's by backtracking, and then the final 2 'a's are matched by the final aa in the pattern. x{m,n}+ is equivalent to (?>x{m,n}).

\

Either escapes special characters (permitting you to match characters like '*', '?', and so forth), or signals a special sequence; special sequences are discussed below.

If you're not using a raw string to express the pattern, remember that Python also uses the backslash as an escape sequence in string literals; if the escape sequence isn't recognized by Python's parser, the backslash and subsequent character are included in the resulting string. However, if Python would recognize the resulting sequence, the backslash should be repeated twice. This is complicated and hard to understand, so it's highly recommended that you use raw strings for all but the simplest expressions.

[]

Used to indicate a set of characters. In a set:

Characters can be listed individually, e.g. [amk] will match 'a', 'm', or 'k'.

Ranges of characters can be indicated by giving two characters and separating them by a '-', for example [a-z] will match any lowercase ASCII letter, [0-5][0-9] will match all the two-digit numbers from 00 to 59, and [0-9A-Fa-f] will match any hexadecimal digit. If '-' is escaped (e.g. [a\-z]) or if it's placed as the first or last character (e.g. [-a] or [a-]), it will match a literal '-'.

Special characters lose their special meaning inside sets. For example, [(+*)] will match any of the literal characters '(', '+', '*', or ')'.

Character classes such as \w or \S (defined below) are also accepted inside a set, although the characters they match depend on the flags used.

Characters that are not within a range can be matched by complementing the set. If the first character of the set is '^', all the characters that are not in the set will be matched. For example, [^5] will match any character except '5', and [^^] will match any character except '^'. ^ has no special meaning if it's not the first character in the set.

To match a literal ']' inside a set, precede it with a backslash, or place it at the beginning of the set. For example, both [()[\]{}] and []()[{}] will match a right bracket, as well as left bracket, braces, and parentheses.
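The set rules above can be tried out one by one. This is a small sketch; the sample strings and variable names are illustrative assumptions.

```python
import re

listed = re.findall(r"[amk]", "make")              # individually listed characters
minutes_ok = re.fullmatch(r"[0-5][0-9]", "59")     # two ranges side by side
minutes_bad = re.fullmatch(r"[0-5][0-9]", "60")
literal_dash = re.findall(r"[a-]", "a-b")          # '-' last in the set is literal
specials = re.findall(r"[(+*)]", "(1+2)*3")        # metacharacters are literal in a set
not_five = re.findall(r"[^5]", "1525")             # complemented set
brackets = re.findall(r"[]()[{}]", "a[b]{c}")      # ']' first in the set is literal

print(listed, literal_dash, specials, not_five, brackets)
```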

Support of nested sets and set operations as in Unicode Technical Standard #18 might be added in the future. This would change the syntax, so to facilitate this change a FutureWarning will be raised in ambiguous cases for the time being. That includes sets starting with a literal '[' or containing the literal character sequences '--', '&&', '~~', and '||'. To avoid a warning, escape them with a backslash.

Changed in version 3.7: FutureWarning is raised if a character set contains constructs that will change semantically in the future.

|

A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. An arbitrary number of REs can be separated by the '|' in this way. This can be used inside groups (see below) as well. As the target string is scanned, REs separated by '|' are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match. In other words, the '|' operator is never greedy. To match a literal '|', use \|, or enclose it inside a character class, as in [|].
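Because alternatives are tried from left to right, their order changes the result. A minimal sketch with illustrative sample strings:

```python
import re

# The first alternative that matches wins, even if a later one is longer.
short_first = re.search(r"a|ab", "ab").group()
long_first = re.search(r"ab|a", "ab").group()

# Two ways to match a literal pipe character.
escaped_pipe = re.findall(r"\|", "a|b")
class_pipe = re.findall(r"[|]", "a|b")

print(short_first, long_first, escaped_pipe, class_pipe)
```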

(...)

Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(], [)].

(?...)

This is an extension notation (a '?' following a '(' is not meaningful otherwise). The first character after the '?' determines what the meaning and further syntax of the construct is. Extensions usually do not create a new group; (?P<name>...) is the only exception to this rule. Following are the currently supported extensions.

(?aiLmsux)

(One or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags for the entire regular expression:

re.A (ASCII-only matching)

re.I (ignore case)

re.L (locale dependent)

re.M (multi-line)

re.S (dot matches all)

re.U (Unicode matching)

re.X (verbose)

(The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function. Flags should be used first in the expression string.

Changed in version 3.11: This construction can only be used at the start of the expression.

(?:...)

A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.

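(?aiLmsux-imsx:...)

(Zero or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x', optionally followed by '-' followed by one or more letters from the set 'i', 'm', 's', 'x'.) The letters set or remove the corresponding flags for the part of the expression: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode matching), re.X (verbose).

(The flags are described in Module Contents.)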

The letters 'a' , 'L' and 'u' are mutually exclusive when used as inline flags, so they can't be combined or follow '-' . Instead, when one of them appears in an inline group, it overrides the matching mode in the enclosing group. In Unicode patterns (?a:...) switches to ASCII-only matching, and (?u:...) switches to Unicode matching (default). In bytes patterns (?L:...) switches to locale dependent matching, and (?a:...) switches to ASCII-only matching (default). This override is only in effect for the narrow inline group, and the original matching mode is restored outside of the group.

Added in version 3.6.

Changed in version 3.7: The letters 'a', 'L' and 'u' can also be used in a group.

(?>...)

Attempts to match ... as if it was a separate regular expression, and if successful, continues to match the rest of the pattern following it. If the subsequent pattern fails to match, the stack can only be unwound to a point before the (?>...) because once exited, the expression, known as an atomic group, has thrown away all stack points within itself. Thus, (?>.*). would never match anything because first the .* would match all characters possible, then, having nothing left to match, the final . would fail to match. Since there are no stack points saved in the atomic group, and there is no stack point before it, the entire expression would thus fail to match.

(?P<name>...)

Similar to regular parentheses, but the substring matched by the group is accessible via the symbolic group name name. Group names must be valid Python identifiers, and in bytes patterns they can only contain bytes in the ASCII range. Each group name must be defined only once within a regular expression. A symbolic group is also a numbered group, just as if the group were not named.

Named groups can be referenced in three contexts. If the pattern is (?P<quote>['"]).*?(?P=quote) (i.e. matching a string quoted with either single or double quotes):

  • in the same pattern itself: (?P=quote) (as shown) or \1
  • when processing a match object m: m.group('quote'), m.end('quote') (etc.)
  • in a string passed to the repl argument of re.sub(): \g<quote>, \g<1> or \1
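The three ways of referring to a named group can be sketched with the quote pattern from the text. The sample sentence and variable names are illustrative assumptions.

```python
import re

# 1. Inside the pattern: (?P=quote) must repeat whatever the group matched.
quoted = re.compile(r"""(?P<quote>['"]).*?(?P=quote)""")
text = """say "hi" or 'bye'"""

# 2. On the match object: by name or by number.
m = quoted.search(text)
whole = m.group()
by_name = m.group("quote")
by_number = m.group(1)

# 3. In a replacement string passed to sub(): \g<quote>.
masked = quoted.sub(r"\g<quote>X\g<quote>", text)

# Mismatched quotes do not match, because the closing quote must equal the opening one.
mismatch = quoted.search("'oops\"")

print(whole, by_name, masked)
```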

Changed in version 3.12: In bytes patterns, group name can only contain bytes in the ASCII range (b'\x00'-b'\x7f').

(?P=name)

A backreference to a named group; it matches whatever text was matched by the earlier group named name.

(?#...)

A comment; the contents of the parentheses are simply ignored.

(?=...)

Matches if ... matches next, but doesn't consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it's followed by 'Asimov'.

(?!...)

Matches if ... doesn't match next. This is a negative lookahead assertion. For example, Isaac (?!Asimov) will match 'Isaac ' only if it's not followed by 'Asimov'.
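Both lookahead forms can be checked with the Isaac example. A minimal sketch; 'Isaac Newton' is an illustrative counterexample that does not appear in the text above.

```python
import re

# Positive lookahead: match 'Isaac ' only when 'Asimov' follows.
pos_hit = re.search(r"Isaac (?=Asimov)", "Isaac Asimov")
pos_miss = re.search(r"Isaac (?=Asimov)", "Isaac Newton")

# Negative lookahead: match 'Isaac ' only when 'Asimov' does not follow.
neg_hit = re.search(r"Isaac (?!Asimov)", "Isaac Newton")
neg_miss = re.search(r"Isaac (?!Asimov)", "Isaac Asimov")

# The assertion consumes nothing, so the match is just 'Isaac '.
print(repr(pos_hit.group()), repr(neg_hit.group()))
```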

(?<=...)

Matches if the current position in the string is preceded by a match for ... that ends at the current position. This is called a positive lookbehind assertion. (?<=abc)def will find a match in 'abcdef', since the lookbehind will back up 3 characters and check if the contained pattern matches. The contained pattern must only match strings of some fixed length, meaning that abc or a|b are allowed, but a* and a{3,4} are not. Note that patterns which start with positive lookbehind assertions will not match at the beginning of the string being searched; you will most likely want to use the search() function rather than the match() function.

A typical use is looking for a word following a hyphen: searching 'spam-egg' with (?<=-)\w+ finds 'egg'.
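The lookbehind behaviour described above can be sketched as follows, using the 'abcdef' and 'spam-egg' strings; the variable names are illustrative.

```python
import re

# search() scans forward, so the lookbehind can find 'abc' before 'def'.
found = re.search(r"(?<=abc)def", "abcdef")

# match() only tries position 0, where nothing precedes the current position.
at_start = re.match(r"(?<=abc)def", "abcdef")

# A word following a hyphen.
after_hyphen = re.search(r"(?<=-)\w+", "spam-egg")

# Variable-width lookbehind patterns are rejected when compiling.
try:
    re.compile(r"(?<=a*)b")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(found.group(), after_hyphen.group(), variable_width_ok)
```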

Changed in version 3.5: Added support for group references of fixed length.

(?<!...)

Matches if the current position in the string is not preceded by a match for .... This is called a negative lookbehind assertion. Similar to positive lookbehind assertions, the contained pattern must only match strings of some fixed length. Patterns which start with negative lookbehind assertions may match at the beginning of the string being searched.

(?(id/name)yes-pattern|no-pattern)

Will try to match with yes-pattern if the group with the given id or name exists, and with no-pattern if it doesn't. no-pattern is optional and can be omitted. For example, (<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$) is a poor email matching pattern, which will match '<user@host.com>' as well as 'user@host.com', but not '<user@host.com' nor 'user@host.com>'.

Changed in version 3.12: Group id can only contain ASCII digits. In bytes patterns, group name can only contain bytes in the ASCII range (b'\x00'-b'\x7f').
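The conditional pattern from the email example can be exercised against the four strings mentioned. A minimal sketch; the name email and the results dictionary are illustrative.

```python
import re

# Group 1 is the optional '<'. If it matched, a closing '>' is required;
# otherwise the address must run to the end of the string.
email = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)")

candidates = ["<user@host.com>", "user@host.com", "<user@host.com", "user@host.com>"]
results = {s: email.match(s) is not None for s in candidates}

for s, ok in results.items():
    print(s, ok)
```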

The special sequences consist of '\' and a character from the list below. If the ordinary character is not an ASCII digit or an ASCII letter, then the resulting RE will match the second character. For example, \$ matches the character '$'.

\number

Matches the contents of the group of the same number. Groups are numbered starting from 1. For example, (.+) \1 matches 'the the' or '55 55', but not 'thethe' (note the space after the group). This special sequence can only be used to match one of the first 99 groups. If the first digit of number is 0, or number is 3 octal digits long, it will not be interpreted as a group match, but as the character with octal value number. Inside the '[' and ']' of a character class, all numeric escapes are treated as characters.
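Numbered backreferences are handy for spotting repeated text. A small sketch; the doubled-word sentence is an illustrative assumption.

```python
import re

repeated = re.compile(r"(.+) \1")

the_the = repeated.fullmatch("the the")
fifty_five = repeated.fullmatch("55 55")
no_space = repeated.search("thethe")    # no space between the copies

# A common use: find accidentally doubled words.
doubled = re.findall(r"\b(\w+) \1\b", "it is is what it it is")

print(bool(the_the), bool(fifty_five), no_space, doubled)
```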

\A

Matches only at the start of the string.

\b

Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of word characters. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning or end of the string. This means that r'\bat\b' matches 'at', 'at.', '(at)', and 'as at ay' but not 'attempt' or 'atlas'.

The default word characters in Unicode (str) patterns are Unicode alphanumerics and the underscore, but this can be changed by using the ASCII flag. Word boundaries are determined by the current locale if the LOCALE flag is used.

Inside a character range, \b represents the backspace character, for compatibility with Python's string literals.
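The word-boundary examples can be verified directly. A minimal sketch; the list names are illustrative.

```python
import re

word_at = re.compile(r"\bat\b")

should_match = ["at", "at.", "(at)", "as at ay"]
should_not_match = ["attempt", "atlas"]

hits = [s for s in should_match if word_at.search(s)]
misses = [s for s in should_not_match if word_at.search(s)]

# Inside a character class, \b means the backspace character (0x08).
backspaces = re.findall(r"[\b]", "a\bb")

print(hits, misses, backspaces)
```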

\B

Matches the empty string, but only when it is not at the beginning or end of a word. This means that r'at\B' matches 'athens', 'atom', 'attorney', but not 'at', 'at.', or 'at!'. \B is the opposite of \b, so word characters in Unicode (str) patterns are Unicode alphanumerics or the underscore, although this can be changed by using the ASCII flag. Word boundaries are determined by the current locale if the LOCALE flag is used.

\d

For Unicode (str) patterns: Matches any Unicode decimal digit (that is, any character in Unicode character category [Nd]). This includes [0-9], and also many other digit characters.

Matches [0-9] if the ASCII flag is used.

For 8-bit (bytes) patterns: Matches any decimal digit in the ASCII character set; this is equivalent to [0-9].
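The difference between Unicode and ASCII digit matching shows up as soon as the text contains non-ASCII digits. A small sketch; the Arabic-Indic digits are an illustrative choice, written as escapes so the source stays ASCII.

```python
import re

# '42' in ASCII digits, then the same number in Arabic-Indic digits.
text = "42 \u0664\u0662"

unicode_digits = re.findall(r"\d+", text)            # str pattern, default mode
ascii_digits = re.findall(r"\d+", text, re.ASCII)    # restricted to [0-9]
bytes_digits = re.findall(rb"\d+", b"42 abc")        # bytes patterns are ASCII-only

print(len(unicode_digits), ascii_digits, bytes_digits)
```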

\D

Matches any character which is not a decimal digit. This is the opposite of \d.

Matches [^0-9] if the ASCII flag is used.

\s

For Unicode (str) patterns: Matches Unicode whitespace characters (which includes [ \t\n\r\f\v], and also many other characters, for example the non-breaking spaces mandated by typography rules in many languages).

Matches [ \t\n\r\f\v] if the ASCII flag is used.

For 8-bit (bytes) patterns: Matches characters considered whitespace in the ASCII character set; this is equivalent to [ \t\n\r\f\v].

\S

Matches any character which is not a whitespace character. This is the opposite of \s.

Matches [^ \t\n\r\f\v] if the ASCII flag is used.

\w

For Unicode (str) patterns: Matches Unicode word characters; this includes all Unicode alphanumeric characters (as defined by str.isalnum()), as well as the underscore (_).

Matches [a-zA-Z0-9_] if the ASCII flag is used.

For 8-bit (bytes) patterns: Matches characters considered alphanumeric in the ASCII character set; this is equivalent to [a-zA-Z0-9_]. If the LOCALE flag is used, matches characters considered alphanumeric in the current locale and the underscore.

\W

Matches any character which is not a word character. This is the opposite of \w. By default, matches non-underscore (_) characters for which str.isalnum() returns False.

Matches [^a-zA-Z0-9_] if the ASCII flag is used.

If the LOCALE flag is used, matches characters which are neither alphanumeric in the current locale nor the underscore.

\Z

Matches only at the end of the string.

Most of the escape sequences supported by Python string literals are also accepted by the regular expression parser: \a, \b, \f, \n, \N, \r, \t, \u, \U, \v, \x, \\.

(Note that \b is used to represent word boundaries, and means "backspace" only inside character classes.)

'\u' , '\U' , and '\N' escape sequences are only recognized in Unicode (str) patterns. In bytes patterns they are errors. Unknown escapes of ASCII letters are reserved for future use and treated as errors.

Octal escapes are included in a limited form. If the first digit is a 0, or if there are three octal digits, it is considered an octal escape. Otherwise, it is a group reference. As for string literals, octal escapes are always at most three digits in length.

Changed in version 3.3: The '\u' and '\U' escape sequences have been added.

Changed in version 3.6: Unknown escapes consisting of '\' and an ASCII letter are now errors.

Changed in version 3.8: The '\N{name}' escape sequence has been added. As in string literals, it expands to the named Unicode character (e.g. '\N{EM DASH}').

Module Contents

The module defines several functions, constants, and an exception. Some of the functions are simplified versions of the full-featured methods for compiled regular expressions. Most non-trivial applications always use the compiled form.

Changed in version 3.6: Flag constants are now instances of RegexFlag, which is a subclass of enum.IntFlag.

re.RegexFlag

An enum.IntFlag class containing the regex options listed below.

Added in version 3.11: added to __all__

re.A, re.ASCII

Make \w, \W, \b, \B, \d, \D, \s and \S perform ASCII-only matching instead of full Unicode matching. This is only meaningful for Unicode (str) patterns, and is ignored for bytes patterns.

Corresponds to the inline flag (?a).

The U flag still exists for backward compatibility, but is redundant in Python 3 since matches are Unicode by default for str patterns, and Unicode matching isn't allowed for bytes patterns. UNICODE and the inline flag (?u) are similarly redundant.

re.DEBUG

Display debug information about the compiled expression.

No corresponding inline flag.

re.I, re.IGNORECASE

Perform case-insensitive matching; expressions like [A-Z] will also match lowercase letters. Full Unicode matching (such as Ü matching ü) also works unless the ASCII flag is used to disable non-ASCII matches. The current locale does not change the effect of this flag unless the LOCALE flag is also used.

Corresponds to the inline flag (?i).

Note that when the Unicode patterns [a-z] or [A-Z] are used in combination with the IGNORECASE flag, they will match the 52 ASCII letters and 4 additional non-ASCII letters: 'İ' (U+0130, Latin capital letter I with dot above), 'ı' (U+0131, Latin small letter dotless i), 'ſ' (U+017F, Latin small letter long s) and 'K' (U+212A, Kelvin sign). If the ASCII flag is used, only letters 'a' to 'z' and 'A' to 'Z' are matched.
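The interaction between IGNORECASE and ASCII can be sketched briefly. The sample strings are illustrative; the umlauts are written as escapes so the source stays ASCII.

```python
import re

# [A-Z] also matches lowercase letters under IGNORECASE.
mixed = re.fullmatch(r"[A-Z]+", "RegEx", re.IGNORECASE)

# Unicode case-insensitive matching is on by default for str patterns:
# U+00DC (capital U with diaeresis) matches U+00FC (the lowercase form).
umlaut = re.fullmatch("\u00dc", "\u00fc", re.IGNORECASE)

# Adding ASCII limits case-insensitive matching to ASCII letters.
umlaut_ascii = re.fullmatch("\u00dc", "\u00fc", re.IGNORECASE | re.ASCII)

# The inline flag (?i) has the same effect as passing re.IGNORECASE.
inline = re.fullmatch(r"(?i)[A-Z]+", "RegEx")

print(bool(mixed), bool(umlaut), bool(umlaut_ascii), bool(inline))
```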

re.L, re.LOCALE

Make \w, \W, \b, \B and case-insensitive matching dependent on the current locale. This flag can be used only with bytes patterns.

Corresponds to the inline flag (?L).

Warning

This flag is discouraged; consider Unicode matching instead. The locale mechanism is very unreliable as it only handles one "culture" at a time and only works with 8-bit locales. Unicode matching is enabled by default for Unicode (str) patterns and it is able to handle different locales and languages.

Changed in version 3.6: LOCALE can be used only with bytes patterns and is not compatible with ASCII.

Changed in version 3.7: Compiled regular expression objects with the LOCALE flag no longer depend on the locale at compile time. Only the locale at matching time affects the result of matching.

re.M, re.MULTILINE

When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline). By default, '^' matches only at the beginning of the string, and '$' only at the end of the string and immediately before the newline (if any) at the end of the string.

Corresponds to the inline flag (?m).

re.NOFLAG

Indicates that no flag is applied; the value is 0. This flag may be used as a default value for a function keyword argument or as a base value that will be conditionally ORed with other flags.
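One way to use a "no flags" default is sketched below. The function find_words is hypothetical, and because re.NOFLAG only exists on newer Python versions (3.11 and later, as far as I know), the sketch falls back to the plain value 0 when it is missing.

```python
import re

# Assumption: fall back to 0, the documented value of NOFLAG, on older Pythons.
NOFLAG = getattr(re, "NOFLAG", 0)

def find_words(text, flags=NOFLAG):
    """Return runs of lowercase letters; callers may OR in extra flags."""
    return re.findall(r"[a-z]+", text, flags)

plain = find_words("Hello World")
ignoring_case = find_words("Hello World", NOFLAG | re.IGNORECASE)

print(plain, ignoring_case)
```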

re.S, re.DOTALL

Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.

Corresponds to the inline flag (?s).

re.U, re.UNICODE

In Python 3, Unicode characters are matched by default for str patterns. This flag is therefore redundant, has no effect, and is only kept for backward compatibility.

See ASCII to restrict matching to ASCII characters instead.

This flag allows you to write regular expressions that look nicer and are more readable by letting you visually separate logical sections of the pattern and add comments. Whitespace within the pattern is ignored, except when in a character class, when preceded by an unescaped backslash, or within tokens like *?, (?: or (?P<...>. For example, (? : and * ? are not allowed. When a line contains a # that is not in a character class and is not preceded by an unescaped backslash, all characters from the leftmost such # through the end of the line are ignored.

This means that the two following regular expression objects that match a decimal number are functionally equal:
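As a minimal sketch of that equivalence, the snippet below compiles a decimal-number pattern once in verbose form and once in compact form; the names `a`, `b`, `samples` and `agree` are illustrative only.

```python
import re

# Verbose form: whitespace and comments inside the pattern are ignored.
a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)

# Compact form of the same pattern.
b = re.compile(r"\d+\.\d*")

samples = ["3.14", "42.", "abc", ".5"]
# Both patterns should accept and reject exactly the same inputs.
agree = all(bool(a.match(s)) == bool(b.match(s)) for s in samples)
print(agree)  # True
```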

Corresponds to the inline flag (?x).

Functions

Compile a regular expression pattern into a regular expression object, which can be used for matching using its match(), search() and other methods, described below.

The expression's behaviour can be modified by specifying a flags value. Values can be any of the flags variables, combined using bitwise OR (the | operator).

The sequence

prog = re.compile(pattern)
result = prog.match(string)

is equivalent to

result = re.match(pattern, string)

but using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.

The compiled versions of the most recent patterns passed to re.compile() and the module-level matching functions are cached, so programs that use only a few regular expressions at a time needn't worry about compiling regular expressions.
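A small sketch of reusing one compiled pattern, with flags combined using bitwise OR; the pattern and the input strings are made up for illustration.

```python
import re

# Compile once with two flags ORed together, then reuse the object.
word_re = re.compile(r"[a-z]+$", re.IGNORECASE | re.MULTILINE)

inputs = ("Hello", "42", "world")
compiled_results = [bool(word_re.match(s)) for s in inputs]

# The module-level function gives the same answers, but looks the
# pattern up (or compiles it) on every call.
direct_results = [
    bool(re.match(r"[a-z]+$", s, re.IGNORECASE | re.MULTILINE)) for s in inputs
]

print(compiled_results)                    # [True, False, True]
print(compiled_results == direct_results)  # True
```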

Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding Match . Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

If zero or more characters at the beginning of string match the regular expression pattern , return a corresponding Match . Return None if the string does not match the pattern; note that this is different from a zero-length match.

Note that even in MULTILINE mode, re.match() will only match at the beginning of the string and not at the beginning of each line.

If you want to locate a match anywhere in string, use search() instead (see also search() vs. match()).

If the whole string matches the regular expression pattern , return a corresponding Match . Return None if the string does not match the pattern; note that this is different from a zero-length match.

Added in version 3.4.

Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern is also returned as part of the resulting list. If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list.

If there are capturing groups in the separator and it matches at the start of the string, the result will start with an empty string. The same holds for the end of the string:

That way, separator components are always found at the same relative indices within the result list.

Empty matches for the pattern split the string only when not adjacent to a previous empty match.
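The behaviours described above can be sketched with a few calls to re.split(); the variable names are illustrative, and the empty-match case assumes Python 3.7 or later.

```python
import re

# Plain split on runs of non-word characters.
plain = re.split(r"\W+", "Words, words, words.")
print(plain)       # ['Words', 'words', 'words', '']

# Capturing parentheses keep the separators in the result.
with_seps = re.split(r"(\W+)", "Words, words, words.")
print(with_seps)   # ['Words', ', ', 'words', ', ', 'words', '.', '']

# maxsplit limits the number of splits; the rest stays in one piece.
limited = re.split(r"\W+", "Words, words, words.", maxsplit=1)
print(limited)     # ['Words', 'words, words.']

# A captured separator at the start and end yields empty strings there.
edges = re.split(r"(\W+)", "...words, words...")
print(edges)       # ['', '...', 'words', ', ', 'words', '...', '']

# Empty matches (here, word boundaries) also split the string.
boundaries = re.split(r"\b", "Words, words, words.")
print(boundaries)  # ['', 'Words', ', ', 'words', ', ', 'words', '.']
```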

Changed in version 3.1: Added the optional flags argument.

Changed in version 3.7: Added support of splitting on a pattern that could match an empty string.

Deprecated since version 3.13: Passing maxsplit and flags as positional arguments is deprecated. In future Python versions they will be keyword-only parameters.

Return all non-overlapping matches of pattern in string, as a list of strings or tuples. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result.

The result depends on the number of capturing groups in the pattern. If there are no groups, return a list of strings matching the whole pattern. If there is exactly one group, return a list of strings matching that group. If multiple groups are present, return a list of tuples of strings matching the groups.
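To illustrate how the number of groups changes the shape of the result, here is a short sketch with made-up input strings:

```python
import re

# No groups: a list of whole matches.
no_groups = re.findall(r"\bf[a-z]*", "which foot or hand fell fastest")
print(no_groups)   # ['foot', 'fell', 'fastest']

# Exactly one group: a list of that group's text.
one_group = re.findall(r"(\w+)=\d+", "set width=20 and height=10")
print(one_group)   # ['width', 'height']

# Several groups: a list of tuples, one tuple per match.
two_groups = re.findall(r"(\w+)=(\d+)", "set width=20 and height=10")
print(two_groups)  # [('width', '20'), ('height', '10')]
```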

Changed in version 3.7: Non-empty matches can now start just after a previous empty match.

Return an iterator yielding Match objects over all non-overlapping matches for the RE pattern in string . The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result.

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn't found, string is returned unchanged. repl can be a string or a function; if it is a string, any backslash escapes in it are processed. That is, \n is converted to a single newline character, \r is converted to a carriage return, and so forth. Unknown escapes of ASCII letters are reserved for future use and treated as errors. Other unknown escapes such as \& are left alone. Backreferences, such as \6, are replaced with the substring matched by group 6 in the pattern. For example:
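A sketch of a string replacement that uses both an escape (\n) and a backreference (\1); the input line is only an illustration.

```python
import re

# \1 in the replacement is filled with the function name captured by group 1,
# and \n in the replacement becomes a real newline.
result = re.sub(
    r"def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):",
    r"static PyObject*\npy_\1(void)\n{",
    "def myfunc():",
)
print(result)
# static PyObject*
# py_myfunc(void)
# {
```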

If repl is a function, it is called for every non-overlapping occurrence of pattern . The function takes a single Match argument, and returns the replacement string. For example:
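A minimal sketch of a function used as repl; `dashrepl` is just an illustrative name.

```python
import re

def dashrepl(matchobj):
    # A single dash becomes a space; a double dash becomes a single dash.
    if matchobj.group(0) == "-":
        return " "
    return "-"

dashed = re.sub("-{1,2}", dashrepl, "pro----gram-files")
print(dashed)  # pro--gram files

# A plain string replacement combined with a flag, for comparison.
ampersand = re.sub(r"\sAND\s", " & ", "Baked Beans And Spam", flags=re.IGNORECASE)
print(ampersand)  # Baked Beans & Spam
```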

The pattern may be a string or a Pattern .

The optional argument count is the maximum number of pattern occurrences to be replaced; count must be a non-negative integer. If omitted or zero, all occurrences will be replaced. Empty matches for the pattern are replaced only when not adjacent to a previous empty match, so sub('x*', '-', 'abxd') returns '-a-b--d-'.

In string-type repl arguments, in addition to the character escapes and backreferences described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>...) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn't ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character '0'. The backreference \g<0> substitutes in the entire substring matched by the RE.
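The replacement syntax above can be sketched as follows; the names and sample strings are made up for illustration.

```python
import re

# \g<name> refers to a named group.
swapped = re.sub(r"(?P<first>\w+) (?P<last>\w+)", r"\g<last>, \g<first>", "Ada Lovelace")
print(swapped)   # Lovelace, Ada

# \g<1>0 is "group 1 followed by a literal 0"; \10 would mean group 10.
padded = re.sub(r"(\d)", r"\g<1>0", "a1b2")
print(padded)    # a10b20

# \g<0> stands for the whole match.
wrapped = re.sub(r"\w+", r"<\g<0>>", "hi there")
print(wrapped)   # <hi> <there>

# count limits how many occurrences are replaced.
first_only = re.sub(r"\d", "#", "a1b2c3", count=1)
print(first_only)  # a#b2c3

# Empty matches are replaced too, unless adjacent to a previous empty match.
dashes = re.sub("x*", "-", "abxd")
print(dashes)    # -a-b--d-
```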

Changed in version 3.5: Unmatched groups are replaced with an empty string.

Changed in version 3.6: Unknown escapes in pattern consisting of '\' and an ASCII letter now are errors.

Changed in version 3.7: Unknown escapes in repl consisting of '\' and an ASCII letter now are errors. Empty matches for the pattern are replaced when adjacent to a previous non-empty match.

Changed in version 3.12: Group id can only contain ASCII digits. In bytes replacement strings, group name can only contain bytes in the ASCII range ( b'\x00' - b'\x7f' ).

Deprecated since version 3.13: Passing count and flags as positional arguments is deprecated. In future Python versions they will be keyword-only parameters.

Perform the same operation as sub(), but return a tuple (new_string, number_of_subs_made).

Escape special characters in pattern. This is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it. For example:
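A short sketch of re.escape() in use; the `price` string is a made-up literal that happens to contain metacharacters.

```python
import re

escaped_url = re.escape("https://www.python.org")
print(escaped_url)  # https://www\.python\.org

# A literal containing '$', '.', '(' and ')' only matches verbatim once escaped.
price = "$9.99 (sale)"
text = "Now only $9.99 (sale) today"
found_escaped = bool(re.search(re.escape(price), text))
found_raw = bool(re.search(price, text))
print(found_escaped)  # True
print(found_raw)      # False
```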

This function must not be used for the replacement string in sub() and subn(); only backslashes should be escaped. For example:
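A sketch of escaping only the backslashes in a replacement string, so that the text \d+ is inserted literally; the sample line is illustrative.

```python
import re

digits_re = r"\d+"
sample = "/usr/sbin/sendmail - 0 errors, 12 warnings"

# Double each backslash so the replacement inserts the two characters "\d"
# instead of being parsed as an (invalid) escape.
replaced = re.sub(digits_re, digits_re.replace("\\", r"\\"), sample)
print(replaced)  # /usr/sbin/sendmail - \d+ errors, \d+ warnings
```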

Changed in version 3.3: The '_' character is no longer escaped.

Changed in version 3.7: Only characters that can have special meaning in a regular expression are escaped. As a result, '!', '"', '%', "'", ',', '/', ':', ';', '<', '=', '>', '@', and "`" are no longer escaped.

Clear the regular expression cache.

Exceptions

Exception raised when a string passed to one of the functions here is not a valid regular expression (for example, it might contain unmatched parentheses) or when some other error occurs during compilation or matching. It is never an error if a string contains no match for a pattern. The PatternError instance has the following additional attributes:

The unformatted error message.

The regular expression pattern.

The index in pattern where compilation failed (may be None).

The line corresponding to pos (may be None).

The column corresponding to pos (may be None).
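A minimal sketch of inspecting these attributes, assuming they are exposed as msg, pattern, pos, lineno and colno. It catches re.error, which works on older versions and remains an alias of PatternError on 3.13 and later.

```python
import re

bad_pattern = "(abc"  # unbalanced parenthesis, chosen to trigger an error

try:
    re.compile(bad_pattern)
    details = None
except re.error as err:
    # Collect the extra attributes carried by the exception.
    details = {
        "msg": err.msg,
        "pattern": err.pattern,
        "pos": err.pos,
        "lineno": err.lineno,
        "colno": err.colno,
    }

print(details)
```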

Changed in version 3.5: Added additional attributes.

Changed in version 3.13: PatternError was originally named error; the latter is kept as an alias for backward compatibility.

Regular Expression Objects

Compiled regular expression object returned by re.compile() .

Changed in version 3.9: re.Pattern supports [] to indicate a Unicode (str) or bytes pattern. See Generic Alias Type.

Scan through string looking for the first location where this regular expression produces a match, and return a corresponding Match . Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

The optional second parameter pos gives an index in the string where the search is to start; it defaults to 0. This is not completely equivalent to slicing the string; the '^' pattern character matches at the real beginning of the string and at positions just after a newline, but not necessarily at the index where the search is to start.

The optional parameter endpos limits how far the string will be searched; it will be as if the string is endpos characters long, so only the characters from pos to endpos - 1 will be searched for a match. If endpos is less than pos, no match will be found; otherwise, if rx is a compiled regular expression object, rx.search(string, 0, 50) is equivalent to rx.search(string[:50], 0).
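The difference between pos/endpos and slicing can be sketched like this; the patterns and the string "dog" are illustrative.

```python
import re

pattern = re.compile("d")
at_start = pattern.search("dog")       # matches at index 0
skipped = pattern.search("dog", 1)     # None: the search starts after the "d"

# '^' still refers to the real start of the string, not to pos.
anchored = re.compile("^og")
with_pos = anchored.search("dog", 1)     # None
with_slice = anchored.search("dog"[1:])  # matches, because the slice starts at "o"

# endpos behaves as if the string were only endpos characters long.
rx = re.compile("g")
cut_off = rx.search("dog", 0, 2)       # None: "g" is at index 2, outside [0, 2)

print(at_start.span(), skipped, with_pos, with_slice.group(), cut_off)
```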

If zero or more characters at the beginning of string match this regular expression, return a corresponding Match . Return None if the string does not match the pattern; note that this is different from a zero-length match.

The optional pos and endpos parameters have the same meaning as for the search() method.

If you want to locate a match anywhere in string, use search() instead (see also search() vs. match()).

If the whole string matches this regular expression, return a corresponding Match . Return None if the string does not match the pattern; note that this is different from a zero-length match.

Identical to the split() function, using the compiled pattern.

Similar to the findall() function, using the compiled pattern, but also accepts optional pos and endpos parameters that limit the search region like for search().

Similar to the finditer() function, using the compiled pattern, but also accepts optional pos and endpos parameters that limit the search region like for search().

Identical to the sub() function, using the compiled pattern.

Identical to the subn() function, using the compiled pattern.

The regex matching flags. This is a combination of the flags given to compile() , any (?...) inline flags in the pattern, and implicit flags such as UNICODE if the pattern is a Unicode string.

The number of capturing groups in the pattern.

A dictionary mapping any symbolic group names defined by (?P<id>) to group numbers. The dictionary is empty if no symbolic groups were used in the pattern.

The pattern string from which the pattern object was compiled.

Changed in version 3.7: Added support of copy.copy() and copy.deepcopy(). Compiled regular expression objects are considered atomic.

Match Objects

Match objects always have a boolean value of True. Since match() and search() return None when there is no match, you can test whether there was a match with a simple if statement:
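A minimal sketch of that test; `describe` is a hypothetical helper used only for illustration.

```python
import re

def describe(text):
    match = re.search(r"\d+", text)
    if match:  # a Match object is always truthy; None is falsy
        return "found " + match.group()
    return "no digits"

print(describe("order 66"))   # found 66
print(describe("no orders"))  # no digits
```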

Match object returned by successful matches and searches.

Changed in version 3.9: re.Match supports [] to indicate a Unicode (str) or bytes match. See Generic Alias Type.

Return the string obtained by doing backslash substitution on the template string template , as done by the sub() method. Escapes such as \n are converted to the appropriate characters, and numeric backreferences ( \1 , \2 ) and named backreferences ( \g<1> , \g<name> ) are replaced by the contents of the corresponding group. The backreference \g<0> will be replaced by the entire match.
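A short sketch of template expansion on a match object; the group names and the input are illustrative.

```python
import re

m = re.match(r"(?P<first>\w+) (?P<last>\w+)", "Isaac Newton")

# Named and numeric backreferences can be mixed in one template.
swapped = m.expand(r"\g<last>, \1")
print(swapped)  # Newton, Isaac

# \g<0> is the entire match; \n becomes a real newline.
whole = m.expand(r"[\g<0>]\n")
print(repr(whole))  # '[Isaac Newton]\n'
```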

Returns one or more subgroups of the match. If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument. Without arguments, group1 defaults to zero (the whole match is returned). If a groupN argument is zero, the corresponding return value is the entire matching string; if it is in the inclusive range [1..99], it is the string matching the corresponding parenthesized group. If a group number is negative or larger than the number of groups defined in the pattern, an IndexError exception is raised. If a group is contained in a part of the pattern that did not match, the corresponding result is None. If a group is contained in a part of the pattern that matched multiple times, the last match is returned.

If the regular expression uses the (?P<name>...) syntax, the groupN arguments may also be strings identifying groups by their group name. If a string argument is not used as a group name in the pattern, an IndexError exception is raised.

A moderately complicated example:
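A sketch covering numbered and named access through group(); the sample names are illustrative.

```python
import re

m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
print(m.group(0))     # Isaac Newton  (the entire match)
print(m.group(1))     # Isaac
print(m.group(2))     # Newton
print(m.group(1, 2))  # ('Isaac', 'Newton')

# With (?P<name>...) the groups can also be requested by name.
named = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
print(named.group("first_name"))  # Malcolm
print(named.group("last_name"))   # Reynolds
```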

Named groups can also be referred to by their index:

If a group matches multiple times, only the last match is accessible:
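Both points can be sketched briefly; the inputs are illustrative.

```python
import re

# Named groups are still numbered, so they can be fetched by index.
named = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
print(named.group(1))  # Malcolm
print(named.group(2))  # Reynolds

# The group repeats three times ("a1", "b2", "c3"); only the last is kept.
repeated = re.match(r"(..)+", "a1b2c3")
print(repeated.group(1))  # c3
```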

This is identical to m.group(g). This allows easier access to an individual group from a match:

Named groups are supported as well:
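A sketch of the subscript form, assuming Python 3.6 or later where match objects support indexing:

```python
import re

m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
print(m[0])  # Isaac Newton  (the entire match)
print(m[1])  # Isaac
print(m[2])  # Newton

# Named groups work with the same syntax.
named = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Isaac Newton")
print(named["first_name"])  # Isaac
print(named["last_name"])   # Newton
```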

Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern. The default argument is used for groups that did not participate in the match; it defaults to None.

For example:

If we make the decimal place and everything after it optional, not all groups might participate in the match. These groups will default to None unless the default argument is given:
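Both cases can be sketched with a decimal-number pattern; the inputs are illustrative.

```python
import re

# Every group participates in the match.
full = re.match(r"(\d+)\.(\d+)", "24.1632")
print(full.groups())        # ('24', '1632')

# The second group is optional and does not participate here.
partial = re.match(r"(\d+)\.?(\d+)?", "24")
print(partial.groups())     # ('24', None)
print(partial.groups("0"))  # ('24', '0')  (default replaces None)
```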

Return a dictionary containing all the named subgroups of the match, keyed by the subgroup name. The default argument is used for groups that did not participate in the match; it defaults to None. For example:
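A sketch of groupdict(), including the default argument; the group names `whole` and `frac` are made up for illustration.

```python
import re

m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
print(m.groupdict())
# {'first_name': 'Malcolm', 'last_name': 'Reynolds'}

# An optional named group that does not participate maps to None by default.
number = re.match(r"(?P<whole>\d+)(?:\.(?P<frac>\d+))?", "24")
print(number.groupdict())             # {'whole': '24', 'frac': None}
print(number.groupdict(default="0"))  # {'whole': '24', 'frac': '0'}
```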

Return the indices of the start and end of the substring matched by group; group defaults to zero (meaning the whole matched substring). Return -1 if group exists but did not contribute to the match. For a match object m, and a group g that did contribute to the match, the substring matched by group g (equivalent to m.group(g)) is m.string[m.start(g):m.end(g)].

Note that m.start(group) will equal m.end(group) if group matched a null string. For example, after m = re.search('b(c?)', 'cba'), m.start(0) is 1, m.end(0) is 2, m.start(1) and m.end(1) are both 2, and m.start(2) raises an IndexError exception.

An example that will remove remove_this from email addresses:
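A sketch of start() and end() in use; the address is a made-up sample.

```python
import re

email = "tony@tiremove_thisger.net"
m = re.search("remove_this", email)

# Keep everything before the match and everything after it.
cleaned = email[:m.start()] + email[m.end():]
print(cleaned)  # tony@tiger.net

# A group that matched the empty string has start == end.
empty = re.search("b(c?)", "cba")
print(empty.start(0), empty.end(0))  # 1 2
print(empty.start(1), empty.end(1))  # 2 2
```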

For a match m, return the 2-tuple (m.start(group), m.end(group)). Note that if group did not contribute to the match, this is (-1, -1). group defaults to zero, the entire match.

The value of pos which was passed to the search() or match() method of a regex object. This is the index into the string at which the RE engine started looking for a match.

The value of endpos which was passed to the search() or match() method of a regex object. This is the index into the string beyond which the RE engine will not go.

The integer index of the last matched capturing group, or None if no group was matched at all. For example, the expressions (a)b, ((a)(b)), and ((ab)) will have lastindex == 1 if applied to the string 'ab', while the expression (a)(b) will have lastindex == 2, if applied to the same string.

The name of the last matched capturing group, or None if the group didn't have a name, or if no group was matched at all.

The regular expression object whose match() or search() method produced this match instance.

The string passed to match() or search().
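The attributes described above can be inspected on a single match, as in this sketch; it assumes they are exposed as span(), pos, endpos, lastindex, lastgroup, re and string, and the pattern and text are illustrative.

```python
import re

rx = re.compile(r"(?P<key>\w+)=(?P<value>\d+)")
text = "width=20 height=10"
m = rx.search(text, 6)  # start looking at index 6, past "width="

print(m.span())          # (9, 18)  -> "height=10"
print(m.span("value"))   # (16, 18) -> "10"
print(m.pos, m.endpos)   # 6 18
print(m.lastindex, m.lastgroup)  # 2 value
print(m.re is rx, m.string is text)  # True True

# A group that did not contribute to the match has span (-1, -1).
alt = re.match(r"(a)|(b)", "b")
print(alt.span(1))       # (-1, -1)

# lastindex is the last group to finish matching, not the deepest one.
print(re.match(r"((a)(b))", "ab").lastindex)  # 1
print(re.match(r"(a)(b)", "ab").lastindex)    # 2
```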

Changed in version 3.7: Added support of copy.copy() and copy.deepcopy(). Match objects are considered atomic.

Regular Expression Examples

Checking for a Pair

In this example, we'll use the following helper function to display match objects a little more gracefully:
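One possible shape for such a helper is sketched below; the name `displaymatch` and its output format are assumptions for illustration.

```python
import re

def displaymatch(match):
    # Show the matched text and its groups, or None when nothing matched.
    if match is None:
        return None
    return "<Match: %r, groups=%r>" % (match.group(), match.groups())

shown = displaymatch(re.match(r"(\w+)@(\w+)", "user@example"))
print(shown)  # <Match: 'user@example', groups=('user', 'example')>
print(displaymatch(re.match(r"\d+", "abc")))  # None
```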

Suppose you are writing a poker program where a player's hand is represented as a 5-character string with each character representing a card, "a" for ace, "k" for king, "q" for queen, "j" for jack, "t" for 10, and "2" through "9" representing the card with that value.

To see if a given string is a valid hand, one could do the following:
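A sketch of such a check, using a character class and an exact repeat count; the sample hands are illustrative.

```python
import re

# Exactly five characters, each one a legal card symbol.
valid = re.compile(r"^[a2-9tjqk]{5}$")

hands = ["akt5q", "akt5e", "akt", "727ak"]
results = {hand: bool(valid.match(hand)) for hand in hands}
print(results)
# {'akt5q': True, 'akt5e': False, 'akt': False, '727ak': True}
```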

That last hand, "727ak", contained a pair, or two of the same valued cards. To match this with a regular expression, one could use backreferences as such:

To find out what card the pair consists of, one could use the group() method of the match object in the following manner:
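A sketch of the backreference approach, including reading the paired card back with group(); the sample hands are illustrative.

```python
import re

# (.) captures one card; \1 requires the same card to appear again later.
pair = re.compile(r".*(.).*\1")

with_sevens = pair.match("717ak")
no_pair = pair.match("718ak")
with_aces = pair.match("354aa")

print(with_sevens.group(1))  # 7
print(no_pair)               # None
print(with_aces.group(1))    # a

# Calling group() on a failed match would raise AttributeError,
# because match() returned None, so test the result first.
card = no_pair.group(1) if no_pair else None
print(card)  # None
```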

Simulating scanf()

Python does not currently have an equivalent to scanf() . Regular expressions are generally more powerful, though also more verbose, than scanf() format strings. The table below offers some more-or-less equivalent mappings between scanf() format tokens and regular expressions.

To extract the filename and numbers from a string like

/usr/sbin/sendmail - 0 errors, 4 warnings

you would use a scanf() format like

%s - %d errors, %d warnings

The equivalent regular expression would be

(\S+) - (\d+) errors, (\d+) warnings
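A sketch of that extraction in Python, converting the captured digits to integers by hand since groups are always strings; the variable names are illustrative.

```python
import re

line = "/usr/sbin/sendmail - 0 errors, 4 warnings"

# \S+ plays the role of %s and \d+ the role of %d.
m = re.match(r"(\S+) - (\d+) errors, (\d+) warnings", line)

filename = m.group(1)
n_errors = int(m.group(2))
n_warnings = int(m.group(3))
print(filename, n_errors, n_warnings)  # /usr/sbin/sendmail 0 4
```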

search() vs. match()

Python offers different primitive operations based on regular expressions:

re.match() checks for a match only at the beginning of the string

re.search() checks for a match anywhere in the string (this is what Perl does by default)

re.fullmatch() checks for the entire string to be a match.

Regular expressions beginning with '^' can be used with search() to restrict the match at the beginning of the string:

Note however that in MULTILINE mode match() only matches at the beginning of the string, whereas using search() with a regular expression beginning with '^' will match at the beginning of each line.
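The three operations and the MULTILINE caveat can be compared side by side in this sketch; the strings are illustrative.

```python
import re

# match() only looks at the start; search() scans the whole string.
print(re.match("c", "abcdef"))                   # None
print(re.search("c", "abcdef").span())           # (2, 3)

# fullmatch() requires the whole string to match.
print(re.fullmatch("p.*n", "python").group())    # python
print(re.fullmatch("r.*n", "python"))            # None

# '^' restricts search() to the start of the string.
print(re.search("^c", "abcdef"))                 # None
print(re.search("^a", "abcdef").span())          # (0, 1)

# In MULTILINE mode '^' also matches after each newline,
# but match() is still tied to the start of the string.
print(re.match("X", "A\nB\nX", re.MULTILINE))            # None
print(re.search("^X", "A\nB\nX", re.MULTILINE).span())   # (4, 5)
```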

Making a Phonebook

split() splits a string into a list delimited by the passed pattern. The method is invaluable for converting textual data into data structures that can be easily read and modified by Python, as demonstrated in the following example that creates a phonebook.

First, here is the input. Normally it may come from a file; here we are using triple-quoted string syntax

The entries are separated by one or more newlines. Now we convert the string into a list with each nonempty line having its own entry:

Finally, split each entry into a list with first name, last name, telephone number, and address. We use the maxsplit parameter of split() because the address has spaces, our splitting pattern, in it:

The :? pattern matches the colon after the last name, so that it does not occur in the result list. With a maxsplit of 4, we could separate the house number from the street name:
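The whole phonebook walk-through can be sketched as follows; the entries are sample data for illustration, and maxsplit is passed by keyword because the positional form is deprecated.

```python
import re

# Sample input; in practice this would usually be read from a file.
text = """Ross McFluff: 834.345.1254 155 Elm Street

Ronald Heathmore: 892.345.3428 436 Finley Avenue


Heather Albrecht: 548.326.4584 919 Park Place"""

# One entry per nonempty line: split on one or more newlines.
entries = re.split(r"\n+", text)

# maxsplit=3 keeps the address, which contains spaces, in one piece.
book = [re.split(r":? ", entry, maxsplit=3) for entry in entries]
print(book[0])  # ['Ross', 'McFluff', '834.345.1254', '155 Elm Street']

# maxsplit=4 also separates the house number from the street name.
detailed = [re.split(r":? ", entry, maxsplit=4) for entry in entries]
print(detailed[0])  # ['Ross', 'McFluff', '834.345.1254', '155', 'Elm Street']
```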

Text Munging

sub() replaces every occurrence of a pattern with a string or the result of a function. This example demonstrates using sub() with a function to "munge" text, or randomize the order of all the characters in each word of a sentence except for the first and last characters:
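A sketch of such a munging function; `repl` and the sample sentence are illustrative, and the output varies from run to run because the shuffle is random.

```python
import random
import re

def repl(m):
    # Group 1 is the first letter, group 3 the last; shuffle what lies between.
    inner_word = list(m.group(2))
    random.shuffle(inner_word)
    return m.group(1) + "".join(inner_word) + m.group(3)

text = "Professor Abdolmalek, please report your absences promptly."
munged = re.sub(r"(\w)(\w+)(\w)", repl, text)
print(munged)
```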

Finding all Adverbs

findall() matches all occurrences of a pattern, not just the first one as search() does. For example, if a writer wanted to find all of the adverbs in some text, they might use findall() in the following manner:
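A sketch that treats any word ending in "ly" as an adverb, which is a deliberate simplification; the sentence is illustrative.

```python
import re

text = "He was carefully disguised but captured quickly by police."

# \w+ly\b: a run of word characters ending in "ly" at a word boundary.
adverbs = re.findall(r"\w+ly\b", text)
print(adverbs)  # ['carefully', 'quickly']
```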

Finding all Adverbs and their Positions

If one wants more information about all matches of a pattern than the matched text, finditer() is useful as it provides Match objects instead of strings. Continuing with the previous example, if a writer wanted to find all of the adverbs and their positions in some text, they would use finditer() in the following manner:
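A sketch of the same search with finditer(), collecting the position of each hit; the sentence and the output format are illustrative.

```python
import re

text = "He was carefully disguised but captured quickly by police."

# Each Match object knows where it was found as well as what it matched.
found = [(m.start(), m.end(), m.group(0)) for m in re.finditer(r"\w+ly\b", text)]

for start, end, word in found:
    print("%02d-%02d: %s" % (start, end, word))
# 07-16: carefully
# 40-47: quickly
```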

Raw String Notation

Raw string notation (r"text") keeps regular expressions sane. Without it, every backslash ('\') in a regular expression would have to be prefixed with another one to escape it. For example, the two following lines of code are functionally identical:
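A sketch comparing the two spellings of one pattern; the input " ff " is illustrative.

```python
import re

# Raw string: backslashes reach the regex engine untouched.
raw = re.match(r"\W(.)\1\W", " ff ")

# Ordinary string: every backslash has to be doubled.
doubled = re.match("\\W(.)\\1\\W", " ff ")

print(raw.group(), raw.group(1))          # ' ff ' and 'f'
print(raw.group() == doubled.group())     # True
print(r"\W(.)\1\W" == "\\W(.)\\1\\W")     # True: the same pattern text
```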

When one wants to match a literal backslash, it must be escaped in the regular expression. With raw string notation, this means r"\\". Without raw string notation, one must use "\\\\", making the following lines of code functionally identical:
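A sketch of matching one literal backslash both ways; the subject string is two backslashes, of which only the first is consumed.

```python
import re

subject = r"\\"  # a two-character string: two backslashes

raw = re.match(r"\\", subject)         # pattern text is \\ (one escaped backslash)
doubled = re.match("\\\\", subject)    # the same pattern text, written without r""

print(len(raw.group()))                # 1
print(raw.group() == doubled.group())  # True
print(r"\\" == "\\\\")                 # True: identical pattern strings
```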

Writing a Tokenizer

A tokenizer or scanner analyzes a string to categorize groups of characters. This is a useful first step in writing a compiler or interpreter.

The text categories are specified with regular expressions. The technique is to combine those into a single master regular expression and to loop over successive matches:

The tokenizer produces the following output:
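A reduced sketch of the technique follows, with the tokens it yields printed at the end. The token categories, the `Token` type and the sample statement are all hypothetical choices for illustration, not a fixed grammar.

```python
import re
from typing import Iterator, NamedTuple

class Token(NamedTuple):
    kind: str
    value: str
    column: int

# (name, pattern) pairs; order matters because the first alternative wins.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d*)?"),   # integer or decimal number
    ("ASSIGN", r":="),              # assignment operator
    ("ID", r"[A-Za-z_]\w*"),        # identifiers
    ("OP", r"[+\-*/]"),             # arithmetic operators
    ("SKIP", r"[ \t]+"),            # spaces and tabs, discarded
    ("MISMATCH", r"."),             # any other single character
]

# One named group per category, joined into a single master pattern.
MASTER_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))

def tokenize(code: str) -> Iterator[Token]:
    for mo in MASTER_RE.finditer(code):
        kind = mo.lastgroup  # name of the alternative that matched
        value = mo.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError("unexpected %r at column %d" % (value, mo.start()))
        yield Token(kind, value, mo.start())

tokens = list(tokenize("total := total + price * 2.5"))
for token in tokens:
    print(token)
# Token(kind='ID', value='total', column=0)
# Token(kind='ASSIGN', value=':=', column=6)
# Token(kind='ID', value='total', column=9)
# Token(kind='OP', value='+', column=15)
# Token(kind='ID', value='price', column=17)
# Token(kind='OP', value='*', column=23)
# Token(kind='NUMBER', value='2.5', column=25)
```

Putting the catch-all MISMATCH pattern last means it only fires when no real category matches, which turns stray characters into an explicit error instead of silently skipping them.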

Friedl, Jeffrey. Mastering Regular Expressions. 3rd ed., O'Reilly Media, 2009. The third edition of the book no longer covers Python at all, but the first edition covered writing good regular expression patterns in great detail.
