Understand How to Use get_text() in BeautifulSoup

get_text() is a BeautifulSoup method that returns all the strings inside an element, including the text of its children, concatenated using the given separator. It is also available under the older camelCase alias getText(). In this tutorial, we will learn how to use get_text() with examples, and we'll also look at the difference between get_text() and the .string property.

Let's get started.

get_text() Syntax

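get_text(separator="", strip=False)

separator: the string inserted between the collected text pieces. The default, "", joins them with nothing in between.
strip: if True, removes whitespace from the beginning and end of each text piece and skips pieces that end up empty. The default is False.

Both arguments are optional.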
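Because both arguments are optional, calling get_text() with no arguments should behave the same as passing the defaults explicitly. A minimal check, assuming a small hypothetical <div> with no whitespace between its tags:

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a <div> with three children and no whitespace between tags
html_source = "<div><p>child 1</p><p>child 2</p><p>child 3</p></div>"

soup = BeautifulSoup(html_source, "html.parser")
el = soup.find("div")

# No arguments vs. the default values spelled out
default_call = el.get_text()
explicit_call = el.get_text(separator="", strip=False)

print(default_call == explicit_call)  # True
print(default_call)  # child 1child 2child 3
```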

How to use get_text()

Let's see an example to understand how to use the get_text() method. In the following example, we'll get all the child text of the <div>.

```python
from bs4 import BeautifulSoup

html_source = '''
<div>
  <p>child 1</p>
  <p>child 2</p>
  <p>child 3</p>
</div>
'''

soup = BeautifulSoup(html_source, 'html.parser')  # 👉️ Parsing
el = soup.find("div")  # 👉️ Find the <div> tag
g_txt = el.get_text()  # 👉️ Get the text of the <div>
print(g_txt)  # 👉️ Print output
```

As you can see in the code, we've used get_text() with no arguments.
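get_text() collects every string inside the element in document order and joins them with the separator. As a sketch of that behavior (the sample HTML below is hypothetical), joining the element's .strings generator by hand with the default separator "" should give the same result:

```python
from bs4 import BeautifulSoup

# Hypothetical sample with whitespace (line breaks, indentation) between tags
html_source = """
<div>
  <p>child 1</p>
  <p>child 2</p>
</div>
"""

soup = BeautifulSoup(html_source, "html.parser")
el = soup.find("div")

# .strings yields every text node inside the <div>, including the
# whitespace-only ones that sit between the tags
pieces = list(el.strings)
print(pieces)

# With no arguments, get_text() equals joining those pieces with ""
print(el.get_text() == "".join(pieces))  # True
```

The whitespace-only pieces are where the extra newlines in the no-argument output come from.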

If you want to remove the newlines (\n) from the output, set strip=True, as in the example below.

```python
from bs4 import BeautifulSoup

html_source = '''
<div>
  <p>child 1</p>
  <p>child 2</p>
  <p>child 3</p>
</div>
'''

soup = BeautifulSoup(html_source, 'html.parser')  # 👉️ Parsing
el = soup.find("div")  # 👉️ Find the <div> tag
g_txt = el.get_text(strip=True)  # 👉️ Get the text of the <div> and remove the newlines
print(g_txt)  # 👉️ Print output
```
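Setting strip=True trims each text piece and drops the ones that become empty. BeautifulSoup exposes the same cleaned pieces through the .stripped_strings generator, so the two results below should match. The sample HTML is hypothetical, with padding added inside the first <p> to show the trimming:

```python
from bs4 import BeautifulSoup

# Hypothetical sample: extra spaces inside the first <p>, whitespace between tags
html_source = """
<div>
  <p>   child 1   </p>
  <p>child 2</p>
</div>
"""

soup = BeautifulSoup(html_source, "html.parser")
el = soup.find("div")

# strip=True: each piece is trimmed, whitespace-only pieces are skipped
stripped = el.get_text(strip=True)

print(stripped == "".join(el.stripped_strings))  # True
print(stripped)  # child 1child 2
```

With the default separator the cleaned pieces run together (child 1child 2), which is why a separator is usually set as well.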

To add a space between the strings, set the separator parameter, as in the example below.

```python
from bs4 import BeautifulSoup

html_source = '''
<div>
  <p>child 1</p>
  <p>child 2</p>
  <p>child 3</p>
</div>
'''

soup = BeautifulSoup(html_source, 'html.parser')  # 👉️ Parsing
el = soup.find("div")  # 👉️ Find the <div> tag
g_txt = el.get_text(strip=True, separator=" ")  # 👉️ Set the separator and strip
print(g_txt)  # 👉️ Print output
```
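The separator is applied to every collected piece, including the whitespace-only strings between tags, so it is usually combined with strip=True. A small comparison, using a hypothetical two-child <div>:

```python
from bs4 import BeautifulSoup

# Hypothetical sample with line breaks and indentation between the tags
html_source = """
<div>
  <p>child 1</p>
  <p>child 2</p>
</div>
"""

soup = BeautifulSoup(html_source, "html.parser")
el = soup.find("div")

# separator alone: the whitespace-only strings between tags are still joined in
no_strip = el.get_text(separator=" ")

# separator + strip=True: whitespace-only strings are dropped first
with_strip = el.get_text(separator=" ", strip=True)

print(repr(no_strip))    # still contains line breaks from the HTML
print(repr(with_strip))  # 'child 1 child 2'
```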

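Now, we'll use \n as the separator, so each string goes on its own line, and strip the surrounding whitespace.

```python
from bs4 import BeautifulSoup

html_source = '''
<div>
  <p>child 1</p>
  <p>child 2</p>
  <p>child 3</p>
</div>
'''

soup = BeautifulSoup(html_source, 'html.parser')  # 👉️ Parsing
el = soup.find("div")  # 👉️ Find the <div> tag
g_txt = el.get_text(strip=True, separator="\n")  # 👉️ Set the separator and strip
print(g_txt)  # 👉️ Print output
```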
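A newline separator is handy when you want the child strings as a Python list: calling splitlines() on the result gives one item per string. A sketch using the same three-child sample:

```python
from bs4 import BeautifulSoup

html_source = '''
<div>
  <p>child 1</p>
  <p>child 2</p>
  <p>child 3</p>
</div>
'''

soup = BeautifulSoup(html_source, "html.parser")
el = soup.find("div")

# One child string per line, with whitespace-only strings removed
text = el.get_text(separator="\n", strip=True)

# splitlines() turns that into a list of strings
lines = text.splitlines()
print(lines)  # ['child 1', 'child 2', 'child 3']
```

If a child's own text contains a line break, splitlines() will split it too; list(el.stripped_strings) avoids that by never joining the pieces in the first place.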

The difference between get_text() and .string

Let's see some examples to figure out the difference between the get_text() method and the .string property.

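Example 1:

```python
from bs4 import BeautifulSoup

html_source = '''
<div>
  <p>child 1</p>
  <p>child 2</p>
  <p>child 3</p>
</div>
'''

soup = BeautifulSoup(html_source, 'html.parser')  # 👉️ Parsing
el = soup.find("div")  # 👉️ Find the <div> tag
print(el.get_text())  # 👉️ Get the content of the <div> using get_text()
print(el.string)  # 👉️ Get the content of the <div> using .string
```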

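Output of get_text(): child 1, child 2 and child 3, each on its own line (the line breaks come from the whitespace between the tags).

Output of .string: None

As you can see, get_text() returns the text of the div's children, while the .string property returns None. That is because .string only returns text when an element has exactly one child: either a single string, or a single tag that itself has a .string. The div here has several children (three tags plus the whitespace between them), so .string returns None.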
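The rule behind .string is easier to see on three tiny hypothetical snippets: one text child, one child tag, and more than one child:

```python
from bs4 import BeautifulSoup

# Three hypothetical snippets, each parsed on its own
one_string = BeautifulSoup("<p>child 1</p>", "html.parser").p
one_child_tag = BeautifulSoup("<div><p>child 1</p></div>", "html.parser").div
many_children = BeautifulSoup("<div><p>child 1</p><p>child 2</p></div>", "html.parser").div

print(one_string.string)         # child 1  (exactly one text child)
print(one_child_tag.string)      # child 1  (single child tag, so .string looks inside it)
print(many_children.string)      # None     (more than one child)
print(many_children.get_text())  # child 1child 2
```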

Example 2:

```python
from bs4 import BeautifulSoup

html_source = '''
<div></div>
'''

soup = BeautifulSoup(html_source, 'html.parser')  # 👉️ Parsing
el = soup.find("div")  # 👉️ Find the <div> tag
print(el.get_text())  # 👉️ Get the content of the empty <div> using get_text()
print(el.string)  # 👉️ Get the content of the empty <div> using .string
```

Output of get_text(): an empty string ('')

Output of .string: None

In other words, for an empty element, get_text() returns an empty string, while .string returns None.
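This difference matters when you chain string methods on the result: get_text() always returns a str, but .string can be None, so calling .strip() on it directly raises AttributeError. A sketch of guarding against that, using the same hypothetical empty <div>:

```python
from bs4 import BeautifulSoup

# Hypothetical empty element, as in Example 2
el = BeautifulSoup("<div></div>", "html.parser").div

print(repr(el.get_text()))  # ''
print(el.string)            # None

# get_text() always returns a str, so chaining string methods is safe
cleaned = el.get_text().strip()

# .string may be None, so check it before calling str methods on it
text = el.string.strip() if el.string is not None else ""

try:
    el.string.strip()  # None has no .strip() method
except AttributeError as exc:
    print("AttributeError:", exc)
```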

Conclusion

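To summarize: use the get_text() method when you want all the text inside an element, including the text of its children. The .string property only returns text when the element has exactly one child, and returns None otherwise.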

For more articles about BeautifulSoup, scroll down. Happy learning!