
How to extract paragraph from a website and save it as a text file?

Last Updated : 13 Jan, 2023

Prerequisites:

Web scraping is a technique for retrieving useful data from a URL or an HTML file so that it can be reused elsewhere. This article shows how to extract the paragraphs from a web page and save them as a text file.

Modules Needed

bs4: Beautiful Soup (bs4) is a Python library for extracting data from HTML and XML files. It can be installed as follows:

pip install bs4
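
As a minimal sketch of what the library does, the snippet below parses a small hand-written HTML string and collects the text of its paragraph tags. The sample markup and the function name paragraph_texts are made up for illustration and are not part of the article's program.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
SAMPLE_HTML = """
<html><body>
  <h1>Title</h1>
  <p>First paragraph.</p>
  <p>Second <b>paragraph</b>.</p>
</body></html>
"""


def paragraph_texts(html):
    """Return the text of every <p> tag in the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text() for p in soup.find_all("p")]


print(paragraph_texts(SAMPLE_HTML))
```

Note that get_text() drops nested tags such as <b> and keeps only their text, which is usually what is wanted when saving paragraphs to a plain text file.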

urllib: urllib is a package that collects several modules for working with URLs. It is part of the Python standard library, so it ships with Python and does not need to be installed with pip.
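
To illustrate how urllib can copy a resource to a local file, here is a small sketch built on urllib.request.urlretrieve(). So that it runs without network access, it points at a temporary local file through a file:// URL; with a real site you would pass an http(s) URL instead. The helper name download_to_file and the file names are assumptions made for this example.

```python
import tempfile
import urllib.request
from pathlib import Path


def download_to_file(url, destination):
    """Copy the resource at `url` to the local path `destination`."""
    saved_path, _headers = urllib.request.urlretrieve(url, str(destination))
    return saved_path


# Offline demonstration: a temporary HTML file stands in for a web page.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "page.html"
    source.write_text("<p>Hello</p>", encoding="utf-8")
    target = Path(tmp) / "copy.txt"
    download_to_file(source.as_uri(), target)
    copied = target.read_text(encoding="utf-8")

print(copied)
```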

Approach:

  • Create a text file to hold the downloaded HTML.
  • In the program, import the required modules and specify the URL and the path of that .txt file.
  • Make a request to the URL. This saves a copy of the page's HTML on your local machine.
  • Open the saved file in read mode and read its contents.
  • Pass the contents to the BeautifulSoup() constructor.
  • Create another file for the output (or write or append to an existing file).
  • Iterate over all the 'p' tags and write the text of each paragraph to the output file.

The implementation is given below:

Example:

Python3




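import urllib.request
from bs4 import BeautifulSoup

# URL of the page to scrape. The URL is missing from the original
# listing, so this value is only a placeholder.
url = "https://example.com/"

# path where the downloaded HTML is saved
html_path = "/home/gpt/PycharmProjects/pythonProject1/test/text_file.txt"

# download the page and store a copy of its HTML at html_path
urllib.request.urlretrieve(url, html_path)

# read the saved HTML as bytes and parse it
# (BeautifulSoup works out the character encoding itself)
with open(html_path, "rb") as file:
    contents = file.read()
soup = BeautifulSoup(contents, 'html.parser')

# write the text of every <p> tag to the output file, one per line
with open("test1.txt", "w", encoding="utf-8") as f:
    for data in soup.find_all("p"):
        f.write(data.get_text() + "\n")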
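
The intermediate HTML file is not strictly required. As an alternative sketch (not the approach described above), the page can be read into memory with urllib.request.urlopen() and parsed directly. The function name save_paragraphs and the file names are hypothetical, and the demonstration uses a temporary local page through a file:// URL so that it runs offline.

```python
import tempfile
import urllib.request
from pathlib import Path

from bs4 import BeautifulSoup


def save_paragraphs(url, output_path):
    """Fetch `url`, then write the text of each <p> tag to `output_path`."""
    with urllib.request.urlopen(url) as response:
        html = response.read()  # raw bytes; BeautifulSoup detects the encoding
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = [p.get_text() for p in soup.find_all("p")]
    with open(output_path, "w", encoding="utf-8") as out:
        for text in paragraphs:
            out.write(text + "\n")  # one paragraph per line
    return len(paragraphs)


# Offline demonstration with a temporary page instead of a real website.
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "page.html"
    page.write_text("<h1>Title</h1><p>One</p><p>Two</p>", encoding="utf-8")
    result_path = Path(tmp) / "paragraphs.txt"
    count = save_paragraphs(page.as_uri(), result_path)
    saved_text = result_path.read_text(encoding="utf-8")

print(count, repr(saved_text))
```

Wrapping the steps in a function makes it easy to reuse them for several pages, and returning the paragraph count gives a quick check that something was actually extracted.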


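Output:

Running the program creates test1.txt, which contains the text of every paragraph (<p> tag) found on the page.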

