
Python | Check for URL in a String

Last Updated : 25 Apr, 2023

Prerequisite: Pattern matching with Regular Expression

In this article, we accept a string and check whether it contains any URL. If one or more URLs are present, we report that URLs were found and print each URL that appears in the string. We will use Python's regular expressions to solve the problem.

Examples:

Input : string = 'My Profile: 
https://auth.geeksforgeeks.org/user/Chinmoy%20Lenka/articles 
in the portal of https://www.geeksforgeeks.org/'

Output : URLs :  ['https://auth.geeksforgeeks.org/user/Chinmoy%20Lenka/articles',
'https://www.geeksforgeeks.org/']

Input : string = 'I am a blogger at https://geeksforgeeks.org'
Output : URL :  ['https://geeksforgeeks.org']

To find the URLs in a given string, we use the findall() function from Python's re (regular expression) module. It returns all non-overlapping matches of a pattern in a string, as a list of strings. The string is scanned left to right, and matches are returned in the order found.
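
Before looking at the full pattern, here is a minimal sketch of how findall() behaves; the simplified pattern used here is illustrative only, not the one used in the method below:

Python3

import re

# findall() returns non-overlapping matches, scanned left to right
print(re.findall(r"https?://\S+", "see http://a.com and https://b.org"))
# ['http://a.com', 'https://b.org']

The complete method, with a fuller URL pattern, follows.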

Python3




# Python code to find the URL from an input string
# Using the regular expression
import re
 
 
def Find(string):
 
    # findall() with a pattern that matches the common forms of a URL
    regex = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"
    url = re.findall(regex, string)
    return [x[0] for x in url]
 
 
# Driver Code
string = 'My Profile: https://auth.geeksforgeeks.org/user/Chinmoy%20Lenka/articles in the portal of https://www.geeksforgeeks.org/'
print("Urls: ", Find(string))


Output

Urls:  ['https://auth.geeksforgeeks.org/user/Chinmoy%20Lenka/articles', 'https://www.geeksforgeeks.org/']

Time complexity: O(n) where n is the length of the input string, as the findall function of the re library iterates through the entire string once to find the URLs.
Auxiliary space: O(n), where n is the length of the input string, as the function Find stores all the URLs found in the input string into a list which is returned at the end.

Method #2: Using startswith() method

Python3




# Python code to find the URL from an input string
 
def Find(string):
    # Keep only the words that begin with an http or https scheme
    words = string.split()
    res = []
    for word in words:
        if word.startswith("https:") or word.startswith("http:"):
            res.append(word)
    return res

# Driver Code
string = 'My Profile: https://auth.geeksforgeeks.org/user/Chinmoy%20Lenka/articles in the portal of https://www.geeksforgeeks.org/'
print("Urls: ", Find(string))


Output

Urls:  ['https://auth.geeksforgeeks.org/user/Chinmoy%20Lenka/articles', 'https://www.geeksforgeeks.org/']

Time Complexity: O(n), where n is the number of words in the input string.
Auxiliary Space: O(n), where n is the number of URLs found in the input string. The auxiliary space is used to store the found URLs in the res list.
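
As a side note, startswith() also accepts a tuple of prefixes, so the two checks can be collapsed into one call; a compact, equivalent variant of the function above:

Python3

def Find(string):
    # startswith() with a tuple of prefixes tests both schemes in one call
    return [i for i in string.split() if i.startswith(("https:", "http:"))]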

Method #3: Using the find() method

Python3




# Python code to find the URL from an input string
 
def Find(string):
    # A word is a URL candidate if a scheme prefix starts at index 0
    words = string.split()
    res = []
    for word in words:
        if word.find("https:") == 0 or word.find("http:") == 0:
            res.append(word)
    return res

# Driver Code
string = 'My Profile: https://auth.geeksforgeeks.org/user/Chinmoy%20Lenka/articles in the portal of https://www.geeksforgeeks.org/'
print("Urls: ", Find(string))


Output

Urls:  ['https://auth.geeksforgeeks.org/user/Chinmoy%20Lenka/articles', 'https://www.geeksforgeeks.org/']

Time Complexity: O(N), where N is the number of words in the input string.
Auxiliary Space: O(N), for the list of matched URLs.
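
The == 0 comparison works because find() returns the index of the first occurrence, or -1 when the substring is absent, so a result of 0 means the word begins with the prefix. A quick illustration:

Python3

# find() returns the match index, or -1 when the substring is missing
print("https://x.com".find("https:"))      # 0  -> prefix match, kept
print("see https://x.com".find("https:"))  # 4  -> not a prefix, skipped
print("plain text".find("https:"))         # -1 -> skipped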

Method #4: Using the urlparse() function from urllib.parse

Approach:

The urlparse() function from the urllib.parse module in Python can be used to extract various components of a URL, such as the scheme, netloc, path, query, and fragment.
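
For instance, a small sketch of the components urlparse() returns:

Python3

from urllib.parse import urlparse

# urlparse() splits a URL into named components
parsed = urlparse("https://www.geeksforgeeks.org/path/page?x=1#top")
print(parsed.scheme)    # 'https'
print(parsed.netloc)    # 'www.geeksforgeeks.org'
print(parsed.path)      # '/path/page'
print(parsed.query)     # 'x=1'
print(parsed.fragment)  # 'top'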

Algorithm:

1. Import urlparse() from the urllib.parse module.
2. Initialize the input string.
3. Split the input string into individual words using the split() method.
4. Initialize an empty list urls to store the extracted URLs.
5. Iterate through each word in the list of words.
6. Use the urlparse() function to extract the scheme and netloc components of the word.
7. Check whether both scheme and netloc are present, indicating a valid URL.
8. If the word is a valid URL, add it to the urls list.
9. Print the final list of extracted URLs.

Python3




from urllib.parse import urlparse

# Input string
string = 'My Profile: https://auth.geeksforgeeks.org/user/Chinmoy%20Lenka/articles in the portal of https://www.geeksforgeeks.org/'

# Split the string into words
words = string.split()
 
# Extract URLs from the words using urlparse()
urls = []
for word in words:
    parsed = urlparse(word)
    if parsed.scheme and parsed.netloc:
        urls.append(word)
 
# Print the extracted URLs
print("URLs:", urls)


Output

URLs: ['https://auth.geeksforgeeks.org/user/Chinmoy%20Lenka/articles', 'https://www.geeksforgeeks.org/']

The time complexity of this approach is O(n), where n is the length of the input string.

The space complexity is O(n + k), where n is the length of the input string and k is the number of extracted URLs.
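
One caveat: this check requires an explicit scheme, so a bare domain such as www.geeksforgeeks.org is parsed with an empty scheme and netloc and would be skipped. A small sketch:

Python3

from urllib.parse import urlparse

# Without a scheme, the whole word lands in .path,
# so the scheme-and-netloc check above rejects it
parsed = urlparse("www.geeksforgeeks.org")
print(parsed.scheme)  # ''
print(parsed.netloc)  # ''
print(parsed.path)    # 'www.geeksforgeeks.org'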

Method #5: Using reduce()

Algorithm:

  1. Define a function merge_url_lists(url_list1, url_list2) that takes two lists of URLs and returns their concatenation.
  2. Define a function find_urls_in_string(string) that takes a string as input and returns a list of URLs found in the string.
  3. Define two input strings, string1 and string2, and put them into a list called string_list.
  4. Call map(find_urls_in_string, string_list) to generate a list of lists of URLs found in each string.
  5. Call reduce(merge_url_lists, map(find_urls_in_string, string_list)) to concatenate all the lists of URLs found into a single list.
  6. Print the resulting list of URLs.

Python3




from functools import reduce

def merge_url_lists(url_list1, url_list2):
    # Concatenate two lists of URLs
    return url_list1 + url_list2

def find_urls_in_string(string):
    # Keep the words that begin with an http or https scheme
    return [i for i in string.split() if i.find("https:") == 0 or i.find("http:") == 0]

string1 = 'My Profile: https://auth.geeksforgeeks.org/user/Chinmoy%20Lenka/articles in the portal of https://www.geeksforgeeks.org/'
string2 = 'Some more text without URLs'

string_list = [string1, string2]

url_list = reduce(merge_url_lists, map(find_urls_in_string, string_list))

print("Urls:", url_list)

# This code is contributed by Rayudu.


Output

Urls: ['https://auth.geeksforgeeks.org/user/Chinmoy%20Lenka/articles', 'https://www.geeksforgeeks.org/']

Time complexity: O(n*m), where n is the number of strings in string_list and m is the maximum number of words in a string, since find_urls_in_string() splits each string into words and checks every word for a URL prefix.

Space complexity: O(n*m), since all the words of all the strings are held in memory, along with every URL found.
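
As an alternative to reduce(), the per-string lists can also be flattened with itertools.chain.from_iterable; an equivalent sketch, assuming the find_urls_in_string() and string_list names defined above:

Python3

from itertools import chain

# Flatten the per-string URL lists without reduce()
url_list = list(chain.from_iterable(map(find_urls_in_string, string_list)))
print("Urls:", url_list)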


