Different approaches to find verified domains using Python

What is a domain?

A domain name is an online address that offers a user-friendly way to access a website. In this article, finding verified domains with Python means confirming that a domain is legitimate and active using Python programming techniques. On the internet, an IP address is a unique string of numbers (and other characters) used to reach a website from any device or location. However, an IP address is hard to remember and type correctly, so a domain name represents it in a word-based format that is much easier for users to handle. When a user types a domain name into the browser's address bar, the browser uses the IP address it represents to access the site.

The Domain Name System (DNS) maps human-readable domain names (in URLs or email addresses) to IP addresses. A domain name is part of the unique identity of any website, company, or organization. It is still possible to type an IP address into a browser to reach a website, but most people prefer an internet address made of easy-to-remember words, called a domain name, for example Google or Amazon. Domain names come with different extensions, for example: amazon.in, google.com.

A domain also serves several important purposes on the internet. Here are some key reasons why a domain is necessary:

  • Identification: Domain names are easier to remember than IP addresses, making it simpler to locate resources online.
  • Branding: A domain name is vital for building a professional online identity, reflecting the nature and purpose of a business.
  • Credibility: Owning a domain enhances professionalism, showing commitment to a unique online presence.
  • Email Address: A personalized email linked to a domain looks more professional and builds trust.
  • Control: Domain ownership gives you control over hosting, email management, and associated content.
  • SEO: A relevant, keyword-rich domain can improve search engine visibility.
  • Portability: Owning a domain allows you to change hosting providers while keeping the same web address, ensuring consistency.

Why do we need domain verification?

Verifying a domain name is a key step for businesses and individuals looking to establish credibility, maintain control over their content, and enhance their presence on digital platforms.

Let’s understand this with an example:

Verifying your domain lets Facebook allow the rightful parties to edit link previews that point to your content.

This gives you control over editing permissions for links and content and prevents misuse of your domain. It covers both organic and paid content.

These verified editing permissions ensure that only trusted employees and partners represent your brand.

Domain Verification Techniques:

Domain verification is a crucial step to make sure a domain is active and has not expired. When a domain is verified, users are automatically added to the Universal Directory, so they don’t have to wait for individual approval to log in. The process confirms that the domain is legitimate and prevents issues caused by fake or misused domains. These are some techniques through which we can verify a domain:

  • WHOIS Lookup
  • Requests & Sockets
  • DNS Verification

Let’s see how we can find verified domains using Python. You can employ any of the approaches listed below.

1) WHOIS Lookup:

  • Use the whois module in Python to perform a WHOIS lookup on a domain. This method provides information about the domain registration, including the registrar’s details and the registration date.
  • Install the module with pip install python-whois. A minimal sketch follows below.
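Here is a minimal sketch using the python-whois package (the helper name is_registered is illustrative, and the exact fields returned, such as registrar and creation_date, vary by registry):

import whois  # installed via: pip install python-whois

def is_registered(domain_name):
    # whois.whois() returns an object with registration details;
    # for an unregistered domain, domain_name is empty or an exception is raised
    try:
        record = whois.whois(domain_name)
        return bool(record.domain_name)
    except Exception:
        return False

print(is_registered("google.com"))                              # expected: True
print(is_registered("this-domain-should-not-exist-123456.org")) # expected: False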

2) Requests & Sockets:

  • Use Python’s requests library and the built-in socket module to find verified domains. requests must be installed with pip (pip install requests), while socket ships with the standard library. A minimal sketch is shown below.
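The following is a minimal sketch of this check (the function name verify_domain is illustrative, not from a specific library):

import socket

def verify_domain(hostname):
    try:
        # resolve the hostname to an IP address; raises socket.gaierror for unknown names
        ip_address = socket.gethostbyname(hostname)
        # attempt a TCP connection to the resolved address on port 80 (HTTP)
        with socket.create_connection((ip_address, 80), timeout=5):
            return True
    except OSError:
        # covers gaierror (name not resolved), timeouts, and refused connections
        return False

print(verify_domain("google.net"))         # expected: True
print(verify_domain("invalid.domainxyz"))  # expected: False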

In the above function, the output is True if the provided hostname is correct and contains a valid domain name; otherwise it returns False.

Here we pass the hostname as a parameter: socket.gethostbyname(hostname) gives us the IP address for the host, and socket.create_connection((ip_address, 80)) attempts to open a connection to that address on port 80. When we pass a hostname or domain name with a correct extension to this function, for example “google.net” as shown above, it returns True. If the hostname or domain is incorrect, it returns False.

To verify a domain in Python, you can use various approaches depending on the type of verification required. Here is one more common method: DNS verification.

3) DNS Verification:

DNS verification involves checking whether a specific DNS record exists for the domain. For example, you might check for a TXT record with a specific value, as in the sketch below.
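Here is a minimal sketch using the dnspython package (pip install dnspython); the function name verify_dns_record and the expected SPF value are illustrative:

import dns.resolver  # from the dnspython package

def verify_dns_record(domain, record_type, expected_value):
    try:
        # query the DNS records of the given type for the domain
        answers = dns.resolver.resolve(domain, record_type)
        # rdata.to_text() renders each record as a string (TXT values come back quoted)
        return any(expected_value in rdata.to_text() for rdata in answers)
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        # NXDOMAIN: the domain does not exist; NoAnswer: no record of this type
        return False

# google.com publishes an SPF policy in a TXT record beginning with "v=spf1"
print(verify_dns_record("google.com", "TXT", "v=spf1"))  # expected: True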

This is a valid use of the above function: with the domain “google.com”, the function returns True when the record type is “TXT” and the expected value matches Google’s SPF TXT record. If no match is found, or if the domain does not exist (which raises an NXDOMAIN exception), it returns False.

Check out the GitHub repository for more details
https://github.com/jjadhav-dj/DNS-Verification-using-python.git

Conclusion:

 A domain name is a crucial component of your online identity, providing a way for people to find and remember your website or online services. Whether for personal use, business, or any other online endeavor, having a domain name is an essential part of establishing a presence on the internet.

Each approach serves a distinct purpose in verifying a domain’s legitimacy, so choose the verification method based on your specific use case and requirements. DNS verification is often used for domain ownership checks, while a WHOIS lookup provides essential registration details.


How to do web scraping/crawling using Python with Selenium

Have you ever wondered how online shopping sites such as Amazon, Flipkart, and Meesho suggest or recommend products based on your search or browsing history? They can do this because their servers index all of this information in their records, which lets them return the most relevant search-based results. Web crawlers are used to handle this process.

Data has become the key to growth for any business. Over the past decade, the most successful organizations have used data-driven decision-making, and with around 5 billion internet users creating billions of data points per second, there is plenty of data to draw on. Companies gather this data primarily for price and brand monitoring, price comparison, and big-data analysis that feed their decision-making and business strategy.

Web scraping/crawling is used to find meaningful insights (data) that help in making decisions for business growth. Let’s see how we can achieve this.

Web scraping is used to gather a large amount of data from websites. Doing this manually is very difficult to manage, because data on the web is unstructured. Web scraping avoids this problem by storing the data in a structured form.

Example – Python web scraping/crawling for Flipkart

Prerequisites – We need the following libraries to scrape Flipkart. First make sure Python 3+ is installed on your system, then open cmd and run the following commands:

1.    pip install selenium

2.    pip install requests

3.    pip install lxml

Once we have installed all the required libraries, we are good to go. Now we need to add request headers to scrape information from the web. To find the request headers of a web page, follow the steps given below.

Step 1:

Open the URL of the webpage you want to scrape/crawl and search for the product you want information about in the search bar. A list of products is displayed on the page. Click on any product, then right-click on the page and select “Inspect”.

Step 2:

The developer tools panel now shows the page’s HTML. Select the “Network” tab and click on the first request in the list; it shows all the request and response headers. Copy the request headers, which we need for scraping.

Step 3:

Create a Python file (a file name with the .py extension) and import all the required libraries that we are going to use.

Here, create a file named python_scrapy_demo.py:

import requests
from lxml import html
from csv import DictWriter

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9,mr;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Cookie': 'T=TI164041787357700171479106209992677828199995896121444018506202609225; _pxvid=9615f9eb-6555-11ec-a0e5-544468794e4b; Network-Type=4g; pxcts=e13bea52-9c71-11ec-8fbe-4e794558716a; _pxff_tm=1; AMCVS_17EB401053DAF4840A490D4C%40AdobeOrg=1; AMCV_17EB401053DAF4840A490D4C%40AdobeOrg=-227196251%7CMCIDTS%7C19057%7CMCMID%7C60020900103419608715489104597694502461%7CMCAID%7CNONE%7CMCOPTOUT-1646484547s%7CNONE%7CMCAAMLH-1647082147%7C3%7CMCAAMB-1647082147%7Cj8Odv6LonN4r3an7LhD3WZrU1bUpAkFkkiY1ncBR96t2PTI; _px3=517dd86b669bed026967b6bdfbfac15a6893b3fb6a0a48639f8c8cac65b3cd64:OFKVhuX/QOYMMgqjXTNst5364SHIk+eTiaOVpjTfYKc6cnY+68dfTvg1NUCBE2W7jjH0hr7tgdk6UkBvsJVm9A==:1000:rga8uP2RMWp7ee1XTv8PVYgqr/ZlUn4jscKqdAKTIK9OFsmlF4QbPjfaDpAcMZn18Eip7z8FZsgO3j/KJ5x3m7BeObZLpMhgigTALVggsTCobVWml0DqL55ZTywnb5ezOslK6Q9axT+/y3CK7meTirkm9bumQWlOwMSMinGilSmpFCek9gBrinbeKWgdDCzFIKhH9ZOdRDiYGKa0DUOu7w==; SN=VI1A3BE7DC80484037A949D48CB6847E12.TOKB37A0AFBA76F46BB84B8BC39EEE0C132.1646477414.LI; s_sq=flipkart-prd%3D%2526pid%253Dwww.flipkart.com%25253A%2526pidt%253D1%2526oid%253Dhttps%25253A%25252F%25252Fwww.flipkart.com%25252Fsearch%25253Fq%25253Diphone%252526sid%25253Dtyy%2525252C4io%252526as%25253Don%252526as-show%25253Don%252526otracker%25253DAS_QueryStore_Organ%2526ot%253DA; S=d1t13Pz8VPxZJPxkMPwA/GT8/P8fOp+2+MT5EGsfvlEGAzgqQGy0f0O82o91FZOXzCPWn/Wqqo3+892JiBWn5oEFyQg==; qH=0b3f45b266a97d70',
    'Host': 'www.flipkart.com',
    'Referer': 'https://www.google.com/',  # the standard HTTP header name is spelled "Referer"
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36'
}

# method to save the HTML page data in a file
def save_html_page(page_data):
    with open('flipkart_page.html', 'w+', encoding='utf-8') as fl:
        fl.write(page_data)

# method to save the scraped data in a CSV file
def data_save_csv(dt):
    headings = ['product_name', 'product_price']  # a list keeps the column order stable
    with open('flipkartdata.csv', 'a+', encoding='utf-8') as file:
        writer = DictWriter(file, fieldnames=headings)
        if file.tell() == 0:
            writer.writeheader()  # write the header row only once, while the file is empty
        writer.writerow(dt)

In the above code, we save the HTML page and then write the scraped data into a CSV file. In the method that saves the HTML page, we use the open() method to open the file, with 'w+' and encoding='utf-8' so the data is written as Unicode. To extract the data (here we extract product_name and product_price), follow the method given below. We can extract different kinds of data with this code; we just need to supply the XPath of whichever part of the product description we want, and it will return the data.

def crawling_data():
    response = requests.get(
        url='https://www.flipkart.com/search?q=iphone&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off',
        headers=headers, timeout=30)
    # print(response.text)
    save_html_page(page_data=response.text)
    if response.status_code == 200:
        tree = html.fromstring(response.text)
        product_name = tree.xpath('//div[@class="_3pLy-c row"]/div[@class="col col-7-12"]/div[@class="_4rR01T"]/text()')
        prod_price = tree.xpath('//div[@class="col col-5-12 nlI3QM"]/div[@class="_3tbKJL"]/div[@class="_25b18c"]/div[@class="_30jeq3 _1_WHN1"]/text()')
        all_data = list(zip(product_name, prod_price))
        # print(all_data)
        product = {}
        for item in all_data:
            product['product_name'] = item[0]
            product['product_price'] = item[1].replace('₹', '')  # strip the rupee symbol from the price
            print(product)
            data_save_csv(dt=product)

crawling_data()

Conclusion

We live in a world where technology continues to develop, particularly in the computer industry. The market scenario and client demands change every second. To satisfy customer needs and drive business growth at the same time, we need to keep adapting the business, and web scraping/crawling is one way to get the data that makes this possible.
