How to click on an element with Sikuli using an image?

Introduction:

In automation testing, elements are conventionally identified with locators such as XPath, CSS selectors, and IDs. However, in some scenarios, such as popup windows and Microsoft Foundation Class (MFC) windows, locating elements takes considerable time, and in some cases it is impossible. These challenges often slow progress and create bottlenecks. In this blog, I propose a solution that addresses these issues and saves time.

What if there was a way to bypass the traditional locator-finding technique and still identify and interact with elements?

It is indeed possible with Sikuli. Sikuli offers an alternative approach to automation by leveraging visual patterns, allowing users to interact with elements on the screen without relying on traditional locator-based techniques.

Let’s understand what Sikuli is:

Sikuli is an open-source and powerful test automation tool that excels when there is limited access to a GUI’s internals or source code. Instead of relying on XPath, CSS, or ID, Sikuli uses image recognition and GUI control to identify objects displayed on the screen. It operates independently of the application under test: it locates an element through its image recognition mechanism and then performs actions on that element.

Sikuli is a versatile tool that integrates with popular programming languages such as Python and Java. It runs on Windows, macOS, and Linux, and it can be used together with Selenium and with IDEs such as PyCharm. By adopting this approach, we significantly reduce the time required to locate elements, simplifying the automation process.

Prerequisites for Sikuli:

To get started with Sikuli, we need to set up the following:

Download and install an IDE of your choice. Here we are using IntelliJ IDEA.
Create a new Maven project in IntelliJ IDEA.
Download the Sikuli dependencies or jar file from https://mvnrepository.com/artifact/org.sikuli and add them to your pom.xml file.
Add the other required dependencies, such as Selenium WebDriver and WebDriverManager.
Create a folder in the project to store screenshots.
To take a screenshot, you can use a built-in screenshot tool such as the Snipping Tool. Alternatively, you can install tools like Inspector, PowerShell, or AutoIt, which provide x and y coordinates. For more information on these tools, you can refer to this blog: https://spurqlabs.com/different-tools-to-inspect-desktop-app-elements/
Alternatively, using x and y coordinates, we can take a screenshot during execution and store it at a specific path, as shown in step 2 below.
Create a Java class.
Build your project.

Architecture of Sikuli:

Sikuli is a framework that helps automate interactions with elements on web pages and in other applications displayed on the screen.
The framework uses an image recognition mechanism to identify elements.
Image recognition works by comparing what is currently displayed on the screen with the provided images.
If a provided image is not found on the screen, Sikuli raises an exception (FindFailed).
It is therefore advisable to select an image that precisely captures a single element.
Selecting a precise image ensures greater accuracy in element identification.
The Sikuli framework offers different methods to perform actions on the identified elements.
These methods provide versatility and flexibility in achieving automation objectives.
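To make the "compare the screen with the provided image" idea concrete, here is a deliberately naive sketch in plain Java. It is not how SikuliX is implemented (SikuliX's real matcher is built on OpenCV and tolerates small rendering differences far better); the class, the pixel-equality score, and the `ImageNotFoundException` name are assumptions made for illustration. It slides the element image over a screenshot, scores each position, and throws when even the best score is below a similarity threshold, which mirrors Sikuli raising FindFailed.

```java
import java.awt.Point;
import java.awt.image.BufferedImage;

public class NaiveTemplateMatch {

    /** Hypothetical exception playing the role of Sikuli's FindFailed. */
    public static class ImageNotFoundException extends RuntimeException {
        public ImageNotFoundException(String message) {
            super(message);
        }
    }

    /** Creates a w x h image filled with a single RGB colour (handy for experiments). */
    public static BufferedImage filled(int w, int h, int rgb) {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                img.setRGB(x, y, rgb);
            }
        }
        return img;
    }

    /**
     * Slides the template over the screen image and scores every position by the fraction of
     * pixels whose RGB values are identical. Returns the top-left corner of the best position,
     * or throws if even the best score is below minSimilarity (a value between 0 and 1).
     */
    public static Point find(BufferedImage screen, BufferedImage template, double minSimilarity) {
        int tw = template.getWidth();
        int th = template.getHeight();
        double bestScore = -1.0;
        Point best = null;
        for (int y = 0; y + th <= screen.getHeight(); y++) {
            for (int x = 0; x + tw <= screen.getWidth(); x++) {
                int same = 0;
                for (int ty = 0; ty < th; ty++) {
                    for (int tx = 0; tx < tw; tx++) {
                        // Compare RGB only; the mask drops the alpha byte returned by getRGB
                        if ((screen.getRGB(x + tx, y + ty) & 0xFFFFFF) == (template.getRGB(tx, ty) & 0xFFFFFF)) {
                            same++;
                        }
                    }
                }
                double score = (double) same / (tw * th);
                if (score > bestScore) {
                    bestScore = score;
                    best = new Point(x, y);
                }
            }
        }
        if (best == null || bestScore < minSimilarity) {
            throw new ImageNotFoundException("Best similarity " + bestScore + " is below " + minSimilarity);
        }
        return best;
    }

    public static void main(String[] args) {
        BufferedImage screen = filled(20, 20, 0xFFFFFF); // white "screen"
        BufferedImage button = filled(3, 3, 0xFF0000);   // red 3x3 "button" image
        for (int y = 7; y < 10; y++) {
            for (int x = 5; x < 8; x++) {
                screen.setRGB(x, y, 0xFF0000);            // paint the button at (5, 7)
            }
        }
        System.out.println(find(screen, button, 0.95));   // java.awt.Point[x=5,y=7]
    }
}
```

The threshold is what makes a cropped image that captures exactly one element important: a tightly cropped, distinctive image scores high at one position and low everywhere else.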

Screen Class:

The Sikuli framework has a built-in Screen class that provides predefined methods for performing actions on elements using images. To access the methods of the Screen class, we need to declare a reference to this class and initialize it.

Screen screen = new Screen();

In the above code, the variable “screen” is declared as an instance of the Screen class, and the new keyword is used to create a new object of the Screen class.

Here are some of the methods available in the Screen class that can be used efficiently:

Click on an element:

To perform a left click on an element, provide an image that locates/identifies the element to be clicked.

Ex: screen.click("Image Path");

Right-click on the element:

This method is used to perform a right-click on an element by providing an image to locate/identify the element to be clicked.

Ex: screen.rightClick("Image Path");

Double-click on the element:

We use this method to perform a double-click action on an element. It first locates the element on the screen and then performs a double left click on the element.

Ex: screen.doubleClick("Image Path");

Type on an element:

In the Sikuli framework, the type() method sends keystrokes to an element: pass the image path of the target element and the text to type as arguments.

Ex: screen.type("Image Path", "Text to type");

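Find():

We can use this method to check whether an element is visible on the screen. It returns the matching region if the image is found and throws a FindFailed exception otherwise.

Ex: screen.find("Image Path");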

DragDrop():

We use this method to perform a drag-and-drop action, passing the source image and the target image as arguments.

Ex: screen.dragDrop("Source Image", "Target Image");

Hover():

We use this method to move the mouse cursor over an element, for example to validate popup messages that appear on hover.

Ex: screen.hover("Image Path");

Add the following dependency to the pom.xml file to use Sikuli's Screen class.

<dependencies>
    <dependency>
        <groupId>com.sikulix</groupId>
        <artifactId>sikulixapi</artifactId>
        <version>2.0.5</version>
    </dependency>
</dependencies>

How to integrate Sikuli with Selenium:

In the world of automation testing, we use an Integrated Development Environment (IDE) to write code. Nowadays, it is common to create Maven projects, which make it easy to manage dependencies on various libraries and add-ons. As a widely used automation tool, Selenium supports integration with many add-ons. To integrate Sikuli into a Selenium framework, we need to add the required dependencies to the project's pom.xml file.

To find the Sikuli dependencies, we can search the Maven repository at https://mvnrepository.com/artifact/org.sikuli. From this repository, we can copy the necessary dependencies and paste them into the pom.xml file of our project.

By adding the Sikuli dependencies to the pom.xml file, we ensure that the required libraries and resources are properly imported and utilized within our Selenium-Sikuli integration. This allows us to leverage the capabilities of Sikuli for image recognition and interaction within our Selenium automation framework.

Now let's create the Sikuli function step by step:

1. Create a Maven project and a class with a main method that sets up and launches the browser:

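import io.github.bonigarcia.wdm.WebDriverManager;
import org.openqa.selenium.chrome.ChromeDriver;
public class LaunchBrowser {
    public static void main(String[] args) {
        WebDriverManager.chromedriver().setup(); // resolve a chromedriver matching the installed Chrome
        ChromeDriver driver = new ChromeDriver();
        driver.get("https://demoqa.com/");
        driver.manage().window().maximize();
    }
}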

2. Take a screenshot and store it in a specific location:

The Snipping Tool is a reliable tool for capturing screenshots. With it, we can capture customized screenshots and save them in the project folder.

From the captured page, we crop the image of a single element and save it in the project's screenshot folder.

How to take screenshots by using x,y coordinates:

There is an alternative method to capture screenshots without relying on external tools.
We can utilize the Robot class and its methods to capture screenshots based on x and y coordinates.
To capture a rectangular screenshot, we need two sets of x and y coordinates.
The first set represents the top-left corner of the rectangle, and the second set represents the bottom-right corner.
By specifying these coordinates, we can define the area of the screen to capture.
The code snippet below captures a screenshot based on the specified coordinates.
Our framework saves the captured screenshot to a specific location.

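import java.awt.AWTException;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// imageFolderPath (ending with a path separator) and xStart, yStart, xEnd, yEnd are defined elsewhere
String fileName1 = "";
try {
    Robot robot = new Robot();
    String imageFormat = ".png";
    StringBuilder str = new StringBuilder(imageFolderPath + System.currentTimeMillis() + imageFormat);
    fileName1 = str.toString();
    // Rectangle takes the top-left corner followed by the width and height
    Rectangle captureRect = new Rectangle(xStart, yStart, xEnd - xStart, yEnd - yStart);
    BufferedImage screenFullImage = robot.createScreenCapture(captureRect);
    String format = "png"; // format name expected by ImageIO.write (no leading dot)
    System.out.println("Path is " + fileName1);
    ImageIO.write(screenFullImage, format, new File(fileName1));
    System.out.println("A partial screenshot saved!");
} catch (AWTException | IOException ex) {
    System.err.println(ex);
}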

Explanation of the above code:

Declares a variable fileName1 of type String and initializes it as an empty string.
The try-catch block handles potential exceptions that may occur during execution.
Creates a new instance of the Robot class, which allows programmatic control of the mouse and keyboard as well as screen capture.
Declares an image format variable with the value ‘.png’.
Constructs a StringBuilder object to create the file path for the screenshot. It concatenates the image folder path, the current system time in milliseconds, and the image format.
Converts the StringBuilder object to a String and assigns it to the ‘fileName1’ variable.
Defines a Rectangle object that represents the area of the screen to be captured. It takes the starting coordinates (xStart, yStart) and the width and height, calculated as (xEnd – xStart) and (yEnd – yStart).
Uses the ‘createScreenCapture(captureRect)’ method of the Robot class to capture the screen within the specified Rectangle area. It returns a BufferedImage object representing the captured image.
Writes the captured image to the file specified by fileName1 using the write() method from the ImageIO class.
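The Rectangle calculation above assumes the first coordinate pair is the top-left corner. If coordinates are noted from an inspection tool in the other order, the width or height becomes negative, and `Robot.createScreenCapture` rejects a rectangle whose width or height is not greater than zero. A small hypothetical helper (the `CaptureArea` class and `fromCorners` name are assumptions, not part of the original code) can normalize any two opposite corners first:

```java
import java.awt.Rectangle;

public class CaptureArea {

    /**
     * Builds the capture rectangle from two opposite corners, in whichever order they were noted.
     * java.awt.Rectangle expects (x, y, width, height) with (x, y) as the top-left corner.
     */
    public static Rectangle fromCorners(int x1, int y1, int x2, int y2) {
        int left = Math.min(x1, x2);
        int top = Math.min(y1, y2);
        int width = Math.abs(x2 - x1);
        int height = Math.abs(y2 - y1);
        if (width == 0 || height == 0) {
            // Robot.createScreenCapture would throw IllegalArgumentException for this area anyway
            throw new IllegalArgumentException("corners must span a non-empty area");
        }
        return new Rectangle(left, top, width, height);
    }

    public static void main(String[] args) {
        // Bottom-right corner given first, top-left second
        Rectangle r = fromCorners(350, 260, 100, 200);
        System.out.println(r); // java.awt.Rectangle[x=100,y=200,width=250,height=60]
    }
}
```

The result can be passed straight to `robot.createScreenCapture(...)` in place of the inline `new Rectangle(xStart, yStart, xEnd - xStart, yEnd - yStart)`.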

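3. Click on the element using the previously captured screenshot:

Using the Sikuli methods described above, we can perform various actions on the element. Here we locate the element and then click it:

import org.sikuli.script.FindFailed; // checked exception: declare "throws FindFailed" or catch it
import org.sikuli.script.Screen;

Screen s = new Screen();
s.find(fileName1);  // throws FindFailed if the captured image is not visible on screen
s.click(fileName1); // clicks the center of the best match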

Limitations:

Manage a number of screenshots:

Managing a large number of screenshots can be a complex and time-consuming process. Locating a specific screenshot among many can become challenging. To simplify this process, a recommended solution is to establish a specific naming convention for the screenshots.
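As one possible naming convention, a sketch under assumptions: the `<folder>/<page>_<element>.png` pattern and the `ScreenshotNames` helper below are hypothetical, not something Sikuli requires. Deriving every file name from the page and element it shows makes a screenshot easy to find and keeps names consistent across the team:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

public class ScreenshotNames {

    /** Lower-cases a label and collapses every run of non-alphanumeric characters into one '-'. */
    public static String slug(String label) {
        String s = label.trim().toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", "-");
        return s.replaceAll("^-+|-+$", ""); // drop leading/trailing dashes
    }

    /**
     * Hypothetical convention: <folder>/<page>_<element>.png
     * e.g. ("screenshots", "Home Page", "Elements Card") -> screenshots/home-page_elements-card.png
     */
    public static Path pathFor(String folder, String page, String element) {
        return Paths.get(folder, slug(page) + "_" + slug(element) + ".png");
    }

    public static void main(String[] args) {
        System.out.println(pathFor("screenshots", "Home Page", "Elements Card!").getFileName());
    }
}
```

Because Sikuli's Screen methods accept an image path as a String, the result can be used as `screen.click(ScreenshotNames.pathFor(...).toString())`.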

Two similar images appear on the webpage:

If the same image appears more than once on the webpage, Sikuli cannot tell which occurrence is intended and acts on a single match, which may not be the element you meant. If the image is not recognized at all, it throws an exception.
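The ambiguity can be seen with a small plain-Java sketch (the `DuplicateMatches` class is hypothetical and uses exact pixel comparison, unlike Sikuli's OpenCV-based matching). Searching the whole screenshot for an icon that appears twice yields two equally good candidates, while restricting the search to a smaller area leaves exactly one. SikuliX offers the same two ideas in its own API through `findAll()` and by searching within a smaller `Region`; the sketch below only mimics them.

```java
import java.awt.Point;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

public class DuplicateMatches {

    /** Returns the top-left corner of every exact occurrence of template inside area of screen. */
    public static List<Point> findAll(BufferedImage screen, BufferedImage template, Rectangle area) {
        // Clip the search area to the screen so we never read outside the image
        Rectangle a = area.intersection(new Rectangle(0, 0, screen.getWidth(), screen.getHeight()));
        List<Point> hits = new ArrayList<>();
        for (int y = a.y; y + template.getHeight() <= a.y + a.height; y++) {
            for (int x = a.x; x + template.getWidth() <= a.x + a.width; x++) {
                if (matchesAt(screen, template, x, y)) {
                    hits.add(new Point(x, y));
                }
            }
        }
        return hits;
    }

    /** True when every template pixel equals the screen pixel at the same offset (RGB only). */
    private static boolean matchesAt(BufferedImage screen, BufferedImage template, int x, int y) {
        for (int ty = 0; ty < template.getHeight(); ty++) {
            for (int tx = 0; tx < template.getWidth(); tx++) {
                if ((screen.getRGB(x + tx, y + ty) & 0xFFFFFF) != (template.getRGB(tx, ty) & 0xFFFFFF)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BufferedImage screen = new BufferedImage(40, 20, BufferedImage.TYPE_INT_RGB); // black background
        BufferedImage icon = new BufferedImage(3, 3, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 3; y++) {
            for (int x = 0; x < 3; x++) {
                icon.setRGB(x, y, 0xFF0000);
                screen.setRGB(2 + x, 2 + y, 0xFF0000);   // first copy of the icon
                screen.setRGB(30 + x, 10 + y, 0xFF0000); // identical second copy
            }
        }
        System.out.println(findAll(screen, icon, new Rectangle(0, 0, 40, 20)).size());  // 2
        System.out.println(findAll(screen, icon, new Rectangle(20, 0, 20, 20)).size()); // 1
    }
}
```

When a search returns more than one candidate, treating it as an error (or narrowing the search area) is safer than clicking whichever match happens to come first.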

Conclusion:

To overcome the challenges of locating elements in automation testing, especially within popup windows and MFC windows, we implemented Sikuli as a solution. By adopting Sikuli, we can avoid relying on traditional locators in these cases, which reduces the time spent locating elements and improves the efficiency of our automation efforts. Sikuli’s visual recognition capabilities help users quickly identify and interact with GUI elements. Overall, Sikuli is a valuable alternative in scenarios where traditional locators are insufficient or inaccessible.
