Parsing a site to fill your online store, adding descriptions, attributes, photos and video product reviews to your products
How the Elbuz Data Parser Works
Every site is built with HTML (HyperText Markup Language), so the same tags serve the same purposes everywhere. For example, the "a" tag marks a link, and the "div" tag defines a block that groups a section of content on the page.
HTML tags can carry style (class) names that control how information is displayed; for example, a block's style can make text bold or color an element green. Using these tags and style names, you can configure the Elbuz parser to get the information you need from any site. To receive data, the parser uses CSS selectors (based on site design styles) or XPath (a query language for site elements).
Attention! Before you start, install the extension for the Google Chrome browser by following this link. Searching for product cards works only in Google Chrome. If the Chrome Store link doesn't work, install the extension manually.
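The idea of locating elements by tag and style name can be tried outside Elbuz. The sketch below is only an illustration and uses Python's standard library, not the Elbuz parser; the HTML snippet, the class name "tile" and the links are made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed snippet: two product blocks and one unrelated link.
SNIPPET = """
<div class="catalog">
  <div class="tile"><a href="/item/1">First item</a></div>
  <div class="tile"><a href="/item/2">Second item</a></div>
  <a href="/about">About us</a>
</div>
"""

root = ET.fromstring(SNIPPET.strip())

# XPath: every <a> directly inside a <div> whose class is exactly "tile".
# A rough CSS counterpart would be: div.tile > a
tile_links = [a.get("href") for a in root.findall(".//div[@class='tile']/a")]

# XPath: every <a> anywhere below the root (CSS counterpart: a)
all_links = [a.get("href") for a in root.findall(".//a")]

print(tile_links)
print(all_links)
```

The tag name alone matches every link, while adding the style name narrows the match to the product blocks only.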
Creating a new parser
To add a new site parser, open the "Products of the base catalog" window and click the "Search content for products" button (1). In the window that opens, click the "Add site" button (2).
Specify the site address for parsing and the search string
What is the purpose of the search link?
The search link lets the program search for your products automatically on the source site. The program needs to know the address the site uses to search for goods. It appends the name of your product to this address, the site returns search results, and you only have to select the right product from the list to save its description, attributes, photos and other information.
How to find out the address of the link to search for your products?
Consider an example in which the search link is: https://www.ozon.ru/search/?text
- Type some text into the search bar on the site
- Click the "Search" button
- The site opens a page with search results, and the address bar of the browser shows a link that contains the text you searched for. This is the search link you need: copy it into the window for adding a new site parser, but without your text.
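The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Elbuz code: the address https://shop.example/search/?q= and both function names are hypothetical, and the sketch assumes the searched text sits at the very end of the results link.

```python
from urllib.parse import quote, quote_plus


def strip_search_text(results_url: str, searched_text: str) -> str:
    """Return the results link without the searched text at its end."""
    # Sites encode spaces differently ("+" or "%20"), so try both forms,
    # then the raw text as a last resort.
    for tail in (quote_plus(searched_text), quote(searched_text), searched_text):
        if tail and results_url.endswith(tail):
            return results_url[: -len(tail)]
    return results_url  # nothing to strip


def build_search_url(search_link: str, product_name: str) -> str:
    """Append the URL-encoded product name to the stored search link."""
    return search_link + quote_plus(product_name)


search_link = strip_search_text("https://shop.example/search/?q=red+kettle", "red kettle")
print(search_link)
print(build_search_url(search_link, "Bosch TWK 3A011"))
```

Storing the link without the text is what lets any product name be appended to it later.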
When creating a new site parser, the data must be entered in this form
Parser settings for receiving data from the site
After adding the site parser, the settings window will open
The settings table lists the operation types and, for each of them, the fields in which data is stored. Operation types are the stages the parser goes through to get data from the site.
For example, to get a product's attributes, the parser first needs a link to the product so that it can open the product page. The first operation the parser performs is therefore "List of product links"; this operation uses the search link that you specified when creating the parser.
Operation types:
- List of product links. Used to get product links from search results.
- Product card. Used to get product information. This operation can get the product name, manufacturer's SKU, model, warranty, manufacturer's name, photos, video reviews and other information from the site.
- Product attributes. Used to get product attributes.
Description of the grid columns for setting up the parser
- Operation selector. Marks the main selector used to get data from the site for the operation.
- Field name. The name of the operation or field to store data in.
- Selector #1-4. The Elbuz parser uses CSS selectors (site styles) or XPath (query language for site elements) to receive data from site pages. The selector fields specify the conditions for finding the blocks you need on the site and getting information from them.
- Link for testing. A link to a site page used to test data acquisition. Each operation gets a link to its own kind of page. For example, for the "List of product links" operation, specify a link to the list of products that the site returned when searching for your text (the product name). To test getting product data for the "Product card" operation, specify a link to a product.
- Text to clean up. Keywords to remove from the received data. For example, if the product name on the site contains extra text that you do not need, enter that text in the "Text to clean up" field to remove it.
- The text on the page to move to the next operation. When a product search starts, the "List of product links" operation runs first to get product links from the search results. Some sites, however, open a product card immediately instead of a list of found products. The program still waits for a list of links, and if it does not find one, it concludes that there is no description for the product. This column solves the problem: for the main selector, enter text that appears only in the product card. If the program finds this text, it moves on to the next operation, "Product card", and downloads the photos, attributes and description.
- Note. A note for a settings row; for example, a reminder of what the setting means.
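To make the columns more concrete, here is a rough Python sketch of how one settings row, the "Text to clean up" column and the "text on the page" check might behave. It is not Elbuz's implementation: the SettingRow class, its field names, both functions and the sample texts are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SettingRow:
    """One row of the settings grid (hypothetical field names)."""
    field_name: str
    selectors: list[str] = field(default_factory=list)      # Selector #1-4
    test_link: str = ""                                      # Link for testing
    text_to_clean: list[str] = field(default_factory=list)  # Text to clean up
    next_operation_text: str = ""  # text found only on a product card
    note: str = ""


def clean_value(value: str, keywords: list[str]) -> str:
    """Remove every keyword from the value and collapse leftover spaces."""
    for keyword in keywords:
        value = value.replace(keyword, "")
    return " ".join(value.split())


def choose_operation(page_text: str, row: SettingRow) -> str:
    """Pick the operation to run for the page the search returned."""
    if row.next_operation_text and row.next_operation_text in page_text:
        # The site opened a product card instead of a list of results.
        return "Product card"
    return "List of product links"


row = SettingRow(
    field_name="List of product links",
    selectors=[".tile a"],
    text_to_clean=["Buy", "cheap"],
    next_operation_text="Add to cart",
)
print(clean_value("Buy Kettle Bosch TWK 3A011 cheap", row.text_to_clean))
print(choose_operation("Kettle Bosch ... Add to cart", row))
print(choose_operation("Found 28 products", row))
```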
Stage number 1. Getting a list of product links from search results
To get a list of product links, you need to find the link selector on the search results page. To do this, paste the link to the search results into the "Link for testing" field and click the "T" button.
The Download Testing tab opens and displays the page at the link you specified; it should show search results with a list of products. The parser's results are displayed on the left. If the parser is configured correctly, you will see the list of product links there.
Next, find the product link selector. Right-click the name of any product in the search results and select "View code"; the browser opens a panel with the source code of the page. You can dock it wherever you like, for example on the left or at the bottom of the screen.
If you need more screen space to look for the product link selector, you can also open the link in a separate browser tab and do the same there.
We are looking for blocks of products and a link in them
Your task is to find the product blocks and the product links in the search results. After you select "View code", the browser shows the source code at the place where you right-clicked. In this example we clicked a product name, and we can see that the product links are located in "div" and "a" tags.
That is, each product in the search results is a "div" block that contains an "a" link, and the "div" block has the style name tile (class="tile"). Every product in the list shares this style, so we will use it to get the link to each product.
Write the selector in the parser settings in this form: the style name prefixed with a dot, then the "a" tag separated by a space, that is, .tile a
To check the result, click the "T" button. In this example we received 28 product links, so the parser can already find your products on a third-party site.
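For readers who want to see what such a selector does, below is a minimal Python sketch that collects links placed inside "div" blocks with the tile style. It uses only the standard library and merely imitates the selector for "div" blocks; it is not how Elbuz evaluates selectors. The class TileLinkParser, the function name and the sample page are invented for the example.

```python
from html.parser import HTMLParser


class TileLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside <div class="tile">."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._div_stack = []  # one flag per open <div>: is it a tile block?

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "div":
            classes = (attributes.get("class") or "").split()
            self._div_stack.append("tile" in classes)
        elif tag == "a" and any(self._div_stack) and attributes.get("href"):
            self.links.append(attributes["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self._div_stack:
            self._div_stack.pop()


def extract_tile_links(html: str) -> list[str]:
    parser = TileLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


SEARCH_PAGE = """
<div class="results">
  <div class="tile"><a href="/product/1/">Phone A</a></div>
  <div class="tile"><a href="/product/2/">Phone B</a></div>
  <div class="banner"><a href="/promo/">Sale</a></div>
</div>
"""
print(extract_tile_links(SEARCH_PAGE))
```

The link inside the banner block is skipped because none of its enclosing blocks carries the tile style.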
Stage number 2. Getting data from the product card.
As with the product link selector, you need to find selectors for the fields you need in the product card. To do this, enter a link to a test product in the "Link for testing" field and open it.
Right-click the product name and select "View code"; the browser opens a panel with the source code of the page.
For example, the product name is in the h1 tag
Let's write the selector h1 in the settings table
Next, we are looking for a selector for the description of the product
Write the selector like this
div[itemprop="description"]
For links to photos, use this selector
div.image img::attr(src)
Checking the result
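The three selectors from this stage can be tried on a small page with Python's standard library. Since the standard library has no CSS selector engine, the sketch uses approximate XPath counterparts of the selectors above; the page content and the helper text_of are made up, and this is not the Elbuz parser itself.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed product card.
CARD = """
<html><body>
  <h1>Electric kettle X100</h1>
  <div class="image"><img src="/img/x100-front.jpg"/><img src="/img/x100-side.jpg"/></div>
  <div itemprop="description">Compact <b>1.7 l</b> kettle.</div>
</body></html>
"""


def text_of(element) -> str:
    """Join all nested text of an element and collapse whitespace."""
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


root = ET.fromstring(CARD.strip())
product = {
    # CSS: h1
    "name": text_of(root.find(".//h1")),
    # CSS: div[itemprop="description"]
    "description": text_of(root.find(".//div[@itemprop='description']")),
    # CSS: div.image img::attr(src)
    "photos": [img.get("src") for img in root.findall(".//div[@class='image']//img")],
}
print(product)
```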
Stage number 3. Getting product attributes.
To get product attributes, you must specify a selector for the entire attribute block (table) and a row selector that contains the attribute name and value.
Procedure:
- In the "Selector No. 1" field, specify the selector for the attribute block
- In the "Selector No. 2" field, specify the selector for the block that contains the name and value of the attribute (that is, for the row of the attribute table)
- In the "Attribute name" field, specify the selector where the attribute name is located
- In the "Attribute value" field, specify the selector where the attribute value is located
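The procedure above (block, then row, then name and value) can be sketched as a nested lookup. The example is an illustration in Python with standard-library XPath, not Elbuz code; the table markup, the class names specs, name and value, and the function parse_attributes are assumptions.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed page with an attribute table.
PAGE = """
<html><body>
  <table class="specs">
    <tr><td class="name">Color</td><td class="value">Black</td></tr>
    <tr><td class="name">Weight</td><td class="value">180 g</td></tr>
  </table>
</body></html>
"""


def parse_attributes(page, block_path, row_path, name_path, value_path):
    """Return {attribute name: attribute value} found in the attribute block."""
    root = ET.fromstring(page.strip())
    attributes = {}
    for block in root.findall(block_path):      # Selector No. 1: attribute block
        for row in block.findall(row_path):     # Selector No. 2: one table row
            name = row.find(name_path)          # where the attribute name is
            value = row.find(value_path)        # where the attribute value is
            if name is None or value is None:
                continue  # skip rows such as section headers
            key = "".join(name.itertext()).strip()
            attributes[key] = "".join(value.itertext()).strip()
    return attributes


result = parse_attributes(
    PAGE,
    ".//table[@class='specs']",
    "tr",
    "td[@class='name']",
    "td[@class='value']",
)
print(result)
```

Searching for the name and value inside each row, rather than across the whole page, keeps every name paired with its own value.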
Setting example
An example of customization based on the source code of the site
The result of checking the receipt of product attributes (characteristics, properties)