Blog / Archive
Web scraping in affiliate marketing - how to download a website and adapt it to your needs?
If you've ever wondered how to download an entire website, you're probably familiar with the term web scraping.
What is web scraping?
Web scraping means downloading copies of websites to a computer. The technology is used not only to download entire websites, but also to extract specific data of interest from a given portal. The process is carried out by bots, crawlers, or scripts written in languages such as Python. During scraping, specific data is collected and copied from the web into a local database.
Web scraping - what's its use?
You already know what web scraping is, and you can probably guess how it can be used. Web scraping is a key method for businesses and analysts who want to dig into and understand complex data sets from various online sources. The process allows information to be collected automatically from selected websites and gathered for detailed analysis. Regardless of the type of data - numbers, text, images, or other content - web scraping aggregates it in one place, providing a better understanding of trends, relationships, and dependencies.
For instance, companies can use web scraping to analyze customer opinions from product or service reviews on various platforms. This helps in identifying patterns related to customer satisfaction levels and areas that need improvement. Meanwhile, market analysis firms can collect data on product and service prices, sales volumes, and consumer trends, aiding in price strategy formulation and marketing action planning.
Thanks to web scraping, analysts can also conduct studies on user behaviors on websites, analyzing aspects like navigation, interactions, and time spent on individual pages. This can assist in optimizing the user interface, enhancing user experience, and pinpointing areas that require further refinements.
In medicine and scientific research, web scraping can be used to gather data from scientific publications, clinical trials, or medical websites. This enables the analysis of health trends, the examination of therapy effectiveness, or the identification of new discoveries.
In summary, web scraping as a data collection tool for analysis opens doors to a deeper understanding of phenomena, relationships, and trends across various fields. However, it's crucial to remember the ethical and legal aspects of web scraping and to exercise caution and adhere to rules that govern access to both public and private data.
Web scraping in affiliate marketing
How does web scraping relate to affiliate marketing? Let's start with the biggest argument for getting interested in web scraping: the time you save by downloading competitors' websites. Everyone knows, or at least guesses, that creating a good landing page can be time-consuming, and that success depends, among other things, on time. Other factors are openness to a change of approach, searching for new campaigns, running tests and, of course, analyzing ads. Success goes to those who do not get stuck on trifles but look for ways to scale. To run a single campaign, you need to do a lot of research on the target group, GEO selection, offers, etc., as well as prepare consumables, including a landing page.
Some people prefer to use landing pages provided by the affiliate network, others use ready-made templates from page builders, and still others prefer to create a landing page from scratch. The first two options are the most common. In some cases they can be profitable, but this is not a long-term solution: competition is fierce, and the pool of available templates depletes quickly.
A high-quality landing page is the key to future success and a good return on investment. It is worth adding that not every landing page from a competitor can bring the expected result. It is better to fine-tune the desired landing page, taking into account the criteria of the future advertising campaign.
Of course, you have to remember to do everything legally, i.e. according to certain rules, which you will learn about in a moment.
Is web scraping legal?
Yes. Web scraping is not a prohibited technology, and companies using it do so legally. Unfortunately, there will always be someone who uses a given tool for piracy. Web scraping can be used for unfair price undercutting and for stealing copyrighted content. Clearly, the owner of a website that is being scraped can suffer huge financial losses. Interestingly, web scraping has been used by several foreign companies to save Instagram and Facebook stories that were supposed to be time-limited.
Scraping is fine as long as you respect copyright and stick to the established standards. If you decide to switch to the darker side, which is not accepted at MyLead, you may face various consequences.
Some good practices when scraping websites
Remember about the GDPR
When it comes to EU countries, you must comply with the EU data protection regulation, commonly known as the GDPR. If you aren't scraping personal data, you don't need to worry too much about it. Let us remind you that personal data is any data that can identify a person, for example:
- first and last name,
- email,
- phone number,
- address,
- username (e.g. login / nickname),
- IP address,
- information about the credit or debit card number,
- medical or biometric data.
To scrape personal data, you need a legal basis for storing it. Examples of such bases include:
1. Legitimate interest
It must be proved that the data processing is necessary for the purposes of your legitimate business interests. However, this does not apply where those interests are overridden by the interests or fundamental rights and freedoms of the person whose data you want to process.
2. Customer consent
Each person whose data you want to collect must consent to the collection, storage and use of their data in the way you intend to do so, e.g. for marketing purposes.
If you have neither a legitimate interest nor customer consent, you are violating the GDPR, which may result in a fine, restriction of liberty, or even imprisonment for up to two years.
Attention!
The GDPR protects people in the European Union, so it does not cover the data of residents of countries such as the United States, Japan, or Afghanistan. Note, however, that it applies to anyone processing the data of people in the EU, regardless of where the processor is based.
Comply with copyright
Copyright is the exclusive right to any work done, for example an article, photo, video, piece of music, etc. You can guess that copyright is very important in web scraping, because a lot of data on the internet is copyrighted. Of course, there are exceptions in which you can scrape and use data without violating copyright laws, and these are:
- usage for allowed personal or public use,
- usage for didactic purposes or for scientific activity,
- usage under the right to quote.
Web scraping - where to start?
1. URL
The first step is to find the URL of the page you are interested in. Specify the topic you want to choose. You are only limited by your imagination and data sources.
2. HTML code
Learn the structure of the HTML code. Without knowing HTML, you will have a hard time finding the element you want to download from your competitor's website. The best way is to right-click the element in the browser and use the Inspect option. You will then see the HTML tags and be able to identify the element of interest. Here's an example of this on Wikipedia:
As you can see, when you hover the mouse over a given line of code, the element corresponding to this line of code is highlighted on the page.
3. Work environment
Your work environment should be ready. You'll find out later that you'll need a text editor such as Visual Studio Code, Notepad++ (Windows), TextEdit (macOS), or Sublime Text, so get one now.
Libraries for web scraping - how to save a web page?
Web scraping libraries are organized collections of scripts and functions written in specific programming languages that assist in the automatic retrieval of data from websites. They allow developers to easily analyze, filter, and extract content from the HTML or XML code of web pages. With them, instead of writing every function manually, developers can use ready-made, optimized solutions for searching, navigating, and manipulating the structure of websites.
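Before looking at the individual libraries, the core task they all automate can be sketched with nothing but the Python standard library: fetch a page and keep a local copy. This is a minimal illustration, not a production scraper; the function name is made up for the example.

```python
# Minimal sketch of the core scraping task using only the Python
# standard library: download a page and save a local copy of its HTML.
from urllib.request import urlopen
from pathlib import Path

def save_page(url: str, out_path: str) -> str:
    """Download `url` and write its HTML to `out_path`; return the HTML."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    Path(out_path).write_text(html, encoding="utf-8")
    return html
```

Libraries such as Beautiful Soup or Scrapy take over where this sketch stops: once the HTML is downloaded, they parse it and pull out the specific elements you care about.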
Simple HTML DOM Parser - A PHP Library
This is a tool for PHP developers that facilitates manipulation of and interaction with HTML code. It allows you to search, modify, or extract specific sections of HTML in an easy and intuitive manner.
Beautiful Soup - A Python Library
Beautiful Soup is a Python library designed for parsing HTML and XML documents. It has been crafted to easily navigate, search, and modify the DOM tree while providing intuitive interfaces for extracting data from web pages.
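To give a feel for how Beautiful Soup works, here is a small sketch that parses an HTML fragment and pulls out a heading and the link targets. The HTML below is invented for the example.

```python
# A small Beautiful Soup sketch: parse an HTML fragment and extract
# the heading text and the href of every link. The HTML is made up.
from bs4 import BeautifulSoup

html = """
<html><body>
  <h1>Product reviews</h1>
  <a href="/review/1">Great phone</a>
  <a href="/review/2">Decent laptop</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
heading = soup.find("h1").get_text()             # the <h1> text
links = [a["href"] for a in soup.find_all("a")]  # all link targets
```

In a real scraper, the `html` string would come from a downloaded page rather than a literal, but the navigation and extraction calls stay the same.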
Scrapy - A Python Library
Scrapy is a powerful library and framework for web scraping in Python. It enables the creation of specialized bots (spiders) that can scan pages, follow links, extract the necessary information, and save it in the desired formats. Scrapy is perfect for more complex applications that require deep crawling of a site or interaction with forms and other page elements.
Saving the page by the browser
In any browser, anyone, including you, can save a selected page to their computer in just a few minutes. A duplicate of the page is saved on the user's computer as an HTML file plus a folder of assets. The copy opens in the browser and looks fairly close to the original. However, to save a really large site, the process has to be repeated for every page.
You can also order web scraping from paid third parties. There are many companies and freelancers on the internet who will do everything for you for a fee. One such website-copying service is ProWebScraper. It offers a trial version that lets you download 100 pages; after that, of course, you have to pay. Plans start from $40 a month, depending on how many pages you want to scrape. You can always find another site with a free trial period. It is worth mentioning that some portals let you check whether a given page can be copied, because many sites protect themselves against this.
More user-friendly tools for beginners
Not everyone wanting to delve into web scraping is an experienced developer. For those seeking less technical, more intuitive solutions, there are tools specifically designed for ease of use. With visual interfaces and simple mechanisms, the following software allows for efficient data collection from websites without the need for coding.
ZennoPoster
ZennoPoster is an automation and web scraping tool that caters more to those who aren't necessarily programming experts. Its user-friendly visual interface allows for the creation of scraping scripts and other automated browser tasks.
Price: The tool is priced at $37 per month, but it has a 14-day trial period.
Browser Automation Studio
BAS is another user-friendly browser automation and web scraping program. It comes with built-in script creation tools that allow for data extraction, webpage navigation, and many other functions without any programming knowledge.
Price: The tool is free.
Octoparse
Octoparse is a web scraping application that allows for collecting vast amounts of data from websites effortlessly. With its visual interface, users can specify what data they want to collect, and Octoparse handles the rest.
Price: While one version of this tool is available for free, it has certain restrictions. In the free version, users can have a maximum of 10 tasks stored in their account. All tasks can only be run on local devices using the user's IP. Data export in the free plan is limited to 10,000 rows per export, even though the tool allows for unlimited webpage scans in a single run. It can also be used on any number of devices. However, technical support in this version is limited. Paid versions start at $75 per month.
import.io
import.io is a cloud-based web scraping tool that facilitates creating and running scripts to extract data from websites. It also offers features that automatically structure the collected data and convert it into useful formats like Excel or JSON.
Price: The tool offers a free demo, but the pricing for paid packages starts at $399 per month.
Online web scraping services
Online web scraping services work like parsers (component analyzers), but their main advantage is that they run online, without downloading and installing a program on your computer. The principle behind such websites is quite simple: you enter the URL of the page you are interested in, set the necessary options (you can copy the mobile version of the page, rename all files, and save the HTML, CSS, JavaScript, and fonts), and download the archive. With such a service, a webmaster can save any landing page and then apply their own format and necessary corrections.
Save a Web 2 ZIP
Save a Web 2 ZIP is the most popular website for web scraping via a browser service. Its very simple, well-thought-out design attracts and inspires confidence, and everything is completely free. All you need to do is provide the link to the page you want to copy, choose the options you want, and you're done.
LPcopier
LPcopier is a Russian service aimed at the affiliate marketing world. The portal offers scraping from about $5 per page. Additional services, such as installing analytics counters, are priced separately. You can also order a custom landing page that does not come from a CPA network, or one based on an existing landing page. If Russian puts you off, just use the translation option that Google offers.
Xdan
The Xdan website is also a Russian website (available in English) offering CopySite, i.e. web scraping services. With the help of this website, you can create a local copy of a landing page for free with the option of cleaning HTML counters, replacing links or domains.
Copysta
The Russian Copysta service is one of the fastest of its kind. They declare that they will contact you within 15 minutes. The web scraping itself is ordered via a link, and for an additional fee you can have the website updated.
I downloaded the website. What's next?
Have you already downloaded a website? Great, now you have to think about what you want to do with it. You certainly want to modify it a bit. How?
How to redesign a copied page?
To redesign the copied page for your own needs, you need to edit the downloaded files. To make changes to the structure, you can use any editor that lets you work with code, such as Visual Studio Code, Notepad++ (Windows), TextEdit (macOS), or Sublime Text. Open the editor of your choice, customize the code, then save it and see how the changes are displayed in the browser. Edit the visual appearance of HTML tags with CSS; add web forms, action buttons, links, etc. After saving, the modified file remains on your computer with the updated functions, layout, and targeted actions.
There are also services that collect and analyze design data from web archives and restore it into a website creation and management system (CMS). The system creates a duplicate of the project, complete with an admin panel and disk space. Archivarix is an example of such a service (the program can restore and archive projects).
Uploading websites to hosting
The last and most important step in web scraping landing pages is uploading them to your hosting. Remember that copying a page and making small visual changes is not enough. Other people's affiliate links, scripts, retargeting pixels, Metrica JS codes, and other counters almost always remain in the page's code. They must be removed manually (or with paid programs) before uploading to your hosting. If you want to know exactly how to upload your website to hosting, check out our article: “How to create a landing page? Creating a website step by step”.
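Part of this cleanup can be automated. The sketch below uses Beautiful Soup (assuming you have it installed) to strip scripts and tracking pixels from a downloaded page and point every link at your own URL; the function name and URLs are placeholders, and a manual review afterwards is still essential.

```python
# A sketch of cleaning a downloaded landing page before re-uploading:
# strip scripts and 1x1 tracking pixels, and rewrite every outbound
# link to your own (placeholder) affiliate URL.
from bs4 import BeautifulSoup

def clean_page(html: str, my_link: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    # Remove scripts, noscript blocks, and iframes (counters, metrics,
    # someone else's tracking code often hide here).
    for tag in soup.find_all(["script", "noscript", "iframe"]):
        tag.decompose()
    # Remove 1x1 tracking-pixel images.
    for img in soup.find_all("img"):
        if img.get("width") == "1" and img.get("height") == "1":
            img.decompose()
    # Point every link at your own affiliate URL.
    for a in soup.find_all("a"):
        a["href"] = my_link
    return str(soup)
```

Automated passes like this miss things (inline event handlers, CSS-loaded beacons), so treat the result as a starting point and inspect the cleaned code by hand before uploading.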
How to defend against web scraping?
Protection against web scraping is essential for maintaining the privacy and security of your website and its data. There are several effective methods you can apply to minimize the risk of web scraping attacks.
- Robots.txt - Using the robots.txt file is a standard way to communicate with search robots. You can specify which parts of your site should be crawled and which should not. Although honest bots usually follow these guidelines, it's worth noting that this file doesn't guarantee protection against all scraping bots.
- .htaccess - Through the .htaccess file, you can block access for specific User Agents that may be used by bots. It's one way to deny unwanted bots access to your website.
- CSRF (Cross-Site Request Forgery) tokens - The CSRF mechanism can be used to secure forms and interactions with your site against automatic scraping. This might involve using CSRF tokens in forms.
- IP Address Filtering - You can limit access to your website to certain IP addresses only, which can help minimize web scraping attacks.
- CAPTCHA - Adding CAPTCHA to forms and interactions can make it difficult for bots to interact with your site automatically. It's one of the most popular defenses against automatic scraping.
- Rate Limiting with mod_qos on Apache servers - Setting limits on the number of requests from a single IP address within a specified time can limit the possibility of automatically downloading large amounts of data in a short time.
- Scrapshield - The Scrapshield service offered by Cloudflare is an advanced tool for detecting and blocking web scraping, which can help protect your site.
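As a concrete illustration of the first point, a minimal robots.txt might look like this (the bot name `BadScraperBot` and the `/private/` path are made up for the example):

```
# Ask one specific, hypothetical scraper bot to stay away entirely.
User-agent: BadScraperBot
Disallow: /

# Allow everyone else, except for one private directory.
User-agent: *
Disallow: /private/
```

Remember that this file is a request, not an enforcement mechanism: well-behaved crawlers honor it, while malicious scrapers simply ignore it, which is why the other measures on the list still matter.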
If you've ever noticed that your landing page has fallen victim to web scraping techniques, there is a way to redirect some of the traffic back to your page.
On the Afflift forum, you will find a simple JavaScript snippet. Place it on your page to protect yourself from losing all your traffic in case of web scraping.
The code can be found in THIS THREAD.
Good to see you here!
We hope that you already know what web scraping is, how to download a web page, and, most importantly, how to comply with copyright laws. Now it's your turn to make your move and start earning. However, if you have any questions about affiliate marketing or you do not know which program to choose, please contact us.