Saturday, 29 June 2013

Data Management Services

Recent studies show that almost any business activity generates astonishingly large volumes of data, so that data has to be organized well and be easy to retrieve when the need arises. Timely and accurate information is important for efficiency in any business activity. With the rise of professional outsourcing and data management companies, many services are now offered to match the various kinds of data collected and the various business activities involved. This article looks at some of the benefits offered by professional data mining companies.

Data entry

These services are significant because they convert data into the required digitized format. Some of this data originates as handwritten material, and printed paper documents or text are unlikely to exist in the electronic formats needed. The best example in this context is books that need to be converted to e-books. Insurance companies also depend on this process to handle insurance claims, and the same applies to law firms that need support to analyze and process legal documents.

EDC

EDC stands for electronic data capture. This method is mostly used by clinical researchers and related organizations in the medical field. Electronic data capture methods are used to manage trials and research. Data mining and data management services are provided for the databases set up for such studies. The information can easily be captured, surveys taken, and other services carried out.

Data conversion

This is the process of converting data found in one format to another. The process often involves extracting data from an existing system, formatting it, and cleansing it, after which it can be loaded so that information is both more available and easier to retrieve. The process requires extensive testing. The services offered by data mining companies include SGML conversion, XML conversion, CAD conversion, HTML conversion, and image conversion.
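As a rough illustration of converting data from one format to another, the sketch below turns CSV text into XML using only the Python standard library. The function name `csv_to_xml`, the tag names, and the sample data are assumptions made for this example; it also assumes every CSV column name is a valid XML tag name.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="records", row_tag="record"):
    """Convert CSV text to an XML string, trimming whitespace from each field."""
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, row_tag)
        for field, value in row.items():
            # Light cleansing step: strip stray whitespace around names and values.
            child = ET.SubElement(record, field.strip())
            child.text = (value or "").strip()
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data with untidy spacing.
sample = "name,city\n Alice , Boston\nBob,Denver \n"
print(csv_to_xml(sample))
```

A real conversion job would also need to handle malformed rows and character encodings, which is where the extensive testing mentioned above comes in.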

Managing data service

This service involves the conversion of documents, where the characters of a text may need to be converted from one form to another. For example, image, video or audio files can easily be changed to other formats so that they can be played or displayed by other software applications. These services are mostly offered in indexing and scanning.

Data extraction and cleansing

Extraction firms use this kind of service to pull significant information and patterns from huge databases and websites. The harvested data should be usable and should be cleansed to improve its quality. Data mining organizations offer both manual and automated data cleansing services. This helps to ensure the accuracy, completeness and integrity of the data. Keep in mind, too, that data mining alone is never enough.
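An automated cleansing pass of the kind described above might look like the following sketch. The field names `name` and `email`, the choice of email as the duplicate key, and the sample records are all assumptions made for illustration.

```python
def cleanse(records, required=("name", "email")):
    """Return records with trimmed fields, no incomplete rows, and no duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        # Accuracy: trim whitespace in every field and lowercase the email.
        row = {key: str(value).strip() for key, value in record.items()}
        row["email"] = row.get("email", "").lower()
        # Completeness: skip rows missing any required field.
        if any(not row.get(field) for field in required):
            continue
        # Integrity: keep only the first record seen for each email address.
        if row["email"] in seen:
            continue
        seen.add(row["email"])
        cleaned.append(row)
    return cleaned


# Hypothetical harvested records: one duplicate and one incomplete row.
raw = [
    {"name": " Ann ", "email": "ANN@example.com"},
    {"name": "Ann", "email": "ann@example.com "},
    {"name": "", "email": "bob@example.com"},
]
print(cleanse(raw))
```

Manual cleansing would still be needed for cases a rule like this cannot judge, such as two different spellings of the same person's name.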

Web scraping, data extraction services, web extraction, imaging, catalog conversion, web data mining and others are among the other management services offered by data mining organizations. If your business needs such services, web scraping and data mining can be of great significance.


Source: http://ezinearticles.com/?Data-Management-Services&id=7131758

Thursday, 27 June 2013

Data Extraction Services For Better Outputs in Your Business

Data extraction can be defined as the process of retrieving data from an unstructured source in order to process it further or store it. It is very useful for large organizations that deal with large amounts of data on a daily basis, data that needs to be processed into meaningful information and stored for later use. Data extraction is a systematic way to extract and structure data from scattered and semi-structured electronic documents, as found on the web and in various data warehouses.

In today's highly competitive business world, vital business information such as customer statistics, competitors' operational figures and inter-company sales figures plays an important role in making strategic decisions. By signing on with a data extraction service provider, you get access to critical data from various sources such as websites, databases, images and documents.

It can help you take strategic business decisions that shape your business goals. Whether you need customer information, insights into your competitors' operations, or a picture of your organization's performance, it is critical to have data at your fingertips as and when you want it. Your company may be weighed down by tons of data, and it may prove a headache to control that data and convert it into useful information. Data extraction services enable you to get data quickly and in the right format.

A few areas where data extraction can help you are:

    Capturing financial data
    Generating better sales leads
    Conducting market research, survey and analysis
    Conducting product research and analysis
    Tracking, extracting and harvesting product pricing data
    Searching for specific job postings
    Duplicating an online database
    Acquiring real estate data
    Processing auction information
    Searching online newspapers for latest pricing information
    Extracting and summarizing news stories from online news sources

Outsourcing companies provide data extraction services customized to the client's requirements. The different types of data extraction services are:

    Web extraction
    Database extraction

Outsourcing is a beneficial option for large organizations seeking to manage large amounts of information. Outsourcing these services helps businesses manage their data effectively, which in turn enables them to increase their profits. By outsourcing, you can increase your competitive edge and save costs too!



Source: http://ezinearticles.com/?Data-Extraction-Services-For-Better-Outputs-in-Your-Business&id=2760257

Tuesday, 25 June 2013

Data Recovery: Beginners Tips

Right now you are probably in a lot of mental pain, and all you're concerned about is recovering your data as quickly as possible - so we'll refrain from comments on the wisdom of regular backups. The time for preventative measures has gone - the issue at hand is data recovery.

First - a simple tip could save you a lot of money. Take out your rolodex and get hold of your tech-savvy friends. If you're in luck, they'll offer to help, and if you're really lucky, they might even have some disk recovery software.

If you're out of luck, then get your wallet or purse out now... because this is going to cost you. Also, be prepared to lose a lot of time - data recovery can take a long time.

The first thing to establish is what exactly is wrong with your hard disk:

    Either your computer won't boot up, or
    Your computer boots up OK but you can't see one of your other drives.

Let's see if we can eliminate the worst scenario. Listen closely to your hard drive - is it making any sort of weird noise, such as scratching, scraping, ticking etc?

If so, then your drive is physically damaged and the only hope that you have is to take it to a data recovery service where experts might be able to get your data off for you. These services are expensive and time consuming - so you need to make a judgement call as to the value of data on the disk:

    If it's only your saved game data or downloaded music files you would like back, you're probably better off kicking yourself for not backing up, and accepting the data loss.
    If, on the other hand, it's a book or other type of information product that you've been working on for years, then send it to a data recovery service for an evaluation and quote - it usually costs nothing.


If your hard disk sounds OK, then you stand a decent chance of recovering data yourself.

First you'll need to download some software to help you out.

Unfortunately, the better software utilities are not free, but the good news is that many allow you to try them out to see whether they can access the data. There are some freeware versions available, but generally speaking these are either not easy to use (no user interface, little documentation) or not very effective.

There's a list of recommended software on our site - http://www.recoverdatafiles.com - compare the different options then download a few of the trial versions.

Your next steps will be based on how your hard drive(s) were set up:

    If you only have a single hard drive that has not been partitioned or split into different "logical" drives, you'll probably need to attach the hard drive to another computer that has enough space to store all your data. This can be quite technical so if you don't have the skills please get a computer savvy friend to help out. Another option is to purchase an external USB hard drive case. You can then simply slot the hard drive into the case and plug it into another PC using a USB port.
    If you have a multiple drive setup and your computer boots up fine, then it will merely be a case of getting the downloaded software to read the files and then copy them to another drive - provided you have a drive with enough space on it. If not, you'll need to attach the hard drive to another machine with enough spare capacity.
    The scenario where you have a multiple drive setup and the problem drive is the one that contains your operating system files is trickier. Look for a data recovery software package that has a boot disk option available. What this means is that when you start your computer with the boot disk in it, it will automatically run the data recovery program without trying to start Windows. You should be able to see your files and then copy them across to another drive.

Hopefully these tips will enable you to get all your important files back.

Once you've had some time to recover, please take a look at the various articles on our website - our goal is to make it one of the best resources on data recovery.


Source: http://ezinearticles.com/?Data-Recovery:-Beginners-Tips&id=59035

Saturday, 22 June 2013

How Web Data Extraction Services Will Save Your Time and Money by Automatic Data Collection

Data scraping is the process of extracting data from the web using a software program, from proven websites only. Anyone can use the extracted data for any purpose they wish, in various industries, since the web holds much of the world's important data. We provide the best web data extraction software. We have expertise and one-of-a-kind knowledge in web data extraction, image scraping, screen scraping, email extraction services, data mining, and web grabbing.

Who can use Data Scraping Services?

Data scraping and extraction services can be used by any organization, company, or firm that wants data from a particular industry, data about targeted customers or a particular company, or anything else available on the web, such as email IDs, website names, or search terms. Most often, a marketing company uses data scraping and data extraction services to market a particular product in a certain industry and to reach targeted customers. For example, if company X wants to contact restaurants in a California city, our software can extract the data for restaurants in that city, and a marketing company can use this data to market its restaurant-related products. MLM and network marketing companies also use data extraction and data scraping services to find new customers: they extract data on prospective customers and then contact them by telephone, postcard, or email marketing. In this way they build a large network and a large group for their own product and company.

We have helped many companies find particular data according to their needs, for example:

Web Data Extraction

Web pages are built using text-based mark-up languages (HTML and XHTML), and frequently contain a wealth of useful data in text form. However, most web pages are designed for human end-users and not for ease of automated use. Because of this, tool kits that scrape web content were created. A web scraper is an API to extract data from a web site. We help you create an API of this kind that scrapes data according to your needs. We provide quality, affordable web data extraction applications.
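To make the idea concrete, here is a minimal sketch of pulling structured data out of HTML mark-up with Python's standard `html.parser` module. The class name `LinkExtractor`, the helper `extract_links`, and the sample page are assumptions for this example, not part of any particular scraping product.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        # Start collecting text when an anchor with an href opens.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Emit one record per closed anchor.
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page fragment written for human readers.
page = '<ul><li><a href="/a">First <b>item</b></a></li><li><a href="/b">Second</a></li></ul>'
print(extract_links(page))
```

The mark-up was written for display, so the scraper has to rebuild the structure (here, a list of link records) that the page never exposes directly.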

Data Collection

Normally, data transfer between programs is accomplished using data structures suited for automated processing by computers, not people. Such interchange formats and protocols are typically rigidly structured, well-documented, easily parsed, and keep ambiguity to a minimum. Very often, these transmissions are not human-readable at all. That's why the key element that distinguishes data scraping from regular parsing is that the output being scraped was intended for display to an end-user.

Email Extractor

An email extractor is a tool that automatically extracts email IDs from any reliable source. It basically serves to collect business contacts from various web pages, HTML files, text files or any other format, without duplicate email IDs.
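The core of such a tool can be sketched with a regular expression plus a set for removing duplicates. The pattern below is a deliberately simple assumption that covers common addresses only, and the function name and sample text are made up for this example.

```python
import re

# Simplified pattern: it matches common addresses, not every valid one.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return unique email addresses in order of first appearance."""
    seen = set()
    result = []
    for match in EMAIL_PATTERN.findall(text):
        email = match.lower()  # treat addresses as case-insensitive
        if email not in seen:
            seen.add(email)
            result.append(email)
    return result


# Hypothetical text scraped from a contact page.
text = "Contact sales@example.com or Sales@Example.com; support: help@example.org."
print(extract_emails(text))
```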

Screen scraping

Screen scraping refers to the practice of reading text information from a computer display terminal's screen and collecting visual data from a source, instead of parsing data as in web scraping.

Data Mining Services

Data mining is the process of extracting patterns from information. It is becoming an increasingly important tool for transforming data into information. Results can be provided in any format, including MS Excel, CSV, HTML and others, according to your requirements.

Web spider

A Web spider is a computer program that browses the World Wide Web in a methodical, automated manner. Many sites, in particular search engines, use spidering as a means of providing up-to-date data.
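The "methodical" part of a spider is essentially a graph traversal that never visits the same page twice. The sketch below shows a breadth-first version; to keep it runnable offline, it takes a caller-supplied `get_links` function and runs against a small in-memory site, both of which are assumptions for this example rather than a real crawler.

```python
from collections import deque


def crawl(start, get_links, max_pages=100):
    """Visit pages breadth-first, each at most once, and return the visit order."""
    visited = []
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            # Queue each newly discovered page exactly once.
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical site: each page maps to the pages it links to.
site = {
    "/": ["/about", "/news"],
    "/about": ["/"],
    "/news": ["/news/1", "/about"],
    "/news/1": [],
}
print(crawl("/", lambda url: site.get(url, [])))
```

In a real spider, `get_links` would fetch the page over the network and parse its links, and the crawler would also need to respect rate limits and each site's robots.txt rules.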

Web Grabber

Web grabber is just another name for data scraping or data extraction.

Web Bot

Web Bot is a software program that is claimed to be able to predict future events by tracking keywords entered on the Internet. Web bot software is the best program for pulling out articles, blogs, relevant website content and other website-related data. We have worked with many clients on data extraction, data scraping and data mining, and they are really happy with our services. We provide very high quality services and make your data work easy and automatic.


Source: http://ezinearticles.com/?How-Web-Data-Extraction-Services-Will-Save-Your-Time-and-Money-by-Automatic-Data-Collection&id=5159023

Thursday, 20 June 2013

Why Outsourcing Data Mining Services?

Are huge volumes of raw data waiting to be converted into information that you can use? Your organization's hunt for valuable information ends with valuable data mining, which can help to bring more accuracy and clarity in decision making process.

Nowadays the world is information hungry, and with the Internet offering flexible communication there is a remarkable flow of data. It is important to make the data available in a readily workable format where it can be of great help to your business. Filtered data is of considerable use to the organization, and these services help to increase profits, smooth work flow and reduce overall risk.

Data mining is a process that involves sorting through vast amounts of data and seeking out the pertinent information. In most instances data mining is conducted by professionals, business organizations and financial analysts, although many growing fields are finding benefits in using it in their business.

Data mining helps make every decision quick and feasible. The information it yields is used in decision-making for applications such as direct marketing, e-commerce, customer relationship management, healthcare, scientific tests, telecommunications, financial services and utilities.

Data mining services include:

    Collecting data from websites into an Excel database
    Searching & collecting contact information from websites
    Using software to extract data from websites
    Extracting and summarizing stories from news sources
    Gathering information about competitors' businesses

In this era of globalization, handling important data is becoming a headache for many business verticals. Outsourcing is then a profitable option for your business. Since all projects are customized to suit the exact needs of the customer, huge savings in terms of time, money and infrastructure can be realized.

Advantages of Outsourcing Data Mining Services:

    Skilled and qualified technical staff who are proficient in English
    Improved technology scalability
    Advanced infrastructure resources
    Quick turnaround time
    Cost-effective prices
    Secure Network systems to ensure data safety
    Increased market coverage

Outsourcing will help you focus on your core business operations and thus improve overall productivity, so outsourcing data mining has become a wise choice for business. Outsourcing these services helps businesses manage their data effectively, which in turn enables them to achieve higher profits.


Source: http://ezinearticles.com/?Why-Outsourcing-Data-Mining-Services?&id=3066061

Wednesday, 19 June 2013

Data Entry Services Help Your Business Flow Smoothly

A business comes into existence with the sole motive of earning profits, and a business owner will take all steps within his means to ensure that work keeps flowing smoothly and that resources are used optimally. Every division in the organization is created with the objective of catalyzing growth and not hindering the progress of the business. Hence it is important to consider each division carefully and analyze whether any further optimization can be undertaken at any level. The finance division is one of the most crucial parts of any organization. It is responsible for checking and recording each and every transaction that takes place in the day to day running of the business, through data entry services provided by professionals or in-house accounts personnel. This ensures that necessary information regarding the plans, strategies and policies of the organization is available at a moment's notice to facilitate decision-making by the senior management.

Data entry services by professionals appointed for this task play a crucial role in running a business successfully. They make a major difference to the performance standards of any business. Outsourcing data entry to a competent firm helps you optimize resources that were earlier being invested in the accounts department to take care of this crucial need of the business. Data entry services provided by experienced professionals help your business save time and money and help the organization increase the pace of regular business activities. Another competitive advantage provided by data entry services is the ready availability of accurate and authentic data at any given point, which helps to facilitate decision making for profit creation and expansion of the business. Accurate data maintained on a daily basis and transferred online to the organization helps the business keep track of each expense incurred and profit gained, thereby enabling the business to chart out the next course of action.

Data entry services are provided by professionally competent firms who hire experienced individuals to cater to the requirements of every individual client. The services are usually provided round the clock to ensure that the client does not have to wait or face delays when the data is urgently required. They are provided by vendors who have years of experience, advanced technology and software to carry out the work, and the flexibility required to accommodate the needs of the client. It is therefore a viable option for any business, whether it is a small firm or a big corporation. Data entry services, though not complex in nature, are highly time consuming, and this is the prime reason why companies outsource this service to cut down on the cost of hiring data entry professionals on the company payroll. The data entry services provided by a reputed vendor will ensure that you have highly accurate data properly accumulated for your reference, while the confidentiality of your data is also assured. Hence outsourcing data entry services might be the best option for any business in this competitive world.



Source: http://ezinearticles.com/?Data-Entry-Services-Help-Your-Business-Flow-Smoothly&id=641783

Friday, 14 June 2013

An Easy Way For Data Extraction

There are many data scraping tools available on the internet. With these tools you can download large amounts of data without any stress. Over the past decade, the internet revolution has turned the entire world into an information center. You can obtain any type of information from the internet. However, if you want particular information for one task, you need to search many websites. If you want to download all the information from those websites, you need to copy the information and paste it into your documents. That is hectic work for anyone. Scraping tools save you time and money and reduce manual work.

A web data extraction tool extracts data from the HTML pages of different websites and compares it. Every day, many new websites are hosted on the internet. It is not possible to visit them all in a single day. With a data mining tool, you can cover far more web pages than you could by hand. If you use a wide range of applications, these scraping tools are very useful to you.

Data extraction software is used to compare structured data on the internet. Many search engines will help you find a website on a particular issue. The data on different sites appears in different styles. A scraping tool helps you compare the data from different sites and structures it for your records.

Web crawler software is used to index web pages on the internet; it moves data from the internet to your hard disk. With that done, you can browse the content much faster. An important use of this tool is downloading data from the internet during off-peak hours. Downloading by hand takes a lot of time, but with this tool you can download data from the internet at a fast rate.

There is another tool for business people, called an email extractor. With this tool, you can easily collect customers' email addresses and send advertisements for your product to the targeted customers at any time. It is the best tool for building a database of customers.

There are more scraping tools available on the internet, and some reputable websites provide information about them. You can download these tools for a nominal fee.


Source: http://ezinearticles.com/?An-Easy-Way-For-Data-Extraction&id=3517104

Wednesday, 12 June 2013

Yellow Pages Data Scraping

If you are in business, you may or may not understand the importance of data scraping. Just the process itself will save you time in your business. When seeking information for making business decisions, you have to do a lot of research. This can range from making phone calls to spending many man hours online. Even with all of this hard work, you cannot be sure the information you find is the right information.

When you have access to a company like us, these worries are a thing of the past. We are a very professional and courteous web scraping company. We specialize in finding high quality information about potential customers and clients. We have unique and different methods that make us second to none in the industry of web scraping.

Beware of companies that claim to provide data scraping services that are not the real deal. Many of these companies charge a lot for very little work, and it is also low quality. We strive to separate ourselves from these unfavorable business practices.

We possess the ability to help you get the most from data scraping. We have a team of people with backgrounds and lots of experience in the web scraping industry. There is nothing we have not seen, and no task we cannot handle. This is the reason we continue to be a leader and set the standards for high quality data scraping services.

If you are looking for Yellow Pages Data Scraping, you have come to the right place. We specialize in Yellow Pages Data Scraping, and have special techniques to compile that data quickly for you. If the information is coming too fast, we know how to slow it down, and when it is not coming fast enough, we know how to speed it up.

You need to come and experience the benefits of quality web scraping and what it can do for your business. Once you see the benefits of data scraping, you will make it an important focus in your company. Doing business with us is like playing to WIN.



Source: http://thewebscraping.com/yellow-pages-data-scraping/

Monday, 10 June 2013

Yellowpages Data Scraping

While driving on a long trip this weekend, I had a bit of time to think. One topic that came to my mind was screen scraping, with a focus on APIs. It hit me: screen scraping is more of a problem with the content producer than it is with the “unauthorized scraping” application.

Screen scraping is the process of taking information that is rendered on the client, and then transforming the information in another process. Typically, the information that is obtained is later processed for filtering, saving, or making a calculation on the information. Everyone has performed some [legitimate form] of screen scraping. When you print a web page, the content is reformatted to be printed. Many of the unauthorized forms of screen scraping have involved collecting information on current gambling games [poker, etc], redirecting CAPTCHAs, and collecting airline fare/availability information.

The scrapee’s [the organization that the scraper is targeting] argument against the process is typically a claim that the tool puts an unusual demand on their service. Typically this demand does not provide them with their usual predictable probability of profit that they are used to. Another argument is that the scraper provides an unfair advantage to other users on the service. In most cases, the scrapee fights against this in legal or technical manners. A third argument is that the content is being misappropriated, or some value is being gained by the scraper and defrauded from the scrapee.

The problem I have with the fighting back against scrapers, is that it never solves the problem that the scrapers try to fix. Let’s take a few examples to go over my point: the KVS tool, TV schedules, and poker bots. The KVS tool uses [frequently updated] plugins to scrape airline sites to get accurate pricing and seat availability details. The tool is really good for people that want to get a fair bit of information on what fares are available and when. It does not provide any information that was not provided by anyone else. It just made many more queries than most people can do manually. Airlines fight against this because they make a lot of money on uninformed users. Their business model is to guarantee that their passengers are not buying up cheap seats. When an airline claims that they have a “lowest price guarantee” that typically means that they show the discount tickets for as long as possible, until they’re gone.

Another case where web scraping has caused an issue is with TV schedules. With the MythTV craze a few years ago, many open source users were using MythTV to record programs via their TV card. It’s a great technology; however, the schedule is not provided in the cable TV feed, at least not in an unencrypted manner. Users had to resort to scraping television sites for publicly available “copyrighted” schedules.

The Poker-bots are a little bit of an ethical issue. This is something that differs from the real world rules of the game. When playing poker outside of the internet, players do not have access to real-time statistic tools. Online poker providers aggressively fight against the bots. It makes sense; bots can perform the calculations a lot faster than humans can.

Service providers try to block scrapers in a few different ways. The end of the Wikipedia article lists more; this is a shortened version. Web sites try to deny/misinform scrapers in a few manners: profile the web request traffic (clients that have difficulty with cookies, and do not load JavaScript/images are big warning signs), block the requesting provider, provide “invisible false data” (honeypot-like paths on the content), etc. Application-based services [Pokerbots] are more focused on trying to look for processes that may influence the running executable, securing the internal message handling, and sometimes recording the session (also typically done on MMORPGs).
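The traffic-profiling signals listed above could be combined into a simple score, roughly as in the sketch below. Every name, weight, and threshold here is an assumption made for illustration; real services tune such rules against their own traffic, and the request-rate check is an extra signal added for this example.

```python
from dataclasses import dataclass


@dataclass
class ClientProfile:
    accepts_cookies: bool
    loaded_assets: bool          # fetched JavaScript or images at least once
    hit_honeypot: bool           # requested a path hidden from human visitors
    requests_per_minute: float


def scraper_score(profile, rate_limit=60.0):
    """Return a suspicion score; higher means more scraper-like (weights are assumed)."""
    score = 0
    if not profile.accepts_cookies:
        score += 1
    if not profile.loaded_assets:
        score += 1
    if profile.requests_per_minute > rate_limit:
        score += 1
    if profile.hit_honeypot:
        # Humans never see honeypot links, so this is the strongest signal.
        score += 3
    return score


browser = ClientProfile(True, True, False, 5)
bot = ClientProfile(False, False, True, 300)
print(scraper_score(browser), scraper_score(bot))
```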

In the three cases, my point is not to argue why the service is justified in attempting to block them, my point is that the service providers are ignoring an untapped secondary market. Those service providers have refused to address the needs of this market – or maybe just haven’t seen the market as viable, and are merely ignoring it.

If people wish to make poker bots, create a service that allows just the bots to compete against each other. The developers of these bots are [generally] interested in the technology, not so much the part about ripping-off non-bot users.

For airlines, do not try to hide your data. Open up API keys for individual users. If an individual user is trying to abuse the data to resell it, to create a Hipmunk/Kayak clone, revoke the key. Even if the individual user’s service requests don’t fit the profile, there are ways of catching this behavior. Mapmakers solved this problem a long time ago by creating trap streets. Scrapers are typically used as a last resort; they’re used to do something that the current process makes very difficult.
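The per-user key idea can be sketched as a small registry that issues keys and revokes them on abuse. The class `ApiKeyRegistry` and its methods are hypothetical names for this example; a production service would persist keys and attach usage limits to them.

```python
import secrets


class ApiKeyRegistry:
    """Issue one random key per request and allow individual keys to be revoked."""

    def __init__(self):
        self._owners = {}

    def issue(self, user):
        key = secrets.token_hex(16)  # 32 hex characters, unguessable
        self._owners[key] = user
        return key

    def revoke(self, key):
        # Revoking an unknown key is a no-op.
        self._owners.pop(key, None)

    def owner(self, key):
        """Return the user a key belongs to, or None if it is unknown or revoked."""
        return self._owners.get(key)


registry = ApiKeyRegistry()
key = registry.issue("alice")
print(registry.owner(key))   # the key identifies its user
registry.revoke(key)
print(registry.owner(key))   # revoked keys no longer resolve
```

Because every request carries a key tied to one user, abusive patterns can be traced to an account and cut off without blocking everyone else.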

Warning, more ranting: with airline sites, it’s difficult to get a very good impression of the cost differences of flying to different markets [like flying from Greensboro rather than Charlotte] or even changing tickets, so purchasing from an airline is difficult without the aid of this kind of tool. Most customers want to book a single round trip ticket, but some may have a complex itinerary that will have them leaving Charlotte, stopping over in Texas, then going to San Francisco, and then returning to Texas and flying back to where they started. That could be accomplished by purchasing separate round trip tickets, but the rules of the tickets allow such combinations to exist on a single itinerary. Why not allow your users to take advantage of these rules [without the aid of a costly customer service representative]?

People who use scrapers do not represent the majority of the service’s customers. In the case of the television schedules example, they do not profit off the information, and the content that they wished to retrieve wasn’t even motivated by profit. Luckily, an organization stepped in and provided this information at a reasonable [$25/yr] cost. The organization is SchedulesDirect.

The silver lining to the battle on scrapers can get interesting. The PokerClients have prompted scraper developers to come up with clever solutions. The “Coding the Wheel” blog has an interesting article about this and how they inject DLLs into running applications, use OCR, and abuse Windows Message Handles [again of another process]. Web scraping introduces interesting topics that deal with machine learning [to create profiles], and identifying usage patterns.

In conclusion, solve the problem that screen scrapers are trying to solve, and if you have a situation like poker, prevent the behavior you wish to deny.


Source: http://theexceptioncatcher.com/blog/2012/07/how-to-get-rid-of-screen-scrapers-from-your-website/

Thursday, 6 June 2013

Web Scraping and Data Extraction Service

Web scraping is the automatic or manual collection of data from websites and its conversion into structured data. It is a fast and expedient way to extract information from websites on a custom timescale.
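To make "converted into structured data" concrete, here is a minimal sketch using PHP's bundled DOM extension. The HTML fragment, the class names (`product`, `name`, `price`) and the output shape are all invented for illustration; a real page would need its own selectors.

```php
<?php
// Hypothetical input: a fragment of a product listing page.
$html = '<ul>'
      . '<li class="product"><span class="name">Widget</span><span class="price">9.99</span></li>'
      . '<li class="product"><span class="name">Gadget</span><span class="price">24.50</span></li>'
      . '</ul>';

// Real-world HTML is rarely valid, so collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath   = new DOMXPath($doc);
$records = array();

// One record per product item; the XPath expressions are relative to that item.
foreach ($xpath->query('//li[@class="product"]') as $item) {
    $records[] = array(
        'name'  => trim($xpath->evaluate('string(span[@class="name"])', $item)),
        'price' => (float) $xpath->evaluate('string(span[@class="price"])', $item),
    );
}

// Emit the structured result; JSON is just one convenient target format.
echo json_encode($records), "\n";
```

The same `$records` array could as easily be written to CSV or inserted into a database; the point is only that loose markup becomes rows with named fields.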

Web Scraping Services include, but are not limited to:

    Web scraping (content / images) and information restructuring for specific business purposes;
    Providing large databases for website applications;
    Data mining (text / HTML / website):
                 o  Large chunks of text;
                 o  Data from multiple sites;
    Crawling and pulling data from different sources to create search engines;
    Automated information collection on a quick time cycle;
    Data migration;
    Content scraping for new forums and websites: it is easier to build a new website or forum by scraping content from other sites.

Our Web Scraping Service is simple, productive, fast, and comprehensive. Our customers can be sure that no matter how the targeted sites are structured or how difficult they are to scrape, our web scraping service will deliver the same comprehensive results (full sets of records including text, content, images, PDFs, and others).

Samples of Reports for Web Scraping Services (updating...)

» Real Estate Data Extraction
» Extract Store Details
» University's Web Data Scraping
» Extract Product Description
» Scraping Business Directory
» Yellow Pages Scraping
» Price Grabber Data Extraction
» Scraping Property Information
» Amazon Product Extraction
» Download Product Images
» Automate osCommerce Product Upload
» Scraping Business Contact
» Craigslist Posting Service
» IMDb Data Extraction
» Meta Data Extraction
» Scraping From Dynamic Pages
» Extract Lyrics Data
» Email Scraping & Extraction
» Scraping Customer List
» Scraping Data From WebSite

We guarantee a knowledgeable team with the skills and experience to deliver excellent data analysis and information restructuring through our web scraping service.



Source: http://globolstaff.com/web-scraping-and-data-extraction-service.html

Wednesday, 5 June 2013

Screen scraping sales from Createspace with Zend_Http_Client

More and more of our data is hidden behind login forms in online apps. When this data updates frequently, and the site provides no API to access the information, keeping on top of it can be a laborious task.

One such example is Createspace. Createspace are a company who provide produce-on-demand manufacturing for products such as books, DVDs and CDs. This allows individuals and smaller publishers to get their products to market without investing in heavy up-front printing costs. Any orders for the product go directly to Createspace; they manufacture and ship the product and finally allocate the profit to the seller.

I have recently been involved in helping to get a book to market and am using Createspace’s services to produce the book. Keeping track of sales, however, is time consuming because I have to log in to Createspace each time and navigate to the relevant area to retrieve the data. The book is also being produced by another company in the UK which has a similar setup, meaning twice the time is now required each time I wish to check for sales.

So what to do with no API? No real choice but to screen scrape. Presented here is a quick script I knocked up using PHP and the Zend Framework to scrape sales data from Createspace. Whilst the implementation is Createspace specific, the general process is not, and so I hope this will give some pointers for similar tasks.

To do this we’re using the Zend_Http_Client from the Zend Framework. This offers similar functionality to the basic PHP cURL extension but with a nicer (IMHO) API. The basic (generic) steps required are:

    Post required credential details to the application login URL
    Store any authentication details (likely a session cookie) sent back from the process
    Use the obtained credentials to retrieve the page we wish to scrape the data from
    Sprinkle some regex magic on the retrieved HTML to extract the figures we require

Here’s the script:

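    //
    // General config
    //

    // Zend_Http_Client ships with Zend Framework 1; this assumes the framework's
    // library directory is on the PHP include path
    require_once 'Zend/Http/Client.php';

    // revenue per product (so we can calculate totals)
    define('REVENUE_PER_PRODUCT', 100.00);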

    // login url and credentials
    define('CREATESPACE_LOGIN_URL', 'https://www.createspace.com/LoginProc.do');
    define('CREATESPACE_LOGIN_EMAIL', 'email');
    define('CREATESPACE_LOGIN_PASSWORD', 'password');

    // reports url
    define('CREATESPACE_REPORTS_URL', 'https://www.createspace.com/Member/Report/MemberReport.do');

    //
    // Retrieve CreateSpace sales data
    //
    $obj_client = new Zend_Http_Client(CREATESPACE_LOGIN_URL);

    // fake the useragent in the request to make it look more authentic
    $obj_client->setConfig(array(
        'useragent' => 'Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.9.0.6) Gecko/2009020911 Ubuntu/8.10 (intrepid) Firefox/3.0.6'
    ));

    // we want to retrieve any cookies posted back to us to use in the next step
    $obj_client->setCookieJar();

    // login parameters (entries to the login form fields)
    $obj_client->setParameterPost(array(
       'login' => CREATESPACE_LOGIN_EMAIL,
       'password' => CREATESPACE_LOGIN_PASSWORD,
       'action' => 'Log In'
    ));

    // send the POST data
    $obj_client->request('POST');

    // we're now "logged in" so we can retrieve the reports page
    $obj_client->setUri(CREATESPACE_REPORTS_URL);
    $obj_client->request('GET');

    // extract the content from the request
    $str_page = $obj_client->getLastResponse()->getBody();

    //
    // Now we have the raw report HTML, it's simply a case of extracting the sales figures
    //

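    // first grab the table data rows (tbody); stop if the table is missing,
    // which usually means the login failed or the page layout changed
    if (!preg_match('/<table .*?id="units".*?<tbody>(.*?)<\/tbody><\/table>/is', $str_page, $arr_matches)) {
        die("Could not find the units table in the report page\n");
    }
    $str_table_body = $arr_matches[1];

    // then extract each row's data (first capture is the date cell, second the units cell);
    // the pattern is tied to the report markup as it was when this was written
    preg_match_all('/<tr class=".*?">.*?<td>(.*?)<\/td>.*?<\/td><td>-<\/td>.*?<\/td><td>-<\/td>.*?<\/td><td>-<\/td>.*?<\/td><td>(.*?)<\/td>/is', $str_table_body, $arr_matches, PREG_SET_ORDER);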

    // merge into a more sane array, indexed by date in the form Ym (e.g. 200901 for January, 2009)
    $arr_data = array();

    foreach($arr_matches as $arr_match) {
        $str_date = date('Ym', strtotime($arr_match[1]));
        if(!isset($arr_data[$str_date])) {
            $arr_data[$str_date] = array();
        }

        $int_volume = (int)$arr_match[2];

        $arr_data[$str_date] = array('volume' => $int_volume, 'revenue' => $int_volume * REVENUE_PER_PRODUCT);
    }

    // $arr_data now contains sales data (volume and revenue) for each month found in the sales table, indexed by
    // the month
    print_r($arr_data);

A few things to note. Firstly, we’re faking the useragent to a generic “real looking” example (as opposed to the default “Zend_Http_Client”). Morally we’re doing nothing wrong here but I suspect “automated crawling” is frowned upon in the T&Cs somewhere so best not to make it too obvious.

It should also be mentioned that this method (like all screen scraping) is vulnerable to breaking if Createspace change their login system or HTML structure. There are certainly cleverer parsing methods that can be employed which are more adaptable to change but only up to a point. There’s not a lot you can do if things dramatically change except for adapting the script to accommodate.
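As one illustration of a more adaptable parsing method, the regex step could be swapped for PHP's DOM and XPath classes. This is a sketch rather than the approach the script above takes, and it makes assumptions: the sample markup is a made-up stand-in for the real report page, and the units figure is assumed to sit in the last cell of each row.

```php
<?php
// Hypothetical stand-in for the report HTML; the real markup may differ.
$str_page = '<html><body><table class="report" id="units">'
          . '<thead><tr><th>Month</th><th>Units</th></tr></thead>'
          . '<tbody>'
          . '<tr class="odd"><td>January 2009</td><td>12</td></tr>'
          . '<tr class="even"><td>February 2009</td><td>7</td></tr>'
          . '</tbody></table></body></html>';

// Collect libxml warnings quietly; scraped HTML is rarely well formed.
libxml_use_internal_errors(true);
$obj_doc = new DOMDocument();
$obj_doc->loadHTML($str_page);
libxml_clear_errors();

$obj_xpath = new DOMXPath($obj_doc);
$arr_data  = array();

// Select rows by the table id only, so changes to classes or attribute
// order elsewhere in the markup do not break the match.
foreach ($obj_xpath->query('//table[@id="units"]//tbody/tr') as $obj_row) {
    $obj_cells = $obj_xpath->query('td', $obj_row);
    if ($obj_cells->length < 2) {
        continue; // skip spacer or malformed rows
    }

    $int_time = strtotime(trim($obj_cells->item(0)->textContent));
    if ($int_time === false) {
        continue; // first cell was not a recognisable date
    }

    // Assumption: the units figure is the last cell in the row.
    $int_volume = (int) trim($obj_cells->item($obj_cells->length - 1)->textContent);
    $arr_data[date('Ym', $int_time)] = array('volume' => $int_volume);
}

print_r($arr_data);
```

This survives cosmetic changes such as extra attributes or whitespace, but a renamed table id or reordered columns would still need the script adapting.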


Source: http://2tap.com/2009/04/29/screen-scraping-book-sales-from-createspace-with-zend_http_client/