Top 10 Big Data Trends for 2017: Tableau whitepaper

  1. Big data becomes fast and approachable: options expand to speed up Hadoop. Sure, you can perform machine learning and conduct sentiment analysis on Hadoop, but the first question people often ask is: how fast is the interactive SQL? SQL, after all, is the conduit to business users who want to use Hadoop data for faster, more repeatable KPI dashboards as well as exploratory analysis. This need for speed has fueled the adoption of faster databases like Exasol and MemSQL, Hadoop-based stores like Kudu, and technologies that enable faster queries. Using SQL-on-Hadoop engines (Apache Impala, Hive LLAP, Presto, Phoenix, and Drill) and OLAP-on-Hadoop technologies (AtScale, Jethro Data, and Kyvos Insights), these query accelerators are further blurring the lines between traditional warehouses and the world of big data.
  2. Big data no longer just Hadoop: purpose-built tools for Hadoop become obsolete. In previous years, we saw several technologies rise with the big-data wave to fulfill the need for analytics on Hadoop. But enterprises with complex, heterogeneous environments no longer want to adopt a siloed BI access point just for one data source (Hadoop). Answers to their questions are buried in a host of sources ranging from systems of record to cloud warehouses, to structured and unstructured data from both Hadoop and non-Hadoop sources. (Incidentally, even relational databases are becoming big data-ready. SQL Server 2016, for instance, recently added JSON support.) In 2017, customers will demand analytics on all data. Platforms that are data- and source-agnostic will thrive while those that are purpose-built for Hadoop and fail to deploy across use cases will fall by the wayside. The exit of Platfora serves as an early indicator of this trend.
  3. Organizations leverage data lakes from the get-go to drive value. A data lake is like a man-made reservoir. First you dam the end (build a cluster), then you let it fill up with water (data). Once you establish the lake, you start using the water (data) for various purposes like generating electricity, drinking, and recreation (predictive analytics, ML, cyber security, etc.). Up until now, hydrating the lake has been an end in itself. In 2017, that will change as the business justification for Hadoop tightens. Organizations will demand repeatable and agile use of the lake for quicker answers. They'll carefully consider business outcomes before investing in personnel, data, and infrastructure. This will foster a stronger partnership between the business and IT. And self-service platforms will gain deeper recognition as the tool for harnessing big-data assets.
  4. Architectures mature to reject one-size-fits-all frameworks. Hadoop is no longer just a batch-processing platform for data-science use cases. It has become a multi-purpose engine for ad hoc analysis. It's even being used for operational reporting on day-to-day workloads, the kind traditionally handled by data warehouses. In 2017, organizations will respond to these hybrid needs by pursuing use case-specific architecture design. They'll research a host of factors including user personas, questions, volumes, frequency of access, speed of data, and level of aggregation before committing to a data strategy. These modern reference architectures will be needs-driven. They'll combine the best self-service data-prep tools, Hadoop Core, and end-user analytics platforms in ways that can be reconfigured as those needs evolve. The flexibility of these architectures will ultimately drive technology choices.
  5. Variety, not volume or velocity, drives big-data investments. Gartner defines big data as the three Vs: high-volume, high-velocity, high-variety information assets. While all three Vs are growing, variety is becoming the single biggest driver of big-data investments, as seen in the results of a recent survey by New Vantage Partners. This trend will continue to grow as firms seek to integrate more sources and focus on the "long tail" of big data. From schema-free JSON to nested types in other databases (relational and NoSQL), to non-flat data (Avro, Parquet, XML), data formats are multiplying and connectors are becoming crucial. In 2017, analytics platforms will be evaluated based on their ability to provide live direct connectivity to these disparate sources.
  6. Spark and machine learning light up big data. Apache Spark, once a component of the Hadoop ecosystem, is now becoming the big-data platform of choice for enterprises. In a survey of data architects, IT managers, and BI analysts, nearly 70% of the respondents favored Spark over incumbent MapReduce, which is batch-oriented and doesn't lend itself to interactive applications or real-time stream processing. These big-compute-on-big-data capabilities have elevated platforms featuring computation-intensive machine learning, AI, and graph algorithms. Microsoft Azure ML in particular has taken off thanks to its beginner-friendliness and easy integration with existing Microsoft platforms. Opening up ML to the masses will lead to the creation of more models and applications generating petabytes of data. As machines learn and systems get smart, all eyes will be on self-service software providers to see how they make this data approachable to the end user.
  7. The convergence of IoT, cloud, and big data creates new opportunities for self-service analytics. It seems that everything in 2017 will have a sensor that sends information back to the mothership. IoT is generating massive volumes of structured and unstructured data, and an increasing share of this data is being deployed on cloud services. The data is often heterogeneous and lives across multiple relational and non-relational systems, from Hadoop clusters to NoSQL databases. While innovations in storage and managed services have sped up the capture process, accessing and understanding the data itself still pose a significant last-mile challenge. As a result, demand is growing for analytical tools that seamlessly connect to and combine a wide variety of cloud-hosted data sources. Such tools enable businesses to explore and visualize any type of data stored anywhere, helping them discover hidden opportunity in their IoT investment.
  8. Self-service data prep becomes mainstream as end users begin to shape big data. Making Hadoop data accessible to business users is one of the biggest challenges of our time. The rise of self-service analytics platforms has improved this journey. But business users want to further reduce the time and complexity of preparing data for analysis, which is especially important when dealing with a variety of data types and formats. Agile self-service data-prep tools not only allow Hadoop data to be prepped at the source but also make the data available as snapshots for faster and easier exploration. We've seen a host of innovation in this space from companies focused on end-user data prep for big data, such as Alteryx, Trifacta, and Paxata. These tools are lowering the barriers to entry for late Hadoop adopters and laggards and will continue to gain traction in 2017.
  9. Big data grows up: Hadoop adds to enterprise standards. We're seeing a growing trend of Hadoop becoming a core part of the enterprise IT landscape. And in 2017, we'll see more investments in the security and governance components surrounding enterprise systems. Apache Sentry provides a system for enforcing fine-grained, role-based authorization to data and metadata stored on a Hadoop cluster. Apache Atlas, created as part of the data governance initiative, empowers organizations to apply consistent data classification across the data ecosystem. Apache Ranger provides centralized security administration for Hadoop. Customers are starting to expect these types of capabilities from their enterprise-grade RDBMS platforms. These capabilities are moving to the forefront of emerging big-data technologies, thereby eliminating yet another barrier to enterprise adoption.
  10. Rise of metadata catalogs helps people find analysis-worthy big data. For a long time, companies threw away data because they had too much to process. With Hadoop, they can process lots of data, but the data isn't generally organized in a way that can be found. Metadata catalogs can help users discover and understand relevant data worth analyzing using self-service tools. This gap in customer need is being filled by companies like Alation and Waterline, which use machine learning to automate the work of finding data in Hadoop. They catalog files using tags, uncover relationships between data assets, and even provide query suggestions via searchable UIs. This helps both data consumers and data stewards reduce the time it takes to trust, find, and accurately query the data. In 2017, we'll see more awareness and demand for self-service discovery, which will grow as a natural extension of self-service analytics.

Source: https://www.tableau.com/

AWS and Elastic MapReduce (EMR): Netflix

Why Use Elastic MapReduce (EMR)?

EMR enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data.

By reducing the cost and complexity of analyzing huge data sets, EMR also enables greater experimentation and innovation.

Case Study: Netflix

50 billion daily events come from Netflix-enabled televisions, mobile devices, and laptops. How do you collect and store all of that data?

Netflix streams 8 TB of data into the cloud per day. This is collected, aggregated, and pushed to Amazon S3 via a fleet of EC2 servers running Apache Chukwa.

The processed data is then streamed back into Amazon S3, where it is accessible to other teams, including personalization/recommendation services, and to analysts through a real-time custom visualization tool called Sting.

For Netflix, this means they can run their production cluster with 300 nodes during the day and expand it to 400+ in the evenings and at weekends. Using EMR's alarm-based capabilities, this resizing can be set up to happen automatically based on the load on the cluster.

And for jobs with specific hardware or capacity requirements, analysts can spin up their own query clusters, again streaming from the same data source.
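As a rough illustration of what such a resize looks like when scripted (this is not Netflix's actual tooling; it assumes the paws R package, which mirrors the EMR ModifyInstanceGroups API operation, and uses placeholder IDs):

# Hedged sketch: resize an EMR instance group between a daytime and an
# evening/weekend node count. In practice this would be triggered
# automatically by CloudWatch alarms or auto-scaling rules.
library(paws)

emr <- paws::emr()

resize_instance_group <- function(group_id, target_count) {
  emr$modify_instance_groups(
    InstanceGroups = list(
      list(InstanceGroupId = group_id, InstanceCount = target_count)
    )
  )
}

# Placeholder IDs and counts:
# resize_instance_group("ig-PLACEHOLDER", 400)  # evenings and weekends
# resize_instance_group("ig-PLACEHOLDER", 300)  # daytime baseline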

Test-tube data (Storing information in DNA)

The Economist Jan 26th 2013
Like all the best ideas, this one was born in a pub. Nick Goldman and Ewan Birney of the European Bioinformatics Institute (EBI) near Cambridge were pondering what they could do with the torrent of genomic data their research group generates, all of which has to be archived. The volume of data is growing faster than the capacity of the hard drives used to hold it. "That means the cost of storage is rising, but our budgets are not," says Dr Goldman. Over a few beers, the pair began wondering if artificially constructed DNA might be one way to store the data torrent generated by the natural stuff. After a few more drinks and much scribbling on beer mats, what started out as a bit of amusing speculation had turned into the bones of a workable scheme. After some fleshing out and a successful test run, the full details were published this week in Nature.

The idea is not new. DNA is, after all, already used to store information in the form of genomes by every living organism on Earth. Its prowess at that job is the reason that information scientists have been trying to co-opt it for their own uses. But this has not been without problems.

Dr Goldman’s new scheme is significant in several ways. He and his team have managed to set a record (739.3 kilobytes) for the amount of unique information encoded. But it has been designed to do far more than that. It should, think the researchers, be easily capable of swallowing the roughly 3 zettabytes (a zettabyte is one billion trillion or 10²¹ bytes) of digital data thought presently to exist in the world and still have room for plenty more. It would do so with a density of around 2.2 petabytes (10¹⁵) per gram; enough, in other words, to fit all the world’s digital information into the back of a lorry. Moreover, their method dramatically reduces the copying errors to which many previous DNA storage attempts have been prone.
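A quick back-of-the-envelope check of those figures (here in R) shows why "the back of a lorry" is about right:

# 3 zettabytes of data stored at a density of roughly 2.2 petabytes per gram
bytes_total    <- 3 * 10^21      # 3 ZB
bytes_per_gram <- 2.2 * 10^15    # 2.2 PB per gram
grams <- bytes_total / bytes_per_gram
grams / 1e6                      # about 1.4 tonnes of DNA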

Faithful reproduction

The trick to this fidelity lies in the way the researchers translate their files from the hard drive to the test tube. DNA uses four chemical "bases" to encode information: adenine (A), thymine (T), cytosine (C) and guanine (G). Previous approaches have often mapped the binary 1s and 0s used by computers directly onto these bases. For instance, A and C might represent 0, while G and T signify 1. The problem is that sequences of 1s or 0s in the source code can generate repetition of a single base in the DNA (say, TTTT). Such repetitions are more likely to be misread by DNA-sequencing machines, leading to errors when reading the information back.

The team’s solution was to translate the binary computer information into ternary (a system that uses three numerals: 0, 1 and 2) and then encode that information into the DNA. Instead of a direct link between a given number and a particular base, the encoding scheme depends on which base has been used most recently (see table). For instance, if the previous base was A, then a 2 would be represented by T. But if the previous base was G, then 2 would be represented by C. Similar substitution rules cover every possible combination of letters and numbers, ensuring that a sequence of identical digits in the data is not represented by a sequence of identical bases in the DNA, helping to avoid mistakes.
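A toy R sketch of that rotating substitution, consistent with the two examples quoted above (an illustration of the idea rather than the published code):

# Illustrative only: encode ternary digits (0, 1, 2) as DNA bases so that no
# base is ever repeated. Each digit picks one of the three bases that differ
# from the previously written base, matching the examples in the text
# (previous base A, digit 2 -> T; previous base G, digit 2 -> C).
bases <- c("A", "C", "G", "T")

encode_trits <- function(trits, prev = "A") {
  out <- character(length(trits))
  for (i in seq_along(trits)) {
    idx <- (match(prev, bases) - 1 + trits[i] + 1) %% 4   # rotate past the previous base
    out[i] <- bases[idx + 1]
    prev <- out[i]
  }
  paste(out, collapse = "")
}

encode_trits(c(2, 2, 2, 2))   # "TGCA" -- a run of 2s no longer yields "TTTT"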

The code then had to be created in artificial DNA. The simplest approach would be to synthesise one long DNA string for every file to be stored. But DNA-synthesis machines are not yet able to do that reliably. So the researchers decided to chop their files into thousands of individual chunks, each 117 bases long. In each chunk, 100 bases are devoted to the file data themselves, and the remainder used for indexing information that records where in the completed file a specific chunk belongs. The process also contains the DNA equivalent of the error-detecting “parity bit” found in most computer systems.

To provide yet more tolerance for mistakes, the researchers chopped up the source files a further three times, each in a slightly different, overlapping way. The idea is to ensure that each 25-base quarter of a 100-base chunk was also represented in three other chunks of DNA. If any copying errors did occur in a particular chunk, it could be compared against its three counterparts, and a majority vote used to decide which was correct. Reading the chunks back is simply a matter of generating multiple copies of the fragments using a standard chemical reaction, feeding these into a DNA-sequencing machine and stitching the files back together.
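The overlapping segmentation can be sketched in the same spirit (one reading of the scheme, with the 17 indexing bases and the parity check omitted):

# Sketch of the fourfold overlap: 100-base data chunks whose start positions
# step by 25 bases, so every 25-base quarter is covered by four chunks and a
# read error in one chunk can be out-voted by the other three.
make_chunks <- function(encoded, chunk_len = 100, step = 25) {
  starts <- seq(1, nchar(encoded) - chunk_len + 1, by = step)
  data.frame(
    position = seq_along(starts),   # where the chunk sits in the file
    chunk    = substring(encoded, starts, starts + chunk_len - 1)
  )
}

# e.g. chunks <- make_chunks(encode_trits(sample(0:2, 1000, replace = TRUE)))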

When the scheme was tested, it worked almost as planned. The researchers were able to encode and decode five computer files, including an MP3 recording of part of Martin Luther King’s “I have a dream” speech and a PDF version of the 1953 paper by Francis Crick and James Watson describing the structure of DNA. The one glitch was that, despite all the precautions, two 25-base segments of the DNA paper went missing. The problem was eventually traced to a combination of a quirk of DNA chemistry and another quirk in the machines used to do the synthesis. Dr Goldman is confident that a tweak to their code will avoid the problem in future.

There are downsides to DNA as a data-storage medium. One is the relatively slow speed at which data can be read back. It took the researchers two weeks to reconstruct their five files, although with better equipment it could be done in a day. Beyond that, the process can be sped up by adding more sequencing machines.

Ironically, then, the method is not suitable for the EBI’s need to serve up its genome data over the internet at a moment’s notice. But for less intensively used archives, that might not be a problem. One example given is that of CERN, Europe’s biggest particle-physics lab, which maintains a big archive of data from the Large Hadron Collider.

Store out of direct sunlight

The other disadvantage is cost. Dr Goldman estimates that, at commercial rates, their method costs around $12,400 per megabyte stored. That is millions of times more than the cost of writing the same data to the magnetic tape currently used to archive digital information. But magnetic tapes degrade and must be replaced every few years, whereas DNA remains readable for tens of thousands of years so long as it is kept somewhere cool, dark and dry—as proved by the recovery of DNA from woolly mammoths and Neanderthals.

The longer you want to store information, then, the more attractive DNA becomes. And the cost of sequencing and synthesising DNA is falling fast. The researchers reckon that, within a decade, that could make DNA competitive with other methods for (infrequently-used) archives designed to last fifty years or more.

There is one final advantage in using DNA. Modern, digital storage technologies tend to come and go: just think of the fate of the laser disc, for example. In the early 2000s NASA, America’s space agency, was reduced to trawling around internet auction sites in order to find old-style eight-inch floppy drives to get at the data it had laid down in the 1960s and 1970s. But, says Dr Goldman, DNA has endured for more than 3 billion years. So long as life—and biologists—endure, someone should know how to read it.

Amazon Elastic Block Store (Amazon EBS)

Amazon Elastic Block Store (Amazon EBS) provides persistent block storage volumes for use with Amazon EC2 instances in the AWS Cloud. Each Amazon EBS volume is automatically replicated within its Availability Zone to protect you from component failure, offering high availability and durability. Amazon EBS volumes offer the consistent and low-latency performance needed to run your workloads. With Amazon EBS, you can scale your usage up or down within minutes – all while paying a low price for only what you provision.

Amazon EBS allows you to create storage volumes and attach them to Amazon EC2 instances. Once attached, you can create a file system on top of these volumes, run a database, or use them in any other way you would use block storage. Amazon EBS volumes are placed in a specific Availability Zone, where they are automatically replicated to protect you from the failure of a single component. All EBS volume types offer durable snapshot capabilities and are designed for 99.999% availability.

Amazon EBS provides a range of options that allow you to optimize storage performance and cost for your workload. These options are divided into two major categories: SSD-backed storage for transactional workloads such as databases and boot volumes (performance depends primarily on IOPS) and HDD-backed storage for throughput intensive workloads such as MapReduce and log processing (performance depends primarily on MB/s).

SSD-backed volumes include the highest performance Provisioned IOPS SSD (io1) for latency-sensitive transactional workloads and General Purpose SSD (gp2), which balances price and performance for a wide variety of transactional data. HDD-backed volumes include Throughput Optimized HDD (st1) for frequently accessed, throughput-intensive workloads and the lowest cost Cold HDD (sc1) for less frequently accessed data.

Elastic Volumes is a feature of Amazon EBS that allows you to dynamically increase capacity, tune performance, and change the type of live volumes with no downtime or performance impact. This allows you to easily right-size your deployment and adapt to performance changes.
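As a hedged illustration of those options (assuming the paws R package, which mirrors the EC2 CreateVolume, AttachVolume, and ModifyVolume API operations; the region, size, and IDs below are placeholders):

# Create a General Purpose SSD (gp2) volume, attach it to an instance, and
# later change its type with Elastic Volumes. Placeholder IDs throughout.
library(paws)

ec2 <- paws::ec2()

vol <- ec2$create_volume(
  AvailabilityZone = "eu-west-1a",
  Size             = 100,      # GiB
  VolumeType       = "gp2"
)

ec2$attach_volume(
  VolumeId   = vol$VolumeId,
  InstanceId = "i-PLACEHOLDER",
  Device     = "/dev/sdf"
)

# Elastic Volumes: retype the live volume to Throughput Optimized HDD (st1).
ec2$modify_volume(VolumeId = vol$VolumeId, VolumeType = "st1")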

For fast-performing websites, applications, or databases, the combination of Amazon's EC2 and EBS is a class-leading platform. Amazon understands customers and business requirements and describes itself as the most customer-centric company on earth; with services like this, maybe it is!

Amazon's S3 storage

Amazon S3

S3, the Simple Storage Service, is a reliable, fast and cheap way to store “stuff” on the Internet. S3 can be used to store just about anything: XML documents, binary data, images, videos, or whatever else our customers want to store.

Amazon Simple Storage Service (Amazon S3)

  • Object storage with a simple web service interface to store and retrieve any amount of data from anywhere on the web.
  • It is designed to deliver 99.999999999% durability, and scale past trillions of objects worldwide.
  • Businesses use S3 as primary storage for cloud-native applications; as a bulk repository, or "data lake," for analytics; as a target for backup & recovery and disaster recovery; and with serverless computing.
  • It's simple to move large volumes of data into or out of S3.
  • Once data is stored in Amazon S3, it can be automatically tiered into lower-cost, longer-term cloud storage classes like S3 Standard – Infrequent Access and Amazon Glacier for archiving.

About Amazon S3

Amazon S3 stores data as objects within resources called “buckets”. You can store as many objects as you want within a bucket, and write, read, and delete objects in your bucket. Objects can be up to 5 terabytes.

You can control access to the bucket (who can create, delete, and retrieve objects in the bucket for example), view access logs for the bucket and its objects, and choose the AWS region where a bucket is stored to optimize for latency, minimize costs, or address regulatory requirements.
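To make that concrete, here is a hedged R sketch (assuming the paws package, which mirrors the S3 API; the bucket name, key, and region are placeholders, and credentials come from the usual AWS configuration) that creates a bucket, writes an object, and reads it back:

library(paws)

s3 <- paws::s3()

# Create a bucket in a chosen region (placeholder names throughout).
s3$create_bucket(
  Bucket = "example-analytics-bucket",
  CreateBucketConfiguration = list(LocationConstraint = "eu-west-1")
)

# Write an object (objects can be up to 5 TB each).
s3$put_object(
  Bucket = "example-analytics-bucket",
  Key    = "reports/2017/kpis.csv",
  Body   = charToRaw("metric,value\nusers,100\n")
)

# Read it back.
obj <- s3$get_object(Bucket = "example-analytics-bucket",
                     Key    = "reports/2017/kpis.csv")
cat(rawToChar(obj$Body))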

Features

Amazon S3 is designed as a complete storage platform. Consider the ownership value included with every GB.

Simplicity. Amazon S3 is built for simplicity, with a web-based management console, mobile app, and full REST APIs and SDKs for easy integration with third party technologies.

Durability. Amazon S3 is available in regions around the world, and includes geographic redundancy within each region as well as the option to replicate across regions. In addition, multiple versions of an object may be preserved for point-in-time recovery.

Scalability. Customers around the world depend on Amazon S3 to safeguard trillions of objects every day. Costs grow and shrink on demand, and global deployments can be done in minutes. Industries like financial services, healthcare, media, and entertainment use it to build big data, analytics, transcoding, and archive applications.

Security. Amazon S3 supports data transfer over SSL and automatic encryption of your data once it is uploaded. You can also configure bucket policies to manage object permissions and control access to your data using AWS Identity and Access Management (IAM).

Integration. Amazon S3 integrates broadly with other AWS services for security (IAM and KMS), alerting (CloudWatch, CloudTrail, and Event Notifications), computing (Lambda), and data processing and warehousing (EMR, Redshift).

Cloud Data Migration options. AWS storage includes multiple specialized methods to help you get data into and out of the cloud.

Enterprise-class Storage Management. S3 Storage Management features allow you to take a data-driven approach to storage optimization, data security, and management efficiency.

Amazon S3 video

What is Cloud Computing and why bring your business there?

Whether you are running applications that share photos with mobile users or supporting critical operations, a cloud platform provides instant access to elastic, low-cost IT resources. Cloud computing means you don't need to make large upfront investments in hardware. Instead, you can spin up exactly the right type and size of computing resources you need to power your IT infrastructure, access as many resources as you need, and pay only for what you use, much like a household utility.

Cloud Computing and How It Works

Cloud services provide a simple way to access servers, storage, databases, and application services via the Internet. Cloud services platforms like Amazon Web Services (AWS) and Microsoft Azure own and maintain the networks and hardware required to power these services; you simply configure what you need!

Benefits of the cloud – benefit from massive economies of scale

Cost

Using cloud computing means you can operate at a lower variable cost than maintaining and scaling your own hardware to meet business demand during peak and off-peak times; you simply scale up and down as needed.

Future Requirements and Limitations

Having to estimate your infrastructure capacity needs means you must spend on what you "think" may be the business requirement. This means you often end up either sitting on expensive idle resources or dealing with limited capacity. Cloud services give you the ability to access as much or as little as you need, and to scale up and down as required.

Performance

In a cloud computing environment, new IT resources can be spun up instantly, which means you reduce the time it takes to make those resources available to your developers from weeks to just minutes. This results in a dramatic increase in agility for the organization, since the cost and time it takes to experiment and develop new applications is significantly lower.

Physical benefits

No more costs for running and maintaining data centers or PoP sites.

Cloud providers let you focus on your business rather than on the heavy lifting of racking, stacking, and powering servers. They also remove the cost of physical and electronic security, and even the cooling systems that can be the biggest expense of any datacenter.

Global Reach

You have the ability to deploy your application in multiple sites around the world. You can choose to have your data or application hosted in any datacenter the cloud provider has to offer, which allows faster response times than simply hosting all your services in one country. This means you can provide lower latency and a better experience for your customers, simply and at minimal cost.

Types of Cloud Computing

Cloud computing has three main types that are commonly referred to as Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Selecting the right type of cloud computing for your needs can help you get the right balance of control and convenience for your workloads.

Cloud Solutions

The move to cloud has progressed at a steady rate, and thousands of businesses have now joined Microsoft Azure, Oracle, Google, and Amazon Web Services (AWS) for solutions to build their businesses. Cloud computing platforms provide the flexibility to build your application, your way, regardless of your industry or business size. Companies can save huge amounts of time and money, without compromising security requirements or business availability and performance.

CA2 Data Analytics with R

Intro: What is "R"?

R sounds like a character from a James Bond film, and in ways it kind of is! R is a free software environment for statistical computing and graphics: number crunching in graphical form! R can be run on most if not all platforms, including UNIX, Windows, and macOS.

R was created by Ross Ihaka and Robert Gentleman at the University of Auckland in the early 1990s as an implementation of the S language developed at Bell Labs (AT&T); it is named partly after the first names of its designers and partly as a play on S. R's ability to produce quality static graphics, as well as dynamic and interactive graphics, makes it a much-used tool in the world of data graphing.

Me and “R”

Coding does not come naturally to me; I struggle with it most of the time. However, R is relatively easy to use and you get a lot back in return for a little code.

I struggled at first with understanding how to load data into R, which it uses to compile the graphs. Once I got the links to data sources I looked for an interesting one and picked the TITANIC passenger list, which is based on the different classes of the passengers and their survival rate. Most of us are aware that your chances of getting onto a lifeboat depended on your place in society, but once you graph the figures and look at them in image form they are quite stark. The graph really showed the enormity of the disaster and the consequences for those of the lower classes.

Below I list the sites and YouTube videos that I found helpful.

First things first: complete the "Try R" course from Code School: http://tryr.codeschool.com/

[Image: r-blog-image-1]

Once completed, I downloaded R to my laptop from this site: https://cran.r-project.org/bin/windows/base/

In order to complete my own R data analysis I needed to have R on my machine.

Here is an example of the R console interface:

[Image: r-blog-console]

I used some other sites to help me with understanding R, data types, importing data, exporting data and viewing data. Here is a list of useful sites:

http://www.statmethods.net/input/contents.htm

[Image: r-blog-handy-sites]

http://www.r-tutor.com/r-introduction/data-frame/data-import

[Image: r-blog-handy-sites1]

https://www.datacamp.com/community/tutorials/r-data-import-tutorial#gs.H5qvyoU

[Image: r-blog-handy-sites2]

Next I uploaded some CSV files. A CSV is a comma-separated values file, which allows data to be saved in a table-structured format; data can be imported via the web or locally from your machine. This video on YouTube is really good!

https://youtu.be/I1K3ZijJ3LM

[Image: r-blog-you-tube]

Entering my data into R, I used the following commands:

[Image: titanic-load]

Point R at the data file

>titanic_file <- "C:\\Users\\gradunne\\Downloads\\Titanic.csv"

Read the data

>titanic <- read.table(titanic_file, header = TRUE, sep = ",")

Plot the data (this uses the ggplot2 package)

>library(ggplot2)
>ggplot(titanic, aes(x = PClass, fill = factor(Survived))) + geom_bar() + xlab("PClass") + ylab("total") + labs(fill = "Survived")

[Image: r-blog-histo]

In my graph the X axis shows first, second, and third class passengers, with the total number of passengers on the Y axis. In order to make the data stand out more I have used an optional colour code, and the picture is quite clear: the wealthy survived where the poor in the lower classes did not. The green is those who survived and the red is those who died; being a working-class male on the Titanic was a death sentence.

[Image: passengers-by-cat]
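To put numbers behind the picture, the same titanic data frame can be tabulated directly (this assumes the PClass and Survived columns used in the plot above):

# Row-wise survival proportions by passenger class.
counts <- table(titanic$PClass, titanic$Survived)
round(prop.table(counts, margin = 1), 2)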

Graphs really do paint a picture that words and numbers often never will.

In his book The Visual Miscellaneum, David McCandless says, "We're all visual now, every day every hour maybe even every minute we're seeing and absorbing information, we're steeped in it, maybe even lost in it, so perhaps what we need are well-designed, colorful and hopefully useful charts to help us navigate."

[Image: visual]

Google Fusion Tables CA1

My first CA for Data Management and Analytics was to prepare a fusion table showing the dispersal of the population of the Republic of Ireland. The sources of the information were the Central Statistics Office (www.cso.ie) and the Irish Independent (www.independent.ie). The information from the CSO was in the KML file format (Keyhole Markup Language), which gave the geographical boundaries; the information from the Independent was saved as a CSV file (comma-separated values), which gave the population breakdown per county.

Both sets of information were uploaded to my Google Drive and prepared separately as individual fusion tables. Once they were both uploaded they were then merged, for which a common value between both tables was needed. I modified the CSV file headings because the counties were under the heading "location" while the KML file had the counties listed as "county"; this mismatch caused issues when merging the information. Some of the location names also did not match between the CSV and KML files, causing areas of the map to remain unpopulated; simple modifications to the files fixed this issue.

Once the information had been merged successfully, the fusion table offers options to highlight the data as required. Initially the population locations were pinpointed with red marker icons; by holding the mouse over an icon you get a pop-up tab giving the male, female, and total population of that location. This is not very clear and will not quickly deliver the information at a glance, so in order to show the information in a clear and meaningful format I rendered the map using polygon background colors with a gradient fill. This was presented from light yellow for sparsely populated areas, through green and light blue, to dark blue and red for heavily populated areas. As expected, Dublin, with a population of 1.2 million, was noticeably different from the rest of the country, while smaller counties like Leitrim, with a total population of 31 thousand, were colored very light yellow.

I divided the population numbers into buckets graded in upward increments of 30,000 per bucket. This helped the variation of colors stand out more and shows clearly the variation in population numbers across the republic. Further detail can be obtained by breaking out male and female populations per county and for the country as a whole. Interestingly, Dublin, with the largest population, also had the biggest discrepancy between males and females, with 34,000 more females than males; almost every other county had a 1:1 ratio with only small deviations between the sexes.
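For comparison, the same merge-and-bucket logic can be sketched in R; the file and column names below are hypothetical placeholders (Fusion Tables did the real work):

# Hypothetical file and column names, mirroring the steps described above:
# rename the CSV's "location" heading to "county" so it matches the KML key,
# then grade county totals into 30,000-wide buckets for coloring.
population <- read.csv("population_by_county.csv")   # placeholder file
names(population)[names(population) == "location"] <- "county"

population$bucket <- cut(
  population$total,                                        # assumed total-population column
  breaks  = seq(0, max(population$total) + 30000, by = 30000),
  dig.lab = 7                                              # avoid scientific notation in labels
)
table(population$bucket)                                   # counties per population bucket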

Google Fusion Tables are a fantastic way to show at a glance anything from trends and statistics to population numbers or any other data set in an easy-to-understand graphical format, using colors to instantly highlight major differences, or even fine-tuning down to minor differences by tightening bucket variances. This is a very valuable tool which I intend to make use of in my own work environment.

Google Fusion Map For CA1

[Image: fusion-table-map]

Pop-up information

[Image: fusion-table-pop-up]

Using Buckets to separate population values

[Image: fusion-table-bucket]