Wednesday, October 21, 2015

What is truth in data and information?

This topic was discussed recently, when we gathered for the EU Project Optique Summit II in Frankfurt. From a data and information point of view, trusted information should be truthful, essential, relevant and expected. But what does it really mean?

One interesting example is the code of ethics for journalists: https://www.spj.org/pdf/ethicscode.pdf. If journalists were perfectly honest, fair and courageous in gathering, reporting and interpreting information, then we would most likely trust the written word. But that is not the case. Instead we find that journalists create and use data and information to serve special interests. They build and promote commercial and political opinions, often for purely economic reasons or for a chance to gain influence and power.

In the code of ethics one can find approaches that could lead to truthful information: test the accuracy of information against its source (authenticity); find more sources to confirm the reliability of information; be generous with source references; avoid misrepresentation and distortion; never plagiarize; tell the story even when it is unpopular to do so; avoid stereotyping; support the open exchange of views; give a voice to the voiceless (official and unofficial information can be equally valid); distinguish between advocacy and news reporting; and distinguish news from advertising.

I just wanted to write about this without getting into the philosophical and mathematical theories and definitions of truth. Enough has been said about that. In short, one could say that truthful data has a known origin (authenticity) and a data syntax that can be referenced and verified. Truthful information (data with a context) reflects an observable reality, relating it to a subjectively perceived context. Data is therefore by definition objective, and information is subjective.

One last thing about truth. Everything that we observe and experience needs to be true if we are to interact and base our decisions on the right premises. A sensor on a spacecraft that delivers false or faulty data can cause catastrophic outcomes. A journalist who spreads false reports can alter the outcome of an election. We are constantly bombarded with misleading and untruthful information, as when the food industry claims that products are healthy when they are not. We therefore need tools that can help us test and verify the quality, authenticity and accuracy of information. DNVGL is working to develop the ISO 8000 standard for information quality, which might be a step in the right direction.
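To make the idea of verifying origin a little more concrete, here is a minimal sketch in Python. It assumes that the producer of a data record and its consumer share a secret key, and it uses a keyed hash (HMAC) so that the consumer can check that the record really comes from that producer and has not been altered on the way. The function names `sign` and `verify` and the example record are hypothetical; this is only an illustration and has nothing to do with how ISO 8000 is written.

```python
import hashlib
import hmac
import json


def sign(record: dict, key: bytes) -> str:
    """Return a hex signature for the record (hypothetical helper)."""
    # Serialize with sorted keys so the same record always yields the same bytes.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(record: dict, key: bytes, signature: str) -> bool:
    """Check that the record matches the signature issued by the key holder."""
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sign(record, key), signature)


# Assumed example data: a sensor reading with a stated source.
key = b"shared-secret"
reading = {"source": "sensor-17", "quantity": "temperature", "value": 21.5, "unit": "C"}
signature = sign(reading, key)

tampered = dict(reading, value=99.0)
print(verify(reading, key, signature))   # the untouched record
print(verify(tampered, key, signature))  # the altered record
```

A check like this only covers authenticity. Whether the value itself reflects reality is a separate question that needs independent sources.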

Tuesday, September 9, 2014

Comments on symbiosis between Mass-media and the Political establishment

The following text is translated from the Swedish original:

Is it just me who sees it this way? At election time, one realizes what an important role the mass media play. Mass media live on whatever deviates from the ”normal”. Reporting that everything is normal sells no advertising space. News is something or someone that deviates from the norm: for example richer, bigger, faster, more troubled, harder, more evil, more expensive, and so on. The problem is that mass media become the foremost interpreter of what is normal. Why? Because the media control the flow of information and shape public opinion.

Over time, parties and politicians adapt to an ”opinion corridor” (åsiktskorridor) and bury their ideological expressions, indignation, idealistic aspirations and emotional statements deep down, confining themselves to what can be approved and reported by the media. As a result, ”established” politicians speak in the same way, use the same arguments and have the same goals. So there is not much difference between a Social Democrat and a Moderate.

The important question thus becomes: who actually sets the future political agenda? The Centre Party (Centerpartiet) tried a while ago to build a forward-looking party programme, but after massive criticism from the mass media the programme was adapted to what can be considered politically correct. One may ask whether the mass media are objective in their reporting, or whether they deliberately include subjective judgments that steer the criticism in a certain direction.

Mass media and politicians live in symbiosis. Both parties depend on each other, because here there are well-paid assignments, jobs and positions, contacts, press subsidies, advertising revenue, celebrity status, and the manipulation of opinions and views. The independent, impartial and objectively scrutinizing power has almost completely disappeared, and this does not bode well for democratic development in Sweden.

Wednesday, July 2, 2014

Five V’s of big data

I have just begun working with big data issues, supporting the EU project Optique; http://www.optique-project.eu/ with events and setting up a network of interested people, industries, authorities and organizations. If you would like to participate, then send me an email.

Big data is a fascinating topic. The term “big data” leads one's thoughts toward the vast volumes of data continuously generated through modern technology. It is not difficult to find impressive examples (http://en.wikipedia.org/wiki/Big_data): The Hubble space telescope gathered 120 terabytes (TB) of data between 1990 and 2007. The Large Hadron Collider (LHC) at CERN captures around 25,000 TB every year from its acceleration/collision runs. The animated film “Despicable Me” used 142 TB of data to show 95 minutes of film. The online radio site Pandora has 250 TB of music in its archives. Walmart manages over 1 million customer transactions every hour and has accumulated over 2,560 TB about its customers' shopping habits. YouTube has 530 petabytes (PB) of video streams in its archives. The governance of these huge data assets poses technical, managerial and financial challenges for storage, processing (indexing, search and retrieval), high-bandwidth transmission, quality assurance and protection.

But big data is much more than just its volume. It is also the velocity of big data, or in other words, the time it takes to create/capture, update, index, manage, analyze and use data. Some data might be captured millions of times every second (like the LHC collision sensors) and other data might be updated manually once a year. Data that are created, updated and used on different timelines must be analyzed and synchronized before they can be used together in a comparative user query. If it takes too much lead time to query, search, retrieve and assemble a result, then the big data resource will not be used as anticipated. A big challenge is therefore the ability to “crunch” a lot of data in a short time span, and to analyze data in real time, automatically.
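One common way to synchronize data that arrive on different timelines is an "as-of" join: for every fast-moving measurement, look up the most recent slow-moving value that was valid at that moment. The sketch below is only an illustration in Python. The function name `as_of_join` and the sample data are assumptions, and both inputs are assumed to be sorted by timestamp.

```python
from bisect import bisect_right


def as_of_join(fast, slow):
    """Pair each fast reading with the latest slow value at or before its time.

    fast, slow: lists of (timestamp, value) tuples, sorted by timestamp.
    Returns a list of (timestamp, fast_value, slow_value_or_None).
    """
    slow_times = [t for t, _ in slow]
    joined = []
    for t, value in fast:
        # Index of the first slow timestamp strictly greater than t.
        i = bisect_right(slow_times, t)
        latest = slow[i - 1][1] if i > 0 else None
        joined.append((t, value, latest))
    return joined


# Assumed sample data: frequent sensor readings and a rarely updated calibration.
readings = [(1, "a"), (5, "b"), (10, "c")]
calibration = [(3, "x"), (10, "y")]
print(as_of_join(readings, calibration))
```

Because the lookup is a binary search, each reading costs only a logarithmic number of comparisons, which matters when the fast stream is large.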

When one has a firm grasp of the volume and velocity, then it is time to understand the variety of big data. Integrating data from different sources can result in severe inconsistencies when the same data can be either unstructured or structured and based on different representations, formats and media. So, we could for example find the data term “temperature” represented in many different forms. It could be a number, a text string, a stream of bits, a color, an image or icon, an animation, a sound, a frequency or an algorithm, and it could be expressed in Celsius, Fahrenheit, Kelvin or some new scale suited for a special purpose, such as Mexican food: scorching, very hot, hot, just fine and gringo. Some data are structured, meaning they already have a data model linked to them, while other data sources are unstructured and need hands-on work to sort out their interpretation. The challenge consists of making sure that data can be mapped to something: an object, a property or a reference. It should also have a clear definition and references. If data is unstructured, then we need to map it towards existing structures or rapidly create new objects, properties or references. Large amounts of time could be saved if this could be done automatically.
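The temperature example can be sketched in a few lines of Python. The idea is to map every incoming representation onto one canonical form, here Kelvin, and to keep labels from an ordinal scale (like the Mexican food one) as labels, since they have no agreed numeric value. The names `Temperature`, `to_kelvin` and `normalize`, and the accepted input format of "number, space, unit", are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Ordinal labels taken from the example in the text; they carry no numeric value.
ORDINAL_LABELS = {"scorching", "very hot", "hot", "just fine", "gringo"}


@dataclass(frozen=True)
class Temperature:
    kelvin: Optional[float]  # canonical numeric value, if one exists
    label: Optional[str]     # ordinal label when no numeric value exists
    raw: str                 # the original input, kept for traceability


def to_kelvin(value: float, unit: str) -> float:
    """Convert a numeric temperature to Kelvin."""
    unit = unit.strip().lower()
    if unit in ("k", "kelvin"):
        return value
    if unit in ("c", "celsius"):
        return value + 273.15
    if unit in ("f", "fahrenheit"):
        return (value - 32.0) * 5.0 / 9.0 + 273.15
    raise ValueError(f"unknown temperature unit: {unit!r}")


def normalize(raw: str) -> Temperature:
    """Map a raw string such as '21.5 C' or 'hot' to a Temperature.

    Raises ValueError for input that fits neither form.
    """
    text = raw.strip().lower()
    if text in ORDINAL_LABELS:
        return Temperature(kelvin=None, label=text, raw=raw)
    number, _, unit = text.partition(" ")
    return Temperature(kelvin=to_kelvin(float(number), unit), label=None, raw=raw)


for sample in ["21.5 C", "70 F", "300 K", "very hot"]:
    print(normalize(sample))
```

Keeping the raw input next to the canonical value means that a doubtful mapping can always be traced back to its source.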

These three V’s, volume, velocity and variety, were Gartner’s initial 2001 definition of big data characteristics. But there are more things that we need to take into account. Some organizations regard value as an essential characteristic, that is, how to create value from big data; http://www.bigdatavalue.eu/. This is naturally very important, and it will surely become the state of the art in the future. Most organizations will be dependent on big data sources to better understand and provide the right products and services to their dependents, customers, employees, students, patients and clients.
But value is not a big data characteristic. It is more an output effect, based on the demand for specific information and on the capability of the big data environment to deliver data efficiently and effectively to cover that demand.

IBM and others correctly assume that big data sources will not be used if one cannot ensure availability and trust in the data. They have called it veracity: making sure that these huge data assets are validated, truthful, reliable and, in short, trusted. Big data links here to a number of already ongoing initiatives such as data/information quality (making sure that data is available, authentic, current and accurate), information security (where sensitive content is protected from unauthorized access and manipulation), regulatory openness and transparency (SOX), respected integrity (PUL) and intellectual property rights (IPR & copyrights). Veracity is not a characteristic of big data. It is more a set of requirements to ensure its validity and trust.

Some consider complexity to be the final characteristic of big data. I do not agree with that; complexity is nothing more than a consequence derived from the relationships between volume, velocity and variety (see picture below). So the five V’s of big data are: Volume, Velocity, Variety, Value and Veracity.


Tuesday, July 2, 2013

Aggressive taxation on computers and Internet


Update 2014-07-03: The Supreme Administrative Court (Högsta förvaltningsdomstolen) ruled that the owner of a computer with an Internet connection does not have to pay the TV license fee. Radiotjänst has now decided to refund the TV fees it collected from households with computers.

I have tried to avoid being overly critical in my blog, but it is extremely hard when I see the new legal interpretation from Radiotjänst. Radiotjänst collects a TV license fee on behalf of the three public broadcasters. Current law stipulates that the fee should be paid by every household containing a TV set, and possession of such a device must be reported to Radiotjänst. But since February 2013 Radiotjänst has changed its interpretation so that any smartphone, laptop, tablet or personal computer connected to the Internet is considered a TV set and thereby requires payment of the TV license fee. I am not sure if Radiotjänst has considered the following:
  • There are an estimated 2,405,518,376 Internet users today; http://www.internetworldstats.com/stats.htm. All these users are in possession of a TV set or its computer equivalent, and they are connected to the Internet. These Internet users fulfill the above legal stipulations, so should they pay the fee? If not, what further legal requirements distinguish those who must pay the fee from those who do not? Or is this about the estimated 8,397,900 Swedish Internet users (92.5% of the population); http://www.internetworldstats.com/eu/se.htm, as defined under the nationality principle, or residents under the resident principle, and as such subject to Swedish law?
  • Swedish TV and radio broadcast over the Internet can be consumed abroad, so users who are not subject to Swedish law can enjoy the programming free of charge. Does that imply that Swedish citizens who currently work or reside abroad, and who abide by another country’s laws, are not legally required to pay the fee?
  • If Swedish citizens/residents cannot be presented with an option not to consume Swedish TV and radio, then this is in fact nothing more than an aggressive and mandatory taxation on computers and the Internet. This is very unfortunate given the many years of political initiatives to expand and encourage computer literacy and Internet connectivity. This taxation is counter-productive and goes in the opposite direction of many other countries, such as the US with its Internet Tax Freedom Act.
  • This taxation can lead to a multitude of tax-avoidance protests, ideas and concepts, which might in turn lead to monitoring of citizens’ and residents’ Internet capabilities and habits. If so, this would be a serious infringement on our personal integrity and create another precedent for tighter government control.
The Internet and access to information should be kept free. Much of this problem could be solved by setting up a login-based pay service for Swedish TV/radio broadcasting, so that those who are interested can pay for what they consume. Radiotjänst has most likely concluded that such an approach would not pay its bills. This is not an Internet problem; it is an antiquated and expensive public service that cannot keep up with the competition and has lost the public support needed to finance its operations. If the government decides to keep the public broadcasters, then why not be perfectly honest about that and finance them through the state tax? That would be a mandatory tax, with no need for policing of computers and the Internet, no need for Radiotjänst, and at least 150 million crowns in annual savings.

Monday, February 20, 2012

A year has passed..

This last year has passed by quickly. A lot of things have happened. We left the US in June 2011 and moved back to Scandinavia, which means that I am commuting between Stockholm and Oslo. I am still employed at the DNV HQ in Norway, where information continues to be high on the agenda. Many organizations are looking for solutions where their historical data, information and knowledge resources can be reused and repurposed for a wider selection of customers and users. Information is becoming a crucial intangible resource in these organizations.

At the same time, there are many organizations out there struggling to maintain their growing volumes of data and information. Recent events in the Gulf of Mexico require the oil and gas industries to be even more transparent, and to be able to show their proactive hazard and risk analysis. This means managing even more information, resulting in higher costs and longer lead times. Defense organizations are looking for automatic solutions where the bulk of the information will be processed not by humans but by computers. This trend will go hand in hand with the trend of outsourcing data and information resources to companies that can protect, quality assure and adapt the information for many different uses.

Many new projects are in the pipeline. The SESAR project in Europe is consolidating information for Air Traffic Management (ATM). The Net Enabled Capability (NEC) managed by the European Defence Agency and its Crisis Management Operation Program will do the same. The European Union will look more deeply into the shared data issue for the European Digital Agenda. It is going to be an exciting 2012, and I hope to blog more than once a year.

Sunday, February 6, 2011

Building another website


I am quite impressed by the capabilities of Illustrator, Flash and Dreamweaver. Together they hold more capabilities than an ordinary user like myself will ever need. This is test site #2, where I am trying out how to incorporate Flash into HTML. I've designed the header in Flash, and it includes the animation and the navigation. The scrollbar is done in Dreamweaver. The graphics were made in Illustrator. You can test the website at Jalles Website.

Monday, September 13, 2010

An ad that creates a reaction...

When creating an ad, you must be aware that the average reader will observe the ad for a fraction of a second. One idea is therefore to get the reader's attention, to stop the reader and let him observe the ad for a few seconds. So, this ad experiments with the attention span, to see if fairly simple means can stop the reader and transmit the intended message.

I changed the color of a young girl's eye to an energetic bright green. That will make the reader stop, and then glance back over the ad to observe the logo, the headline and the website address. In just a few seconds, this ad has communicated the notion of green, energy and earth. Those interested in renewable energy will take a third and fourth deeper look and perhaps read some of the text.