{"info":{"_postman_id":"c753cad0-6425-44ab-b369-152e94e030fa","name":"SummarizeBot API","description":"<html><head></head><body><p>SummarizeBot API provides artificial intelligence and blockchain-powered solutions for text and multimedia analysis. SummarizeBot API allows applications to scrape web articles, summarize text from documents or the web (or audio and video content). Use the API for sentiment analysis and extraction of text, videos, and images. More than 100 languages are included and supported file types include .doc, .pdf, .epub, .csv, .pptx, .rtf and others.</p>\n</body></html>","schema":"https://schema.getpostman.com/json/collection/v2.0.0/collection.json","toc":[],"owner":"3967924","collectionId":"c753cad0-6425-44ab-b369-152e94e030fa","publishedId":"RW1hhbeF","public":true,"customColor":null,"publishDate":"2018-05-09T08:59:24.000Z"},"item":[{"name":"v1 Reference","item":[{"name":"/summarize","id":"ad47419f-6ca0-46c2-bcf4-1478fdde4aec","request":{"method":"GET","header":[],"body":{"mode":"formdata","formdata":[]},"url":"https://www.summarizebot.com/api/summarize?apiKey=YourAPIKey&url=https://en.wikipedia.org/wiki/Automatic_summarization&size=18&keywords=10&fragments=15&language=English","description":"<p>Summarize file from a given url.</p>\n","urlObject":{"protocol":"https","path":["api","summarize"],"host":["www","summarizebot","com"],"query":[{"description":{"content":"<p>To use the API you will need an API key. Please, register at <a href=\"http://www.summarizebot.com/summarization_business.html\">http://www.summarizebot.com/summarization_business.html</a> to get your personal API key.</p>\n","type":"text/plain"},"key":"apiKey","value":"YourAPIKey"},{"description":{"content":"<p>Article or web page url.</p>\n","type":"text/plain"},"key":"url","value":"https://en.wikipedia.org/wiki/Automatic_summarization"},{"description":{"content":"<p>Summary length as percentage of original document.</p>\n","type":"text/plain"},"key":"size","value":"18"},{"description":{"content":"<p>Maximum count of keywords to return.</p>\n","type":"text/plain"},"key":"keywords","value":"10"},{"description":{"content":"<p>Maximum count of key fragments to return.</p>\n","type":"text/plain"},"key":"fragments","value":"15"},{"description":{"content":"<p>Optional for text files, required for audio files and images.</p>\n","type":"text/plain"},"key":"language","value":"English"}],"variable":[]}},"response":[{"id":"e44d029b-0dfd-4af5-b1c2-5a7e4cdcc339","name":"Summarize GET result","originalRequest":{"method":"GET","header":[],"body":{"mode":"formdata","formdata":[]},"url":{"raw":"https://www.summarizebot.com/api/summarize?apiKey=YourAPIKey&url=https://en.wikipedia.org/wiki/Automatic_summarization&size=18&keywords=10&fragments=15&language=English","protocol":"https","host":["www","summarizebot","com"],"path":["api","summarize"],"query":[{"key":"apiKey","value":"YourAPIKey","description":"To use the API you will need an API key. 
Please, register at http://www.summarizebot.com/summarization_business.html to get your personal API key."},{"key":"url","value":"https://en.wikipedia.org/wiki/Automatic_summarization","description":"Article or web page url."},{"key":"size","value":"18","description":"Summary length as percentage of original document."},{"key":"keywords","value":"10","description":"Maximum count of keywords to return."},{"key":"fragments","value":"15","description":"Maximum count of key fragments to return."},{"key":"language","value":"English","description":"Optional for text files, required for audio files and images."}]},"description":"Summarize file from a given url."},"status":"OK","code":200,"_postman_previewlanguage":"json","header":[{"name":"connection","key":"connection","value":"Keep-Alive","description":"Options that are desired for the connection"},{"name":"content-length","key":"content-length","value":"10266","description":"The length of the response body in octets (8-bit bytes)"},{"name":"content-type","key":"content-type","value":"application/json","description":"The mime type of this content"},{"name":"date","key":"date","value":"Sun, 25 Mar 2018 11:41:39 GMT","description":"The date and time that the message was sent"},{"name":"keep-alive","key":"keep-alive","value":"timeout=5, max=100","description":"Custom header"},{"name":"server","key":"server","value":"Apache/2.4.7 (Ubuntu)","description":"A name for the server"}],"cookie":[],"responseTime":null,"body":"[{\"summary\": [{\"id\": 0, \"weight\": 3.07, \"sentence\": \"Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document.\"}, {\"id\": 1, \"weight\": 2.39, \"sentence\": \"Technologies that can make a coherent summary take into account variables such as length, writing style and syntax.\"}, {\"id\": 2, \"weight\": 3.37, \"sentence\": \"Automatic data summarization is part of machine learning and data mining.\"}, {\"id\": 5, \"weight\": 2.94, \"sentence\": \"Search engines are an example; others include summarization of documents, image collections and videos.\"}, {\"id\": 6, \"weight\": 2.65, \"sentence\": \"Document summarization tries to create a representative summary or abstract of the entire document, by finding the most informative sentences, while in image summarization the system finds the most representative and important (i.e.\"}, {\"id\": 8, \"weight\": 2.54, \"sentence\": \"There are two general approaches to automatic summarization: extraction and abstraction.\"}, {\"id\": 12, \"weight\": 2.85, \"sentence\": \"Research to date has focused primarily on extractive methods, which are appropriate for image collection summarization and video summarization.\"}, {\"id\": 13, \"weight\": 2.74, \"sentence\": \"In this summarization task, the automatic system extracts objects from the entire collection, without modifying the objects themselves.\"}, {\"id\": 15, \"weight\": 2.42, \"sentence\": \"Similarly, in image collection summarization, the system extracts images from the collection without modifying the images themselves.\"}, {\"id\": 18, \"weight\": 2.34, \"sentence\": \"While some work has been done in abstractive summarization (creating an abstract synopsis like that of a human), the majority of summarization systems are extractive (selecting a subset of sentences to place in a summary).\"}, {\"id\": 19, \"weight\": 2.67, \"sentence\": \"Machine learning techniques from closely related fields such as information retrieval or 
text mining have been successfully adapted to help automatic summarization.\"}, {\"id\": 20, \"weight\": 3.35, \"sentence\": \"Apart from Fully Automated Summarizers (FAS), there are systems that aid users with the task of summarization (MAHS = Machine Aided Human Summarization), for example by highlighting candidate passages to be included in the summary, and there are systems that depend on post-processing by a human (HAMS = Human Aided Machine Summarization).\"}, {\"id\": 21, \"weight\": 2.81, \"sentence\": \"There are broadly two types of extractive summarization tasks depending on what the summarization program focuses on.\"}, {\"id\": 23, \"weight\": 2.67, \"sentence\": \"The second is query relevant summarization, sometimes called query-based summarization, which summarizes objects specific to a query.\"}, {\"id\": 24, \"weight\": 2.45, \"sentence\": \"Summarization systems are able to create both query relevant text summaries and generic machine-generated summaries depending on what the user needs.\"}, {\"id\": 25, \"weight\": 2.74, \"sentence\": \"An example of a summarization problem is document summarization, which attempts to automatically produce an abstract from a given document.\"}, {\"id\": 30, \"weight\": 3.03, \"sentence\": \"Image collection summarization is another application example of automatic summarization.\"}, {\"id\": 33, \"weight\": 2.55, \"sentence\": \"Video summarization is a related domain, where the system automatically creates a trailer of a long video.\"}, {\"id\": 39, \"weight\": 2.78, \"sentence\": \"Query based summarization techniques, additionally model for relevance of the summary with the query.\"}, {\"id\": 40, \"weight\": 2.62, \"sentence\": \"Some techniques and algorithms which naturally model summarization problems are TextRank and PageRank, Submodular set function, Determinantal point process, maximal marginal relevance (MMR) etc.\"}, {\"id\": 89, \"weight\": 2.44, \"sentence\": \"Another keyphrase extraction algorithm is TextRank.\"}, {\"id\": 93, \"weight\": 2.38, \"sentence\": \"Unsupervised keyphrase extraction removes the need for training data.\"}, {\"id\": 99, \"weight\": 2.33, \"sentence\": \"TextRank is a general purpose graph-based ranking algorithm for NLP.\"}, {\"id\": 100, \"weight\": 2.5, \"sentence\": \"Essentially, it runs PageRank on a graph specially designed for a particular NLP task.\"}, {\"id\": 138, \"weight\": 2.33, \"sentence\": \"Like keyphrase extraction, document summarization aims to identify the essence of a text.\"}, {\"id\": 140, \"weight\": 2.6, \"sentence\": \"Before getting into the details of some summarization methods, we will mention how summarization systems are typically evaluated.\"}, {\"id\": 150, \"weight\": 2.61, \"sentence\": \"A promising line in document summarization is adaptive document/text summarization.\"}, {\"id\": 151, \"weight\": 2.47, \"sentence\": \"[5] The idea of adaptive summarization involves preliminary recognition of document/text genre and subsequent application of summarization algorithms optimized for this genre.\"}, {\"id\": 152, \"weight\": 2.51, \"sentence\": \"First summarizes that perform adaptive summarization have been created.\"}, {\"id\": 160, \"weight\": 2.4, \"sentence\": \"During the DUC 2001 and 2002 evaluation workshops, TNO developed a sentence extraction system for multi-document summarization in the news domain.\"}, {\"id\": 163, \"weight\": 2.34, \"sentence\": \"Maximum entropy has also been applied successfully for summarization in the broadcast news 
domain.\"}, {\"id\": 168, \"weight\": 2.48, \"sentence\": \"LexRank[7] is an algorithm essentially identical to TextRank, and both use this approach for document summarization.\"}, {\"id\": 177, \"weight\": 2.37, \"sentence\": \"It is worth noting that TextRank was applied to summarization exactly as described here, while LexRank was used as part of a larger summarization system (MEAD) that combines the LexRank score (stationary probability) with other features like sentence position and length using a linear combination with either user-specified or automatically tuned weights.\"}, {\"id\": 179, \"weight\": 2.85, \"sentence\": \"Another important distinction is that TextRank was used for single document summarization, while LexRank has been applied to multi-document summarization.\"}, {\"id\": 194, \"weight\": 2.67, \"sentence\": \"Multi-document summarization is an automatic procedure aimed at extraction of information from multiple texts written about the same topic.\"}, {\"id\": 202, \"weight\": 2.38, \"sentence\": \"Multi-document extractive summarization faces a problem of potential redundancy.\"}, {\"id\": 210, \"weight\": 2.53, \"sentence\": \"These methods have achieved the state of the art results for Document Summarization Corpora, DUC 04 - 07.\"}, {\"id\": 214, \"weight\": 2.87, \"sentence\": \"The Simplish Simplifying & Summarizing tool[13] - performs just such an automatic multi-lingual multi-document summarization.\"}, {\"id\": 215, \"weight\": 2.41, \"sentence\": \"The idea of a Submodular set function has recently emerged as a powerful modeling tool for various summarization problems.\"}, {\"id\": 216, \"weight\": 2.51, \"sentence\": \"Submodular functions naturally model notions of coverage, information, representation and diversity.\"}, {\"id\": 223, \"weight\": 2.55, \"sentence\": \"The Facility Location function also naturally models coverage and diversity.\"}, {\"id\": 229, \"weight\": 2.4, \"sentence\": \"While submodular functions are fitting problems for summarization, they also admit very efficient algorithms for optimization.\"}, {\"id\": 234, \"weight\": 2.34, \"sentence\": \"Similarly, work by Lin and Bilmes, 2011,[16] shows that many existing systems for automatic summarization are instances of submodular functions.\"}, {\"id\": 236, \"weight\": 2.42, \"sentence\": \"Submodular Functions have also been used for other summarization tasks.\"}, {\"id\": 239, \"weight\": 2.56, \"sentence\": \"Submodular Functions have also successfully been used for summarizing machine learning datasets.\"}, {\"id\": 245, \"weight\": 2.43, \"sentence\": \"Intra-textual methods assess the output of a specific summarization system, and the inter-textual ones focus on contrastive analysis of outputs of several summarization systems.\"}, {\"id\": 250, \"weight\": 2.67, \"sentence\": \"It essentially calculates n-gram overlaps between automatically generated summaries and previously-written human summaries.\"}, {\"id\": 254, \"weight\": 2.6, \"sentence\": \"Similarly, for image summarization, Tschiatschek et al., developed a Visual-ROUGE score which judges the performance of algorithms for image summarization.\"}]}, {\"keywords\": [{\"weight\": 4.07, \"keyword\": \"submodular functions\", \"ids\": [239, 216, 236, 229, 234]}, {\"weight\": 4.05, \"keyword\": \"submodular optimization\", \"ids\": [217, 224, 218, 225]}, {\"weight\": 3.79, \"keyword\": \"stationary distribution\", \"ids\": [135, 104]}, {\"weight\": 3.78, \"keyword\": \"recall-oriented understudy\", \"ids\": [141, 249]}, 
{\"weight\": 3.78, \"keyword\": \"maximum entropy\", \"ids\": [163, 162]}, {\"weight\": 3.72, \"keyword\": \"generating ideograms\", \"ids\": [212, 213]}, {\"weight\": 3.7, \"keyword\": \"summarization systems\", \"ids\": [140, 24, 245, 18, 196]}, {\"weight\": 3.65, \"keyword\": \"content overlap\", \"ids\": [171, 147]}, {\"weight\": 3.6, \"keyword\": \"sentence position\", \"ids\": [177, 208]}, {\"weight\": 3.59, \"keyword\": \"surveillance videos\", \"ids\": [7, 35]}]}, {\"fragments\": [{\"fragment\": \"absorbing random walk\", \"ids\": [206], \"weight\": 3.22}, {\"fragment\": \"summarizes objects specific\", \"ids\": [23], \"weight\": 3.21}, {\"fragment\": \"naturally models coverage\", \"ids\": [223], \"weight\": 3.21}, {\"fragment\": \"summarizing news articles\", \"ids\": [28], \"weight\": 3.19}, {\"fragment\": \"nlp ranking task\", \"ids\": [169], \"weight\": 3.19}, {\"fragment\": \"cover function attempts\", \"ids\": [219], \"weight\": 3.18}, {\"fragment\": \"heuristic post-processing step\", \"ids\": [184], \"weight\": 3.17}, {\"fragment\": \"rank individual unigrams\", \"ids\": [107], \"weight\": 3.15}, {\"fragment\": \"naive bayes classifier\", \"ids\": [161], \"weight\": 3.14}, {\"fragment\": \"maximal marginal relevance\", \"ids\": [40], \"weight\": 3.14}, {\"fragment\": \"evaluation systems existing\", \"ids\": [258], \"weight\": 3.14}, {\"fragment\": \"machine learning techniques\", \"ids\": [19], \"weight\": 3.13}, {\"fragment\": \"method simply ranks\", \"ids\": [119], \"weight\": 3.13}, {\"fragment\": \"relevant source documents\", \"ids\": [200], \"weight\": 3.1}, {\"fragment\": \"algorithms model notions\", \"ids\": [38], \"weight\": 3.1}]}]\n"}],"_postman_id":"ad47419f-6ca0-46c2-bcf4-1478fdde4aec"},{"name":"/summarize","id":"04a1a297-b6c3-4f51-acb4-64808b337f74","request":{"method":"POST","header":[{"key":"Content-Type","value":"application/octet-stream","description":"<p>The HTTP header should be specified as 'application/octet-stream'.</p>\n"}],"body":{"mode":"file","file":{}},"url":"https://www.summarizebot.com/api/summarize?apiKey=YourAPIKey&filename=1.txt&size=18&keywords=10&fragments=15&language=English","description":"<p>Summarize file from binary data. POST body should include file content in binary form. The HTTP header should be specified as 'application/octet-stream'.</p>\n","urlObject":{"protocol":"https","path":["api","summarize"],"host":["www","summarizebot","com"],"query":[{"description":{"content":"<p>To use the API you will need an API key. Please, register at <a href=\"http://www.summarizebot.com/summarization_business.html\">http://www.summarizebot.com/summarization_business.html</a> to get your personal API key.</p>\n","type":"text/plain"},"key":"apiKey","value":"YourAPIKey"},{"description":{"content":"<p>Name of the file, e.g. 
filename=1.txt.</p>\n","type":"text/plain"},"key":"filename","value":"1.txt"},{"description":{"content":"<p>Summary length as percentage of original document.</p>\n","type":"text/plain"},"key":"size","value":"18"},{"description":{"content":"<p>Maximum count of keywords to return.</p>\n","type":"text/plain"},"key":"keywords","value":"10"},{"description":{"content":"<p>Maximum count of key fragments to return.</p>\n","type":"text/plain"},"key":"fragments","value":"15"},{"description":{"content":"<p>Optional for text files, required for audio files and images.</p>\n","type":"text/plain"},"key":"language","value":"English"}],"variable":[]}},"response":[{"id":"0a58e7ec-f4ee-477e-978b-a9441ef3acbc","name":"Summarize POST result","originalRequest":{"method":"POST","header":[{"key":"Content-Type","value":"application/octet-stream","description":"The HTTP header should be specified as 'application/octet-stream'."}],"body":{"mode":"file","file":{}},"url":{"raw":"https://www.summarizebot.com/api/summarize?apiKey=YourAPIKey&filename=1.txt&size=18&keywords=10&fragments=15&language=English","protocol":"https","host":["www","summarizebot","com"],"path":["api","summarize"],"query":[{"key":"apiKey","value":"YourAPIKey","description":"To use the API you will need an API key. Please, register at http://www.summarizebot.com/summarization_business.html to get your personal API key."},{"key":"filename","value":"1.txt","description":"Name of the file, e.g. filename=1.txt."},{"key":"size","value":"18","description":"Summary length as percentage of original document."},{"key":"keywords","value":"10","description":"Maximum count of keywords to return."},{"key":"fragments","value":"15","description":"Maximum count of key fragments to return."},{"key":"language","value":"English","description":"Optional for text files, required for audio files and images."}]},"description":"Summarize file from binary data. POST body should include file content in binary form. 
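
For the binary POST variant described above, a minimal sketch under the same assumptions (`requests`, placeholder key); the local file name `1.txt` is illustrative and should match the `filename` query parameter:

```python
# Sketch of POST /summarize with a binary body; per the endpoint description,
# Content-Type must be 'application/octet-stream'.
import requests

params = {
    "apiKey": "YourAPIKey",  # placeholder
    "filename": "1.txt",     # name of the uploaded file
    "size": 18,
    "keywords": 10,
    "fragments": 15,
    "language": "English",
}
with open("1.txt", "rb") as f:
    resp = requests.post(
        "https://www.summarizebot.com/api/summarize",
        params=params,
        data=f.read(),  # raw file bytes in the POST body, not multipart form data
        headers={"Content-Type": "application/octet-stream"},
    )
resp.raise_for_status()
print(resp.json())
```
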
The HTTP header should be specified as 'application/octet-stream'."},"status":"OK","code":200,"_postman_previewlanguage":"json","header":[{"name":"connection","key":"connection","value":"Keep-Alive","description":"Options that are desired for the connection"},{"name":"content-length","key":"content-length","value":"2495","description":"The length of the response body in octets (8-bit bytes)"},{"name":"content-type","key":"content-type","value":"application/json","description":"The mime type of this content"},{"name":"date","key":"date","value":"Sun, 25 Mar 2018 11:54:54 GMT","description":"The date and time that the message was sent"},{"name":"keep-alive","key":"keep-alive","value":"timeout=5, max=100","description":"Custom header"},{"name":"server","key":"server","value":"Apache/2.4.7 (Ubuntu)","description":"A name for the server"}],"cookie":[],"responseTime":null,"body":"[{\"summary\": [{\"id\": 0, \"weight\": 2.36, \"sentence\": \"Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document.\"}, {\"id\": 1, \"weight\": 2.4, \"sentence\": \"Technologies that can make a coherent summary take into account variables such as length, writing style and syntax.\"}, {\"id\": 2, \"weight\": 2.57, \"sentence\": \"Automatic data summarization is part of machine learning and data mining.\"}, {\"id\": 5, \"weight\": 2.6, \"sentence\": \"Search engines are an example; others include summarization of documents, image collections and videos.\"}, {\"id\": 22, \"weight\": 2.37, \"sentence\": \"Image collection summarization is another application example of automatic summarization.\"}, {\"id\": 31, \"weight\": 2.35, \"sentence\": \"Query based summarization techniques, additionally model for relevance of the summary with the query.\"}]}, {\"keywords\": [{\"weight\": 2.14, \"keyword\": \"extractive methods\", \"ids\": [12, 9]}, {\"weight\": 1.76, \"keyword\": \"surveillance videos\", \"ids\": [7, 27]}, {\"weight\": 1.63, \"keyword\": \"summarization algorithms\", \"ids\": [28]}, {\"weight\": 1.58, \"keyword\": \"topic\", \"ids\": [21, 18]}, {\"weight\": 1.58, \"keyword\": \"summaries\", \"ids\": [16]}, {\"weight\": 1.58, \"keyword\": \"subset\", \"ids\": [9, 28, 3]}, {\"weight\": 1.58, \"keyword\": \"relevance\", \"ids\": [31, 32]}, {\"weight\": 1.58, \"keyword\": \"process\", \"ids\": [0, 32]}, {\"weight\": 1.58, \"keyword\": \"objects\", \"ids\": [15, 28]}, {\"weight\": 1.58, \"keyword\": \"extract\", \"ids\": [12, 13, 9, 8, 7]}]}, {\"fragments\": [{\"fragment\": \"summarizing news articles\", \"ids\": [20], \"weight\": 1.42}, {\"fragment\": \"generic machine-generated summaries depending\", \"ids\": [16], \"weight\": 1.31}, {\"fragment\": \"maximal marginal relevance\", \"ids\": [32], \"weight\": 1.03}, {\"fragment\": \"extractive summarization tasks depending\", \"ids\": [13], \"weight\": 0.99}, {\"fragment\": \"query relevant summarization\", \"ids\": [15], \"weight\": 0.99}, {\"fragment\": \"automatic data summarization\", \"ids\": [2], \"weight\": 0.99}, {\"fragment\": \"image collection summarization\", \"ids\": [22], \"weight\": 0.95}, {\"fragment\": \"algorithms model notions\", \"ids\": [30], \"weight\": 0.94}, {\"fragment\": \"query based summarization techniques\", \"ids\": [31], \"weight\": 0.93}, {\"fragment\": \"image collection exploration\", \"ids\": [24], \"weight\": 0.89}, {\"fragment\": \"natural language generation techniques\", \"ids\": [10], \"weight\": 0.83}, {\"fragment\": 
\"redundant frames captured\", \"ids\": [27], \"weight\": 0.25}]}]\n"}],"_postman_id":"04a1a297-b6c3-4f51-acb4-64808b337f74"},{"name":"/keywords","id":"d8f09752-7c64-4345-80b3-571177fa839d","request":{"method":"GET","header":[],"body":{"mode":"formdata","formdata":[]},"url":"https://www.summarizebot.com/api/keywords?apiKey=YourAPIKey&url=https://en.wikipedia.org/wiki/Automatic_summarization&keywords=10&language=English","description":"<p>Extract keywords from a given url.</p>\n","urlObject":{"protocol":"https","path":["api","keywords"],"host":["www","summarizebot","com"],"query":[{"description":{"content":"<p>To use the API you will need an API key. Please, register at <a href=\"http://www.summarizebot.com/summarization_business.html\">http://www.summarizebot.com/summarization_business.html</a> to get your personal API key.</p>\n","type":"text/plain"},"key":"apiKey","value":"YourAPIKey"},{"description":{"content":"<p>Article or web page url.</p>\n","type":"text/plain"},"key":"url","value":"https://en.wikipedia.org/wiki/Automatic_summarization"},{"description":{"content":"<p>Maximum count of keywords to return.</p>\n","type":"text/plain"},"key":"keywords","value":"10"},{"description":{"content":"<p>Optional for text files, required for audio files and images.</p>\n","type":"text/plain"},"key":"language","value":"English"}],"variable":[]}},"response":[{"id":"194a6fe8-3bbd-4bc8-8bfe-08600db1502d","name":"Keywords GET result","originalRequest":{"method":"GET","header":[],"body":{"mode":"formdata","formdata":[]},"url":{"raw":"https://www.summarizebot.com/api/keywords?apiKey=YourAPIKey&url=https://en.wikipedia.org/wiki/Automatic_summarization&keywords=10&language=English","protocol":"https","host":["www","summarizebot","com"],"path":["api","keywords"],"query":[{"key":"apiKey","value":"YourAPIKey","description":"To use the API you will need an API key. 
Please, register at http://www.summarizebot.com/summarization_business.html to get your personal API key."},{"key":"url","value":"https://en.wikipedia.org/wiki/Automatic_summarization","description":"Article or web page url."},{"key":"keywords","value":"10","description":"Maximum count of keywords to return."},{"key":"language","value":"English","description":"Optional for text files, required for audio files and images."}]},"description":"Extract keywords from a given url."},"status":"OK","code":200,"_postman_previewlanguage":"json","header":[{"name":"connection","key":"connection","value":"Keep-Alive","description":"Options that are desired for the connection"},{"name":"content-length","key":"content-length","value":"769","description":"The length of the response body in octets (8-bit bytes)"},{"name":"content-type","key":"content-type","value":"application/json","description":"The mime type of this content"},{"name":"date","key":"date","value":"Thu, 29 Mar 2018 20:56:53 GMT","description":"The date and time that the message was sent"},{"name":"keep-alive","key":"keep-alive","value":"timeout=5, max=100","description":"Custom header"},{"name":"server","key":"server","value":"Apache/2.4.7 (Ubuntu)","description":"A name for the server"}],"cookie":[],"responseTime":null,"body":"[{\"keywords\": [{\"weight\": 4.07, \"keyword\": \"submodular functions\", \"ids\": [239, 216, 236, 229, 234]}, {\"weight\": 4.05, \"keyword\": \"submodular optimization\", \"ids\": [217, 224, 218, 225]}, {\"weight\": 3.79, \"keyword\": \"stationary distribution\", \"ids\": [135, 104]}, {\"weight\": 3.78, \"keyword\": \"recall-oriented understudy\", \"ids\": [141, 249]}, {\"weight\": 3.78, \"keyword\": \"maximum entropy\", \"ids\": [163, 162]}, {\"weight\": 3.72, \"keyword\": \"generating ideograms\", \"ids\": [212, 213]}, {\"weight\": 3.7, \"keyword\": \"summarization systems\", \"ids\": [140, 24, 245, 18, 196]}, {\"weight\": 3.65, \"keyword\": \"content overlap\", \"ids\": [171, 147]}, {\"weight\": 3.6, \"keyword\": \"sentence position\", \"ids\": [177, 208]}, {\"weight\": 3.59, \"keyword\": \"surveillance videos\", \"ids\": [7, 35]}]}]\n"}],"_postman_id":"d8f09752-7c64-4345-80b3-571177fa839d"},{"name":"/keywords","id":"2de6c507-8a00-4d03-be3c-083639078f36","request":{"method":"POST","header":[{"key":"Content-Type","value":"application/octet-stream","description":"<p>The HTTP header should be specified as 'application/octet-stream'.</p>\n"}],"body":{"mode":"file","file":{}},"url":"https://www.summarizebot.com/api/keywords?apiKey=YourAPIKey&filename=1.txt&keywords=10&language=English","description":"<p>Extract keywords from binary data. POST body should include file content in binary form. The HTTP header should be specified as 'application/octet-stream'.</p>\n","urlObject":{"protocol":"https","path":["api","keywords"],"host":["www","summarizebot","com"],"query":[{"description":{"content":"<p>To use the API you will need an API key. Please, register at <a href=\"http://www.summarizebot.com/summarization_business.html\">http://www.summarizebot.com/summarization_business.html</a> to get your personal API key.</p>\n","type":"text/plain"},"key":"apiKey","value":"YourAPIKey"},{"description":{"content":"<p>Name of the file, e.g. 
filename=1.txt.</p>\n","type":"text/plain"},"key":"filename","value":"1.txt"},{"description":{"content":"<p>Maximum count of keywords to return.</p>\n","type":"text/plain"},"key":"keywords","value":"10"},{"description":{"content":"<p>Optional for text files, required for audio files and images.</p>\n","type":"text/plain"},"key":"language","value":"English"}],"variable":[]}},"response":[{"id":"6606fb92-f282-4fc0-bd6d-5ad8a6d5e4e1","name":"Keywords POST result","originalRequest":{"method":"POST","header":[{"key":"Content-Type","value":"application/octet-stream","description":"The HTTP header should be specified as 'application/octet-stream'."}],"body":{"mode":"file","file":{}},"url":{"raw":"https://www.summarizebot.com/api/keywords?apiKey=YourAPIKey&filename=1.txt&keywords=10&language=English","protocol":"https","host":["www","summarizebot","com"],"path":["api","keywords"],"query":[{"key":"apiKey","value":"YourAPIKey","description":"To use the API you will need an API key. Please, register at http://www.summarizebot.com/summarization_business.html to get your personal API key."},{"key":"filename","value":"1.txt","description":"Name of the file, e.g. filename=1.txt."},{"key":"keywords","value":"10","description":"Maximum count of keywords to return."},{"key":"language","value":"English","description":"Optional for text files, required for audio files and images."}]},"description":"Extract keywords from binary data. POST body should include file content in binary form. The HTTP header should be specified as 'application/octet-stream'."},"status":"OK","code":200,"_postman_previewlanguage":"json","header":[{"name":"connection","key":"connection","value":"Keep-Alive","description":"Options that are desired for the connection"},{"name":"content-length","key":"content-length","value":"628","description":"The length of the response body in octets (8-bit bytes)"},{"name":"content-type","key":"content-type","value":"application/json","description":"The mime type of this content"},{"name":"date","key":"date","value":"Thu, 29 Mar 2018 21:04:43 GMT","description":"The date and time that the message was sent"},{"name":"keep-alive","key":"keep-alive","value":"timeout=5, max=100","description":"Custom header"},{"name":"server","key":"server","value":"Apache/2.4.7 (Ubuntu)","description":"A name for the server"}],"cookie":[],"responseTime":null,"body":"[{\"keywords\": [{\"weight\": 2.14, \"keyword\": \"extractive methods\", \"ids\": [12, 9]}, {\"weight\": 1.76, \"keyword\": \"surveillance videos\", \"ids\": [7, 27]}, {\"weight\": 1.63, \"keyword\": \"summarization algorithms\", \"ids\": [28]}, {\"weight\": 1.58, \"keyword\": \"topic\", \"ids\": [21, 18]}, {\"weight\": 1.58, \"keyword\": \"summaries\", \"ids\": [16]}, {\"weight\": 1.58, \"keyword\": \"subset\", \"ids\": [9, 28, 3]}, {\"weight\": 1.58, \"keyword\": \"relevance\", \"ids\": [31, 32]}, {\"weight\": 1.58, \"keyword\": \"process\", \"ids\": [0, 32]}, {\"weight\": 1.58, \"keyword\": \"objects\", \"ids\": [15, 28]}, {\"weight\": 1.58, \"keyword\": \"extract\", \"ids\": [12, 13, 9, 8, 7]}]}]\n"}],"_postman_id":"2de6c507-8a00-4d03-be3c-083639078f36"},{"name":"/extract","id":"daaa85af-428c-4a82-afe9-1508badb348b","request":{"method":"GET","header":[],"body":{"mode":"formdata","formdata":[]},"url":"https://www.summarizebot.com/api/extract?apiKey=YourAPIKey&url=https://edition.cnn.com/2018/02/02/sport/six-nations-england-under-20s-rugby-union/index.html","description":"<p>Extract article text and metadata from a given 
url.</p>\n","urlObject":{"protocol":"https","path":["api","extract"],"host":["www","summarizebot","com"],"query":[{"description":{"content":"<p>To use the API you will need an API key. Please, register at <a href=\"http://www.summarizebot.com/summarization_business.html\">http://www.summarizebot.com/summarization_business.html</a> to get your personal API key.</p>\n","type":"text/plain"},"key":"apiKey","value":"YourAPIKey"},{"description":{"content":"<p>Article or web page url.</p>\n","type":"text/plain"},"key":"url","value":"https://edition.cnn.com/2018/02/02/sport/six-nations-england-under-20s-rugby-union/index.html"}],"variable":[]}},"response":[{"id":"4213e154-7245-4b2d-bbba-dfc79a0a6959","name":"Extract GET result","originalRequest":{"method":"GET","header":[],"body":{"mode":"formdata","formdata":[]},"url":{"raw":"https://www.summarizebot.com/api/extract?apiKey=YourAPIKey&url=https://edition.cnn.com/2018/02/02/sport/six-nations-england-under-20s-rugby-union/index.html","protocol":"https","host":["www","summarizebot","com"],"path":["api","extract"],"query":[{"key":"apiKey","value":"YourAPIKey","description":"To use the API you will need an API key. Please, register at http://www.summarizebot.com/summarization_business.html to get your personal API key."},{"key":"url","value":"https://edition.cnn.com/2018/02/02/sport/six-nations-england-under-20s-rugby-union/index.html","description":"Article or web page url."}]},"description":"Extract article text and metadata from a given url."},"status":"OK","code":200,"_postman_previewlanguage":"json","header":[{"name":"connection","key":"connection","value":"Keep-Alive","description":"Options that are desired for the connection"},{"name":"content-length","key":"content-length","value":"17482","description":"The length of the response body in octets (8-bit bytes)"},{"name":"content-type","key":"content-type","value":"application/json","description":"The mime type of this content"},{"name":"date","key":"date","value":"Thu, 29 Mar 2018 21:10:44 GMT","description":"The date and time that the message was sent"},{"name":"keep-alive","key":"keep-alive","value":"timeout=5, max=100","description":"Custom header"},{"name":"server","key":"server","value":"Apache/2.4.7 (Ubuntu)","description":"A name for the server"}],"cookie":[],"responseTime":null,"body":"{\"text\": \"(CNN) Night has fallen on a toe-numbing English winter's day. In a manor house, where spirits of aristocrats are rumored to roam ancient hallways, are some of England's finest young athletes.\\n\\nIn a dimly lit, oak-paneled room at Bisham Abbey, 30 miles west of London, these 18 to twentysomethings have gathered for another chapter in their learning.\\n\\nA grand-looking Victorian lady, framed in gold, peers down on the assembled players and coaches. On these same dark walls hang the works of Raphael. There is no mistaking that this 13th century building has a past.\\n\\nBut despite the antiquities which surround them, this evening is about what lies ahead.\\n\\nThese young men are preparing for a life of distinction. Only the brightest of athletic talents are taught how to cope with media interrogation and social media's potential pitfalls. 
The world of the sporting elite is no longer just about the physical, the tactical and the mental.\\n\\nThe players, members of England rugby's elite 32-man Under-20 squad -- some of whom were part of the Under 20's Six Nations grand slam-winning team last year -- are being told to think about how they would like to be perceived.\\n\\nThey are told to show their personality, but to use common sense, to assume nothing is private. It is important, he says, to inspire, to promote the sport, to be engaging, to smile. How peculiar it must be as a teenager to have such a destiny.\\n\\nSuch is their age, these athletes have known little other than a life of Twitter, Snapchat and Instagram. What to do with those juvenile tweets from years gone by, published with flippant immaturity? Delete. The millionaire stars of today could heed such advice.\\n\\nThis is the third day of a week-long training camp. They have already spent hours on a muddy field, perfecting set-pieces and pre-planned plays, and though daylight vanished some hours ago they are still focusing. There is little of the restlessness associated with the rambunctiousness of youth.\\n\\nThey want to be senior internationals. Listening intently to an hour-long lesson on etiquette and media customs is a fraction of what it takes.\\n\\nEngland Rugby, the world's wealthiest national body, has invested heavily in its youth. The professionalism of the Under-20 set-up, one of the dominant forces in this age group, is the pleasing upshot.\\n\\nSince its inception in 2008, England have won the Under-20s World Cup three times -- second to New Zealand's five on the all-time list.\\n\\nNo other country has such strength in depth. Despite having a number of players on tour with the senior squad last summer, they still negotiated their way to the World Cup final, admittedly suffering a heavy defeat to the Baby Blacks.\\n\\nOther than a rudimentary board of honors, which could be easily missed in the room called the \\\"Great Hall,\\\" there is little evidence of past successes at the Abbey.\\n\\nA trophy cabinet would be spilling with silverware -- they have won the U20s Six Nations six times since its formation 10 years ago.\\n\\nThirty-one of the seniors' initial 46-man 2018 Six Nations squad are U20s graduates.\\n\\nBut this is not a place to reminisce about the young-boys-made-good. The forthcoming Six Nations, the summer's World Cup, and the hard work ahead is all consuming.\\n\\nAny youth set-up is a conveyor belt, a factory churning out talent, the pulleys in relentless motion season after season. A rugby player's early years move forward without rest.\\n\\nBut England Rugby, or those in the \\\"pathway\\\" as the country's elite player development program is referred to, are not producing clones. Gone are the days of a homogenous approach to coaching.\\n\\n\\\"We're trying to get each player to be really individual, charismatic, but it doesn't always have to be extroverted,\\\" says Robbie Anderson, the squad's psychologist.\\n\\nEngland's current crop are a more introverted bunch than their predecessors of recent seasons. There are spiritual individuals, one of whom reads the Bible on the bus to matches. This is a squad which uses coffee breaks to discuss values.\\n\\n\\\"They key thing is robustness,\\\" says Anderson. 
\\\"We don't want 23 players who fit into a certain framework.\\n\\n\\\"They need to ask themselves, 'Am I comfortable with who I am and have I got a support network around me?'\\n\\n\\\"There's people like me looking at the character, Keith Gee is looking at the development and education side, so if they hit a speed bump they're not completely and utterly lost.\\\"\\n\\nThe brilliant will reach the pinnacle, but even the good -- of which all in this squad are -- will likely earn a living in the professional game.\\n\\nBut, as Anderson warns, each player, no matter how dizzying their skill, is one injury away from ruination.\\n\\nAnother training day, another brutally cold January morning. Heads retract into shoulders like frightened tortoises, such is the air's bite. Only the active stay warm on a day such as this.\\n\\nEngland's forwards, not yet fully evolved into granite-jawed beasts, are in the gymnasium making heavy weights look featherlight. The players come in various shapes and sizes, and that, to an extent, is still rugby union's charm.\\n\\nWatching on is the squad's medical team. These are players in transition and must be treated carefully. Once kings at school age, many are now fledgling professionals, absorbing brutal hits from hardened club players.\\n\\nDr. Phil Riley explains that his team's role is not solely to mend. They are here to also prevent and educate.\\n\\n\\\"Most of them have left school, where they were important, big players, and have gone to a club environment where they're expected to turn up every day and perform every day and are used a little as cannon fodder,\\\" he says.\\n\\n\\\"They go from being less susceptible to injury, because they are the biggest players, to more susceptible.\\n\\n\\\"We work with our strength and conditioning team and get to know the players well -- their injury history, the problems they've faced in the past -- and try to develop appropriate training programs, medical programs, and rehab programs to either prevent first injury or prevent second injury.\\\"\\n\\nDuring the last five years laws have been changed, protocols enhanced, players educated and medics empowered. All of this, says Riley, has brought a \\\"sea change\\\" in coaching culture.\\n\\n\\\"We haven't got it right,\\\" admits Riley. \\\"But we're working towards identifying it, managing it and removing players when there's concern.\\\"\\n\\nWhile the forwards are in the gym, the squad's backline -- the speedsters, the creatives -- are a short stroll away in the Abbey, thinking up strategies to outwit the opposition. In an hour or so, the two groups will trade places.\\n\\nPre-planned moves are being written into books. Come Friday night against Italy in Gorizia, these theories will come as naturally to them on the field as flying is to birds. Or that is the plan.\\n\\nEach move has a name, ones which can't be printed here for fear of giving rivals an advantage. To the uninitiated, the players are talking in riddles. Incongruous nouns are thrown into the middle of sentences. Raised eyebrows can be the only reaction of those not in the know.\\n\\nThey scribble, they whisper and then they openly discuss.\\n\\nWhat is notable is that it is the players who dominate the conversation. A coach, in this instance James Ponton, poses questions, but it is the players who must find solutions. 
Leadership is a word often repeated inside the camp and here it is in practice.\\n\\n\\\"There's a lot of onus on the players to know our roles in all of the things that we are doing,\\\" says Will Butler, a center in his first professional contract with Premiership club Worcester Warriors.\\n\\nThough yet 10am, the day is already some hours in the making for the players and coaches.\\n\\nTwo hours ago they took their wellness and hydration tests (urine samples). These examinations can tell coaches and doctors much of what they need to know about each individual (the quality of their sleep, their nutritional needs, muscle soreness and hydration, for example).\\n\\nBreakfast has been devoured and \\\"unit meetings\\\" conducted -- there will be a further two team meetings before the day's end.\\n\\nSuch a detailed approach creates data. Lots of it.\\n\\nThe man absorbing this stream of statistics is strength and conditioning coach Robin Eager. There is a wealth of information to filter, he says. It is non-stop.\\n\\nA typical day will involve copious amounts of planning. He will be in the gym, getting the players stronger and more powerful, and he will discuss with the coaches how best to organize sessions. How hard should a player be pushed? Eager will have the answer.\\n\\n\\\"If the planning process is good, my communication with coaches can be 'yes, we're on plan' and that's all they need to know,\\\" he says, admitting that neither coaches or players enjoy deciphering spreadsheets.\\n\\nThe majority of his time will be spent liaising with Premiership clubs, for England only have these players at their disposal for 13 weeks a year. The responsibility of getting a player ready for international rugby rests not solely with the union.\\n\\nInternational rugby is a gladiatorial contest. Since the sport went professional in 1995, players have become bigger, stronger and faster. The forwards in the New Zealand side that won the inaugural Rugby World Cup in 1987 weighed on average 15st 9lb (99.5 kg) a man. The biggest forward at the 2015 World Cup, France's Unini Atonio, topped the scales at 22st 12 lb (145kg).\\n\\nCompared to the behemoths of today, those playing 30 years ago look like figurines. Players' frames must now withstand one bone-crunching tackle after another. Is there pressure on young players to beef up too rapidly?\\n\\n\\\"It's easier to build strong kids than it is to repair broken men,\\\" Eager says.\\n\\n\\\"They've got to be able to move competently first before we try getting them significantly stronger through lifting heavier weights. It's giving them a coat of armor so they can start to tolerate the demands as they progress.\\n\\n\\\"It's easy to get people to body build, but they won't be good rugby players. They won't be resilient to the demands of the sport.\\n\\n\\\"You have to have an element of patience, particularly with some of the second-row forwards. 
It generally takes them longer, as they tend to be the skinny, lanky kids, so there's coordination issues and, generally, there's 20kg to put on.\\n\\n\\\"If you want to accelerate it, fine, but you have to understand there are potential pitfalls.\\\"\\n\\nHe adds: \\\"Strength and conditioning would be an easy job if we said we've got to get all props to a certain weight or a certain level of fitness.\\n\\n\\\"But that's not realistic and that's what makes it interesting -- you've got to make decisions all the time based on what's going to make an individual excel in the one or two things which will potentially make him an international player.\\n\\n\\\"The flip side is, what made an international player five years ago isn't necessarily what's going to make these guys international players in five years' time. Who knows what that looks like, I definitely don't.\\\"\\n\\nIntensity and stress - what it takes to be great\\n\\nRefueled after lunch -- the squad will shovel down 40 to 50 chicken breasts a day -- the players head outside to put the morning's theory into practice. Code names are yelled and each player will scuttle into position for a choreographed move. It's a dance of sorts, a bruising ballet.\\n\\nSteve Bates, the RFU's performance manager and international performance coach, says little on the sidelines, leaving his lieutenants to bellow orders. Communicating with the coaches via ear pieces are the strength and conditioning coaches. There is a set time for each move. This is no place for ambiguity.\\n\\nAs the players go through what is called a \\\"game test\\\" session, the size of this operation is more visible than ever. There's a kit man, two analysts standing on scaffolding filming, four coaches, a media manager, a four-man medical team and two strength and conditioning coaches.\\n\\nThe attention to detail should come as no surprise.\\n\\nThe next generation are being taken out of their rugby bubble. English rugby has huge potential, and it wants to fulfill it.\\n\\nToday's high-intensity session aims to test the players' stress tolerance. It is the ones who can cope with playing under such fire, says Bates, who usually make it through the system.\\n\\n\\\"What we're looking for from players in this environment is more about their mental application, a resilience to keep bouncing back, a toughness to really fight for their position and when things are tough in training,\\\" says Bates, who joined the Under-20s set-up as head coach last August.\\n\\n\\\"There are a lot of guys who are probably physically in the ball park, there's quite a few guys who can play, but there aren't many guys who are physical and can play under this stress.\\n\\n\\\"It's our job to develop that playing under stress, with a winning element to it. Some of the best players can cope with that easily. You can spot those characteristics in the very, very good.\\n\\n\\\"But playing under stress is also something that can be developed by being put under those conditions for longer periods of time consistently.\\\"\\n\\nBates, a former London Wasps and England scrum-half, is the man credited with unearthing Jonny Wilkinson, arguably his country's greatest player.\\n\\nThe 54-year-old was a player during the beer-swigging amateur era, when international players would train with their compatriots a few days before a Test, combining rugby with full-time jobs. 
Much has changed since then, admits Bates with a wry smile.\\n\\n\\\"For me, the thing that stands out in this environment is how much the individual is the focus of attention. The difference is so marked,\\\" he says.\\n\\n\\\"All the analysis, the GPS stuff, the dietary stuff, is all athlete centered. How do we get the best out of our players and use the technology and the resources that are available to improve individuals?\\n\\n\\\"The mentality of everybody in the game is push, push, push. The inquisitiveness, the desire to push the game forward, is as high as I've ever known it.\\\"\\n\\nThey have access to Jones' ideas and methods. The Australian, says Bates, is \\\"outstanding\\\" in his attention to detail on the individual. They have also exchanged ideas with coaches from other sports, most recently Britain's boxing team.\\n\\nHis aim, says Bates, is to produce players who are adaptable, who can play \\\"for any number of coaches.\\\"\\n\\nIt is nearly time to rest completely. Another challenging day is nearing its conclusion and players and coaches are devouring plates of carefully planned carbohydrates and proteins.\\n\\nGood nutrition, of course, is essential if these players are to blossom. Only with the right fuel can they thrive under pressure and outlast the opposition.\\n\\nCentre Butler says he is now able to maintain his power, speed and strength in the final 20 minutes of a match, the period where England aim to kill off tiring opponents, thanks to a better understanding of what he needs to eat and when.\\n\\nAt times, he feels he is just endlessly consuming calories. It can be testing, Butler says, but it can also be fun.\\n\\n\\\"There are protein hits every three hours, having six meals a day,\\\" says the 19-year-old.\\n\\n\\\"The majority of these camps are quite intense so there's a lot of eating going on and the nutritionist does a very good job in keeping everyone happy.\\\"\\n\\nNutritionist Andres Kasper ensures the players' dietary needs are met, formulating detailed menus for the catering staff to serve up.\\n\\nA player's intake reads like that of a bear's before hibernation. Protein levels are prescribed on an individual basis, but equates to about 20-40g five to six times a day, dependent on body weight.\\n\\nAfter a tough training session, the squad will consume, says Kasper, in the region of 2-2.5kg of cooked pasta at lunch and dinner. There is also Greek yogurt for easy protein hits -- the squad will go through about 8kg tubs of the stuff per day.\\n\\nAs they refill and restore, the players aren't as earnest as they have been on the pitch and in meetings. They have loosened up. They chat, they joke. Reassuringly, they are not in habitual focus.\\n\\n\\\"There's that expectation that now you're an actual rugby player there's a responsibility of being an athlete and having high standards and not messing about on the training field because we don't have that much time together as a team,\\\" explains Butler.\\n\\nAfter dinner some will play table tennis in the games' room, others will receive treatment on aches and pains and a chosen few will analyze the day's training session and report back.\\n\\nThen it is to bed, to the on-site dormitories, to recover and rest before doing it again tomorrow. It is relentless. It is challenging. 
It is what it takes to become a great.\", \"article title\": \"How to build a rugby player -- Inside England's Under-20s camp\", \"meta information\": {\"meta description\": \"England's Under-20s give CNN Sport exclusive access as they prepare for the Under-20 Six Nations, a championship they have won six times in 10 years.\", \"publish date\": null, \"image\": \"https://cdn.cnn.com/cnnnext/dam/assets/180129105453-owen-farrell-super-tease.jpg\", \"authors\": [\"Aimee Lewis\"], \"meta keywords\": \"sport, Six Nations 2018: Inside England's Under-20s training camp - CNN\"}}\n"}],"_postman_id":"daaa85af-428c-4a82-afe9-1508badb348b"},{"name":"/extract","id":"7efd9ed4-8f06-4a2d-91a3-5a5bebdfb249","request":{"method":"POST","header":[{"key":"Content-Type","value":"application/octet-stream","description":"<p>The HTTP header should be specified as 'application/octet-stream'.</p>\n"}],"body":{"mode":"file","file":{}},"url":"https://www.summarizebot.com/api/extract?apiKey=YourAPIKey&filename=1.txt","description":"<p>Extract article text and metadata from binary data. POST body should include file content in binary form. The HTTP header should be specified as 'application/octet-stream'.</p>\n","urlObject":{"protocol":"https","path":["api","extract"],"host":["www","summarizebot","com"],"query":[{"description":{"content":"<p>To use the API you will need an API key. Please, register at <a href=\"http://www.summarizebot.com/summarization_business.html\">http://www.summarizebot.com/summarization_business.html</a> to get your personal API key.</p>\n","type":"text/plain"},"key":"apiKey","value":"YourAPIKey"},{"description":{"content":"<p>Name of the file, e.g. filename=1.txt.</p>\n","type":"text/plain"},"key":"filename","value":"1.txt"}],"variable":[]}},"response":[{"id":"20db3306-879d-441b-a8ce-cccf5b959a56","name":"Extract POST result","originalRequest":{"method":"POST","header":[{"key":"Content-Type","value":"application/octet-stream","description":"The HTTP header should be specified as 'application/octet-stream'."}],"body":{"mode":"file","file":{}},"url":{"raw":"https://www.summarizebot.com/api/extract?apiKey=YourAPIKey&filename=1.txt","protocol":"https","host":["www","summarizebot","com"],"path":["api","extract"],"query":[{"key":"apiKey","value":"YourAPIKey","description":"To use the API you will need an API key. Please, register at http://www.summarizebot.com/summarization_business.html to get your personal API key."},{"key":"filename","value":"1.txt","description":"Name of the file, e.g. filename=1.txt."}]},"description":"Extract article text and metadata from binary data. POST body should include file content in binary form. 
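
A sketch for GET /extract, reading the fields shown in the sample response above (same assumptions; the POST variant mirrors /summarize, sending file bytes in the body with Content-Type 'application/octet-stream'):

```python
# Sketch of GET /extract: returns extracted article text plus metadata.
import requests

resp = requests.get(
    "https://www.summarizebot.com/api/extract",
    params={
        "apiKey": "YourAPIKey",  # placeholder
        "url": "https://edition.cnn.com/2018/02/02/sport/six-nations-england-under-20s-rugby-union/index.html",
    },
)
resp.raise_for_status()
article = resp.json()
print(article["article title"])
print(article["meta information"].get("authors"))
print(article["text"][:200])  # first 200 characters of the extracted text
```
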
The HTTP header should be specified as 'application/octet-stream'."},"status":"OK","code":200,"_postman_previewlanguage":"json","header":[{"name":"connection","key":"connection","value":"Keep-Alive","description":"Options that are desired for the connection"},{"name":"content-length","key":"content-length","value":"4150","description":"The length of the response body in octets (8-bit bytes)"},{"name":"content-type","key":"content-type","value":"application/json","description":"The mime type of this content"},{"name":"date","key":"date","value":"Thu, 29 Mar 2018 21:16:16 GMT","description":"The date and time that the message was sent"},{"name":"keep-alive","key":"keep-alive","value":"timeout=5, max=100","description":"Custom header"},{"name":"server","key":"server","value":"Apache/2.4.7 (Ubuntu)","description":"A name for the server"}],"cookie":[],"responseTime":null,"body":"{\"text\": \"Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document. Technologies that can make a coherent summary take into account variables such as length, writing style and syntax.\\r\\n\\r\\nAutomatic data summarization is part of machine learning and data mining. The main idea of summarization is to find a subset of data which contains the \\\"information\\\" of the entire set. Such techniques are widely used in industry today. Search engines are an example; others include summarization of documents, image collections and videos. Document summarization tries to create a representative summary or abstract of the entire document, by finding the most informative sentences, while in image summarization the system finds the most representative and important (i.e. salient) images. For surveillance videos, one might want to extract the important events from the uneventful context.\\r\\n\\r\\nThere are two general approaches to automatic summarization: extraction and abstraction. Extractive methods work by selecting a subset of existing words, phrases, or sentences in the original text to form the summary. In contrast, abstractive methods build an internal semantic representation and then use natural language generation techniques to create a summary that is closer to what a human might express. Such a summary might include verbal innovations. Research to date has focused primarily on extractive methods, which are appropriate for image collection summarization and video summarization.\\r\\n\\r\\nThere are broadly two types of extractive summarization tasks depending on what the summarization program focuses on. The first is generic summarization, which focuses on obtaining a generic summary or abstract of the collection (whether documents, or sets of images, or videos, news stories etc.). The second is query relevant summarization, sometimes called query-based summarization, which summarizes objects specific to a query. Summarization systems are able to create both query relevant text summaries and generic machine-generated summaries depending on what the user needs.\\r\\n\\r\\nAn example of a summarization problem is document summarization, which attempts to automatically produce an abstract from a given document. Sometimes one might be interested in generating a summary from a single source document, while others can use multiple source documents (for example, a cluster of articles on the same topic). This problem is called multi-document summarization. A related application is summarizing news articles. 
Imagine a system, which automatically pulls together news articles on a given topic (from the web), and concisely represents the latest news as a summary.\\r\\n\\r\\nImage collection summarization is another application example of automatic summarization. It consists in selecting a representative set of images from a larger set of images.[1] A summary in this context is useful to show the most representative images of results in an image collection exploration system. Video summarization is a related domain, where the system automatically creates a trailer of a long video. This also has applications in consumer or personal videos, where one might want to skip the boring or repetitive actions. Similarly, in surveillance videos, one would want to extract important and suspicious activity, while ignoring all the boring and redundant frames captured.\\r\\n\\r\\nAt a very high level, summarization algorithms try to find subsets of objects (like set of sentences, or a set of images), which cover information of the entire set. This is also called the core-set. These algorithms model notions like diversity, coverage, information and representativeness of the summary. Query based summarization techniques, additionally model for relevance of the summary with the query. Some techniques and algorithms which naturally model summarization problems are TextRank and PageRank, Submodular set function, Determinantal point process, maximal marginal relevance (MMR) etc.\\r\\n\", \"article title\": \"\", \"meta information\": {}}\n"}],"_postman_id":"7efd9ed4-8f06-4a2d-91a3-5a5bebdfb249"},{"name":"/language","id":"e937d267-a4c0-4911-8cb8-7313f9609e9f","request":{"method":"GET","header":[],"body":{"mode":"formdata","formdata":[]},"url":"https://www.summarizebot.com/api/language?apiKey=YourAPIKey&url=https://en.wikipedia.org/wiki/Automatic_summarization","description":"<p>Detect text language from a given url.</p>\n","urlObject":{"protocol":"https","path":["api","language"],"host":["www","summarizebot","com"],"query":[{"description":{"content":"<p>To use the API you will need an API key. Please, register at <a href=\"http://www.summarizebot.com/summarization_business.html\">http://www.summarizebot.com/summarization_business.html</a> to get your personal API key.</p>\n","type":"text/plain"},"key":"apiKey","value":"YourAPIKey"},{"description":{"content":"<p>Article or web page url.</p>\n","type":"text/plain"},"key":"url","value":"https://en.wikipedia.org/wiki/Automatic_summarization"}],"variable":[]}},"response":[{"id":"dea479c3-0d89-407f-8fcb-dd7eb104ded6","name":"Language GET Result","originalRequest":{"method":"GET","header":[],"body":{"mode":"formdata","formdata":[]},"url":{"raw":"https://www.summarizebot.com/api/language?apiKey=YourAPIKey&url=https://en.wikipedia.org/wiki/Automatic_summarization","protocol":"https","host":["www","summarizebot","com"],"path":["api","language"],"query":[{"key":"apiKey","value":"YourAPIKey","description":"To use the API you will need an API key. 
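
A sketch for GET /language (same assumptions); the sample response is a single JSON object such as {"language": "en"}:

```python
# Sketch of GET /language: detects the text language of a web page.
import requests

resp = requests.get(
    "https://www.summarizebot.com/api/language",
    params={
        "apiKey": "YourAPIKey",  # placeholder
        "url": "https://en.wikipedia.org/wiki/Automatic_summarization",
    },
)
resp.raise_for_status()
print(resp.json()["language"])  # e.g. "en"
```
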
Please, register at http://www.summarizebot.com/summarization_business.html to get your personal API key."},{"key":"url","value":"https://en.wikipedia.org/wiki/Automatic_summarization","description":"Article or web page url."}]},"description":"Detect text language from a given url."},"status":"OK","code":200,"_postman_previewlanguage":"json","header":[{"name":"connection","key":"connection","value":"Keep-Alive","description":"Options that are desired for the connection"},{"name":"content-length","key":"content-length","value":"19","description":"The length of the response body in octets (8-bit bytes)"},{"name":"content-type","key":"content-type","value":"application/json","description":"The mime type of this content"},{"name":"date","key":"date","value":"Thu, 29 Mar 2018 21:19:20 GMT","description":"The date and time that the message was sent"},{"name":"keep-alive","key":"keep-alive","value":"timeout=5, max=100","description":"Custom header"},{"name":"server","key":"server","value":"Apache/2.4.7 (Ubuntu)","description":"A name for the server"}],"cookie":[],"responseTime":null,"body":"{\"language\": \"en\"}\n"}],"_postman_id":"e937d267-a4c0-4911-8cb8-7313f9609e9f"},{"name":"/language","id":"d2fca72f-a8e4-4a12-8dc1-12465e1e7aa3","request":{"method":"POST","header":[{"key":"Content-Type","value":"application/octet-stream","description":"<p>The HTTP header should be specified as 'application/octet-stream'.</p>\n"}],"body":{"mode":"file","file":{}},"url":"https://www.summarizebot.com/api/language?apiKey=YourAPIKey&filename=1.txt","description":"<p>Detect text language from binary data. POST body should include file content in binary form. The HTTP header should be specified as 'application/octet-stream'.</p>\n","urlObject":{"protocol":"https","path":["api","language"],"host":["www","summarizebot","com"],"query":[{"description":{"content":"<p>To use the API you will need an API key. Please, register at <a href=\"http://www.summarizebot.com/summarization_business.html\">http://www.summarizebot.com/summarization_business.html</a> to get your personal API key.</p>\n","type":"text/plain"},"key":"apiKey","value":"YourAPIKey"},{"description":{"content":"<p>Name of the file, e.g. filename=1.txt.</p>\n","type":"text/plain"},"key":"filename","value":"1.txt"}],"variable":[]}},"response":[{"id":"765f3b77-1359-48bf-b1a1-846f32861b2a","name":"Language POST result","originalRequest":{"method":"POST","header":[{"key":"Content-Type","value":"application/octet-stream","description":"The HTTP header should be specified as 'application/octet-stream'."}],"body":{"mode":"file","file":{}},"url":{"raw":"https://www.summarizebot.com/api/language?apiKey=YourAPIKey&filename=1.txt","protocol":"https","host":["www","summarizebot","com"],"path":["api","language"],"query":[{"key":"apiKey","value":"YourAPIKey","description":"To use the API you will need an API key. Please, register at http://www.summarizebot.com/summarization_business.html to get your personal API key."},{"key":"filename","value":"1.txt","description":"Name of the file, e.g. filename=1.txt."}]},"description":"Detect text language from binary data. POST body should include file content in binary form. 
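
A minimal sketch of calling the /language detection endpoint for a web page, following the GET request definition above. The apiKey value is the placeholder from the collection (register to obtain a real key); the requests library is an assumed HTTP client, not something the API mandates:

import requests

resp = requests.get(
    "https://www.summarizebot.com/api/language",
    params={
        "apiKey": "YourAPIKey",  # placeholder; use your personal API key
        "url": "https://en.wikipedia.org/wiki/Automatic_summarization",
    },
)
resp.raise_for_status()
print(resp.json()["language"])  # e.g. "en", per the sample response above
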
The HTTP header should be specified as 'application/octet-stream'."},"status":"OK","code":200,"_postman_previewlanguage":"json","header":[{"name":"connection","key":"connection","value":"Keep-Alive","description":"Options that are desired for the connection"},{"name":"content-length","key":"content-length","value":"19","description":"The length of the response body in octets (8-bit bytes)"},{"name":"content-type","key":"content-type","value":"application/json","description":"The mime type of this content"},{"name":"date","key":"date","value":"Thu, 29 Mar 2018 21:22:09 GMT","description":"The date and time that the message was sent"},{"name":"keep-alive","key":"keep-alive","value":"timeout=5, max=100","description":"Custom header"},{"name":"server","key":"server","value":"Apache/2.4.7 (Ubuntu)","description":"A name for the server"}],"cookie":[],"responseTime":null,"body":"{\"language\": \"en\"}\n"}],"_postman_id":"d2fca72f-a8e4-4a12-8dc1-12465e1e7aa3"},{"name":"/sentiment","id":"571e3f55-f917-43e7-8ff8-2a56683ebc25","request":{"method":"GET","header":[],"body":{"mode":"formdata","formdata":[]},"url":"https://www.summarizebot.com/api/sentiment?apiKey=YourAPIKey&url=https://edition.cnn.com/2018/03/30/middleeast/gaza-protests-intl/index.html","description":"<p>Analyze text for positive or negative sentiment from a given url.</p>\n","urlObject":{"protocol":"https","path":["api","sentiment"],"host":["www","summarizebot","com"],"query":[{"description":{"content":"<p>To use the API you will need an API key. Please, register at <a href=\"http://www.summarizebot.com/summarization_business.html\">http://www.summarizebot.com/summarization_business.html</a> to get your personal API key.</p>\n","type":"text/plain"},"key":"apiKey","value":"YourAPIKey"},{"description":{"content":"<p>Article or web page url.</p>\n","type":"text/plain"},"key":"url","value":"https://edition.cnn.com/2018/03/30/middleeast/gaza-protests-intl/index.html"}],"variable":[]}},"response":[{"id":"d467f2a4-62d9-4ccb-9242-9808d9e81101","name":"Sentiment GET result","originalRequest":{"method":"GET","header":[],"body":{"mode":"formdata","formdata":[]},"url":{"raw":"https://www.summarizebot.com/api/sentiment?apiKey=YourAPIKey&url=https://edition.cnn.com/2018/03/30/middleeast/gaza-protests-intl/index.html","protocol":"https","host":["www","summarizebot","com"],"path":["api","sentiment"],"query":[{"key":"apiKey","value":"YourAPIKey","description":"To use the API you will need an API key. 
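
For the POST variant, the file content goes in the request body as raw bytes with the Content-Type header set to 'application/octet-stream', and the file name is passed as the filename query parameter, as documented above. A sketch under those assumptions (the local path 1.txt is illustrative); the same upload pattern applies to the collection's other binary POST endpoints, such as /sentiment:

import requests

with open("1.txt", "rb") as f:  # illustrative local file
    data = f.read()

resp = requests.post(
    "https://www.summarizebot.com/api/language",
    params={"apiKey": "YourAPIKey", "filename": "1.txt"},
    headers={"Content-Type": "application/octet-stream"},
    data=data,  # file content in binary form, per the docs
)
resp.raise_for_status()
print(resp.json())  # {"language": "en"} in the sample response above
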
Please, register at http://www.summarizebot.com/summarization_business.html to get your personal API key."},{"key":"url","value":"https://edition.cnn.com/2018/03/30/middleeast/gaza-protests-intl/index.html","description":"Article or web page url."}]},"description":"Analyze text for positive or negative sentiment from a given url."},"status":"OK","code":200,"_postman_previewlanguage":"json","header":[{"name":"connection","key":"connection","value":"Keep-Alive","description":"Options that are desired for the connection"},{"name":"content-length","key":"content-length","value":"4315","description":"The length of the response body in octets (8-bit bytes)"},{"name":"content-type","key":"content-type","value":"application/json","description":"The mime type of this content"},{"name":"date","key":"date","value":"Sat, 31 Mar 2018 11:34:54 GMT","description":"The date and time that the message was sent"},{"name":"keep-alive","key":"keep-alive","value":"timeout=5, max=100","description":"Custom header"},{"name":"server","key":"server","value":"Apache/2.4.7 (Ubuntu)","description":"A name for the server"}],"cookie":[],"responseTime":null,"body":"{\"aspects\": [{\"polarity\": \"negative\", \"phrase\": \"protesters\", \"weight\": -0.6}, {\"polarity\": \"negative\", \"phrase\": \"protests\", \"weight\": -0.79}, {\"polarity\": \"negative\", \"phrase\": \"quickly turned\", \"weight\": -0.7}, {\"polarity\": \"negative\", \"phrase\": \"bloody\", \"weight\": -0.44}, {\"polarity\": \"positive\", \"phrase\": \"away\", \"weight\": 0.7}, {\"polarity\": \"positive\", \"phrase\": \"half\", \"weight\": 0.7}, {\"polarity\": \"positive\", \"phrase\": \"hour\", \"weight\": 0.7}, {\"polarity\": \"negative\", \"phrase\": \"small number\", \"weight\": -0.7}, {\"polarity\": \"negative\", \"phrase\": \"injured by\", \"weight\": -0.81}, {\"polarity\": \"positive\", \"phrase\": \"live\", \"weight\": 0.7}, {\"polarity\": \"positive\", \"phrase\": \"young men\", \"weight\": 0.7}, {\"polarity\": \"negative\", \"phrase\": \"injured\", \"weight\": -0.78}, {\"polarity\": \"negative\", \"phrase\": \"burning\", \"weight\": -0.69}, {\"polarity\": \"positive\", \"phrase\": \"security\", \"weight\": 0.6}, {\"polarity\": \"negative\", \"phrase\": \"riot\", \"weight\": -0.8}, {\"polarity\": \"positive\", \"phrase\": \"main\", \"weight\": 0.7}, {\"polarity\": \"negative\", \"phrase\": \"instigators\", \"weight\": -0.49}, {\"polarity\": \"positive\", \"phrase\": \"warned\", \"weight\": 0.7}, {\"polarity\": \"negative\", \"phrase\": \"breach\", \"weight\": -0.53}, {\"polarity\": \"negative\", \"phrase\": \"severely\", \"weight\": -0.7}, {\"polarity\": \"positive\", \"phrase\": \"numerous\", \"weight\": 0.7}, {\"polarity\": \"positive\", \"phrase\": \"successful\", \"weight\": 0.73}, {\"polarity\": \"negative\", \"phrase\": \"breaches\", \"weight\": -0.64}, {\"polarity\": \"positive\", \"phrase\": \"fighter\", \"weight\": 0.48}, {\"polarity\": \"negative\", \"phrase\": \"fire\", \"weight\": -0.7}, {\"polarity\": \"positive\", \"phrase\": \"exchange\", \"weight\": 0.7}, {\"polarity\": \"negative\", \"phrase\": \"wounded\", \"weight\": -0.82}, {\"polarity\": \"positive\", \"phrase\": \"largest\", \"weight\": 0.7}, {\"polarity\": \"negative\", \"phrase\": \"protest\", \"weight\": -0.78}, {\"polarity\": \"positive\", \"phrase\": \"the large\", \"weight\": 0.7}, {\"polarity\": \"positive\", \"phrase\": \"right\", \"weight\": 0.58}, {\"polarity\": \"positive\", \"phrase\": \"return\", \"weight\": 0.7}, {\"polarity\": \"negative\", \"phrase\": 
\"break\", \"weight\": -0.64}, {\"polarity\": \"negative\", \"phrase\": \"siege\", \"weight\": -0.7}, {\"polarity\": \"positive\", \"phrase\": \"possible\", \"weight\": 0.7}, {\"polarity\": \"negative\", \"phrase\": \"increased\", \"weight\": -0.7}, {\"polarity\": \"positive\", \"phrase\": \"put\", \"weight\": 0.7}, {\"polarity\": \"negative\", \"phrase\": \"killed\", \"weight\": -0.79}, {\"polarity\": \"positive\", \"phrase\": \"including\", \"weight\": 0.7}, {\"polarity\": \"negative\", \"phrase\": \"militant\", \"weight\": -0.61}, {\"polarity\": \"negative\", \"phrase\": \"blamed\", \"weight\": -0.82}, {\"polarity\": \"negative\", \"phrase\": \"added\", \"weight\": -0.7}, {\"polarity\": \"negative\", \"phrase\": \"endangering\", \"weight\": -0.71}, {\"polarity\": \"negative\", \"phrase\": \"end\", \"weight\": -0.6}, {\"polarity\": \"negative\", \"phrase\": \"injuries\", \"weight\": -0.67}, {\"polarity\": \"negative\", \"phrase\": \"suffered\", \"weight\": -0.84}, {\"polarity\": \"negative\", \"phrase\": \"expected\", \"weight\": -0.7}, {\"polarity\": \"negative\", \"phrase\": \"continue\", \"weight\": -0.49}, {\"polarity\": \"positive\", \"phrase\": \"independence\", \"weight\": 0.7}, {\"polarity\": \"negative\", \"phrase\": \"displaced\", \"weight\": -0.49}, {\"polarity\": \"positive\", \"phrase\": \"war\", \"weight\": 0.7}, {\"polarity\": \"positive\", \"phrase\": \"living\", \"weight\": 0.7}, {\"polarity\": \"negative\", \"phrase\": \"followed\", \"weight\": -0.7}, {\"polarity\": \"positive\", \"phrase\": \"increasing\", \"weight\": 0.7}, {\"polarity\": \"negative\", \"phrase\": \"clashes\", \"weight\": -0.71}, {\"polarity\": \"negative\", \"phrase\": \"controversial\", \"weight\": -0.6}, {\"polarity\": \"negative\", \"phrase\": \"slammed by\", \"weight\": -0.56}, {\"polarity\": \"negative\", \"phrase\": \"many\", \"weight\": -0.7}, {\"polarity\": \"positive\", \"phrase\": \"was viewed\", \"weight\": 0.7}, {\"polarity\": \"negative\", \"phrase\": \"erosion\", \"weight\": -0.49}, {\"polarity\": \"positive\", \"phrase\": \"hope\", \"weight\": 0.74}, {\"polarity\": \"negative\", \"phrase\": \"take\", \"weight\": -0.7}, {\"polarity\": \"positive\", \"phrase\": \"of simple\", \"weight\": 0.24}, {\"polarity\": \"negative\", \"phrase\": \"prejudge\", \"weight\": -0.49}, {\"polarity\": \"positive\", \"phrase\": \"peace\", \"weight\": 0.7}, {\"polarity\": \"positive\", \"phrase\": \"responsible\", \"weight\": 0.73}, {\"polarity\": \"negative\", \"phrase\": \"mourning\", \"weight\": -0.77}], \"sentiment\": {\"polarity\": \"negative\", \"weight\": -15.64}}\n"}],"_postman_id":"571e3f55-f917-43e7-8ff8-2a56683ebc25"},{"name":"/sentiment","id":"e028a81e-fe2d-484b-92af-13b7186569c5","request":{"method":"POST","header":[{"key":"Content-Type","value":"application/octet-stream","description":"<p>The HTTP header should be specified as 'application/octet-stream'.</p>\n"}],"body":{"mode":"file","file":{}},"url":"https://www.summarizebot.com/api/sentiment?apiKey=YourAPIKey&filename=1.txt","description":"<p>Analyze text for positive or negative sentiment from binary data. POST body should include file content in binary form. The HTTP header should be specified as 'application/octet-stream'.</p>\n","urlObject":{"protocol":"https","path":["api","sentiment"],"host":["www","summarizebot","com"],"query":[{"description":{"content":"<p>To use the API you will need an API key. 
Please, register at <a href=\"http://www.summarizebot.com/summarization_business.html\">http://www.summarizebot.com/summarization_business.html</a> to get your personal API key.</p>\n","type":"text/plain"},"key":"apiKey","value":"YourAPIKey"},{"description":{"content":"<p>Name of the file, e.g. filename=1.txt.</p>\n","type":"text/plain"},"key":"filename","value":"1.txt"}],"variable":[]}},"response":[{"id":"a71f84c5-41a9-4c92-a7c5-8c092518fa59","name":"Sentiment POST result","originalRequest":{"method":"POST","header":[{"key":"Content-Type","value":"application/octet-stream","description":"The HTTP header should be specified as 'application/octet-stream'."}],"body":{"mode":"file","file":{}},"url":{"raw":"https://www.summarizebot.com/api/sentiment?apiKey=YourAPIKey&filename=1.txt","protocol":"https","host":["www","summarizebot","com"],"path":["api","sentiment"],"query":[{"key":"apiKey","value":"YourAPIKey","description":"To use the API you will need an API key. Please, register at http://www.summarizebot.com/summarization_business.html to get your personal API key."},{"key":"filename","value":"1.txt","description":"Name of the file, e.g. filename=1.txt."}]},"description":"Analyze text for positive or negative sentiment from binary data. POST body should include file content in binary form. The HTTP header should be specified as 'application/octet-stream'."},"status":"OK","code":200,"_postman_previewlanguage":"json","header":[{"name":"connection","key":"connection","value":"Keep-Alive","description":"Options that are desired for the connection"},{"name":"content-length","key":"content-length","value":"929","description":"The length of the response body in octets (8-bit bytes)"},{"name":"content-type","key":"content-type","value":"application/json","description":"The mime type of this content"},{"name":"date","key":"date","value":"Sat, 31 Mar 2018 11:43:39 GMT","description":"The date and time that the message was sent"},{"name":"keep-alive","key":"keep-alive","value":"timeout=5, max=100","description":"Custom header"},{"name":"server","key":"server","value":"Apache/2.4.7 (Ubuntu)","description":"A name for the server"}],"cookie":[],"responseTime":null,"body":"{\"aspects\": [{\"polarity\": \"positive\", \"phrase\": \"centrally\", \"weight\": 0.7}, {\"polarity\": \"positive\", \"phrase\": \"minutes\", \"weight\": 0.7}, {\"polarity\": \"positive\", \"phrase\": \"away\", \"weight\": 0.7}, {\"polarity\": \"positive\", \"phrase\": \"central station\", \"weight\": 0.7}, {\"polarity\": \"positive\", \"phrase\": \"vibrant\", \"weight\": 0.65}, {\"polarity\": \"positive\", \"phrase\": \"shopping district\", \"weight\": 0.7}, {\"polarity\": \"positive\", \"phrase\": \"are modern\", \"weight\": 0.7}, {\"polarity\": \"positive\", \"phrase\": \"very comfortable\", \"weight\": 0.7}, {\"polarity\": \"positive\", \"phrase\": \"are lovely\", \"weight\": 0.7}, {\"polarity\": \"positive\", \"phrase\": \"accommodating\", \"weight\": 0.7}, {\"polarity\": \"positive\", \"phrase\": \"Great\", \"weight\": 0.67}, {\"polarity\": \"positive\", \"phrase\": \"stay\", \"weight\": 0.7}, {\"polarity\": \"positive\", \"phrase\": \"would definitely recommend\", \"weight\": 0.7}], \"sentiment\": {\"polarity\": \"positive\", \"weight\": 
9.72}}\n"}],"_postman_id":"e028a81e-fe2d-484b-92af-13b7186569c5"},{"name":"/comments","id":"db8d945e-1ac3-4aab-8767-7b22fead48d9","request":{"method":"GET","header":[],"body":{"mode":"formdata","formdata":[]},"url":"https://www.summarizebot.com/api/comments?apiKey=YourAPIKey&url=https://news.ycombinator.com/item?id=16719403","description":"<p>Extract comments from a given url.</p>\n","urlObject":{"protocol":"https","path":["api","comments"],"host":["www","summarizebot","com"],"query":[{"description":{"content":"<p>To use the API you will need an API key. Please, register at <a href=\"http://www.summarizebot.com/summarization_business.html\">http://www.summarizebot.com/summarization_business.html</a> to get your personal API key.</p>\n","type":"text/plain"},"key":"apiKey","value":"YourAPIKey"},{"description":{"content":"<p>Article or web page url.</p>\n","type":"text/plain"},"key":"url","value":"https://news.ycombinator.com/item?id=16719403"}],"variable":[]}},"response":[{"id":"6e86a97b-3470-4c50-9f69-ed7116e99648","name":"Comments GET result","originalRequest":{"method":"GET","header":[],"body":{"mode":"formdata","formdata":[]},"url":{"raw":"https://www.summarizebot.com/api/comments?apiKey=YourAPIKey&url=https://news.ycombinator.com/item?id=16719403","protocol":"https","host":["www","summarizebot","com"],"path":["api","comments"],"query":[{"key":"apiKey","value":"YourAPIKey","description":"To use the API you will need an API key. Please, register at http://www.summarizebot.com/summarization_business.html to get your personal API key."},{"key":"url","value":"https://news.ycombinator.com/item?id=16719403","description":"Article or web page url."}]},"description":"Extract comments from a given url."},"status":"OK","code":200,"_postman_previewlanguage":"json","header":[{"name":"connection","key":"connection","value":"Keep-Alive","description":"Options that are desired for the connection"},{"name":"content-length","key":"content-length","value":"64443","description":"The length of the response body in octets (8-bit bytes)"},{"name":"content-type","key":"content-type","value":"application/json","description":"The mime type of this content"},{"name":"date","key":"date","value":"Sat, 31 Mar 2018 11:20:31 GMT","description":"The date and time that the message was sent"},{"name":"keep-alive","key":"keep-alive","value":"timeout=5, max=100","description":"Custom header"},{"name":"server","key":"server","value":"Apache/2.4.7 (Ubuntu)","description":"A name for the server"}],"cookie":[],"responseTime":null,"body":"{\"comments\": [\"\\nSpoiler warning. Article punchline ahead.\\\"The whole point of a dimension reduction model is to mathematically represent the data in simpler form. It\\u2019s as if Cambridge Analytica took a very high-resolution photograph, resized it to be smaller, and then deleted the original. The photo still exists \\u2014 and as long as Cambridge Analytica\\u2019s models exist, the data effectively does too.\\\"That's an eloquent piece of explanation of a very important point.\\nAnd apropos the discussion about privacy legislation, it's also going to be a very interesting point. 
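
The /comments endpoint returns a flat list of comment strings under a single "comments" key, as the sample response shows. A sketch of fetching and iterating it, using the placeholder key and the Hacker News thread URL from the request definition above:

import requests

resp = requests.get(
    "https://www.summarizebot.com/api/comments",
    params={
        "apiKey": "YourAPIKey",
        "url": "https://news.ycombinator.com/item?id=16719403",
    },
)
resp.raise_for_status()

comments = resp.json()["comments"]  # list of raw comment strings
print(len(comments), "comments extracted")
for text in comments[:3]:
    print(text.strip()[:120])  # comments keep the page's original whitespace
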
Will the Cambridge Analyticas of the world be able to claim they have held on to no personal data, when strictly speaking the raw data has indeed been deleted after being used to create a derivative work that can for all important purposes be used to recreate the original?\\nAssuming I find out I'm being profiled and demand to have my data removed, will society grant me rights to have derivative forms removed or adjusted too?\\nI'm somewhat pessimistic that legal hairsplitting about matters like these will make enforcement very difficult.\\n \\nreply\\n\\n\", \"\\n> when strictly speaking the raw data has indeed been deleted after being used to create a derivative work that can for all important purposes be used to recreate the original?To be precise, you almost certainly cannot use this data to recreate anything remotely resembling the original dataset. This type of dimensionality reduction would throw away enormous volumes of data. There is no meaningful sense in which you can reconstruct the data from it.What they have done is distill some insights about people from this data. It's arguable whether they should be allowed to keep those insights, but there's no privacy risk there really.It's honestly kind of disingenuous to describe dimensionality reduction in the way that they do here. It is like reducing the resolution of a photo, but it'd best be described as reducing that resolution to say, the 20 most representative pixels. There's no real sense in which the photo still exists.\\n \\nreply\\n\\n\", \"\\nThis type being ... something like PCA?  It's up to the user how much to actually reduce the dimensionality.The pixel analogy is bad, but to use it anyway -- you get to choose how many pixels you keep.  You could keep literally all of them.\\n \\nreply\\n\\n\", \"\\nThat's only accurate in the sense that because an LSTM's hidden layer is much smaller in dimension than the data on which it is trained, there is less information in it.However, it concisely represents a manifold in a much larger dimensional space and effectively captures most of the information in it.It may be (and is) lossy, but don't underestimate the expressive power of a deep neural network.\\n \\nreply\\n\\n\", \"\\nYou're throwing out buzzwords instead of addressing the response.It's dimensionality reduction. You cannot recover the original object. It's like using a shadow to reconstruct the face of the person casting the shadow.Note this has nothing to do with the expressive power of a deep neural network. You are by definition trying to throw away noisy aspects of the data and generalize a lower dimensional manifold from a high dimensional space. If it's not lossy, it won't generalize.\\n \\nreply\\n\\n\", \"\\nYou're right that it's really just a form of dimensionality reduction. My point was just that it's a more powerful form of dimensionality reduction than PCA or NMDS.[Edit: and that the salient characteristics are likely contained in the model.]\\n \\nreply\\n\\n\", \"\\nPrecisely because it's more powerful, it doesn't encode the identifying information of the original data. Something like PCA likely would retain identifying characteristics (depending on how many low-rank vectors you drop).\\n \\nreply\\n\\n\", \"\\nOutside of the fact that they have identities for all of the people whose data they acquired, yes, it would be harder to reconstruct individual people with it than PCA because of the direct interpretability of its data.\\n \\nreply\\n\\n\", \"\\nThey claim to have deleted that data. 
If they haven't deleted the data, then of course it's still an invasion of privacy. But the ML model really has nothing to do with it.\\n \\nreply\\n\\n\", \"\\nI think the ML model has a lot to do with it in this case.\\nOne of the arguments I expect to see is that \\\"Oh, no! We removed all the data. It's gone. I mean, that was only a few hundred megabytes per person anyway, but we just calculate a few thousand numbers from it and save in our system, then delete the data. That's less data per person than is needed to show a short cute cat GIF. What harm could we possibly do with that?\\\"\\n \\nreply\\n\\n\", \"\\nIs this basically a choice between .mp3 and .ogg, png vs jpg vs gif?\\n \\nreply\\n\\n\", \"\\nIt\\u2019s kind of comparable.Regardless, I still think having the most relevant features already extracted is all they need to ask many of the questions they might want to. The point is that that\\u2019s still quite bad.\\n \\nreply\\n\\n\", \"\\nRight, I was just trying to confirm an analogy. It seems like this stuff is like a lossy codec for traits.\\n \\nreply\\n\\n\", \"\\nAlternatively, \\\"If I take a FLAC you own, make a 320kbps MP3 from it, and store it on my laptop, am I still in possession of any IP belonging to you?\\\"\\n \\nreply\\n\\n\", \"\\nI think a better analogy might be \\\"If I take a few hundred thousand MP3s and come up with a clever way to reduce each to a short representation of its genre, mood, tempo, etc that can be used to identify similar music, then throw away all the original MP3s, am I still in possession of the original music\\\". The whole point is to turn the individual data into broad, general categorisations that are easier to handle because they contain much less information. Remember, they're using this for ad targeting, and the reason they're doing it is so they can target broad groups of people rather than having to manually go through and target ads at each individual one by one.\\n \\nreply\\n\\n\", \"\\nI like that analogy. I'll make it more tenuous with - \\\"I took a copy of your album collection without your permission, ripped them to MP3, played them so much everyone is sick of them. but you've still got all the original CD's you don't even use, so no problem right?\\\"On this tangent, IP ownership for deep learning models is interesting - how to you prove (in court) someone has/hasn't copied model/stolen a training set? If you fed someone else's training/model into your system, how easy is it to prove? Will we see the equivalent of map 'trap streets' in trained CNN models?Which led me to: https://medium.com/@dtunkelang/the-end-of-intellectual-prope...\\n \\nreply\\n\\n\", \"\\nWhere do neural nets come into this? The Koscinski-Stillwell-Graepel paper talks about using the reduced-dimensionality data with logistic regression.\\n \\nreply\\n\\n\", \"\\nOnly if the \\\"true\\\" data actually lives in a lower dimensional manifold and the data acurrately can encode it with low noise. 
I doubt anyone can tell who you will vote for depending on which cat videos you liked, no matter how magic your regressor.\\n \\nreply\\n\\n\", \"\\nI do think that the most significant components of a personality will likely be targetable with a relatively low nuclear norm.And, for example, where someone's proclivity on the exploration/exploitation spectrum, if you will, (IE, how strongly do they respond to fear-based messaging) falls is probably quite predictable from a spectrum of likes.Cat pictures may be less informative, but not all of these people clicked exclusively on feline fuzzy photos.\\n \\nreply\\n\\n\", \"\\nI'm not an expert on personality so won't disagree  (except to say that I am a little sceptical of a static personality profile actually existing and I think people who always vote a certain way would be the easiest to regress and also the most useless to target). \\nAs I said in another post, it really depends on what part of your privacy you are trying to protect. It is also a mistake to think of anything on the internet as a private forum.\\n \\nreply\\n\\n\", \"\\nEspecially when the dataset probably isn't that high in entropy. Something like PCA can drop the dimensionality by significant amounts as long as the data has enough clear signals in it.\\n \\nreply\\n\\n\", \"\\n\\\" cannot use this data to recreate anything remotely resembling the original dataset.\\\"This by itself may be mostly true perhaps - and many of the comments get into ways of playing with this dataset to make it better, I don't have experience with those methods, but,what I have not seen anyone mention, if you have this dumbed down dataset, the original is gone.. you can still combine with other data sets that are either public or previously created and likely fine tune;dumbed down set + public voter records + public arrest records + previous whatever records - sort, match, what's left over.and pretty much recreate what you needed from the original, maybe not 100%, but I would guess you could get really close.\\n \\nreply\\n\\n\", \"\\n> To be precise, you almost certainly cannot use this data to recreate anything remotely resembling the original dataset. This type of dimensionality reduction would throw away enormous volumes of data. There is no meaningful sense in which you can reconstruct the data from it.First off, I think that's wrong. The idea is after all to keep the information that will result in the smallest error compared to the original on the dimensions one cares about. Withing what the model emphasizes a reconstruction can be not only \\\"remotely resembling the original dataset\\\" but as closely resembling the original dataset as is possible with the capacity of the representation.Next, I'm really not talking only about the particular method described in the post. It's definitely possible to choose to make a light enough reduction to preserve the aspects of the information one is interested in, and to optimize for recall rather than generalization.\\nA more realistic context is going to be that some information about the affected individuals is still exposed or kept (maybe in a compact derived form), which would in many cases give excellent possibilites to restore information accurately enough that claims to have the removed the data are effectively deceptive.Even for cases where the models are in good faith created only to \\\"distill some insights\\\" I'm skeptical that they really are useless for recovering individual information. 
I'm by no means an expert in differential privacy but I do listen when it comes up, and a lot of what we see from that field seems to come down to being able to trade off the relation between keeping the data useful and how many pieces of additional information (or assumptions and brute force) are needed to break the integrity protections. With surprises that tend to be on the side of 'Oops. Turns out this clever trick can recover the originals easier than we thought.'> It's honestly kind of disingenuous to describe dimensionality reduction in the way that they do here. It is like reducing the resolution of a photo, but it'd best be described as reducing that resolution to say, the 20 most representative pixels. There's no real sense in which the photo still exists.In my honest opinion the original analogy does an excellent job of intuitively explaining that most of the informative aspects of the data are kept (we can still see just fine what's in the image) while irrelevant details are discarded, and that is probably what was intended.If anything comes off as disingenuous in that context it's your representation that it's like a strong reduction in the pixel domain (where it does indeed destroy a lot of the information). What can be done is much more like running the picture through a high-performance Imagenet classifier and keeping the 20 (or 2048, or whatever's needed) most informative values at a level that corresponds strongly to semantic content of the picture, and holding on the model.\\nWe could probably generate images that people would have a hard time distinguishing from the original with that.\\n \\nreply\\n\\n\", \"\\n> distill some insights about people from this datai'd argue that insight is the bit that's important, and the bit that's the privacy risk.\\n \\nreply\\n\\n\", \"\\n> It's arguable whether they should be allowed to keep those insights, but there's no privacy risk there really.So if Google has distilled someone's emails over the years into \\\"closeted homosexual with a deeply repressed leather fetish\\\", that's not an invasion of their privacy as long as they throw away the source materials?\\n \\nreply\\n\\n\", \"\\nAs long as they retain no data which could specifically identify the original person, yes. There is nothing wrong with building segmentation models as long as they aren't specific enough to identify a specific person.My concern would be, how granular is too granular? What if we added \\\"and live in zip code 12355 and is registered Green Party\\\"? This now gets eerily specific, and might be sufficient to identify an individual.\\n \\nreply\\n\\n\", \"\\nWhy would they ever discard that? Why would there be a granularity where ML suddenly stops working? Why would you even stop at one model per person, instead of one model per mood, or modes of thought at different stress points?\\n \\nreply\\n\\n\", \"\\nIf they kept information like that, then yes that would be an invasion of privacy. But that sort of information is almost certainly not encoded in an ML model trained on 50 million people's data.\\n \\nreply\\n\\n\", \"\\nLet's say I take age and income of everyone in a city and train a regression model that predicts income from age. 
The model has slope and intercept that \\\"encode\\\" the information from all the people.It would not be possible to make inferences about the income of any particular person from the slope and intercept, so it would be ok to share those values in, say, a journal article, even though disclosing income of a particular person would not be ok.\\n \\nreply\\n\\n\", \"\\nHow do you know what CA trained on, or what's possible? Do you have qualifications in ML?\\n \\nreply\\n\\n\", \"\\nI know what they trained on because it's been reported on. They got around 50 million people's FB profiles, and a smaller subset's (300k, I think) personality test results.I use ML models every day in my work, and understand how they function. It is true that individuals information is probabilistically encoded into the parameters of the model. However, if the model is any good, the people they trained on's information is encoded only a bit more than that of the entire population.There is sort of a privacy issue in the following sense: The models they've built have learned relationships between preferences and personalities that they wouldn't otherwise have been able to learn. But these relationships are abstract. They are not tethered to any particular, identifiable individual.A reasonable argument can be made that those learned relationships are, in a sense, stolen property. And I think arguments along those lines are interesting things that we'll have to explore as this sort of thing becomes more common. But the idea that this model invades individuals privacy just isn't really true.\\n \\nreply\\n\\n\", \"\\nBut if the resulting model doesn't contain information about individuals, how does this help targeting individuals for the campaign?Edit: is it that the model is then applied to only strictly public data about the person? If so I guess the interesting question then becomes whether the model is definitely not anything near overfitting (i.e. containing enough information to match a person's public data directly since it was trained on it (amongst other data))? (I'm not an ML developer.)Edit 2: also, going with your comparison with the \\\"20 most representative pixels\\\", it seems interesting then that 'this much' (although not exactly sure how much) information can be inferred from a public profile when just also knowing enough about the whole Facebook population. OK, so perhaps a human would be able to infer about as much, but doesn't scale, and that's why the model becomes valuable?\\n \\nreply\\n\\n\", \"\\n> But if the resulting model doesn't contain information about individuals, how does this help targeting individuals for the campaign?I don't know exactly what they were modeling, but from the published reports, it sounds like they were trying to predict big 5 personality characteristics (conscientousness, neuroticism, openness, extraversion, agreeableness) from FB profile data (e.g. likes, dislikes, bio, post content, etc.). So in that case, the model would contain weights that measure the strength of relationship between characteristics like \\\"likes punk rock music\\\" and \\\"openness\\\". That description really only literally applies to a linear model - but nonlinear models are, for these purposes, the same.\\n \\nreply\\n\\n\", \"\\nIs there a reason that people are only talking about the privacy angle?People very much don't want these models to exist. 
They don't want a predictive model which will guess their affiliation just by providing unrelated Activity bread crumbs.That's why I assumed this whole issue has exploded recently.Not the privacy, but the implications.\\n \\nreply\\n\\n\", \"\\n> I know what they trained on because it's been reported on.What reason do you have to think their data set consisted of only what has been reported?How do you know anything about the models they used?\\n \\nreply\\n\\n\", \"\\nSince the source data was deleted - according to current standards and policies - their hands are probably technically clean. But there may be another angle of attack.In the US, you're not allowed to benefit directly from a crime you committed. For example, if you rob a bank, you can't buy your mother a car with the money and say \\\"sorry, it's gone!\\\" when the police come knocking.With that line of reasoning and if there was a legal, privacy, or at least a TOS breach in collecting the data, the derivative machine learning models may be tainted also. Then again, it's likely impossible to prove exactly what data went into the model, so hard to establish which models might be tainted.\\n \\nreply\\n\\n\", \"\\nThat's a ridiculous response. If they managed to infer this characteristic from emails, what they would keep is a tool which, given that set of emails again, infer the same characteristics (and theoretically a similar set of emails). They would by no means be allowed to keep the kind of information you described.What is more relevant is a model which, given characteristics such as \\\"closeted homosexual with a deeply repressed leather fetish\\\", they would be able to infer other characteristics, such as support of particular political candidates, responsiveness towards targeted political or commercial ad campaigns, etc. That's what's relevant here.\\n \\nreply\\n\\n\", \"\\nAnalogy does not work here and is misleading. You cannot do much if anything with 20 most representative pixels (if there is such a thing) but you can infer highly valuable characteristics about the person. Yes, you cannot recreate the original data but what you end up is potentially much worse (sensitive/private) than the original data.\\n \\nreply\\n\\n\", \"\\nThat's not really true, and is kind of a fundamental misunderstanding of how these things work.\\n \\nreply\\n\\n\", \"\\nUnless the data is completely random it's not crazy to say that the data can be reconstructed from a reduced version.If you have a million points that largely fall on a 3-dimensional line and you project that into 2 dimensions, you can easily recover that lost dimension with losses relative to the deviation. And that loss may not even matter depending on the kinds of data and margins of error you're working in.\\n \\nreply\\n\\n\", \"\\nThis is actually a nice illustration of the central problem with this argument: the more personally identifiable a piece of information is, the less recoverable it'll be, and vice-versa. If all of the points of data are on some n-dimensional line, then obviously all of them can easily be recovered, but knowing all those things about a person doesn't actually tell you any more about them than knowing just one of those things. 
Conversely, if the points of data are very random then it'll only require a handful of points to uniquely identify a person and find the entry in the original data set with all their other information, but dimensionality reduction will have to throw that data away - you simply won't be able to recover that information from the model. (We actually know from the literature on de-anonymization that a lot of data falls into the second category.)\\n \\nreply\\n\\n\", \"\\nExcept that that toy example bears no resemblance to the actual situation.\\n \\nreply\\n\\n\", \"\\nHow many dimensions were they working with and how much variance and correlation was there in the features? What's the margin of error for the end product?\\n \\nreply\\n\\n\", \"\\n> \\\"will society grant me rights to have derivative forms removed or adjusted too? \\\"I am in favor of no.  Imagine I build a gender classification model off public tweets, and then you later delete your twitter account and demand my model not be used because it was trained off 'your data'.I am in the camp that so long as the data isn't traceable back to you specifically, then don't put any information out there you are not OK with sticking around.\\n \\nreply\\n\\n\", \"\\nI guess the question is what are you trying to protect? The model is fundamentally lossy as it is a rank reduction method so your original data is gone (i.e. no one would be able to accuse you of liking a particular controversial post, just that you are likely to like that post). So it sort of has the differential privacy thing going on. I guess it is another question as to if such models should be built at all. I think the fidelity of the models will answer that in time, if they work really well it is scary, if they are poor models they will cease to be used. I suspect that it will be in the middle and highly sensitive to the quality of the original data and the quality of the implementation like all ml applications.\\n \\nreply\\n\\n\", \"\\nThis analogy immediately reminds me that with enough high res / low res pairs we can rebuild a high resolution image from its low res version with fairly good results. Wonder if the same could be done here.\\n \\nreply\\n\\n\", \"\\nIf I tell you I like furry porn, is there a way for me to make sure you forget that? This has lots of implications, many of them placing the \\\"blame\\\" on me for telling you this.\\n \\nreply\\n\\n\", \"\\nThat\\u2019s strangely specific....Hmmm\\n \\nreply\\n\\n\", \"\\nFinally, I was waiting for someone to talk about the model itself. It makes sense that SVD or something like it (PCA, co-occurrence, etc) would be used.But I also wonder what exactly you are going to do with the predictions. What exactly do you show to someone to make them more likely to go and vote if they are inclined to vote your way, or make them stay at home otherwise? Is there evidence that whatever you're showing actually works? Or do you try to change people's minds? What do you do?Knowing how the state of things -in this case, people's voting inclinations- is not the same as knowing what to do, ie a strategy.I don't know how effective it is, I'd like to learn more. 
But I smell the possibility that these CA type firms are simply selling snakeoil to desperate political activists.\\n \\nreply\\n\\n\", \"\\nOne example I can provide is of gun control topics.If you understand someone's mentality on the subject you can decide if they see:1) An ad with someone breaking into a home and the homeowner defending themselves with a firearm (sell insurance?)2) A grandfather and grandson on a hunting trip (hunting supplies?)3) Or maybe gun violence hotline with powerful images.The people seeing these ads are under the assumption that everyone else sees them, not that it's specifically targeted at their personality type. These affect if you think other people understand your issue or not.\\nThus affecting your motivation and attitude.If you see an ad that fits your mindset, you think you're on the majority side. This was powerful in classic media, it's just as powerful now.\\n \\nreply\\n\\n\", \"\\n> The people seeing these ads are under the assumption that everyone else sees them, not that it's specifically targeted at their personality type.How long will that be true? Do people make that assumption about search results?\\n \\nreply\\n\\n\", \"\\nOutside of the tech bubble, simply saying \\\"yes\\\" would be disingenuous. They're not even asking the question in the first place\\n \\nreply\\n\\n\", \"\\nI think retargeting has thoroughly blown up the idea that ads online are shown to everyone. My non-technical acquaintances are very aware of why certain products follow them around the internet in ads.\\n \\nreply\\n\\n\", \"\\nBut do people assume the same thing  about, say, Google search results? Promoted posts on Reddit? The ads (or natural posts) on Snapchat or Instagram?I agree that it's pretty obvious you're being retargeted when ads for camping supplies start showing up three days after you search for them on Amazon. But the practice of \\\"personalization\\\" of results and ads is far larger and deeper, to a degree that most people never seem to think about.\\n \\nreply\\n\\n\", \"\\nI think most people do assume the same with search results. How many do you think assume that the ads on the TV they see could be different than what the neighbor is seeing when watching the same channel with the same cable company?\\nI think a lot of people assume that others see the same news and the way people act you'd think they assume that others see the same things in their newsfeed/timeline / facebook thing - and wonder how others could have a different view.Even when I explain how ads can be different, I don't think people really want to believe it, or understand it, and they certainly do not realize the power of these targeting abilities..\\n \\nreply\\n\\n\", \"\\nBefore Netflix stopped showing the number of stars next to content, I used to sort of depend on it for choosing a movie. In fact, I sort of miss it now, and spend more time sifting through content undecided. That's because I am clueless about movies. I believe that there are people who are as unsure about electoral candidates (as in, at a given day, they don't favor one candidate above others) as I am about choosing movies. 
When push comes to shove (my wife's irritation quotient above threshold) in terms of making a decision, an advertisement that someone saw couple of days back can definitely assist in making a choice at the split second.\\n \\nreply\\n\\n\", \"\\nIt is reasonable to assume that a marketing message written with the profile of the targeted person in mind works better than generic message.In Facebook campaigns you can use certain things, such as user's interest, to select who sees your message.I'm not an expert on Facebook analytics, but I believe you can get pretty good stats on how your campaigns are working, how much promoted posts get shared etc.This sounds like the holy grail for advertising. You get to write your message for certain profile and get quick feedback how it worked.  Even if the system is not perfect, you would have an advantage compared to somebody else who is spending the same amount of money and not using similar targeting.Maybe their model also allowed them to find social influencers with many followers. Being able to targer these people and get them to share your message would be really good.The article compares this to the effectiveness of traditional voter targeting methods. I'm not sure what the parameters used on those are, but maybe all of them are not available on FB, justifying the need for something else.\\n \\nreply\\n\\n\", \"\\n> What exactly do you show to someone to make them more likely to go and vote if they are inclined to vote your way, or make them stay at home otherwise?Qualitatively: show things that get them angry.Quantitatively: test and control pop splits.\\n \\nreply\\n\\n\", \"\\nHow do you test anything? There's only one vote, you can't iterate.\\n \\nreply\\n\\n\", \"\\nMaybe with polling?\\n \\nreply\\n\\n\", \"\\nData is terrible, especially for polarizing candidates like Trump. People simply lie in public about not voting for him, afraid of backlash that they will receive.\\n \\nreply\\n\\n\", \"\\nThey were going to vote anyway; nothing can tell you otherwise?\\n \\nreply\\n\\n\", \"\\n> Quantitatively: test and control pop splits.How do you actually do this? Presidential elections come once every 4 years.\\n \\nreply\\n\\n\", \"\\nAnd there's a big question mark over whether lessons learned (ie parameters) from one election are valid for the next.What if all the sensitivities are dependent on the length of the candidates' hair? It seems the total hair length of the two candidates was a maximum at the last election. Another time you might be sampling more towards the middle.\\n \\nreply\\n\\n\", \"\\nShortly after the election, I read something saying that the actual ads were targeted soundbites at specific demographics likely to vote Democrat, run shortly before the election with the intention of suppressing voter turnout.\\n \\nreply\\n\\n\", \"\\nSo negative advertising aimed the core constituency of the opposition's voters speaking to their deep seated concerns about their candidate.I could imagine this working on Dem voters who are wavering on Hillary with leads like \\\"she thinks the TPP is the gold standard\\\" etc.\\n \\nreply\\n\\n\", \"\\n> I don't know how effective it is, I'd like to learn more. 
But I smell the possibility that these CA type firms are simply selling snakeoil to desperate political activists.According to the article:\\\"The accuracy he claims suggests it works about as well as established voter-targeting methods based on demographics like race, age, and gender....the digital modeling Cambridge Analytica used was hardly the virtual crystal ball a few have claimed.\\\"It's pretty clear that they were selling snakeoil.  In fact, the use of CA wasn't particularly helpful to anyone [1]...hiring them  was just a prerequisite for obtaining campaign contributions from the Mercer family, who had put up the money behind CA [2].[1] http://www.businessinsider.com/cambridge-analytica-facebook-...[2] https://twitter.com/kenvogel/status/975756418128187393\\n \\nreply\\n\\n\", \"\\n>I don't know how effective it is, I'd like to learn more.Here is an interesting Ted Talk which discusses an FB experiment that details how effective minor UI changes can be on voter turnout (13:40)https://www.ted.com/talks/zeynep_tufekci_we_re_building_a_dy...\\n \\nreply\\n\\n\", \"\\nDoor to door canvassers these days carry devices that tell you what topics to bring up and what topics not to bring up at a certain address, even distinguishing between individuals at an address; some are told to demand a husband let them talk to the wife, for example.\\n \\nreply\\n\\n\", \"\\nI don't know about the specific campaigns that you are referring to, but in my experience a lot of the information used in campaigns I've been involved in comes from previous canvassing sessions. Political parties in most countries are involved at many levels where there are elections. Canvassing doesn't just take place for the big elections.One year they will have been round and had a lengthy discussion with Mrs X, but Mr X slammed the door in their face another time. This was somewhat lower tech: the information was printed out and attached to a clipboard.Most of the time this information is correct. It's more interesting when it's really incorrect. That said, some of the best sessions I've been involved in were where there was no information.\\n \\nreply\\n\\n\", \"\\nExactly. This is the old-fashioned approach to campaign targetting that Cambridge Analytica was trying (and failing) to replace: just send a bunch of volunteers to talk to them about who they're voting for and why, then put that in your big database. One of the dirty not-so-secrets about CA is that according to the Trump campaign, they were abandoned completely in favour of that old-fashioned approach because they were worse. Similarly, if you've been paying attention, you might have noticed a few insider stories about how one of the Hillary Clinton campaign's big screw-ups was underestimating the importance of that data compared to modern big data tech and basically throwing a lot of it in the trash. This didn't get nearly as much coverage as the idea that Cambridge Analytica, Trump, and Facebook were conspiring to brainwash the population, probably because it was less juicy a narrative and kind of embarassing to the Clinton campaign and the DNC.\\n \\nreply\\n\\n\", \"\\nI am really puzzled by the Cambridge Analytica scandal.  It's not particularly savory, but is there something happening here that it wasn't basically already known about how Facebook worked?  By the protests of their own executive, the system was working as designed, and at worst Cambridge Analytica misled them about how they intended to use the data, right?  
There was no actual security breach here, as far as I can understand it.\\n \\nreply\\n\\n\", \"\\nThere doesn't have to be a security breach for it to be a very bad example of using data collected in one way for a completely different purpose. It violates the 'lawful basis for processing' part of privacy legislation.\\n \\nreply\\n\\n\", \"\\nWhich US legislation?\\n \\nreply\\n\\n\", \"\\nThat app collected data on many more than just US residents so more than just US legislation applies. This is one of those pesky little problems of doing stuff 'on the internet', especially when you start doing stuff that is purposefully or accidentally illegal.Besides that they apparently also used similar trickery in their consultancy for the Brexit side.https://www.theguardian.com/politics/2018/mar/26/pressure-gr...This is far from over.https://www.theguardian.com/commentisfree/2018/mar/23/plenty...\\n \\nreply\\n\\n\", \"\\nNo. I know at least one political consultancy that does a similar work in Spain, although I only know they work with data, use micro targeting and is run by a sociologist.\\n \\nreply\\n\\n\", \"\\nIt became a \\\"problem\\\" because it helped Trump win.\\n \\nreply\\n\\n\", \"\\nThis answer ignores some known details of the story.  Cambridge Analytica didn't just buy targeted ads from Facebook.  They used a sockpuppet to release a fake \\\"take a personality profile\\\" app, which then allowed them to gather tons of data against the Facebook terms of use.The CEO of Cambridge Analytica has also been recorded telling a (fake) potential client that they routinely blackmail people using prostitutes and who knows what else.So unless you can show that Clinton's campaign was doing the same things, your claim is a false equivalence.\\n \\nreply\\n\\n\", \"\\nThis is exactly it. At least it stops the news from droning on and on about Russia.I thought Clinton spent large amounts of money on data and the Democrats admitted the data was bad or at least that was their excuse. How much did CA pay for this data? I still find it crazy that Trump campaign spent 30% of what Hillary did and still won. The Russians used 100k$ worth of ads to sway the election. This stuff doesn't t add up.\\n \\nreply\\n\\n\", \"\\nYes, that's a good point.  Russia, Cambridge Analytica... anything that allows people to feel like the Trump phenomenon is a nefarious foreign import rather than homegrown.  I'm no fan of Trump, but I'm incredibly dismayed that all the Democrats have talked about since he was elected is \\\"Russian meddling.\\\"\\n \\nreply\\n\\n\", \"\\nDo you have any sources for your claim about how the Clinton campaign acquired FB data and how they used it? Was any of it acquired fraudulently and/or in violation of FB's ToS, like CA's data was?\\n \\nreply\\n\\n\", \"\\nDo you have any sources for your claim that the parent poster claimed the democrats purchased Facebook data?\\n \\nreply\\n\\n\", \"\\nSigh....the technical legality of obtaining the data is not the point of contention.  
Do you think that is what this is about, whether CA \\\"broke the law\\\"?\\n \\nreply\\n\\n\", \"\\nI said she spent money on data.https://www.washingtonpost.com/news/post-politics/wp/2016/11...https://www.cnn.com/2017/06/02/politics/hillary-clinton-dnc-...\\n \\nreply\\n\\n\", \"\\nIt doesn't have to add up, most people are too busy with their real lives to manually search and find reliable details (what we get from the media is not reliably unbiased or true), and then read and understand them, so they believe what they see and hear repeated over and over on the TV, radio, and newspaper: the American President is controlled by Vladimir Putin.  Even most smart people don't seem to care about actual evidence.\\n \\nreply\\n\\n\", \"\\nWho said anything about a security breach? Most of the controversy has been about the company influencing elections using data scraped from people (and their friends) unaware of what the data was being used for.\\n \\nreply\\n\\n\", \"\\nThe degree to which it influenced the election is questionable. Despite all the headlines, I haven't yet seen any convincing analysis of the impact of facebook on the election (I'm not sure how one would even go about doing so). So far it seems like it's just a convenient vehicle for people that dislike the outcome of the election to express indignation.\\n \\nreply\\n\\n\", \"\\nDon't be naivehttp://www.bbc.com/news/world-43476762\\n \\nreply\\n\\n\", \"\\nCan you point to the part of that article containing evidence of to what degree they affected the election?\\n \\nreply\\n\\n\", \"\\nThe admission by the company executive.\\n \\nreply\\n\\n\", \"\\nThat's interesting.  How would he know the degree to which he influenced the election?  Believing any claims to somehow fact rather than plain old self-promotion seems rather naive, or am I missing something?\\n \\nreply\\n\\n\", \"\\nEven if we presume those sentiments are completely sincere and disinterested, I don't know why we should believe he is an authority on US elections whose claims can simply be accepted at face value.\\n \\nreply\\n\\n\", \"\\nI would tend to agree.It's often not hard to convince people of something they want to believe.\\n \\nreply\\n\\n\", \"\\nWhen talk about CA first emerged on HN before the election some posters found the original papers referred to. They were looking at pictures in the story and zoomed in to find the titles.I cannot find those posts for the life of me again. Not suggesting anything nefarious here, I just can't find them. Does anyone have a link to those early conversations or make copies of the papers?I made copies earlier but deleted them before I put them into my papers archive.\\n \\nreply\\n\\n\", \"\\nHere are some leads:https://news.ycombinator.com/item?id=14486365https://news.ycombinator.com/item?id=14393991https://news.ycombinator.com/item?id=14330547https://news.ycombinator.com/item?id=14284502https://news.ycombinator.com/item?id=13939814And the query: https://hn.algolia.com/?query=mercer&sort=byPopularity&prefi...\\n \\nreply\\n\\n\", \"\\nIsn't the accuracy of the predictions kind of orthogonal to the fact that they were basically lying in their attempts to change behavior?Using lies to convince someone to do something is going to be more effective than using truth, if that \\\"something\\\" is not in accordance with the truth.The comparison with netflix really breaks down there. 
You're not going to be able to convince me that I liked Crash, so recommendations based off of that aren't going to be very useful to me. But if you reinforce my false belief that Obama and Soros are gonna use the deep state to invoke Sharia Law on the 2nd amendment, then that might better convince me to vote for so and so.\\n \\nreply\\n\\n\", \"\\n> has revealed that his method worked much like the one Netflix uses to recommend movies. I'm not sure this is the model you want to emulate. The suggestions are terrible and continually getting worse.\\n \\nreply\\n\\n\", \"\\nIt sure seems that way. It's almost as if Netflix is giving in to pressure from content owners (which now includes themselves) to downplay or even weaken their suggestions. First, they got rid of those wonderful ranked lists that made us love Netflix in the first place, replacing them with the much more opaque cover art carousel view. Then they started mixing lower-ranked items into the carousel. Finally, they switched from the five-star rating to the thumbs up and down buttons, which can't possibly give them as much information about your opinion.\\n \\nreply\\n\\n\", \"\\nCurrently I have zero trust in their rating system. Actually, zero is a bad number, because if I see a high ranking I think I am going to dislike the suggestion. This is opposed to me previously trusting the system a lot.\\n \\nreply\\n\\n\", \"\\nI've heard rumours that the internal backlash at Netflix against 5-star ratings happened with Amy Schumer's most recent comedy special, which had thousands of 1-star reviews on Netflix. It was one of their most expensive comedy productions, and its release came not long before the switch to the vague thumbs up/down. Note, the special was similarly panned across the press and social media as being repetitive of her previous work, with extremely predictable punchlines; what seemed good was later exposed as highly derivative of other comedians' work. But regardless of the reality/honesty of her rating, it was apparently not good for Netflix's business when they let users destroy content they produced on their own web site via user-generated content. So the suits (heavily swayed by their production studio and Hollywood) were able to convince the product team to hurt the UX for the 90% in order to protect the popularity of the 10% of content they own. The truth may be good for consumers in almost all situations. But sadly the interests of executives dealing closely with high-value B2B partners and investors tend to have a way of outweighing the interests of the average user (not to mention far outweighing those of power users). So I guess we have to rely on 3rd-party IMDb web extensions which inject into Netflix in order to get honest ratings.\\n \\nreply\\n\\n\", \"\\nBy \\\"heard rumours\\\" do you mean \\\"read about it on Breitbart\\\"? http://www.breitbart.com/big-hollywood/2017/03/18/netflix-sc... Notice that they're careful to say that they made the switch \\\"amid\\\" the special, not because of it. Also, as far as I can tell, they have no actual data on it, and they're the only \\\"newspaper\\\" reporting it.\\n \\nreply\\n\\n\", \"\\nIt's a hard problem, but Netflix's model represents the state of the art in machine learning for recommendations. Still scared of the singularity? 
:)\\n \\nreply\\n\\n\", \"\\nThere's a problem with all ML-based content recommendation, whether it's advertising, movies, whatever: the people who make the content have strong opinions about what should get shown to whom and when. If those people have negotiating leverage over your organization, your business team will compromise your recommendations to placate those people. That means that after you've built this beautiful model that minimizes whatever cost function you've chosen, there will be a bevy of business rules and content-specific score adjustments that will be overlaid. Over time, this shaggy, bad-assumption-laden system will dominate the user experience, and unless your management has a vested interest in maintaining the integrity of the recommendation system, your model will eventually be lost in a sea of human-generated noise. Not that I'm bitter about this or anything.\\n \\nreply\\n\\n\", \"\\nPersonally, I used to get substantially better results. It would suggest a movie I'd never seen before with a score above 90; I'd watch, and enjoy. Now I'm getting things like kids' shows suggested to me. I double-checked my history to make sure no one watched one on my account, but they keep popping up with a high percentage. I also really like horror and get weird suggestions [1] for similar movies. After I watched Alien it suggested the Great British Baking Show. I literally have 0 trust in the new ranking system. They push certain shows way too hard, and TV shows I'm watching show up on page 3 of \\\"continue watching\\\". This is far from my previous stance of \\\"Huh, movie I've never seen before and in a different language? It's 95%, so yeah, I'll give it a go.\\\" And before someone says anything, I do vote up and down quite frequently. But I can't help but notice that my suggestions were better when I had the more nuanced star system. [1] https://i.imgur.com/MOt3XlL.png [1.1] How I'd rate these. Cars 3: not really interested. Liked the first though. Stitches: idk, doesn't look appealing. Teeth: Classic cult film but yeah... Big Mouth: I have ZERO interest in watching this, please stop suggesting. I have downvoted this! Waterboy: I like it, but far from 95%. I'll give it like an 80.\\n \\nreply\\n\\n\", \"\\nThe coverage of the Netflix ratings format switch has always been frustrating to me, because what people are complaining about (you're not the only one; https://www.polygon.com/2017/4/7/15212718/netflix-rating-sys...) was predictable from the beginning. I do research in this area, and it's fairly well established that when you go from something like five points to two points with ratings, you throw away tons of information. There are diminishing returns with the number of points, but as you go lower you lose information. The \\\"ratings don't matter because what you want is implicit signals from people's actual behavior\\\" line is also disingenuous, because the rating behavior is a behavior that's directly tied to the stimulus in question. Not saying that indirect behavioral correlates aren't useful, only that the rating is a very powerful, direct correlate that tends to be very specific. 
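One way to make the five-point-versus-two-point claim concrete is to bound the information a single rating event can carry: at most the base-2 log of the number of scale points. The sketch below assumes, unrealistically, that users use every point of the scale equally often; it is an illustrative upper bound, not a measurement from the thread.

```python
import math

# Upper bound on bits carried by one rating event, assuming every
# point of the scale is used equally often (the best case).
bits_five_star = math.log2(5)   # ~2.32 bits per rating
bits_thumbs = math.log2(2)      # 1.00 bit per rating

print(f"five-star: {bits_five_star:.2f} bits, "
      f"thumbs: {bits_thumbs:.2f} bits, "
      f"ratio: {bits_five_star / bits_thumbs:.2f}x")
```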
Going back to the topic of the thread, sure, all those Facebook likes are going to be useful in predicting how much you like a candidate, but you're sure as hell going to get a lot of information by just asking them \\\"on a scale of 1 to 5, how much do you approve of X?\\\"\\n \\nreply\\n\\n\", \"\\nSince switching from 1-5 star ratings to like or dislike, it's gotten much worse. I honestly thought the 1-5 ratings were not useful, since I either like or don't like movies; I am not interested in nuances. But it's not working out as I expected.\\n \\nreply\\n\\n\", \"\\nReally? Because in terms of actual usefulness, I find YouTube's suggestions to be better...\\n \\nreply\\n\\n\", \"\\nSpotify is king IMO. Their 'discover weekly' playlist turns up gems every time. But that's because they dive into other users' playlists that contain the same music you play. Pretty simple. I assume YouTube does something similar. If Netflix had playlists or a 'want to watch' feature, I bet their recommendations would improve.\\n \\nreply\\n\\n\", \"\\nI tend to agree with you. The only thing I dislike about it is when I happen to open a link I'll randomly be sent, or notice in an article, that's something unlike what I normally enjoy or something I close right away; then for the next few days, or until I watch a bunch more of what I usually enjoy, the entire suggestion list is only things to do with that one random link.\\n \\nreply\\n\\n\", \"\\nYouTube does better than Netflix for me, but they still suggest stuff that I have already watched. Sometimes stuff I literally watched an hour ago. And I watched one Joe Rogan episode and it will not stop suggesting it to me, but it fails to notify me about things that I watch every episode of, like 3Blue1Brown, Robert Miles, or Rare Earth.\\n \\nreply\\n\\n\", \"\\nThere\\u2019s a big confound with the catalog: Netflix\\u2019s streaming catalog is much smaller than their DVD selection was, and it changes as licensing deals expire. No model can make up for that entirely.\\n \\nreply\\n\\n\", \"\\nI use the DVD one, and I do find the suggestions are mostly pretty reasonable.\\n \\nreply\\n\\n\", \"\\nHow hard can it be? Any film that I didn't watch all the way to the end ought to be a strong signal that I didn't enjoy it. Not that I want to watch other movies just like it.\\n \\nreply\\n\\n\", \"\\nOTOH, that you even chose to watch a movie is a sign that you are interested in the genre. Just because I abruptly stopped watching \\u201cStar Trek IX\\u201d doesn\\u2019t mean that I have abandoned the sci-fi genre or even Star Trek.\\n \\nreply\\n\\n\", \"\\nI have a friend who shares my account, since I stay with him when I am in the UK. It isn't worth the effort to have two profiles. So, he likes horror movies and will watch any horror regardless of any signal that it's going to be poor. You can look at the viewing history and see he rarely goes beyond 5 minutes of watching any of them. I watch Netflix regularly throughout the year; he watches in phases that last a week or two and then nothing at all for months at a time (for reasons that should seem obvious by now). As a non-horror-movie aficionado, I can say with certainty that only 2% of horror movies are ever worth watching and only 50% of those are any good. As a consequence, my personal viewing history includes almost no movies in this genre. My favoured genre is drama, and I normally watch all the way through. You should be able to guess by now that I should rarely be recommended horror movies but, alas, Netflix thinks otherwise. Btw. 
I also rate movies I watch - my friend doesn't.\\n \\nreply\\n\\n\", \"\\nI\\u2019m confused \\u2014 you honestly expect Netflix\\u2019s model to have figured out that your profile is actually 2 people, based on what you believe to be regular and obvious cyclical patterns? I would have to imagine that this is a relative edge case for Netflix, and there is no obvious answer for what to do with someone who mostly likes drama but, for some reason, goes on a periodic horror binge. I assume that Netflix\\u2019s model has the premise that profiles aren\\u2019t, in fact, very easy to create. I have separate profiles for my parents, as well as a test profile to see what happens when a user only seems to like the \\u201cHuman Centipede\\u201d trilogy.\\n \\nreply\\n\\n\", \"\\n2 people, but only one consistent regular user. My recommendations are heavily biased towards the occasional user - who is also giving very strong negative feedback? It doesn't bother me, since I know what I like. There are not that many good films, so I'm going to find them anyway. Someone, or a group of people, is being paid for nothing though. I don't know anyone who subscribes to Netflix because of their recommendation algorithm.\\n \\nreply\\n\\n\", \"\\n> as a test profile to see what happens when a user only seems to like the \\u201cHuman Centipede\\u201d trilogy. And? You can't just leave us hanging on that.\\n \\nreply\\n\\n\", \"\\nAfter creating the account and immediately giving a thumbs up to the trilogy, I've only occasionally logged in and feigned \\\"interest\\\" by clicking on the movies as if I'm about to watch them, or as if I enjoy re-reading the synopses. The recommendations are all normal and not noticeably feces-related. But maybe I haven't yet met the threshold for the model to consider me a particularly engaged (or real) user.\\n \\nreply\\n\\n\", \"\\nI feel like watching 20 minutes of a movie and then downvoting should be a strong indicator. This does not appear to be the case. In fact, the movie seems to be more likely to appear as the first suggestion when I do this.\\n \\nreply\\n\\n\", \"\\nYes, it's a hard problem, but no, Netflix recommendations are nowhere near state of the art. They were at the time of the Netflix Prize, but since then the field has advanced a lot, while Netflix only dumbed down their recommendation models.\\n \\nreply\\n\\n\", \"\\nAre you saying all of Netflix's suggestions are getting worse? How are you measuring that? Or are you saying your Netflix suggestions are getting worse? Mine are pretty good and have held steady for a while, at least in terms of my own preferences, though I am also probably using it a little less as I've got Prime and Hulu now as well, so there are probably fewer times I'm randomly searching through Netflix and finding nothing.\\n \\nreply\\n\\n\", \"\\nI can only tell you what my personal experience is. I laid it out more in another comment under rjurney's reply. I'll also add that I am more frequently searching for 15 minutes then switching to another service. I used to find a movie to watch in 5 minutes. I am a big movie person too, and will watch most things. But I am also more aware that if I watch one show that is just \\\"meh\\\", then I am going to be bombarded with shows of similar quality for the next few weeks. Also, there is a fairly obvious pattern that shows up from the movies in \\\"my list\\\". Those do not seem to be weighted more heavily.\\n \\nreply\\n\\n\", \"\\nVery interesting article, but I wish it went one step further. 
Why does it matter that Cambridge Analytica knew a user's Big Five or that they were an old, uneducated Republican? How was this (inferred) data used? I assume they wrote/created different ads for different sets of users... but how many segments did they have? Did their graphic designer build 500 different ads, or were text/images dynamically inserted based on these variables? How did they figure out which message would resonate with each segment? How did they test something like this, with so many potential variables? Was this knowledge used only on Facebook, or across all digital channels? Was it implemented in non-digital channels as well? I'd kill to have access to their campaign setups.\\n \\nreply\\n\\n\", \"\\nI wonder if you learned something since your blatant failure last week, where you put all that trust in Facebook's hands and praised advertisements, one day before it all blew up.\\n \\nreply\\n\\n\", \"\\nBesides the data, it's interesting how the actual targeting was performed. Does Facebook provide an option to show a particular given ad to a particular given user? Or is it possible to select a group of people with a given set of likes? How fine-grained is Facebook's audience selection mechanism for ads? Or was the targeting performed by creating fake groups and befriending people?\\n \\nreply\\n\\n\", \"\\nAm I understanding this correctly? Facebook user data (likes/profile info) was scraped to produce low-dimension feature vectors for users (similar to word2vec). These feature vectors were then run through some ML model to predict...what exactly? Targetability for effective political ads?\\n \\nreply\\n\\n\", \"\\nThey used it to predict the political affiliation of people that don't explicitly state a party preference. The two parties already have a list of registered party members (and they can see who on Facebook explicitly states their party preference); for those members the main goal is higher turnout (they are the training data). The other voters they're interested in are unregistered (e.g. independent) voters that are likely to be on their side ideologically. The core idea is very simple: they believe that if someone says they're independent, but their preferences/features (age, gender, location, likes, posts) predict a moderate or high likelihood of $PARTY affiliation, then showing this person political ads may move them from the 'maybe vote for $PARTY' category into the 'definitely vote for $PARTY' category. If you have continuous access to new Facebook data as you're serving ads, you can verify your ads are working on an individual basis by checking the 'score' for $PARTY affiliation predicted by your model before and after an ad (I want to stress that this can be done on an _individual basis_). The likely sequence of events is that they did A/B testing on different kinds of ads and found that fake inflammatory ads were most effective at achieving this goal in a very measurable way (the $PARTY score); the resulting media/political atmosphere is collateral damage (hopefully unintended). Source: I am a data scientist / machine learning scientist, and this is how I would do it and how it seems others did. I don't work on political data, but I have worked on personalized recommendations, which are similar.\\n \\nreply\\n\\n\",
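A minimal sketch of the workflow this commenter describes: train an affiliation classifier on users whose party preference is known, then read its predicted probability for one individual before and after an ad exposure. Everything here (the synthetic data, the bag-of-likes features, the choice of logistic regression) is an assumption for illustration; nothing about Cambridge Analytica's actual models is confirmed by the article.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical training set: binary page-like features for users whose
# party registration is known (the "training data" mentioned above).
X_train = rng.integers(0, 2, size=(1000, 50))
y_train = rng.integers(0, 2, size=1000)  # 1 = registered $PARTY member

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The individual "$PARTY score": predicted probability for a single
# unregistered user, checked before and after an ad is served.
likes_before = rng.integers(0, 2, size=(1, 50))
likes_after = likes_before.copy()
likes_after[0, 7] = 1  # e.g. the user liked a new page after the ad

score_before = model.predict_proba(likes_before)[0, 1]
score_after = model.predict_proba(likes_after)[0, 1]
print(f"affiliation score: {score_before:.2f} -> {score_after:.2f}")
```

With random synthetic data the scores are of course meaningless; the point is only the mechanic of re-scoring the same individual as their observed behavior changes, which is what makes per-person A/B measurement possible.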
 \"\\nDid the app really grant them continuous access to user info? I thought it was a one-off thing - they get your data at the time of use (and your friends') and that's it. Plus, the approach you outlined would require the user to like/dislike things based on the ad they saw, so CA can observe a change in the predicted affiliation (they didn't have access to posts, as far as I know). I don't think it would have that effect (even if the ad influences you, I doubt that it would make you go unlike Obama's page, for example). Not to mention that in all likelihood you shouldn't be able to verify that a particular ad was shown to a given individual. I suspect it was a simpler use case - they would group users into segments, and then craft different ad strategies for each one (maybe based on other research or just expert opinion).\\n \\nreply\\n\\n\", \"\\n> I suspect it was a simpler use case - they would group users into segments, and then craft different ad strategies for each one (maybe based on other research or just expert opinion). It is in this last step that it becomes individual-based: A/B tests are done individually, as a function of the specific strategy applied to each person.\\n \\nreply\\n\\n\", \"\\nIt seems like the purpose was narrowly tailoring messages, which is something political campaigns are really keen to do now (Obama's campaign was kind of a trailblazer here, right?).\\n \\nreply\\n\\n\", \"\\n> Obama's campaign was kind of a trailblazer here, right? There's a pretty big gap between using and abusing social media, and as far as I know Obama's campaign did not 'narrowly tailor messages'. They did target broad groups using generic messages, and they did quite effectively use social media presence to build support. But they did not - as far as I know, so please correct me if I'm wrong - go so far as to single out individuals or really small groups with the express intent of flipping their votes, or target them with disinformation in order to try to stop them from voting. And Cambridge Analytica seems to have been doing just that, if the currently available information is to be believed.\\n \\nreply\\n\\n\", \"\\nhttps://devumi.com/2017/12/social-media-case-study-how-barac... > The former president also hired Facebook co-founder Chris Hughes to help in developing his social media strategy. Obama furthered the use of Facebook for his 2012 re-election bid, utilizing it to encourage young people to cast their votes. His team developed a Facebook app that looked into supporters\\u2019 friends lists to find younger voters. The team then asked supporters to share online content with these voters. More than 600,000 supporters responded to the call, sending content to over 5 million contacts. > During his presidency, Obama continued to use Facebook to reach out to the public. In 2016, he became the first president to go live on the site, just before his final State of the Union Address.\\n \\nreply\\n\\n\", \"\\nYes, that pretty much confirms what I wrote above. Your point being? Please read the article and compare what we know about Cambridge Analytica vs what the Obama campaign did; it is like comparing snipers with someone setting off fireworks.\\n \\nreply\\n\\n\", \"\\nLook, I don't think anyone can realistically doubt that Obama's campaign was the first to effectively slice-and-dice the electorate and use social media to target them. 
You're arguing against a much more expansive claim than I'm making.\\n \\nreply\\n\\n\", \"\\nYou used the word 'narrowly', and in the context of a post about Cambridge Analytica that word has a pretty specific meaning.\\n \\nreply\\n\\n\", \"\\nNo, it was Hillary's campaign in 2008 that was big on \\\"microtargeting\\\" - i.e., moving beyond \\\"soccer moms\\\" or \\\"defense dads\\\" to \\\"soccer moms with one kid and expensive tastes\\\".\\n \\nreply\\n\\n\", \"\\n[ Deleted. Nothing I say on HN ever matters. Move along ]\\n \\nreply\\n\\n\", \"\\n\\\"What sort of lies\\\" is pretty hand-wavy when it comes to labeling training data for a model. Are you summarizing from a source? I'm interested in the technical details of what happened.\\n \\nreply\\n\\n\", \"\\nVery few \\\"undecided\\\" voters truly are; elections are won and lost by getting your supporters to go to the polls. So if you wanted to use scurrilous, fake news to help your candidate, you'd be better off sending stories that will get your supporters really fired up and eager to vote and get their friends to vote, not trying to persuade the practically nonexistent undecided demographic.\\n \\nreply\\n\\n\", \"\\nAre you able to specify the sources supporting these claims, namely that elections are not decided by \\\"undecided\\\" voters but rather by pushing your supporters to the election polls?\\n \\nreply\\n\\n\", \"\\nI thought this was common enough knowledge not to want citations, but I think you will find these satisfactory: http://www.stat.columbia.edu/~gelman/research/unpublished/sw... https://www.politico.com/magazine/story/2014/01/independent-... https://www.thenation.com/article/what-everyone-gets-wrong-a... The last piece has a short summary of the salient point: > In fact, according to an analysis of voting patterns conducted by Michigan State University political scientist Corwin Smidt, those who identify as independents today are more stable in their support for one or the other party than were \\u201cstrong partisans\\u201d back in the 1970s. According to Dan Hopkins, a professor of government at the University of Pennsylvania, \\u201cindependents who lean toward the Democrats are less likely to back GOP candidates than are weak Democrats.\\u201d > While most independents vote like partisans, on average they\\u2019re slightly more likely to just stay home in November. \\u201cTypically independents are less active and less engaged in politics than are strong partisans,\\u201d says Smidt. > [...] > The conventional wisdom holds that the parties need independents to win general elections, but the reality is that they\\u2019re increasingly devoting their resources to getting their own voters\\u2014including their \\u201ccloset partisans\\u201d\\u2014out to the polls rather than trying to sway the dwindling number of genuine swing voters. \\u201cWe\\u2019ve seen a huge increase in technology and the ability to turn out the vote,\\u201d says Smidt. \\u201cSo in terms of a cost-benefit analysis, the parties and candidates see that it\\u2019s much easier to turn out people who agree with them than it is to change someone\\u2019s mind. And then there\\u2019s also the question of how many of us are even open to changing our minds.\\u201d\\n \\nreply\\n\\n\", \"\\nThis is the standard way of analysing this kind of data, and I'd be very surprised if the Obama campaign didn't use the same or very similar methods with the Facebook data they obtained. 
The only difference is that Cambridge Analytica managed to obtain much more data over a wider demographic.\\n \\nreply\\n\\n\", \"\\narchived for future reference http://archive.is/dMIcN\\n \\nreply\\n\\n\", \"\\nI'm excited for GDPR. The hard part is going to be getting the truth out of these companies about the actual extent of the data they hold on us.\\n \\nreply\\n\\n\", \"\\nThe EU should do a union-wide ad campaign pointing out the fact that the much-maligned eurocrats have actually been working for years to fix this very difficult problem that's only now becoming apparent to the wider public. The timing couldn't be better, as GDPR goes into effect just after the Facebook/Cambridge scandal. Unfortunately, all EU institutions are terrible at marketing. If they did an ad campaign, it would probably be a TV commercial showing Jean-Claude Juncker giving a speech with subtitles in 15 languages.\\n \\nreply\\n\\n\", \"\\nThat \\\"hard part\\\" is going to be more than just hard; I think the word you're looking for there is \\\"impossible\\\". They don't have any way of knowing what data Google, Facebook, Amazon, or any other company has. As this article does a credible job of explaining, you have to understand a fair amount about statistics (PCA) and machine learning to even know it if you were looking at it, and they won't know where to look for it. They have no enforcement mechanism in mind, and they passed a law anyway, which effectively means \\\"you can't admit to having this\\\", which will mean that the more willing a company is to lie, the bigger an advantage they will have over their competitors.\\n \\nreply\\n\\n\", \"\\nSure. But at least when the whistle is actually blown (by people who \\\"just\\\" work for the company) there is a law which can be used to call to court the people who are legally accountable for the company.\\n \\nreply\\n\\n\"]}\n"}],"_postman_id":"db8d945e-1ac3-4aab-8767-7b22fead48d9"},{"name":"/comments","id":"be4fd8a2-2cfb-46b5-8e50-c40936603ae5","request":{"method":"POST","header":[{"key":"Content-Type","value":"application/octet-stream","description":"<p>The HTTP header should be specified as 'application/octet-stream'.</p>\n"}],"body":{"mode":"file","file":{}},"url":"https://www.summarizebot.com/api/comments?apiKey=YourAPIKey&filename=1.html","description":"<p>Extract comments from binary data. POST body should include file content in binary form. The HTTP header should be specified as 'application/octet-stream'.</p>\n",
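A sketch of calling this endpoint from Python, assuming the third-party requests library. The URL, query parameters, and octet-stream header follow the endpoint description above; this is an illustrative client, not an official one.

```python
import requests

API_KEY = "YourAPIKey"  # obtained by registering at summarizebot.com

with open("1.html", "rb") as f:
    resp = requests.post(
        "https://www.summarizebot.com/api/comments",
        params={"apiKey": API_KEY, "filename": "1.html"},
        headers={"Content-Type": "application/octet-stream"},
        data=f.read(),  # file content in binary form, as the POST body
    )
resp.raise_for_status()
print(resp.json())  # expected shape: {"comments": [...]}
```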
"urlObject":{"protocol":"https","path":["api","comments"],"host":["www","summarizebot","com"],"query":[{"description":{"content":"<p>To use the API you will need an API key. Please, register at <a href=\"http://www.summarizebot.com/summarization_business.html\">http://www.summarizebot.com/summarization_business.html</a> to get your personal API key.</p>\n","type":"text/plain"},"key":"apiKey","value":"YourAPIKey"},{"description":{"content":"<p>Name of the file, e.g. filename=1.txt.</p>\n","type":"text/plain"},"key":"filename","value":"1.html"}],"variable":[]}},"response":[{"id":"ef0a6575-42ec-4dc1-b818-cae23c973959","name":"Comments POST result","originalRequest":{"method":"POST","header":[{"key":"Content-Type","value":"application/octet-stream","description":"The HTTP header should be specified as 'application/octet-stream'."}],"body":{"mode":"file","file":{}},"url":{"raw":"https://www.summarizebot.com/api/comments?apiKey=YourAPIKey&filename=1.html","protocol":"https","host":["www","summarizebot","com"],"path":["api","comments"],"query":[{"key":"apiKey","value":"YourAPIKey","description":"To use the API you will need an API key. Please, register at http://www.summarizebot.com/summarization_business.html to get your personal API key."},{"key":"filename","value":"1.html","description":"Name of the file, e.g. filename=1.txt."}]},"description":"Extract comments from binary data. POST body should include file content in binary form. The HTTP header should be specified as 'application/octet-stream'."},"status":"OK","code":200,"_postman_previewlanguage":"json","header":[{"name":"connection","key":"connection","value":"Keep-Alive","description":"Options that are desired for the connection"},{"name":"content-length","key":"content-length","value":"19","description":"The length of the response body in octets (8-bit bytes)"},{"name":"content-type","key":"content-type","value":"application/json","description":"The mime type of this content"},{"name":"date","key":"date","value":"Thu, 29 Mar 2018 21:22:09 GMT","description":"The date and time that the message was sent"},{"name":"keep-alive","key":"keep-alive","value":"timeout=5, max=100","description":"Custom header"},{"name":"server","key":"server","value":"Apache/2.4.7 (Ubuntu)","description":"A name for the server"}],"cookie":[],"responseTime":null,"body":"{\r\n    \"comments\": [\r\n        \"\\nSpoiler warning. Article punchline ahead. \\\"The whole point of a dimension reduction model is to mathematically represent the data in simpler form. It’s as if Cambridge Analytica took a very high-resolution photograph, resized it to be smaller, and then deleted the original. The photo still exists — and as long as Cambridge Analytica’s models exist, the data effectively does too.\\\" That's an eloquent explanation of a very important point.\nAnd apropos the discussion about privacy legislation, it's also going to be a very interesting point. Will the Cambridge Analyticas of the world be able to claim they have held on to no personal data, when strictly speaking the raw data has indeed been deleted after being used to create a derivative work that can for all important purposes be used to recreate the original?\nAssuming I find out I'm being profiled and demand to have my data removed, will society grant me rights to have derivative forms removed or adjusted too?\nI'm somewhat pessimistic: legal hairsplitting about matters like these will make enforcement very difficult.\\n \\nreply\\n\\n\",
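The exchange that follows argues about how much of the original data survives this kind of compression. A toy way to test the intuition is PCA, one common dimension-reduction method (the article's analogy, not necessarily the actual technique used): compress, reconstruct, and measure what was lost.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 100))    # toy stand-in for a users x likes matrix

pca = PCA(n_components=5).fit(X)   # keep 5 of 100 dimensions
Z = pca.transform(X)               # the compact "derived" representation
X_hat = pca.inverse_transform(Z)   # best-effort reconstruction

err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(f"relative reconstruction error: {err:.2f}")
```

On this unstructured random matrix almost everything is lost (error near 1.0); on highly correlated real data far more survives. That dependence on the structure of the data is exactly what the disagreement below turns on.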
\r\n        \"\\n> when strictly speaking the raw data has indeed been deleted after being used to create a derivative work that can for all important purposes be used to recreate the original? To be precise, you almost certainly cannot use this data to recreate anything remotely resembling the original dataset. This type of dimensionality reduction would throw away enormous volumes of data. There is no meaningful sense in which you can reconstruct the data from it. What they have done is distill some insights about people from this data. It's arguable whether they should be allowed to keep those insights, but there's no privacy risk there really. It's honestly kind of disingenuous to describe dimensionality reduction in the way that they do here. It is like reducing the resolution of a photo, but it'd best be described as reducing that resolution to, say, the 20 most representative pixels. There's no real sense in which the photo still exists.\\n \\nreply\\n\\n\",\r\n        \"\\nThis type being ... something like PCA? It's up to the user how much to actually reduce the dimensionality. The pixel analogy is bad, but to use it anyway -- you get to choose how many pixels you keep. You could keep literally all of them.\\n \\nreply\\n\\n\",\r\n        \"\\nThat's only accurate in the sense that because an LSTM's hidden layer is much smaller in dimension than the data on which it is trained, there is less information in it. However, it concisely represents a manifold in a much larger dimensional space and effectively captures most of the information in it. It may be (and is) lossy, but don't underestimate the expressive power of a deep neural network.\\n \\nreply\\n\\n\",\r\n        \"\\nYou're throwing out buzzwords instead of addressing the response. It's dimensionality reduction. You cannot recover the original object. It's like using a shadow to reconstruct the face of the person casting the shadow. Note this has nothing to do with the expressive power of a deep neural network. You are by definition trying to throw away noisy aspects of the data and generalize a lower dimensional manifold from a high dimensional space. If it's not lossy, it won't generalize.\\n \\nreply\\n\\n\",\r\n        \"\\nYou're right that it's really just a form of dimensionality reduction. My point was just that it's a more powerful form of dimensionality reduction than PCA or NMDS. [Edit: and that the salient characteristics are likely contained in the model.]\\n \\nreply\\n\\n\",\r\n        \"\\nPrecisely because it's more powerful, it doesn't encode the identifying information of the original data. Something like PCA likely would retain identifying characteristics (depending on how many low-rank vectors you drop).\\n \\nreply\\n\\n\",\r\n        \"\\nOutside of the fact that they have identities for all of the people whose data they acquired, yes, it would be harder to reconstruct individual people with it than with PCA, because of the direct interpretability of PCA's data.\\n \\nreply\\n\\n\",\r\n        \"\\nThey claim to have deleted that data. If they haven't deleted the data, then of course it's still an invasion of privacy. But the ML model really has nothing to do with it.\\n \\nreply\\n\\n\",\r\n        \"\\nI think the ML model has a lot to do with it in this case.\nOne of the arguments I expect to see is that \\\"Oh, no! We removed all the data. It's gone. I mean, that was only a few hundred megabytes per person anyway, but we just calculate a few thousand numbers from it and save them in our system, then delete the data. That's less data per person than is needed to show a short cute cat GIF. 
What harm could we possibly do with that?\\\"\\n \\nreply\\n\\n\",\r\n        \"\\nIs this basically a choice between .mp3 and .ogg, png vs jpg vs gif?\\n \\nreply\\n\\n\",\r\n        \"\\nIt\\u2019s kind of comparable. Regardless, I still think having the most relevant features already extracted is all they need to ask many of the questions they might want to. The point is that that\\u2019s still quite bad.\\n \\nreply\\n\\n\",\r\n        \"\\nRight, I was just trying to confirm an analogy. It seems like this stuff is like a lossy codec for traits.\\n \\nreply\\n\\n\",\r\n        \"\\nAlternatively, \\\"If I take a FLAC you own, make a 320kbps MP3 from it, and store it on my laptop, am I still in possession of any IP belonging to you?\\\"\\n \\nreply\\n\\n\",\r\n        \"\\nI think a better analogy might be \\\"If I take a few hundred thousand MP3s and come up with a clever way to reduce each to a short representation of its genre, mood, tempo, etc. that can be used to identify similar music, then throw away all the original MP3s, am I still in possession of the original music?\\\" The whole point is to turn the individual data into broad, general categorisations that are easier to handle because they contain much less information. Remember, they're using this for ad targeting, and the reason they're doing it is so they can target broad groups of people rather than having to manually go through and target ads at each individual one by one.\\n \\nreply\\n\\n\",\r\n        \"\\nI like that analogy. I'll make it more tenuous with - \\\"I took a copy of your album collection without your permission, ripped them to MP3, and played them so much everyone is sick of them. But you've still got all the original CDs you don't even use, so no problem, right?\\\" On this tangent, IP ownership for deep learning models is interesting - how do you prove (in court) someone has/hasn't copied a model/stolen a training set? If you fed someone else's training set/model into your system, how easy is it to prove? Will we see the equivalent of map 'trap streets' in trained CNN models? Which led me to: https://medium.com/@dtunkelang/the-end-of-intellectual-prope...\\n \\nreply\\n\\n\",\r\n        \"\\nWhere do neural nets come into this? The Kosinski-Stillwell-Graepel paper talks about using the reduced-dimensionality data with logistic regression.\\n \\nreply\\n\\n\",\r\n        \"\\nOnly if the \\\"true\\\" data actually lives in a lower dimensional manifold and the data can accurately encode it with low noise. I doubt anyone can tell who you will vote for depending on which cat videos you liked, no matter how magic your regressor.\\n \\nreply\\n\\n\",\r\n        \"\\nI do think that the most significant components of a personality will likely be targetable with a relatively low nuclear norm. And, for example, where someone's proclivity falls on the exploration/exploitation spectrum, if you will (i.e., how strongly they respond to fear-based messaging), is probably quite predictable from a spectrum of likes. Cat pictures may be less informative, but not all of these people clicked exclusively on feline fuzzy photos.\\n \\nreply\\n\\n\",\r\n        \"\\nI'm not an expert on personality so won't disagree (except to say that I am a little sceptical of a static personality profile actually existing, and I think people who always vote a certain way would be the easiest to regress and also the most useless to target). \nAs I said in another post, it really depends on what part of your privacy you are trying to protect. 
It is also a mistake to think of anything on the internet as a private forum.\\n \\nreply\\n\\n\",\r\n        \"\\nEspecially when the dataset probably isn't that high in entropy. Something like PCA can drop the dimensionality by significant amounts as long as the data has enough clear signals in it.\\n \\nreply\\n\\n\",\r\n        \"\\n\\\"cannot use this data to recreate anything remotely resembling the original dataset.\\\" This by itself may be mostly true, perhaps - and many of the comments get into ways of playing with this dataset to make it better; I don't have experience with those methods - but what I have not seen anyone mention is this: even if you have only this dumbed-down dataset and the original is gone, you can still combine it with other data sets that are either public or previously created, and likely fine-tune: dumbed-down set + public voter records + public arrest records + previous whatever records - sort, match, see what's left over - and pretty much recreate what you needed from the original. Maybe not 100%, but I would guess you could get really close.\\n \\nreply\\n\\n\",\r\n        \"\\n> To be precise, you almost certainly cannot use this data to recreate anything remotely resembling the original dataset. This type of dimensionality reduction would throw away enormous volumes of data. There is no meaningful sense in which you can reconstruct the data from it. First off, I think that's wrong. The idea is, after all, to keep the information that will result in the smallest error compared to the original on the dimensions one cares about. Within what the model emphasizes, a reconstruction can be not only \\\"remotely resembling the original dataset\\\" but as closely resembling the original dataset as is possible with the capacity of the representation. Next, I'm really not talking only about the particular method described in the post. It's definitely possible to choose to make a light enough reduction to preserve the aspects of the information one is interested in, and to optimize for recall rather than generalization.\nA more realistic context is going to be that some information about the affected individuals is still exposed or kept (maybe in a compact derived form), which would in many cases give excellent possibilities to restore information accurately enough that claims to have removed the data are effectively deceptive. Even for cases where the models are in good faith created only to \\\"distill some insights\\\", I'm skeptical that they really are useless for recovering individual information. I'm by no means an expert in differential privacy, but I do listen when it comes up, and a lot of what we see from that field seems to come down to being able to trade off the relation between keeping the data useful and how many pieces of additional information (or assumptions and brute force) are needed to break the integrity protections. With surprises that tend to be on the side of 'Oops. Turns out this clever trick can recover the originals more easily than we thought.' > It's honestly kind of disingenuous to describe dimensionality reduction in the way that they do here. It is like reducing the resolution of a photo, but it'd best be described as reducing that resolution to, say, the 20 most representative pixels. 
There's no real sense in which the photo still exists. In my honest opinion, the original analogy does an excellent job of intuitively explaining that most of the informative aspects of the data are kept (we can still see just fine what's in the image) while irrelevant details are discarded, and that is probably what was intended. If anything comes off as disingenuous in that context, it's your representation that it's like a strong reduction in the pixel domain (where it does indeed destroy a lot of the information). What can be done is much more like running the picture through a high-performance Imagenet classifier and keeping the 20 (or 2048, or whatever's needed) most informative values at a level that corresponds strongly to the semantic content of the picture, and holding on to the model.\nWe could probably generate images that people would have a hard time distinguishing from the original with that.\\n \\nreply\\n\\n\",\r\n        \"\\n> distill some insights about people from this data. I'd argue that insight is the bit that's important, and the bit that's the privacy risk.\\n \\nreply\\n\\n\",\r\n        \"\\n> It's arguable whether they should be allowed to keep those insights, but there's no privacy risk there really. So if Google has distilled someone's emails over the years into \\\"closeted homosexual with a deeply repressed leather fetish\\\", that's not an invasion of their privacy as long as they throw away the source materials?\\n \\nreply\\n\\n\",\r\n        \"\\nAs long as they retain no data which could specifically identify the original person, yes. There is nothing wrong with building segmentation models as long as they aren't specific enough to identify a specific person. My concern would be: how granular is too granular? What if we added \\\"and lives in zip code 12355 and is registered Green Party\\\"? This now gets eerily specific, and might be sufficient to identify an individual.\\n \\nreply\\n\\n\",\r\n        \"\\nWhy would they ever discard that? Why would there be a granularity where ML suddenly stops working? Why would you even stop at one model per person, instead of one model per mood, or modes of thought at different stress points?\\n \\nreply\\n\\n\",\r\n        \"\\nIf they kept information like that, then yes, that would be an invasion of privacy. But that sort of information is almost certainly not encoded in an ML model trained on 50 million people's data.\\n \\nreply\\n\\n\",\r\n        \"\\nLet's say I take the age and income of everyone in a city and train a regression model that predicts income from age. The model has a slope and intercept that \\\"encode\\\" the information from all the people. It would not be possible to make inferences about the income of any particular person from the slope and intercept, so it would be ok to share those values in, say, a journal article, even though disclosing the income of a particular person would not be ok.\\n \\nreply\\n\\n\",
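The slope-and-intercept example above is easy to run end to end. This is a sketch with synthetic numbers, not data from the story:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "city": everyone's age and income (the sensitive raw data).
ages = rng.uniform(20, 65, size=10_000)
incomes = 1_200 * ages + rng.normal(0, 15_000, size=10_000)

# Fit income ~ age. Only two aggregate parameters survive.
slope, intercept = np.polyfit(ages, incomes, deg=1)
print(f"income ~= {slope:.0f} * age + {intercept:.0f}")
# The pair (slope, intercept) describes a population trend; no
# individual's income can be recovered from these two numbers alone.
```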
\r\n        \"\\nHow do you know what CA trained on, or what's possible? Do you have qualifications in ML?\\n \\nreply\\n\\n\",\r\n        \"\\nI know what they trained on because it's been reported on. They got around 50 million people's FB profiles, and a smaller subset's (300k, I think) personality test results. I use ML models every day in my work, and understand how they function. It is true that individuals' information is probabilistically encoded into the parameters of the model. However, if the model is any good, the information of the people they trained on is encoded only a bit more than that of the entire population. There is sort of a privacy issue in the following sense: the models they've built have learned relationships between preferences and personalities that they wouldn't otherwise have been able to learn. But these relationships are abstract. They are not tethered to any particular, identifiable individual. A reasonable argument can be made that those learned relationships are, in a sense, stolen property. And I think arguments along those lines are interesting things that we'll have to explore as this sort of thing becomes more common. But the idea that this model invades individuals' privacy just isn't really true.\\n \\nreply\\n\\n\",\r\n        \"\\nBut if the resulting model doesn't contain information about individuals, how does this help targeting individuals for the campaign? Edit: is it that the model is then applied to only strictly public data about the person? If so, I guess the interesting question then becomes whether the model is definitely not anything near overfitting (i.e. containing enough information to match a person's public data directly, since it was trained on it (amongst other data))? (I'm not an ML developer.) Edit 2: also, going with your comparison with the \\\"20 most representative pixels\\\", it seems interesting then that 'this much' (although not exactly sure how much) information can be inferred from a public profile when just also knowing enough about the whole Facebook population. OK, so perhaps a human would be able to infer about as much, but that doesn't scale, and that's why the model becomes valuable?\\n \\nreply\\n\\n\",\r\n        \"\\n> But if the resulting model doesn't contain information about individuals, how does this help targeting individuals for the campaign? I don't know exactly what they were modeling, but from the published reports, it sounds like they were trying to predict Big Five personality characteristics (conscientiousness, neuroticism, openness, extraversion, agreeableness) from FB profile data (e.g. likes, dislikes, bio, post content, etc.). So in that case, the model would contain weights that measure the strength of the relationship between characteristics like \\\"likes punk rock music\\\" and \\\"openness\\\". That description really only literally applies to a linear model - but nonlinear models are, for these purposes, the same.\\n \\nreply\\n\\n\",\r\n        \"\\nIs there a reason that people are only talking about the privacy angle? People very much don't want these models to exist. They don't want a predictive model which will guess their affiliation just from unrelated activity breadcrumbs. That's why I assume this whole issue has exploded recently. Not the privacy, but the implications.\\n \\nreply\\n\\n\",\r\n        \"\\n> I know what they trained on because it's been reported on. What reason do you have to think their data set consisted of only what has been reported? How do you know anything about the models they used?\\n \\nreply\\n\\n\",\r\n        \"\\nSince the source data was deleted - according to current standards and policies - their hands are probably technically clean. But there may be another angle of attack. In the US, you're not allowed to benefit directly from a crime you committed. 
For example, if you rob a bank, you can't buy your mother a car with the money and say \\\"sorry, it's gone!\\\" when the police come knocking. With that line of reasoning, if there was a legal, privacy, or at least a ToS breach in collecting the data, the derivative machine learning models may be tainted also. Then again, it's likely impossible to prove exactly what data went into the model, so it's hard to establish which models might be tainted.\\n \\nreply\\n\\n\",\r\n        \"\\nThat's a ridiculous response. If they managed to infer this characteristic from emails, what they would keep is a tool which, given that set of emails again (or, theoretically, a similar set of emails), would infer the same characteristics. They would by no means be allowed to keep the kind of information you described. What is more relevant is a model which, given characteristics such as \\\"closeted homosexual with a deeply repressed leather fetish\\\", would be able to infer other characteristics, such as support of particular political candidates, responsiveness towards targeted political or commercial ad campaigns, etc. That's what's relevant here.\\n \\nreply\\n\\n\",\r\n        \"\\nThe analogy does not work here and is misleading. You cannot do much, if anything, with the 20 most representative pixels (if there is such a thing), but you can infer highly valuable characteristics about the person. Yes, you cannot recreate the original data, but what you end up with is potentially much worse (more sensitive/private) than the original data.\\n \\nreply\\n\\n\",\r\n        \"\\nThat's not really true, and is kind of a fundamental misunderstanding of how these things work.\\n \\nreply\\n\\n\",\r\n        \"\\nUnless the data is completely random, it's not crazy to say that the data can be reconstructed from a reduced version. If you have a million points that largely fall on a 3-dimensional line and you project that into 2 dimensions, you can easily recover that lost dimension with losses relative to the deviation. And that loss may not even matter depending on the kinds of data and margins of error you're working in.\\n \\nreply\\n\\n\",\r\n        \"\\nThis is actually a nice illustration of the central problem with this argument: the more personally identifiable a piece of information is, the less recoverable it'll be, and vice-versa. If all of the points of data are on some n-dimensional line, then obviously all of them can easily be recovered, but knowing all those things about a person doesn't actually tell you any more about them than knowing just one of those things. Conversely, if the points of data are very random, then it'll only require a handful of points to uniquely identify a person and find the entry in the original data set with all their other information, but dimensionality reduction will have to throw that data away - you simply won't be able to recover that information from the model. (We actually know from the literature on de-anonymization that a lot of data falls into the second category.)\\n \\nreply\\n\\n\",\r\n        \"\\nExcept that that toy example bears no resemblance to the actual situation.\\n \\nreply\\n\\n\",\r\n        \"\\nHow many dimensions were they working with, and how much variance and correlation was there in the features? What's the margin of error for the end product?\\n \\nreply\\n\\n\",\r\n        \"\\n> \\\"will society grant me rights to have derivative forms removed or adjusted too?\\\" I am in favor of no. 
Imagine I build a gender classification model off public tweets, and then you later delete your Twitter account and demand my model not be used because it was trained off 'your data'. I am in the camp that, so long as the data isn't traceable back to you specifically, you shouldn't put any information out there you are not OK with sticking around.\\n \\nreply\\n\\n\",\r\n        \"\\nI guess the question is: what are you trying to protect? The model is fundamentally lossy, as it is a rank reduction method, so your original data is gone (i.e. no one would be able to accuse you of liking a particular controversial post, just that you are likely to like that post). So it sort of has the differential privacy thing going on. It is another question as to whether such models should be built at all. I think the fidelity of the models will answer that in time: if they work really well, it is scary; if they are poor models, they will cease to be used. I suspect that it will be in the middle, and highly sensitive to the quality of the original data and the quality of the implementation, like all ML applications.\\n \\nreply\\n\\n\",\r\n        \"\\nThis analogy immediately reminds me that with enough high-res / low-res pairs we can rebuild a high-resolution image from its low-res version with fairly good results. Wonder if the same could be done here.\\n \\nreply\\n\\n\",\r\n        \"\\nIf I tell you I like furry porn, is there a way for me to make sure you forget that? This has lots of implications, many of them placing the \\\"blame\\\" on me for telling you this.\\n \\nreply\\n\\n\",\r\n        \"\\nThat\\u2019s strangely specific....Hmmm\\n \\nreply\\n\\n\",\r\n        \"\\nFinally, I was waiting for someone to talk about the model itself. It makes sense that SVD or something like it (PCA, co-occurrence, etc.) would be used. But I also wonder what exactly you are going to do with the predictions. What exactly do you show to someone to make them more likely to go and vote if they are inclined to vote your way, or make them stay at home otherwise? Is there evidence that whatever you're showing actually works? Or do you try to change people's minds? What do you do? Knowing the state of things - in this case, people's voting inclinations - is not the same as knowing what to do, i.e. a strategy. I don't know how effective it is; I'd like to learn more. But I smell the possibility that these CA-type firms are simply selling snake oil to desperate political activists.\\n \\nreply\\n\\n\",\r\n        \"\\nOne example I can provide is on gun control topics. If you understand someone's mentality on the subject, you can decide if they see: 1) an ad with someone breaking into a home and the homeowner defending themselves with a firearm (sell insurance?); 2) a grandfather and grandson on a hunting trip (hunting supplies?); 3) or maybe a gun violence hotline with powerful images. The people seeing these ads are under the assumption that everyone else sees them, not that they're specifically targeted at their personality type. These ads affect whether you think other people understand your issue or not,\nthus affecting your motivation and attitude. If you see an ad that fits your mindset, you think you're on the majority side. This was powerful in classic media; it's just as powerful now.\\n \\nreply\\n\\n\",\r\n        \"\\n> The people seeing these ads are under the assumption that everyone else sees them, not that they're specifically targeted at their personality type. How long will that be true? 
Do people make that assumption about search results?\\n \\nreply\\n\\n\",\r\n        \"\\nOutside of the tech bubble, simply saying \\\"yes\\\" would be disingenuous. They're not even asking the question in the first place.\\n \\nreply\\n\\n\",\r\n        \"\\nI think retargeting has thoroughly blown up the idea that ads online are shown to everyone. My non-technical acquaintances are very aware of why certain products follow them around the internet in ads.\\n \\nreply\\n\\n\",\r\n        \"\\nBut do people assume the same thing about, say, Google search results? Promoted posts on Reddit? The ads (or natural posts) on Snapchat or Instagram? I agree that it's pretty obvious you're being retargeted when ads for camping supplies start showing up three days after you search for them on Amazon. But the practice of \\\"personalization\\\" of results and ads is far larger and deeper, to a degree that most people never seem to think about.\\n \\nreply\\n\\n\",\r\n        \"\\nI think most people do assume the same with search results. How many do you think assume that the ads they see on TV could be different than what the neighbor is seeing when watching the same channel with the same cable company?\nI think a lot of people assume that others see the same news, and from the way people act you'd think they assume that others see the same things in their newsfeed/timeline/Facebook thing - and wonder how others could have a different view. Even when I explain how ads can be different, I don't think people really want to believe it, or understand it, and they certainly do not realize the power of these targeting abilities.\\n \\nreply\\n\\n\",\r\n        \"\\nBefore Netflix stopped showing the number of stars next to content, I used to sort of depend on it for choosing a movie. In fact, I sort of miss it now, and spend more time sifting through content undecided. That's because I am clueless about movies. I believe that there are people who are as unsure about electoral candidates (as in, on a given day, they don't favor one candidate above others) as I am about choosing movies. When push comes to shove (my wife's irritation quotient above threshold) in terms of making a decision, an advertisement that someone saw a couple of days back can definitely assist in making a choice in a split second.\\n \\nreply\\n\\n\",\r\n        \"\\nIt is reasonable to assume that a marketing message written with the profile of the targeted person in mind works better than a generic message. In Facebook campaigns you can use certain things, such as users' interests, to select who sees your message. I'm not an expert on Facebook analytics, but I believe you can get pretty good stats on how your campaigns are working, how much promoted posts get shared, etc. This sounds like the holy grail for advertising. You get to write your message for a certain profile and get quick feedback on how it worked. Even if the system is not perfect, you would have an advantage compared to somebody else who is spending the same amount of money and not using similar targeting. Maybe their model also allowed them to find social influencers with many followers. Being able to target these people and get them to share your message would be really good. The article compares this to the effectiveness of traditional voter targeting methods. 
I'm not sure what the parameters used on those are, but maybe all of them are not available on FB, justifying the need for something else.\\n \\nreply\\n\\n\",\r\n        \"\\n> What exactly do you show to someone to make them more likely to go and vote if they are inclined to vote your way, or make them stay at home otherwise? Qualitatively: show things that get them angry. Quantitatively: test and control pop splits.\\n \\nreply\\n\\n\",\r\n        \"\\nHow do you test anything? There's only one vote, you can't iterate.\\n \\nreply\\n\\n\",\r\n        \"\\nMaybe with polling?\\n \\nreply\\n\\n\",\r\n        \"\\nData is terrible, especially for polarizing candidates like Trump. People simply lie in public about not voting for him, afraid of the backlash that they will receive.\\n \\nreply\\n\\n\",\r\n        \"\\nThey were going to vote anyway; nothing can tell you otherwise?\\n \\nreply\\n\\n\",\r\n        \"\\n> Quantitatively: test and control pop splits. How do you actually do this? Presidential elections come once every 4 years.\\n \\nreply\\n\\n\",\r\n        \"\\nAnd there's a big question mark over whether lessons learned (i.e. parameters) from one election are valid for the next. What if all the sensitivities are dependent on the length of the candidates' hair? It seems the total hair length of the two candidates was at a maximum at the last election. Another time you might be sampling more towards the middle.\\n \\nreply\\n\\n\",\r\n        \"\\nShortly after the election, I read something saying that the actual ads were soundbites targeted at specific demographics likely to vote Democrat, run shortly before the election with the intention of suppressing voter turnout.\\n \\nreply\\n\\n\",\r\n        \"\\nSo negative advertising aimed at the core constituency of the opposition's voters, speaking to their deep-seated concerns about their candidate. I could imagine this working on Dem voters who are wavering on Hillary with leads like \\\"she thinks the TPP is the gold standard\\\" etc.\\n \\nreply\\n\\n\",\r\n        \"\\n> I don't know how effective it is, I'd like to learn more. But I smell the possibility that these CA type firms are simply selling snake oil to desperate political activists. According to the article: \\\"The accuracy he claims suggests it works about as well as established voter-targeting methods based on demographics like race, age, and gender....the digital modeling Cambridge Analytica used was hardly the virtual crystal ball a few have claimed.\\\" It's pretty clear that they were selling snake oil.  
In fact, the use of CA wasn't particularly helpful to anyone [1]...hiring them was just a prerequisite for obtaining campaign contributions from the Mercer family, who had put up the money behind CA [2]. [1] http://www.businessinsider.com/cambridge-analytica-facebook-... [2] https://twitter.com/kenvogel/status/975756418128187393\\n \\nreply\\n\\n\",\r\n        \"\\n>I don't know how effective it is, I'd like to learn more. Here is an interesting TED Talk which discusses an FB experiment that details how effective minor UI changes can be on voter turnout (13:40): https://www.ted.com/talks/zeynep_tufekci_we_re_building_a_dy...\\n \\nreply\\n\\n\",\r\n        \"\\nDoor-to-door canvassers these days carry devices that tell you what topics to bring up and what topics not to bring up at a certain address, even distinguishing between individuals at an address; some are told to demand a husband let them talk to the wife, for example.\\n \\nreply\\n\\n\",\r\n        \"\\nI don't know about the specific campaigns that you are referring to, but in my experience a lot of the information used in campaigns I've been involved in comes from previous canvassing sessions. Political parties in most countries are involved at many levels where there are elections. Canvassing doesn't just take place for the big elections. One year they will have been round and had a lengthy discussion with Mrs X, but Mr X slammed the door in their face another time. This was somewhat lower tech: the information was printed out and attached to a clipboard. Most of the time this information is correct. It's more interesting when it's really incorrect. That said, some of the best sessions I've been involved in were where there was no information.\\n \\nreply\\n\\n\",\r\n        \"\\nExactly. This is the old-fashioned approach to campaign targeting that Cambridge Analytica was trying (and failing) to replace: just send a bunch of volunteers to talk to voters about who they're voting for and why, then put that in your big database. One of the dirty not-so-secrets about CA is that according to the Trump campaign, they were abandoned completely in favour of that old-fashioned approach because they were worse. Similarly, if you've been paying attention, you might have noticed a few insider stories about how one of the Hillary Clinton campaign's big screw-ups was underestimating the importance of that data compared to modern big data tech and basically throwing a lot of it in the trash. This didn't get nearly as much coverage as the idea that Cambridge Analytica, Trump, and Facebook were conspiring to brainwash the population, probably because it was less juicy a narrative and kind of embarrassing to the Clinton campaign and the DNC.\\n \\nreply\\n\\n\",\r\n        \"\\nI am really puzzled by the Cambridge Analytica scandal.  It's not particularly savory, but is there something happening here that wasn't basically already known about how Facebook worked?  By the protests of their own executive, the system was working as designed, and at worst Cambridge Analytica misled them about how they intended to use the data, right?  There was no actual security breach here, as far as I can understand it.\\n \\nreply\\n\\n\",\r\n        \"\\nThere doesn't have to be a security breach for it to be a very bad example of using data collected in one way for a completely different purpose. 
It violates the 'lawful basis for processing' part of privacy legislation.\\n \\nreply\\n\\n\",\r\n        \"\\nWhich US legislation?\\n \\nreply\\n\\n\",\r\n        \"\\nThat app collected data on many more than just US residents, so more than just US legislation applies. This is one of those pesky little problems of doing stuff 'on the internet', especially when you start doing stuff that is purposefully or accidentally illegal. Besides that, they apparently also used similar trickery in their consultancy for the Brexit side: https://www.theguardian.com/politics/2018/mar/26/pressure-gr... This is far from over: https://www.theguardian.com/commentisfree/2018/mar/23/plenty-...\\n \\nreply\\n\\n\",\r\n        \"\\nNo. I know at least one political consultancy that does similar work in Spain, although I only know that they work with data, use micro-targeting, and are run by a sociologist.\\n \\nreply\\n\\n\",\r\n        \"\\nIt became a \\\"problem\\\" because it helped Trump win.\\n \\nreply\\n\\n\",\r\n        \"\\nThis answer ignores some known details of the story.  Cambridge Analytica didn't just buy targeted ads from Facebook.  They used a sockpuppet to release a fake \\\"take a personality profile\\\" app, which then allowed them to gather tons of data against the Facebook terms of use. The CEO of Cambridge Analytica has also been recorded telling a (fake) potential client that they routinely blackmail people using prostitutes and who knows what else. So unless you can show that Clinton's campaign was doing the same things, your claim is a false equivalence.\\n \\nreply\\n\\n\",\r\n        \"\\nThis is exactly it. At least it stops the news from droning on and on about Russia. I thought Clinton spent large amounts of money on data and the Democrats admitted the data was bad or at least that was their excuse. How much did CA pay for this data? I still find it crazy that the Trump campaign spent 30% of what Hillary did and still won. The Russians used $100k worth of ads to sway the election. This stuff doesn't add up.\\n \\nreply\\n\\n\",\r\n        \"\\nYes, that's a good point.  Russia, Cambridge Analytica... anything that allows people to feel like the Trump phenomenon is a nefarious foreign import rather than homegrown.  I'm no fan of Trump, but I'm incredibly dismayed that all the Democrats have talked about since he was elected is \\\"Russian meddling.\\\"\\n \\nreply\\n\\n\",\r\n        \"\\nDo you have any sources for your claim about how the Clinton campaign acquired FB data and how they used it? Was any of it acquired fraudulently and/or in violation of FB's ToS, like CA's data was?\\n \\nreply\\n\\n\",\r\n        \"\\nDo you have any sources for your claim that the parent poster claimed the Democrats purchased Facebook data?\\n \\nreply\\n\\n\",\r\n        \"\\nSigh... the technical legality of obtaining the data is not the point of contention.  
Do you think that is what this is about, whether CA \\\"broke the law\\\"?\\n \\nreply\\n\\n\",\r\n        \"\\nI said she spent money on data. https://www.washingtonpost.com/news/post-politics/wp/2016/11... https://www.cnn.com/2017/06/02/politics/hillary-clinton-dnc-...\\n \\nreply\\n\\n\",\r\n        \"\\nIt doesn't have to add up, most people are too busy with their real lives to manually search and find reliable details (what we get from the media is not reliably unbiased or true), and then read and understand them, so they believe what they see and hear repeated over and over on the TV, radio, and newspaper: the American President is controlled by Vladimir Putin.  Even most smart people don't seem to care about actual evidence.\\n \\nreply\\n\\n\",\r\n        \"\\nWho said anything about a security breach? Most of the controversy has been about the company influencing elections using data scraped from people (and their friends) unaware of what the data was being used for.\\n \\nreply\\n\\n\",\r\n        \"\\nThe degree to which it influenced the election is questionable. Despite all the headlines, I haven't yet seen any convincing analysis of the impact of Facebook on the election (I'm not sure how one would even go about doing so). So far it seems like it's just a convenient vehicle for people that dislike the outcome of the election to express indignation.\\n \\nreply\\n\\n\",\r\n        \"\\nDon't be naive. http://www.bbc.com/news/world-43476762\\n \\nreply\\n\\n\",\r\n        \"\\nCan you point to the part of that article containing evidence of the degree to which they affected the election?\\n \\nreply\\n\\n\",\r\n        \"\\nThe admission by the company executive.\\n \\nreply\\n\\n\",\r\n        \"\\nThat's interesting.  How would he know the degree to which he influenced the election?  Believing any claims to somehow be fact rather than plain old self-promotion seems rather naive, or am I missing something?\\n \\nreply\\n\\n\",\r\n        \"\\nEven if we presume those sentiments are completely sincere and disinterested, I don't know why we should believe he is an authority on US elections whose claims can simply be accepted at face value.\\n \\nreply\\n\\n\",\r\n        \"\\nI would tend to agree. It's often not hard to convince people of something they want to believe.\\n \\nreply\\n\\n\",\r\n        \"\\nWhen talk about CA first emerged on HN before the election, some posters found the original papers referred to. They were looking at pictures in the story and zoomed in to find the titles. I cannot find those posts for the life of me again. Not suggesting anything nefarious here, I just can't find them. 
Does anyone have a link to those early conversations, or copies of the papers? I made copies earlier but deleted them before I put them into my papers archive.\\n \\nreply\\n\\n\",\r\n        \"\\nHere are some leads: https://news.ycombinator.com/item?id=14486365 https://news.ycombinator.com/item?id=14393991 https://news.ycombinator.com/item?id=14330547 https://news.ycombinator.com/item?id=14284502 https://news.ycombinator.com/item?id=13939814 And the query: https://hn.algolia.com/?query=mercer&sort=byPopularity&prefi...\\n \\nreply\\n\\n\",\r\n        \"\\nIsn't the accuracy of the predictions kind of orthogonal to the fact that they were basically lying in their attempts to change behavior? Using lies to convince someone to do something is going to be more effective than using the truth, if that \\\"something\\\" is not in accordance with the truth. The comparison with Netflix really breaks down there. You're not going to be able to convince me that I liked Crash, so recommendations based off of that aren't going to be very useful to me. But if you reinforce my false belief that Obama and Soros are gonna use the deep state to invoke Sharia Law on the 2nd Amendment, then that might better convince me to vote for so and so.\\n \\nreply\\n\\n\",\r\n        \"\\n> has revealed that his method worked much like the one Netflix uses to recommend movies. I'm not sure this is the model you want to emulate. The suggestions are terrible and continually getting worse.\\n \\nreply\\n\\n\",\r\n        \"\\nIt sure seems that way. It's almost as if Netflix is giving in to pressure from content owners (which now includes themselves) to downplay or even weaken their suggestions. First, they got rid of those wonderful ranked lists that made us love Netflix in the first place, replacing them with the much more opaque cover art carousel view. Then they started mixing lower-ranked items into the carousel. Finally, they switched from the five-star rating to the thumbs up and down buttons, which can't possibly give them as much information about your opinion.\\n \\nreply\\n\\n\",\r\n        \"\\nCurrently I have zero trust in their rating system. Actually, zero is a bad number because if I see a high ranking I think I am going to dislike the suggestion. This is opposed to me previously trusting the system a lot.\\n \\nreply\\n\\n\",\r\n        \"\\nI've heard rumours that the internal backlash at Netflix against 5-star ratings happened with Amy Schumer's most recent comedy special, which had thousands of 1-star reviews on Netflix. It was one of their most expensive comedy productions, and its release came not long before the switch over to the vague thumbs up/down. Note, the comedy special was similarly panned across the press and social media as being repetitive of her previous work, having extremely predictable punchlines, and, where seemingly good, later exposed as highly derivative of other comedians' work. But regardless of the reality/honesty of her rating, it was apparently not good for Netflix's business when they let users destroy content they produced on their own web site via user-generated content. So the suits (heavily swayed by their production studio and Hollywood) were able to convince the product team to hurt the UX for the 90% in order to protect the popularity of the 10% of content they own. The truth may be good for consumers in almost all situations. 
But sadly the interests of executives dealing closely with high-value B2B partners and investors tend to have a way of out-valuing the interests of the average user (not to mention far out-valuing power users). So I guess we have to rely on 3rd-party IMDb web extensions which inject into Netflix in order to get honest ratings.\\n \\nreply\\n\\n\",\r\n        \"\\nBy \\\"heard rumours\\\" do you mean \\\"read about it on Breitbart\\\"? http://www.breitbart.com/big-hollywood/2017/03/18/netflix-sc... Notice that they're careful to say that they made the switch \\\"amid\\\" the special, not because of it. Also as far as I can tell, they have no actual data on the fact, and they're the only \\\"newspaper\\\" reporting it.\\n \\nreply\\n\\n\",\r\n        \"\\nIt's a hard problem but Netflix's model represents the state of the art in machine learning for recommendations. Still scared of the singularity? :)\\n \\nreply\\n\\n\",\r\n        \"\\nThere's a problem with all ML-based content recommendation, whether it's advertising, movies, whatever: the people who make the content have strong opinions about what should get shown to whom and when. If those people have negotiating leverage over your organization, your business team will compromise your recommendations to placate those people. That means that after you've built this beautiful model that minimizes whatever cost function you've chosen, there will be a bevy of business rules and content-specific score adjustments that will be overlaid. Over time, this shaggy, bad-assumption-laden system will dominate the user experience, and unless your management has a vested interest in maintaining the integrity of the recommendation system, your model will eventually be lost in a sea of human-generated noise. Not that I'm bitter about this or anything.\\n \\nreply\\n\\n\",\r\n        \"\\nPersonally, I used to get substantially better results. It would suggest to me a movie I'd never seen before with a score above 90, I'd watch, and enjoy. Now I'm getting things like kids shows suggested to me. I double-checked my history to make sure no one watched one on my account, but they keep popping up with a high percentage. I also really like horror and get weird suggestions [1] for similar movies. After I watched Alien it suggested The Great British Baking Show. I literally have 0 trust in the new ranking system. They push certain shows way too hard and TV shows I'm watching show up on page 3 of \\\"continue watching\\\". This is far from my previous stance of \\\"Huh, movie I've never seen before and in a different language? It's 95% so yeah, I'll give it a go.\\\" And before someone says anything, I do vote up and down quite frequently. But I can't help but notice that my suggestions were better when I had the more nuanced star system. [1] https://i.imgur.com/MOt3XlL.png [1.1] How I'd rate these. Cars 3: not really interested. Liked the first though. Stitches: idk, doesn't look appealing. Teeth: Classic cult film but yeah... Big Mouth: I have ZERO interest in watching this, please stop suggesting. I have downvoted this! Waterboy: I like it, but far from 95%. I'll give it like an 80.\\n \\nreply\\n\\n\",\r\n        \"\\nThe coverage of the Netflix ratings format switch has always been frustrating to me because what people are complaining about (you're not the only one; https://www.polygon.com/2017/4/7/15212718/netflix-rating-sys...) 
was predictable from the beginning. I do research in this area and it's fairly well established that when you go from something like five points to two points with ratings, you throw away tons of information. There are diminishing returns with numbers of points, but as you go lower you lose information. The \\\"ratings don't matter because what you want is implicit signals from people's actual behavior\\\" argument is also disingenuous because the rating behavior is a behavior that's directly tied to the stimulus in question. Not saying that indirect behavioral correlates aren't useful, only that the rating is a very powerful, direct correlate that tends to be very specific. Going back to the topic of the thread, sure, all those Facebook likes are going to be useful in predicting how much you like a candidate, but you're sure as hell going to get a lot of information by just asking them \\\"on a scale of 1 to 5, how much do you approve of X?\\\"\\n \\nreply\\n\\n\",\r\n        \"\\nSince switching from 1-5 star ratings to like or dislike it's gotten much worse. I honestly thought the 1-5 ratings were not useful since I either like or don't like movies, I am not interested in nuances. But, it's not working out as I expected.\\n \\nreply\\n\\n\",\r\n        \"\\nReally? Because in terms of actual usefulness, I find YouTube's suggestions to be better...\\n \\nreply\\n\\n\",\r\n        \"\\nSpotify is king IMO. Their 'discover weekly' playlist turns up gems every time. But that's because they dive into other users' playlists that contain the same music you play. Pretty simple. I assume YouTube does something similar. If Netflix had playlists or a 'want to watch' feature I bet their recommendations would improve.\\n \\nreply\\n\\n\",\r\n        \"\\nI tend to agree with you. The only thing I dislike about it is when I happen to open a link I'll randomly be sent or notice in an article that's something unlike what I normally enjoy or something I close right away and then for the next few days, or until I watch a bunch more of what I usually enjoy, the entire suggestion list is only things to do with that one random link.\\n \\nreply\\n\\n\",\r\n        \"\\nYouTube does better than Netflix for me, but they still suggest stuff that I have already watched. Sometimes stuff I literally watched an hour ago. And I watched one Joe Rogan episode and it will not stop suggesting it to me but fails to notify me on things that I watch every episode on; like 3Blue1Brown, Robert Miles, or Rare Earth.\\n \\nreply\\n\\n\",\r\n        \"\\nThere’s a big confound with the catalog: Netflix’s streaming catalog is much smaller than their DVD selection was and it changes as licensing deals expire. No model can make up for that entirely.\\n \\nreply\\n\\n\",\r\n        \"\\nI use the DVD one, and I find the suggestions are mostly pretty reasonable.\\n \\nreply\\n\\n\",\r\n        \"\\nHow hard can it be? Any film that I didn't watch all the way to the end ought to be a strong signal that I didn't enjoy it. Not that I want to watch other movies just like it.\\n \\nreply\\n\\n\",\r\n        \"\\nOTOH, that you even chose to watch a movie is a sign that you are interested in the genre. Just because I abruptly stopped watching “Star Trek IX”, doesn’t mean that I have abandoned the sci-fi genre or even Star Trek\\n \\nreply\\n\\n\",\r\n        \"\\nI have a friend who shares my account since I stay with him when I am in the UK. 
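The "five points to two points" information loss discussed above can be roughly quantified with Shannon entropy. A small Python sketch, assuming (purely for illustration) uniform rating distributions; real rating distributions are skewed, which lowers both numbers:

import math

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

five_star = [1/5] * 5   # uniform over 1..5 stars
thumbs    = [1/2] * 2   # uniform over up/down

print(f"5-star rating:  at most {entropy(five_star):.2f} bits per rating")
print(f"thumbs up/down: at most {entropy(thumbs):.2f} bits per rating")
# -> 2.32 bits vs 1.00 bit: each binary rating can carry at most
#    less than half the information of a five-point rating.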
It isn't worth the effort to have two profiles. So, he likes horror movies and will watch any horror regardless of any signal that it's going to be poor. You can look at the viewing history and see he rarely goes beyond 5 minutes of watching any of them. I watch Netflix regularly throughout the year, he watches in phases that last a week or two and then nothing at all for months at a time (for reasons that should seem obvious by now). As a non-horror movie aficionado, I can say with certainty that only 2% of horror movies are ever worth watching and only 50% of these are any good. As a consequence, my personal viewing history includes almost no movies in this genre. My favoured genre is drama and I normally watch all the way through. You should be able to guess by now that I should rarely be recommended horror movies but, alas, Netflix thinks otherwise. Btw, I also rate movies I watch - my friend doesn't.\\n \\nreply\\n\\n\",\r\n        \"\\nI’m confused — you honestly expect Netflix’s model to have figured out that your profile is actually 2 people based on what you believe to be regular and obvious cyclical patterns? I would have to imagine that this is a relative edge case for Netflix, and there is no obvious answer for what to do with someone who mostly likes drama but, for some reason, goes on a periodic horror binge. I assume that Netflix’s model has the premise that profiles aren’t, in fact, very easy to create. I have separate profiles for my parents, as well as a test profile to see what happens when a user only seems to like the “Human Centipede” trilogy.\\n \\nreply\\n\\n\",\r\n        \"\\n2 people but only one consistent regular user. My recommendations are heavily biased towards the occasional user - who is also giving very strong negative feedback? It doesn't bother me since I know what I like. There aren't that many good films, so I'm going to find them anyway. Someone, or a group of people, are being paid for nothing though. I don't know anyone who subscribes to Netflix because of their recommendation algorithm.\\n \\nreply\\n\\n\",\r\n        \"\\n> as a test profile to see what happens when a user only seems to like the “Human Centipede” trilogy.And? You can't just leave us hanging on that.\\n \\nreply\\n\\n\",\r\n        \"\\nAfter creating the account and immediately giving a thumbs up to the trilogy, I've only occasionally logged in and feigned \\\"interest\\\" by clicking on the movies as if I'm about to watch them, or as if I enjoy re-reading the synopses. The recommendations are all normal and not noticeably feces-related. But maybe I haven't yet met the threshold for the model to consider me a particularly engaged (or real) user.\\n \\nreply\\n\\n\",\r\n        \"\\nI feel like watching 20 minutes of a movie and then downvoting should be a strong indicator. This does not appear to be the case. In fact, it seems to be more likely to appear as the first suggestion when I do this.\\n \\nreply\\n\\n\",\r\n        \"\\nYes, it's a hard problem, but no, Netflix recommendations are nowhere near state of the art. They were at the time of the Netflix Prize, but since then the field has advanced a lot, while Netflix only dumbed down their recommendation models.\\n \\nreply\\n\\n\",\r\n        \"\\nAre you saying all of Netflix's suggestions are getting worse? 
How are you measuring that? Or are you saying your Netflix suggestions are getting worse? Mine are pretty good and have held steady for a while, at least in terms of my own preferences, though I also am probably using it a little less as I've got Prime and Hulu now as well, so there are probably fewer times I'm randomly searching through Netflix and finding nothing.\\n \\nreply\\n\\n\",\r\n        \"\\nI can only tell you what my personal experience is. I laid it out more in another comment under rjurney's reply. I'll also add that I am more frequently searching for 15 minutes and then switching to another service. I used to find a movie to watch in 5 minutes. I am a big movie person too, and will watch most things. But I am also more aware that if I watch one show that is just \\\"meh\\\", then I am going to be bombarded with shows of similar quality for the next few weeks. Also, there is a fairly obvious pattern that shows up from the movies in \\\"my list\\\". Those do not seem to be weighted more heavily.\\n \\nreply\\n\\n\",\r\n        \"\\nVery interesting article but I wish it went one step further. Why does it matter that Cambridge Analytica knew a user's Big Five or that they were an old, uneducated Republican? How was this (inferred) data used? I assume they wrote/created different ads for different sets of users... but how many segments did they have? Did their graphic designer build 500 different ads, or was text/images dynamically inserted based on these variables? How did they figure out which message would resonate with each segment? How did they test something like this, with so many potential variables? Was this knowledge used only on Facebook, or across all digital channels? Was it implemented in non-digital channels as well? I'd kill to have access to their campaign set-ups.\\n \\nreply\\n\\n\",\r\n        \"\\nI wonder if you learned something since your blatant failure last week, where you put all that trust in Facebook's hands and praised advertisements, one day before they blew up.\\n \\nreply\\n\\n\",\r\n        \"\\nBesides the data, it's interesting how the actual targeting was performed. Does Facebook provide an option to show a particular given ad to a particular given user? Or is it possible to select a group of people with a given set of likes? How fine-grained is Facebook's audience selection mechanism for ads? Or was the targeting performed by creating fake groups, befriending people?\\n \\nreply\\n\\n\",\r\n        \"\\nAm I understanding this correctly?  Facebook user data (likes/profile info) was scraped to produce low-dimension feature vectors for users (similar to word2vec).  These feature vectors were then run through some ML model to predict...what exactly?  Targetability for effective political ads?\\n \\nreply\\n\\n\",\r\n        \"\\nThey used it to predict the political affiliation of people that don't explicitly state a party preference. The two parties already have a list of registered party members (and they can see who on Facebook explicitly states their party preference); for those members the main goal is higher turnout (they are the training data). The other voters they're interested in are unregistered (e.g. 
independent) voters that are likely to be on their side ideologically. The core idea is very simple: they believe that if someone says they're independent, but their preferences/features (age, gender, location, likes, posts) are predicting moderate or high likelihood of $PARTY affiliation, then showing this person political ads may move them from the 'maybe vote for $PARTY' category into the 'definitely vote for $PARTY' category. If you have continuous access to new Facebook data as you're serving ads, you can verify your ads are working on an individual basis by checking the 'score' for $PARTY affiliation predicted by your model before and after an ad (I want to stress that this can be done on an _individual basis_). The likely sequence of events is that they did A/B testing on different kinds of ads and found that fake inflammatory ads were most effective at achieving this goal in a very measurable way ($PARTY score); the resulting media/political atmosphere is collateral damage (hopefully unintended). Source: I am a data scientist / machine learning scientist; this is how I would do it, and how it seems to me others did it. I don't work on political data, but I have worked on personalized recommendations which are similar.\\n \\nreply\\n\\n\",\r\n        \"\\nDid the app really grant them continuous access to user info? I thought it was a one-off thing - they get your data at the time of use (and your friends') and that's it. Plus the approach you outlined would require the user to like/dislike things based on the ad they saw, so CA can observe a change in the predicted affiliation (they didn't have access to posts as far as I know). I don't think it would have that effect (even if the ad influences you, I doubt that it would make you go unlike Obama's page for example). Not to mention that in all likelihood you shouldn't be able to verify that a particular ad was shown to a given individual. I suspect it was a simpler use case - they would group users into segments, and then craft different ad strategies for each one (maybe based on other research or just expert opinion).\\n \\nreply\\n\\n\",\r\n        \"\\n> I suspect it was a simpler use case - they would group users into segments, and then craft different ad strategies for each one (maybe based on other research or just expert opinion). It is in this last process that it is individual-based. It is in this last process that A/B tests are done individually as a function of the specific strategy applied to him/her.\\n \\nreply\\n\\n\",\r\n        \"\\nIt seems like the purpose was narrowly tailoring messages, which is something political campaigns are really keen to do now (Obama's campaign was kind of a trailblazer here, right?).\\n \\nreply\\n\\n\",\r\n        \"\\n> Obama's campaign was kind of a trailblazer here, right? It's a pretty big gap between using and abusing social media, and as far as I know Obama's campaign did not 'narrowly tailor messages'. 
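A minimal sketch of the workflow the data-scientist comment above describes: train on users with a known party preference, score self-declared independents, and compare the predicted scores before and after serving ads. All data below is randomly generated, and the model choice (logistic regression) is an assumption for illustration; this is not Cambridge Analytica's actual code:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Feature vectors per user, e.g. low-dimensional embeddings of likes
# (as in the SVD sketch earlier) plus demographics. All synthetic here.
X_registered = rng.normal(size=(200, 10))      # users with a known party
y_registered = rng.integers(0, 2, size=200)    # 1 = $PARTY, 0 = other
X_independent = rng.normal(size=(50, 10))      # self-declared independents

model = LogisticRegression().fit(X_registered, y_registered)

# Predicted $PARTY affinity score per independent voter, before ads.
scores_before = model.predict_proba(X_independent)[:, 1]

# ... serve ads, pull fresh feature data, re-embed, re-score ...
# (simulated here as a small random drift in the features)
X_after = X_independent + rng.normal(scale=0.1, size=X_independent.shape)
scores_after = model.predict_proba(X_after)[:, 1]

# The per-individual before/after shift the comment stresses.
shift = scores_after - scores_before
print("mean score shift:", shift.mean())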
They did target broad groups using generic messages and they did quite effectively use social media presence to build support. But they did not - as far as I know, so please correct me if I'm wrong - go so far as to single out individuals or really small groups with the express intent of flipping their votes or targeting them with disinformation in order to try to stop them from voting. And Cambridge Analytica seems to have been doing just that if the currently available information is to be believed.\\n \\nreply\\n\\n\",\r\n        \"\\nhttps://devumi.com/2017/12/social-media-case-study-how-barac... > The former president also hired Facebook co-founder Chris Hughes to help in developing his social media strategy. Obama furthered the use of Facebook for his 2012 re-election bid, utilizing it to encourage young people to cast their votes. His team developed a Facebook app that looked into supporters’ friends lists to find younger voters. The team then asked supporters to share online content with these voters. More than 600,000 supporters responded to the call, sending content to over 5 million contacts.> During his presidency, Obama continued to use Facebook to reach out to the public. In 2016, he became the first president to go live on the site, just before his final State of the Union Address.\\n \\nreply\\n\\n\",\r\n        \"\\nYes, that pretty much confirms what I wrote above. Your point being? Please read the article and compare what we know about Cambridge Analytica vs what the Obama campaign did; it is like comparing snipers with someone setting off fireworks.\\n \\nreply\\n\\n\",\r\n        \"\\nLook, I don't think anyone can realistically doubt that Obama's campaign was the first to effectively slice-and-dice the electorate and use social media to target them. You're arguing against a much more expansive claim than I'm making.\\n \\nreply\\n\\n\",\r\n        \"\\nYou used the word 'narrowly', and in the context of a post about Cambridge Analytica that word has a pretty specific meaning.\\n \\nreply\\n\\n\",\r\n        \"\\nNo, it was Hillary's campaign in 2008 that was big on \\\"microtargeting\\\", i.e., moving beyond \\\"soccer moms\\\" or \\\"defense dads\\\" to \\\"soccer moms with one kid and expensive tastes\\\".\\n \\nreply\\n\\n\",\r\n        \"\\n[ Deleted. Nothing I say on HN ever matters. Move along ]\\n \\nreply\\n\\n\",\r\n        \"\\n\\\"What sort of lies\\\" is pretty hand-wavy when it comes to labeling training data for a model.  Are you summarizing from a source?  I'm interested in the technical details of what happened.\\n \\nreply\\n\\n\",\r\n        \"\\nVery few \\\"undecided\\\" voters truly are; elections are won and lost by getting your supporters to go to the polls.  
So if you wanted to use scurrilous, fake news to help your candidate, you'd be better off sending stories that will get your supporters really fired up and eager to vote and get their friends to vote, not trying to persuade the practically nonexistent undecided demographic.\\n \\nreply\\n\\n\",\r\n        \"\\nAre you able to specify the sources supporting these claims, namely that elections are not decided by \\\"undecided\\\" voters but rather by pushing your supporters to the polls?\\n \\nreply\\n\\n\",\r\n        \"\\nI thought this was common enough knowledge not to want citations, but I think you will find these satisfactory. http://www.stat.columbia.edu/~gelman/research/unpublished/sw... https://www.politico.com/magazine/story/2014/01/independent-... https://www.thenation.com/article/what-everyone-gets-wrong-a... The last piece has a short summary of the salient point:> In fact, according to an analysis of voting patterns conducted by Michigan State University political scientist Corwin Smidt, those who identify as independents today are more stable in their support for one or the other party than were “strong partisans” back in the 1970s. According to Dan Hopkins, a professor of government at the University of Pennsylvania, “independents who lean toward the Democrats are less likely to back GOP candidates than are weak Democrats.”> While most independents vote like partisans, on average they’re slightly more likely to just stay home in November. “Typically independents are less active and less engaged in politics than are strong partisans,” says Smidt.> [...]> The conventional wisdom holds that the parties need independents to win general elections, but the reality is that they’re increasingly devoting their resources to getting their own voters—including their “closet partisans”—out to the polls rather than trying to sway the dwindling number of genuine swing voters. “We’ve seen a huge increase in technology and the ability to turn out the vote,” says Smidt. “So in terms of a cost-benefit analysis, the parties and candidates see that it’s much easier to turn out people who agree with them than it is to change someone’s mind. And then there’s also the question of how many of us are even open to changing our minds.”\\n \\nreply\\n\\n\",\r\n        \"\\nThis is the standard way of analysing this kind of data, and I'd be very surprised if the Obama campaign didn't use the same or very similar methods with the Facebook data they obtained. The only difference is that Cambridge Analytica managed to obtain much more data over a wider demographic.\\n \\nreply\\n\\n\",\r\n        \"\\narchived for future reference http://archive.is/dMIcN\\n \\nreply\\n\\n\",\r\n        \"\\nI'm excited for GDPR. The hard part is going to be getting the truth out of these companies about the actual extent of the data they hold on us.\\n \\nreply\\n\\n\",\r\n        \"\\nThe EU should do a union-wide ad campaign pointing out the fact that the much-maligned eurocrats have actually been working for years to fix this very difficult problem that's only now becoming apparent to the wider public. The timing couldn't be better as GDPR goes into effect just after the Facebook/Cambridge scandal. Unfortunately, all EU institutions are terrible at marketing. 
If they did an ad campaign, it would probably be a TV commercial showing Jean-Claude Juncker giving a speech with subtitles in 15 languages.\\n \\nreply\\n\\n\",\r\n        \"\\nThat \\\"hard part\\\" is going to be more than just hard; I think the word you're looking for there is \\\"impossible\\\".  They don't have any way of knowing what data Google, Facebook, Amazon, or any other company has.  As this article does a credible job of explaining, you have to understand a fair amount about statistics (PCA) and machine learning to even recognize it if you were looking at it, and they won't know where to look for it.  They have no enforcement mechanism in mind, and they passed a law anyway, which effectively means \\\"you can't admit to having this\\\", which will mean that the more willing a company is to lie, the bigger an advantage they will have over their competitors.\\n \\nreply\\n\\n\",\r\n        \"\\nSure. But at least when the whistle is actually blown (by people who \\\"just\\\" work for the company) there is a law which can be used to call to court the people who are legally accountable for the company.\\n \\nreply\\n\\n\"\r\n    ]\r\n}"}],"_postman_id":"be4fd8a2-2cfb-46b5-8e50-c40936603ae5"},{"name":"/video","id":"c799fd04-209c-40cf-ab3f-377bcd6ebee2","request":{"method":"GET","header":[],"body":{"mode":"formdata","formdata":[]},"url":"https://www.summarizebot.com/api/video?apiKey=YourAPIKey&url=https://www.htmlgoodies.com/tutorials/web_graphics/article.php/3480061/How-To-Add-a-YouTube-Video-to-Your-Web-Site.htm","description":"<p>Extract video information from a given url.</p>\n","urlObject":{"protocol":"https","path":["api","video"],"host":["www","summarizebot","com"],"query":[{"description":{"content":"<p>To use the API you will need an API key. Please, register at <a href=\"http://www.summarizebot.com/summarization_business.html\">http://www.summarizebot.com/summarization_business.html</a> to get your personal API key.</p>\n","type":"text/plain"},"key":"apiKey","value":"YourAPIKey"},{"description":{"content":"<p>Article or web page url.</p>\n","type":"text/plain"},"key":"url","value":"https://www.htmlgoodies.com/tutorials/web_graphics/article.php/3480061/How-To-Add-a-YouTube-Video-to-Your-Web-Site.htm"}],"variable":[]}},"response":[{"id":"da162519-ef2e-405d-b3da-c8560feef344","name":"Video GET Result","originalRequest":{"method":"GET","header":[],"body":{"mode":"formdata","formdata":[]},"url":{"raw":"https://www.summarizebot.com/api/video?apiKey=YourAPIKey&url=https://www.htmlgoodies.com/tutorials/web_graphics/article.php/3480061/How-To-Add-a-YouTube-Video-to-Your-Web-Site.htm","protocol":"https","host":["www","summarizebot","com"],"path":["api","video"],"query":[{"key":"apiKey","value":"YourAPIKey","description":"To use the API you will need an API key. 
Please, register at http://www.summarizebot.com/summarization_business.html to get your personal API key."},{"key":"url","value":"https://www.htmlgoodies.com/tutorials/web_graphics/article.php/3480061/How-To-Add-a-YouTube-Video-to-Your-Web-Site.htm","description":"Article or web page url."}]},"description":"Extract video information from a given url."},"status":"OK","code":200,"_postman_previewlanguage":"json","header":[{"name":"connection","key":"connection","value":"Keep-Alive","description":"Options that are desired for the connection"},{"name":"content-length","key":"content-length","value":"124","description":"The length of the response body in octets (8-bit bytes)"},{"name":"content-type","key":"content-type","value":"application/json","description":"The mime type of this content"},{"name":"date","key":"date","value":"Thu, 29 Mar 2018 21:33:34 GMT","description":"The date and time that the message was sent"},{"name":"keep-alive","key":"keep-alive","value":"timeout=5, max=100","description":"Custom header"},{"name":"server","key":"server","value":"Apache/2.4.7 (Ubuntu)","description":"A name for the server"}],"cookie":[],"responseTime":null,"body":"{\"video\": [{\"source\": \"http://www.youtube.com/embed/hqiNL4Hn04A\", \"height\": \"390\", \"width\": \"480\", \"provider\": \"youtube\"}]}\n"}],"_postman_id":"c799fd04-209c-40cf-ab3f-377bcd6ebee2"},{"name":"/video","id":"0d716ba2-87dd-4904-8d99-5df049a9c3a8","request":{"method":"POST","header":[{"key":"Content-Type","value":"application/octet-stream","description":"<p>The HTTP header should be specified as 'application/octet-stream'.</p>\n"}],"body":{"mode":"file","file":{}},"url":"https://www.summarizebot.com/api/video?apiKey=YourAPIKey&filename=1.html","description":"<p>Extract video information from binary data. POST body should include file content in binary form. The HTTP header should be specified as 'application/octet-stream'.</p>\n","urlObject":{"protocol":"https","path":["api","video"],"host":["www","summarizebot","com"],"query":[{"description":{"content":"<p>To use the API you will need an API key. Please, register at <a href=\"http://www.summarizebot.com/summarization_business.html\">http://www.summarizebot.com/summarization_business.html</a> to get your personal API key.</p>\n","type":"text/plain"},"key":"apiKey","value":"YourAPIKey"},{"description":{"content":"<p>Name of the file, e.g. filename=1.html.</p>\n","type":"text/plain"},"key":"filename","value":"1.html"}],"variable":[]}},"response":[{"id":"e0106d9c-39db-4172-81b5-52aa45c6d79e","name":"Video POST result","originalRequest":{"method":"POST","header":[{"key":"Content-Type","value":"application/octet-stream","description":"The HTTP header should be specified as 'application/octet-stream'."}],"body":{"mode":"file","file":{}},"url":{"raw":"https://www.summarizebot.com/api/video?apiKey=YourAPIKey&filename=1.html","protocol":"https","host":["www","summarizebot","com"],"path":["api","video"],"query":[{"key":"apiKey","value":"YourAPIKey","description":"To use the API you will need an API key. Please, register at http://www.summarizebot.com/summarization_business.html to get your personal API key."},{"key":"filename","value":"1.html","description":"Name of the file, e.g. filename=1.html."}]},"description":"Extract video information from binary data. POST body should include file content in binary form. 
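A minimal client sketch for the /video GET endpoint documented above, using Python's requests library. YourAPIKey is the placeholder from the examples; the target URL is the one from the sample request:

import requests

resp = requests.get(
    "https://www.summarizebot.com/api/video",
    params={
        "apiKey": "YourAPIKey",  # replace with your personal API key
        "url": "https://www.htmlgoodies.com/tutorials/web_graphics/article.php/3480061/How-To-Add-a-YouTube-Video-to-Your-Web-Site.htm",
    },
)
resp.raise_for_status()
for video in resp.json()["video"]:
    # Each entry carries source, width, height and provider,
    # as in the sample response body above.
    print(video["provider"], video["source"])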
The HTTP header should be specified as 'application/octet-stream'."},"status":"OK","code":200,"_postman_previewlanguage":"json","header":[{"name":"connection","key":"connection","value":"Keep-Alive","description":"Options that are desired for the connection"},{"name":"content-length","key":"content-length","value":"19","description":"The length of the response body in octets (8-bit bytes)"},{"name":"content-type","key":"content-type","value":"application/json","description":"The mime type of this content"},{"name":"date","key":"date","value":"Thu, 29 Mar 2018 21:22:09 GMT","description":"The date and time that the message was sent"},{"name":"keep-alive","key":"keep-alive","value":"timeout=5, max=100","description":"Custom header"},{"name":"server","key":"server","value":"Apache/2.4.7 (Ubuntu)","description":"A name for the server"}],"cookie":[],"responseTime":null,"body":"{\n    \"video\": [\n        {\n            \"source\": \"http://www.youtube.com/embed/hqiNL4Hn04A\",\n            \"height\": \"390\",\n            \"width\": \"480\",\n            \"provider\": \"youtube\"\n        }\n    ]\n}"}],"_postman_id":"0d716ba2-87dd-4904-8d99-5df049a9c3a8"},{"name":"/faces","id":"43d39e68-550f-4d8c-8a32-f740cb3f0550","request":{"method":"GET","header":[],"body":{"mode":"formdata","formdata":[]},"url":"https://www.summarizebot.com/api/faces?apiKey=YourAPIKey&url=https://facefacts.scot/images/science/Q2_high_health_f.jpg","description":"<p>Detect faces from a given image url.</p>\n","urlObject":{"protocol":"https","path":["api","faces"],"host":["www","summarizebot","com"],"query":[{"description":{"content":"<p>To use the API you will need an API key. Please, register at <a href=\"http://www.summarizebot.com/summarization_business.html\">http://www.summarizebot.com/summarization_business.html</a> to get your personal API key.</p>\n","type":"text/plain"},"key":"apiKey","value":"YourAPIKey"},{"description":{"content":"<p>Image url.</p>\n","type":"text/plain"},"key":"url","value":"https://facefacts.scot/images/science/Q2_high_health_f.jpg"}],"variable":[]}},"response":[{"id":"f676926e-831c-435a-9a22-0da9ad3c54fa","name":"Faces GET result","originalRequest":{"method":"GET","header":[],"body":{"mode":"formdata","formdata":[]},"url":{"raw":"https://www.summarizebot.com/api/faces?apiKey=YourAPIKey&url=https://facefacts.scot/images/science/Q2_high_health_f.jpg","protocol":"https","host":["www","summarizebot","com"],"path":["api","faces"],"query":[{"key":"apiKey","value":"YourAPIKey","description":"To use the API you will need an API key. 
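The binary /video POST variant described above follows from the same reference: per the documentation, the file bytes go in the POST body with Content-Type 'application/octet-stream', and the file name travels as the filename query parameter. A sketch, reusing the 1.html name from the example:

import requests

with open("1.html", "rb") as f:  # a local file saved beforehand
    resp = requests.post(
        "https://www.summarizebot.com/api/video",
        params={"apiKey": "YourAPIKey", "filename": "1.html"},
        data=f.read(),  # raw file content as the POST body
        headers={"Content-Type": "application/octet-stream"},
    )
resp.raise_for_status()
print(resp.json())  # e.g. {"video": [{"source": ..., "provider": "youtube"}]}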
Please, register at http://www.summarizebot.com/summarization_business.html to get your personal API key."},{"key":"url","value":"https://facefacts.scot/images/science/Q2_high_health_f.jpg","description":"Image url."}]},"description":"Detect faces from a given image url."},"status":"OK","code":200,"_postman_previewlanguage":"json","header":[{"name":"connection","key":"connection","value":"Keep-Alive","description":"Options that are desired for the connection"},{"name":"content-length","key":"content-length","value":"70","description":"The length of the response body in octets (8-bit bytes)"},{"name":"content-type","key":"content-type","value":"application/json","description":"The mime type of this content"},{"name":"date","key":"date","value":"Sat, 31 Mar 2018 11:12:10 GMT","description":"The date and time that the message was sent"},{"name":"keep-alive","key":"keep-alive","value":"timeout=5, max=100","description":"Custom header"},{"name":"server","key":"server","value":"Apache/2.4.7 (Ubuntu)","description":"A name for the server"}],"cookie":[],"responseTime":null,"body":"{\"faces\": [{\"y\": \"147\", \"x\": \"61\", \"width\": \"276\", \"height\": \"276\"}]}\n"}],"_postman_id":"43d39e68-550f-4d8c-8a32-f740cb3f0550"},{"name":"/faces","id":"6eecac85-e4c2-4859-b01c-f7b99a8d4077","request":{"method":"POST","header":[{"key":"Content-Type","value":"application/octet-stream","description":"<p>The HTTP header should be specified as 'application/octet-stream'.</p>\n"}],"body":{"mode":"file","file":{}},"url":"https://www.summarizebot.com/api/faces?apiKey=YourAPIKey","description":"<p>Detect faces from image binary data. POST body should include file content in binary form. The HTTP header should be specified as 'application/octet-stream'.</p>\n","urlObject":{"protocol":"https","path":["api","faces"],"host":["www","summarizebot","com"],"query":[{"description":{"content":"<p>To use the API you will need an API key. Please, register at <a href=\"http://www.summarizebot.com/summarization_business.html\">http://www.summarizebot.com/summarization_business.html</a> to get your personal API key.</p>\n","type":"text/plain"},"key":"apiKey","value":"YourAPIKey"}],"variable":[]}},"response":[{"id":"1a1ae6cc-c621-4303-af3c-d3a3944cd13d","name":"Faces POST result","originalRequest":{"method":"POST","header":[{"key":"Content-Type","value":"application/octet-stream","description":"The HTTP header should be specified as 'application/octet-stream'."}],"body":{"mode":"file","file":{}},"url":{"raw":"https://www.summarizebot.com/api/faces?apiKey=YourAPIKey","protocol":"https","host":["www","summarizebot","com"],"path":["api","faces"],"query":[{"key":"apiKey","value":"YourAPIKey","description":"To use the API you will need an API key. Please, register at http://www.summarizebot.com/summarization_business.html to get your personal API key."}]},"description":"Detect faces from image binary data. POST body should include file content in binary form. 
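A sketch for consuming the /faces GET response shown above: fetch the bounding boxes for an image URL, then crop them out locally with Pillow. The coordinates arrive as strings in the sample response, so they are cast to int; treating x/y as the top-left corner of the box is an assumption, since the reference does not say:

import io
import requests
from PIL import Image

IMAGE_URL = "https://facefacts.scot/images/science/Q2_high_health_f.jpg"

boxes = requests.get(
    "https://www.summarizebot.com/api/faces",
    params={"apiKey": "YourAPIKey", "url": IMAGE_URL},
).json()["faces"]

image = Image.open(io.BytesIO(requests.get(IMAGE_URL).content))
for i, b in enumerate(boxes):
    x, y, w, h = (int(b[k]) for k in ("x", "y", "width", "height"))
    image.crop((x, y, x + w, y + h)).save(f"face_{i}.png")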
The HTTP header should be specified as 'application/octet-stream'."},"status":"OK","code":200,"_postman_previewlanguage":"json","header":[{"name":"connection","key":"connection","value":"Keep-Alive","description":"Options that are desired for the connection"},{"name":"content-length","key":"content-length","value":"19","description":"The length of the response body in octets (8-bit bytes)"},{"name":"content-type","key":"content-type","value":"application/json","description":"The mime type of this content"},{"name":"date","key":"date","value":"Thu, 29 Mar 2018 21:22:09 GMT","description":"The date and time that the message was sent"},{"name":"keep-alive","key":"keep-alive","value":"timeout=5, max=100","description":"Custom header"},{"name":"server","key":"server","value":"Apache/2.4.7 (Ubuntu)","description":"A name for the server"}],"cookie":[],"responseTime":null,"body":"{\n    \"faces\": [\n        {\n            \"y\": \"147\",\n            \"x\": \"61\",\n            \"width\": \"276\",\n            \"height\": \"276\"\n        }\n    ]\n}"}],"_postman_id":"6eecac85-e4c2-4859-b01c-f7b99a8d4077"},{"name":"/images","id":"06cbd927-739a-41ba-82b0-47e3fb001fcf","request":{"method":"GET","header":[],"body":{"mode":"formdata","formdata":[]},"url":"https://www.summarizebot.com/api/images?apiKey=YourAPIKey&url=http://www.sharksider.com/images/facts-about-sharks-top1.jpg&tags=3","description":"<p>Image recognition from a given url.</p>\n","urlObject":{"protocol":"https","path":["api","images"],"host":["www","summarizebot","com"],"query":[{"description":{"content":"<p>To use the API you will need an API key. Please, register at <a href=\"http://www.summarizebot.com/summarization_business.html\">http://www.summarizebot.com/summarization_business.html</a> to get your personal API key.</p>\n","type":"text/plain"},"key":"apiKey","value":"YourAPIKey"},{"description":{"content":"<p>Image url.</p>\n","type":"text/plain"},"key":"url","value":"http://www.sharksider.com/images/facts-about-sharks-top1.jpg"},{"description":{"content":"<p>Maximum count of image tags to return.</p>\n","type":"text/plain"},"key":"tags","value":"3"}],"variable":[]}},"response":[{"id":"f765977d-201a-4894-92dc-2fef2ebb225b","name":"Images GET result","originalRequest":{"method":"GET","header":[],"body":{"mode":"formdata","formdata":[]},"url":{"raw":"https://www.summarizebot.com/api/images?apiKey=YourAPIKey&url=http://www.sharksider.com/images/facts-about-sharks-top1.jpg&tags=3","protocol":"https","host":["www","summarizebot","com"],"path":["api","images"],"query":[{"key":"apiKey","value":"YourAPIKey","description":"To use the API you will need an API key. 
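The binary /faces POST described above follows the same octet-stream pattern as /video, except that (per the example URL) only apiKey is passed in the query string, with no filename parameter. A sketch using a hypothetical local file name:

import requests

with open("photo.jpg", "rb") as f:  # "photo.jpg" is a placeholder name
    resp = requests.post(
        "https://www.summarizebot.com/api/faces",
        params={"apiKey": "YourAPIKey"},
        data=f.read(),
        headers={"Content-Type": "application/octet-stream"},
    )
resp.raise_for_status()
print(resp.json())  # e.g. {"faces": [{"y": "147", "x": "61", ...}]}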
Please, register at http://www.summarizebot.com/summarization_business.html to get your personal API key."},{"key":"url","value":"http://www.sharksider.com/images/facts-about-sharks-top1.jpg","description":"Image url."},{"key":"tags","value":"3","description":"Maximum count of image tags to return."}]},"description":"Image recognition from a given url."},"status":"OK","code":200,"_postman_previewlanguage":"json","header":[{"name":"connection","key":"connection","value":"Keep-Alive","description":"Options that are desired for the connection"},{"name":"content-length","key":"content-length","value":"274","description":"The length of the response body in octets (8-bit bytes)"},{"name":"content-type","key":"content-type","value":"application/json","description":"The mime type of this content"},{"name":"date","key":"date","value":"Sat, 31 Mar 2018 12:16:40 GMT","description":"The date and time that the message was sent"},{"name":"keep-alive","key":"keep-alive","value":"timeout=5, max=100","description":"Custom header"},{"name":"server","key":"server","value":"Apache/2.4.7 (Ubuntu)","description":"A name for the server"}],"cookie":[],"responseTime":null,"body":"{\"tags\": [{\"confidence\": \"0.521507\", \"name\": \"tiger shark, Galeocerdo cuvieri\"}, {\"confidence\": \"0.193058\", \"name\": \"great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias\"}, {\"confidence\": \"0.0935028\", \"name\": \"hammerhead, hammerhead shark\"}]}\n"}],"_postman_id":"06cbd927-739a-41ba-82b0-47e3fb001fcf"},{"name":"/images","id":"84ee9cb8-97f4-4fca-abae-ac7880222992","request":{"method":"POST","header":[{"key":"Content-Type","value":"application/octet-stream","description":"<p>The HTTP header should be specified as 'application/octet-stream'.</p>\n"}],"body":{"mode":"file","file":{}},"url":"https://www.summarizebot.com/api/images?apiKey=YourAPIKey&filename=1.jpg&tags=3","description":"<p>Image recognition from binary data. POST body should include file content in binary form. The HTTP header should be specified as 'application/octet-stream'.</p>\n","urlObject":{"protocol":"https","path":["api","images"],"host":["www","summarizebot","com"],"query":[{"description":{"content":"<p>To use the API you will need an API key. Please, register at <a href=\"http://www.summarizebot.com/summarization_business.html\">http://www.summarizebot.com/summarization_business.html</a> to get your personal API key.</p>\n","type":"text/plain"},"key":"apiKey","value":"YourAPIKey"},{"description":{"content":"<p>Name of the file, e.g. filename=1.jpg</p>\n","type":"text/plain"},"key":"filename","value":"1.jpg"},{"description":{"content":"<p>Maximum count of image tags to return.</p>\n","type":"text/plain"},"key":"tags","value":"3"}],"variable":[]}},"response":[{"id":"f32a3fc7-e20a-4df9-aad4-b7c32962b92b","name":"Images POST result","originalRequest":{"method":"POST","header":[{"key":"Content-Type","value":"application/octet-stream","description":"The HTTP header should be specified as 'application/octet-stream'."}],"body":{"mode":"file","file":{}},"url":{"raw":"https://www.summarizebot.com/api/images?apiKey=YourAPIKey&filename=1.jpg&tags=3","protocol":"https","host":["www","summarizebot","com"],"path":["api","images"],"query":[{"key":"apiKey","value":"YourAPIKey","description":"To use the API you will need an API key. Please, register at http://www.summarizebot.com/summarization_business.html to get your personal API key."},{"key":"filename","value":"1.jpg","description":"Name of the file, e.g. 
filename=1.jpg"},{"key":"tags","value":"3","description":"Maximum count of image tags to return."}]},"description":"Image recognition from binary data. POST body should include file content in binary form. The HTTP header should be specified as 'application/octet-stream'."},"status":"OK","code":200,"_postman_previewlanguage":"json","header":[{"name":"connection","key":"connection","value":"Keep-Alive","description":"Options that are desired for the connection"},{"name":"content-length","key":"content-length","value":"19","description":"The length of the response body in octets (8-bit bytes)"},{"name":"content-type","key":"content-type","value":"application/json","description":"The mime type of this content"},{"name":"date","key":"date","value":"Thu, 29 Mar 2018 21:22:09 GMT","description":"The date and time that the message was sent"},{"name":"keep-alive","key":"keep-alive","value":"timeout=5, max=100","description":"Custom header"},{"name":"server","key":"server","value":"Apache/2.4.7 (Ubuntu)","description":"A name for the server"}],"cookie":[],"responseTime":null,"body":"{\r\n    \"tags\": [\r\n        {\r\n            \"confidence\": \"0.521507\",\r\n            \"name\": \"tiger shark, Galeocerdo cuvieri\"\r\n        },\r\n        {\r\n            \"confidence\": \"0.193058\",\r\n            \"name\": \"great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias\"\r\n        },\r\n        {\r\n            \"confidence\": \"0.0935028\",\r\n            \"name\": \"hammerhead, hammerhead shark\"\r\n        }\r\n    ]\r\n}"}],"_postman_id":"84ee9cb8-97f4-4fca-abae-ac7880222992"}],"id":"adac15ce-a6a7-48e1-96b6-5d6227d28fa4","_postman_id":"adac15ce-a6a7-48e1-96b6-5d6227d28fa4","description":""}]}