Data Science News Flash: 08-01-2019

The latest data science articles - algorithmically curated, ranked, and summarized just for you.

News Flash is a weekly publication that features the top news stories for a specific topic. The stories are algorithmically curated, evaluated for quality, and ranked so that you can stay on top of the most important developments. Additionally, the most important sentences for each story are extracted and displayed as highlights so you can get a sense of what each story is about. If you want more information for a particular story, just click the heading or image to read the entire article.

You can see the other topics we have News Flashes available for here and sign up to receive any that you're interested in.

AI Job Roles: How to Become a Data Scientist, AI Developer, or Machine Learning Engineer


  • Owing to the large amount of data processed in AI roles, a deep understanding of mathematical and data wrangling processes is integral for the job.
  • Machine learning engineers can also consider a more generalized role in an environment where various concepts of AI can be used in conjunction with each other.
  • Knowledge in fields, such as natural language processing, computer vision, deep learning, and visualization, is required for the role of a machine learning engineer.
  • Data analysts are distinct from machine learning and AI-related roles, as they directly focus on deriving insights from data using machine learning as a tool.
  • Data scientists use machine learning as one of the tools in their vast arsenal, with the end goal of deriving useful insights from data and providing them to the company.

Models for integrating data science teams within organizations


  • A data scientist (DS) is an engineer with skills in data processing, analysis, and model building; and data science is their work.
  • In the center-of-excellence (CoE) model, also known as the research model, the expectation is that the data science team would work independently to identify big bets and build prototypes.
  • In the accounting model, also known as the BI model, the data science team would produce reports and presentations on a recurring basis (usually monthly and quarterly).
  • In this model, it is believed that easy and straightforward access to data by product managers, designers, engineering managers, and engineers would lessen or remove the need for a data science role.
  • Where a single product is under development, I recommend the PDS model as the best in efficiency and effectiveness in leveraging data for the business.

Artificial intelligence in America's digital city


  • It's beyond the scope of this brief to describe machine learning in greater detail, but you can learn more through Brookings's Blueprint for the Future of AI.
  • While AI and machine learning are uniquely well-suited to help manage the challenges facing cities and metropolitan areas, AI is not a panacea.
  • There is a unique set of challenges related to the design and deployment of AI systems, many of which already appear in cities across the United States.
  • In New Orleans, the city's Office of Performance and Accountability used machine learning and public data to predict where fire-related deaths were most likely to occur, helping the fire department better target operations.
  • New York City and Washington both use a system called ShotSpotter and public data to better locate and assess gunfire.

International Conference on Machine Learning 2019


  • Every summer, machine learning researchers and practitioners from all over the world gather for the annual week-long International Conference on Machine Learning (ICML).
  • Professionals from academia and industry present and share their cutting-edge research in machine learning, artificial intelligence, statistics, and data science.
  • As machine learning algorithms become more and more complex (in terms of the number of parameters they need to make accurate predictions), the field of active learning focuses on guiding machine learning algorithms to train a model on-demand.
  • Another interesting talk revolved around the notion of Data Shapley Value, aimed at quantifying the value of each labeled training point for a machine learning classifier.
  • At the end of the day, the Data Shapley Value re-weights the training data for a machine learning algorithm: a high value marks a trusted data point, and a low value marks an observation in which the learning algorithm has low confidence.
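
To make the Data Shapley idea from the highlights above a little more concrete, here is a minimal sketch in Python. It is not the method from the ICML talk: it assumes a toy one-dimensional dataset (made up for illustration), a 1-nearest-neighbor classifier, validation accuracy as the measure of usefulness, and a chance-level score of 0.5 for an empty training set. With only five points, every ordering can be enumerated exactly; real datasets would need an approximation such as sampling orderings.

```python
from itertools import permutations

# Hypothetical toy data: (feature, label). The point at index 2 is deliberately
# mislabeled: it sits among the class-0 points but carries label 1.
TRAIN = [(0.0, 0), (1.0, 0), (0.5, 1), (4.0, 1), (5.0, 1)]
VALIDATION = [(0.2, 0), (0.8, 0), (4.2, 1), (4.8, 1)]


def one_nn_accuracy(train, validation):
    """Validation accuracy of a 1-nearest-neighbor classifier fit on `train`."""
    if not train:
        return 0.5  # assumption: an empty training set scores at chance level
    correct = 0
    for x, y in validation:
        _, predicted = min(train, key=lambda point: abs(point[0] - x))
        correct += predicted == y
    return correct / len(validation)


def data_shapley(train, validation):
    """Average each point's marginal accuracy gain over every insertion order."""
    n = len(train)
    values = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        subset = []
        previous = one_nn_accuracy(subset, validation)
        for i in order:
            subset.append(train[i])
            current = one_nn_accuracy(subset, validation)
            values[i] += current - previous
            previous = current
    return [v / len(orderings) for v in values]


values = data_shapley(TRAIN, VALIDATION)
for point, value in zip(TRAIN, values):
    print(point, round(value, 4))
```

In this toy setup the mislabeled point ends up with the lowest value, and the values add up to the gap between the full-data accuracy and the chance-level baseline.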

'Citizen Data Scientists' and Humanized Machine Learning


  • The answer lies with humanized machine learning platforms, which are making advanced ML capabilities accessible to business problem owners, enabling the rise of the citizen data scientist.
  • The challenge for business problem owners, be they a C-level executive, analyst or even operations manager, is effectively understanding their data to drive further business value and optimize processes.
  • Enter the citizen data scientists: employees not operating in dedicated data science or analytics roles, who can use a humanized ML platform to explore their data and deploy models to unlock its value.
  • This is a milestone in empowering data owners to quickly master their own data and complete operations at scale, without significant investment or expertise.
  • A humanized ML platform provides citizen data scientists with greater accessibility to the capabilities required to quickly prepare and visualize data, and subsequently build, deploy and manage a suitable model.

Three pitfalls to avoid in machine learning


  • Researchers at TAE Technologies in California and at Google are using machine learning to optimize equipment that produces a high-energy plasma.
  • For the model to predict the effect of adding a couple of atoms to a molecule, each molecule in the test set should have a partner in the training set that is a couple of atoms different.
  • We were pleased when we arrived at a model that predicted well, for given settings, whether the plasma's energy would be high.
  • An eye examination at Aravind hospital in Madurai, India, where staff and Google researchers are trying to automate diagnoses of blindness caused by diabetes.
  • We are at an amazing point: computational power, data and algorithms are coming together to produce great opportunities for discoveries with the assistance of machine learning.

AI-driven medical tools could worsen inequalities


  • Unlike computer programs that rigidly follow rules written by humans, both machine learning and deep learning algorithms can look at a data set, learn from it, and make new predictions.
  • In the 1990s, Caruana worked on a project that tried using an earlier form of machine learning to predict whether a patient with pneumonia was a low-risk or a high-risk case.
  • The AI predictions do best when applied to massive data sets, such as in China, which has an advantage in training AI systems thanks to access to large populations and patient data.
  • "A lot of the AI discussion has been about how to democratize healthcare, and I want to see that happening," says Effy Vayena, a bioethicist at the Federal Institute of Technology in Switzerland.
  • In his 2019 book Deep Medicine, Eric Topol, director and founder of the Scripps Research Translational Institute, talks about creating essentially a supercharged medical Siri, an AI assistant to take notes about the interactions between doctors and their patients, enter those notes in electronic health records, and remind physicians to ask about relevant parts of the patient's history.

Lessons on How to Lie with Statistics


  • Another trick used to mislead consumers of data is to avoid listing relevant numbers that describe a dataset, such as the count of observations, the spread of the data (range), the uncertainty about the data (standard error), the quantiles of the data, and so on.
  • The mean and median of a distribution are the same only if it is normal, and we live in a world with mostly non-normal data.
  • Data is often on scales with which we are unfamiliar, and we need a comparison to other numbers to know if a statistic represents a real change.
  • Any number represents a distillation of a set of data, which was taken on a sample of a population by mistake-prone humans, using imperfect tools, in constantly changing conditions at a single point in time.
  • More data is not a panacea, but more data with debate, multiple analyses, and scrutiny can lead to better real-world decisions, and that's what we hope for as data-literate citizens.
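
As a small illustration of the first two highlights above, the sketch below reports the numbers that are often left out (count, range, standard error, quartiles) alongside the mean and median. The salary figures are made up for this example, and `describe` is a hypothetical helper name; only Python's standard `statistics` and `math` modules are used.

```python
import math
import statistics


def describe(data):
    """Summarize a sample with the figures a bare average tends to hide."""
    n = len(data)
    return {
        "count": n,
        "range": max(data) - min(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        # Standard error of the mean: sample standard deviation / sqrt(n).
        "standard_error": statistics.stdev(data) / math.sqrt(n),
        "quartiles": statistics.quantiles(data, n=4),
    }


# Hypothetical salaries in thousands; one very large value skews the sample.
SALARIES = [30, 32, 35, 38, 40, 45, 400]

summary = describe(SALARIES)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With one extreme value in the sample, the mean lands far above the median, which is why quoting "the average" alone can mislead.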

Using Machine Learning To Identify Smartphone Users By The Way They Walk


  • A biometric that can be revealed by leveraging a smart device's accelerometer data is that of how a user walks, which is also known as gait.
  • The smartphone is placed in the subject's pocket during data collection, and the subjects are asked to walk in a natural manner around their environment.
  • Due to the sensor data collection app running on top of an Android OS, data collection will not occur at a fixed sample rate.
  • Initially, a feature space, which is an n-dimensional vector space created by the n specified features of the data that is to be classified, is formed.
  • A confusion matrix is defined by a matrix C, where C_i,j is set to the number of data points that belong to class i but have been predicted to be in class j.
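
The confusion matrix definition in the last highlight can be sketched in a few lines of Python. The labels below are invented for illustration (three hypothetical users, numbered 0 to 2) and do not come from the article; rows are the true class i and columns are the predicted class j.

```python
def confusion_matrix(actual, predicted, num_classes):
    """Return C where C[i][j] counts points of true class i predicted as class j."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for i, j in zip(actual, predicted):
        matrix[i][j] += 1
    return matrix


# Hypothetical labels: which of three users each walking sample belongs to.
actual = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]

matrix = confusion_matrix(actual, predicted, num_classes=3)
for row in matrix:
    print(row)

# Correct predictions sit on the diagonal, so accuracy is the trace over the total.
accuracy = sum(matrix[k][k] for k in range(3)) / len(actual)
print("accuracy:", accuracy)
```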

Produced and Sponsored by:

Innovative Data Science & Advanced Analytics Solutions
