Data Science News Flash: 08-15-2019

The latest data science articles - algorithmically curated, ranked, and summarized just for you.

News Flash is a weekly publication that features the top news stories for a specific topic. The stories are algorithmically curated, evaluated for quality, and ranked so that you can stay on top of the most important developments. Additionally, the most important sentences for each story are extracted and displayed as highlights so you can get a sense of what each story is about. If you want more information for a particular story, just click on it to read the entire article.

You can also browse the other topics we publish News Flashes for and sign up to receive any that interest you.

Becoming A Machine Learning Engineer: Advice From Experts


  • In many roles earlier in my career I started as a software engineer and then as machine learning problems came up, I transitioned towards machine learning engineer.
  • For people like me, who don't have an engineering background, definitely learning Python, learning fundamentals of computer engineering like data structures, algorithms, all that stuff is super important.
  • If they're trying to transition from already being a software engineer to being more of a machine learning engineer, definitely learning Python is huge, as is getting familiar with some of the toolkits, like PyTorch.
  • We're actually seeing a huge demand for machine learning engineers and data scientists now, and there just aren't enough people with degrees in those areas.
  • I hook up all the data pipelines and all the connections, that kind of data engineering, so that we're able to do machine learning engineering.

The future of data science and AI points to automatic tools


  • Emerging about a decade ago from roots in statistical modeling and data analysis, data scientists are employed to help companies adopt data-centric approaches to their organizations.
  • Since data scientists have a comprehensive understanding of data, they work well in moving organizations towards machine learning, deep learning and AI adoption because they generally have the same data-driven goals.
  • The first is that prospective data scientists must have backgrounds in advanced math and statistics, advanced analytics and perhaps machine learning and AI.
  • Because a company can't find or afford a team of data scientists doesn't mean it needs to abandon its data science goals or lose sight of advanced machine learning or AI opportunities.
  • Traditional data scientists will still be needed to run very complex analysis of data, but for the most part, basic analysis will move to citizen data scientist roles due to increasingly easy-to-use tools.

The most powerful idea in data science


  • You'll find the other kinds of patterns in your data too - that's the big challenge at the heart of data science.
  • Machine learning is an approach to making many similar decisions that involves algorithmically finding patterns in your data and using these to react correctly to brand new data.
  • In fact, getting a solution that handles old noise instead of new data is what the term overfitting means in machine learning.
  • Assuming that the pattern you (or your machine) pulled out of your data exists outside your imagination, which kind is it?
  • To win at data science, simply turn one dataset into (at least) two by splitting your data.
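
The splitting idea in the last highlight can be sketched in a few lines of Python. This is a minimal illustration rather than the article's own code: the function name `split_dataset`, the 20% test fraction, and the fixed seed are all assumptions made for the example.

```python
import random


def split_dataset(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows` and return (train, test).

    Hypothetical helper for illustration; the 20% default is an
    assumption, not a recommendation from the article.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                  # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    # The first n_test shuffled rows are held out; the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for real records
train, test = split_dataset(rows, test_fraction=0.2, seed=42)
```

Patterns are searched for in `train` only; `test` stays untouched until it is time to check whether a pattern also holds on data the search never saw.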

Beyond Clustering: The New Methods that are Pushing the Future of Unsupervised Learning


  • If you ask any group of data science students about the types of machine learning algorithms, they will answer without hesitation.
  • While supervised methods lead the current wave of innovation in areas such as deep learning, there is very little doubt that the future of artificial intelligence (AI) will transition towards more unsupervised forms of learning.
  • Looking into the future of AI through a pragmatic lens, we can accelerate the ability of AI agents to learn independently faster than we can build high quality data sets.
  • More specifically, there are three forms of unsupervised learning methods that are related to points 2, 3 and 4: transfer learning, generative adversarial methods, and autoregressive models.
  • As AI evolves, we will start hitting roadblocks with the availability of high quality datasets and would need to look for more organic forms of learning.

3 Ways AI Improves Manufacturing Intelligence


  • An IIoT software platform for storing and processing all machine data was put into place for SpectaSymbol's multiple oil wells.
  • The data being analyzed with AI-driven machine learning has been the enabler for a business-focused, custom application expressly designed for assessing well performance, and condition monitoring through AI analytics.
  • The resulting data lake created from input via the manufacturing process was integrated with SRF's ERP to close the loop on the entire manufacturing value chain.
  • Real-time machine data was used as a feedback loop to more accurately define the optimum settings of the machine to ensure product quality and machine reliability.
  • Whether you are looking to achieve machine uptime, to minimize costs, or to increase operational efficiencies, machine learning through cloud-hosted data can have an important role to play.

Probability Learning II: How Bayes Theorem is applied in Machine Learning


  • In supervised machine learning, when we want to train a model, the main building blocks are a set of data points that contain features (the attributes that define such data points), the labels of such data points (the numeric or categorical tags which we later want to predict on new data points), and a hypothesis function or model that links such features with their corresponding labels.
  • The goal of this training would be to reduce the mentioned loss function, so that the predictions that the model makes for the known data points are close to the actual values of the labels of such data points.
  • P(x) is the density function common to all the data points, P(x|wi) is the density function of the data points belonging to class wi, and P(wi) is the prior distribution of class wi.
  • We have seen how Bayes' theorem is used in machine learning, both in regression and classification, to incorporate previous knowledge into our models and improve them.
  • You can take a look at my other posts on Data Science and Machine Learning here.
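
The quantities named in the highlights above combine through Bayes' theorem as P(wi|x) = P(x|wi) P(wi) / P(x). As a rough sketch, assuming a finite set of classes so that P(x) is the sum of P(x|wi) P(wi) over all classes, the posterior can be computed as follows. The function name `posterior` and the example numbers are made up for illustration and do not come from the article.

```python
def posterior(likelihoods, priors):
    """Return P(w_i | x) for every class i.

    likelihoods[i] is P(x | w_i) and priors[i] is P(w_i).
    Hypothetical helper for illustration only.
    """
    if len(likelihoods) != len(priors):
        raise ValueError("need one likelihood and one prior per class")
    # Numerator of Bayes' theorem for each class: P(x | w_i) * P(w_i)
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    # Evidence P(x), by the law of total probability over the classes
    evidence = sum(joint)
    if evidence == 0:
        raise ValueError("x has zero probability under every class")
    return [j / evidence for j in joint]


# Made-up numbers: class 0 is rarer but explains x much better.
likelihoods = [0.5, 0.1]  # P(x | w_0), P(x | w_1)
priors = [0.2, 0.8]       # P(w_0), P(w_1)
post = posterior(likelihoods, priors)
```

A classifier built this way would assign x to the class with the largest entry in `post`.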

Geospatial Analytics: An $86,000,000,000 Opportunity


  • Since Charles Picquet and John Snow first applied the concept of spatial analysis to maps of cholera outbreaks in 19th century Europe, analysis of geographic location data has become a huge industry.
  • In this article, we will explore some of the key features of today's geospatial analytics market, and where to look for future innovation.
  • Turning satellite data into actionable forecasts on market fluctuations can give traders the edge, and as such, there are several geospatial analytics firms focusing initially on this deep-pocketed market segment.
  • This shift will enable further opportunities in applying machine learning and AI algorithms to these data sets, with new use cases and applications.
  • By cross-referencing satellite imagery and analytics with AR technologies, construction, surveying, asset maintenance and many other industrial activities will become much more effective and time efficient.

How to manage impostor syndrome in data science


  • I work at a data science mentorship startup where I probably spend 20% of my time helping data scientists overcome impostor syndrome and the self-doubt that comes with it.
  • I've seen it hold back more aspiring data scientists and machine learning engineers than I can count.
  • No company has ever hired a data scientist or ML engineer because of what they didn't know or couldn't do.
  • If you feel like an impostor because you don't know everything about data science, start by changing the way you think about what a data scientist is.
  • Not in the sense that you don't belong in data science, but only that you might not be job-ready quite yet.

Let's Talk About Data Pricing - Part I


  • Imagine you now have a dataset of medical data you want to make available for training machine learning models.
  • This medical data concerns a niche but very important problem, and as far as you know, nobody else has ever put a similar dataset on the market.
  • Existing players that operate under a fixed price policy are mostly data aggregators who buy data from a variety of sources, aggregate and transform it, before reselling it, as well as data providers with strong brands such as Bloomberg.
  • We envision a world where both priced data and data commons are readily available, getting data for your research or machine learning model only takes a few clicks and data markets handle hundreds of thousands of transactions every day.
  • In this series of articles, we seek to provide a framework for pricing data, recognizing that new economic mechanisms and very practical techniques are needed, with the aim of reducing the existing friction of data marketplaces and helping an efficient data market to emerge.

Produced and Sponsored by:

Innovative Data Science & Advanced Analytics Solutions
