Data Science News Flash: 09-05-2019

The latest data science articles - algorithmically curated, ranked, and summarized just for you.


News Flash is a weekly publication that features the top news stories for a specific topic. The stories are algorithmically curated, evaluated for quality, and ranked so that you can stay on top of the most important developments. The most important sentences from each story are also extracted and displayed as highlights, so you can get a sense of what each story is about. If you want more information about a particular story, just click on it to read the entire article.

You can see the other topics for which News Flashes are available and sign up to receive any that interest you.



Data science vs. machine learning: understanding the difference and what it means today

Highlights:

  • The data science vs. machine learning confusion comes from the fact that both terms have a significant grip on the collective imagination of the tech and business world.
  • Although machine learning eventually wins out, ‘data science’ was becoming particularly important at a time when these twin trends were starting to grow.
  • The confusion around the relationship between machine learning and data science stems from the fact that the two trends go hand in hand – or at least they used to.
  • The word that deserves your attention is multi-disciplinary as this underlines what makes data science unique and why it stands outside of the more specific taxonomy of machine learning terms.
  • The machine learning revolution might have started in data science, but it has rapidly expanded far beyond that strict discipline.



College Grads Need These Data Science Skills

Highlights:

  • According to LinkedIn, the top three skills that new graduates are learning in the six months following graduation are data visualization, data modeling and Python.
  • According to LinkedIn, data modeling is the second most popular skill that recent grads are investing in learning.
  • Rebecca Merrett, lead instructor at Data Science Dojo, which offers data science bootcamps, agrees.
  • Regardless of where you are on the career spectrum, there are plenty of places to learn data visualization, data modeling and Python.
  • Stephen Bailey, data scientist and analytics tool expert at Immuta, a data governance platform, has two pieces of advice for people who want to learn these skills.



How to learn data science: from data mining to machine learning

Highlights:

  • If you’re trying to learn data science and become a data scientist it can be easy to fall down a rabbit hole of machine learning or data processing.
  • Machine learning, data manipulation, data visualization – these are all ultimately technological methods for performing statistical analysis really well.
  • Called data manipulation or data munging, it’s really all about managing and cleaning data from different sources so it can be used for analytics projects.
  • Although machine learning and artificial intelligence are huge trends in their own right, they are nevertheless closely aligned with data science.
  • It’s a data scientist’s job to use machine learning and artificial intelligence in a way that can drive business value.



How Can Machine Learning Enhance Supply Chain Management Analytics?

Highlights:

  • The supply chain analytics market is likely to grow as awareness increases of the benefits of handling business data and of forecasting accuracy in supply chain analytics solutions.
  • According to a report published by Grand View Research, Inc., the use of machine learning in supply chain analytics is anticipated to grow substantially over the coming years.
  • Machine learning enables forecasting with no need for data inspection, as it is an application that mainly uses artificial intelligence (AI).
  • Applying machine learning algorithms and AI-based techniques to improve supply chains begins with the use of data sets that have the capability to handle complex processes.
  • The vast potential of AI to improve efficiencies and provide optimized solutions, together with machine learning in the supply chain, boosts an organization's management capabilities.



Train sklearn 100x faster

Highlights:

  • As compute gets cheaper and time to market for machine learning solutions becomes more critical, we’ve explored options for speeding up model training.
  • Outside of the space of neural networks and deep learning, we find that much of our compute time for training models is not spent on training a single model on a single dataset.
  • MLlib is Spark’s native machine learning library, supporting many of the same algorithms as scikit-learn for classification and regression problems.
  • While this sounds like it may be a sufficient solution for distributing scikit-learn style machine learning workloads, it distributes training in a way that doesn’t solve the kind of parallelism of interest to us.
  • For the random forest example, we want to broadcast the training data in full to each executor, fit an independent decision tree on each executor, and bring those fitted decision trees back to the driver to assemble a random forest.
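
The random forest pattern described in the last highlight can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the article's implementation: a thread pool stands in for Spark executors, a one-split "stump" stands in for a full decision tree, and the names fit_stump, fit_forest, and predict_forest are hypothetical. Threads in CPython will not speed up this pure-Python work; the point is the shape of the computation (full data to every worker, one independent tree per task, trees collected and assembled by the caller).

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fit_stump(X, y, seed):
    """Fit a one-split tree on a bootstrap sample of the full training data."""
    rng = random.Random(seed)
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
    Xb = [X[i] for i in idx]
    yb = [y[i] for i in idx]
    majority = Counter(yb).most_common(1)[0][0]

    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in Xb})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(Xb, yb) if row[f] <= t]
            right = [lab for row, lab in zip(Xb, yb) if row[f] > t]
            ll = Counter(left).most_common(1)[0][0]
            rl = Counter(right).most_common(1)[0][0]
            errors = sum(lab != ll for lab in left) + sum(lab != rl for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, ll, rl)

    if best is None:
        # Every sampled row was identical: always predict the majority label.
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def fit_forest(X, y, n_trees=25, max_workers=4):
    """Fit independent stumps in a worker pool and collect them in the caller.

    Assumption: the pool plays the role of the executors and the caller plays
    the role of the driver. Every task sees the full (X, y).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda seed: fit_stump(X, y, seed), range(n_trees)))


def predict_forest(forest, row):
    """Majority vote over the collected trees."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    forest = fit_forest(X, y)
    print(predict_forest(forest, [1.5]), predict_forest(forest, [11.5]))
```

Because each tree depends only on the shared training data and its own seed, the tasks need no coordination, which is why this kind of workload parallelizes across trees rather than across rows of data.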



Accelerating AI With GPU Virtualization In The Cloud

Highlights:

  • In July, VMware acquired Bitfusion, a company whose technology virtualizes compute accelerators with the goal of enabling modern workloads like artificial intelligence and data analytics to take full advantage of systems with GPUs or with FPGAs.
  • VMware’s push to leverage virtualization for accelerated AI and machine learning workloads was on display at the recent VMworld conference, where Nvidia announced it was bringing its virtual GPU capabilities to vSphere and VMware Cloud on Amazon Web Services (AWS) to accelerate AI and data science workloads.
  • The company unveiled its Virtual Compute Server (vComputeServer) software, which works not only with VMware Cloud on AWS but also with vSphere, vCenter, vMotion and VMware Cloud.
  • In addition, having GPU acceleration will enable organizations to leverage Nvidia’s RAPIDS GPU acceleration libraries for data science workloads, including deep learning, machine learning and data analytics.
  • AI and machine learning will play key roles in addressing those challenges, which VMware CEO Pat Gelsinger noted when talking about the partnerships with the GPU maker and AWS and the GPU acceleration services they are now bringing to the cloud.



Goodway Group Announces Key New Hires to Launch Data Science and Analytics Division

Highlights:

  • To spearhead the new division, the Goodway Group has hired Lluis Canet, vice president of data science and analytics, and Benjamin Diesbach, lead data insights analyst.
  • Philadelphia, PA — Goodway Group, the digital partner advertisers trust to deliver campaign performance and media efficiency, has created a new Data Science and Analytics division that will work across Goodway’s departments to leverage advanced analytics and machine learning for its clients.
  • Prior to Goodway, Canet held senior data science and analytics-focused roles at 21st Century Fox, JP Morgan Chase, and PulsePoint.
  • Canet holds bachelor's and master of science degrees in telecommunication engineering from the Technical University of Valencia in Spain, and he is based in Los Angeles, California.



Semi-supervised learning to improve translation of guest reviews

Highlights:

  • In these cases, creating paired samples at the scale required to fit complex models (like deep learning models) can be prohibitively expensive.
  • For some tasks, unpaired data samples are available for a fraction of the cost, or in much greater abundance, making semi-supervised learning attractive.
  • The classical supervised learning set-up in machine translation is where we train a sequence-to-sequence model using a training dataset.
  • We see that the GP model was not able to translate adequately (it wrongly translated to “small table”), while the fine-tuned model got it right.
  • The optimal way to create synthetic data-pairs from unpaired data makes a compromise between smoothness (producing probable pairs) and diversity (in our case.



Produced and Sponsored by:

Innovative Data Science & Advanced Analytics Solutions


