Python for Data Science NPTEL Week 3 Assignment Answers

Are you looking for the Python for Data Science NPTEL Week 3 Assignment Answers 2024? You’ve come to the right place! This guide offers detailed solutions to the Week 3 assignment questions, helping you solidify your understanding of Python programming and its applications in data science.



Python for Data Science Nptel Week 3 Assignment Answers (July-Dec 2024)

Q1. Which of the following is the correct approach to fill missing values in case of categorical variable? Mean median Mode None of the above

Answer: Mode
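As a rough illustration of why the mode is the usual choice here: a categorical column has no meaningful mean or median, but it does have a most frequent value. The sketch below assumes a hypothetical column named `fuel`; it is not taken from the assignment data.

```python
import pandas as pd

# Hypothetical data: "fuel" is a categorical column with two missing entries.
df = pd.DataFrame({"fuel": ["Petrol", "Diesel", None, "Petrol", None]})

# mode() returns a Series because ties are possible, so take the first value.
most_common = df["fuel"].mode()[0]

# Fill the gaps with the most frequent category.
df["fuel"] = df["fuel"].fillna(most_common)
print(df["fuel"].tolist())
```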

Q2. Of the following set of statements, which of them can be used to extract the column Type as a separate dataframe? df_cars[[‘Type’]] df_cars.iloc[[:, 1] df_cars.loc[:, [‘Type’]] None of the above

Answer : df_cars[[‘Type’]] , df_cars.loc[:, [‘Type’]]
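The difference between the two accepted forms and a plain single-label lookup can be checked with a small stand-in frame. The `df_cars` below is hypothetical (the real table is not reproduced in this text); only the column name Type comes from the question.

```python
import pandas as pd

# Hypothetical stand-in for df_cars.
df_cars = pd.DataFrame({
    "Brand": ["Maruti", "Hyundai"],
    "Type": ["Hatchback", "SUV"],
    "Price": [5.2, 11.0],
})

as_frame_1 = df_cars[["Type"]]         # list of labels -> DataFrame
as_frame_2 = df_cars.loc[:, ["Type"]]  # same idea through .loc
as_series = df_cars["Type"]            # single label -> Series, not a DataFrame

print(type(as_frame_1).__name__, type(as_series).__name__)
```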


Q3. The method df_cars.describe() will give description of which of the following column? Car name Brand Price (in lakhs) All of the above

Answer : Price (in lakhs)
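By default, `describe()` summarises only numeric columns, which is why the text columns drop out. A minimal sketch with made-up rows (the column names follow the question, the values are assumptions):

```python
import pandas as pd

# Made-up rows; only the column names come from the question.
df_cars = pd.DataFrame({
    "Car name": ["Swift", "Creta", "City"],
    "Brand": ["Maruti", "Hyundai", "Honda"],
    "Price (in lakhs)": [6.0, 11.0, 12.5],
})

summary = df_cars.describe()                    # numeric columns only
full_summary = df_cars.describe(include="all")  # opt in to every column

print(summary.columns.tolist())
```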

Q4. Which pandas function is used to stack the dataframes vertically? pd.merge() pd.concat() join() None of the above

Answer : pd.concat()
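A short sketch of vertical stacking with `pd.concat()`, using two hypothetical frames that share the same columns:

```python
import pandas as pd

# Two hypothetical frames with identical columns.
top = pd.DataFrame({"id": [1, 2], "score": [90, 85]})
bottom = pd.DataFrame({"id": [3], "score": [70]})

# axis=0 (the default) stacks rows; ignore_index rebuilds a clean 0..n-1 index.
stacked = pd.concat([top, bottom], ignore_index=True)
print(stacked)
```

`pd.merge()` and `join()` combine frames side by side on keys or on the index, which is why they are not the answer here.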

Q5. Which of the following are libraries in Python? Pandas Matplotlib NumPy All of the above

Answer: All of the above

Q6. Which of the following variable have null values? ID Company Review Date Rating

Answer: Review Date

Q7.Which of the following countries have maximum locations of cocoa manufacturing companies? U.K. U.S.A. Canada France

Answer: U.S.A.

Q8. After checking the data summary, which feature requires a data conversion considering the data values held? Rating Review date Company Bean origin

Answer: Review date

Q9. What is the maximum rating of chocolates? 1.00 5.00 3.18 4.00

Answer: 5.00

Q10.What is the output of the following code?

[bool, int, float, float, str] [str, int, float, float, str] [bool, int, float, int, str] [bool, int, int, float, str]

Answer: [bool, int, float, float, str]

Q11 .What does df.info() provide? Summary of the DataFrame, including the number of non-null entries. The first 5 rows of the DataFrame The data types of the columns The correlation matrix of the DataFrame

Answer: Summary of the DataFrame, including the number of non-null entries.

Q12.What will be the output of the following code?

[1, 2] [1, 3, 5] [1, 2, 3, 4, 5] [5, 4, 3, 2, 1]

Answer: [1, 3, 5]


More Nptel Courses: https://progiez.com/nptel-assignment-answers

Python for Data Science NPTEL Week 3 Assignment Answers (Jan-Apr 2024)

Course name: Python For Data Science


Q1. Which of the following is the correct approach to fill missing values in case of categorical variable? Mean Median Mode None of the above

Answer: Mode

Answer: Price (in lakhs)


Answer: pd.concat()

Q7. Which of the following countries have maximum locations of cocoa manufacturing companies? U.K. U.S.A. Canada France

Q10. What will be the output of the following code? [bool, int, float, float, str] [str, int, float, float, str] [bool, int, float, int, str] [bool, int, int, float, str]


Python for Data Science NPTEL Week 3 Assignment Answers (Jan-Apr 2023)

Course Name: Python for Data Science

Q1. Which of the following is the correct approach to fill missing values in case of categorical variable? a. Mean b. Median c. Mode d. None of the above

Answer: c. Mode

Assume a pandas dataframe df_cars which when printed is as shown below. Based on this information, answer questions 2 and 3.

Q2. Of the following set of statements, which of them can be used to extract the column Type as a separate dataframe? a. df_cars[['Type']] b. df_cars.iloc[[:, 1] c. df_cars.loc[:, ['Type']] d. None of the above

Q3. The method df_cars.describe() will give description of which of the following column? a. Car name b. Brand c. Price (in lakhs) d. All of the above

Answer: c. Price (in lakhs)

Q4. Which pandas function is used to stack the dataframes vertically? a. pd.merge() b. pd.concat() c. join() d. None of the above

Answer: b. pd.concat()

Q5. Which of the following are libraries in Python? a. Pandas b. Matplotlib c. NumPy d. All of the above

Answer: d. All of the above

Read the comma-separated values file hotel bookings.csv as a dataframe data_hotel and answer questions 6 – 8. Please refer to Hotel Bookings Data Description.pdf for data and variable description.

Q6. Choose the appropriate command(s) to filter those booking details whose reservation-status are a No-show?

Answer: b, d

Q7. From the same data, find how many bookings were not canceled in the year 2017? a. 9064 b. 6231 c. 9046 d. None of the above

Answer: a. 9064

Q8. From the total bookings that were made in 2017 and not canceled, which month had the highest number of repeated guests? a. July b. February c. January d. None of the above

Answer: a. July

Q9. What will be the output of the following code?

a. [bool, int, float, float, str] b. [str, int, float, float, str] c. [bool, int, float, int, str] d. [bool, int, int, float, str]

Answer: a. [bool, int, float, float, str]

Q10. Which command is used to generate the plot shown below?

a. plt.plot(x, linestyle = "-") b. plt.plot(x, linestyle = "--") c. plt.plot(x, linestyle = "-.") d. plt.plot(x, linestyle = ":")

Answer: a. plt.plot(x, linestyle = "-")


More NPTEL courses:  https://progiez.com/nptel

Python for Data Science NPTEL Week 3 Assignment Answers (July-Dec 2022)

Course name: Python for Data Science


Q1. Choose the appropriate command(s) to filter those booking details whose  reservation_status  are a No-show? a. data_hotel_ns datahotel. loc[data_hotel.reservation_status=’No-Show’] b. data_hotel_ns = data_hotel[ data _hotel. reservation_status = “No-Show’] c. data hotel_ns = data_hotel. reservation_status.loc [data_hotel.isin([‘No-Show’])] d. data_hotel_ns = data_hotel.loc [data hotel. reservation_status. isin([ No-Show’])]

Answer:b, d
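The options above are garbled in this copy, so the following is not a transcription of them. It is only a sketch of the two filtering idioms the answer appears to point at: a boolean mask built with `==`, and one built with `isin()`. The three rows are invented for illustration.

```python
import pandas as pd

# Invented rows; only the names data_hotel and reservation_status come from the question.
data_hotel = pd.DataFrame({
    "hotel": ["City", "Resort", "City"],
    "reservation_status": ["Check-Out", "No-Show", "Canceled"],
})

# Boolean mask with an equality comparison (note the double ==).
by_equality = data_hotel[data_hotel.reservation_status == "No-Show"]

# The same rows selected with isin(), which also accepts several values at once.
by_isin = data_hotel.loc[data_hotel.reservation_status.isin(["No-Show"])]

print(by_isin)
```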

Q2. From the same data, find how many bookings were not cancelled in the year 2017?

a. 9064 b. 6231 c. 9046 d. None of the above

Q3. From the total bookings that were made in 2017 and not cancelled, which month had the highest number of repeated guests? a. July b. February c. January d. None of the above

Answer: c. January

Q4. Which of the following commands can be used to create a variable Flag, and set the values as Premium when the  rating  is equal to or greater than 3.25, and otherwise as Regular? a. dt_cocoa[°Flag’] = [“Premium” if x 3.25 else “Regular” for x in dt_cocoa[‘Rating’ ]] b. dt_cocoa[“Flag’] = [“Premium” if x 3.25 else “Regular” for x in dt_cocoa[ _` Rating ‘]] c. dt_cocoa[“Flag”] = np.where(dt_cocoa[ “Rating”] < 3.25, “Regular”, “Premium”) d. None of the above

Answer: b, c
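The comparison operators in options a and b did not survive in this copy, so the sketch below does not reproduce them. It shows, under the rule stated in the question (Premium when the rating is 3.25 or more, Regular otherwise), how a list comprehension and `np.where` can give the same result. The three ratings are made up.

```python
import numpy as np
import pandas as pd

# Made-up ratings; the column name follows the question.
dt_cocoa = pd.DataFrame({"Rating": [3.0, 3.25, 4.0]})

# List comprehension: test each rating against the threshold.
dt_cocoa["Flag"] = ["Premium" if x >= 3.25 else "Regular" for x in dt_cocoa["Rating"]]

# np.where: the inverted condition (< 3.25) with the labels swapped is equivalent.
flag_np = np.where(dt_cocoa["Rating"] < 3.25, "Regular", "Premium")

print(dt_cocoa)
```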

Q5. Which instruction can be used to impute the missing values in the column Review Data from the dataframe  dt_cocoa  by grouping the records company–wise?

Q6. After checking the data summary, which feature requires a data conversion considering the data values held? a. Rating b. Review Date c. Company d. None of the above

Answer: b. Review Date

Q7. What is the maximum average rating for the cocoa companies based out of Guatemala? a. 4 b. 3.5 c. 3.42 d. None of the above

Answer: c. 3.42

Q8. Which pandas function is used to stack the dataframes vertically? a. pd.merge() b. pd.concat() c. join() d. None of the above

Q9. Of the following set of statements, which of them can be used to extract the column Direction as a separate dataframe?

a. df_weather[['Direction']] b. df_weather.iloc[:,0] c. df_weather.loc[:.['Direction']] d. None of the above

Answer: a, b

Q10. Which one of these students’ average score across all subjects was the lowest? Which subject has the highest average score across students?

a. Harini, Maths b. Sathi, Maths c. Harini, Physics d. Rekha, Maths

Answer: b. Sathi, Maths


nptel-assignments

Here are 65 public repositories matching this topic...

kishanrajput23 / nptel-the-joy-of-computing-using-python

Study materials related to this course.

  • Updated Oct 27, 2023

souraavv / NPTEL-DAA-Programming-Assignment-Solutions

Programming assignments of NPTEL DAA course taken by Prof. Madhavan Mukund of Chennai Mathematical Institute.

  • Updated Dec 8, 2022

progiez / nptel-assignment-answers

NPTEL Assignment Answers and Solutions 2024 (July-Dec). Get Answers of Week 1 2 3 4 5 6 7 8 8 10 11 12 for all courses. This guide offers clear and accurate answers for your all assignments across various NPTEL courses

  • Updated Sep 11, 2024

kishanrajput23 / NPTEL-Programming-In-java

  • Updated Apr 14, 2022

omunite215 / NPTEL-Programming-in-Java-Ultimate-Guide

I am sharing my journey of studying a course on Programming in Java taught by Prof.Debasis Samanta Sir IIT Kharagpur

  • Updated Dec 4, 2023

kadeep47 / NPTEL-Getting-Started-With-Competitive-Programming

[Aug - Oct 2023] Solutions for NPTEL Course Getting started with competitive programming weekly assignment.

  • Updated Jul 24, 2024

biophilic16 / NPTEL-Answers

Nptel assignment answer for Java Programming.

  • Updated Apr 12, 2024

Md-Awaf / NPTEL-Course-Getting-started-with-Competitive-Programming

Solutions for NPTEL Course Getting started with competitive programming weekly assignment.

  • Updated Apr 20, 2023

rvutd / NPTEL-Joy-of-Computing-2020

Programming Assignment Solutions

  • Updated May 5, 2020

avinashyadav16 / The-Joy-of-Computing-Using-Pyhton

12 Weeks long NPTEL Elective MOOC Course's codes, assignments and solutions. If you want to contribute and keep it updated with the new content, then please fork and raise pull request.

  • Updated Oct 30, 2023
  • Jupyter Notebook

guru-shreyansh / NPTEL-Programming-in-Java

The sole intention behind this repository is to help the beginners in Java with the course contents.

  • Updated Aug 1, 2021

gxuxhxm / NPTEL-The-Joy-of-Computing-using-Python

NPTEL-The-Joy-of-Computing-using-Python with NOTES and Weekly quizes Answers

  • Updated Jun 25, 2024

roopeshsn / embedded-system-design-nptel

Embedded System Design Course Materials - NPTEL

  • Updated May 6, 2022

gunjanmimo / NPTEL-The-Joy-of-Computing-using-Python

  • Updated Jan 26, 2020

AdishiSood / The-Joy-of-Computing-using-Python

  • Updated Apr 28, 2021

NPTEL-Course / Programming-Data-Structures-And-Algorithms-Using-Python

Nptel Course Solutions : Programming, Data Structures And Algorithms Using Python

  • Updated Nov 30, 2020

iamrudhresh / NPTEL-JAVA-PROGRAMMING

Welcome to the NPTEL "Programming in Java" course repository! This repository hosts a comprehensive collection of programming assignments, quizzes, and test solutions for the NPTEL "Programming in Java" course offered in the years 2024, 2022, and 2020.

  • Updated Apr 18, 2024

ShishiraB / Programming-Data-Structures-And-Algorithms-Using-Python

This is a repository where i have tried to give explaination

  • Updated Mar 1, 2023

NPTEL-Course / Google-Cloud-Computing-Foundations

Nptel Course Solution : Google Cloud Computing Foundations

  • Updated Nov 19, 2020

code-reaper08 / NPTEL-Practice-Repo

Practice repo for NPTEL 📚 Programming, Data Structures and Algorithms.

  • Updated Aug 27, 2021


NPTEL Python for Data Science Assignment 3 Answers 2023

NPTEL Python for Data Science Assignment 3 Answers 2023: The answers below are provided as a reference for students. Complete and submit your assignment based on your own understanding.

NPTEL Python For Data Science Week 3 Assignment Answer 2023

1. Which of the following is the correct approach to fill missing values in case of categorical variable?

  • None of the above

2. Of the following set of statements, which of them can be used to extract the column Type as a separate dataframe?

  • df_cars[[‘Type’]]
  • df_cars.iloc[[:, 1]
  • df_cars.loc[:, [‘Type’]]

3. The method df_cars.describe() will give description of which of the following column?

  • Price (in lakhs)
  • All of the above

4. Which pandas function is used to stack the dataframes vertically?

  • pd.concat()

5. Which of the following are libraries in Python?

6. Which of the following variable have null values?

  • Review Date

7. Which of the following countries have maximum locations of cocoa manufacturing companies?

8. After checking the data summary, which feature requires a data conversion considering the data values held?

  • Review date
  • Bean origin

9. What is the maximum rating of chocolates?

10. What is the output of the following code?

  • [bool, int, float, float, str]
  • [str, int, float, float, str]
  • [bool, int, float, int, str]
  • [bool, int, int, float, str]

NPTEL Python for Data Science Assignment 3 Answers 2022 [July-Dec]

1. Choose the appropriate command(s) to filter those booking details whose  reservation_status  are a No-show? a. data_hotel_ns datahotel. loc[data_hotel.reservation_status=’No-Show’] b. data_hotel_ns = data_hotel[ data _hotel.reservation_status = “No-Show’] c. data hotel_ns = data_hotel.reservation_status.loc[data_hotel . isin([‘No-Show’])] d. data_hotel_ns = data_hotel.loc [data hotel.reservation_status.isin([ No-Show’])]

2. From the same data, find how many bookings were not canceled in the year 2017? a. 9064 b. 6231 c. 9046 d. None of the above


3. From the total bookings that were made in 2017 and not canceled , which month had the highest number of repeated guests? a. July b. February c. January d. None of the above

4. Which of the following commands can be used to create a variable Flag, and set the values as Premium when the  rating  is equal to or greater than 3.25, and otherwise as Regular? a. dt_cocoa[°Flag’] = [“Premium” if x 3.25 else “Regular” for x in dt_cocoa[‘Rating’ ]] b. dt_cocoa[“Flag’] = [“Premium” if x 3.25 else “Regular” for x in dt_cocoa[ ‘ Rating ‘]] c. dt_cocoa[“Flag”] = np.where(dt_cocoa[ “Rating”] < 3.25, “Regular”, “Premium”) d. None of the above

5. Which instruction can be used to impute the missing values in the column Review Data from the dataframe  dt_cocoa  by grouping the records company–wise?

6. After checking the data summary, which feature requires a data conversion considering the data values held? a. Rating b. Review Date c. Company d. None of the above


7 . What is the maximum average rating for the cocoa companies based out of Guatemala? a. 4 b. 3.5 c. 3.42 d. None of the above

8. Which pandas function is used to stack the dataframes vertically? a. pd.merge() b. pd.concat() c. join() d. None of the above

9. Of the following set of statements, which of them can be used to extract the column Direction as a separate dataframe? a. df_weather[[ ‘ Direction ‘ ]] b. df_weather.iloc[:,0] c. df_weather.loc[: . [ ‘Direction ‘]] d. None of the above

10. Which one of these students’ average score across all subjects was the lowest? Which subject has the highest average score across students? a. Harini, Maths b. Sathi, Maths c. Harini, Physics d. Rekha, Maths


About Python For Data Science

The course aims at equipping participants to be able to use python programming for solving data science problems.

CRITERIA TO GET A CERTIFICATE

Average assignment score = 25% of average of best 3 assignments out of the total 4 assignments given in the course. Exam score = 75% of the proctored certification exam score out of 100

Final score = Average assignment score + Exam score

YOU WILL BE ELIGIBLE FOR A CERTIFICATE ONLY IF AVERAGE ASSIGNMENT SCORE >=10/25 AND EXAM SCORE >= 30/75. If one of the 2 criteria is not met, you will not get the certificate even if the Final score >= 40/100.
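The criteria above can be worked through in a few lines. This is only a sketch: the function name is made up, and it assumes each assignment is scored out of 100, which the text does not state.

```python
def final_score(assignment_scores, exam_score):
    """Apply the certificate criteria quoted above.

    assignment_scores: the four assignment scores, each assumed to be out of 100.
    exam_score: proctored exam score out of 100.
    Returns (assignment_part, exam_part, total, eligible).
    """
    best_three = sorted(assignment_scores, reverse=True)[:3]
    assignment_part = 0.25 * sum(best_three) / len(best_three)  # out of 25
    exam_part = 0.75 * exam_score                                # out of 75
    total = assignment_part + exam_part                          # out of 100
    # Both thresholds must be met; a high total alone is not enough.
    eligible = assignment_part >= 10 and exam_part >= 30
    return assignment_part, exam_part, total, eligible


print(final_score([80, 60, 100, 40], 50))  # both thresholds met
print(final_score([30, 30, 30, 0], 60))    # total is above 40, but assignments fall short
```

The second call shows the case the rule warns about: the total clears 40/100, yet the certificate is withheld because the assignment part is below 10/25.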

NPTEL Python for Data Science Assignment 3 Answers [Jan 2022]

Q1. Data from the file “brand_data.csv“ has to be loaded into a pandas dataframe. A snippet of the data is shown below:


What is the right instruction to read the file into a dataframe df_brand with 4 separate columns?

Answer:- (B), (C) & (D)


Q2. For the same file above “brand_data.csv“ , which parameter in pd.read_csv will help to load dataframe df_brand with the selected columns as shown below?


(A) index_col  (B) skiprows  (C) usecols  (D) None of the above

Answer:- (C) usecols 
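A minimal sketch of `usecols` in action. The CSV text and its column names are invented, since the original snippet of brand_data.csv is not reproduced here.

```python
import io

import pandas as pd

# Invented CSV content standing in for brand_data.csv.
csv_text = "Brand,Model,Year,Price\nA,X1,2020,10\nB,Y2,2021,12\n"

# usecols keeps only the named columns while reading.
df_brand = pd.read_csv(io.StringIO(csv_text), usecols=["Brand", "Price"])
print(df_brand)
```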

Q3. Data from the file “weather.xlsx“ has to be loaded into a pandas dataframe df_weather which when printed is as shown below:


Of the following set of statements which of them can be used to move the column “Direction” into a separate dataframe

Answer:- (A), (B), (C) & (D)

Q4. Referring to the same dataframe df_weather in Question (3), which statement/statements will help to print the last row from the dataframe?

Answer:- (B) & (D)

Q5. In reference to the same dataframe df_weather, we add an additional column ‘Hot_day’ to determine whether the day is hot or not based on the values in the Temperature column. What will the print statement derive?


(A) True  (B) SyntaxError  (C) False  (D) None of the above

Answer:- (C) False 

Q6. What statement would give the number of columns in a dataframe df?

(A) len(df.columns)  (B) len(df)  (C) df.size  (D) All of the above

Answer:- (A) len(df.columns) 
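The three options measure different things, which a tiny hypothetical frame makes visible:

```python
import pandas as pd

# Hypothetical frame with 3 rows and 2 columns.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

n_cols = len(df.columns)  # number of columns
n_rows = len(df)          # number of rows
n_cells = df.size         # rows * columns

print(n_cols, n_rows, n_cells)
```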

Q7. A file “Students.csv” contains the attendance and total scores of three separate students. This data is loaded into a dataframe df_study and a pandas crosstab is applied on the same dataframe which results in the following output


Which student scored the maximum average score of all three subjects? Which subject has the best average score for all three students?

(A) Harini,Chemistry  (B) Rekha,Physics  (C) Harini,Physics  (D) Rekha,Maths

Answer:- (D) Rekha,Maths

Q8. The following histogram shows the number of books read in a year:


Find the mean and median in the above histogram.

(A) 7,8  (B) 8,9  (C) 8.5,7  (D) 8,8  (E) None of the above

Answer:- (D) 8,8 

Q9. For the following box plot, which among the given options are the median and the outlier?


(A) 15, 52  (B) 22, 52  (C) 13.5, 29  (D) 25, 50

Answer:- (B) 22, 52 

Q10. A dataframe df_logs has the following data.


All the NaN / Null values in the column C1 can be replaced by zero value by executing which of the following statements?

(A) df_logs[‘C1’].fillna(0,inplace = True)  (B) df_logs.fillna(0,inplace = True)  (C) df_logs.fillna(0,inplace = False)  (D) df_logs[‘C1’].fillna(df_logs[‘B1’],inplace = True)

Answer:- (A) df_logs[‘C1’].fillna(0,inplace = True) 
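A sketch of filling only column C1, using invented values. Note one deliberate swap: option (A) calls `fillna` with `inplace = True` on a selected column, and newer pandas releases may warn about that chained in-place pattern or not write the change back to the frame, so this sketch assigns the result back instead.

```python
import numpy as np
import pandas as pd

# Invented values; only the names df_logs, B1 and C1 come from the question.
df_logs = pd.DataFrame({"B1": [1.0, 2.0, 3.0], "C1": [np.nan, 5.0, np.nan]})

# Fill NaN in C1 only and assign back, leaving other columns untouched.
df_logs["C1"] = df_logs["C1"].fillna(0)
print(df_logs)
```

Options (B) and (C) differ from this because they operate on the whole frame rather than on C1 alone.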

Disclaimer: We do not claim these solutions are 100% correct. They are based solely on our own understanding, and we post them only as a reference for students, so we urge you to do your assignment on your own.


Python Land

Python for Data Science: A Learning Roadmap

Python is the language of choice for most of the data science community. This article is a road map to learning Python for Data Science. It’s suitable for starting data scientists and for those already there who want to learn more about using Python for data science.

We’ll fly by all the essential elements data scientists use while providing links to more thorough explanations. This way, you can skip the stuff you already know and dive right into what you don’t know. Along the way, I’ll guide you to the essential Python packages used by the data science community.

I recommend you bookmark this page to return to it easily. And last but not least: this page is a continuous work in progress. I’ll be adding content and links, and I’d love to get your feedback too. So if you find something you think belongs here along your journey, don’t hesitate to message me .

Table of Contents

  • 1 What is Data Science?
  • 2 Learn Python
  • 3 Learn the command-line
  • 4 A Data Science Working environment
  • 5 Reading data
  • 6 Crunching data
  • 7 Visualization
  • 8 Keep learning

What is Data Science?

Before we start, though, I’d like to describe what I see as data science more formally. While I assume you have a general idea of what data science is, it’s still a good idea to define it more specifically. It’ll also help us define a clear learning path.

As you may know, giving a single, all-encompassing definition of a data scientist is hard. If we ask ten people, I’m sure it will result in at least eleven definitions of data science. So here’s my take on it.

Working with data

To be a data scientist means knowing a lot about several areas. But first and foremost, you have to get comfortable with data. What kinds of data are there, how can it be stored, and how can it be retrieved? Is it real-time data or historical data? Can it be queried with SQL? Is it text, images, video, or a combination of these?

How you manage and process your data depends on a number of properties or qualities that allow us to describe it more accurately. These are also called the five V’s of data:

  • Volume : how much data is there?
  • Velocity : how quickly is the data flowing? What is its timeliness (e.g., is it real-time data?)
  • Variety : are there different types and data sources, or just one type?
  • Veracity : the data quality; is it complete, is it easy to parse, is it a steady stream?
  • Value : at the end of all your processing, what value does the data bring to the table? Think of useful insights for management.

Although you’ll hear about these five V’s more often in the world of data engineering and big data, I strongly believe that they apply to all of the areas of expertise and are a nice way of looking at data.

Programming / scripting

In order to read, process, and store data, you need to have basic programming skills. You don’t need to be a software engineer, and you probably don’t need to know about software design, but you do need a certain level of scripting skills .

There are fantastic libraries and tools out there for data scientists. For many data science jobs, all you need to do is combine the right tools and libraries. However, you need to know one or more programming languages to do so. Python has proven itself to be an ideal language for data science for several reasons:

  • It’s easy to learn
  • You can use it both interactively and in the form of scripts
  • There are (literally) tons of useful libraries out there

There’s a reason the data science community embraced Python early on. In recent years, many new and very useful Python libraries have also been released specifically for data science.

Math and statistics

As if the above skills aren’t hard enough on their own, you also need a fairly good knowledge of math, statistics, and working scientifically.

Visualization

Eventually, you want to present your results to your team, manager, or the world! For that, you’ll need to visualize your results. You need to know about creating basic graphs, pie charts, histograms, and plotting data on a map.

Expert knowledge

Each working field has or requires:

  • specific terminology,
  • its own rules and regulations,
  • expert knowledge.

Generally, you’ll need to dive into what makes a field what it is. You can’t analyze data from a specific field of expertise without understanding the basic terminology and rules.

So what is a data scientist?

Coming back to our original question: what is data science? Or: what makes someone a data scientist? You need at least basic skills in all the subject areas named above. Every data scientist will have different levels of these skills. You can be strong in one, and weak in another. That’s OK.

For example, if you come from a math background, you’ll be great at the math part, but perhaps you’ll have a hard time wrestling with the data initially. On the other hand, some data scientists come from the AI/machine learning world and will tend toward that part of the job and less toward other parts. It doesn’t matter too much: ultimately, we all need to learn and fill in the gaps. The differences are what make this field exciting and full of learning opportunities!

Learn Python

The first stop when you want to use Python for Data Science: learning Python. If you’re completely new to Python, start learning the language itself first:

  • Start with my free Python tutorial or the premium Python for Beginners course
  • Check out our Python learning resources page for books and other useful websites

Learn the command-line

It helps a lot if you are comfortable on the command line. It’s one of those things you have to get started with and get used to. Once you do, you’ll find that you use it more and more since it is so much more efficient than using GUIs for everything. Using the command line will make you a much more versatile computer user, and you’ll quickly discover that some command-line tools can do what would otherwise be a big, ugly script and a full day of work.

The good news: it’s not as hard as you might think. We have a fairly extensive chapter on this site about using the Unix command line , the basic shell commands you need to know, creating shell scripts , and even Bash multiprocessing! I strongly recommend you check it out.

A Data Science Working environment

There are roughly two ways of using Python for Data Science:

  • Creating and running scripts
  • Using an interactive shell, like a REPL or a notebook

Jupyter Lab interactive notebook example

Interactive notebooks have become extremely popular within the data science community, but you should certainly not rule out the power of a simple Python script to do some grunt work. Both have their place.

Check out our detailed article about Jupyter Notebook. You’ll learn about the advantages of using it for data science, how it works, and how to install it. There, you’ll also learn when a notebook is the right choice and when you’re better off writing a script.

Reading data

There are many ways to get the data you need to analyze. We’ll quickly go over the most common ways of getting data, and I’ll point you to some of the best libraries to get the job done.

Data from local files

Often, the data will be stored on a file system, so you need to be able to open and read files with Python . If the data is formatted in JSON, you need a Python JSON parser . Python can do this natively. If you need to read YAML data, there’s a Python YAML parser as well.
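To give an idea of what “natively” means here, the standard-library `json` module is enough to turn JSON text into ordinary Python objects. The record below is invented for illustration.

```python
import json

# Invented JSON text, as it might appear in a local file.
raw = '{"city": "Utrecht", "readings": [12.5, 13.1, 11.8]}'

record = json.loads(raw)  # JSON text -> dict with a nested list
average = sum(record["readings"]) / len(record["readings"])

back_to_text = json.dumps(record)  # and back to JSON text
print(record["city"], round(average, 2))
```

For data stored in a file, `json.load(file_object)` does the same job on an open file instead of a string.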

Data from an API

Data will often be offered to you through a REST API. In the world of Python, one of the most used and most user-friendly libraries to fetch data over HTTP is called Requests. With requests, fetching data from an API can be as simple as this:
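The original snippet is not present in this copy of the article. As a stand-in, here is a minimal sketch of a basic GET request with Requests; the helper name `fetch_json` and the example URL are assumptions, not part of the article.

```python
import requests


def fetch_json(url, timeout=10):
    """GET a URL and return the decoded JSON body."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.json()


# Usage (the URL is a placeholder, not a real endpoint):
# data = fetch_json("https://api.example.com/items")
```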

This is the absolute basic use-case, but requests has you covered too when you need to POST data, when you need to login to an API, etcetera. There will be plenty of examples on the Requests website itself and on sites like StackOverflow.

Scraping data from the World Wide Web

Sometimes, data is not available through an easy-to-parse API but only from a website. If the data is only available from a website, you will need to retrieve it from the raw HTML and JavaScript. Doing this is called scraping, and it can be hard. But like with everything, the Python ecosystem has you covered!

Before you consider scraping data, you need to realize a few things, though:

  • A website’s structure can change without notice. There are no guarantees, so your scraper can break at any time.
  • Not all websites allow you to scrape them. Some websites will actively try to detect scrapers and block them.
  • Even if a website allows scraping (or doesn’t care), you are responsible for doing so in an orderly fashion. It’s not difficult to take down a site with a simple Python script just by making many requests in a short time span. Please realize that you might break the law by doing so. A less extreme outcome is that your IP address will be banned for life on that website (and possibly on other sites as well)
  • Most websites offer a robots.txt file. You should respect such a file.

Good scrapers will have options to limit the so-called crawl rate and will have the option to respect robots.txt files too. In theory, you can create your own scraper with, for example, the Requests library, but I strongly recommend against it. It’s a lot of work, and it’s easy to mess up and get banned.
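To make “respecting robots.txt” and “crawl rate” concrete, here is a small sketch using Python’s standard-library `urllib.robotparser`. It is not a scraper, and the robots.txt content, bot name, and URLs are invented. It only shows the kind of check a well-behaved tool performs before fetching a page.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt content; a real one would be downloaded from the site.
robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# "my-bot" is a hypothetical user agent name.
allowed = parser.can_fetch("my-bot", "https://example.com/public/page.html")
blocked = parser.can_fetch("my-bot", "https://example.com/private/data.html")
delay = parser.crawl_delay("my-bot")  # seconds to wait between requests, if given

print(allowed, blocked, delay)
```

Scraping frameworks typically expose the same ideas as configuration; in Scrapy, for example, the `ROBOTSTXT_OBEY` and `DOWNLOAD_DELAY` settings cover them.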

Instead, you should look at Scrapy , which is a mature, easy-to-use library to build a high-quality web scraper.

Crunching data

Two libraries in particular explain much of Python’s popularity for data science:

  • NumPy : “The fundamental package for scientific computing with Python.”
  • Pandas: “a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool.”

Let’s look at these two in a little more detail!

NumPy’s strength lies in working with arrays of data. These can be one-dimensional arrays, multi-dimensional arrays, and matrices. NumPy also offers a lot of mathematical operations that can be applied to these data structures.

NumPy’s core functionality is mostly implemented in C, making it very, very fast compared to regular Python code. Hence, as long as you use NumPy arrays and operations, your code can be as fast or faster than someone doing the same operations in a fast and compiled language. You can learn more in my introduction to NumPy.
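A small sketch of what “NumPy arrays and operations” look like in practice. The key point is that the arithmetic is applied to the whole array at once, so the loop runs inside NumPy rather than in Python. The array contents are arbitrary.

```python
import numpy as np

# A one-dimensional array; the values are arbitrary.
values = np.arange(10, dtype=np.float64)

squared = values ** 2  # element-wise, no explicit Python loop
total = squared.sum()  # reduction over the whole array

# A two-dimensional array used as a matrix.
matrix = np.array([[1, 2], [3, 4]])
product = matrix @ matrix  # matrix multiplication

print(total)
print(product)
```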

Like NumPy, Pandas offers us ways to work with in-memory data efficiently. Both libraries have an overlap in functionality. An important distinction is that Pandas offers us something called DataFrames. DataFrames are comparable to how a spreadsheet works, and you might know data frames from other languages, like R.

Pandas is the right tool for you when working with tabular data, such as data stored in spreadsheets or databases. Pandas will help you to explore, clean, and process your data.
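As a rough sketch of the explore, clean, and process steps on a DataFrame, using an invented table of temperature readings:

```python
import pandas as pd

# Invented tabular data with one missing reading.
df = pd.DataFrame({
    "city": ["Utrecht", "Delft", "Utrecht", "Delft"],
    "temp_c": [12.5, None, 14.0, 11.0],
})

missing = df["temp_c"].isna().sum()                    # explore: count missing values
clean = df.dropna(subset=["temp_c"])                   # clean: drop incomplete rows
mean_by_city = clean.groupby("city")["temp_c"].mean()  # process: aggregate per city

print(mean_by_city)
```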

Visualization

Every Python data scientist needs to visualize their results at some point, and there are many ways to visualize your work with Python. However, if I were allowed to recommend only one library, it would be a relatively new one: Streamlit.

Streamlit is so powerful that it deserves a separate article to demonstrate what it has to offer. But to summarize: Streamlit allows you to turn any script into a full-blown, interactive web application without the need to know HTML, CSS, and JavaScript. All that with just a few lines of code. It’s truly powerful; go read about Streamlit!

Streamlit uses many well-known packages internally. You can always opt to use those instead, but Streamlit makes using them a lot easier. Another cool feature of Streamlit is that most figures and tables allow you to easily export them to an image or CSV file as well.

Another, more mature product is Dash. Like Streamlit, it allows you to create and host web apps to visualize data quickly. To get an idea of what Dash can do, head to their documentation.

Keep learning

You can read the ‘Python Data Science Handbook’ by Jake VanderPlas for free right here. The book is from 2016, so it’s a bit dated. For example, at the time, Streamlit didn’t exist. Also, the book explains IPython, which is at the core of what is now Jupyter Notebook. The functionality is mostly the same, so it’s still useful.

Data Science III with python (Class notes)

Arvind Krishna

March 24, 2023

These are class notes for the course STAT303-3. This is not the course text-book. You are required to read the relevant sections of the book as mentioned on the course website.

The course notes are currently being written, and will continue to be developed as the course progresses (just like the class notes last quarter). Please report any typos / mistakes / inconsistencies / issues with the class notes / class presentations in your comments here. Thank you!

Python Data Science Handbook

Jake VanderPlas

This website contains the full text of the Python Data Science Handbook by Jake VanderPlas; the content is available on GitHub in the form of Jupyter notebooks.

The text is released under the CC-BY-NC-ND license, and code is released under the MIT license.

If you find this content useful, please consider supporting the work by buying the book!

Table of Contents

Preface

1. IPython: Beyond Normal Python

  • Help and Documentation in IPython
  • Keyboard Shortcuts in the IPython Shell
  • IPython Magic Commands
  • Input and Output History
  • IPython and Shell Commands
  • Errors and Debugging
  • Profiling and Timing Code
  • More IPython Resources

2. Introduction to NumPy

  • Understanding Data Types in Python
  • The Basics of NumPy Arrays
  • Computation on NumPy Arrays: Universal Functions
  • Aggregations: Min, Max, and Everything In Between
  • Computation on Arrays: Broadcasting
  • Comparisons, Masks, and Boolean Logic
  • Fancy Indexing
  • Sorting Arrays
  • Structured Data: NumPy's Structured Arrays

3. Data Manipulation with Pandas

  • Introducing Pandas Objects
  • Data Indexing and Selection
  • Operating on Data in Pandas
  • Handling Missing Data
  • Hierarchical Indexing
  • Combining Datasets: Concat and Append
  • Combining Datasets: Merge and Join
  • Aggregation and Grouping
  • Pivot Tables
  • Vectorized String Operations
  • Working with Time Series
  • High-Performance Pandas: eval() and query()
  • Further Resources

4. Visualization with Matplotlib

  • Simple Line Plots
  • Simple Scatter Plots
  • Visualizing Errors
  • Density and Contour Plots
  • Histograms, Binnings, and Density
  • Customizing Plot Legends
  • Customizing Colorbars
  • Multiple Subplots
  • Text and Annotation
  • Customizing Ticks
  • Customizing Matplotlib: Configurations and Stylesheets
  • Three-Dimensional Plotting in Matplotlib
  • Geographic Data with Basemap
  • Visualization with Seaborn

5. Machine Learning

  • What Is Machine Learning?
  • Introducing Scikit-Learn
  • Hyperparameters and Model Validation
  • Feature Engineering
  • In Depth: Naive Bayes Classification
  • In Depth: Linear Regression
  • In-Depth: Support Vector Machines
  • In-Depth: Decision Trees and Random Forests
  • In Depth: Principal Component Analysis
  • In-Depth: Manifold Learning
  • In Depth: k-Means Clustering
  • In Depth: Gaussian Mixture Models
  • In-Depth: Kernel Density Estimation
  • Application: A Face Detection Pipeline
  • Further Machine Learning Resources

Appendix: Figure Code

Problem with Assignment 3, Introduction_to_Data_Science_in_Python (Coursera)

I have a problem with Assignment 3 ( https://github.com/AparaV/intro-to-data-science-with-python/tree/master/assignment-03 ). I expected no NaN values in the result, but there are some. This is my first time studying a programming language, so it would be great if someone could tell me what is wrong with my Python code. Below are pictures of the wanted result and of the result my code produces.

I would try applying the regex replace first and then replacing the country names, like this.

And this line

Change to this

The only NaN left is Iran at 2015
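
A minimal sketch of that ordering, using a small hypothetical table rather than the assignment's Excel file. The sample names, the two regex patterns, and the rename mapping are assumptions chosen for illustration, not the original answer's code. The idea is that names still carrying footnote digits or parenthesised suffixes will not match an exact-name replacement, which is one plausible way unmatched rows (and NaN after a later merge) can appear.

```python
import pandas as pd

# Hypothetical sample; the real assignment reads 'Energy Indicators.xls'.
energy = pd.DataFrame({
    "Country": [
        "Iran (Islamic Republic of)",
        "Switzerland17",
        "China, Hong Kong Special Administrative Region3",
    ]
})

# Step 1: regex cleanup first. Strip trailing footnote digits,
# then strip a trailing parenthesised suffix.
energy["Country"] = (
    energy["Country"]
    .str.replace(r"\d+$", "", regex=True)
    .str.replace(r"\s*\(.*\)$", "", regex=True)
)

# Step 2: only now replace full names, which match exactly after cleanup.
energy["Country"] = energy["Country"].replace(
    {"China, Hong Kong Special Administrative Region": "Hong Kong"}
)

countries = energy["Country"].tolist()
print(countries)
```

If the two steps were swapped, the Hong Kong entry would still end in a footnote digit when the exact-name replacement ran, so it would be left unchanged.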

  • after i remove that line, i can see 'Nan' in my result. add links of pictures of wanted and unwanted results! thanks for your advice! –  stability resonance Commented Nov 3, 2019 at 14:10
  • No the problem is about that line, it's more about the excel file –  Natthaphon Hongcharoen Commented Nov 3, 2019 at 14:50
  • I don't have a PC right now so I can't check it but I suppose you have empty cell or NaN or ... in the excel –  Natthaphon Hongcharoen Commented Nov 3, 2019 at 14:52
  • omg!! thank you!! i understand that the main problem is 'regular expression' code! –  stability resonance Commented Nov 4, 2019 at 11:13
  • Also you misspell 'Iran, Islamic Rep.' –  Natthaphon Hongcharoen Commented Nov 4, 2019 at 11:36

Python for Data Science

Note: This exam date is subject to change based on seat availability. You can check final exam date on your hall ticket.

Course layout

  • Reading files
  • Exploratory data analysis
  • Data preparation and preprocessing
  • Scatter plot
  • if-else family
  • for loop with if break
  • Predicting price of pre-owned cars
  • Classifying personal income

Books and references

Instructor bio

Prof. Ragunathan Rengasamy

IMAGES

  1. NPTEL || Python for Data Science || Assignment-3 Solution || Jan -April 2023

  2. NPTEL Python for Data Science Assignment 3 Answers 2023

  3. Python for Data Science Week 3 Assignment 3 Solution

  4. NPTEL Python For Data Science Assignment 3 Answers 2023

  5. NPTEL Python For Data Science Assignment 3 Answers 2023 » UNIQUE JANKARI

  6. Assignment Solution for Week 3: January 2023 (NPTEL

VIDEO

  1. NPTEL Python for Data Science Week 3 Quiz Assignment Solutions and Answers

  2. NPTEL Python for Data Science Week4 Quiz Assignment Solutions

  3. Python for Data Science

  4. NPTEL PYTHON FOR DATA SCIENCE ASSIGNMENT 2 ANSWERS

  5. NPTEL Python for Data Science Week 3 Quiz Assignment Solutions

  6. NPTEL Python for Data Science Week 1 Quiz Assignment Solutions

COMMENTS

  1. Python For Data Science

    Python For Data Science - - Announcements. NPTEL: Exam Registration is open now for Jan 2024 courses! Dear Candidate, Here is a golden opportunity for those who had previously enrolled in this course during the July 2023 semester, but could not participate in the exams or were absent/did not pass the exam for this course.

  2. Nptel Python for Data Science Assignment 3 Answers

    The course aims at equipping participants to be able to use python programming for solving data science problems. Join Channel Membership t...

  3. Python for Data Science

    Last date for data changes: Aug 18, 2023, ... Python for Data Science : Assignment 3 is live now!! Dear Learners, The lecture videos for Week 3 have been uploaded for the course "Python for Data Science". The lectures can be accessed using the following link: Link: https ...

  4. Introduction-to-Data-Science-in-python/Assignment+3 .ipynb at ...

    This repository contains Ipython notebooks of assignments and tutorials used in the course introduction to data science in python, part of Applied Data Science using Python Specialization from Univ...

  5. Python for Data Science Week 3: Assignment 3 Solutions || Jan 2023

    Python for Data Science Week 3: Assignment 3 Solutions || Jan 2023 #nptel #pythondatascience

  6. Python for Data Science NPTEL Week 3 Assignment Answers

    Q4. Which of the following commands can be used to create a variable Flag, and set the values as Premium when the rating is equal to or greater than 3.25, and otherwise as Regular? a. dt_cocoa['Flag'] = ["Premium" if x >= 3.25 else "Regular" for x in dt_cocoa['Rating']]

  7. Python For Data Science

    24 Jul 2023: End Date : 18 Aug 2023: Enrollment Ends : 07 Aug 2023: Exam Registration Ends : 21 Aug 2023: ... by Douglas Montgomery 3. Mastering python for data science, Samir Madhavan. Instructor bio. Prof. Ragunathan Rengasamy IIT Madras. ... Average assignment score = 25% of average of best 3 assignments out of the total 4 assignments given ...

  8. Python for Data Science

    Module 1 • 3 hours to complete. In the first module of the Python for Data Science course, learners will be introduced to the fundamental concepts of Python programming. The module begins with the basics of Python, covering essential topics like introduction to Python. Next, the module delves into working with Jupyter notebooks, a popular ...

  9. tchagau/Introduction-to-Data-Science-in-Python

    This repository includes course assignments of Introduction to Data Science in Python on coursera by university of michigan - tchagau/Introduction-to-Data-Science-in-Python

  10. Python for Data Science || NPTEL Week 3 Assignment Solutions 2023

    Python For Data Science || NPTEL Week 3 Assignment Solutions || @OPEducore. Course: Python For Data Science. Offered by: IIT Madras. Duration: 4 weeks. Start Date ...

  11. nptel-assignments · GitHub Topics · GitHub

    Add this topic to your repo. To associate your repository with the nptel-assignments topic, visit your repo's landing page and select "manage topics." GitHub is where people build software. More than 100 million people use GitHub to discover, fork, and contribute to over 420 million projects.

  12. NPTEL Python for Data Science Assignment 3 Answers 2023

    Q1. Data from the file "brand_data.csv" has to be loaded into a pandas dataframe. A snippet of the data is shown below: What is the right instruction to read the file into a dataframe df_brand with 4 separate columns? Answer:- (B), (C) & (D) FOR NEXT WEEK ASSIGNMENT ANSWERS. Q2. For the same file above "brand_data.csv", which parameter ...

  13. Python for Data Science

    There will be a live interactive session where a Course team member will explain some sample problems, how they are solved - that will help you solve the weekly assignments. We invite you to join the session and get your doubts cleared and learn better. Date: August 7, 2022 - Sunday. Time: 04.00 PM - 05.00 PM.

  14. Python for Data Science: A Learning Roadmap • Python Land

    May 4, 2023. Python is the language of choice for most of the data science community. This article is a road map to learning Python for Data Science. It's suitable for starting data scientists and for those already there who want to learn more about using Python for data science. We'll fly by all the essential elements data scientists use ...

  15. Assignment 3 Solutions

    Biology Model QP I PUC 2023-24 PDF; Dsdv 5th lab program - gyghig; Dsdv 7th lab experiment; Programming, Data Structures And Algorithms Using Python - - Unit 10 - Week 4 Quiz ... NPTEL - PYTHON FOR DATA SCIENCE ASSIGNMENT - 3. Types of questions: MCQs - Multiple Choice Questions (a question has only one correct answer) MSQs - Multiple Select ...

  16. NPTEL Python for Data Science Week 3 Quiz Assignment Solutions

    NPTEL Python for Data Science 2023 | https://techiestalk.in/ ABOUT THE COURSE: The course aims at equipping participants to be able to use python programmi...

  17. Data Science III with python (Class notes)

    17 Assignment 3 (Sections 21 & 22) 18 Assignment 4. 19 Assignment 5. ... E Datasets, assignment and project files. Table of contents. Preface; Data Science III with python (Class notes) STAT 303-3. Author. Arvind Krishna . Published. March 24, 2023. Preface. These are class notes for the course STAT303-3. This is not the course text-book.

  18. Python Data Science Handbook

    This website contains the full text of the Python Data Science Handbook by Jake VanderPlas; the content is available on GitHub in the form of Jupyter notebooks. The text is released under the CC-BY-NC-ND license, and code is released under the MIT license. If you find this content useful, please consider supporting the work by buying the book!

  19. Python For Data Science Week 3: Assignment 3 Answers || July

    Python For Data Science Week 3: Assignment 3 Answers || July - Dec 2023 || NPTEL 1. Software Testing Week 2 : Assignment 2 Solutions || July - 2023 || NPTEL ...

  20. python

    I try apply the regex replace first then replace the country name like this. def energy(): energy = pd.read_excel('Energy Indicators.xls', sheet_name='Energy') energy ...

  21. Python for Data Science

    23 Jan 2023: End Date : 17 Feb 2023: Enrollment Ends : 06 Feb 2023: Exam Registration Ends : 20 Feb 2023: ... by Douglas Montgomery 3. Mastering python for data science, Samir Madhavan. Instructor bio. Prof. Ragunathan Rengasamy IIT Madras. ... Average assignment score = 25% of average of best 3 assignments out of the total 4 assignments given ...

  22. ASCVIT V1: Automatic Statistical Calculation, Visualization and

    During my studies, I attended a data science seminar and came into contact with the statistical programming language R for the first time. At the time, I was fascinated by the resulting potential uses. ... Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly. URL [6] Fahrmeir, L., Künstler, R., Pigeot, I., & Tutz, G ...

  23. Python for Data Science|| WEEK-3 Quiz assignment Answers 2023 ...

    Python for Data Science || WEEK-3 Quiz assignment Answers 2023 || NPTEL || #SKumarEdu These are solutions regarding submission of NPTEL " PYTHON FOR DATA SCIENCE "...

  24. NPTEL Week 4 Assignment: Python for Data science July 2023

    Unlock the potential of Python for Data Science with our comprehensive guide to NPTEL's Week 4 Assignment in the course "Python for Data Science" for July 20...
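
The Flag question quoted in item 6 can be tried on a small made-up table. This is only a sketch: the ratings below are hypothetical and are not the course's dataset; only the names dt_cocoa, Rating, and Flag come from the question.

```python
import pandas as pd

# Hypothetical ratings; the real dt_cocoa comes from the course dataset.
dt_cocoa = pd.DataFrame({"Rating": [2.75, 3.25, 4.0, 3.0]})

# Label each row Premium when its rating is at least 3.25, otherwise Regular.
dt_cocoa["Flag"] = [
    "Premium" if x >= 3.25 else "Regular" for x in dt_cocoa["Rating"]
]

flags = dt_cocoa["Flag"].tolist()
print(dt_cocoa)
```

The list comprehension walks the Rating column in order, so the new Flag column lines up row by row with the existing data.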

Course Status : Completed
Course Type : Elective
Duration : 4 weeks
Category :
Credit Points : 1
Level : Undergraduate
Start Date : 23 Jan 2023
End Date : 17 Feb 2023
Enrollment Ends : 06 Feb 2023
Exam Registration Ends : 20 Feb 2023
Exam Date : 26 Mar 2023 IST