Data Weblink: https://collegescorecard.ed.gov/data

The data we will use for this project is the College Scorecard dataset from the U.S. Department of Education. The dataset contains institution-level data files covering the years 1996 to 2020, with aggregated data from institutions across the country. The information available for each institution includes enrollment, student aid, costs, and student outcomes. The dataset offers a wide range of topics to choose from, and we plan to evaluate the aggregate data by building models for our presentation.

Because the dataset is large, we expect to spend the majority of our time cleaning it so that our models are as effective as possible. Several files will be useful for cleaning: the Data Dictionary xlsx file, the Technical Documentation for Institution-Level Data Files, and the Technical Documentation for Data Files by Field of Study. The Data Dictionary Excel file lists the variable names, which will help us navigate the dataset and reduce it to a smaller, more manageable subset. The technical documentation PDFs provide greater context for the data, which will also help us clean the dataset and narrow down which variables are useful in our models.

In the United States, college is expensive for a number of reasons, including growing demand, rising financial aid, lower state funding, the exploding cost of administrators, and bloated student amenities packages. These tuition increases leave many students in debt, and depending on the job they accept, some may spend their entire lives paying off that loan. The real-world issue we are examining is the debt students take on to attend college and the income they earn after they graduate.
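As a rough sketch of the cleaning step described above, the code below keeps only a handful of columns and drops rows whose debt or earnings values are missing. It assumes that missing cells appear as "NULL" and suppressed cells as "PrivacySuppressed", which should be confirmed against the technical documentation. The column names DEBT_MDN and EARN_1YR, the helper names, and the sample rows are all placeholders made up for illustration; the real variable names would come from the Data Dictionary.

```python
import csv
import io

# Assumption: these strings mark missing or suppressed values in the raw files.
MISSING_MARKERS = {"", "NULL", "PrivacySuppressed"}


def to_number(raw):
    """Return a float, or None when the cell is missing, suppressed, or non-numeric."""
    text = raw.strip()
    if text in MISSING_MARKERS:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def load_columns(csv_file, keep_text, keep_numeric):
    """Keep only the named columns and drop rows lacking any numeric value."""
    rows = []
    for record in csv.DictReader(csv_file):
        numbers = {name: to_number(record[name]) for name in keep_numeric}
        if any(value is None for value in numbers.values()):
            continue  # skip rows we could not use in a debt-versus-earnings model
        cleaned = {name: record[name] for name in keep_text}
        cleaned.update(numbers)
        rows.append(cleaned)
    return rows


# Made-up rows for illustration only; not real College Scorecard values.
SAMPLE = """INSTNM,CIPDESC,DEBT_MDN,EARN_1YR,OTHER
College A,Nursing,20000,55000,x
College B,History,PrivacySuppressed,31000,y
College C,Biology,24000,NULL,z
College D,Accounting,22500,48000,w
"""

rows = load_columns(
    io.StringIO(SAMPLE),
    keep_text=["INSTNM", "CIPDESC"],
    keep_numeric=["DEBT_MDN", "EARN_1YR"],
)
for row in rows:
    print(row)
```

With a downloaded file, the same function could be given an open file handle in place of the in-memory sample. Dropping incomplete rows is only one option; whether suppressed values should be dropped or handled another way is a choice we would still need to make.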
In the College Scorecard dataset, we can see how much debt students graduate with, broken down by the college they attended and the major they studied. We can also see what students earn after graduating, again by college and major. Overall, students now owe twice as much as they did a decade ago to receive a college education. The College Scorecard files provide an array of information that we will use to build a series of linear regression models. These regressions will show the relationship between college students' cumulative debt at graduation and their earnings one year after graduation. We will also cross-examine college rankings and students' majors to determine which schools offer the best return in terms of debt versus income one year after graduation. This will also show us which majors earn the most one year out of college. After analyzing this data, we will be able to recommend which schools would be best to attend and which majors students should consider if they want to make the most money right out of college.

The College Scorecard is an extremely valuable resource for this kind of analysis in a day and age when college carries a high opportunity cost. The data highlight how much education matters, not only for landing a job right out of college but also for the salary that comes with it. By analyzing student debt and student earnings, we can build a strong visual model to present to our class.
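To make the planned regression concrete, here is a minimal sketch of a simple (one-predictor) least-squares fit of earnings one year after graduation on debt at graduation. The function name fit_line and the numbers are made up for illustration and are not taken from the College Scorecard files. In practice we would likely use a statistics library rather than hand-written formulas.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Share of the variance in y explained by the line (1.0 if y is constant).
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical values in thousands of dollars, for illustration only.
debt_at_graduation = [10.0, 20.0, 30.0, 40.0]
earnings_year_one = [32.0, 41.0, 44.0, 53.0]

slope, intercept, r_squared = fit_line(debt_at_graduation, earnings_year_one)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r_squared={r_squared:.3f}")
```

The slope estimates how much first-year earnings change for each additional unit of debt, and r_squared indicates how much of the variation in earnings the line accounts for. Fitting one such model per major or per group of schools would be one way to compare them, though a strong fit would not by itself show that debt causes higher or lower earnings.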