BUSS6002 statistical knowledge


BUSS6002 - Individual Assignment Semester 1, 2024

Due Date
• Due: before 23:59 [1] on Wednesday 22 May 2024 (week 13).
• A late penalty of 5% per day applies if you submit your assignment late without a successful special consideration or simple extension.

Rubric Overview
This assignment is worth 30% of the unit’s marks. The assessment is designed to test your technical ability and statistical knowledge in modelling a real-world dataset, as well as your communication skills in writing a concise and coherent report presenting your approach and results. Refer to the Rubric later in this document for specific details.

Submission Instructions
You must submit:
• a written report (.PDF) with the following filename format, replacing 1234134 with your own student ID: BUSS6002 Report SID1234134.pdf.
• a Jupyter Notebook (.ipynb) file with the following filename format, replacing 1234134 with your own student ID: BUSS6002 Notebook SID12341234.ipynb.
You may submit multiple times before the due date. Your latest submission before the due date will be marked [2]. If you wish to re-submit after the due date, please send an email to buss6002.admin@sydney.edu.au so that markers are notified of your new submission.

[1] You may submit up to 30 minutes late without penalty.
[2] The filename on Canvas will change to include “-n” where n is the submission number. You can ignore this.

Overview
On September 19, 2023, Twitter user @purplepingers (Jordan van den Berg) launched shitrentals.org. The site allows tenants to submit testimonies about landlords, property managers and rental properties. The reviews are then publicly viewable and searchable, with the address of the property and the name of the agency visible. You have been given access to the data from shitrentals.org [3]. As a data-scientist-in-training, your task is to create a publishable research report that investigates and reports on the factors that drive the perceived quality of a rental property.
The effect of each factor must be captured in a Generalized Linear Model (GLM) of your choice. All analysis and model building must be performed using Python and collated into a single Jupyter Notebook, which is to be submitted at the same time as your report.

Report Sections
A template for the report is provided in the assignment pack. Your report must contain:
• Abstract
• Introduction
• Methods
• Results and Discussion
• Conclusion and Limitations
• Bibliography
You may also include Appendices with additional details, figures and tables.

Requirements
• There is a limit of 2500 words for the report, excluding tables, captions, bibliography and appendices.
• Assume the reader of your report is a competent and trained data scientist or analyst who is familiar with the content of BUSS6002.
• All plots, computational tasks, and results must be completed using Python.
• Do not include any Python code as part of your report.
• All figures must be appropriately sized and have readable axis labels and legends (where applicable).

LaTeX
Using LaTeX is highly recommended, though not required for this assignment. If you do not have LaTeX installed locally, we recommend that you use overleaf.com. All students can sign up for an Overleaf Pro+ account via the resource portal. If you’re new to Overleaf and LaTeX, help is available via their free introductory course and tutorial video.

[3] The context for this assignment is real but the data is fake.

Notebook
The submitted .ipynb file must
• contain all the code used in the development of your report,
• be runnable on an Ed environment, and
• be free of any errors.

Data Description
The dataset contains 1000 property reviews collected between 1/1/2023 and 31/12/2023. To simplify analysis, the properties have been restricted to:
• Flats and Units
• 1 and 2 bedroom properties
• 3 suburbs close to the university (Camperdown, Redfern and Newtown).
Refer to the data dictionary for descriptions of the variables.
File Pack
A link to download the BUSS6002 Assignment Pack.zip is provided on Canvas. The pack contains:
• report/
  – BUSS6002 Report SID1234134.tex (LaTeX template)
  – IEEEtran.cls (LaTeX style file)
  – references.bib (BibTeX file)
• analysis/
  – shitrentals.csv
  – shitrentals dictionary.csv
  – BUSS6002 Notebook SID12341234.ipynb (Jupyter Notebook template)

Hints
The following resources may be useful:
• www.statsmodels.org/stable/exam… regression.html
• stats.oarc.ucla.edu/r/dae/ordin…

Rubric
Criteria are graded on the scale FA, PS, CR, DI, HD.

Abstract and Introduction 10%
FA: The abstract is uninformative and does not give readers a clear understanding of the paper’s content. It is missing one or more of the following: clear summary of purpose, methods and results. The introduction does not expand on the abstract by providing a description of the context of the paper or motivation.
PS: The abstract is informative, giving readers some understanding of the paper’s content. It contains a mostly clear summary of the purpose, description of methods and results. The introduction expands on the abstract by providing a brief or vague description of the context of the paper and motivation.
CR: The abstract is informative, giving readers a clear understanding of the paper’s content. It contains a mostly clear summary of purpose, methods and results. The introduction expands on the abstract by providing a brief description of the context of the paper and motivation.
DI: The abstract is concise and informative, giving readers a clear understanding of the paper’s content. It contains a summary of topics, purpose, description of methods and results. The introduction expands on the abstract by providing a description of the context of the paper and motivation.
HD: The abstract is concise, informative, and engaging, giving readers a clear understanding of the paper’s content and significance. It contains a summary of topics, purpose, description of methods and results.
The introduction expands on the abstract by providing a thorough description of the context of the paper and convincing motivation. Both are supported by evidence from the literature.

Methods 40%
FA: The description of the model is either absent or severely lacking, making it difficult to understand its implementation or rationale. Decision making lacks any meaningful support from evidence, with little to no reference to data-based exploration (EDA) or established best practices. External resources are either not cited or improperly integrated into the discussion. The presented model, if any, demonstrates a significant mismatch with the problem context:
- the choice of the model is inappropriate or irrelevant to the problem context
- variables are either poorly selected or not utilized at all
- there is no effort to control for variables not of direct interest to the study
- overfitting is not addressed
PS: A rudimentary description of the model is provided. There is an attempt to support decision making with evidence, but the justification is minimal and may rely more on intuition than on data-based exploration (EDA) or established best practices. External resources are cited sporadically, with limited integration into the discussion. The presented model:
- is appropriate for the problem context but lacks explanation and justification
- uses only a few variables to enhance predictive performance, with significant missed opportunities or variables not adequately leveraged
- makes limited effort to control for variables not of direct interest to the study
- gives minimal attention to overfitting, with little validation or discussion provided
CR: A description of the model is provided, though it may lack depth or thoroughness. There is an attempt to support decision making with evidence, but the justification may be limited or not fully grounded in data-based exploration (EDA) or established best practices.
External resources are cited, but the integration may be less seamless or comprehensive. The presented model:
- is appropriate for the problem context
- is generally plausible for the problem context, but there may be some gaps in explanation or justification
- uses some variables to enhance predictive performance, but there may be missed opportunities or variables not fully leveraged
- attempts to control for variables not of direct interest to the study, though some could be more rigorously addressed
- attempts to avoid overfitting to the data, but lacks thoroughness
DI: A detailed description of the model is provided. Decision making is supported by evidence, through data-based exploration (EDA) or reference to course materials or external resources, although some areas may lack thorough justification. External resources are cited